Wednesday, September 10, 2003

AMBER7: XLF performance on PowerBook G4, Power Mac G5 and Pentium 4

I compared the performance of sander, the main program of AMBER 7, compiled with the GNU Fortran compiler (g77 3.1) and with IBM XLF v8.1beta on my PowerBook G4-400, and also on several other machines, including the new G5. WARNING: the measured times may not be accurate and are for your information only. (I did not have time to run longer simulations.)

Speed comparisons (higher is better)

g77 (G4-400) : xlf (G4-400) = 1 : 1.337
g77 (G4-400) : ifc (P4-1.7G) = 1 : 3.213
g77 (G4-400) : ifc (Opteron-1.4G) = 1 : 4.473
g77 (G4-400) : ifc (Xeon-2G) = 1 : 4.562
g77 (G4-400) : g77 (G5-2G) = 1 : 4.808
g77 (G4-400) : ifc (P4-2.4G) = 1 : 5.810
g77 (G4-400) : xlf (G5-2G) = 1 : 7.118

The G4-400 binaries were not linked against Apple's vecLib framework for BLAS and LAPACK, but sander appears to call very few BLAS and LAPACK routines: replacing BLAS/LAPACK with vecLib made no difference in running time on either the G4 or the G5. Further speedups would require a more detailed analysis of the code.

Hopefully Apple will find time to help us optimize the sander code.

Technical Details

g77: MACHINE file (no AltiVec)
XLF: MACHINE file for the G4 and MACHINE file for the G5. To run sander built with XLF, we also need to set the runtime option with setenv XLFRTEOPTS NAMELIST=OLD (csh syntax).
ifc: and, of course, the MACHINE file for the Intel compiler, optimized for Pentium 4/Opteron.

Molecular dynamics input file: benchmd.in

Output file           Time (s)  Machine
g77_g4_0.4.out        352.98    G4-400
xlf_g4_0.4.out        245.16    G4-400
ifc_p4_1.7.out        102.03    P4-1.7G
ifc_p4_2.4.out         60.75    P4-2.4G
g77_g5.out             73.42    G5-2Gx2, run on a single CPU
xlf_g5_2.out           54.79    G5-2Gx2, run on a single CPU
xlf_g5_2_unroll.out    49.59    G5-2Gx2, run on a single CPU
ifc_opteron_1.4.out    78.91    Opteron-1.4Gx2, run on a single CPU
ifc_xeon_2.out         77.37    Xeon-2Gx2, run on a single CPU
Speed is defined as the inverse of the run time in seconds.
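Because speed is 1/time, each ratio in the speed comparison is just the g77 (G4-400) time divided by the other build's time. The short Python sketch below recomputes the ratios from the benchmd.in timings listed here; it is only an illustration, and the dictionary labels are my own shorthand, not AMBER output. With these times it reproduces the last five ratios (the xlf (G5-2G) value of 7.118 corresponds to the unrolled build, 49.59 s), while the first two come out at roughly 1.44 and 3.46.

```python
# Wall-clock times in seconds for benchmd.in, copied from the list above.
# The labels are hypothetical shorthand for this sketch only.
times = {
    "g77 (G4-400)": 352.98,
    "xlf (G4-400)": 245.16,
    "ifc (P4-1.7G)": 102.03,
    "ifc (P4-2.4G)": 60.75,
    "g77 (G5-2G)": 73.42,
    "xlf (G5-2G)": 54.79,
    "xlf unroll (G5-2G)": 49.59,
    "ifc (Opteron-1.4G)": 78.91,
    "ifc (Xeon-2G)": 77.37,
}

BASELINE = "g77 (G4-400)"


def speed(seconds):
    """Speed is defined as the inverse of the run time in seconds."""
    return 1.0 / seconds


def relative_speeds(times, baseline):
    """Speed of each build relative to the baseline build (baseline = 1)."""
    base = speed(times[baseline])
    return {name: speed(t) / base for name, t in times.items()}


ratios = relative_speeds(times, BASELINE)

# Print in the same "1 : x" style as the comparison list, slowest first.
for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{BASELINE} : {name} = 1 : {r:.3f}")
```

The same function also shows the effect of loop unrolling on the G5: ratios["xlf unroll (G5-2G)"] / ratios["xlf (G5-2G)"] is about 1.10, i.e. roughly a 10% speedup.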
Special thanks to Professor Eric Mockensturm at Penn State University for his help with the G5 machine.