Sunday, May 07, 2006

Building AMBER 9 Parallel Ports with Pre-built MPI Libraries and XLF90

For years my impression has been that prebuilt binaries of something like an MPI library bring nothing but trouble, so I normally recommend compiling the MPI library yourself. Take this prebuilt MPICH package from hpc.sourceforge.net, for example:
Downloading MPICH

It doesn't work with AMBER 9. Configuring with env MPI_HOME=/usr/local/mpich ./configure -mpich xlf90_macosx fails for two reasons:

  1. The library's Fortran function names end in a double underscore, and xlf90 cannot produce more than one trailing underscore.
  2. /usr/local/mpich/bin/mpif90 -show does not work, so the configure script gives up. (In other words, this MPICH would have to be configured with the f90 option.)
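The mpif90 problem can be spotted before running configure. Below is a minimal sketch, assuming a POSIX sh and that the wrapper path is passed as an argument; wrapper_shows and wrapper_compiler are hypothetical helper names, not part of MPICH or AMBER. It checks whether "mpif90 -show" succeeds and reports which compiler the wrapper would call.

```shell
#!/bin/sh
# Sketch: pre-flight checks on an MPICH Fortran 90 wrapper before AMBER's configure.
# Hypothetical helpers; the wrapper path is passed in as $1.

# Succeeds only if "WRAPPER -show" exits 0 and prints a non-empty command line.
wrapper_shows() {
  out=$("$1" -show 2>/dev/null) && [ -n "$out" ]
}

# Prints the first word of the -show output, i.e. the compiler the wrapper calls.
wrapper_compiler() {
  out=$("$1" -show 2>/dev/null) || return 1
  printf '%s\n' "$out" | awk '{ print $1; exit }'
}

# Usage:
#   if wrapper_shows /usr/local/mpich/bin/mpif90; then
#     echo "mpif90 calls: $(wrapper_compiler /usr/local/mpich/bin/mpif90)"
#   else
#     echo "mpif90 -show failed; this MPICH was probably built without f90 support"
#   fi
```

If the reported compiler is not xlf90 (or xlf90_r), the wrapper was built for a different Fortran compiler, which is worth knowing before going further.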

As for LAM-MPI, you might expect the official build from the LAM-MPI homepage to work:
Downloading LAM-MPI
Unfortunately, it doesn't work with AMBER 9 either, for a similar reason: no Fortran support was compiled into the package (see the output of /usr/local/bin/mpif77 -showme).
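A similar pre-flight check can be sketched for LAM. This is only a heuristic and rests on an assumption: that a Fortran-enabled LAM wrapper names the Fortran binding library (liblamf77mpi, which does appear in the York build's lib directory in the ranlib errors further down) in its -showme output. lam_fortran_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Heuristic sketch: does a LAM wrapper compiler appear to have Fortran MPI bindings?
# Assumption: a Fortran-enabled wrapper mentions lamf77mpi (e.g. -llamf77mpi)
# in its -showme output. Not a definitive test.
lam_fortran_ok() {
  out=$("$1" -showme 2>/dev/null) || return 1
  case $out in
    *lamf77mpi*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   lam_fortran_ok /usr/local/bin/mpif77 || echo "this LAM build lacks Fortran bindings"
```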

Then I stumbled across another build from York University:
Downloading LAM-MPI-xlf90
This one seems to work. The following steps are required:

  1. mkdir -p /Applications/Darwin; tar zxvf lam-7.1-xlf.tar.gz -C /Applications/Darwin/ (Not everyone can write to /Applications/, so this is not ideal for ordinary end users of a Macintosh cluster.)
  2. If you get these errors when running make parallel:
    /usr/bin/ld: table of contents for archive: /Applications/Darwin/lam-7.1-xlf/lib/liblammpio.a is out of date; rerun ranlib(1) (can't load from it)
    /usr/bin/ld: table of contents for archive: /Applications/Darwin/lam-7.1-xlf/lib/liblamf77mpi.a is out of date; rerun ranlib(1) (can't load from it)
    /usr/bin/ld: table of contents for archive: /Applications/Darwin/lam-7.1-xlf/lib/libmpi.a is out of date; rerun ranlib(1) (can't load from it)
    /usr/bin/ld: table of contents for archive: /Applications/Darwin/lam-7.1-xlf/lib/liblam.a is out of date; rerun ranlib(1) (can't load from it)
    make[1]: *** [sander.MPI] Error 1
    make: *** [parallel] Error 2

    Don't panic; just run "ranlib /Applications/Darwin/lam-7.1-xlf/lib/*.a".
  3. Remember to put /Applications/Darwin/lam-7.1-xlf/bin in your PATH, or lamboot won't start.
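The three steps can be collected into a small shell sketch. It assumes the tarball name used above and the /Applications/Darwin prefix the York build appears to expect; install_lam, refresh_archives and lam_path are hypothetical helper names.

```shell
#!/bin/sh
# Sketch of the York LAM/xlf setup steps.
# Assumptions: the tarball unpacks to lam-7.1-xlf/ and must live under /Applications/Darwin.
LAM_HOME=/Applications/Darwin/lam-7.1-xlf

# Step 1: unpack the tarball (needs write access to /Applications).
install_lam() {
  mkdir -p /Applications/Darwin && tar zxvf "$1" -C /Applications/Darwin/
}

# Step 2: rebuild the index of every static archive under $1/lib, which clears
# the ld "table of contents ... is out of date" errors.
refresh_archives() {
  for a in "$1"/lib/*.a; do
    if [ -f "$a" ]; then
      ranlib "$a" || return 1
    fi
  done
}

# Step 3: print PATH with $1/bin in front (unless already present), so lamboot is found.
lam_path() {
  case ":$PATH:" in
    *":$1/bin:"*) echo "$PATH" ;;
    *) echo "$1/bin:$PATH" ;;
  esac
}

# Usage:
#   install_lam lam-7.1-xlf.tar.gz
#   refresh_archives "$LAM_HOME"
#   PATH=$(lam_path "$LAM_HOME"); export PATH
```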

Some of you may find that the AMBER 9 build fails with errors like this:
xlf90_r -qfree=f90 -o sander.MPI [skipped a lot of *.o obj] ../lmod/lmod.a ../lapack/lapack.a ../blas/blas.a ../lib/nxtsec.o ../lib/sys.a -L/usr/local/mpich/lib -lpmpich -lmpich -lpmpich -lmpich -Wl,-framework -Wl,Accelerate
/usr/bin/ld: Undefined symbols:
_mpi_bcast
_mpi_irecv
**********
***OMIT***
**********
_mpi_waitany
_mpi_waitall
_mpi_type_free
make[1]: *** [sander.MPI] Error 1
make: *** [parallel] Error 2
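The list of undefined symbols can be long. The sketch below assumes the build output was saved to a file (for example with make parallel 2>&1 | tee build.log, where build.log is just a hypothetical name) and that the log looks like the one above. It pulls out the names listed after "Undefined symbols:", strips the leading underscore Mac OS X puts on symbol names, and removes duplicates. undefined_names is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the undefined names from a saved link log, one per line,
# without the leading underscore. Assumes the format shown above:
# "Undefined symbols:" followed by one "_name" per line, then make's error lines.
undefined_names() {
  awk '/Undefined symbols:/ { in_list = 1; next }
       /^make/              { in_list = 0 }
       in_list && /^_[A-Za-z0-9_]+$/ { sub(/^_/, ""); print }' | sort -u
}

# Usage:
#   undefined_names < build.log > names.txt
#   paste -s -d : - < names.txt    # colon-separated, as -qextname expects
```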

First, check whether the libraries being linked (the -L/usr/local/mpich/lib -lpmpich -lmpich arguments in the log above) actually exist. If those arguments don't appear on the link line at all, your pre-built MPI library is not usable. If the library files do exist, use the nm command to check the function names. For example:

nm /usr/local/mpich/lib/libmpich.a | grep mpi_type_free
If the result is 00000000 T _mpi_type_free_ (a single trailing underscore), you can tell xlf90 to add a trailing underscore to the names that need one by listing them, colon-separated, in the -qextname option:
make -e AMBERBUILDFLAGS="-qextname=mpi_bcast:mpi_irecv:...skipped..."
But if the result is 00000000 T _mpi_type_free__ (two trailing underscores), just give up and build your own library instead.
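The nm check and the -qextname list can be combined. This is a sketch, assuming nm prints defined functions as "address T _name" as in the example above: symbol_style classifies one name, and extname_for (both hypothetical helpers) builds the colon-separated list, refusing as soon as a name has two trailing underscores or is missing from the library.

```shell
#!/bin/sh
# Reads nm output on stdin; $1 is a Fortran name such as mpi_type_free.
# Prints none (_name), single (_name_), double (_name__) or missing.
symbol_style() {
  awk -v n="$1" '
    $2 == "T" && $3 == ("_" n)      { plain = 1 }
    $2 == "T" && $3 == ("_" n "_")  { single = 1 }
    $2 == "T" && $3 == ("_" n "__") { dbl = 1 }
    END {
      if (plain) print "none"
      else if (single) print "single"
      else if (dbl) print "double"
      else print "missing"
    }'
}

# $1 is the library; reads Fortran names (one per line) on stdin and prints the
# colon-separated list for -qextname. Returns 1 if any name cannot be fixed.
# NM may be overridden (e.g. for testing); it defaults to nm.
extname_for() {
  syms=$(${NM:-nm} "$1" 2>/dev/null)
  list=""
  while read -r name; do
    style=$(printf '%s\n' "$syms" | symbol_style "$name")
    case $style in
      single) list="${list:+$list:}$name" ;;
      none) ;;  # already matches xlf90's default (no trailing underscore)
      *) echo "$name: $style; -qextname cannot fix this" >&2; return 1 ;;
    esac
  done
  echo "$list"
}

# Usage (names.txt: one name per line, hypothetical file):
#   flags=$(extname_for /usr/local/mpich/lib/libmpich.a < names.txt) &&
#     make -e AMBERBUILDFLAGS="-qextname=$flags"
```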