Wednesday, May 10, 2006

Running in Parallel

    In order to run MPI programs, you need to make SSH work for you:

  • You need to be able to connect from the job scheduler server to the computing nodes without entering a password. Check out the great write-up Warner posted on his HPC homepage.
  • You also need to make sure ssh already accepts the host key fingerprint of every computing node. You can verify this by manually ssh-ing into every node you might later use for computation (a minimal setup sketch follows this list).
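    Passwordless login and accepted host keys are all MPI needs from SSH. Here is a minimal setup sketch, assuming OpenSSH and a home directory shared across the nodes; the hostnames node01 and node02 are placeholders:

      % ssh-keygen -t rsa                 # accept the defaults; leave the passphrase empty
      % cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      % chmod 600 ~/.ssh/authorized_keys
      # Then collect each node's host key so ssh stops prompting about fingerprints:
      % ssh node01 hostname               # answer "yes" once per node
      % ssh node02 hostname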

    Using mpirun

  • lam-mpi
    1. Make sure that lamboot/lamhalt are in your PATH
    2. Set up the machine file:
      # From $LAM_HOME/lam-7.1-xlf/etc/lam-bhost.def
      #farq046c cpu=1
      #murasaki cpu=2
      localhost
      # It's okay to be lazy about the cpu count, but it's
      # highly recommended to add cpu={cpucount} when you have
      # multiple nodes defined
      FYI: the host definitions can also be added to the /etc/hosts file or to $LAM_HOME/etc/lam-hostmap.txt.
    3. % lamboot -v machinefile
    4. SYSV (shared memory)
      % mpirun -ssi rpi sysv -np 2 {command}
    5. Over the network
      % mpirun -ssi rpi tcp -np {# of cpus to run} {command}
      For a manual test:
      % cd $AMBERHOME/test/4096wat
      % env TESTsander=../../exe/sander.MPI DO_PARALLEL="mpirun -ssi rpi sysv -np 2" ./Run.pure_wat

      or
      % cd $AMBERHOME/test
      % env DO_PARALLEL="mpirun -ssi rpi sysv -np 2" make test.parallel
    6. Please remember to clean up your mess with lamhalt when you are done; a sketch of a complete session follows this list.
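    To tie these steps together, here is a sketch of a complete lam-mpi session; the machinefile contents are only an example (murasaki is borrowed from the commented sample above), and {command} stands for your MPI program:

      # machinefile:
      #   murasaki cpu=2
      #   localhost cpu=1
      % lamboot -v machinefile                  # start the LAM daemons on every listed host
      % mpirun -ssi rpi sysv -np 2 {command}    # shared memory, single node
      % mpirun -ssi rpi tcp -np 3 {command}     # over the network
      % lamhalt                                 # shut the daemons down when you are done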
  • mpich
    1. Again, make sure mpirun is in your PATH on all the nodes (just in case).
    2. Write a machinefile that defines the available resources:
      # hostname:{# of processors you want to run for this host}
      # like:
      # 192.168.0.1:2
      # or just use multiple hostname like:
      192.168.0.1
      192.168.0.1
      These hostnames also determine how traffic is routed to the nodes when you run across the network.
    3. shmem (shared memory) example:
      % cd $AMBERHOME/test
      % env DO_PARALLEL="/opt/mpich/bin/mpirun -arch smp -machinefile /tmp/machinefile -np 2" make test.sander.BASIC.MPI
    4. Over the network: normally you don't need to specify the -arch argument; just make sure the machinefile is correct.
    5. Pass some options to mpirun for better socket efficiency, such as adding 'env P4_SOCKBUFSIZE=524288' in front of the mpirun invocation, like this:
      % cd $AMBERHOME/test
      % env P4_SOCKBUFSIZE=524288 DO_PARALLEL="/opt/mpich/bin/mpirun -machinefile /tmp/machinefile -np 2" make test.sander.BASIC.MPI
    6. Use /opt/mpich/sbin/cleanipcs to free leftover IPC resources (shared-memory segments and semaphores) if your mpich program crashes in the middle of a run; see the sketch after this list.
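    Likewise, a complete mpich run might look like the sketch below; the machinefile contents and node names are placeholders, and P4_SOCKBUFSIZE only matters when running across the network:

      # /tmp/machinefile:
      #   node01:2
      #   node02:2
      % cd $AMBERHOME/test
      % env P4_SOCKBUFSIZE=524288 DO_PARALLEL="/opt/mpich/bin/mpirun -machinefile /tmp/machinefile -np 4" make test.sander.BASIC.MPI
      % /opt/mpich/sbin/cleanipcs         # only needed after a crash, to free stale IPC resources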
It may also be possible to use Xgrid, but since I am not familiar with it, I am not going to cover it here.