MPI Over Condor with Grid Appliance

From Grid-Appliance Wiki

Introduction

In a typical setup of MPI over Condor (using Condor's parallel universe), each worker needs to be configured with a dedicated scheduler, and the scheduler's name must be provided in each worker's configuration file. This approach creates an inflexible cluster that the user has to set up manually. This MPI over Condor example is an experiment in dynamic MPI cluster setup via Condor that does not require a dedicated scheduler.

Pre-release Testing

Please note that this example has not yet been integrated into the main Grid Appliance package (as of version 2.04.20). To try it out, follow this link to download the source package, then copy all the files under Grid-Appliance/user/examples/mpi to /opt/grid-appliance/user/examples/mpi on your local machine, creating the directory if necessary.

Preparation

  • Install and deploy Grid Appliance with a working Condor pool.
  • Install the grid-appliance-nfs (version 0.02 or newer is required) and grid-appliance-autofs packages, along with gcc and make, on all nodes. On each node, run the following commands as root:
    apt-get install gcc
    apt-get install make
    apt-get install grid-appliance-nfs
    apt-get install grid-appliance-autofs
  • If you are using a 64-bit machine and want to submit MPI jobs to 32-bit machines, you also need to install the gcc-multilib package:
    apt-get install gcc-multilib
  • On the client node, install MPI into the local NFS directory (/mnt/local/mpich2) using the provided script, which automatically downloads the mpich2 1.3.1 source package:
    cd /opt/grid_appliance/user/examples/mpi
    ./setup.sh -m32
    • The "-m32" option is for 32-bit machine compatibility.
  • Once the installation finishes, all MPI executables are located in the directory /mnt/local/mpich2/bin.
  • You can now use the MPI C compiler (mpicc) to compile a program for MPI execution:
    /mnt/local/mpich2/bin/mpicc -m32 -o <executable.filename> <source.filename.c>
  • Or you can test it with the included sample program HelloWorld.c:
    /mnt/local/mpich2/bin/mpicc -m32 -o HelloWorld HelloWorld.c
    • The -m32 option is used so that the executable can run on both 32-bit and 64-bit machines.
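
The package steps above can be collected into one small helper so that every node installs the same set. The following is only a sketch: the function name packages_for_arch is hypothetical (it is not part of Grid Appliance), and it assumes that `uname -m` reports x86_64 (or amd64) on 64-bit nodes and that every 64-bit node should be able to build 32-bit binaries.

```shell
#!/bin/sh
# packages_for_arch is a hypothetical helper, not part of Grid Appliance.
# Given a machine type (as reported by `uname -m`), it prints the packages
# listed in the Preparation steps for that kind of node.
packages_for_arch() {
    pkgs="gcc make grid-appliance-nfs grid-appliance-autofs"
    case "$1" in
        x86_64|amd64)
            # Assumption: 64-bit nodes should be able to build -m32 binaries,
            # so they also get gcc-multilib.
            pkgs="$pkgs gcc-multilib"
            ;;
    esac
    echo "$pkgs"
}

# Usage, as root on each node (left commented out so that loading this
# file has no side effects):
#   apt-get install $(packages_for_arch "$(uname -m)")
```

Keeping the package list in a function that only prints makes it easy to inspect what would be installed before actually running apt-get.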

Submit MPI jobs

  • To submit a job, run the following commands:
    cd /opt/grid_appliance/user/examples/mpi
    ./mpi_submit.py -n <no.of.proc> <executable.filename>
    • <no.of.proc> is the number of parallel copies to be scheduled and run (similar to the -np option in mpich2).
    • <executable.filename> is the executable file that you compiled in the previous section.
  • As in mpich2, the number of parallel copies always includes the current node; i.e., if n copies are requested and the current node has only 1 slot, only n-1 of them run on other nodes.
  • The script keeps waiting, and holds on to its resources, until Condor can secure enough available workers to satisfy the <no.of.proc> requirement.
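
The slot arithmetic described above can be illustrated with a short sketch. The helper name remote_copies is hypothetical and is not part of mpi_submit.py. The page only states the 1-slot case (n-1 remote copies); extending it to several local slots is an assumption made here for illustration.

```shell
#!/bin/sh
# remote_copies is a hypothetical helper, not part of mpi_submit.py.
# Arguments: requested number of processes, then the number of slots on
# the current node (defaults to 1, the case described on this page).
# Prints how many copies would have to run on other nodes.
remote_copies() {
    n=$1
    local_slots=${2:-1}
    if [ "$local_slots" -ge "$n" ]; then
        # Assumption: the current node alone can host every copy.
        echo 0
    else
        echo $((n - local_slots))
    fi
}

# Example: submitting with "./mpi_submit.py -n 4 HelloWorld" from a node
# with a single slot leaves 3 copies for other workers.
remote_copies 4 1   # prints 3
```

This can help estimate how many free workers the Condor pool needs before a submission can start, since the script waits until that many are available.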