MPI Over Condor with Grid Appliance
Introduction
In a typical setup of MPI over Condor (using Condor's parallel universe), each worker needs to be configured with a dedicated scheduler, and the scheduler's name must be provided in each worker's configuration file. This approach creates an inflexible cluster that the user has to set up manually. This MPI over Condor example is an experiment in dynamic MPI cluster setup via Condor that does not require a dedicated scheduler.
Pre-release Testing
Please note that this example has not yet been integrated into the main Grid Appliance package (as of version 2.04.20). If you want to try it out, please follow this link to download the source package. Then copy all the files under Grid-Appliance/user/examples/mpi to /opt/grid-appliance/user/examples/mpi on your local machine, creating the directory if necessary.
Preparation
- Install and deploy Grid Appliance with a working Condor pool
- Install the grid-appliance-nfs package (only works with version 0.02 or newer) and the grid-appliance-autofs package on all nodes. On each node, run the following commands with root privileges:
    apt-get install gcc
    apt-get install make
    apt-get install grid-appliance-nfs
    apt-get install grid-appliance-autofs
- If you are using a 64-bit machine and want to submit MPI jobs to 32-bit machines, you will also need to install the gcc-multilib package:
    apt-get install gcc-multilib
- On the client node, install MPI into the local NFS directory (/mnt/local/mpich2) using the provided script. The script automatically downloads the mpich2 1.3.1 source package.
    cd /opt/grid_appliance/user/examples/mpi
    ./setup.sh -m32
- The -m32 option is for 32-bit machine compatibility.
- Once the installation is finished, all MPI executable files will be located in the directory /mnt/local/mpich2/bin.
- You can now use the MPI C compiler (mpicc) to prepare a program for MPI execution:
    /mnt/local/mpich2/bin/mpicc -m32 -o <executable.filename> <source.filename.c>
- Or you can test it with the included sample program HelloWorld.c:
    /mnt/local/mpich2/bin/mpicc -m32 -o HelloWorld HelloWorld.c
- The -m32 option is used so that the executable can run on both 32-bit and 64-bit machines.
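The compile step above can also be assembled from a script. The following is a minimal sketch, assuming the mpicc location and the -m32 flag shown on this page; the function name mpicc_command and its parameters are hypothetical and are not part of the example package. It only builds the argument list and does not run the compiler.

```python
from pathlib import PurePosixPath

# Install location of mpicc as given on this page.
MPICC = "/mnt/local/mpich2/bin/mpicc"


def mpicc_command(source, output=None, force_32bit=True):
    """Build the mpicc argument list for one C source file.

    Hypothetical helper: it mirrors the command line shown above
    and does not invoke the compiler itself.
    """
    src = PurePosixPath(source)
    if output is None:
        # Default the executable name to the source name without ".c".
        output = src.stem
    cmd = [MPICC]
    if force_32bit:
        # -m32 keeps the executable runnable on 32-bit and 64-bit workers.
        cmd.append("-m32")
    cmd += ["-o", str(output), str(src)]
    return cmd


if __name__ == "__main__":
    print(" ".join(mpicc_command("HelloWorld.c")))
```

The returned list could be passed to subprocess.run on a node where mpich2 has been installed as described above.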
Submit MPI jobs
- To submit a job, run the following commands:
    cd /opt/grid_appliance/user/examples/mpi
    ./mpi_submit.py -n <no.of.proc> <executable.filename>
- <no.of.proc> is the number of parallel copies to be scheduled and run. (This option is similar to the -np option in mpich2.)
- <executable.filename> is the executable file that you compiled in the previous section.
- As in mpich2, the number of parallel copies always includes the current node; that is, if n jobs are scheduled and the current node has only 1 slot, only n-1 jobs will run on other nodes.
- The script will wait and hold on to its resources until Condor can secure enough available workers to satisfy the <no.of.proc> requirement.
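The process-count rule above can be illustrated with a short sketch. It assumes that the current node fills its own slots first and that the remaining copies go to other workers; the page only states the 1-slot case, so the behaviour for more than one local slot is an assumption. The names split_processes and submit_command are hypothetical and do not reflect the internals of mpi_submit.py.

```python
def split_processes(n, local_slots=1):
    """Split n parallel copies into (local, remote) counts.

    Assumption: the current node uses up to local_slots copies and
    the rest are scheduled on other nodes. Only the 1-slot case
    (n-1 remote jobs) is stated in the text.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if local_slots < 0:
        raise ValueError("local_slots must not be negative")
    local = min(n, local_slots)
    return local, n - local


def submit_command(n, executable):
    """Build the mpi_submit.py command line shown in this section."""
    return ["./mpi_submit.py", "-n", str(n), executable]


if __name__ == "__main__":
    local, remote = split_processes(4)
    print(f"{local} local, {remote} remote")
    print(" ".join(submit_command(4, "HelloWorld")))
```

With n = 4 and a single local slot, the split is 1 local copy and 3 remote copies, which matches the n-1 rule described above.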

