MPI Virtual Cluster Appliance
Introduction
The purpose of this tutorial is to explain how to install MPICH2 (a popular open source implementation of MPI, the Message Passing Interface) and quickly bring up a virtual cluster to run MPI jobs. A video tutorial covering the steps below is also available on our YouTube channel.
This version of the Grid Appliance is tailored to educational use: it lets instructors and students create virtual clusters, for individuals or small groups, that are ready to run MPI jobs. Using the virtual cluster as a "sandbox", you can then follow MPI tutorials such as the CI-Tutor TeraGrid tutorial.
Prerequisites
1. Introduction to Grid-Appliance
2. Creating Grid-Appliance clusters
Installing the MPI package and configuring the Virtual Cluster
1. Download a Grid Appliance (v 2.04 or higher). Alternatively, you can create your own Grid Appliance starting from a fresh Ubuntu 9.10 image by following the steps described in TestingGridAppliance.
2. Follow the instructions to create your own GroupVPN and appliance pool (Deploying independent appliance pools - PlanetLab). Alternatively, you may use our public Grid Appliance pool for testing purposes, but be aware that your nodes will then not be secure and remote users will be able to ssh into them.
3. Install the MPI package for the Grid Appliance using the following commands:
% sudo bash
# apt-get update
# apt-get install grid-appliance-mpi
This will also create a new user called 'mpi' in your system.
4. A secret word distinguishes the nodes of your virtual cluster from other MPI nodes in the same virtual network. It is stored in the file /home/mpi/.mpd.conf. You are advised to change the secret word from its default value ("DefaultSecretWord"), working as the user 'mpi'. When editing the .mpd.conf file, avoid blank spaces and blank lines. For example, to change the secret word to "MyNewSecretWord", use the following commands:
# su - mpi
$ echo "MPD_SECRETWORD=MyNewSecretWord" > /home/mpi/.mpd.conf
5. Repeat the above steps for all the nodes you want in the virtual cluster.
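Because step 4 has to be repeated identically on every node, it may help to wrap it in a small function. The following is only a sketch: the name write_mpd_conf is hypothetical and not part of the grid-appliance-mpi package. It rejects secret words containing whitespace, in line with the advice above, and writes exactly one line to the file. The chmod 600 step is an assumption based on mpd's usual requirement that .mpd.conf be accessible only by its owner.

```shell
# Hypothetical helper (not shipped with grid-appliance-mpi).
# Usage: write_mpd_conf SECRET [CONF_FILE]
# CONF_FILE defaults to .mpd.conf in the current user's home directory.
write_mpd_conf() {
    secret=$1
    conf=${2:-"$HOME/.mpd.conf"}

    # Refuse empty values and anything containing spaces, tabs or newlines,
    # since the file must not contain blank spaces or blank lines.
    case $secret in
        ''|*[[:space:]]*)
            echo "secret word must be non-empty and contain no whitespace" >&2
            return 1
            ;;
    esac

    # printf emits exactly one line, with no trailing blank line.
    printf 'MPD_SECRETWORD=%s\n' "$secret" > "$conf" || return 1

    # Assumption: mpd expects the file to be readable by its owner only.
    chmod 600 "$conf"
}
```

Run as the user 'mpi', calling write_mpd_conf MyNewSecretWord would have the same effect as the echo command in step 4, with the permission change added.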
Starting mpd ring
At this point, your Grid-Appliance nodes should be running with the MPI package installed. All the nodes should have acquired their GroupVPN IP address and be configured with the same secret word. To run MPI jobs, you need to bring up a ring of daemon processes, called mpd, across all nodes. Run the start_mpd_ring.sh script as the 'mpi' user on a single node; you do not have to run it on every node. The script scans for other MPI appliances in the same GroupVPN and boots the mpd ring.
$ cd /opt/grid_appliance/scripts
$ ./start_mpd_ring.sh
You are now ready to run MPI jobs.
To stop the mpd ring, run the command:
$ mpdallstop
To add more nodes to the mpd ring, stop the ring and rerun the start_mpd_ring.sh script.
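After starting or restarting the ring, it can be useful to confirm that every node actually joined. The sketch below assumes that MPICH2's mpdtrace command is available on the appliance and prints one ring member per line. The function names count_ring_nodes and ring_has_nodes are hypothetical helpers, not part of the appliance scripts.

```shell
# Hypothetical helpers: they read a host list on standard input,
# one host per line, in the form mpdtrace is assumed to print.

# Print the number of non-empty lines on standard input.
count_ring_nodes() {
    # grep -c exits non-zero when it counts 0 lines, so mask that status.
    grep -c . || true
}

# Succeed only if the host list on standard input has EXPECTED entries.
# Usage: mpdtrace | ring_has_nodes EXPECTED
ring_has_nodes() {
    expected=$1
    actual=$(count_ring_nodes)
    if [ "$actual" -eq "$expected" ]; then
        return 0
    fi
    echo "ring has $actual node(s), expected $expected" >&2
    return 1
}
```

For a four-node virtual cluster, running mpdtrace | ring_has_nodes 4 as the 'mpi' user would report a mismatch if some appliance had not been picked up by the scan.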
Compiling and Running MPI programs
Some example MPI programs from the TeraGrid tutorial are included in /opt/grid_appliance/user/examples/mpi_teragrid/.
In order to prepare your appliance to run these examples, you need to install the gcc compiler and the Grid appliance NFS packages:
$ sudo apt-get install gcc
$ sudo apt-get install grid-appliance-nfs
$ sudo apt-get install grid-appliance-autofs
To compile an MPI program, use the mpicc command.
To run the mpi program, either copy the compiled executable to the working directory of all the nodes or run it from a shared location. If you have the auto-configured Network File System (NFS and autofs) packages of grid-appliance installed, then you can make directories/files at the submit node available (for reading and/or execution - it is a read-only file system) to other nodes in the virtual cluster. This is shown below:
$ cd /opt/grid_appliance/user/examples/mpi_teragrid/
$ mpicc -o /mnt/local/HelloWorld HelloWorld.c
$ mpiexec -np <No_of_processes> /mnt/ganfs/CXXXYYYZZZ/HelloWorld
where CXXXYYYZZZ is the hostname of the node having the compiled executable.
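The example above implies a simple mapping: a file compiled to /mnt/local/ on the submit node is visible to the other nodes under /mnt/ganfs/ followed by the submit node's hostname. The sketch below builds the shared path from the local one. The function name ganfs_path is hypothetical, and the handling of subdirectories below /mnt/local/ is an assumption that goes beyond the single-file case shown above.

```shell
# Hypothetical helper: translate a path under /mnt/local/ on the submit
# node into the path other nodes would use under /mnt/ganfs/HOSTNAME/.
# Usage: ganfs_path HOSTNAME LOCAL_PATH
ganfs_path() {
    host=$1
    local_path=$2
    case $local_path in
        /mnt/local/*)
            # Strip the /mnt/local/ prefix and re-root under /mnt/ganfs/HOSTNAME/.
            printf '/mnt/ganfs/%s/%s\n' "$host" "${local_path#/mnt/local/}"
            ;;
        *)
            echo "path is not under /mnt/local/: $local_path" >&2
            return 1
            ;;
    esac
}

# Example use on the submit node (not executed here):
#   mpiexec -np 4 "$(ganfs_path "$(hostname)" /mnt/local/HelloWorld)"
```

Deriving the path from hostname avoids typing the CXXXYYYZZZ name by hand, provided the command is run on the node that holds the compiled executable.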

