MPI Virtual Cluster Appliance

From Grid-Appliance Wiki



Introduction

This tutorial explains how to install MPICH2 (a popular open-source implementation of MPI, the Message Passing Interface) and quickly bring up a virtual cluster for running MPI jobs.

This version of the Grid Appliance is tailored to education: it lets instructors and students easily create virtual clusters, for individuals or small groups, that are ready to run MPI jobs. Using the virtual cluster as a "sandbox", you can then follow MPI tutorials such as the CI-Tutor TeraGrid tutorial.

You can also watch our video tutorial on YouTube.

Pre-requisites

  1. Introduction to Grid-Appliance
  2. Creating Grid-Appliance clusters
  3. Grid Appliance On Science Clouds (for Science Cloud users)

Installing the MPI package and configuring the Virtual Cluster

  1. Download a Grid Appliance (version 2.04 or higher). Alternatively, you can create your own Grid Appliance from a fresh Ubuntu image by following the steps described in Testing Grid Appliance.
  2. Create your own GroupVPN and appliance pool by following the instructions in Deploying independent appliance pools - PlanetLab.
  3. Install the Grid Appliance MPI package with the commands:
    sudo bash
    apt-get update
    apt-get install grid-appliance-mpi
  4. Switch to the mpi user by running the command:
    su - mpi
  5. A secret word distinguishes the nodes of your virtual cluster from other MPI nodes on the same virtual network. It is stored in /home/mpi/.mpd.conf. You should change it from the default value ("DefaultSecretWord"), either by editing the file directly or with the command:
    echo "MPD_SECRETWORD=YourSecretWord" > /home/mpi/.mpd.conf
  6. Repeat steps 3 to 5 on every node you want in the virtual cluster, using the same secret word on all of them.
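Because steps 3 to 5 are repeated on every node and the secret word must match everywhere, it can help to wrap step 5 in a small function. The sketch below is only an illustration: `set_mpd_secret` is a hypothetical helper, not part of the grid-appliance-mpi package. It assumes mpd's usual requirement that `.mpd.conf` be readable and writable only by its owner, so it restricts the file to mode 600, and it refuses the default word.

```shell
# Hypothetical helper (not shipped with grid-appliance-mpi).
# Usage: set_mpd_secret WORD [CONF_FILE]   (CONF_FILE defaults to $HOME/.mpd.conf)
set_mpd_secret() {
    word=$1
    conf=${2:-"$HOME/.mpd.conf"}
    # Reject an empty word or the shipped default, which other clusters may share.
    if [ -z "$word" ] || [ "$word" = "DefaultSecretWord" ]; then
        echo "set_mpd_secret: choose a non-default secret word" >&2
        return 1
    fi
    # Create a new file with owner-only permissions before writing the secret.
    ( umask 077 && printf 'MPD_SECRETWORD=%s\n' "$word" > "$conf" )
    # An existing file keeps its old mode, so enforce 600 explicitly.
    chmod 600 "$conf"
}
```

Source it in the mpi user's shell on each node and call, for example, `set_mpd_secret YourSecretWord`.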

Starting mpd ring

  • At this point, your Grid Appliance nodes should be running with the MPI package installed. All nodes should have acquired their GroupVPN IP addresses and be configured with the same secret word. To run MPI jobs, you need to bring up a ring of daemon processes, called mpd, across all the nodes.
  • Run the start_mpd_ring.sh script as root on one of the nodes; you do not need to run it on every node. The script scans for other MPI appliances in the same GroupVPN and boots the mpd ring. It also replaces the SSH keys used by the nodes in the virtual cluster with new ones, which makes the cluster more secure.
    /opt/grid-appliance/scripts/start_mpd_ring.sh
    • Note that this script configures the cluster for non-root users. Before running MPI jobs, you would have to create or configure users as mentioned below.
  • To list all the machines that connect to your mpd ring, run the command:
    mpdtrace
  • To stop the mpd ring, run the command:
    mpdallexit
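Before submitting a job, you may want to confirm that every node actually joined the ring. The following is a sketch under stated assumptions: `ring_has_nodes` is a hypothetical helper, and it assumes `mpdtrace` prints one host name per line, so it simply counts non-empty lines against the number of nodes you expect.

```shell
# Hypothetical helper: reads host names (one per line, as printed by mpdtrace)
# on standard input and succeeds if at least EXPECTED nodes are listed.
# Usage: mpdtrace | ring_has_nodes EXPECTED
ring_has_nodes() {
    expected=$1
    # Count non-empty lines; "|| true" keeps a zero count from being an error.
    count=$(grep -c . || true)
    echo "mpd ring has $count node(s), expected $expected"
    [ "$count" -ge "$expected" ]
}
```

For example, `mpdtrace | ring_has_nodes 4 || echo "some nodes are missing"` warns when a node did not join, in which case you can stop the ring and start it again.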

Compiling and Running MPI programs

  • Some example MPI programs from the TeraGrid tutorial are included in /opt/mpi/teragrid_examples.
  • In order to prepare your appliance to run these examples, you need to install the gcc compiler and the Grid appliance NFS packages:
    apt-get install gcc
    apt-get install grid-appliance-nfs
    apt-get install grid-appliance-autofs
  • To compile an MPI program, use the mpicc command.
  • To run the mpi program, either copy the compiled executable to the working directory of all the nodes or run it from a shared location.
  • If the auto-configured Network File System packages of the Grid Appliance (grid-appliance-nfs and grid-appliance-autofs) are installed, directories and files on the submit node can be made available to the other nodes in the virtual cluster for reading and execution (the shared file system is read-only). For example:
    su - mpi
    mpdtrace    # verify the nodes in the mpd ring
    cd /opt/grid_appliance/user/examples/mpi_teragrid/
    mpicc -o /mnt/local/HelloWorld HelloWorld.c
  • After compiling the program, confirm that the executable (in this case, HelloWorld) is available through the shared path:
    ls -lrt /mnt/ganfs/CXXXYYYZZZ/
    • The output should look similar to:
      total 12
      -rwxr-xr-x 1 mpi mpi 9996 2010-08-26 18:37 HelloWorld
    • Now you can run the executable:
      mpiexec -np <No_of_processes> /mnt/ganfs/CXXXYYYZZZ/HelloWorld
      • where CXXXYYYZZZ is the hostname of the node having the compiled executable.
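The compile-and-run steps above can be combined into one function. This is a sketch, not part of the appliance: `ganfs_path` and `build_and_run` are hypothetical helpers, and they assume the layout used in the example, where a file compiled into /mnt/local on the submit node is reachable as /mnt/ganfs/<hostname>/ and `hostname` prints the same name that appears under /mnt/ganfs.

```shell
# Hypothetical helper: shared path under which a file compiled into
# /mnt/local on this node is visible (assumes /mnt/ganfs/<hostname>/ layout).
ganfs_path() {
    printf '/mnt/ganfs/%s/%s\n' "$(hostname)" "$1"
}

# Hypothetical helper: compile SOURCE.c into /mnt/local, then launch it with
# mpiexec through the shared path so every node in the ring can execute it.
# Usage: build_and_run SOURCE.c NUM_PROCESSES
build_and_run() {
    src=$1
    nprocs=$2
    exe=$(basename "$src" .c)           # HelloWorld.c -> HelloWorld
    mpicc -o "/mnt/local/$exe" "$src" || return 1
    mpiexec -np "$nprocs" "$(ganfs_path "$exe")"
}
```

For example, as the mpi user in the examples directory: `build_and_run HelloWorld.c 4`.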


Adding nodes to an existing virtual cluster

Configure the new nodes with the same secret word as the existing nodes. Then stop the mpd ring and run start_mpd_ring.sh on one of the pre-existing nodes.
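A new node only joins if its secret word matches, so it is worth comparing the word on an old and a new node before restarting the ring. In the sketch below, `mpd_secret` and `restart_ring` are hypothetical helpers; `mpd_secret` assumes the `MPD_SECRETWORD=...` line format used in step 5, and `restart_ring` just chains the commands from this page (start_mpd_ring.sh must be run as root, as described in "Starting mpd ring").

```shell
# Hypothetical helper: print the secret word stored in an .mpd.conf file
# (defaults to the current user's file) so it can be compared across nodes.
mpd_secret() {
    conf=${1:-"$HOME/.mpd.conf"}
    sed -n 's/^MPD_SECRETWORD=//p' "$conf"
}

# Hypothetical helper: stop the current ring, boot a new one that picks up
# the added nodes, and list the ring members.
restart_ring() {
    mpdallexit                                      # stop the existing ring
    /opt/grid-appliance/scripts/start_mpd_ring.sh   # rescan the GroupVPN and boot the ring
    mpdtrace                                        # new nodes should now be listed
}
```

Run `mpd_secret /home/mpi/.mpd.conf` on an existing node and on each new node; if the outputs match, call `restart_ring` on one pre-existing node.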
