IOOS:Index

From Grid-Appliance Wiki

IOOS Testbed Virtual Grid

This page will provide pointers and reference material on how to access and use the UF/UNC/NCSU/USF IOOS Testbed virtual Grid.

The Testbed Grid consists of a pool of virtual machines running the Linux operating system and the Condor batch job scheduler. Cluster resources at the University of Florida support the execution of models, while researchers at the testbed institutions can conveniently access the infrastructure from their own desktops by downloading a pre-packaged system that we call the "Grid appliance". Grid appliances run on virtual machine software that is freely available and simple to install on desktops running Windows, Linux or MacOS (with Intel processors).

Virtual Grid self-training

In order to get acquainted with the virtual Grid infrastructure, we have prepared a tutorial that can be followed by users in a self-paced manner, with questions/milestones to help you assess your progress. If you cannot successfully complete a step for any reason, or if you have questions that prevent you from continuing, please stop and send an email with your question to renato at acis dot ufl dot edu.

  1. Go through steps 1-3 of the Grid appliance quick start guide to learn about the basics of the Grid Appliance.
    1. Have you successfully installed the VM software?
    2. Have you successfully started up the Grid appliance?
    3. Did your machine connect to the Condor pool? (That is, did condor_status return a list of nodes? If it did not, please wait a few minutes and retry.)
    4. Were you able to successfully run the demonstration MontePi job?
    5. Were you able to transfer files from/to your host to the appliance using drag-and-drop (and/or sftp)?
  2. Try to execute a sample "canned" model (CH3D) from your appliance. Click on the following link for step-by-step instructions: IOOS:CH3D sample
    1. Were you able to successfully run the demonstration CH3D job?
  3. Try to execute your own model within the appliance, locally. Copy binaries and input files into the appliance with the mechanisms described earlier. Statically-linked Linux executables are the simplest to work with; dynamically-linked executables may require libraries to be installed.
    1. Were you able to successfully run your model within the appliance?
  4. Try to execute your model remotely using Condor. Create a Condor submit script based on the examples above. Do not worry about linking with Condor libraries; instead, use the "vanilla" universe, which supports unmodified binaries. Make sure you specify the input files to be transferred and any environment variables needed to run the job.
    1. Were you able to successfully run your model remotely through Condor?
  5. (Optional) Go through the Condor compilation tutorial to learn how to link an application to Condor. Linking enables transparent checkpoint/restart, which improves reliability for long-running jobs. Your applications do not need to be condor-compiled to run in the testbed.
    1. Were you able to condor-compile the test application?
    2. Were you able to successfully submit and view results from the test application?
  6. (Optional) Go through the Condor DAGman tutorial. This describes how you can schedule workflows in Condor; it is not required for the IOOS testbed, but it may help you schedule a batch of model runs with dependencies.
    1. Were you able to compile the application?
    2. Were you able to successfully submit and view results from the DAGman application?
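
For step 4, a submit description file for the "vanilla" universe might look like the sketch below, written here as a small shell script that generates the file. The names my_model, input.dat, and MODEL_DATA_DIR are placeholders and not part of the testbed; substitute your own binary, input files, and environment variables.

```shell
#!/bin/sh
# Sketch only: my_model, input.dat and MODEL_DATA_DIR are hypothetical names.
cat > my_model.submit <<'EOF'
# The vanilla universe runs unmodified binaries (no condor_compile needed).
universe                = vanilla
executable              = my_model
arguments               = input.dat
# Ship the input file to the remote node and bring results back on exit.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
# Environment variables the model needs at run time.
environment             = "MODEL_DATA_DIR=."
output                  = my_model.out
error                   = my_model.err
log                     = my_model.log
queue
EOF

# Submit only when Condor is installed and the binary is present.
if command -v condor_submit >/dev/null 2>&1 && [ -x my_model ]; then
    condor_submit my_model.submit
else
    echo "wrote my_model.submit; run 'condor_submit my_model.submit' inside the appliance"
fi
```

After submitting, condor_q lists the queued job, and my_model.log records its progress.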

Installing IOOS Testbed Appliance

The following instructions describe how to deploy new resources that connect to the IOOS pool; you don't need to go through this material if you are just following the self-training tutorial.

We have created a virtual grid appliance for the IOOS project which runs on a private, independent resource pool. If you plan to run an IOOS appliance, please send an email request to ptony82@ufl.edu to obtain IOOS-specific floppy images.

Quick instructions for VMware desktop appliances

Click here for instructions on how to set up a desktop with VMware and the IOOS appliance.

General instructions for VMware and KVM

  1. Download the Grid Appliance for VMware from this link
  2. Extract the zipped file
  3. Send an email request to ptony82@ufl.edu to obtain IOOS-specific floppy images (all configurations for the virtual machines are stored on floppy images, so floppy images for the IOOS project have to be acquired before running the virtual appliance)
  4. Once you've obtained the zipped floppy images, extract the file.
  5. The file contains three floppy images (Server.img, Client.img, Worker.img)
  6. To create a server virtual machine (runs Condor Manager):
    1. Delete the existing fdb.img in the grid-appliance folder, rename Server.img to fdb.img, and move it into the grid-appliance folder
  7. To create a client virtual machine (able to submit and run jobs):
    1. Repeat the steps above, but use the Client.img file instead of the Server.img file
  8. To create a worker virtual machine (able to only run jobs):
    1. Repeat the steps above, but use the Worker.img file instead of the Server.img file
  9. If needed, change the amount of memory (RAM) allocated to your virtual machine (the default is 256 MB)
  10. For VMware:
    1. Start your node through the VMware console
  11. For KVM:
    1. You should be able to start up the appliance with the following command:
kvm -M pc -m 256 -smp 2 -hda ga-flat.vmdk -hdb opt.vmdk -hdc home.vmdk -fda fdb.img -net user,vlan=0 -net nic,vlan=0,model=pcnet -no-acpi -boot a
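
The floppy image swap in steps 6-8 can be sketched as a small shell function. This is an illustration under assumptions: the function name install_floppy and the directory arguments are hypothetical, and the sketch copies the role image instead of renaming it so that the extracted images stay available for other virtual machines.

```shell
#!/bin/sh
# install_floppy ROLE IMAGES_DIR APPLIANCE_DIR
# ROLE is Server, Client, or Worker. Both directories are placeholders:
# IMAGES_DIR holds the extracted floppy images, APPLIANCE_DIR is the
# grid-appliance folder that contains fdb.img.
install_floppy() {
    role=$1
    images_dir=$2
    appliance_dir=$3

    # Only the three roles shipped in the floppy archive are accepted.
    case "$role" in
        Server|Client|Worker) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac

    src="$images_dir/$role.img"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }

    # Remove the stock floppy image, then put the role image in its place.
    rm -f "$appliance_dir/fdb.img"
    cp "$src" "$appliance_dir/fdb.img"
}

# Example (paths are hypothetical):
#   install_floppy Client ./floppies ./grid-appliance
```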

Verify your installation

You can verify that your installation is correct by running the command below:

condor_status

You should see a list of machines running Condor. If the command fails, run the following commands:

 sudo bash
 dhclient tapipop
 condor_status

Once you are able to see this list, then you are ready to use your system.
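
Because a freshly started appliance can take a few minutes to join the pool, the check can be wrapped in a retry loop. The sketch below is one possible approach and not part of the appliance: the function name wait_for_pool, the default of 5 attempts, and the 60-second delay are assumptions, and it treats an error or empty output from condor_status as "not connected yet".

```shell
#!/bin/sh
# wait_for_pool [CHECK_CMD] [ATTEMPTS] [DELAY_SECONDS]
# CHECK_CMD defaults to condor_status; attempts and delay are assumed values.
wait_for_pool() {
    check_cmd=${1:-condor_status}
    attempts=${2:-5}
    delay=${3:-60}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Success means the command ran and printed at least one line.
        if out=$($check_cmd 2>/dev/null) && [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        echo "attempt $i/$attempts: no pool nodes listed yet" >&2
        # Wait before the next attempt, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage inside the appliance, falling back to the DHCP renewal shown above:
#   wait_for_pool || { sudo dhclient tapipop && wait_for_pool; }
```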
