TestingGridAppliance

This document describes how to create a Debian / Ubuntu Grid Appliance and the steps needed to verify any Grid Appliance before releasing it for public use. The minimal verification setup is 3 machines: a client, a worker, and a server. Large-scale, long-term tests are recommended for significant changesets.

Creating the Grid Appliance

Creating an Appliance Using Ubuntu 10.04

  1. Download the Ubuntu 10.04 Server ISO
  2. Prepare a system with at least a 2 GB HDD and 512 MB of RAM; a 32 GB HDD and 2 GB of RAM are preferred for job execution machines (workers)
  3. Upon boot, press F4 and select minimal virtual machine
  4. Install wget: apt-get install wget
  5. At this point, follow the steps outlined below in "Creating an Appliance Using a Working Environment"

Creating an Appliance Using EC2

  1. Prepare a floppy that is smaller than 16 KB when compressed (note that grid-appliance.org does not generate floppy disks that meet this requirement) and call it floppy.zip. Contents:
    1. groupvpn.zip -- contains your GroupVPN information; if you need to reduce space, remove bootstrap.config and trim node.config; download this for a clean, light-weight floppy
    2. group_appliances.config -- contains your group appliances information
    3. authorized_keys -- contains the authorized ssh keys for root admin access
  2. Execute the following command:
    ec2-run-instances ami-fd4aa494 -f floppy.zip --instance-type m1.large -k keypair
    1. AMIs for Canonical's official Ubuntu 10.04 (Lucid Lynx) x64 Server
    2. -f uploads the file floppy.zip to http://169.254.169.254/latest/user-data (a path relevant only to the VM)
    3. the instance type must be m1.large because the instance is 64-bit
    4. -k specifies your key pair which will allow you to login
  3. Ubuntu 10.04 now ships with Condor packaged and no longer includes libstdc++5; for these two reasons we reuse the Condor supplied by Ubuntu
  4. At this point, follow the steps outlined below in "Creating an Appliance Using a Working Environment"
  5. If you use the -f floppy, you can skip the configuration mode
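
The size limit in step 1 is easy to miss until the instance fails to configure itself. The following is a minimal sketch, not part of the Grid Appliance tooling, that builds floppy.zip from the three files named above and checks it against the 16 KB limit. The helper names `build_floppy` and `fits_user_data` are hypothetical.

```python
import io
import zipfile

# Limit taken from step 1 above: the compressed floppy must be under 16 KB.
MAX_FLOPPY_BYTES = 16 * 1024


def build_floppy(files):
    """Return the bytes of a deflate-compressed zip built from {name: bytes}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in sorted(files.items()):
            archive.writestr(name, content)
    return buffer.getvalue()


def fits_user_data(archive_bytes, limit=MAX_FLOPPY_BYTES):
    """True when the archive is strictly smaller than the limit."""
    return len(archive_bytes) < limit
```

To use it, read groupvpn.zip, group_appliances.config, and authorized_keys as bytes, pass them to `build_floppy`, and write the result to floppy.zip only if `fits_user_data` returns True.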

Creating an AMI

After you have installed the packages and got the appliance running, you may want to create an AMI for it. If that's the case, remember to run the following commands before you create the AMI.

/etc/init.d/grid_appliance.sh stop
/opt/grid_appliance/scripts/clean.sh
rm /opt/grid_appliance/etc/floppy.img

These commands clean up the configuration files and ensure that new instances start correctly.

Creating an Appliance Using a Working Environment

  1. If using x64, consider installing:
    1. ia32-libs - enables running of 32-bit applications
    2. libc6-dev-i386 - enables compiling of 32-bit applications (requires gcc be installed)
  2. Add the Grid Appliance Repository
    echo "deb http://www.grid-appliance.org/files/packages/deb/  lucid contrib" >> /etc/apt/sources.list
    wget http://www.grid-appliance.org/files/packages/deb/repo.key
    apt-key add repo.key
    apt-get update
  3. Select packages (grid-appliance-base is required, others are optional):
    1. Use apt-get install $packagename to install packages, or aptitude to find them
    2. grid-appliance-base (we recommend restarting the appliance after installing this package):
      1. Condor -- Batch task management
      2. GroupVPN (IPOP) -- Virtual Networking Stack for decentralized, distributed LAN
      3. Base configuration scripts for performing all basic tasks
    3. grid-appliance-nfs: creates a read-only, public mount at /mnt/local
    4. grid-appliance-autofs: allows auto-mounting of remote NFS repositories at /mnt/ganfs/[hostname or ip]
    5. grid-appliance-ssh: lets admins ssh into the machine using PKI only, and lets LAN hosts (172.16/16 and 192.168/16) use either a password or PKI
    6. grid-appliance-public-pool: adds the floppy image for the public pool to the appliance -- default configuration
    7. grid-appliance-samba (in development): allows users to access their home directory via Samba (Windows file sharing)
    8. grid-appliance-client: adds an X experience tailored for the Grid Appliance
    9. grid-appliance-cow: allows using UnionFS-Fuse for stackable file systems (i.e., copy-on-write -- cow)
  4. [optional] Add VM guest drivers
  5. Determine the configuration mode
    1. Use grid-appliance-public-pool for connecting to the default public pool
    2. Package with an external floppy at floppy.img
    3. Package with an internal floppy at /opt/grid_appliance/etc/floppy.img
    4. Use EC2 with ec2-run-instances -f
  6. Start the Grid Appliance: /etc/init.d/grid_appliance.sh restart
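
When scripting the package selection step, it can help to guard against forgetting the required base package or mistyping an optional one. The sketch below only builds the apt-get argument list from the package names listed above; it does not run anything. The function name `install_command` is hypothetical.

```python
# Package names are taken from the list above; the base package is mandatory.
BASE_PACKAGE = "grid-appliance-base"
OPTIONAL_PACKAGES = {
    "grid-appliance-nfs",
    "grid-appliance-autofs",
    "grid-appliance-ssh",
    "grid-appliance-public-pool",
    "grid-appliance-samba",
    "grid-appliance-client",
    "grid-appliance-cow",
}


def install_command(optional=()):
    """Return the apt-get argument list for the base package plus extras."""
    unknown = set(optional) - OPTIONAL_PACKAGES
    if unknown:
        raise ValueError("unknown packages: %s" % ", ".join(sorted(unknown)))
    # The base package always comes first; extras are de-duplicated and sorted.
    return ["apt-get", "install", BASE_PACKAGE] + sorted(set(optional))
```

The returned list can be passed to `subprocess.check_call` on the appliance when running as root.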

Testing the Grid Appliance

  1. Create a Group Appliance and download floppies for client, worker, and server
  2. Start 3 VMs
  3. Verify IP communication amongst the 3
  4. Run condor_status at the client, ensure that the client and worker show up in the result (this can be combined with the previous step)
  5. Wait for the worker to become unclaimed and idle; this should take 10 to 20 minutes
  6. Submit some Monte Carlo PI jobs (examples/montepi) and verify they execute on the worker
  7. Verify the autofs is working, execute ls /mnt/ganfs/[hostname or ip]
  8. Verify ssh is working
    1. SSH into the client's eth1 with key or without password (should pass)
    2. SSH into the client's eth0, tap with password (should fail)
    3. SSH into the client's eth0, tap with key (should pass)
  9. Mount Samba from the host: \\gridappliance\$username should map to /home/$username, and it should only work from LAN IP addresses (192.168/16)
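
The expected SSH results in step 8 follow from the grid-appliance-ssh policy described earlier: key-based logins are always allowed, while password logins are allowed only from LAN hosts (172.16/16 and 192.168/16). The sketch below encodes that rule so the expected outcome of a test can be written down beforehand. It assumes the policy is decided by the connecting host's source address, which the original text does not state explicitly, and the function names are hypothetical.

```python
import ipaddress

# LAN ranges quoted from the grid-appliance-ssh package description.
LAN_NETWORKS = [
    ipaddress.ip_network("172.16.0.0/16"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_lan_host(address):
    """True when the address falls inside one of the LAN ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in LAN_NETWORKS)


def ssh_expected_to_pass(source_address, method):
    """Expected result of an SSH login attempt; method is 'key' or 'password'."""
    if method not in ("key", "password"):
        raise ValueError("method must be 'key' or 'password'")
    if method == "key":
        return True  # PKI logins are accepted from anywhere
    return is_lan_host(source_address)  # passwords only from the LAN
```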

Wide Area Debugging

  1. IP Allocations are of the form dhcp:$IPOP_NAMESPACE:$IP
    1. Address can be verified by using bget (bget.py dhcp:$IPOP_NAMESPACE:$IP)
    2. Addresses can be inserted by using bput (bput.py dhcp:$IPOP_NAMESPACE:$IP brunet:node:$ADDRESS)
  2. Python with XML-RPC is the Swiss army knife for debugging Brunet
    1. Set up the server: rpc = xmlrpclib.Server("http://127.0.0.1:10000/xm.rem")
    2. Local calls: rpc.localproxy("class.method", ["optional", "parameters"])
    3. Remote calls: rpc.proxy("brunet:node:$ADDRESS", [5 | 3], 1, "class.method", ["optional", "parameters"])
      1. 5 - Exact routing
      2. 3 - Greedy routing
    4. Important RPC Methods
      1. Information.Info -- connection type count, neighbors, VPN info
      2. sys:link.GetNodeInfo -- node id and TAs used for creating connections
  3. Logging
    1. Always check the logs first; many bugs make themselves known through an "unhandled" exception
    2. Don't let symptoms mislead you: code that has worked does not magically stop working, so a change in the environment must be causing it to behave unexpectedly
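
The RPC calls above can be wrapped in a few helpers. This is a minimal sketch for Python 3, where `xmlrpc.client.ServerProxy` replaces the Python 2 `xmlrpclib.Server` used above. The helper names are hypothetical, the third argument to `rpc.proxy` (1) is copied from the call form above without interpretation, and passing no extra parameters to the two RPC methods is an assumption.

```python
import xmlrpc.client

# Routing modes listed above.
EXACT_ROUTING = 5
GREEDY_ROUTING = 3


def dhcp_key(namespace, ip):
    """Key under which an IP allocation is stored: dhcp:$IPOP_NAMESPACE:$IP."""
    return "dhcp:%s:%s" % (namespace, ip)


def node_uri(address):
    """Target of a remote call: brunet:node:$ADDRESS."""
    return "brunet:node:%s" % address


def connect(url="http://127.0.0.1:10000/xm.rem"):
    """Create the XML-RPC proxy; no connection is made until a call is issued."""
    return xmlrpc.client.ServerProxy(url)


def local_info(rpc):
    """Connection type count, neighbors, and VPN info of the local node."""
    return rpc.localproxy("Information.Info")


def remote_node_info(rpc, address, routing=GREEDY_ROUTING):
    """Node id and TAs of a remote node, reached with the given routing mode."""
    return rpc.proxy(node_uri(address), routing, 1, "sys:link.GetNodeInfo")
```

On a running node, `local_info(connect())` issues the local call; the string from `dhcp_key` is the argument for bget.py and bput.py.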

Testing Simics on the appliance

In this test, the x86_tlb module is compiled and used with the provided fc5 checkpoint. The result of the test (the state of the tlb before and after the test) is logged to test_screen_dump and the 'diff' with the orig_screen_dump is printed. This diff should have only one line "Compiled x86_tlb module" (ignore warnings about search path not existing or SLAs in the diff).

(This assumes that Simics is installed in $SIMICS.)

mkdir -p /home/griduser/test_workspace; $SIMICS/bin/workspace-setup /home/griduser/test_workspace
cd /home/griduser/test_workspace
wget www.acis.ufl.edu/~girish/test.tar.gz; tar -xzf test.tar.gz; rm -rf test.tar.gz
mv x86_tlb modules; make x86_tlb
./run_simics_test.sh
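
The pass criterion described above (the diff contains only the line "Compiled x86_tlb module", apart from ignorable warnings) can also be checked programmatically. The sketch below compares the text of orig_screen_dump and test_screen_dump with difflib. The function names are hypothetical, and the `ignore_markers` substrings are placeholders: the exact wording of the search-path and SLA warnings is not given in this document, so supply whatever text they actually contain.

```python
import difflib

# The only difference a passing run is expected to show.
EXPECTED_LINE = "Compiled x86_tlb module"


def changed_lines(orig_text, test_text):
    """Return the lines added or removed between the two dumps."""
    changes = []
    diff = difflib.ndiff(orig_text.splitlines(), test_text.splitlines())
    for line in diff:
        # ndiff prefixes removals with "- " and additions with "+ ".
        if line.startswith(("+ ", "- ")):
            changes.append(line[2:])
    return changes


def simics_test_ok(orig_text, test_text, ignore_markers=()):
    """True when the only non-ignored difference is the expected line."""
    remaining = [
        line for line in changed_lines(orig_text, test_text)
        if not any(marker in line for marker in ignore_markers)
    ]
    return remaining == [EXPECTED_LINE]
```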

To check whether binaries compiled on a 32-bit Grid Appliance work on a 64-bit Grid Appliance (assuming they have the same glib version, etc.), copy the contents of test_workspace/x86-linux to test_workspace/amd64-linux on the 64-bit appliance and run run_simics_test.sh. Similarly, copy the contents of amd64-linux/ to x86-linux/ on a 32-bit Grid Appliance and run the test.
