Testing Grid Appliance

From Grid-Appliance Wiki

This document describes how to create a Debian/Ubuntu Grid Appliance and how to verify any Grid Appliance before releasing it for public use. The minimal system for verification is three machines: a client, a worker, and a server. Large-scale, long-term tests are recommended for significant changesets.

Creating the Grid Appliance

Creating an Appliance Using Ubuntu 10.04 / Debian Squeeze

  1. Download the Ubuntu 10.04 Server ISO.
  2. Prepare a system with at least a 2 GB HDD and 512 MB of RAM. For job execution machines (workers), a system with at least 32 GB HDD and 2 GB of RAM is preferred.
  3. Upon boot, press F4 and select "minimal virtual machine".
  4. Install wget: apt-get install wget
  5. Follow the steps outlined below.

Creating an Appliance Using EC2

  1. Obtain a GroupAppliances floppy image
  2. Optionally, modify the authorized keys inside the floppy image to contain an SSH key for root admin access.
  3. Execute the following command: ec2-run-instances ami-fd4aa494 -f floppy.zip --instance-type m1.large -k keypair
  4. Notes:
    1. To access AMIs for Canonical's official Ubuntu 10.04 (Lucid Lynx) x64 Server, see the Ubuntu 10.04 LTS page.
    2. -f uploads floppy.zip to http://169.254.169.254/latest/user-data (a path accessible only to the instance).
    3. The instance type must be m1.large because the AMI is 64-bit.
    4. -k specifies your key pair, which will allow you to log in.
  5. Ubuntu 10.04 ships with Condor, but that version is old and has had many features removed, so we use the Condor group's Debian repository instead.
  6. At this point, follow the steps outlined below.
  7. If you use the -f floppy, you can skip the configuration mode step.
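
The launch command in step 3 can be assembled programmatically when several instances need to be started. The following is a minimal sketch in Python that only builds the argument list; it uses the AMI ID, flags, and file name given above and nothing else. The function name ec2_run_command and the key pair name "my-keypair" are hypothetical.

```python
import shlex

def ec2_run_command(ami="ami-fd4aa494", floppy="floppy.zip",
                    instance_type="m1.large", keypair="keypair"):
    """Return the ec2-run-instances argument list described in step 3."""
    return [
        "ec2-run-instances", ami,
        "-f", floppy,                      # uploaded as instance user-data
        "--instance-type", instance_type,  # m1.large for the 64-bit AMI
        "-k", keypair,                     # key pair used to log in
    ]

# Print the command for review instead of running it.
print(shlex.join(ec2_run_command(keypair="my-keypair")))
```

Passing the list to subprocess.run would execute it, assuming the EC2 command line tools are installed and configured.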

Creating an Appliance Using a Working Environment

  1. If using x64, consider installing:
    1. ia32-libs - enables running of 32-bit applications
    2. libc6-dev-i386 - enables compiling of 32-bit applications (requires gcc be installed)
  2. Add the Grid Appliance Repository
    echo "deb http://www.grid-appliance.org/files/packages/deb/ stable contrib" >> /etc/apt/sources.list
    echo "deb http://www.cs.wisc.edu/condor/debian/stable/ lenny contrib" >> /etc/apt/sources.list
    wget http://www.grid-appliance.org/files/packages/deb/repo.key
    apt-key add repo.key
    apt-get update
  3. Select packages (grid-appliance-base is required, others are optional):
    1. Install packages with apt-get install $packagename, or use aptitude to browse the available packages
    2. grid-appliance-base: (we recommend restarting the appliance after installing this package):
      1. Condor -- Batch task management
      2. GroupVPN (IPOP) -- Virtual Networking Stack for decentralized, distributed LAN
      3. Base configuration scripts for performing all basic tasks
    3. grid-appliance-nfs: creates a read-only, public mount at /mnt/local
    4. grid-appliance-autofs: allows auto-mounting of remote nfs repositories at /mnt/ganfs/[hostname or ip]
    5. grid-appliance-ssh: makes it possible for admins to ssh into the machine using PKI only and LAN hosts (172.16/16 and 192.168/16) using password or PKI
    6. grid-appliance-public-pool: Adds the floppy image for the public pool to the appliance -- default configuration
    7. grid-appliance-samba (in development): allows users to access their home directory via Samba (Windows file sharing)
    8. grid-appliance-client: adds an X experience tailored for the Grid Appliance
  4. [optional] Add VM guest drivers
  5. Determine configuration mode
    1. Use grid-appliance-public-pool for connecting to the default public pool
    2. Package with an external floppy at floppy.img
    3. Package with an internal floppy at /opt/grid_appliance/etc/floppy.img
    4. Use EC2 with ec2-run-instances -f
  6. Start the Grid Appliance using the command:
    /etc/init.d/grid_appliance.sh restart
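
The steps above can be collected into a reviewable command plan. The sketch below, in Python, only produces the list of shell commands taken from this section and does not execute them. The function name install_plan is hypothetical, and installing several packages with a single apt-get install invocation is a convenience chosen here, not something this page prescribes.

```python
REQUIRED_PACKAGE = "grid-appliance-base"
OPTIONAL_PACKAGES = {
    "grid-appliance-nfs", "grid-appliance-autofs", "grid-appliance-ssh",
    "grid-appliance-public-pool", "grid-appliance-samba",
    "grid-appliance-client",
}
REPO_LINES = [
    "deb http://www.grid-appliance.org/files/packages/deb/ stable contrib",
    "deb http://www.cs.wisc.edu/condor/debian/stable/ lenny contrib",
]

def install_plan(optional=()):
    """Return the shell commands (to be run as root) for a chosen package set."""
    unknown = set(optional) - OPTIONAL_PACKAGES
    if unknown:
        raise ValueError("unknown packages: %s" % ", ".join(sorted(unknown)))
    # The base package is always installed; optional ones follow in sorted order.
    packages = [REQUIRED_PACKAGE] + sorted(set(optional))
    plan = ['echo "%s" >> /etc/apt/sources.list' % line for line in REPO_LINES]
    plan += [
        "wget http://www.grid-appliance.org/files/packages/deb/repo.key",
        "apt-key add repo.key",
        "apt-get update",
        "apt-get install " + " ".join(packages),
        "/etc/init.d/grid_appliance.sh restart",
    ]
    return plan

for command in install_plan(["grid-appliance-ssh", "grid-appliance-autofs"]):
    print(command)
```

Keeping the plan as plain strings makes it easy to read before pasting it into a root shell on the new appliance.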

Cloud specific issues

Nimbus

Nimbus clouds typically use an old kernel and do not support pygrub or kernel selection. Ubuntu relies on features of udev versions that are incompatible with these kernels, so Ubuntu does not work on the Nimbus clouds we have encountered.

Debian Squeeze does run on Nimbus; however, the AutoFS 5 bundled with Debian does not. We have compiled a version of AutoFS 4 that does, and we provide it in the following apt repository: deb http://www.grid-appliance.org/files/packages/deb/ stable contrib

Packaging for a Reusable Instance (AMI / EMI)

To clean an existing instance to create a new machine image (MI), do the following:

  1. Install the Grid Appliance packages and any others that you may want
  2. Turn off any Grid Appliance applications: /etc/init.d/grid_appliance.sh stop
  3. Optionally, disable the removal of SSH keys in /opt/grid_appliance/scripts/clean.sh; otherwise you won't be able to log into this instance
  4. Optionally, remove the GroupAppliance floppy from your instance: rm /opt/grid_appliance/etc/floppy.img
  5. Rebundle your instance per your cloud's instructions

Testing the Grid Appliance

  1. Create a Group Appliance and download floppies for client, worker, and server
  2. Start 3 VMs
  3. Verify IP communication amongst the 3
  4. Run condor_status at the client, verify that both the client and worker appear (this can be combined with the previous step)
  5. Wait for worker to become unclaimed, idle - should take 10 to 20 minutes
  6. Submit a Monte Carlo PI job (examples/montepi) and verify that it executes on the worker
  7. Verify the autofs is working, execute ls /mnt/ganfs/[hostname or ip]
  8. Verify ssh is working
    1. SSH into the client's eth1 with key or with password (should pass)
    2. SSH into the client's eth0 or tap with password (should fail)
    3. SSH into the client's eth0 or tap with key (should pass)
  9. Mount Samba from the host: \\gridappliance\$username should map to /home/$username, and it should only work from LAN IP addresses (such as 192.168/16)
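
Steps 4 and 5 can be checked mechanically. The sketch below, in Python, assumes condor_status is asked for one "Name State Activity" line per slot, for example with condor_status -format "%s " Name -format "%s " State -format "%s\n" Activity. The function names and the host names in the sample are hypothetical.

```python
def parse_slots(text):
    """Parse 'Name State Activity' lines into a list of dicts."""
    slots = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, state, activity = fields
        # Slot names look like "slot1@host" or plain "host".
        host = name.split("@")[-1]
        slots.append({"host": host, "state": state, "activity": activity})
    return slots

def hosts_present(slots, expected_hosts):
    """True if every expected host has at least one slot in the pool."""
    seen = {slot["host"] for slot in slots}
    return all(host in seen for host in expected_hosts)

def worker_ready(slots, worker_host):
    """True if the worker has a slot that is Unclaimed and Idle."""
    return any(
        slot["host"] == worker_host
        and slot["state"] == "Unclaimed"
        and slot["activity"] == "Idle"
        for slot in slots
    )

# Hypothetical output captured on the client.
sample = """slot1@client-vm Owner Idle
slot1@worker-vm Unclaimed Idle
"""
slots = parse_slots(sample)
print(hosts_present(slots, ["client-vm", "worker-vm"]))
print(worker_ready(slots, "worker-vm"))
```

Polling worker_ready every minute or so is one way to wait out the 10 to 20 minutes before submitting the test job.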

Wide Area Debugging

  1. IP Allocations are of the form dhcp:$IPOP_NAMESPACE:$IP
    1. Address can be verified by using bget (bget.py dhcp:$IPOP_NAMESPACE:$IP)
    2. Addresses can be inserted by using bput (bput.py dhcp:$IPOP_NAMESPACE:$IP brunet:node:$ADDRESS)
  2. Python with XML-RPC is the Swiss army knife for debugging Brunet
    1. Setup the server rpc = xmlrpclib.Server("http://127.0.0.1:10000/xm.rem")
    2. Local calls rpc.localproxy("class.method", ["optional", "parameters"])
    3. Remote calls rpc.proxy("brunet:node:$ADDRESS", [5 | 3], 1, "class.method", ["optional", "parameters"])
      • 5 - Exact routing
      • 3 - Greedy routing
    4. Important RPC Methods
      • Information.Info -- connection type count, neighbors, VPN info
      • sys:link.GetNodeInfo -- node id and TAs used for creating connections
  3. Logging
    1. Always check the logs first; many bugs make themselves known through an "unhandled" exception
    2. Don't let symptoms mislead you. Code that has worked does not magically stop working; a change in the environment is most likely causing it to behave unexpectedly.
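
The RPC calls listed above can be wrapped in a few helpers. This is a minimal sketch for Python 3, where the xmlrpclib module is named xmlrpc.client. The helper names are hypothetical, the namespace "ns" in the usage comment is a placeholder, and it is an assumption that method parameters are passed as separate positional arguments rather than as a single list.

```python
import xmlrpc.client

RPC_URL = "http://127.0.0.1:10000/xm.rem"
EXACT, GREEDY = 5, 3  # routing modes listed above

def dht_key(namespace, ip):
    """Build the DHT key under which an IP allocation is stored."""
    return "dhcp:%s:%s" % (namespace, ip)

def node_uri(address):
    """Build the Brunet URI for a node address."""
    return "brunet:node:%s" % address

def connect(url=RPC_URL):
    """Create the XML-RPC proxy; no network traffic happens until a call."""
    return xmlrpc.client.ServerProxy(url)

def local_call(rpc, method, *params):
    """Invoke a method on the local node."""
    return rpc.localproxy(method, *params)

def remote_call(rpc, address, method, *params, routing=GREEDY):
    """Invoke a method on a remote node, using the fixed argument 1 from above."""
    return rpc.proxy(node_uri(address), routing, 1, method, *params)

# Usage (requires a running node on this machine):
#   rpc = connect()
#   print(local_call(rpc, "Information.Info"))
#   print(local_call(rpc, "sys:link.GetNodeInfo"))
print(dht_key("ns", "10.1.2.3"))
```

The printed key has the same form as the argument passed to bget.py and bput.py in item 1.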

Testing Simics on the appliance

In this test, the x86_tlb module is compiled and used with the provided fc5 checkpoint. The result of the test (the state of the tlb before and after the test) is logged to test_screen_dump and the 'diff' with the orig_screen_dump is printed. This diff should have only one line "Compiled x86_tlb module" (ignore warnings about search path not existing or SLAs in the diff).

(The commands below assume that Simics is installed in $SIMICS.)

mkdir -p /home/griduser/test_workspace; $SIMICS/bin/workspace-setup /home/griduser/test_workspace
cd /home/griduser/test_workspace
wget www.acis.ufl.edu/~girish/test.tar.gz; tar -xzf test.tar.gz; rm -rf test.tar.gz
mv x86_tlb modules; make x86_tlb
./run_simics_test.sh
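
The pass criterion described above can be checked automatically once the printed diff has been captured. The sketch below, in Python, makes several assumptions: the diff is in the default format of the diff utility (changed lines prefixed with "< " or "> "), and the warnings to ignore can be recognized by the substrings "search path" and "SLA". The function names and the sample text are hypothetical.

```python
EXPECTED = "Compiled x86_tlb module"
# Assumption: ignorable warnings contain one of these substrings.
IGNORED_SUBSTRINGS = ("search path", "SLA")

def changed_lines(diff_text):
    """Return added and removed lines from default-format diff output."""
    lines = []
    for line in diff_text.splitlines():
        if line.startswith(("< ", "> ")):
            lines.append(line[2:].strip())
    return lines

def simics_test_passed(diff_text):
    """True if the only relevant difference is the expected compile message."""
    relevant = [
        line for line in changed_lines(diff_text)
        if not any(marker in line for marker in IGNORED_SUBSTRINGS)
    ]
    return relevant == [EXPECTED]

# Hypothetical diff output for illustration.
sample = """3a4
> Compiled x86_tlb module
7c8
< Warning: search path /a does not exist
---
> Warning: search path /b does not exist
"""
print(simics_test_passed(sample))
```

Any other changed line, such as a difference in the logged TLB state, makes the check fail.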

To check whether binaries compiled on a 32-bit Grid Appliance work on a 64-bit Grid Appliance (assuming they have the same glib version, etc.), copy the contents of test_workspace/x86-linux to test_workspace/amd64-linux on the 64-bit appliance and run run_simics_test.sh. Similarly, copy the contents of amd64-linux/ to x86-linux/ on a 32-bit Grid Appliance and run the test.
