EEL6892 Class Project 2

From Grid-Appliance Wiki

Condor tips

Configuration needed to set up a Condor job - example cache simulation

You need the following files to run a Simics batch job using Condor in the Grid Appliance:

  • A wrapper script. This script is the first thing executed when a job submitted through Condor is matched to a destination host and starts running. There are many ways to write such a wrapper, depending on the application you are running. The following is a shell-script template that works with Simics (simics_wrapper.sh):
#!/bin/sh

mkdir new-workspace   
cd new-workspace
tgt_wrk_spc=`pwd`                 # create a workspace directory and store path in tgt_wrk_spc
cd /opt/virtutech/simics-3.0.31/bin
./workspace-setup $tgt_wrk_spc    # setup workspace
cd $tgt_wrk_spc
cp ../setup-gcache.simics .       # copy Simics configuration files into workspace
cp ../batch_script.simics .
                                  # Run Simics; -no-win and -batch-mode are required
                                  # for batch execution, and -stall for cache simulation
                                  # Follow commands from batch_script.simics and write standard out to screen_dump.out
./simics /mnt/ganfs/C124004016/checkpoint/xen-dom0-domU -no-win -batch-mode -stall batch_script.simics > ../screen_dump.out
  • A Simics configuration file. This file tells Simics which commands and simulation models to execute on the remote machine. The following command sequence simulates Nbench for 2 billion cycles using the cache model set up by the setup-gcache.simics file, then prints the cache statistics (batch_script.simics):
run-command-file setup-gcache.simics
con0.input "cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT\n"
c 2_000_000_000
cpu0_l1_uc.statistics
quit
  • A Condor submit file. This file tells Condor which files to transfer before and after running a job, which binary to execute, and what requirements the machines that run the job must meet. Here is a template Condor submit file for this exercise (condor_script):
# specify the executable and job type (Simics is always vanilla in Condor)
universe = vanilla
executable = simics_wrapper.sh

# specify requirements for the resource running the job. In this case, the
# resource must have at least 512 MB of memory and Simics installed
requirements = HasArcherSimics == TRUE && Memory >= 512

# Provide names for the output files. $(Cluster) and $(Process) are substituted
# by unique Condor IDs
Log = simics.$(Cluster).$(Process).log
Error = simics.$(Cluster).$(Process).err
Output = simics.$(Cluster).$(Process).out

# Tell Condor which files to transfer: send the Simics input files, retrieve output
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = setup-gcache.simics, batch_script.simics
transfer_output_files = screen_dump.out

# Submit job
queue
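
Since a job fails if any of these files is missing, a quick check before submitting can save a wasted queue slot. The following is a minimal sketch; check_job_files is a hypothetical helper, not part of the provided templates, and it only tests that the four file names mentioned in this section exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original templates): check that the
# files named in this section are present in directory $1 (default: ".").
# Prints each missing file on stderr and returns 1 if any is absent.
check_job_files() {
    dir=${1:-.}
    status=0
    for f in simics_wrapper.sh setup-gcache.simics batch_script.simics condor_script; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use, from the directory holding the job files:
#   check_job_files . && condor_submit condor_script
```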

Submitting a sample job

  • In the appliance, create a folder named "condor1" and extract the template files from /mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz into it:
mkdir condor1
cd condor1
tar -xzf /mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz
  • To submit the job:
condor_submit condor_script
  • To track the progress of the job:
condor_q
  • The job takes around 10-20 minutes to finish (if resources are immediately available). When it finishes executing, you should see the screen_dump.out file containing the output of the simulation.
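
Rather than re-running condor_q by hand, you can poll for the output file. The sketch below is one way to do that; wait_for_file is a hypothetical helper, and the default timeout and polling interval are arbitrary choices, not values from the course material.

```shell
#!/bin/sh
# Hypothetical helper: wait until file $1 exists and is non-empty.
# $2 = timeout in seconds (default 1800), $3 = polling interval (default 30).
# Returns 0 once the file is there, 1 if the timeout expires first.
wait_for_file() {
    file=$1
    timeout=${2:-1800}
    interval=${3:-30}
    waited=0
    while [ ! -s "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up waiting for $file after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example use, after condor_submit:
#   wait_for_file screen_dump.out && tail screen_dump.out
```

If the condor_wait tool is available in your appliance, waiting on the job's log file with it is an alternative to polling.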

Submitting multiple jobs

The instructions above show how to run a single job. In many realistic scenarios, however, you need to run many simulations. This is where Condor is particularly useful: it queues your jobs for execution when no resources are available and deals with failures by retrying, among other features. When preparing multiple jobs, however, take care that output files are not overwritten and keep track of which job ran which configuration.

Given the way the Condor and Simics configuration files are set up, one approach is to create multiple directories, one per configuration, and submit each job from its own subdirectory. For example, to set up two different configurations where nbench runs in dom0 and in domU:

mkdir nbench-dom0 nbench-domU
cp /mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz nbench-dom0
cp /mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz nbench-domU
cd nbench-dom0
tar -xzf hw3_cache.tgz
cd ../nbench-domU
tar -xzf hw3_cache.tgz 
# edit nbench-dom0/batch_script.simics to set up con0.input appropriately
# edit nbench-domU/batch_script.simics to set up con0.input appropriately
cd ../nbench-dom0
condor_submit condor_script
cd ../nbench-domU
condor_submit condor_script

The output files will be stored in the corresponding subdirectories at the end of the job execution.
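
With more than a couple of configurations, the same steps can be wrapped in small functions. This is a sketch under stated assumptions: prepare_config, submit_config, and the TEMPLATE and SUBMIT variables are hypothetical names introduced here, and the archive is assumed to unpack its files directly into the target directory, as in the commands above.

```shell
#!/bin/sh
# TEMPLATE: archive to unpack (defaults to the cache template used above).
# SUBMIT:   set to 1 to really call condor_submit; anything else is a dry run.
TEMPLATE=${TEMPLATE:-/mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz}
SUBMIT=${SUBMIT:-0}

# Create directory $1 and unpack the template files into it.
prepare_config() {
    name=$1
    mkdir -p "$name" || return 1
    tar -xzf "$TEMPLATE" -C "$name" || return 1
    echo "prepared $name; edit $name/batch_script.simics before submitting"
}

# Submit the job from inside directory $1 so its output lands there.
submit_config() {
    name=$1
    if [ "$SUBMIT" = 1 ]; then
        ( cd "$name" && condor_submit condor_script )
    else
        echo "dry run: would run condor_submit in $name"
    fi
}

# Example use:
#   for cfg in nbench-dom0 nbench-domU; do prepare_config "$cfg"; done
#   (edit each batch_script.simics, then)
#   for cfg in nbench-dom0 nbench-domU; do SUBMIT=1 submit_config "$cfg"; done
```

Keeping preparation and submission separate leaves room for the manual edit of con0.input in between.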

Changing the cache size

The architecture of the simulated cache can be changed by modifying the setup-gcache.simics file. This file is shown below, annotated with explanations of the different parameters:

#########################################################
# To set up 1 level of unified cache

@from configuration import *

$tsc = cpu0->ia32_time_stamp_counter
@SIM_set_configuration([
OBJECT("cpu0_l1_uc", "g-cache",
       cpus = OBJ("cpu0"),                - Specifies the CPU to which this cache is connected
       config_line_number = 512,          - The number of lines in the cache
       config_line_size = 64,             - The size of each cache line in bytes
       config_assoc = 2,                  - Associativity of the cache
       config_virtual_index = 0,          - Specifies whether the cache is indexed using virtual (1) or physical (0) addresses
       config_virtual_tag = 0,            - Specifies whether the cache is tagged using virtual (1) or physical (0) addresses
       config_replacement_policy = "lru", - Cache replacement policy
       penalty_read = 0,                  - Read/write latencies
       penalty_write = 0,
       penalty_read_next = 0,             - Latencies for read/write communication with the next unit in the memory hierarchy
       penalty_write_next = 0)])

# plug the hierarchy
@conf.cpu0_mem.timing_model = conf.cpu0_l1_uc

# Send instruction fetches to the cache - NOTE MUST START IN STALL MODE
cpu0.ifm "instruction-fetch-trace"
cpu0->ia32_time_stamp_counter = $tsc
#########################################################

The size of the cache can be varied by changing config_line_number, since the total capacity is config_line_number × config_line_size.
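
The arithmetic, and the edit itself, can be scripted. The sketch below assumes config_line_size is given in bytes, so that the values above (512 lines of 64 bytes) describe a 32 KB cache; the three function names are hypothetical helpers, not part of Simics or the templates.

```shell
#!/bin/sh
# Total capacity in bytes: $1 = config_line_number, $2 = config_line_size.
# Assumes the line size is expressed in bytes.
cache_size_bytes() {
    echo $(( $1 * $2 ))
}

# Number of lines needed for a cache of $1 KB with lines of $2 bytes.
lines_for_size_kb() {
    echo $(( $1 * 1024 / $2 ))
}

# Rewrite the config_line_number value in file $1 to $2.
# Uses a temporary file instead of "sed -i" for portability.
set_line_number() {
    file=$1
    n=$2
    sed "s/config_line_number *= *[0-9][0-9]*/config_line_number = $n/" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
}

# Example use: switch the template to a 64 KB cache with 64-byte lines.
#   set_line_number setup-gcache.simics "$(lines_for_size_kb 64 64)"
```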

TLB simulations

Simulations of different TLB configurations require additional files to be sent along with the job, as well as different checkpoint images. The template archives used below set up TLB simulations of 20 billion instructions for OpenAFS, with TLBs configured with 64 and 256 entries, respectively:

mkdir openafs-64
cd openafs-64
tar -xzf /mnt/ganfs/C124004016/homework3/templates/hw3_tlb64.tgz
# edit batch_script.simics to set up con0.input appropriately
condor_submit condor_script
cd ..
mkdir openafs-256
cd openafs-256
tar -xzf /mnt/ganfs/C124004016/homework3/templates/hw3_tlb256.tgz
# edit batch_script.simics to set up con0.input appropriately
condor_submit condor_script

Summary: con0 input strings for running benchmarks

nBench

  • In dom0:
con0.input "cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT\n"
  • In domU:
con0.input "ssh 10.10.0.14 'cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT'\n"
  • In both dom0 and domU:
con0.input "cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT &\n"
con0.input "ssh 10.10.0.14 'cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT'\n"

OpenAFS make

  • In dom0:
con0.input "cd /root/openafs-1.4.7; make\n"
  • In domU:
con0.input "ssh 10.10.0.14 'cd /root/openafs-1.4.7; make'\n"
  • In both dom0 and domU:
con0.input "cd /root/openafs-1.4.7; make &\n"
con0.input "ssh 10.10.0.14 'cd /root/openafs-1.4.7; make'\n"
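
The strings above all follow one pattern, which the following sketch generates. con0_input and DOMU_ADDR are hypothetical names introduced for illustration; the address is the domU address used in the examples above. The output lines are meant to be pasted into batch_script.simics.

```shell
#!/bin/sh
# Address of domU as seen from dom0, taken from the examples above.
DOMU_ADDR=10.10.0.14

# Print the con0.input line(s) for running a command in a guest directory.
# $1 = target (dom0, domU, or both), $2 = directory, $3 = command.
con0_input() {
    target=$1
    dir=$2
    cmd=$3
    q="'"   # a literal single quote, kept in a variable for readability
    case $target in
        dom0)
            # "\\n" in the format prints a literal backslash-n for Simics.
            printf 'con0.input "cd %s; %s\\n"\n' "$dir" "$cmd" ;;
        domU)
            printf 'con0.input "ssh %s %scd %s; %s%s\\n"\n' \
                "$DOMU_ADDR" "$q" "$dir" "$cmd" "$q" ;;
        both)
            # Start the dom0 copy in the background, then run in domU.
            printf 'con0.input "cd %s; %s &\\n"\n' "$dir" "$cmd"
            con0_input domU "$dir" "$cmd" ;;
        *)
            echo "unknown target: $target" >&2
            return 1 ;;
    esac
}

# Example use:
#   con0_input both /root/openafs-1.4.7 make
```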