EEL6892 Class Project 2
From Grid-Appliance Wiki
Condor tips
Configuration needed to set up a Condor job - example cache simulation
You need the following files to run a Simics batch job using Condor in the Grid Appliance:
- A wrapper script. This script is the first thing executed when a job submitted through Condor is matched to a destination host. There are many ways to write such a wrapper, depending on the application you are running. The following is a shell-script template that works with Simics (simics_wrapper.sh):
#!/bin/sh
mkdir new-workspace
cd new-workspace
tgt_wrk_spc=`pwd` # create a workspace directory and store path in tgt_wrk_spc
cd /opt/virtutech/simics-3.0.31/bin
./workspace-setup $tgt_wrk_spc # setup workspace
cd $tgt_wrk_spc
cp ../setup-gcache.simics . # copy Simics configuration files into workspace
cp ../batch_script.simics .
# Run Simics; -no-win and -batch-mode are required
# for batch execution, and -stall for cache simulation
# Follow commands from batch_script.simics and write standard out to screen_dump.out
./simics /mnt/ganfs/C124004016/checkpoint/xen-dom0-domU -no-win -batch-mode -stall batch_script.simics > ../screen_dump.out
- A Simics configuration file. This file tells Simics which commands and simulation models to run on the remote machine. The following sequence of commands simulates Nbench for 2 billion cycles using the cache model set up by the setup-gcache.simics file, and prints cache statistics at the end (batch_script.simics):
run-command-file setup-gcache.simics
con0.input "cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT\n"
c 2_000_000_000
cpu0_l1_uc.statistics
quit
- A Condor submit file. This file tells Condor which files to transfer before and after running a job, which binary to execute, and the requirements for the machines that will run the job. Here is a template Condor submit file for this exercise (condor_script):
# specify the executable and job type (Simics is always vanilla in Condor)
universe = vanilla
executable = simics_wrapper.sh
# specify requirements for the resource running the job. In this case, the resource must
# have at least 512MB of memory and Simics installed
requirements = HasArcherSimics == TRUE && Memory >= 512
# Provide names for the output files. $(Cluster) and $(Process) are substituted
# by unique Condor IDs
Log = simics.$(Cluster).$(Process).log
Error = simics.$(Cluster).$(Process).err
Output = simics.$(Cluster).$(Process).out
# Tell Condor which files to transfer: send the Simics input files, retrieve output
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = setup-gcache.simics, batch_script.simics
transfer_output_files = screen_dump.out
# Submit job
queue
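The Simics configuration file described above is short enough to generate from a script, which helps when only the console command or the cycle count changes between runs. The following is a minimal sketch, not part of the course templates: the function name write_batch_script and its argument order are assumptions made for illustration, while the five Simics commands it writes are the ones listed for batch_script.simics.

```shell
#!/bin/sh
# Hypothetical helper (not in the templates): write a batch_script.simics file.
#   $1 = output file
#   $2 = shell command to type on the simulated console (without the trailing \n)
#   $3 = number of cycles to simulate
write_batch_script() {
    out_file=$1
    console_cmd=$2
    cycles=$3
    {
        printf 'run-command-file setup-gcache.simics\n'
        # \\n in the format emits a literal backslash-n, which con0.input
        # treats as the Enter key
        printf 'con0.input "%s\\n"\n' "$console_cmd"
        printf 'c %s\n' "$cycles"
        printf 'cpu0_l1_uc.statistics\n'
        printf 'quit\n'
    } > "$out_file"
}
```

For example, write_batch_script batch_script.simics "cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT" 2_000_000_000 should reproduce the nbench script for dom0.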
Submitting a sample job
- In the appliance, create a folder named "condor1" and extract the template archive from /mnt/ganfs/C124004016/homework3/templates/ into this folder:
mkdir condor1
cd condor1
tar -xzf /mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz
- To submit the job:
condor_submit condor_script
- To track the progress of the job:
condor_q
- The job takes around 10-20 minutes to finish if resources are immediately available. When it finishes executing, you should see the screen_dump.out file, containing the output of the simulation, in the folder you submitted from.
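Once condor_q no longer lists the job, a quick check of the returned files can save a manual look. The sketch below is an illustration only: check_job_output is a hypothetical helper, and treating a non-empty simics.*.err file as suspicious is an assumption, since Simics may also print harmless messages to standard error.

```shell
#!/bin/sh
# Hypothetical helper: report whether a finished job left usable output.
#   $1 = directory the job was submitted from (defaults to the current one)
# Returns 0 if screen_dump.out has content and no .err file does,
#         1 if screen_dump.out is missing or empty,
#         2 if a simics.<cluster>.<process>.err file is non-empty.
check_job_output() {
    job_dir=${1:-.}
    if [ ! -s "$job_dir/screen_dump.out" ]; then
        echo "missing or empty: $job_dir/screen_dump.out"
        return 1
    fi
    # If the glob matches nothing, the literal pattern fails the -s test
    for err_file in "$job_dir"/simics.*.err; do
        if [ -s "$err_file" ]; then
            echo "check stderr in: $err_file"
            return 2
        fi
    done
    echo "ok: $job_dir/screen_dump.out"
}
```

Running check_job_output condor1 from the parent folder prints a single status line for the sample job.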
Submitting multiple jobs
The instructions above show how to run a single job. In many realistic scenarios, however, you need to run many simulations. This is where Condor is particularly useful: it queues your jobs for execution when no resources are available and deals with failures by retrying, among other features. You still need to be careful when preparing multiple jobs, both to avoid overwriting files and to keep track of which output belongs to which configuration.
Given the way the Condor and Simics configuration files are set up, one approach is to create multiple directories, one per configuration, and submit your jobs from these subdirectories. For example, to set up two different configurations where nbench runs on dom0 and domU:
mkdir nbench-dom0 nbench-domU
cp /mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz nbench-dom0
cp /mnt/ganfs/C124004016/homework3/templates/hw3_cache.tgz nbench-domU
cd nbench-dom0
tar -xzf hw3_cache.tgz
cd ../nbench-domU
tar -xzf hw3_cache.tgz
# edit nbench-dom0/batch_script.simics to set up con0.input appropriately
# edit nbench-domU/batch_script.simics to set up con0.input appropriately
cd ../nbench-dom0
condor_submit condor_script
cd ../nbench-domU
condor_submit condor_script
The output files will be stored in the corresponding subdirectories at the end of the job execution.
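With more than a couple of configurations, the repeated cd and condor_submit steps can be wrapped in a loop. The following sketch assumes each directory already contains an edited condor_script; submit_all and the CONDOR_SUBMIT override are hypothetical names introduced here so the loop can be tried as a dry run before anything is queued.

```shell
#!/bin/sh
# Hypothetical helper: submit condor_script from each directory given.
# Set CONDOR_SUBMIT=echo for a dry run that only prints what would be submitted.
submit_all() {
    for job_dir in "$@"; do
        if [ ! -f "$job_dir/condor_script" ]; then
            echo "skipping $job_dir: no condor_script" >&2
            continue
        fi
        # The subshell keeps the caller's working directory unchanged, and
        # submitting from inside job_dir makes the output files land there
        ( cd "$job_dir" && ${CONDOR_SUBMIT:-condor_submit} condor_script )
    done
}
```

For the example above, submit_all nbench-dom0 nbench-domU submits both jobs from their own subdirectories.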
Changing the cache size
The architecture of the simulated cache can be changed by modifying the setup-gcache.simics file. This file is shown below, annotated with explanations of the different parameters:
#########################################################
# To set up 1 level of unified cache
@from configuration import *
$tsc = cpu0->ia32_time_stamp_counter
@SIM_set_configuration([
OBJECT("cpu0_l1_uc", "g-cache",
cpus = OBJ("cpu0"), - This specifies the cpu to which this cache is connected
config_line_number = 512, - The number of lines in the cache
config_line_size = 64, - The size of each cache line in bytes
config_assoc = 2, - Associativity of the cache
config_virtual_index = 0, - Specifies whether the cache is indexed using virtual (1) or physical (0) addresses
config_virtual_tag = 0, - Specifies whether the cache is tagged using virtual (1) or physical (0) addresses
config_replacement_policy = "lru", - Cache replacement policy
penalty_read = 0, - Read/write latencies
penalty_write = 0,
penalty_read_next = 0, - Latencies for read/write communication with the next unit in the memory hierarchy
penalty_write_next = 0)])
# plug the hierarchy
@conf.cpu0_mem.timing_model = conf.cpu0_l1_uc
# Send instruction fetches to the cache - NOTE MUST START IN STALL MODE
cpu0.ifm "instruction-fetch-trace"
cpu0->ia32_time_stamp_counter = $tsc
#########################################################
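The total capacity of the cache is config_line_number multiplied by config_line_size (512 x 64 in the file above), so the size of the cache can be varied by changing config_line_number.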
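The relationship between the parameters and the resulting cache geometry can be checked with a little arithmetic before editing the file. The sketch below is illustrative: the three function names are hypothetical, and with_line_number assumes the real setup-gcache.simics contains a line of the form "config_line_number = 512," without the annotations shown above.

```shell
#!/bin/sh
# Total capacity = number of lines x line size (same unit as config_line_size)
#   $1 = config_line_number, $2 = config_line_size
cache_capacity() {
    echo $(( $1 * $2 ))
}

# Number of sets = number of lines / associativity
#   $1 = config_line_number, $2 = config_assoc
cache_sets() {
    echo $(( $1 / $2 ))
}

# Hypothetical helper: print a copy of a setup file with a new line count.
#   $1 = setup file, $2 = new value for config_line_number
with_line_number() {
    sed "s/\(config_line_number *= *\)[0-9][0-9]*/\1$2/" "$1"
}
```

With the values in the file above, cache_capacity 512 64 prints 32768 and cache_sets 512 2 prints 256. Because the wrapper and batch scripts refer to setup-gcache.simics by name, a modified copy is best written into its own per-configuration directory under that same name, for example with_line_number setup-gcache.simics 1024 > ../cache-1024/setup-gcache.simics (the directory name here is only an example).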
TLB simulations
Simulations for different TLB configurations require additional files to be sent along with the job, as well as different checkpoint images. The following commands use templates for TLB simulations of 20 billion instructions of OpenAFS, with TLBs configured with 64 and 256 entries, respectively:
mkdir openafs-64
cd openafs-64
tar -xzf /mnt/ganfs/C124004016/homework3/templates/hw3_tlb64.tgz
# edit batch_script.simics to set up con0.input appropriately
condor_submit condor_script
cd ..
mkdir openafs-256
cd openafs-256
tar -xzf /mnt/ganfs/C124004016/homework3/templates/hw3_tlb256.tgz
# edit batch_script.simics to set up con0.input appropriately
condor_submit condor_script
Summary: con0 input strings for running benchmarks
nBench
- In dom0:
con0.input "cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT\n"
- In domU:
con0.input "ssh 10.10.0.14 'cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT'\n"
- In both dom0 and domU:
con0.input "cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT &\n"
con0.input "ssh 10.10.0.14 'cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT'\n"
OpenAFS make
- In dom0:
con0.input "cd /root/openafs-1.4.7; make\n"
- In domU:
con0.input "ssh 10.10.0.14 'cd /root/openafs-1.4.7; make'\n"
- In both dom0 and domU:
con0.input "cd /root/openafs-1.4.7; make &\n"
con0.input "ssh 10.10.0.14 'cd /root/openafs-1.4.7; make'\n"
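The strings in this summary follow one pattern: the dom0 form runs the command directly, the domU form wraps it in ssh 10.10.0.14, and the combined form backgrounds the dom0 command before starting the domU one. A small generator makes that pattern explicit. This is a sketch only, and con0_input with its benchmark and domain keywords is a hypothetical helper, not something provided by the templates.

```shell
#!/bin/sh
# Hypothetical helper: print the con0.input line(s) for a benchmark and domain.
#   $1 = nbench | openafs
#   $2 = dom0 | domU | both
con0_input() {
    case $1 in
        nbench)  cmd='cd /root/nbench-byte-2.2.3; ./nbench -cCOM.DAT' ;;
        openafs) cmd='cd /root/openafs-1.4.7; make' ;;
        *) echo "unknown benchmark: $1" >&2; return 1 ;;
    esac
    case $2 in
        dom0) printf 'con0.input "%s\\n"\n' "$cmd" ;;
        domU) printf 'con0.input "ssh 10.10.0.14 '\''%s'\''\\n"\n' "$cmd" ;;
        both)
            # Background the dom0 run so the domU run starts alongside it
            printf 'con0.input "%s &\\n"\n' "$cmd"
            printf 'con0.input "ssh 10.10.0.14 '\''%s'\''\\n"\n' "$cmd"
            ;;
        *) echo "unknown domain: $2" >&2; return 1 ;;
    esac
}
```

For example, con0_input openafs both prints the two lines listed under "In both dom0 and domU" for OpenAFS make, ready to paste into batch_script.simics.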

