EEL6892 Spring 11 HW 4: Scheduling Inside Xen
From Grid-Appliance Wiki
Introduction
The objective of this assignment is to introduce you to Xen's source code, running multiple VMs, and using the Grid Appliance to schedule long-running jobs. Much like the previous project, the elapsed time this assignment requires will be much longer than the time you spend actively working on it, so please do not procrastinate. By the end of the assignment you should be comfortable with the following:
- Understanding where in Xen's code domain changes occur
- Modifying Xen code and compiling it
- Submitting Simics jobs using the Grid Appliance / Condor
The questions following the assignment will verify that you have acquired this knowledge.
I want to emphasize that because this is an upper-level graduate course, you are expected to be independent and creative. The directions given over time will be less and less verbose. Your submissions are to be thoughtful and concise. Furthermore, you should READ the entire assignment before beginning. It will save you time.
Reading
There is nothing new. You have been pointed to all the appropriate documentation for the remainder of the course. At this point, you should be familiar with Google and [Main Grid Appliance Wiki] as well as be a member of the Google Group for the Class.
Part 1 - Modifying Xen
In the previous assignment, you were asked to consider components and HAPs that could be used to better understand the relationship between CR3 changes and mode changes. We will now modify Xen to issue a magic instruction during the scheduling of domains, immediately before the scheduler switches to a new domain.
- This should probably be done in a chroot environment
- Begin by navigating to the Xen source code inside of ubuntu-xen.img (/root/xen-4.0-testing.hg)
- Open the file xen/common/schedule.c
- Search for the "schedule(void)" method
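- Before that function, add the magic_domain_switch function listed in the Code section at the end of this page
- In schedule, immediately before the call to context_switch, call magic_domain_switch, passing the same prev and next that are passed to context_switch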
- You've just added a magic instruction with some parameters
- Compile Xen, from /root/xen-4.0-testing.hg, execute:
- make XEN_TARGET_ARCH=x86_32 xen
- make XEN_TARGET_ARCH=x86_32 install-xen
- If you see any errors, you should debug them before going further
- You can verify that you successfully installed the magic instruction by:
- Start simics
- Once Xen is booted, go to the Simics console
- Type: "magic-break-enable" and then "c"
- It should break very shortly thereafter, printing that a magic breakpoint occurred
- Read about magic-break
- You are now ready to proceed
Part 2 - Duplicating a VM
- We will need two domUs, so let's duplicate the existing one
- Before proceeding, add SPEC CPU GCC to the domU
- Copy the domU files (ubuntu.*) to (ubuntu0.*)
- Edit ubuntu0.cfg
- Change the "name" to "ubuntu0"
- Change the disk to point to ubuntu0.img
Part 3 - Creating a Checkpoint with Two DomUs
- Start Simics
- Start both domUs
- Prepare the benchmark in one domU
- Enter the domU
- Start the benchmark
- Leave the domU
- Pause the domU
- Repeat for the other domU
- From the Simics console:
- Resume (unpause) the domUs (i.e., "action1; action2\n")
- Run for 2,000,000,000 cycles
This will not create perfectly synced executions, but it will be close enough. Make sure you do not take too long to leave a domU and pause it; otherwise your evaluations will be invalid.
Part 4 - Submitting Simics via Condor
Unlike previous semesters, which used a common image, this semester you developed your own images and checkpoints. As a result, if you want to submit many simulations to run on the grid that you are connecting to via the Grid Appliance, you will need to send all the images to the remote machines. Unless you are on a VERY high-speed connection, such as the university's network, this process will take far too long. This task is therefore optional, though it may be useful for your final project. If you are interested, please follow these directions.
To run your own simulations on Condor, you will need to make the following tweaks:
- Your nfs share is /mnt/ganfs/$(hostname)
- You can store files into there from /mnt/local or directly into /mnt/ganfs/$(hostname)
- You should modify simics_wrapper.sh to point to your Simics checkpoint
- Determine whether or not you need stall
Part 5 - Questions
- What is the magic instruction for x86? What, if anything, does this instruction do to the CPU that executed it?
- In the example magic instruction, what is stored in the eax, ebx, and ecx registers? If we wanted to use the magic instruction to investigate other components in Xen, which of these would we set to distinguish it from another magic instruction?
- What happens if you put the magic instruction after context_switch in schedule(void)?
- Going back to the previous homework, what is the relationship between CPU mode changes and domain scheduling? For this assignment, create 3 checkpoints (dom0 only, a single domU, and two domUs) and compare the results. Do this for both GCC and DBENCH.
- Plot the time spent in the various domains using the GCC and DBENCH 2 domU checkpoints. Explain the differences.
- During your experiments, do you find any domains that you did not explicitly create? If so, what are their IDs and roles?
- How many instructions do we need to execute to get reliable data (i.e., steady state) from the previously checkpointed state? Use at least 10 points (for example, points in the range 1,000 to 10,000,000,000,000). A plot containing 3 different data series (domain schedules, CR3 changes, and mode changes) is expected. (Hint: use Condor.) Use only the 2 domU checkpoint.
- Include a commented version of the Simics HAP code you used to profile domain changes / scheduling. Please do not include components that I have already provided for you.
- For the final project, you will analyze some component of Xen using the components we have set up thus far. You could consider looking at scheduling, virtualization performance overheads, or understanding virtualization better by profiling, to name a few. Please write the challenge you want to address, how you will address it, and what you expect the results to be. References are required. This is a project proposal and should be approximately 1/2 to 1 page long.
Code
inline void magic_domain_switch(struct vcpu *prev, struct vcpu *next)
{
    __asm__ __volatile__ (
        "movl $0, %%eax;"
        "xchg %%bx, %%bx"
        : "+b" (prev->domain->domain_id), "+c" (next->domain->domain_id)
        :
        : "%eax"
    );
}

