Comments and grades for lab 1 are available on the course submissions website.
We strongly encourage students taking this class as "listener" to still work on and hand in the labs. In the full-length 6.828, the lectures contain a lot more content, including theory discussion, paper readings, and examples from xv6. This IAP version is heavily condensed, with lectures primarily containing the minimal information needed to be able to write JOS. The majority of the learning experience comes from working through the labs.
Read all the comments in a JOS labs. They are there to help you. They frequently include hints.
While grading lab 1, we noticed there was some confusion about
pointer arithmetic. You can perform arithmetic on pointers without
casting them to int first. The increment value will be the same as
sizeof(type); i.e., int* a = 0x4; a++ results in a = 0x8.
The first exercise, the shell, serves two
purposes: familiarity with the C programming language, and exposure to
system calls. What system calls does a completed sh.c use?
brkclosedup2execveforkioctlopenpipereadwaitwriteSome of these system calls you used directly (like dup2 and pipe)
but others (like ioctl) aren't even in the source code! Library
functions (from libc) frequently do work for you that involves
calling into the kernel. The most obvious functions are those that are
thin wrappers around the syscall interface---like write. The less
obvious ones, like fwrite still end up calling into the thin
wrappers. You can get the man pages for these syscalls by running man
2 SYSCALL, like man 2 write, on athena.dialup.mit.edu.
Before we continue, there's a useful tool that's installed on Athena
called strace. As per its man page, it allows you to "trace system
calls and signals". Running strace PROGRAM will show you the
syscalls it is calling---live. This makes strace an incredibly
useful debugging tool. We're going to use strace to identify the
syscalls in some example programs.
hello.c:
#include <stdio.h>
int main(int argc, char* argv[]) {
puts("Hello, world!");
return 0;
}
If we run gcc hello.c -o hello && strace ./hello, we'll eventually
see the syscall
write(1, "Hello, world!\n", 14) = 14
We can conclude that puts is a library function that ends up
calling write. That means we could re-write the program to use
write directly, and then confirm with strace.
hello-write.c:
int main(int argc, char* argv[]) {
const char hello[] = "Hello, world!\n";
write(1, hello, sizeof(hello) - 1);
return 0;
}
puts has to be translated into write because the userspace program
has no way to directly output data onto the screen. The kernel is
responsible for accepting all requests to use the hardware and
determine if they are acceptable before enacting them. Otherwise, each
program could attempt to perform direct operations on each IO
device. If there's only one program running, but in a multitasking
environment, this is going to cause complete mayhem. So we use the
kernel as an arbiter, and the syscalls are our way of communicating to
it our intentions.
Let's look at a more complicated example, in which we want to run another program:
run-system.c:
int main(int argc, char* argv[]) {
system("ls");
return 0;
}
This program just runs ls normally; let's look at the interesting
lines from strace -f, which will tell strace to "follow forks":
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD,
parent_tidptr=0x7fffe5422818) = 32680
[pid 28702] execve("/bin/sh", ["sh", "-c", "ls"], [/* 60 vars */]) = 0
wait4(32680, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 32680
On Linux, fork is implemented in terms of the more general clone
system call, which is what we see here.
run-fork.c:
int main(int argc, char* argv[]) {
int status;
if (fork() == 0) {
const char* args[] = {"sh", "-c", "ls", 0};
execv("/bin/sh", args);
}
wait(&status);
return 0;
}
Finally, let's look at an example that has to do with files. It's
pretty normal to want a temporary file whose name you don't care
about, so long as it's unique. The C standard library provides
mkstemp:
mkstemp-test.c:
int main(int argc, char* argv[]) {
char template[] = "/tmp/jos-XXXXXX";
int fd = mkstemp(template);
const char msg[] = "6.828 is amazing.\n";
write(fd, msg, sizeof(msg) - 1);
close(fd);
return 0;
}
mkstemp takes a template string that it modifies, replacing the
string of XXXXXX with random ASCII characters. If a file with the
same name already exists, it tries again, until a new file is
created. We can see the strace output:
open("/tmp/jos-Fl9E8n", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
write(3, "6.828 is amazing.\n", 18) = 18
close(3) = 0
We can translate this into a moral equivalent (without the random name choice and retry; that's an exercise for the reader):
mkstemp-alike.c:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char* argv[]) {
int fd = open("/tmp/jos-Fl9E8n", O_RDWR | O_CREAT | O_EXCL, 0600);
const char msg[] = "6.828 is amazing.\n";
write(fd, msg, sizeof(msg) - 1);
close(fd);
return 0;
}
Each of these examples allowed a userspace program to perform a privileged operation: write to the screen, run another program, create and write to a file; all mediated by the kernel, preventing them from crashing each other. In all cases, the only interface these programs used to communicate with the kernel was the syscall. Now, all we have to do is actually implement them.
So far (up to and including lab2), all of the code in JOS has been
entirely in kernel space. Any function in any file could directly
modify any piece of hardware, as everything is running in ring 0. The
goal of lab3 is two parts: setting up unprivileged user environments
(also called processes) and handling exceptions, and then using this
infrastructure to handle system calls.
The initial part of lab3 is fairly similar to lab2 in that you
have to implement some utility functions for managing in-kernel
tracking data structures. These time it will be struct Env rather
than struct PageInfo. After that, you'll be setting up the interrupt
handlers to perform a userspace to kernelspace transition, and perform
the appropriate action.
When an exception or an interrupt occurs, the processor looks up what
code to run in the Interrupt Descriptor Table (IDT). There are 256
entries, including values for page fault, general protection fault,
and double fault. We'll also use the IDT for enabling a syscall. JOS
uses int $0x30---Linux uses int $0x80.
inc/mmu.h defines the macro SETGATE for conveniently setting all
the fields in the IDT, which are then loaded into the CPU with
lidt. Each entry of the IDT contains many fields, but the ones we're
interested in are the selector and the privilege level. We'll be
setting the selector to GD_KT so that the interrupt runs in the
kernel's context. For syscalls, we'll want to set the privilege level
to ring 3 to allow user programs to invoke the interrupt. For all of
the exceptions, we want to set the privlege level to ring 0 so that
only the kernel or the processor itself can cause them.
Unfortunately, the IDT is not sufficient for describing everything that to do on an interrupt. The processor needs to know which stack to push data onto. To supply this, we have to turn to an x86 feature called the Task State Segment (TSS). The TSS exists to enable hardware multitasking---a feature we will otherwise not be using.
The GDT has a pointer to the TSS which we will fill in with exactly two values: the kernel's stack and privilege level. JOS uses a single kernel stack per CPU, so all interrupts (including syscalls) will be processed there.
Once we've switched to the kernel's stack, the function provided in
the IDT will be run. This function will be defined in trapentry.S,
some specialized assembly that's part of Lab 3 that will save the
necessary information to restore back to userspace. This means
preserving all registers so that when the transition occurs execution
of the userspace program continues unimpeded. This data structure is
defined in inc/trap.h and is called struct Trapframe.
Now that an interrupt has landed execution in the kernel, we need to
process the trap frame. We can't store arguments on the stack, so we
need a special calling convention for syscalls. We'll use EAX to
store the syscall number, and then EDX, ECX, EBX, EDI, ESI
(in that order) to hold the maximum of 5 arguments. We'll use EAX to
return values back into userspace, which we can do by just modifying
the value in our trap frame.
These 5 arguments seem like they're very limiting---they are. However, since the kernel and the userspace share the same virtual memory space, the arguments can be pointers to data in userspace. The kernel has full privileges to read and edit this data, a feature we will take advantage of in future labs. As an added benefit, we don't incur a TLB flush every time we perform a syscall, so calling into the kernel is fairly cheap, but still more expensive than a userspace function call.
One the kernel is done processing the interrupt, heading back to
userspace is easy. The function env_pop_tf will restore the
userspace program by placing the contents in the trap frame back into
the appropriate registers, which will automatically resume exection at
the correct privilege level.
This same functionality will be used in Lab 4 for switching between running processes.
The macros ROUNDUP and ROUNDDOWN are very useful.
You'll have to write code that can load ELF files. You may find the
code in boot/main.c useful, as well as inc/elf.h.
If you get a strange error when trying to run your first userspace
environment, make sure your env_init sets up your environments
such that the first env_alloc call uses envs[0], as that's what
the test code expects.