To request addition or removal, please email sipb-www at mit.edu. Planet is updated every thirty minutes.
“...so that’s what we’re going to build!”
“Cool! What language are you going to write it in?”
“Well, we were thinking we were going to need three programming languages...”
“...three?”
“...and they’ll be research programming languages too...”
“Are you out of your mind?”
This was the conversation in streaming through my head when I decided that I would be writing my latest software project in Coq, Haskell and Ur/Web. I had reasonably good reasons for the choice: I wanted Coq because I didn’t actually want to implement a theorem prover from scratch, I wanted Ur/Web because I didn’t actually want to hand write JavaScript to get an AJAX interface, and I wanted Haskell because I didn’t want to write a bucket of C to get Ur/Web and Coq to talk to each other. But taken altogether the whole thing seemed a bit ludicrous, like an unholy fusion of a trinity of research programming languages.
In the end, it worked out quite well. Now, what this means depends on your expectations: it was not the case that “everything worked out of the box and had very nice instructions attached.” However, if it was the case that:
then yes, it worked “quite well”. In this post, I’d like to describe in a little more detail what happened when I put these three languages together and speculate wildly about general maxims that might apply when someone is doing something similar. (A description of what the project actually is will unfortunately have to wait a little; it’s not quite done yet.)
While Coq is a research language, it is also in very wide use among academics, and most of its instability lies in advanced features that I did not use in my project. So the primary issues I encountered with Coq were not bugs, but in integrating it with the system (namely, making it talk to Haskell).
Maxim 1. Interchange formats will be undocumented and just good enough to get the job done.
Coq is already designed to allow for communication between processes (this is how the Proof General/Emacs and Coq talk to each other), but the format between coqtop and Proof General was undocumented, ad hoc, and didn't transmit enough information for my application. In the face of such a situation, there are two ways to proceed: grit your teeth and implement the bad protocol or patch the compiler to make a better one. I chose the latter, and learned something very interesting:
Maxim 2. In ML-like languages, it’s very easy to make simple but far reaching changes to a codebase, due to the assistance of the typechecker.
Making the changes to the frontend was very simple; there was nothing deep about the change, and a combination of the typechecker and grep allowed me to pull off the patch with zero debugging. With a few XML tags at a few key spots, I got output reasonable enough to build the rest of the system with.
Aside. Later, I learned that coqide in recent versions of Coq (8.4 and later) has another interchange format. Moving forward, it is probably the correct mechanism to interact with Coq interactively, though this is made somewhat more difficult by the fact that the interchange format is undocumented; however, I've filed a bug. With any luck, it will hopefully do better than my patch. My patch was originally intended to be a partial implementation of PGIP, a generic interchange format for interacting with theorem provers, but the Coq developers and I later discovered that the PGIP project is inactive, and the other user, Isabelle, has discontinued using their PGIP backend. (Sometimes standards don’t help!)
Ur/Web is comparatively less used, and accordingly we ran into a variety of bugs and other infelicities spanning all parts of the system, from the frontend to the compiler. Were they blockers? No!
Maxim 3. A deterministically reproducible bug in some core functionality will get fixed very quickly by an active original author of the code.
This maxim doesn’t apply to fundamental limitations in design (where the fix will take a lot of elbow grease, though the author will usually have a good idea when that’s the case), but other bugs of this type, I found I could get freakishly quick turnaround times for fixes. While I may attribute part of this to the fact that my advisor was the one who wrote the compiler, I don’t think that’s all there is to it. There is a certain pride that comes with an interesting, tricky bit of code you wrote, that makes it an irresistible little puzzle when someone shows you a bug. And we love little puzzles.
There’s also a corollary:
Maxim 4. The less interesting a problem is to the academic, the more likely it is you’ll be able to fix it yourself.
Academics are somewhat allergic to problems that they’re not interested in and which aren’t vital for their research. This means they don’t like working on these bits, but it also means that they’ve probably kept it simple, which means you’re more likely to be able to figure it out. (A good typechecker also really helps! See maxim 2.) There was a simple bug with serving 404s from FastCGI's compiled by Ur/Web, which had a very simple fix; I also made some modifications to Ur/Web made it runnable without having to make install first. Maintainers of active research software tend to be quite receptive to these "engineering" patches, which serve no direct research purpose. I consider these contributes to be a vital component of being a good citizen of the open source community.
OK, Haskell is not really “just” a research language anymore; it is also a very flexible general purpose language which has seen quite a bit of real world use and can be treated as an “ordinary” language in that respect. This made it a good choice for gluing the two other languages together; it can do just about anything, and has very good FFI support for calling into and out of Haskell. This brings us to our next maxim:
Maxim 5. An FFI is a crucial feature for any DSL, and should be a top priority among tasks involved in preparing a language for general usage.
Having Haskell and Ur/Web talk to each other through their FFIs was key for making this all work. Ur/Web is a domain specific language for writing web applications, and among other things it does not include robust systems libraries (e.g. executing external processes and interfacing with them). Most languages will have this problem, since library support takes a bit of work to add, but Ur/Web has a second problem: all side-effectful transactions need to also be able to be rolled back, and this is rather hard to achieve for general input-output. However, with an FFI, we can implement any code which needs this library support in a more suitable language (Haskell), wrap it up in an API which gives the appropriate transactional guarantees, and let Ur/Web use it. Without it, we would not have been able to use Ur/Web: it’s an extremely flexible escape hatch.
Specifying an FFI also is a good way of demonstrating how your language is different from C: it forces you to think about what invariants you expect foreign functions to have (referential transparency? thread-safety?): these invariants are exactly the ones that get automatically fulfilled by code written in your language. That’s pretty cool!
However, because functions which manipulate C pointers are non-transactional, Ur/Web is limited to FFI functions which handle basic C types, e.g. integers and strings. Thus the question of parsing becomes one of utmost importance for Ur/Web, as strings are the preferred interchange format for complex structures. While different languages will have different situations, in general:
Maxim 6. Make sure you know how to do parsing in all of the languages involved.
I’ve presented six maxims of research polyglottism:
If you keep all of these maxims in mind, I believe that the tradeoff between some extra bugfixing and yak shaving for the benefits of the research programming language is a compelling one, and one that should be considered seriously. Yes, you have to be willing to muck around with the innards of all the tools you use, but for any sufficiently important tool, this is inevitably true. And what is a more important tool than your compiler?
by Edward Z. Yang at May 16, 2012 06:54 AM
Startups are hard.
They’re an emotional roller coaster. On any given day, it’s equally likely that you’ll feel like your company is on an inevitable path to success as it is you’ll wonder whether the company will even exist tomorrow. But most of the time you’re too busy focusing on your work to feel much of anything at all.
Figuring out what problem your startup should solve is hard on its own, but every minute of every day you’re faced with a similar decision that you have to make on the fly. Out of the infinite space of possible ways you could be spending your time, which tasks should you pick? When there are multiple known bugs to fix, complicated customer queries to answer, and new feature development that’s been on your todo list for months, you need to efficiently prioritize and decide what needs to be done now and what will have to wait until later.
You have to spend time doing things you don’t like, or complete a job you are completely unqualified to do. Maybe you have to become an expert in some arcane system, and you have to do it on a tight deadline (true story). Or maybe you just want your company to have T-shirts, and you have to figure out how this Photoshop thing works (also a true story).
So why do so many people insist on working at startups? I can’t speak for anyone else, but I can tell you why I do.
I get to spend every day working towards something great. Even when the emotional roller coaster goes through a dip, I know that the goal we’re all working towards is worthwhile. The feedback loop between my work and its impact on the world is incredibly tight.
I get to solve hard problems every day. Technical challenges are easy enough to find at any job, but I can’t imagine another environment that forces you to tackle so many challenges at once. You need to figure out how to be efficient, how to work with others effectively, and how to prioritize and manage your own work. At the end of every day, I’ve learned a ton, whether technically, personally, or professionally.
And finally, I get to work with awesome people. At a startup, you have the opportunity to build a team that conforms to whatever standards you find important. I have the incredible fortune that everyone I work with is an amazing individual who has a ton to teach me and whom I really enjoy being around.
In my book, most of the mythos around startups is exaggerated. You don’t have to work insane hours, and you don’t have to give up your social life. But I can’t imagine doing a company without having a desire to explore new things and possessing a healthy tolerance for pain. A startup is simultaneously the best and worst parts of my life, but there’s no doubt that I am far better off for it.
by gdb at May 14, 2012 04:26 PM
While working on my senior thesis, I had to write a prior work section, which ended up being a minisurvey for the particular subfield my topic was in. In the process, a little bird told me some things...
What are your favorite maxims to keep in mind while you're surveying the literature?
by Edward Z. Yang at May 13, 2012 11:03 PM
I gave a talk on Friday at MongoSF on how to run MongoDB for high availability. The slides are posted on slideshare, but without speaker’s notes they’re probably not all that useful. Sorry about that. But to try to make it up to you, I’m going to go through the talk’s contents in the remainder of this post. As well, the video from the talk is now (highly) available on the 10gen website.
For a company like Stripe, availability is incredibly important. If we are unable to accept an API request, that causes downtime not just for us but also for our customers. Since the limiting factor on availability for many applications is the database (if that’s down, you can’t serve any requests), when choosing a database we sought to find one that allows maximum availability.
In an ideal world, your highly-available database would be a collection of indistinguishable nodes. You could add or remove nodes from the pool without any service interruption, and your nodes can fail or become partitioned from one another without downtime. Unfortunately, there are theoretical results which show that such a database can’t exist. So any real distributed database needs to make tradeoffs.
MongoDB has one of the cleanest availability models of commercially-available databases that I’ve seen. MongoDB nodes are clustered into replica sets. Each node in a replica set contains a full copy of the data. At any given time, there is at most one primary, which is the only node capable of accepting writes. If that node fails or is asked to step down, an election occurs and a new primary is selected. Any node can accept reads, though reads to secondaries are not guaranteed to be consistent. There is no single point of failure or manual administration needed to promote a secondary. Determination of the current cluster configuration is managed by a smart client driver; the application writer needs only to provide the driver with a set of seed nodes to establish an initial connection.
The one case that you as an application writer need to worry about is in the midst of cluster reconfiguration. When the set begins reconfiguring, the servers close all open connections. Any mongo operations will fail until the cluster reconfiguration has completed. Your driver will then bubble this as an error to the application level. (In general, it doesn’t have enough information to know what to do in this case, so it bails. If for example an error is returned to a write, the client can’t tell whether that write was accepted or not.) So to achieve robustness in the case of cluster reconfigurations, you need to write code to handle these exceptions.
At Stripe, we make all of our writes idempotent. We then wrap all Mongo operations in retry logic — if the operation throws an exception, we simply retry it until it succeeds (with an appropriate backoff and timeout). Note that your concurrency model needs to be considered when evaluating whether a write is truly idempotent — make sure a conflicting write can’t sneak in between tries.
With this retry code, our application then can handle arbitrary failures or routine reconfigurations of our Mongo cluster. I come from a MySQL world where I couldn’t even reboot my master server for routine maintenance. Being able to withstand the catastrophic failure of our database primary without any service interruption was basically a dream come true.
That’s really all the code you need to write for high availability with MongoDB. However, you also need to use proper administration and deployment techniques to ensure you don’t take yourself down. You’re most vulnerable during replica set reconfigurations, and so you should minimize the number of times you have to do one. While replica set reconfigurations should be safe, it’s plausible to accidentally be in a world where no node is eligible to become primary. As well, if there’s been lots of flapping, we’ve seen primary election take a minute or more, causing pending requests to timeout. You should always make sure to test your administration procedures in your staging environment to make sure they’re safe. One gotcha we’ve hit is that a recently-booted Mongo node isn’t eligible to become a primary unless all nodes in the cluster are up. In our staging environment, we had been fairly slow while testing a MongoDB version upgrade. In production, we were much faster, and when we tried to fail over to a recently-upgraded node, suddenly we had no eligible primary node.
One operation we’ll commonly perform is a rebuild of all machines in our database cluster. We used to rebuild one node at a time, adding it to the pool, waiting for a full sync, and then removing the corresponding old node. This required a decent number of replica set reconfigurations, and it was pretty miserable to perform. These days, we instead just add all the new nodes to the pool, allow them to sync, fail over the primary, and then remove the old ones.
If you build your new node via an initial sync rather than restoring from a snapshot, you should make sure to force that node to sync from a secondary. Otherwise, you’ll end up with your primary reading through all of its data, evicting hot data from cache. This can cause severe performance issues. In the current version of MongoDB, however, there’s no supported way to force a sync from a particular server (this feature was recently added to the development version). If you read the source, however, you’ll see that the new node selects the node with lowest ping time to sync from. So we just drop in an iptables rule blackholing traffic from the new node to the primary, wait for it to select its sync target, and then remove the iptables rule. This technique is obviously a giant hack, and I’m looking forward to just use replSetSyncFrom instead
.
One nonobvious point when configuring your replica sets is how to address the machines. Mongo can use either IP addresses or hostnames. (Fun fact: hostnames are dynamically resolved and connections are frequently reestablished. So if you munge /etc/hosts, you can easily repoint a particular hostname.) When rebuilding a cluster, we like to reuse hostnames, e.g. creating a new server db1 to replace the old db1. But these nodes have to exist in the replica set at the same time. So we used to add the new node with a temporary hostname, fail over the primary, remove the old nodes, munge /etc/hosts to point to the new nodes, and then reconfigure with the correct hostnames. But as much fun as it is munging /etc/hosts in the middle of database administration, it’s also very easy to make a mistake. These days, we just configure our replica sets with IP addresses. Then the machines can have whatever hostnames they want.
Perhaps the most common reason that many people go down is for routine maintenance. With MongoDB, it’s not too bad to never go down for maintenance. Since Mongo doesn’t enforce any particular schema, a common technique is to tag objects with a schema version number. You can then lazily migrate objects as you read them, or with a batch process that migrates them in the background (you just need to make your app be able to understand all active schema version numbers). One note is that I recommend against multi-updates for large collections (i.e. don’t run a batch db.update(…)).
Why not, you ask? You see, Mongo has a global write lock. Long running operations, such as a multi-update, yield the lock many times a second. However, if you are running at sufficient scale, they still hold the lock for long enough that contention begins to be a problem and queries start stacking up. As well, a multi-update will cause all the migrated records to be read, evicting hot data and further slowing down queries. So instead, at Stripe we perform all migrations at the application level. We iterate over all relevant objects, performing a single-record update for each. This allows us to slow down the pace and ensure that production traffic is not impacted.
Note that Mongo supports background index builds. Use these. You do have to be careful because, as a long-running operation, they can have production impact. However, we’ve found that these tend to be gentler than multi-updates, and we haven’t really seen issues from them.
In general, you should always be very thorough with your MongoDB usage and deployment. It turns out distributed systems are complicated. Don’t assume that your application can actually handle a particular failure case (e.g. what happens if my current primary suddenly becomes wedged but is still accepting TCP connections) — make sure you’ve actually tested it.
As well, you should be cognizant that MongoDB is under rapid development, which is great because it means new features are being added all the time. However, it also means that sometimes API changes happen, or sometimes regressions are introduced. Always wait at least a few weeks before upgrading to the latest version of MongoDB. And before you do, go and read the issues opened on that version, and ideally all of the issues resolved between your version and the newest one. It’s not very much fun, but it’s the best way to know what changes are in store for you.
I’ll close with a final point. Mongo gets a bad rap sometimes from people who use it incorrectly. E.g. people don’t put their driver into safe mode and then are surprised when they start losing writes. Make sure you’ve read the documentation, understand what the w and j parameters are, and that they’re tuned appropriately for your application. As well, pay attention to your system configuration (such as your ulimit on number of file descriptors) to make sure you don’t end up running into surprises down the road.
MongoDB provides a rich set of primitives for high availability. Through a little bit of code and a lot of proper administration technique, you can achieve essentially 100% uptime on your database, even in the presence of catastrophic machine failure. I’d be curious to hear how people have achieved this with other database systems — let me know if you have techniques you’d like to share.
by gdb at May 07, 2012 03:59 PM
Ur is a programming language, which among other things, has a rather interesting record system. Record systems are a topic of rather intense debate in the Haskell community, and I noticed that someone had remarked “[Ur/Web has a http://www.impredicative.com/ur/tutorial/tlc.html very advanced records system]. If someone could look at the UR implementation paper and attempt to distill a records explanation to a Haskell point of view that would be very helpful!” This post attempts to perform that distillation, based off my experiences interacting with the Ur record system and one of its primary reasons for existence: metaprogramming. (Minor nomenclature note: Ur is the base language, while Ur/Web is a specialization of the base language for web programming, that also happens to actually have a compiler. For the sake of technical precision, I will refer to the language as Ur throughout this article.)
In Haskell, if you want to define a record, you have to go and write out a data declaration:
data Foo = Foo { bar :: Int, baz :: Bool }
In Ur, these two concepts are separate: you can define an algebraic data type (the Foo constructor) and you can write types which describe a record (the { foo :: Int, bar :: Bool} bit of the type). To emphasize this point, there are actually a lot of ways I can spell this record in Ur/Web. I can define a type synonym:
type foo = { Bar : int, Baz : bool }
which offers me no protection from mixing it up with a structurally similar but semantically different type qux = { Bar : int, Baz : bool }, or I can define:
datatype foo = Foo of { Bar : int, Baz : bool }
which desugars into:
type foo' = { Bar : int, Baz : bool }
datatype foo = Foo of foo'
that is to say, the datatype has a single constructor, which takes only one argument, which is a record! This definition is closer to the spirit of the original Haskell definition. (ML users might be familiar with this style; Ur definitely comes from that lineage.)
This design of separating algebraic data types from records means we now have obvious facilities for record construction (let val x = { Bar = 2, Baz = true }) and record projection (x.Bar); though if I have a datatype I have to unwrap it before I can project from it. These record types are unique up to permutation (order doesn't matter), which makes them a bit more interesting than HList. They are also nicely parsimonious: unit is just the empty record type {}, and tuples are just records with special field names: 1, 2, etc.
Now, if this was all there was to the Ur record system, it wouldn't be very interesting. But actually, the field #Bar is a first class expression in the language, and the curly brace record type syntax is actually syntax sugar! Unpacking this will require us to define quite a few new kinds, as well as a lot of type level computation.
In vanilla Haskell, we have only one kind: *, which in Ur parlance is a Type. Values inhabit types which inhabit this kind. Ur's record system, however, demands more exotic kinds: one such kind is the Name kind, which represents a record field name (#Foo is one). However, GHC has this already: it is the recently added Symbol kind. What GHC doesn't have, however, is the kind constructor {k}, which is the kind of a “type-level record.” If value-level records are things that contain data, type-level records are the things that describe value-level records. They are not, however, the type of the value-level records (because if they were, their kind would be Type). Let’s look at a concrete example.
When I write:
type foo = { Bar : int, Baz : bool }
What I’m really writing is:
type foo = $[ Bar = int, Baz = bool ]
The $ is a type level operator, being applied to the expression [ Bar = int, Baz = bool ], which is a type level record, specifically of kind {Type} (the “values” of the record are types). The dollar sign takes type level records, and transforms them into Type (so that they can actually be inhabited by values).
This may seem like a meaningless distinction, until you realize that Ur has type level operators which work only on type level records, and not types in general. The two most important primitive type level operations are concatenation and map. They both do what you might expect: concatenation takes two records and puts them together, and map takes a type level function and applies it to every member of the record: so I can easily transform [ Bar = int, Baz = bool ] into [ Bar = list int, Baz = list bool ] by mapping the list type constructor. Extensible records and metaprogramming dispatched in one swoop!
Now, recall that field names all live in a global namespace. So what happens if I attempt to do [ Bar = bool ] ++ [ Bar = int ]? The Ur type checker will reject this statement as ill-typed, because I have not provided the (unsatisfiable) proof obligation that these records are disjoint. In general, if I have two record types t1 and t2 which I would like to concatenate, I need a disjointness proof [t1 ~ t2]. Handling disjointness proofs feels rather unusual to users coming from traditional functional programming languages, but not all that odd for users of dependently typed languages. In fact, the Ur/Web compiler makes handling disjointness obligations very easy, automatically inferring them for you if possible and knowing some basic facts about about concatenate and map.
The Ur record system crucially relies on type level computation for its expressiveness: we can expand, shrink and map over records, and we can also take advantage of “folders”, which are functions which use the type level records as structure to allow generic folding over records. For more information about these, I suggest consulting the type level computation tutorial. But in order to offer these features in a user friendly way, Ur crucially relies on a compiler which has some level of knowledge of how these operators work, in order to avoid making users discharge lots of trivial proof obligations.
Unfortunately, here I must admit ignorance as to how the rest of the Haskell record proposals work, as well as how a record system like this would interact with Haskell (Ur does have typeclasses, so this interaction is at least reasonably well studied.) While this proposal has the benefit of having a well specified system in an existing language, it is complex, and definitely shooting for the moon. But I think it says a bit about what might have to be added, beyond type-level strings, to fulfill Gershom Bazerman's vision here:
It seems to me that there's only one essential missing language feature, which is appropriately-kinded type-level strings (and, ideally, the ability to reflect these strings back down to the value level). Given that, template haskell, and the HList bag of tricks, I'm confident that a fair number of elegant records packages can be crafted. Based on that experience, we can then decide what syntactic sugar would be useful to elide the TH layer altogether.
by Edward Z. Yang at April 20, 2012 05:24 AM

by Edward Z. Yang at April 17, 2012 04:51 PM
This is a very quick and easy fix that has made latency on Ubuntu servers I maintain go from three to four seconds to instantaneous. If you've noticed that you have high latency on ssh or scp (or even other software like remctl), and you have control over your server, try this on the server: aptitude remove libnss-mdns. It turns out that multicast DNS on Ubuntu has a longstanding bug on Ubuntu where they didn't correctly tune the timeouts, which results in extremely bad performance on reverse DNS lookups when an IP has no name.
Removing multicast DNS will break some applications which rely on multicast DNS; however, if you're running Linux you probably won't notice. There are a number of other solutions listed on the bug I linked above which you're also welcome to try.
by Edward Z. Yang at March 24, 2012 07:56 PM
If you haven't noticed, these are coming in the order of the visit days.
Whereas the weather at UPenn was nice and sunny, the NJ Transit dinghy rolled into a very misty Princeton. Fortunately, I had properly registered for this visit day, so the hotel was in order. I was a bit early, so I met up with an old friend who had recently penned this short story and we talked about various bits and bobs ("I hear you're up for a Hugo!") before I meandered over to the computer science building.
The Princeton campus also holds some distinctive memories for me, a patchwork of experiences of my high school self, including a unpleasant month temping for the Princeton Wind Ensemble (I've been since informed that the ensemble is much healthier now), visits to the area for various reasons, a fascinating interview in which I was told that "I should not go to Princeton", and a very pleasant, if slightly out-of-place, campus preview weekend. Of course, it is the case that graduate student life is completely different from undergraduate life, so I was prepared to shelve these experiences for my second visit.
One of the most interesting things I realized about Princeton computer science, during Andrew Appel's speech during the admit's reception dinner, was how many famous computer scientists had graded Princeton with their presence at some point or another; these included Church, Turing, Gödel, Neumann, and others... One of the insights of his talk was this: “maybe your PhD thesis doesn’t need to be your most important work, but maybe you can have a four page aside which sets forth one of the most important techniques in computer science.” (He was referring to Alan Turing’s thesis, in which he introduced the concept of relative computing.)
Some "facts": at Princeton, your advisors work to get you out, so you don't have to worry about dallying too long on the PhD. Being a part-time student is not ok, and Princeton does have qualifying exams, but you'll probably pass. (The overall wisdom that I came away with here is that computer science quals are very different from quals in other PhD disciplines, where the qual is very much a weeder. In CS, they want you to pass.) The generals are in some sense a checkpoint, where you can figure out whether or not you're happy with the research you're doing with your advisor. You need to take six classes, four in distribution, but you can pass out of the final. You spend your second year teaching, both semesters as a "preceptor". You pick your advisor at the end of your first year, and five people serve on your thesis committee (it tends not to be too adversarial). You're relatively shielded from grants.
Andrew Appel is a very smart man. He’s been recently working with my current advisor Adam Chlipala, and their research interests have some overlap. He is working on a verified software toolchain, but he's also happy to tackle smaller problems. One of the small theoretical problems he showed us during our three-on-one meeting was the problem of calculating fixpoints of functions which are not monotonic, but instead oscillate towards some convergence (this occurs when the recursion shows up in the contravariant position; as strange as this may seem, this shows up a bit in object oriented languages!) One of his grad students told me he likes "semantic definitions", and is not afraid to take on large challenges or tackle the gritty details of languages like Cminor, integer alignment, or scaling up theoretical techniques for large programs. And despite being department chair, it's easy to schedule time with him, and he seems to have this uncanny ability to generate grants.
David Walker is very excited about the Frenetic programming language, which is a language for dealing with network programming. I already had a bit of context about this, because Jennifer Rexford had given an invited talk at POPL about this topic. One of the interesting things I noticed when hearing about OpenFlow for the second time was how similar its handling of unknown events was to hardware: if the efficient lookup table doesn't know what to do, you interrupt and go to a slow, general purpose computer, which figures out what to do, installs the rule in the lookup table, and sends the packet on its way. The three of us might have been a technical mood, because we ended up asking him to give us the gory details of per-packet consistency when updating network rules. I hear in another life David Walker also did type systems, and has also done a bit of work in semantics. One contrast a grad student noted between Appel and Walker was that Appel would assume you know what he is talking about (so it's your onus to interrupt him if you don't: a useful skill!), whereas Walker will always go and explain everything and always stop you if he doesn't know what you're talking about. This makes him really good at communicating. (Oh, did I mention Frenetic is written in Haskell?)
Randomly, I discovered one of the current grad students was the creator of smogon.com. He does PL research now: but it's kind of nifty to see how interests change over the years...
by Edward Z. Yang at March 20, 2012 01:00 PM
Note: this is not a discussion of Hilbert's formalism versus Brouwer's intuitionism; I'm using formalism and intuition in a more loose sense, where formalism represents symbols and formal systems which we use to rigorously define arguments (though not logic: a sliding scale of rigor is allowed here), while intuition represents a hand-wavy argument, the mental model mathematicians actually carry in their head.
Formalism and intuition should be taught together.

But as utterly uncontroversial as this statement may be, I think it is worth elaborating why this is the case—the reason we find will say important things about how to approach learning difficult new technical concepts.
Taken in isolation, formalism and intuition are remarkably poor teachers. Have you had the reading a mathematical textbook written in the classic style, all of the definitions crammed up at the beginning of the chapter in what seems like a formidable, almost impenetrable barrier. One thinks, "If only the author had bothered to throw us a few sentences explaining what it is we're actually trying to do here!" But at the same time, I think that we have all also had the experience of humming along to a nice, informal description, when we really haven't learned much at all: when we sit down and try to actually solve some problems, we find these descriptions frustratingly vague. It's easy to conclude that there really is no royal road to intuition, that it must be hard won by climbing the mountains of formalism.
So why is the situation any different when we put them together?
The analogy I like is this: formalism is the object under study, whereas intuition is the results of such a study. Imagine an archaeologist who is attempting to explain to his colleagues a new conclusion he has drawn from a recently excavated artifact. Formalism without intuition is the equivalent of handing over the actual physical artifact without any notes. You can in principle reconstruct the conclusions, but it might take a while. Intuition without formalism, however, is the equivalent of describing your results over the phone. It conveys some information, but not with the resolution or fidelity of actually having the artifact in your hands.
This interpretation explains a bit about how I learn mathematics. For example, most mathematical formalisms aren't artifacts you can "hold in your hand"—at least, not without a little practice. Instead, they are giant mazes, of which you may have visibility of small fragments of at a time. (The parable of the blind men and the elephant comes to mind.) Being able to hold more of the formalism in your head, at once, increases your visibility. Being able to do quick inference increases your visibility. So memorization and rote exercise do have an important part to play in learning mathematics; they increase our facility of handling the bare formalism.
However, when it comes to humans, intuition is our most powerful weapon. With it, we can guess that things are true long before we have any good reason of thinking it is the case. But it is inextricably linked to the formalism which it describes and a working intuition is no where near as easily transferable as a description of a formal system. Intuition is not something you can learn; it is something that assists you as you explore the formalism: "the map of intuition." You still have to explore the formalism, but you shouldn't feel bad about frequently consulting your map.
I've started using this idea for when I take notes in proof-oriented classes, e.g. this set of notes on the probabilistic method (PDF). The text in black is the formal proof; but interspersed throughout there are blue and red comments trying to explicate "what it is we're doing here." For me, these have the largest added value of anything the lecturer gives, since I find these remarks are usually nowhere to be seen in any written notes or even scribe notes. I write them down even if I have no idea what the intuition is supposed to mean (because I'll usually figure it out later.) I've also decided that my ideal note-taking setup is already having all of the formalism written down, so I can pay closer attention to any off-hand remarks the lecturer may be making. (It also means, if I'm bored, I can read ahead in the lecture, rather than disengage.)
We should encourage this style of presentation in static media, and I think a crucial step towards making this possible is easier annotation. A chalkboard is a two-dimensional medium, but when we LaTeX our presentation goes one-dimensional, and we rely on the written word to do most of the heavy lifting. This is fine, but a bit of work, and it means that people don't do it as frequently as we might like. I'm not sure what the best such system would look like (I am personally a fan of hand-written notes, but this is certainly not for everyone!), but I'm optimistic about the annotated textbooks that are coming out from the recent spate of teaching startups.
For further reading, I encourage you to look at William Thurston's essay, On proof and progress in mathematics, which has had a great influence on my thoughts in this matter.
by Edward Z. Yang at March 18, 2012 08:51 PM
I'm hoping that this will be the beginning of a series of posts describing all of the visit days/open houses that I attended over the past month. Most of the information is being sucked out of the notes I took during the visits, so it's very stream of consciousness style. It's kind of personal, and I won't be offended if you decide not to read. You've been warned!
I arrive at the Inn at Penn shortly before midnight, and check in. Well, attempt it; they appear to have no reservation on hand. It appears that I hadn't actually registered for the visit weekend. Oops.
Sam tells me later that first impressions stick, and that her first impression of me was a thoroughly discombobulated PhD admit. Ruffle up my hair! Ask if they're a fellow CS PhD admit, whether or not they have a room, oh, and by the way, what is your name? (I didn't realize that was the real sticking point until she told me about it at the end of the CMU visit.) But Brent is on IRC and I ping him and he knows a guy named Antal who is also visiting UPenn so I give him a call and he lends me his room for a night and things are good. (Mike, the graduate studies coordinator, is kind of enough to fit me into schedules the next day. Thank you Mike!)
I've been to UPenn before. I visited with my Dad; Roy Vagelos was former CEO of Merck and a large influence on UPenn, and at one point I had private aspirations of doing my undergraduate degree in something non-CS before hopping to computer science graduate school. But when the MIT and Princeton admissions came UPenn got soundly bumped off the table, and this experience was wrapped up and stowed away. But I recognized small pieces of the campus from that visit, stitched together with the more recent experience of Hac Phi and POPL. Our tour of Philadelphia lead us to the Bed and Breakfast I stayed at POPL, and I was Surprised™.
Benjamin Pierce is flying off to Sweden for WG 2.8 (a magical place of fairies, skiing and functional programming), so he's only around for the morning presentations, but during breakfast he talks a bit about differential privacy to a few of the candidates, and then heads up the presentations. I hear from his students that he’s gotten quite involved with the CRASH project, and is looking a lot more at the hardware side of things. One of the remarkable things that I heard in the morning talks was how programming languages has sort of leaked into all of the CS research that was presented (or maybe that was just selection bias?) Well, not everyone: the machine learning researchers just “want to do a lot of math” (but maybe we still get a little worried about privacy.)
UPenn is well known for its PL group, where “we write a little bit of Greek every day.” It’s really quite impressive walking into the PL lunch, with a battalion of grad students lined up against the back wall cracking jokes about Coq and Haskell and the presentation is about F*.
Here are some "facts". At UPenn you spend two semesters TAing, there’s an office lottery but people in the same group tend to cluster together, you don’t have to work on grants, the professors are super understanding of graduate student's life situations, the living costs are around $500/mo, and that it is a very laid back place. Your advisor is very much in charge of your degree, except once a year when there's a department wide review to check no one falls through the cracks. The drop-out rate is low. UPenn has a nice gym which is $400/yr. Biking Philadelphia is great, but you shouldn't live in grad housing. Because there is a power trio of Pierce, Zdancewic and Weirich, when it comes time for you to form your thesis committee, the other two profs who aren't your advisor serve on your committee. The PL group is in a slightly unusual situation where there aren't any 2nd or 3rd year students (this was due to a double-sabbatical one year, and bad luck of the draw another.) But there are infinitely many 1st year students. You have to take three laid back exams. Philadelphia is a nice and large city (5th biggest!) UPenn and CMU are perhaps the biggest pure programming languages departments out of the places I visited.
Stephanie Weirich is super-interested in dependently typed languages. She wants to find out how to get functional programmers to use dependent types, and she's attacking it from two sides: you can take a language like Haskell for which we are adding more and more features moving it towards dependent types, and you can build a language from scratch like TRELLYS which attempts to integrate dependent types in a usable way. She’s not shy about giving 1st years really hard projects, but she's happy to do random stuff with students. She reflects on Dimitrios Vytiniotis' trajectory: where he discovered his true calling with polymorphism and Haskell, and is now off designing crazy type systems with Simon Peyton Jones at Microsoft Research. “I can’t make you go crazy about a topic,” (in the good sense) but she’s interested in helping you find out what lights you on fire. I ask her about the technical skills involved in developing metatheory for type systems (after all, it’s very rare for an undergraduate to have any experience in this sort of thing): she tells me it’s all about seeing lots of examples, figuring out what’s going to get you in trouble. (In some sense, it’s not “deep”, although maybe any poor computer scientist who has tried to puzzle out the paper of a type theorist might disagree.)
Steve Zdancewic has a bit more spread of research interests, but one of the exciting things coming out is the fact that they have Coq infrastructure for LLVM that they've been working on over the course of the year, and now it's time to use the infrastructure for cool things, such as do big proofs and figure out what's going on. There have been a lot of cases where doing just this has lead to lots of interesting insights (including realization that many proofs were instances of a general technique): mechanized verification is a good thing. He also has some side projects, including compilers for programming languages that will execute on quantum computers (all the computations have to be reversible! There is in fact a whole conference on this; it turns out you can do things like reduce heat emissions.) There's also a bit of interest in program synthesis. Steve echoes Stephanie when he says, “I am enthusastic about anything that my students are interested in enough to convince me to be enthusiastic about.” They don’t do the chemistry model of getting a PhD, where you “use this pipette a hundred million times.”
In the words of one grad student, these are professors that are “friendly and much smarter than you!” They’re happy to play around and have fun with programming languages concepts (as opposed to the very hard core type theory research happening at CMU; but more on that later.)
We'll close with a picture, seen in the neighborhoods of Philadelphia.

by Edward Z. Yang at March 16, 2012 05:59 AM
For the past N months, it seems like there is no new technology stack that is either hotter or more controversial than node.js. node.js is cancer! node.js cures cancer! node.js is bad ass rock star tech!. I myself have given node.js a lot of shit, often involving the phrase “explicit continuation-passing style.”
Most of the arguments I’ve seen seem to center around whether node.js is “scalable” or high-performance, and the relative merits of single-threaded event loops versus threading for scaling out, or other such noise. Or how to best write a Fibonacci server in node.js (wat?).
I am going to completely ignore all of that (and I think you should, too!), and argue that node.js is in fact on to something really cool, and is worth using and thinking about, but for a reason that has absolutely nothing to do with scalability or performance.
node.js is cool because it solves a problem shared by virtually every mainstream language. That problem is the fact that, as long as “ordinary” blocking code is the default, it is difficult and unnatural to write networked code in a way that it can be combined with other network code, while allowing them to interact.
In most languages/environments — virtually every other language people use today — when you write networked code, you can either make it fully blocking itself, and implement your own main loop — which is almost always easiest — or you can pick your favorite event loop library (you probably have half a dozen choices), and write your code around that. If you do the latter, not only will your code likely be more awkward than if you chose the blocking main loop approach, but your only reward for the effort is that your code is combinable with the small fraction of other code that also chose the same event loop library.
The upshot of this situation is that if you pick a couple of random networked libraries written by different people — let’s say, an HTTP server, a Twitter client, and an IRC client, for example — and want to combine them — maybe you want a Twitter < -> IRC bridge with a web-based admin panel — you will end up at best having to write some awkward glue code, and at worst doing something truly hackish in order to make them communicate at all.
(In case you’re not convinced about the existence or scope of this problem, you can detour ahead to the optional example of how this problem manifests with typical Python libraries)
node.js solves this problem, somewhat paradoxically by reducing the number of options available to developers. By having a built-in event loop, and by making that event loop the default way to accomplish virtually anything at all (e.g. all of the builtin IO functions work asynchronously on the same event loop), node.js provides a very strong pressure to write networked code in a certain way. The upshot of this pressure is that, since essentially every node.js library works this way, you can pick and choose arbitrary node.js libraries and combine them in the same program, without even having to think about the fact that you’re doing so.
node.js is cool, then, not for any performance or scalability reasons, but because it makes composable networked components the default.
An interesting point here is that there is not really any fundamental differences between what you can do in Python and node.js. The Twisted project, for example, is basically an attempt to implement the node.js ideology in Python (yes, I’m being terribly anachronistic describing it that way). Twisted, however, has a fairly steep learning curve, and “feels” unnatural to developers used to writing “normal” Python, and so relatively few libraries get written for the Twisted environment, compared to the rest of the Python ecosystem. Twisted suffers from the fact that the Python language and Python community are not set up to make Twisted the default way to do things.
The key is that node.js makes it, both technically and socially, easier to write code in this composable way than not to do so. The built-in event loop and nonblocking primitives make it technically easy, and the social culture that has grown up around it discourages libraries that don’t work this way, so libraries that attempt to just block for IO are looked down on and are unlikely to thrive and gain adoption and development resources.
I’m also not ignoring the downside of the node.js style — the potentially convoluted callback-based style, the risk of bringing the whole world to a halt with an accidental blocking call, the single-threaded model that makes it hard to effectively exploit multiple cores. node.js definitely makes you do more work than you might otherwise have to, in many circumstances. But the key is, in exchange for this work, you get something really cool — and something much more valuable, in my opinion, than nebulous performance gains.
I also don’t want to claim that node.js is the only system, or the only technical approach, that makes this property possible. But node.js is the most successful such system I know of, and that’s worth at least as much as the technical possibility — what’s the use of being able to combine third-party libraries, if no one has written anything worth using in the first place?
And similarly, while the single-threaded callback model may not be the best possible model, it seems to have hit some sweet spot for finding a sweet spot in terms of what developers are willing to put up with. Certainly, people are writing node.js code like mad — check out the npm registry for a partial list.
So, node.js is not magic, and it definitely doesn’t cure cancer. But there is something here worth looking at. The next time you need to glue some unrelated networked services together, give node.js a shot – I think you’ll like it. And if you’re still not convinced, glue a quick HTTP frontend onto whatever you’ve created. I promise you’ll be shocked by how easy it is.
As an optional addendum, here’s a step-by-step discussion of how just bad the situation is in most other languages.
Let’s imagine we want to write a trivial Jabber -> IRC bridge: A bot that lurks in an IRC channel, and signs on to Jabber, and sends all the messages it receives on Jabber into the IRC channel. This is the kind of simple problem that can be described in one sentence, and sounds like it should take all of 20 lines of code, but actually turns out to be rather a nuisance.
Python has, by this point, a great library ecosystem, so we happily
start googling, and find the plausible looking python-irclib
and xmpppy libraries. Great. So, what does code in each of
those look like? Well, in python-irclib, we construct a subclass of
SingleServerIRCBot, and call the start function, which runs the
IRC main loop:
bot = MyBot(channel, nickname, server, port)
bot.start()
And in xmpppy, we construct an xmpp.Client object, and call
Client.Process in a loop, with a timeout:
conn = xmpp.Client(server)
# connect to the server
while True:
conn.Process(1)
Ok, so, we launch a thread for each one, and a few minutes of fumbling later, we’re connected to both Jabber and IRC. So far, so good. We’re using Python’s threads, which will inevitably bring us a world of pain, but I’ll ignore that for now, since a better threading implementation could fix most of the pain.
But now, what do we do when we receive a Jabber message? We want to
send a message out the python-irclib instance, but how do we do that?
python-irclib isn’t thread safe, so we can’t just call .send() from
the Jabber thread. Ok, so we add in a Queue.Queue, and have the
Jabber thread push messages onto it.
Now we just need to make the IRC thread fetch messages from this
queue. But how do we do that? The IRC thread is blocked somewhere deep
inside python-irclib, waiting for network traffic. How do we wake it
up to read messages from the queue? The easiest way is to switch from
calling start to calling process_once in a loop with a short
timeout.
This will work, and we’ll eventually get something working, but now we’re forced into polling, with all the annoying latency/CPU tradeoffs that entails, and also half of our code so far has just been spent gluing these two libraries together.
In node.js, on the other hand, we’d just instantiate client objects for both protocols, set up some event handlers, and … well, that’s about it. Because everything’s hooked into the same main loop, they’ll Just Work together, and because we’re all running single-threaded, we can mostly just communicate directly between the two libraries without having to think too hard about race conditions or anything.
The point, of course, is not that this is impossible to write this program in Python. It is, and I’ve done it, and it’s not that terrible. But any option you take will involve some annoying tradeoffs, and will involve making lots of irrelevant plumbing decisions about how to make your pieces play well together. And compared to all that, node.js feels like a breeze.
by nelhage at March 12, 2012 03:36 PM
Suppose that you have k sorted arrays, each of size n. You would like to search for single element in each of the k arrays (or its predecessor, if it doesn't exist).

Obviously you can binary search each array individually, resulting in a runtime. But we might think we can do better that: after all, we're doing the same search k times, and maybe we can "reuse" the results of the first search for later searches.
Here's another obvious thing we can do: for every element in the first array, let's give it a pointer to the element with the same value in the second array (or if the value doesn't exist, the predecessor.) Then once we've found the item in the first array, we can just follow these pointers down in order to figure out where the item is in all the other arrays.

But there's a problem: sometimes, these pointers won't help us at all. In particular, if a later lists is completely "in between" two elements of the first list, we have to redo the entire search, since the pointer gave us no information that we didn't already know.

So what do we do? Consider the case where k = 2; everything would be better if only we could guarantee that the first list contained the right elements to give you useful information about the second array. We could just merge the arrays, but if we did this in the general case we'd end up with a totally merged array of size , which is not so good if k is large.
But we don't need all of the elements of the second array; every other item will do!

Let's repeatedly do this. Take the last array, take every other element and merge it into the second to last array. Now, with the new second to last array, do this to the next array. Rinse and repeat. How big does the first array end up being? You can solve the recurrence: , which is the geometric series
. Amazingly, the new first list is only twice as large, which is only one extra step in the binary search!

What we have just implemented is fractional cascading! A fraction of any array cascades up the rest of the arrays.
There is one more detail which has to be attended to. When I follow a pointer down, I might end up on an element which is not actually a member of the current array (it was one that was cascaded up). I need to be able to efficiently find the next element which is a member of the current array (and there might be many cascaded elements jammed between it and the next member element, so doing a left-scan could take a long time); so for every cascaded element I store a pointer to the predecessor member element.

Fractional cascading is a very useful transformation, used in a variety of contexts including layered range trees and 3D orthogonal range searching. In fact, it can be generalized in several ways. The first is that we can cascade some fixed fraction α of elements, rather than the 1/2 we did here. Additionally, we don't have to limit ourselves to cascading up a list of arrays; we can cascade up an arbitrary graph, merging many lists together as long as we pick α to be less than 1/d, where d is the in-degree of the node.

Exercise. Previously, we described range trees. How can fractional cascading be used to reduce the query complexity by a factor of ?
Exercise. There is actually another way we can setup the pointers in a fractionally cascaded data structure. Rather than have downward pointers for every element, we only maintain pointers between elements which are identical (that is to say, they were cascaded up.) This turns out to be more convenient when you are constructing the data structure. However, you now need to maintain another set of pointers. What are they? (Hint: Consider the case where a search lands on a non-cascaded, member element.)
by Edward Z. Yang at March 05, 2012 06:30 AM
Range trees are a data structure which lets you efficiently query a set of points and figure out what points are in some bounding box. They do so by maintaining nested trees: the first level is sorted on the x-coordinate, the second level on the y-coordinate, and so forth. Unfortunately, due to their fractal nature, range trees a bit hard to visualize. (In the higher dimensional case, this is definitely a “Yo dawg, I heard you liked trees, so I put a tree in your tree in your tree in your...”) But we’re going to attempt to visualize them anyway, by taking advantage of the fact that a sorted list is basically the same thing as a balanced binary search tree. (We’ll also limit ourselves to two-dimensional case for sanity’s sake.) I’ll also describe a nice algorithm for building range trees.
Suppose that we have a set of points . How do we build a range tree? The first thing we do is build a balanced binary search tree for the x-coordinate (denoted in blue). There are a number of ways we can do this, including sorting the list with your favorite sorting algorithm and then building the BBST from that; however, we can build the tree directly by using quicksort with median-finding, pictured below left.

Once we’ve sorted on x-coordinate, we now need to re-sort every x-subtree on the y-coordinates (denoted in red), the results of which will be stored in another tree we’ll store inside the x-subtree. Now, we could sort each list from scratch, but since for any node we're computing the y-sorted trees of its children, we can just merge them together ala mergesort, pictured above right. (This is where the -1 in comes from!)
So, when we create a range-tree, we first quicksort on the x-coordinate, and then mergesort on the y-coordinate (saving the intermediate results). This is pictured below:

We can interpret this diagram as a range tree as follows: the top-level tree is the x-coordinate BBST, as when we get the leaves we see that all of the points are sorted by x-coordinate. However, the points that are stored inside the intermediate nodes represent the y-coordinate BBSTs; each list is sorted on the y-coordinate, and implicitly represents another BBST. I’ve also thrown in a rendering of the points being held by this range tree at the bottom.
Let’s use this as our working example. If we want to find points between the x-coordinates 1 and 4 inclusive, we search for the leaf containing 1, the leaf containing 4, and take all of the subtrees between this.

What if we want to find points between the y-coordinates 2 and 4 inclusive, with no filtering on x, we can simply look at the BBST stored in the root node and do the range query.

Things are a little more interesting when we actually want to do a bounding box (e.g. (1,2) x (4,4) inclusive): first, we locate all of the subtrees in the x-BBST; then, we do range queries in each of the y-BBSTs.

Here is another example (4,4) x (7,7) inclusive. We get lucky this time and only need to check one y-BBST, because the X range directly corresponds to one subtree. In general, however, we will only need to check subtrees.

It should be easy to see that query time is (since we may need to perform a 1-D range query on
trees, and each query takes
time). Perhaps less obviously, this scheme only takes up
space. Furthermore, we can actually get the query time down to
, using a trick called fractional cascading. But that’s for another post!
by Edward Z. Yang at February 26, 2012 08:41 AM
The You could have invented... article follows a particular scheme:
Why does framing the problem this way help?
It's very important that the problem is easy to understand, and the process of "working out the details" is simple. Otherwise, the presentation feels contrived. This method is also inappropriate when the audience is in fact smart enough to just look at the end-result and understand on an intuitive level what is going on. Usually, this is because they have already seen the examples. But for the rest of us, it is a remarkably effective method of pedagogy.
I'll take this opportunity to dissect two particular examples. The first is Dan Piponi’s canonical article, You Could Have Invented Monads. Here are the four steps:
The second is article of mine, You could have invented Zippers:
So the next time you try to explain something that seems complicated on the outside, but simple on the inside, give this method a spin! Next time: You could have invented fractional cascading.
Postscript. This is a common way well-written academic papers are structured, though very few of them are titled as such. One noticeable difference, however, is that often the “detail work” is not obvious, or requires some novel technical methods. Sometimes, a researcher comes across a really important technical method, and it diffuses throughout the community, to the point where it is obvious to anyone working in the field. In some respects, this is one thing that characterizes a truly successful paper.
by Edward Z. Yang at February 23, 2012 07:42 PM
Here is a full transcript to Github of Bret Victor's "Inventing on Principle". It was transcribed by me, An Yu and Tal Benisty.
Below is a copy of the transcript which I will endeavor to keep up to date with the Github copy. The original content was licensed under CC-BY.
[[0:07]] So, unlike the previous session, I don't have any prizes to give up. I'm just going to tell you how to live your life.
[[0:14]] This talk is actually about a way of living your life that most people don't talk about. As you approach your career, you'll hear a lot about following your passion, or doing something you love. I'm going to talk about something kind of different. I'm going to talk about following a principle—finding a guiding principle for your work, something that you believe is important, necessary, and right, and using it to guide what you do.
[[0:46]] There are three parts to this talk. I'm first going to talk about a principle that guides a lot of my work, and try to give you a taste of what comes out of that. And, I'm going to talk about some other people who have lived that way; what their principles are, what they believe in. But these are just examples, to help you think about what you believe in, and how you want to live your life.
[[1:10]] So to begin with me: Ideas are very important. I think that bringing ideas into the world are one of the most important things people can do. And I think that great ideas, in the form of great art, stories, inventions, scientific theories, these things take on lives of their own, which give meaning to lives as people. So, I think a lot about how people create ideas and how ideas grow. And in particular, what sorts of tools create an environment where ideas grow. Now, I've spent a lot of time over the years making creative tools, using creative tools, thinking about them a lot, and here's something I've come upon: Creators need an immediate connection with what they're creating. So that's my principle. ___ What I mean by that is when you're making something, when you make a change, or make a decision, you need to see the effect of that immediately. There can't be a delay, there can't be anything hidden. Readers have to ... do it. Now I'm going to show a series of cases where I noticed that that principle was violated and I'll show you what I did about that, and then I'm going to talk about the larger context in which this work lives.
[[2:32]] So, to begin with, let's think about coding. Here's how coding works: you type a bunch of code into a text editor, kind of imagining in your head what you find your code is going to do. And then you compile and run, and something comes out. In this case, this is just JavaScript, Canvas, and it draws this single little tree. But, if there's anything wrong with the scene, or if I go and make changes, if I have further ideas, if I go back to the code, I go edit the code, compile and run, see what it looks like. Anything wrong, go back to the code. Most of my time is spent working in the code, working in a text editor blindly, without an immediate connection to this thing, which is what I'm actually trying to make.
[[3:20]] So I feel that this goes against the principle that I have, that creators need immediate connection to what they're making, so I have tried a coding environment that would be more in line with this principle. So what I have here is a picture on the side, and the code on the side, and this part draws the sky and this part draws the mountain and this part draws the tree, and when I make any change to the code the picture changes immediately. So the code and the picture are always the same; there is no compile and run. I just change things in the code and I can see the change in the picture. And now that we have this immediate connection between the code and the picture, we can think of ways of changing the code other than typing. ... So if I want to control that number, I just point my mouse to it, hold down my control key, and I can dial it up and down. So I can see what it looks like for big branches and small branches, and I can converge on what feels right to me artistically. And this works great on any part of the code, I just point to it, and dial it up and down. Some of these numbers, I know what they do but it's kind of surprising to see them do it. [chuckles] And other ones are just completely surprising. [more chuckles]
[[4:48]] So down here, I've got this for loop where I'm counting to sixteen, sixteen little pink blossoms on each branch. And I can turn it down for less blossoms or turn it up for more. But, look at what I'm doing here: I'm just moving the number up and down around twenty or so: and it has this really interesting shimmering effect; it kind of looks as if the wind was blowing through the tree. And the first time I saw this I immediately started thinking about how I could use this effect for an animation. How would I have ever have discovered that if I had to compile and run between every change? So much of art, so much of anything is discovery, and you can't discover anything unless you can see what you're doing.
[[5:33]] So I've shown you adjusting the code, now let's add to the code. I'd like to put a sun up here in the sky, so I'll go to the draw sky section, and I'll want to fill in a circle, so I start typing context.fillCircle, and as soon as I start typing I get this autocomplete list of the different fill methods. So these are the things I can type there: fillCircle, fillRect, fillText. And as I move up and down this autocomplete list, I immediately see what each of them is doing. So, I don't have to imagine what it would do from the method name. I don't have to look at the documentation, I just see it, immediately.
[[6:12]] So I want a circle, and I adjust the x coordinate and the y coordinate, play with it a bit. That looks about right. Probably should be yellow, so I set the fill context, fill style, same autocomplete as before, gives me white by default, and I can change that color code the same way I change any number, I hit the control key, and I get a color palette. I choose a nice yellow. ...Although, the white was kind of interesting, I thought. I kind of didn't expect that. But, with white, it now looks like the moon instead. So just like that. [chuckles] So having this immediate connection allows ideas to surface that would be impossible before.
[[7:17]] But there's still a problem, I think. I've got a picture over, and this code over there and I have to maintain a mapping between the two with my head. And I've got all these lines of code, and just looking over here I don't immediately know what it does. So here's what I can do. I can hold down the option key, and the cursor changes to a magnifying glass, and now as I roll over each line of code, it's highlighting in the picture what's being drawn in that line. So, if I want to know what's going on in a function, I just roll down the function and see what highlights. So here I've got two calls to drawMountain; I don't know which is which; well, here's that mountain, and here's that mountain. And this has to work the other way too; if I see part of the picture, I have to know what code was responsible for drawing it. So I do the same thing; I hold down the option key, and now as I move down each pixel in the picture, it's jumping to the line of code that drew that pixel. So that drew the guy, and that drew the tree, and that drew the blossom. So this is really important for making that mapping, and also useful for moving around. I want to make the sun a little bit bigger; I jump there, and make it a little bigger. I want to make this tree a little bigger; I jump there, and bring up the tree a little bit; I want to bring up the mountains a little bit, so I jump there, bring up the mountains a little bit; and I can make these changes as quickly as I think about them, and that is so important to the creative process. To be able to try ideas as you think of them. If there is any delay in that feedback loop, of thinking of something and seeing it, and building on it, then there is this whole world of ideas which will just never be. These are thoughts that we can't think.
[[9:36]] Ideas are very important. The thing about ideas, is that ideas start small. Ideas start out, tiny, weak and fragile. In order to develop and mature. In order for ideas to develop and mature, ideas need an environment where the creator can nurture them. Kind of take care of them, feed them, and shape their growth. And to me, that's what the principle of immediate connection is all about. And because ideas are so precious to me, when I see this principle violated, when I see ideas stillborn or stunted because their creator couldn't see what they were doing, I feel that's wrong. And not wrong in the sense of violating some UI guideline or going against a best practice, but wrong in a deeper sense then that. And I'll come back to this, but I want to show you another example of this principle.
[[10:26]] Here, there is no state. There is no time, there is no interactivity. And I was thinking about how we would handle those aspects of coding that was inline with those principles I had, that was inline with this immediate connection. So, what I have here, is a little platform game. Here is my little guy, he can run around, he can jump, he can die [chuckles]. And the code for him is over here. So this code makes him run around, this makes him jump, this makes him collide with things... and down here, I've got some code for this little turtle. And this turtle is not doing much right now because I haven't finished writing this code, so, I'm just going to do that right now. So on each tick x position plus equals direction times time interval one sixtieth of a second times the speed, which, um, I dunno? Could be fast, could be slow, could be negative; he walks backwards. [chuckles] And these are all ideas that I can use for other enemies but I think turtles are supposed to be slow, so let's set that speed for our turtle. Now, up here, I've got some code, when my guy jumps on the turtle, he gets some Y velocity, so he bounces into the air, and the turtle gets stomped. So that looks like that. And the turtle gets down for a bit.
[[12:01]] The problem is, I don't want the player to be able to get up there yet. I want the player to bounce up the turtle, and go through this little passageway down there. It'll have to go around and solve puzzles and whatnot, and then come back and get the star. So, the turtle is too bouncy right now. So I could just turn that down on the code, and now I can try it but, but now it's not bouncy enough. So while it's nice is I can adjust the code while it's running, without having to stop and recompile and find my place again, I can't see immediately what it is I want to see, which is whether or not I make that jump.
[[12:43]] So here's what I'm going to do. I'm going to bounce off the turtle, and pause the game. So I pause the game, and now there's a slider up here, which lets me rewind through time. And now, I could rewind to back before I made the jump, change the code, and now when I rewind it forward, it's going to simulate it, using the same inputs as before, but with different code. [applause]
[[13:20]] This is not good enough. [laughter] I need to be able to see the changes immediately. I need to be able to see immediately whether or not my bounciness is correct. None of this stuff. And so if you have a process in time, and you want to see changes immediately, you have to map time to space. So here's what I'm going to do. I'm going to bounce off my turtle, pause my game, and now hit this button here, which shows my guys trail. So now I can see where he's been. And now when I rewind, this trail in front of him, is where he is going to be. This is his future. And when I change the code, I change his future. [gasps] So I can find exactly the value I need, so when I find the value I need, he slips right in there. [applause]
[[14:26]] So, creators need to be able to see what they're doing. If you're designing something embedded in time you need to be able to control time. You need to be able to see across time, otherwise you're flying blind.
[[14:40]] As I was playing with this, I noticed it's fun to play with gravity. So I can make gravity a little negative and he starts to float up in the air. [laughter] And I can kind of play with that and try to get him to stay there. And you could probably make an entire game around just the mechanic here - gravity manipulation. In fact, I bet I could fiddle with any part of this code and come up with an idea for a game. Even if I just comment out the first statement in the code, now my guy can't move left - he can only move right. Which sounds kinda silly, but Terry Cavanagh actually made a beautiful game around that concept called "Don't Look Back". Terry Cavanagh, he made another really wonderful game which you might have seen, called "V", spelled as the letter v six times. And, the way that game works, is that you can't jump. Instead, you can only flip upside down, and you fall up instead of falling down. So it kinda works like this. You can walk on the ceiling or you can walk around on the ground. And so you have these levels which kinda look like this, and you kinda walk around...You have to learn how to navigate a terrain like this. And so if you had like a, something like that, you wouldn't be able to jump over it. You'd have to flip over and flip over; he got an amazing amount of game play out of this concept.
[[16:07]] So again, being able to try ideas as you think of them. [pause] This example, and the last one with the tree, these are both very visual programs; they're able to see our changes just by seeing how the picture changes. So I was thinking about, how we could do more abstract coding that's more in line with this principle. How can we write a generic algorithm in such a way that we can see what we're doing. So as an example, let's take a look at binary search. Super quick refresher on how binary search works: you have an array of values that are in order, and you have a key, which is the value you're trying to locate within the array. And you keep track of two variables, which are the lower and upper bounds of where you think that value could possibly be; right now, it could be anywhere. And you look right in the middle of that range - if what you find is too small, then the key has to be after that. Look in the middle of the range, if what you find is too big, the key has to be before that. And you kinda keep subdividing your range until you narrow in on the value you're looking for. And in code, binary search looks like this. And from my perspective, you can't see anything here. You can't see anything. I see the word 'array', but I don't actually see an array. And so in order to write code like this, you have to imagine an array in your head, and you essentially have to play computer. You have to simulate in your head what each line of code would do on a computer. And to a large extent, the people that we consider to be skilled software engineers are just those people that are really good at playing computer. But if we're writing our code on a computer...why are we simulating what a computer would do in our head? Why doesn't the computer just do it...and show us?
[[18:06]] So. Let's write binary search. Function "binary search" takes a key and an array. And then over here on this side, it's saying "Ok, it takes a key and an array, such as what? Give me an example; I need something to work with here." So, for instance, my array might be 'a', 'b', 'c', 'd', 'e', 'f'. And let's say for instance we're looking for the 'd'. So now let's start coding. The lower bound starts out as zero. Over here it says 'low equals zero', nothing amazing there. Upper bound starts out at the end of the array, so high equals array length minus one. And over here, it says 'high equals five'. So I have my abstract formula in the code. Over here, it's giving me the concrete value corresponding to these example arguments. So I don't have to maintain this picture in my head; it's just showing it to me.
[[19:09]] So now I need the index in the middle of the array, so I'm going to take the average of those two. Mid equals low plus high over two, and...well that's obviously not right. Two point five is not a valid array index. So I guess I need to round this off. So I'm going to add the floor function and it rounded it down to two. And I caught that bug literally the second I typed it, instead of writing the function in twenty unit tests. So now I get the value out of the array...and then I need to subdivide my range, which, so this if statement which I'll just paste in here. So in this case, the value I found is less than the key, so it's taking this first branch of the if statement. This is adjusting the lower bound. Of course if the key was smaller, then it would take this branch of the if statement and adjust the upper bound. Or, if the key was 'c', then we would've just happened to find it on the first shot, and we'd return the index.
[[20:14]] So this is the first iteration of this algorithm. And now what we need to do, is loop. We've subdivided the array, we need to keep subdividing until we narrow in on what we're looking for. So, we need to loop; I will just loop. While 1, do all this. And now what we have are three columns corresponding to three iterations of this loop. So this first column here is exactly what you saw before. Low and high span the entire array, we found a 'c', it was too low, so we adjusted our lower bound, ad loop up to here. Second iteration, bounds are tighter; we found an 'e'. Adjust the upper bound. Third iteration, loop up to here; low and high are the same. We've narrowed it down to a single candidate - it's indeed the key we're looking for, and we returned this index. So there's nothing hidden here; you see exactly what the algorithm is doing at every point. And I can go up to here and try different keys, so I can see how the algorithm behaves for these different input arguments.
[[21:20]] And by looking across this data, I can develop an intuition for how this algorithm works. So I'm trying different keys here, and say I try looking for a 'g'. And this looks a little different. It's not actually returning. And the reason for this is, I'm looking for a key which is not actually in the array. And the only way of breaking out of this loop, is by finding the key. So it's kinda stuck here looping forever. So we can take a look at this and see what went wrong, where's the algorithm going off the rails. These first few iterations look fine, but this iteration looks weird, because low is greater than high. Our range is completely collapsed. So if we get to this point, then we know the key can't be found. So I see this faulty condition, and I say, "Oh, that's not right; low has to be less than or equal to high." Okay, well, I'll just put that over as the condition for my while statement. Low, less than equal to high, and then that would break out of the loop, and I would return some signal to say that it couldn't be found. So here we have three iterations of the loop, couldn't be found, we return a not found value.
[[22:33]] So that's what it might be like to write an algorithm without a blindfold on. [applause]
[[22:45]] So I've got this principle, again, that creators need to be able to see what they doing. They need this immediate connection with that they're making. And I've tried to show this principle through three coding examples, but that's just because this is a software engineering conference, I thought I was supposed to talk about programming. But to me, this topic has nothing to do with programming in particular. It has to do with any type of creation. So I'd like to show you a couple more demos, just to show you the breadth of what I have in mind here.
[[23:17]] So, to begin with, let's look at a different branch of engineering. So here I have an electronic circuit that I drew. I'm not quite done drawing it, so let me finish it up there. And we'll put 2. And now we have a working circuit. I mean I assume it's a working circuit. I don't actually see anything working here. So this is exactly the same as writing code, that we work in a static representation. But what we actually care about is the data. The values of the variables, so we can't see that here. Now in a circuit, the variables are the voltages on these different wires. So each of these wires has a voltage that's changing over time, and we have to be able to see that. Now, if I was building this circuit on a lab bench, building it physically, I could at least take an oscilloscope and kinda poke around and see what's going on in the different wires, what's going on here, or here. So at the very least, I should be able to do that. So what I have here, is a plot of the voltage of this wire over time. You can see it's high and low, high and low, so this is clearly oscillating. If I built this physically, also I would be able to see the circuit doing something. In this case I have these two LED's up here. These are LED's, little lights, presumably they're there for a reason. I can hit Play, and watch it simulate out in real time. So now you can see what the circuit is doing.
[[24:50]] In order to design a circuit like this, you had to understand the voltage on every wire. You have to understand how all the voltages are changing throughout the entire circuit. And just like coding, either the environment shows that to you, or you simulate it in your head. And I have better things to do with my head, than simulating what electrons are doing. So what I'm gonna do, I'm gonna spread it out a bit. So same circuit, spread out a little bit, and I'm going to add the voltage at every node. So now you can see every voltage throughout the circuit. And I can even hit Play and watch it all kind of simulate out in real time.
[[25:30]] Although, what I prefer to do, is just move my mouse over it, and I can kind of look in areas that are interesting to me and see what the values are. I can compare any two nodes. So if you look at say the node over here, while I mouse over this one, you see the shadow of the one I'm mousing over is overlaid on that. The shadow of the one I'm mousing over is actually overlaid on all of them. And so I can compare any two nodes just by mousing over one of them and looking at the other one.
[[26:00]] And again, I can immediately see results of my changes. So I've got this 70k resistor here. I want to change its value, I just click and drag it, and now I see the waveforms change immediately. And you'll notice that when I click and drag, it leaves behind the shadow of the waveform before I started dragging, so I can compare. I can immediately see the results of my changes.
[[26:26]] Two golden rules of information design. Show the data, show comparisons. That's all I'm doing here. But even this isn't quite good enough. What we're seeing here are the voltages, but in electronics there are actually two data types. There is voltage and there is current. And what we're not seeing is the current, flowing through each of these components. And in order to design a circuit, you need to understand both the voltage and the current. You need to understand the interplay between the two. That's what analog design is.
[[26:51]] So what I'm gonna do is spread these out a little bit more. And now I'm gonna replace each of these components with a plot of the current going throw it over time. So each of these blue boxes represents a component. And you can see which component it is, because it has a little badge in the corner, a little icon, but now you can see everything that's going on in the circuit. You can see how the current changes, you can see how the current and the voltage changes. There's nothing hidden, there's nothing to simulate in your head.
[[27:22]] So what we have here is a different way of representing the circuit. Just in general, you could draw any circuit with these blocks and instead of being made out of little squiggly symbols, it's made out of data. And I think it's important to ask: Why do we have these squiggly symbols in the first place? Why do they exist? They exist because they're easy to draw with pencil on paper. This is not paper. So when you have a new medium, you have to rethink these things. You have to think how can this new medium allow us to have a more immediate connection to what we're making. How can this new medium allow us to work in such a way that we can see what we're doing.
[[28:00]] And it's really the same situation with programming. Our current conception of what a computer program is —a list of textual definitions that you hand to a compiler. That's derived straight from Fortran and ALGOL in the late '50's. Those languages were designed for punchcards. See you'd type your program on a stack of cards, and hand them to the computer operator, (it's the guy in the bottom picture), and you would come back later. So there was no such thing as interactivity back then. And that assumption is baked into our notions of what programming is.
[[28:34]] C was designed for telepipes. That's Ken Tompson and Dennis Richie up there. Richie made C. And there are no video displays in this picture. Ritchie is basically typing on a fancy typewriter that types back to him. Any time you use a console or a terminal window, you're emulating a teletype. And even today, people still think of a rappel(???) or an interactive top-level as being interactive programming. Because that's the best thing you could do on a teletype.
[[29:06]] So I have one more demo I want to show because I want to emphasize that this principle, media connection, is not even about engineering, it's about any type of creation. So I want to move to a different field entirely, so let's think about animation.
[[29:22]] So I've got this painting here, of a tree and leaf on it and I want to make a little video with the leaf kinda drifting down the tree. And the normal way of doing this, in a conventional animation package like Flash, is through keyframes. So you basically say where you want the leaf to be at different points in time, and you hit Play and see what it looks like. So, ok I'm gonna say at frame 20, I'm gonna create a keyframe and the leaf should be there. And at frame 40, create a keyframe and the leaf should be there, and I'm totally guessing here. I can not see the motion. I can not feel the timing, I'm just throwing things in time and space.
[[30:12]] So I got this leaf at different points in time, and add a tween, which tells Flash to connect the dots. And then I'm gonna hit Play and see what it looks like. And it looks ridiculous, it looks like billiard goals bouncing back and forth.
[[30:32]] And the thing is I kind of know what I want, right. It's a leaf. I want leaf drifting down from a tree. And I can even perform that with my hand: leaf drifting down from a tree. But Flash doesn't know how to listen to my hand. But maybe there's a new medium that does know something about listening to my hand.
[[30:57]] So what I'm gonna show you here is a little app I made for performing animation. And we're not really set up for doing a live demo off the iPad so I'm just gonna show you a little video of me making a video. The way this scene is going to play out is the leaf is gonna kind of drift down from the tree, and the scene is gonna pan over and the rabbit is gonna do something. And two things: one, this is going to move pretty quickly, and second, I'll be using both hands at almost all times. So I've got these different layers, the background, the mid-ground and the foreground. I'm choosing which layer to move using my left thumb. I'm gonna move my leaf to its position. I'm gonna move my bunny off stage and start time rolling. Now I'm gonna perform the leaf drifting down from the tree. Run it back, check out how that looked. The motion looks pretty good but the leaf kinda needs to rock back and forth. So i'm gonna pull out a rotation controller, run it back, find where the leaf is about to break off, and record the rotation. And I added a little flip there just because it felt right at the moment. It wasn't even planned. Stop, because I want to pan over. So I'm gonna drag a whole bunch of layers at once, I grab all the layers into a list, I turn down the sensitivity of the background layers so they'll move slower for a kind of parallax effect. I only want to move horizontally so I pull out a horizontal dragger and check out how it looks. I don't quite like the parallax so I adjust the sensitivities just a little bit, try it out again, I like that better, so I get ready to go, I run it back to the beginning so I can get back into the rhythm of the piece. The leaf hits, I wait a beat, and I start panning. And I don't know how many frames I waited, I don't know how long it was, I went when it felt right.
[[32:50]] So I panned over this winter scene and kind of slowed down to a stop. And then I run it back, because I want to do something with my bunny. Throw away these tools because I'm done with them. And wait until I think my bunny should move and he hops away. And I have got a few different poses for my bunny. So I pull those out. And then I find the point when the bunny is about to take off the ground. Which is right there. I switch his pose and I kind of toggle between the poses as he hops away. And then I run it back because I wanna check out how it looked and I'm just gonna bring that up full screen for you. This is the piece.
[[33:50]] So I made that in 2 minutes, performing with my hands like a musical instrument. Very immediate connection between me and what I was trying to make.
[[34:08]] One of the inspirations for this tool was an animation I tried to make several years ago. Not that one but it also began with a leaf drifting down from a tree. And I spent all day in Flash trying to keyframe that leaf. Couldn't do it. And so that was the end of that. I still have my storyboards. Sometimes I play the music I wrote for the piece. But the piece itself is locked in my head. And so I always think about the millions of pieces that are locked in millions of heads. And not just animation, and not just art, but all kinds of ideas. All kinds of ideas including critically important ideas, world changing inventions, life saving scientific discoveries. These are all ideas that must be grown. And without an environment in which they can grow, or their creator can nurture them with this immediate connection, many of these ideas will not emerge. Or they'll emerge stunted.
[[35:14]] So I have this principle that creators need an immediate connection and all of these demos that I just showed you simply came from me looking around, noticing places where this principle was violated, and trying to fix that. It's really all I did. I just followed this guiding principle and it guided me to what I had to do.
[[35:40]] But I haven't said much about the most important part of the story, which is why. Why I have this principle. Why I do this.
[[35:51]] When I see a violation of this principle, I don't think of that as an opportunity. When I see creators constrained by their tools, their ideas compromised, I don't say: Oh good, an opportunity to make a product. An opportunity to start a business. Or an opportunity to do research or contribute to a field. I'm not excite by finding a problem to solve. I'm not in this for the joy of making things. Ideas are very precious to me. And when I see ideas dying, it hurts. I see a tragedy. To me it feels like a moral wrong, it feels like an injustice. And if I think there's anything I can do about it, I feel it's my responsibility to do so. Not opportunity, but responsibility.
[[36:44]] Now this is just my thing. I'm not asking you to believe in this the way I believe I do. My point here is that these words that I'm using: Injustice, Responsibility, Moral wrong, there aren't the words we normally hear in a technical field. We do hear these words associated with social causes. So thinks like censorship, gender discrimination, environmental destruction. We all recognize these things as moral wrongs. Most of us wouldn't see a civil rights violation and think "Oh good, an opportunity." I hope not.
[[37:23]] Instead, we've been very fortunate to have people throughout history who recognized these moral wrongs and saw it as their responsibility to address them. And so there's this activist lifestyle where these persons dedicate themselves to fighting for a cause that they believe in. And the purpose of this talk is to tell you that this activist lifestyle is not just for social activism. As a technologist, you can recognize a wrong in this world. You can have a vision of what a better world could be. And you can dedicate yourself to fighting for a principle. Socially activists typically fight by organizing but you can fight by inventing.
[[38:07]] So now I'd like to tell you about a few other people who have lived this way, starting with Larry Tesler. Larry has done a lot of wonderful things in his life, but the work I'm gonna tell you about he did in the mid '70s at Xerox Park. And at the time, there really wasn't such a thing as personal computers. The notion of personal computing was very young and Larry and colleagues at Park felt that hey had transformative potential, that personal computing could change how people thought and lived. And I think all of us in this room would agree that they turned out to be right about that.
[[38:43]] But at the time, software interfaces were designed around modes. So, in a text editor for instance, you couldn't just type and have words appear on the screen like on a typewriter. You would be in command mode and if you wanted to insert text you'd have to press I to go into insert mode then Escape back out to command mode or maybe you'd hit A to go into append mode. Or if you wanted to move text around you'd hit M go to the Move mode and then you'd have to select and you'd be in the mode to select and move things around. And Larry would watch people using the computer—they actually pioneered the concept of software user studies, another thing that he did—but he would watch people using the software and he found that many people even after training and weeks of use, many people would not become comfortable with the computer.
[[39:30]] And he believed that it was these modes that were to blame. That the complexity of modes was a kind of barrier that many people couldn't cross. And so this kind of represented a threat to this dream of what personal computing could be. So Larry made it his personal mission to eliminate modes from software. And he formed a principle. No person should be trapped in a mode. His slogan that he would go around saying was 'Don't mode me in' and he had it printed on his t-shirt. And this principle informed everything that he did. He thought about it with all the work that he did. And eventually he came up with a text editor called Gypsy, which essentially introduced text editing as we know today. There's an insertion point. And when you typed, words appeared on the screen. To select text, he invented modeless selection with click and drag. So you just click and drag over the text you want to select like using a highlighter —one of the first uses of drag. To move text around, he invented these commands that he called Cut Copy Paste. You select and cut. Later on you paste in whenever you're ready. You're never trapped in a mode, you never have to switch between modes. When you hit W on the keyboard you get W on the screen. Always.
[[40:48]] And he would watch people using his software and he found that someone who had never seen a computer before—which was most people back then—could be up and running in like half an hour. So this was clearly a transformative change in enabling people to connect with computers. And his ideas about modelessness spread to the rest of the desktop interface which was then being invented at Xeroc Park at the same time. And today they're so ingrained in the computing experience that today we kind of take them for granted.
[[41:20]] Now I said that Larry made the elimination of modes his personal mission. That's actually his words, and if you think he's exaggerating, he's Larry's license plate for the last 30 years. Nowadays off course Larry has a website, at nomodes.com and he's on twitter: @nomodes. And so like I said, Larry has made a lot of amazing work in his career but his self identity is clearly associated with this cause.
[[41:46]] And so I'd like to ask What exactly did Larry do? Like how could we best describe what Larry did? A typical biography might say Larry Tesler invented Cut Copy Paste. Which is true, but I think it's really misleading, because this invention was very different than say, Thomas Edison inventing the phonograph. Edison basically just stumbled over the technology for audio recording and he built it out as a novelly. And he built this list of possible applications for his technology but he didn't have any cultural intent. Whereas what Larry did was entirely a reaction to a particular cultural context.
[[42:41]] So another thing that you might hear is that Larry Tesler solved the problem of modeless text manipulation. Solved the problem. And obviously that's true, he worked on the problem for a long time, eventually solved it. But I think that's misleading too, because this problem that he solved only existed in his own head. Nobody else saw this as a problem. For everybody else modes were just how computers worked. There wasn't anything wrong with them any more than nothing wrong than we think there's something wrong with having two arms. It just kind of was a fact of life.
[[43:18]] So the first thing that Larry did was that he recognized a wrong that had been unacknowledged in the culture. And the thing is, that's how many great social changes began as well. So a 150 years ago, Elizabeth Cady Stanton had to stand up and say: women should vote. And everybody else said 'That's crazy, what are you talking about'. Today, we recognize gender discrimination as a wrong. Back then, it was part of society, it was invisible. She had to recognize it, and she had to fight it. And to me, that's a much closer model to what Larry did than the Thomas Edison model of inventing a bunch of random technology so he could patent it.
[[44:01]] Now to be clear I'm not making any judgement about the relative importance or the impact of these two people, I'm just talking about their motivations and their approach. Both of them recognized a cultural wrong, they envisioned a world without that wrong and they dedicated themselves to fighting for a principle. She fought by organizing, he fought by inventing.
[[44:23]] And many other similar figures in computing had similar motivations. So certainly Doug Engelbart. Doug Engelbart basically invented interactive computing. The concept of putting information on a screen. Navigating through it. Looking at information in different ways. Pointing at things and manipulating them. He came up with all this at a time when real-time interaction with a computer was just almost unheard of. Today he is best known as the inventor of the mouse, but what he really came up with is this entirely new way of working with knowledge. His explicit goal from the beginning was to enable mankind to solve the world's urgent problems. And his vision, he had this this vision of what he called knowledge workers using complex powerful information tools to harness the collective intelligence. And he only got into computers because he had a hunch that these new things called computer things could help him realize that vision. Everything that he did was almost single-mindedly driven by pursuing this vision.
[[45:26]] Here's Allen Kay. Allen Kay ran the lab at Xerox Park where we got the desktop interface, so things like windows and icons, command menus. He also invented object-oriented programming and lots of other things. His goal, and I quote, was to 'amplify human reach, and bring new ways of thinking to a faltering civilization that desperately needed it.' Isn't that great? His approach was through children. He believed that if children became fluent in thinking in the medium of the computer, meaning if programming was a form of basic literacy like reading and writing, then they'd become adults with new forms of critical thought, and new ways of understanding the world. And we'd have this enlightened society, similar to the difference that literacy brought to society. And everything that he did, everything he invented, came out of pursuing this vision, this goal with children. And following principles that he adopted from Piaget and Montessori, Jerome Bruner, these people who would study how children think.
[[46:37]] And the figure probably most widely associated with software activism is Richard Stallman. Stallman started a new project which today makes up a big chunk of any Linux system. He also started the Free Software Foundation, wrote GCC, GPL, many many other things. His principle is that software must be free, as in freedom, and he has very precise meaning associated with that statement. He's always been very clear that software freedom is a matter of moral right and wrong. And he has taken a particularly uncompromising approach in his own life to that.
[[47:10]] All of these tremendously influential people dedicated their lives to fighting for a particular ideal with a very clear sense of right and wrong. Often really fighting against an authority or mainstream that did not recognize their wrong as being wrong. And today, the world is still very far from any of their ideals and so they still see a world in crisis and they keep fighting. They're always fighting.
[[47:41]] Now I'm not saying you have to live this way. I'm not saying that you should live this way. What I'm saying is that you can. That his lifestyle is an option that is available to you. And it's not one you're gonna hear about much. Your career councilor is not going to come back to you and say you should start a personal crusade. In a social field, they might, but not in technology. Instead the world will try to make you define yourself by a skill.
[[48:08]] That's why you have a major in college. That's why you have a job title. You are a software engineer. And you'll probably specialize to be a database engineer or a front-end engineer, and you'll be given front-ends and asked to engineer them. And that could be worthwhile and valuable, and if you want to spend your life pursuing excellence and practicing a skill, you can do that. That is the path of a craftsman. That is the most common path. The only other path you really hear about much is the path of the problem solver. So I see entrepreneurship and academic research as kind of two sides of that coin. There is the field. There's the set of problems in that field, or needs in the market. You go in, you choose one, you work it, you make your contribution there. Maybe later on, you choose another problem, you work it, you make your contribution there. Again, that could be worthwhile and valuable and if that's what you want to do, then you can take that path.
[[49:04]] But I don't see Larry Tesler on either of those paths. I wouldn't say he was contributing to the field of user experience design because there was no such thing. He didn't choose some open problem to solve, he came up with a problem that only existed in his own head, and no one else even recognized. And certainly he did not define himself by his craft, he defined himself by his cause. By the principle he fought to uphold. And I'm sure if you look at Wikipedia it will say that he's a computer scientist of a user experience something but to me that's like saying Elizabeth Cady Stanton was a community organizer. No, Elisabeth Cady Stanton established the principle of women suffrage. That's who she was. That's the identity she chose and Larry Tesler established the principle of modelessness. He had a vision, and he brought the world to that vision.
[[50:01]] So, you can choose this life. Or maybe it will end up choosing you. It might not happen right away. It can take time to find a principle because finding a principle is essentially a form of self-discovery, that you're trying to figure out what your life is supposed to be about. What you want to stand for as a person. Took me like a decade. Ten years before any real understanding of my principles solidified. That was my twenties. When I was young I felt I had to live this way but I would get little glimmers of what mattered to me but no big picture. It was really unclear. This was very distressing for me. What I had to do was just do a lot of things. Make many things, make many types of things. Study many things, experience many many things. And use all these experiences as a way of analyzing myself. Taking all these experiences and saying 'Does this resonate with me?'. Does this repel me? Do I not care? Building this corpus of experiences that I felt very strongly about for some reason and trying to make sense of it. Trying to figure out why. What is this secret ingredient to all these experiences that I'm reacting to so strongly.
[[51:16]] Now I think everyone's different. And all the guys I talked about they have their own origin stories which you can read about. I will just say that confining yourself to practicing a single skill can make it difficult to get that broad range of experiences which seem to be so valuable for this sort of work.
[[51:35]] And finally, if you choose to follow a principle, a principle can't be just any old thing you believe in. You'll hear a lot of people say they want to make software easier to use. Or they want to delight their users. Or they want to make things simple. That's a really big one right now. Everyone wants to make things simple. And those are nice thoughts and maybe kind of give you a direction to go in but they're too vague to be directly actionable. Larry Tesler likes simplicity. But his principle was this specific nugget of insight: No person should be trapped in a mode. And that is a powerful principle because it gave him a new way of seeing the world. It divided the world in right and wrong in a fairly objective way. So, he could look at someone selecting text and ask Is this person in a mode? Yes or no? If Yes, he had to do something about that. And likewise, I believe that creators need powerful tools. It's a nice thought, but it doesn't really get me anywhere. My principle is that creators need this immediate connection. So I can watch you changing a line of code and I can ask Did you immediately see the effect of that change? Yes or No? If no, I got to do something about that.
[[52:52]] And again, all those demos that I showed you came out of me doing that. of me following this principle and letting it lead me to exactly what I needed to do. So if you're guided by a principle and bodies of specific insight, it will guide you. And you will always know if what you're doing is right.
[[53:19]] There are many ways of living your life. That's maybe the most important thing you can realize in your life, is that every aspect of your life is a choice. But there are default choices. You can choose to sleepwalk through your life and accept that path that's been laid out for you. You can choose to accept the world as it is. But you don't have to. If there is something in the world you feel is a wrong and you have a vision for what a better world could be, you can find your guiding principle. And you can find for a cause. So after this talk, I'd like you to take a little time and think about what matters to you. What you believe in. And what you might fight for.
[[54:06]] Thank you. [applause]
by Edward Z. Yang at February 20, 2012 09:23 PM
For various reasons (mostly PhD-related) I will be traveling a bit over the next month.
Let me know if you're any of these areas and want to say hi!
by Edward Z. Yang at February 18, 2012 03:55 PM
Abstract. Proof-carrying code can be used to implement a digital-rights management scheme, in the form of a proof-verifying CPU. We describe how this scheme would work and argue that DRM implemented this way is both desirable and superior to trusted (“treacherous”) computing schemes. This scheme permits users to retain control over their own machines, while allowing for specific limitations on software capabilities. The ability to impose these limitations will become especially important when 3D printers and biosynthesis machines become ubiquitous. This essay assumes some technical knowledge, although no background in formal methods is required. (If you know how proof-carrying code works, go away; this essay is not for you.)
It is a truth universally acknowledged that digital-rights management schemes are universally harmful to users. Existing DRM schemes are annoying, counterproductive, buggy and fundamentally ineffective. As implemented, they are nearly indistinguishable from spyware.
I’d like to challenge this assumption.
I have no interest in defending the state of current DRM technology. But I do think there is a way to do it better. My goal is to convince you that proof-based digital-rights management could work; that it has a sound theoretical foundation, that it offers a useful subset of DRM-like functionality, and that it does so in a way that avoids many of the privacy, security and trust problems associated with extant trusted computing platforms. I’d like to describe what this system would look like, and what its implications would be (for example, it does offer some control, but it certainly does not solve the analog hole). Unfortunately, this system doesn’t exist yet; the technology underlying formal methods is still being actively researched and is not yet ready for industry.
Why do I feel compelled to speak up about this “ivory tower vaporware” now? We are currently in the midst of what Cory Doctorow calls “The War on General Purpose Computing”, with bills like SOPA/PIPA being considered by Congress, and technical standards like UEFI being aggressively pushed by large software vendors. I feel that it is critical that we convince industries with a stake in digital rights management to invest in this formal methods research. The tools being pursued right now, namely, trusted ("treacherous") computing, may enable the effective enforcement of DRM for the first time in human history, but it will come at the cost of general purpose computing as we know it today.
Because we can’t describe the implications of a system without describing the system itself, the first thing to do is describe how proof-based digital rights management would be implemented. This description will also set the stage for a discussion of some of the issues surrounding such a system; primarily, whether or not this system is possible and whether or not it is desirable.
Proof-based DRM consists of two components. The first component is a proof verifier, which takes a theorem and a proof of that theorem as inputs, and returns a yes/no answer on whether or not the proof is correct. (We’ll discuss in more detail what exactly a “theorem” and a “proof” is in this context soon.) The second component is a set of theorems, which will describe the desired behavior of software that runs on this hardware (the DRM policies). These two components are integrated into the hardware and collectively serve as the gatekeepers for the programs that will run on the CPU. Code that is to be loaded and run on this CPU must first pass through the proof verifier chip; if the proof is in order, the code the user provided may be directly executed, its adherence to some digital rights policy ensured by the force of logic. (Nota bene: in the rest of this essay, we will not consider the issues of trusting the underlying hardware; a deficiency of our essay, but it is a deep and thorny issue, that also affects CPUs in use today.)
The proof verifier is the core of this system. It can be thought of as a “little mathematician”: someone who reviews a proof in order to check that it is correct. He is furnished with a set of assumptions and a goal (the “theorem”), and a series of deductions from the assumptions to the goals (the “proof”). All the verifier needs to do is, for each goal, check that every step logically follows from the previous one. “P, and P implies Q. Therefore, Q!” Proof verifiers are relatively well studied, and there exist multiple implementations, among which include Coq, HOL Light, Isabelle and F*. Usually, these are written in software, but there is also ongoing research on the design of proof verifiers suitable for embedded devices.
Let’s delve into the operation of a proof verifier a little more. The first input of a proof verifier is the theorem which is to be proved. So the very first task placed upon the user of a proof verifier is to state the theorem in a way that a computer can understand. It’s certainly not reasonable to expect the proof verifier to understand a paragraph of English or some LaTeX equations! What we do is write down mathematical logic as a computer language, this is the language we write our theorems in. Take, as an example, the statement of Fermat’s Last Theorem: no three positive integers a, b, and c, could satisfy , for any
. In Coq, this statement could be written as forall (x y z:Z) (n:nat), x^(n+3) + y^(n+3) = z^(n+3) -> x <= 0 \/ y <= 0 \/ z <= 0. Read out loud, it says “for all x, y, z which are integers (Z), and for all n which are natural numbers (nat), if it is the case that
, then either x, y or z is less than or equal to zero.” While the “computer-friendly” version looks a little different from the informal version (we’ve taken the contrapositive of the statement to avoid negations, and we’ve used addition by three to express the fact that the exponent must be greater than two), they are fairly similar.
Unfortunately, saying in similarly precise language what it means for a program to be “memory-safe” (that is, it never dereferences an invalid pointer) is considerably more difficult. Transforming an informal statement into a formal one is something of an art, for which there is no “correct” answer. During this process, you must balance competing desires: the statement should be easy for a human to understand, but it should also be easy to prove in a computer. Even in the case of Fermat’s theorem, we’ve elided some details: what does it actually mean for something to be an integer or a natural number? What is exponentiation? For that matter, what is addition? What does it mean for two integers to be equal? Fortunately, there are conventional answers for these questions; even in the case of more complicated properties like “memory-safety”, there is a reasonable understanding of the general design principles behind writing a theorem like this down.
For proof-based DRM, we need to scale up security properties like “memory-safety” to the properties one might need to enforce in a digital rights management scheme. How do we show that a program never leaks the contents of a hardware-based private key or prove that a program transmits within a limited set of radio frequencies set by the law? The possibilities multiply, as do the risks. As we move from the realm of well-defined concepts to more irregular, fuzzy ones, it becomes more difficult to tell if a formalization does what we want, or if it is merely a set of rules with a loophole. A criminal may abide by the letter of the law, but not the spirit. In a computer system, there is no judge which can make this assessment. So a reasonable question is whether or not there are any properties which we have some hope of formalizing. Fortunately, the answer is yes; we’ll return to this question in the next section, as it will influence what kinds of DRM we can hope to implement.
The second input of a proof verifier is a proof. Now, we’ve claimed that a proof is a long list (actually, it is more like a tree, as the proof of one goal may require you to prove a few subgoals) of logical steps, each of which can easily be checked. Now, if you have ever attempted to read a real mathematics proof, you’ll know that checking if a proof is correct is never quite this simple. Mathematical proofs leave out steps. Like a writer, a mathematician optimizes his proof for his audience. If they have some relevant background knowledge, he will elide information in order to ensure the clarity of the higher-level structure of a proof. You see, the mathematician is not only interested in what is true, but why it is true. We cannot be so facile when it comes to proofs for computer consumption: as a dumb calculating machine, the computer requires every step of the proof to be explicitly spelled out. This is one of the primary challenges of mechanically checked proofs: a human may write out a three line proof for Euclid’s algorithm, but for the computer, you might end up writing a page. For more complicated theorems about computer programs, a verification project can easily get crushed by the sheer amount of code involved. Scaling up automated theorem proving technology is yet another area of active research, with current technology on the level of abstraction as assembly languages are for traditional programming.
However, once we are in possession of a verified, mechanized proof, we have something that a traditional mathematician does not: assurance that the proof is correct, and that the program has the property we demanded of it. (On the contrary, mathematical proofs published in papers can be, and sometimes are, wrong! Though, even more interesting, the theorem still ends up being true anyway.) This is a good situation to be in: by the principle of proof erasure, we’re allowed to ignore the elaborate proofs we constructed and execute the program directly. We can run the program straight on our hardware, without any untrusted DRM-enabled operating system running underneath. We’ll return to this question later, when we compare our scheme to existing “treacherous computing” schemes.
So what have we covered here? We’ve described how a proof verifier works by looking more closely at its two inputs: theorems and proofs, and touched over some of the active research areas involved with scaling this technology for the real world. In the next two sections, I’d like to go in more detail about two particular aspects of this system as they apply to digital rights management: the theorems associated with DRM, and the relationship between this scheme and existed “trusted computing” schemes.
The MPAA would quite appreciate it there was a way to make it impossible to copy a digital video. But even the most sophisticated technological scheme cannot work around the fact that I can simply setup a video recorder trained on my screen as I watch the movie: this is the so-called “analog hole”, a fundamental limitation of any copy protection scheme. Proof-based DRM cannot be directly used to eliminate the copying of static materials, such as books, music or movies.
Does this mean that proof-based DRM has no useful applications? The new capability we have gained through this scheme is the ability to select what code will run on the hardware. Any legal ramifications are strictly side effects of this technological enforcement. Furthermore, since general purpose computing devices are ubiquitous, a proof-certifying CPU gains us nothing unless the hardware itself affords something extra. A proof-verifying CPU can be thought of as a specialized appliance like a toaster or a microwave, both of which are only interesting insofar as much as they perform non-computational tasks (such as toast bread or heat food).
But there are certainly many interesting hardware peripherals for which this would be useful. In fact, modern CPUs already have some of the specialized chips which are developing along these lines: the Trusted Platform Module, a specification for a cryptoprocessor that can be used to securely store cryptographic keys, is present in most modern laptops. Intel’s Trusted Execution Technology allows the specification of “curtained” regions of memory, which cannot necessarily be accessed by the operating system. The creation of these features has been driven by the trusted (“treacherous”) computing movement, and these are features that can be used both for good and for evil. In a proof-based DRM world, we can give users far more precise control over this secret data, as the user is permitted to write whatever code they want to manipulate it: they simply need to prove that this code won’t get leaked outside of the module. This is information-flow control analysis, which lets us track the flow of private data. (For secret data which has a low number of bits, such as secret keys, a lot of care needs to be taken to mitigate timing attacks, which can be used to slowly leak data in non-obvious ways.) This private data could even be proprietary code, in the case of using a proof-verifying CPU to assist in the distribution of software. This would be more flexible than current software distribution schemes, which are either “in the cloud” (software-as-a-service), or where a proprietary, all-in-one “application box” must be physically hosted by the customer.
Another important application of proof-based DRM is for auditing; e.g. the guaranteed logging of certain events to external stores. The logging store may be some sort of write-only device, and we guarantee that all of the relevant events processed by our CPU are sent to this device by proving that, whenever the event associated with an auditing requirement is triggered, the associated logging call will also be called. This would be a great boon for electronic voting systems, a class of machines which are particularly plagued by unaccountability.
However, looking even further into the future, I think the most interesting hardware peripherals will not truly be peripherals. Rather, the relationship will be inverted: we will be looking at specialized machinery, which happens to need the power of a general purpose computer. We do not ride in cars and fly in planes: we ride in computers and fly in computers. But just as I would prefer my car not to be hackable, or become infected by a computer virus, I would prefer the computational power of a car computer to be limited. This is precisely what proof-based DRM does: it restricts the set of runnable programs. 3D printers and biosynthesis machines are other examples of “peripherals”, which I suspect governments around the world will have a large interest in regulating.
Coming up with useful theorems that define the very nature of the device in question is a bit more challenging: how do you define the line between legitimate use, or an attempt to jam an aircraft radio, create a bioweapon or forge currency? How do you mathematically specify what it means to be “working car software”, as opposed “car software that will cause accidents”? The key insight here is that while it is impossible to completely encapsulate what it means to “be a radio” or “be a car”, we can create useful partial specifications which are practical. Rather than state “steering works properly”, we can state, “given an appropriate input by the wheel, within a N millisecond latency an appropriate mechanical output will be generated.” Rather than state “GPS works”, we can state, “during the operation of GPS, information recording the current location of the vehicle is not transmitted to the public Internet.”
It is also possible to modularize the specification, so that extremely complicated aspects of operation can be verified by an independent systems, and our proof checker only verifies that the independent system is invoked and acted upon. Here are two concrete examples:
It should be clear that the ability to limit what code is run on a device has practical applications. Indeed, it should seem that proof-based DRM is very similar to the trusted (treacherous) computing platform. So, in the next section, I would like to directly describe the differences between these two schemes.
What is the difference between proof-based DRM and current trusted (“treacherous”) computing schemes? Both operate by limiting the code that may be run directly on a piece of hardware.
I think the easiest way to see the difference is to consider how each defines the set of permissible programs that may be run on a computer. In trusted computing, this set is defined to be the set of programs signed by a private key held by some corporation. The corporation has complete control over the set of programs that you are allowed to run. Want to load your own software? No can do: it hasn’t been signed.
In proof-based DRM, the set of allowed programs is larger. It will include the code that the corporation provides, but it will also include any other program that you or the open-source community could write, so long as it provides the appropriate proofs for the digital policies. This means, for example, that there is no reason to choke down proprietary software which may have rootkits installed, may spy on your usage habits, etc. The user retains control. The theorems are public knowledge, and available for anyone to inspect, analyze, and base their own software off of.
What if the user isn’t able to, in practice, load his own code? Given the current difficulties in theorem proving, it is certainly a concern that what may happen in practice is that corporations will generate specifications that are overfitted to their proprietary code: that have such stringent parameters on the code’s operation that it would be effectively impossible for anyone to run anything else. Or, even more innocuously, the amount of effort involved with verifying software will put it out of reach for all but well-funded corporations. Unfortunately, this is a question that cannot be resolved at the moment: we have no data in this area. However, I have reason to be optimistic. One reason is that it has been extremely important for current formal methods work for specifications to be simple; a complicated specification is more likely to have bugs and is harder to prove properties about. Natural selection will take its course here, and if a company is tempted to slip in a “back door” to get their code in, this back door will also be exploitable by open source hackers. Another reason to be optimistic is that we may be able to develop correct-by-construction programming languages, languages which you write in, and when compiled, automatically provide you with the proof for the particular theorems you were looking for.
And of course, there is perhaps one last reason. Through the ages, we’ve seen that open source hackers are highly motivated. There is no reason to believe this will not be the case here.
Of course, demonstrating that proof-based digital-rights management is “better” than the rather bad current alternatives doesn't mean that it is “good.” But, through the course of this essay, I’ve touched upon many the reasons why I think such a scheme could be valuable. It would provide further impetus in the development of proof-carrying code, a technology that is interesting in and of itself. It would provide a sound basis for limiting the functionality of general purpose computers, in circumstances where you really don’t want a general purpose computing device, without requiring rootkits or spyware. The ability to more precisely say what ways you would like your product be used gives producers more flexibility when negotiating in the market (for example, if I can ensure a video game will have limited distribution during its first month, I may be willing to sell it for less). And as general purpose computers gain the ability to influence reality in unprecedented ways, there will be a growing desire for this technology that provides this capability. I think that it is uncontroversial that many, powerful bodies will have an interest in controlling what can be run on certain computing devices. Cory Doctorow has said that “all attempts at controlling PCs will converge on rootkits”; it should be at least worth considering if there is another way.
by Edward Z. Yang at February 15, 2012 09:49 PM
Slow code loading is an insidious problem. It tends to creep upon a codebase so gradually you don’t notice it happening. First you find yourself waiting a second, then a few seconds, then maybe ten or more seconds every time you want to reload your code to test out some change.
I recently wrote a tool to profile Ruby code load times. The tool, called require-prof, is implemented as a simple wrapper over Ruby’s require and load. It spits out a list of the most expensive files to load in your project (correctly subtracting out the time spent recursively requiring other files), so you can go and easily tune your project.
I’ve now used require-prof to greatly improve code load times on a few code bases. In no particular order, here are a few points that I’ve noticed:
mime-types for one reason or another, though none of them actually use its code. On my laptop, loading mime-types takes about 200 milliseconds because it spends a bunch of time loading and registering all of its hardcoded mime types. I have a patched mime-types so that, if ENV["RUBY_MIME_TYPES_LAZY_LOAD"] is set to "true", it will lazily do the parsing at runtime. This drops the mime-types load time to a few milliseconds. require. Otherwise, it will search through your entire load path on every require, which can become pretty expensive. This may be most easily implemented using a wrapper around require, for example using a simple function such as:
def base_require(relpath)
require File.expand_path(File.join(File.dirname(__FILE__), relpath))
end
in a top-level file of your project.
require-prof to find your bottlenecks, make sure to load your code once to ensure it’s all in cache before paying attention to any of the numbers you see. mail gem is incredibly slow to load (about a second or two on my laptop). A lot of the slowness is due to loading its giant autogenerated parsers. I didn’t dig enough to both figure out how to make it load quickly and still not break, but in my projects I shifted over to dynamically loading it when needed. Once your code load time has been tuned to your satisfaction, the next most important operation is to ensure that you don’t end up back where you started a few months down the time. I’ve been considering adding alerts to my code that complain loudly whenever it takes over a certain amount of time to load, but I haven’t quite decided on the best way of doing so.
by gdb at February 06, 2012 07:50 PM
Last night, I returned from my very first POPL, very exhausted, and very satisfied. It was great putting faces to names, chatting with potential PhD supervisors (both from the US and in the UK), and reveling in the atmosphere.
Highlights from my files:
by Edward Z. Yang at January 28, 2012 01:30 PM
The MonadIO problem is, at the surface, a simple one: we would like to take some function signature that contains IO, and replace all instances of IO with some other IO-backed monad m. The MonadIO typeclass itself allows us to transform a value of form IO a to m a (and, by composition, any function with an IO a as the result). This interface is uncontroversial and quite flexible; it’s been in the bootstrap libraries ever since it was created in 2001 (originally in base, though it migrated to transformers later). However, it was soon discovered that when there were many functions with forms like IO a -> IO a, which we wanted to convert into m a -> m a; MonadIO had no provision for handling arguments in the negative position of functions. This was particularly troublesome in the case of exception handling, where these higher-order functions were primitive. Thus, the community began searching for a new type class which captured more of IO.
While the semantics of lift were well understood (by the transformer laws), it wasn’t clear what a more powerful mechanism looked like. So, early attacks at the problem took the approach of picking a few distinguished functions which we wanted, placing them in a typeclass, and manually implementing lifted versions of them. This lead to the development of the already existing MonadError class into a more specialized MonadCatchIO class. However, Anders Kaseorg realized that there was a common pattern to the implementation of the lifted versions of these functions, which he factored out into the MonadMorphIO class. This approach was refined into the MonadPeelIO and MonadTransControlIO typeclasses. However, only MonadError was in the core, and it had failed to achieve widespread acceptance due to some fundamental problems.
I believe it is important and desirable for the community of library writers to converge on one of these type classes, for the primary reason that it is important for them to implement exception handling properly, a task which is impossible to do if you want to export an interface that requires only MonadIO. I fully expected monad-control to be the “winner”, being the end at a long lineage of type classes. However, I think it would be more accurate to describe MonadError and MonadCatchIO as one school of thought, and MonadMorphIO, MOnadPeelIO and MonadTransControlIO as another.
In this blog post, I’d like to examine and contrast these two schools of thought. A type class is an interface: it defines operations that some object supports, as well as laws that this object abides by. The utility in a type class is both in its generality (the ability to support multiple implementations with one interface) as well as its precision (the restriction on permissible implementations by laws, making it easier to reason about code that uses an interface). This is the essential tension: and these two schools have very different conclusions about how it should be resolved.
This general technique can be described as picking a few functions to generalize in a type class. Since a type class with less functions is preferable to one with more (for generality reasons), MonadError and MonadCatchIO have a very particular emphasis on exceptions:
class (Monad m) => MonadError e m | m -> e where throwError :: e -> m a catchError :: m a -> (e -> m a) -> m a class MonadIO m => MonadCatchIO m where catch :: Exception e => m a -> (e -> m a) -> m a block :: m a -> m a unblock :: m a -> m a
Unfortunately, these functions are marred by some problems:
In some sense, MonadError is a non-sequitur, because it isn’t tied to IO at all; perfectly valid instances of it exist for non-IO backed monads as well. MonadCatchIO is closer; the latter three points are not fatal ones could be easily accounted for:
class MonadException m where throwM :: Exception e => e -> m a catch :: Exception e => m a -> (e -> m a) -> m a mask :: ((forall a. m a -> m a) -> m b) -> m b
(With a removal of the ContT instance.) However, the “finalizers are sometimes skipped” problem is a bit more problematic. In effect, it is the fact that there may exist zeros which a given instance of MonadCatchIO may not know about. It has been argued that these zeros are none of MonadCatchIO’s business; one inference you might draw from this is that if you have short-circuiting which you would like to respect finalizers installed using MonadException, it should be implemented using asynchronous exceptions. In other words, ErrorT is a bad idea.
However, there is another perspective you can take: MonadException is not tied just to asynchronous exceptions, but any zero-like value which obeys the same laws that exceptions obey. The semantics of these exceptions are described in the paper Asynchronous Exceptions in Haskell. They specify exactly the interaction of masking, throw and catch, as well as how interrupts can be introduced by other threads. In this view, whether or not this behavior is prescribed by the RTS or by passing pure values around is an implementation detail: as long as an instance is written properly, zeros will be properly handled. This also means that it is no longer acceptable to provide a MonadException instance for ErrorT e, unless we also have an underlying MonadException for the inner monad: we can’t forget about exceptions on the lower layers!
There is one last problem with this approach: once the primitives have been selected, huge swaths of the standard library have to be redefined by “copy pasting” their definitions but having them refer to the generalized versions. This is a significant practical hurdle for implementing a library based on this principle: it’s simply not enough to tack a liftIO to the beginning of a function!
I think an emphasis on the semantics of the defined type class will be critical for the future of this lineage of typeclasses; this is an emphasis that hasn’t really existed in the past. From this perspective, we define with our typeclass not only a way to access otherwise inaccessible functions in IO, but also how these functions should behave. We are, in effect, modeling a subset of IO. I think Conal Elliott would be proud.
There is a lively debate going on about extensions to the original semantics of asynchronous exceptions, allowing for the notion of “recoverable” and “unrecoverable” errors. (It’s nearer to the end of the thread.)
This technique can be described as generalizing the a common implementation technique which was used to implement many of the original functions in MonadCatchIO. These are a rather odd set of signatures:
class Monad m => MonadMorphIO m where morphIO :: (forall b. (m a -> IO b) -> IO b) -> m a class MonadIO m => MonadPeelIO m where peelIO :: m (m a -> IO (m a)) class MonadBase b m => MonadBaseControl b m | m -> b where data StM m :: * -> * liftBaseWith :: (RunInBase m b -> b a) -> m a restoreM :: StM m a → m a type RunInBase m b = forall a. m a -> b (StM m a)
The key intuition behind these typeclasses is that they utilize polymorphism in the IO function that is being lifted in order to thread the pure effects of the monad stack on top of IO. You can see this as the universal quantification in morphIO, the return type of peelIO (which is IO (m a), not IO a), and the StM associated type in MonadBaseControl. For example, Int -> StateT s IO a, is equivalent to the type Int -> s -> IO (s, a). We can partially apply this function with the current state to get Int -> IO (s, a); it should be clear then that as long as the IO function we’re lifting lets us smuggle out arbitrary values, we can smuggle out our updated state and reincorporate it when the lifted function finishes. The set of functions which are amenable to this technique are precisely the ones for which this threaded is possible.
As I described in this post, this means that you won’t be able to get any transformer stack effects if they aren’t returned by the function. So perhaps a better word for MonadBaseControl is not that it is unsound (although it does admit strange behavior) but that it is incomplete: it cannot lift all IO functions to a form where the base monad effects and the transformer effects always occur in lockstep.
This has some interesting implications. For example, this forgetfulness is in fact precisely the reason why a lifted bracketing function will always run no matter if there are other zeros: finally by definition is only aware of asynchronous exceptions. This makes monad-control lifted functions very explicitly only handling asynchronous exceptions: a lifted catch function will not catch an ErrorT zero. However, if you manually implement finally using lifted versions of the more primitive functions, finalizers may be dropped.
It also suggests an alternate implementation strategy for monad-control: rather than thread the state through the return type of a function, it could instead be embedded in a hidden IORef, and read out at the end of the computation. In effect, we would like to embed the semantics of the pure monad transformer stack inside IO. Some care must be taken in the forkIO case, however: the IORefs need to also be duplicated appropriately, in order to maintain thread locality, or MVars used instead, in order to allow coherent non-local communication.
It is well known that MonadBaseControl does not admit a reasonable instance for ContT. Mikhail Vorozhtsov has argued that this is too restrictive. The difficulty is that while unbounded continuations do not play nice with exceptions, limited use of continuation passing style can be combined with exceptions in a sensible way. Unfortunately, monad-control makes no provision for this case: the function it asks a user to implement is too powerful. It seems the typeclasses explicitly modeling a subset of IO are, in some sense, more general! It also highlights the fact that these type classes are first and foremost driven by an abstraction of a common implementation pattern, rather than any sort of semantics.
I hope this essay has made clear why I think of MonadBaseControl as an implementation strategy, and not as a reasonable interface to program against. MonadException is a more reasonable interface, which has a semantics, but faces significant implementation hurdles.
by Edward Z. Yang at January 24, 2012 06:31 PM
One of the most beautiful aspects of Stripe is how it lowers the barrier for software developers to build revenue-creating businesses. Before Stripe, programmatically accepting payments was largely reserved for large companies willing to put in the time and drudgery required to obtain a merchant account. It was easy to write applications, easy to deploy applications, but very hard to charge for applications .
When I decided to drop out of MIT and join Stripe, I didn’t really understand just how true this was. In my case, it really hit home when I was creating domaincli, a command-line client for registering domain names. I wanted to allow a user to run domaincli register gregbrockman.com, enter his or her credit card details, and then end up the proud owner of gregbrockman.com. As you might imagine, there are two principal challenges with making a tool like work: you need to plug in an API to actually register the domain names, and you need to plug in another API to let people pay for said domains.
Not knowing anything about domain registration APIs, or even whether they existed, I started Googling around to see what solutions were available. The results were quite surprising–it turns out that loads of registrars expose APIs. Seeing that Go Daddy had an offering (everyone’s favorite company, I know), I started poking around their product page. However, you had to pay several hundred dollars even to get access to the documentation, and I found rumors in forums that their API was a complicated XML-based mess. So I moved on to others’ reseller APIs.
One company that looked quite good was a Dutch registrar. Their documentation was reasonable (dare I say even good?), and I found several forum posts lauding their API. So I signed up for an account, played around in their test environment, and even PayPal’d in some funds for my account. But when I tried to get my live credentials, I ran into a popup notifying me that I had to request access to their production environment, and the request would be processed during normal business hours. This was on a Friday night, and I was planning on writing domaincli as my weekend project. So with a heavy heart, I realized this registrar wasn’t going to work.
So I started skimming through several other registrars, but it seemed that none of them offered instantaneous setup. I finally stumbled across a registrar who did, though their API was documented only by a pair of terribly-written client libraries. Though their WSDL worked for calls such as “check if this domain is registered”, I could never get their “register this domain” call to actually work.
Disheartened, I was about ready to give up. I dejectedly tried one more Google query, not expecting anything to turn up. But to my surprise, I ended up finding Internet.bs, a registrar with a documented JSON API and, more importantly, instant setup. Their API was a bit complicated, so it took some time to grok the relevant parameters and fully integrate, but in the end in didn’t take more than an hour or two.
After the wild west that was plugging in domain registration, it felt almost like cheating to just use Stripe for payments. I created an account, plugged in a create_token call to send the user’s card details directly to Stripe, and then added server-side create_customer and create_charge calls to store a card on file and charge it, respectively.
The rest was just adding some wiring around these two APIs. You can use the resulting tool by installing it via sudo pip install domaincli, or checking out the source on Github.
by gdb at January 23, 2012 06:06 PM
Editor's note. I've toned down some of the rhetoric in this post. The original title was "monad-control is unsound".
MonadBaseControl and MonadTransControl, from the monad-control package, specify an appealing way to automatically lift functions in IO that take "callbacks" to arbitrary monad stacks based on IO. Their appeal comes from the fact that they seem to offer a more general mechanism than the alternative: picking some functions, lifting them, and then manually reimplementing generic versions of all the functions built on top of them.
Unfortunately, monad-control has rather surprising behavior for many functions you might lift.
For example, it doesn't work on functions which invoke the callback multiple times:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Trans.Control
import Control.Monad.State
double :: IO a -> IO a
double m = m >> m
doubleG :: MonadBaseControl IO m => m a -> m a
doubleG = liftBaseOp_ double
incState :: MonadState Int m => m ()
incState = get >>= \x -> put (x + 1)
main = execStateT (doubleG (incState)) 0 >>= print
The result is 1, rather than 2 that we would expect. If you are unconvinced, suppose that the signature of double was Identity a -> Identity a, e.g. a -> a. There is only one possible implementation of this signature: id. It should be obvious what happens, in this case.
If you look closely at the types involved in MonadBaseControl, the reason behind this should become obvious: we rely on the polymorphism of a function we would like to lift in order to pass StM m around, which is the encapsulated “state” of the monad transformers. If this return value is discarded by IO, as it is in our function double, there is no way to recover that state. (This is even alluded to in the liftBaseDiscard function!)
My conclusion is that, while monad-control may be a convenient implementation mechanism for lifted versions of functions, the functions it exports suffer from serious semantic incoherency. End-users, take heed!
Postscript. A similar injunction holds for the previous versions of MonadBaseControl/MonadTransControl, which went by the names MonadPeel and MonadMorphIO.
by Edward Z. Yang at January 23, 2012 05:39 PM
The above is a new Stripe WordPress plugin. I’m just running it in test mode; feel free to enter test card details such as those from https://stripe.com/docs/testing.
by gdb at January 20, 2012 07:51 PM
It can be hard to understand the appeal of spending three days, without sleep, solving what some have called “the hardest recreational puzzles in the world,”; but over this weekend, hundreds of people converged on the MIT campus to do just that, as part of MIT Mystery Hunt. To celebrate the finding of the coin, I'd like to share this little essay that I found in my files, which compares Mystery Hunt and the scientific endeavour. (If you are not familiar with Mystery Hunt, I recommend listening to the linked This American Life program.)
Thomas Kuhn, in his famous book The Structure of Scientific Revolutions, states that “normal science” is “puzzle solving”: what he means is that the every day endeavors of scientists are the addressing of small, tractable problems, these problems are “puzzles” not “grand mysteries of the universe.” Kuhn goes on to describe what is involved with normal science: generation of facts, increasing the fit between theory and observation, and paradigm articulation. We will see that, as one might expect, these activities are part of “normal” puzzle solving. But (perhaps more unexpectedly) Popperian falsification and Kuhnian revolutions also have something to say about this situation. There are limits to the analogy of puzzles to science, the perhaps most striking of which is that a puzzle has a single, definite solution. But because it is not possible to call up the puzzle writers midway through a puzzle and ask them, “Am I on the right track?” (you are only allowed to phone in the final answer) the intermediate steps still are somewhat informative of the practice of science. In this context, I argue that Popper assumes a microscopic view of scientific progress, whereas Kuhn assumes a macroscopic view of scientific progress.
What is a Mystery Hunt puzzle? This is a question that, at first glance, may seem to defy answering: puzzles can vary from a crossword to an album of images of birds to a single audio file of seemingly random electronic pitches. The answer is always a phrase, perhaps “KEPLERS THIRD LAW” or “BOOTS,” but a puzzle, like a scientific problem, doesn't come with instructions for how to solve it. Thus, every Mystery Hunt puzzle starts off in Kuhn's pre-science stage: without any theory about how the puzzle works, puzzlers roll up their sleeves and start collecting miscellaneous facts that may be relevant to the puzzle at hand. If there are pictures of birds, one starts off identifying what the birds are; if there are short video clips of people waving flags, one starts off decoding the semaphore messages. This is a stage that doesn't require very much insight: there is an obvious thing to do. Some of the information collected may be irrelevant (just as the Linnaean classification of species was broadly modified in light of modern information about observable characteristics of plants and animals), but all-in-all this information forms a useful basis for theory formation. But while Popper doesn’t have much to say about data collection, Kuhn’s concept of the theory-ladenness of observation is important. The theory-ladenness of observation states that it is impossible to make an observation without some preconceptions and theories about how the world works. For example, a list of images of birds may suggest that each bird needs to be identified, but in the process of this identification it may be realized that the images corresponded directly to watercolor engravings from Audubon's birds of America (in which case the new question is, which plates?) Even during pre-science, small theories are continually being invented and falsified.
Once the data has been collected, a theory about how all the data is to be put together must be formed: this step is called “answer extraction”. In regular science, forming the right theory is something that can take many, many years; for a puzzle, the process is accelerated by a collection of pre-existing theories that an experienced puzzler may attempt (e.g. each item has a numbering which corresponds to a letter) as well as hints that a puzzle writer may place in a puzzle's flavor text and title (e.g. “Song of birds” refers to “tweeting” refers to “Twitter”, which means the team should take an extracted phrase to mean a Twitter account.) Naïve inductivism suggests that unprejudiced observation of the data should lead to a theory which explains all of the information present. However, puzzles are specifically constructed to resist this sort of straightforward observation: instead, what more frequently happens is akin to Popper's falsificationism, where theories are invented out of sheer creativity (or perhaps historical knowledge of previous puzzles) and then they are tested against the puzzle. To refer once again to the bird puzzle, one proposed theory may be that the first letter of the scientific names of the birds spells a word in the English language. When the scientific names are gathered and the first letters found not to form a word, the theory is falsified, and we go and look for something else to do. This makes the Popperian view highly individualistic: while only some people may come up with the “good ideas”, anyone can falsify a theory by going out and doing the necessary calculation. Sophisticated falsification allows for the fact that someone might go out and do the calculation incorrectly (thus sending the rest of the puzzlers on a wild goose hunt until someone comes back to the original idea and realizes it actually does work.) However, it only explains the scientific endeavor at very high resolution: it explains the process for a single theory; and we'll see that Kuhn's paradigms help broaden our perspective on the overall puzzle solving endeavor.
Kuhn states that normal science organizes itself around paradigms, which are characterized by some fundamental laws (Maxwell's equations, Newton's laws) as well conventions for how these laws are to be used to solve problems. Unlike a theory, a paradigm cannot be “falsified”, at least in the Popperian sense: a paradigm can accommodate abnormalities, which may resolve themselves after further investigation. The difference is one of scope. So, in a puzzle, while we might have a straightforwardly falsifiable theory “the first letters form a word,” a more complicated, thematic idea such as “the puzzle is Dr. Who themed” is closer to a paradigm, in that the precise mechanism by which you get answer from “Dr. Who” is unspecified, to be resolved by “normal puzzling.” The paradigm is vague, but it has “the right idea”, and with sufficient effort, the details can be worked out. Which paradigm is to be worked on is a social process: if a puzzler comes in freshly to a puzzle and finds that a group of people is already working within one paradigm, he is more likely to follow along those lines. Of course, if this group is stuck, they may call in someone from the outside precisely to think outside of the paradigm. In this manner, revolutions, as Kuhn describes them, occur. When a paradigm is failing to make progress (e.g. failing to admit an answer), the puzzlers will continue to attempt to apply it until a convincing alternative paradigm comes along, at which point they may all jump to this new method. The revolution is compressed, but it still carries many common traits: including the lone puzzler who still thinks the old method will admit an answer with “just a little more work.” (Sometimes they're right!)
We see a striking correspondence between the activities of a scientist working in a lab, and the activities of a scientist working on a Mystery Hunt puzzle. If you'll allow me to indulge in a little psychoanalysis, I think this similarity is part of the reason why Mystery Hunt is so appealing to people with a science, technology, engineering and maths background: in your every day life, you are faced with the most vicious puzzles known to man, as they are by definition the ones that have resisted any attempt to resolution. Months can go by without any progress. In Mystery Hunt, you are once again faced with some of the most vicious puzzles known to man. But there's a difference. You see, in Mystery Hunt, there is an answer. And you can find it.
Happy puzzling!
by Edward Z. Yang at January 16, 2012 09:12 PM
While there are a host of static analysis tools for statically-typed languages like C or Java, dynamic languages like Ruby are traditionally regarded as too dynamic to be able to do any useful analysis without actually running the code. While there are some rudimentary tools you can run over your Ruby codebase (such as druby, reek, and roodi), as far as I can tell there is no tool you can just point at a main file in your Ruby codebase and have it start telling you about your bugs. Frustrated with the state of existing tools, I sat down one weekend to write a proof-of-concept Ruby static analysis tool, which I creatively dubbed ruby-static-checker.
Probably my most common cause of error while programming in Ruby is a name error–caused for example by typoing a variable name or by forgetting to update the name of a class after a refactor. My initial iteration of ruby-static-checker supports checking for typoed variable names. Just run bin/ruby-static-checker , and it will start spitting possible name errors at you. Here’s how it works.
The chief design goal of ruby-static-checker is that it needed to be runnable on unmodified Ruby projects with only the specification of a single file (rather than the entire project). Rather than resolve the import graph by hand, I load the project into the static analysis tool as a library. This has the side benefit of making any load-time class generation trivial to analyze.
The rest of ruby-static-checker is walking over abstract syntax trees for all defined classes (using the pretty cool parse_tree gem). Since Ruby does not require parentheses in order to call a method, it’s perhaps nonobvious, but in fact it is statically resolvable whether a particular name refers to a local variable (in which case no name error is possible) or a function call (in which case a name error is possible). Ruby represents these two cases differently in the AST, and so the checking problem reduces to building a list of defined methods and checking against the calls that are made.
Of course, dynamic constructs like method_missing, Object.send, runtime imports, and load-time branching can cause both false positives and false negatives for this approach. I consider these issues an acceptable cost–I wanted a tool that would generally find bugs in reasonable codebases, and missing out on the more Byzantine corner cases seemed perfectly fine.
I’ve run ruby-static-checker over a couple of codebases, including Stripe’s. Though it usually fires off a number of false positives (usually almost entirely in library code), I’ve also found actual bugs with it. I’ve also found it fairly useful to make sure that ruby-static-checker has found no new name errors in between runs.
Anyway, the tool was written in a weekend, so there’s obviously tons more to do. I’m curious to see how well the load-as-library approach works for more advanced analyses.
by gdb at January 16, 2012 06:32 PM
It’s taken me a long time to fully appreciate the value of people.
In middle and high school, I spent gobs of time studying chemistry and working on my math research. Try as I might, I was unable to find any peers who had similar interests or inclinations. The result was that I ended up spending the vast majority of my time working in isolation. My study carrel at the library became a second home, and I became resigned to thinking that I was alone. While I wasn’t overjoyed at the thought that I could not find anyone with whom to collaborate or share a common struggle, I also wasn’t too distraught about it. That was just how things were, and I accepted that fact without question.
I got my first glimpse of the potential joys of working with others when starting college. At Harvard, I began working in a study group for my math class, the infamous Math 55. Each week the professor would rain ten excruciatingly difficult problems on our heads, and it was all but impossible to keep up with the work on our own. For the first few weeks of school, I worked alone as I usually would, pounding my head against the problems until they yielded. While I would ultimately triumph, my willpower and time to do work for other classes would both be all but depleted. However, a number of us gradually started spending more and more time working together on the harder problems, until before we knew it we had a full-fledged tradition of meeting and collaboratively working on the problems each week. And correspondingly, I stopped feeling quite so beat down when handing in a problem set–in fact, I felt something that was most akin to pride. I realized that having a team of smart collaborators had turned this once harrowing experience into a positive one, with each of us providing mental and emotional support to the others.
I tried to employ this model in other classes, never with much success. A number of collaborators would gather to work on our physics problem sets. However, I found that our group was too mixed in experience, diligence, and ability. A few of the members clearly were just there to copy answers, while some came without ever having looked at the problem set. I started dreading going to physics study group almost as much as I had dreaded working on my math problem sets alone, even though most of the members were extremely smart and dedicated. I realized that the composition of a group of collaborators needs to be precisely tuned, and there is simply no room for those who won’t hold up their end of the social contract.
By the time sophomore year rolled around, I had all but despaired of finding another study group like the one from Math 55. I began working alone in my room, but I found myself consumed with loneliness. I started trying to work in the dining hall, where at least there were other people around. And while that did make me feel a lot better, the underlying problem of working alone remained.
Though I didn’t really comprehend it at the time, in my other endeavors throughout college, the most pleasant were those which involved a number of other people working towards a common goal. When the Harvard Computer Society’s LDAP server blew up right after I became the Director of Systems, a group of four of us worked nonstop for the next eighteen hours to bring our services back online. I can’t think of a more cherished memory or of a time I felt more accomplished during my college years.
These days, working on Stripe, the vast majority of my day is spent working with other people. We discuss architecture and strategy; we review each others’ code; we hang out and talk about life. We are bound together by a common goal, an overarching purpose, a reason for existence: to fix payments for the web, and in the process build an awesome company. I’ve learned that I absolutely need to be around people in order to be happy, but they must be the right sort of people. In all our hiring, we’re very conscious of this fact, and we make sure to only bring on board people who will contribute in the right way to our culture.
It’s shocking to me how far my views have evolved. Four years ago I was sure that I preferred doing everything on my own, and that having a shared goal with others was just inefficient. Now, I’m sure that I prefer working in a team and being able to share not only the burden and responsibility but also the joy that come from driving towards an goal. I’ll be interested to see what my views are in another four years, but until then, I’ve figured out how to maximize my happiness while working.
by gdb at January 09, 2012 07:35 PM
Have you ever wondered how the codensity transformation, a surprisingly general trick for speeding up the execution of certain types of monads, worked, but never could understand the paper or Edward Kmett's blog posts on the subject?
Look no further: below is a problem set for learning how this transformation works.
The idea behind these exercises is to get you comfortable with the types involved in the codensity transformation, achieved by using the types to guide yourself to the only possible implementation. We warm up with the classic concrete instance for leafy trees, and then generalize over all free monads (don't worry if you don't know what that is: we'll define it and give some warmup exercises).
Experience writing in continuation-passing style may be useful, although in practice this amounts to "listen to the types!"
Solutions and more commentary may be found in Janis Voigtlander's paper "Asymptotic Improvement of Computations over Free Monads."
To read more, see Edward Kmett's excellent article which further generalizes this concept:
If there is a demand, I can add a hints section for the exercises.
{-# LANGUAGE Rank2Types, MultiParamTypeClasses, FlexibleInstances #-}
import Prelude hiding (abs)
_EXERCISE_ = undefined
-----------------------------------------------------------------------------
-- Warmup: Hughes lists
-----------------------------------------------------------------------------
-- Experienced Haskellers should feel free to skip this section.
-- We first consider the problem of left-associative list append. In
-- order to see the difficulty, we will hand-evaluate a lazy language.
-- For the sake of being as mechanical as possible, here are the
-- operational semantics, where e1, e2 are expressions and x is a
-- variable, and e1[e2/x] is replace all instances of x in e1 with e2.
--
-- e1 ==> e1'
-- ---------------------
-- e1 e2 ==> e1' e2
--
-- (\x -> e1[x]) e2 ==> e1[e2/x]
--
-- For reference, the definition of append is as follows:
--
-- a ++ b = foldr (:) b a
--
-- Assume that, on forcing a saturated foldr, its third argument is
-- forced, as follows:
--
-- e1 ==> e1'
-- -----------------------------------
-- foldr f e2 e1 ==> foldr f e2 e1'
--
-- foldr f e2 (x:xs) ==> f x (foldr f e2 xs)
--
-- Hand evaluate this implementation by forcing the head constructor,
-- assuming 'as' is not null:
listsample as bs cs = (as ++ bs) ++ cs
-- Solution:
--
-- (as ++ bs) ++ cs
-- = foldr (:) cs (as ++ bs)
-- = foldr (:) cs (foldr (:) bs as)
-- = foldr (:) cs (foldr (:) bs (a:as'))
-- = foldr (:) cs (a : foldr (:) b as')
-- = a : foldr (:) cs (foldr (:) bs as')
--
-- Convince yourself that this takes linear time per append, and that
-- processing each element of the resulting tail of the list will also
-- take linear time.
-- We now present Hughes lists:
type Hughes a = [a] -> [a]
listrep :: Hughes a -> [a]
listrep = _EXERCISE_
append :: Hughes a -> Hughes a -> Hughes a
append = _EXERCISE_
-- Now, hand evaluate your implementation on this sample, assuming all
-- arguments are saturated.
listsample' a b c = listrep (append (append a b) c)
-- Solution:
--
-- listrep (append (append a b) c)
-- = (\l -> l []) (append (append a b) c)
-- = (append (append a b) c) []
-- = (\z -> (append a b) (c z)) []
-- = (append a b) (c [])
-- = (\z -> a (b z)) (c [])
-- = a (b (c []))
--
-- Convince yourself that the result requires only constant time per
-- element, assuming a, b and c are of the form (\z -> a1:a2:...:z).
-- Notice the left-associativity has been converted into
-- right-associative function application.
-- The codensity transformation operates on similar principles. This
-- ends the warmup.
-----------------------------------------------------------------------------
-- Case for leafy trees
-----------------------------------------------------------------------------
-- Some simple definitions of trees
data Tree a = Leaf a | Node (Tree a) (Tree a)
-- Here is the obvious monad definition for trees, where each leaf
-- is substituted with a new tree.
instance Monad Tree where
return = Leaf
Leaf a >>= f = f a
Node l r >>= f = Node (l >>= f) (r >>= f)
-- You should convince yourself of the performance problem with this
-- code by considering what happens if you force it to normal form.
sample = (Leaf 0 >>= f) >>= f
where f n = Node (Leaf (n + 1)) (Leaf (n + 1))
-- Let's fix this problem. Now abstract over the /leaves/ of the tree
newtype CTree a = CTree { unCTree :: forall r. (a -> Tree r) -> Tree r }
-- Please write functions which witness the isomorphism between the
-- abstract and concrete versions of trees.
treerep :: Tree a -> CTree a
treerep = _EXERCISE_
treeabs :: CTree a -> Tree a
treeabs = _EXERCISE_
-- How do you construct a node in the case of the abstract version?
-- It is trivial for concrete trees.
class Monad m => TreeLike m where
node :: m a -> m a -> m a
leaf :: a -> m a
leaf = return
instance TreeLike Tree where
node = Node
instance TreeLike CTree where
node = _EXERCISE_
-- As they are isomorphic, the monad instance carries over too. Don't
-- use rep/abs in your implementation.
instance Monad CTree where
return = _EXERCISE_
(>>=) = _EXERCISE_ -- try explicitly writing out the types of the arguments
-- We now gain efficiency by operating on the /abstracted/ version as
-- opposed to the ordinary one.
treeimprove :: (forall m. TreeLike m => m a) -> Tree a
treeimprove m = treeabs m
-- You should convince yourself of the efficiency of this code.
-- Remember that expressions inside lambda abstraction don't evaluate
-- until the lambda is applied.
Tample' = treeabs ((leaf 0 >>= f) >>= f)
where f n = node (leaf (n + 1)) (leaf (n + 1))
-----------------------------------------------------------------------------
-- General case
-----------------------------------------------------------------------------
-- Basic properties about free monads
data Free f a = Return a | Wrap (f (Free f a))
instance Functor f => Monad (Free f) where
return = _EXERCISE_
(>>=) = _EXERCISE_ -- tricky!
-- Leafy trees are a special case, with F as the functor. Please write
-- functions which witness this isomorphism.
data F a = N a a
freeFToTree :: Free F a -> Tree a
freeFToTree = _EXERCISE_
treeToFreeF :: Tree a -> Free F a
treeToFreeF = _EXERCISE_
-- We now define an abstract version of arbitrary monads, analogous to
-- abstracted trees. Witness an isomorphism.
newtype C m a = C { unC :: forall r. (a -> m r) -> m r }
rep :: Monad m => m a -> C m a
rep = _EXERCISE_
abs :: Monad m => C m a -> m a
abs = _EXERCISE_
-- Implement the monad instance from scratch, without rep/abs.
instance Monad (C m) where
return = _EXERCISE_
(>>=) = _EXERCISE_ -- also tricky; if you get stuck, look at the
-- implementation for CTrees
-- By analogy of TreeLike for free monads, this typeclass allows
-- the construction of non-Return values.
class (Functor f, Monad m) => FreeLike f m where
wrap :: f (m a) -> m a
instance Functor f => FreeLike f (Free f) where
wrap = Wrap
instance FreeLike f m => FreeLike f (C m) where
-- Toughest one of the bunch. Remember that you have 'wrap' available for the
-- inner type as well as functor and monad instances.
wrap = _EXERCISE_
-- And for our fruits, we now have a fully abstract improver!
improve :: Functor f => (forall m. FreeLike f m => m a) -> Free f a
improve m = abs m
-- Bonus: Why is the universal quantification over 'r' needed? What if
-- we wrote C r m a = ...? Try copypasting your definitions for that
-- case.
by Edward Z. Yang at January 07, 2012 08:00 AM
There are two primary reasons why the low-level implementations of iteratees, enumerators and enumeratees tend to be hard to understand: purely functional implementation and inversion of control. The strangeness of these features is further exacerbated by the fact that users are encouraged to think of iteratees as sinks, enumerators as sources, and enumeratees as transformers. This intuition works well for clients of iteratee libraries but confuses people interested in digging into the internals.
In this article, I’d like to explain the strangeness imposed by the purely functional implementation by comparing it to an implementation you might see in a traditional, imperative, object-oriented language. We’ll see that concepts which are obvious and easy in an imperative setting are less-obvious but only slightly harder in a purely functional setting.
The following discussion uses nomenclature from the enumerator library, since at the time of the writing it seems to be the most popular implementation of iteratees currently in use.
The fundamental entity behind an iteratee is the Step. The usual intuition is that is represents the “state” of an iteratee, which is either done or waiting for more input. But we’ve cautioned against excessive reliance on metaphors, so let’s look at the types instead:
data Step a b = Continue (Stream a -> m (Step a b)) | Yield b type Iteratee a b = m (Step a b) type Enumerator a b = Step a b -> m (Step a b) type Enumeratee o a b = Step a b -> Step o (Step a b)
I have made some extremely important simplifications from the enumerator library, most of important of which is explicitly writing out the Step data type where we would have seen an Iteratee instead and making Enumeratee a pure function. The goal of the next three sections is to explain what each of these type signatures means; we’ll do this by analogy to the imperative equivalents of iteratees. The imperative programs should feel intuitive to most programmers, and the hope is that the pure encoding should only be a hop away from there.
We would like to design an object that is either waiting for input or finished with some result. The following might be a proposed interface:
interface Iteratee<A,B> {
void put(Stream<A>);
Maybe<B> result();
}
This implementation critically relies on the identity of an object of type Iteratee, which maintains this identity across arbitrary calls to put. For our purposes, we need to translate put :: IORef s -> Stream a -> IO () (first argument is the Iteratee) into a purely functional interface. Fortunately, it’s not too difficult to see how to do this if we understand how the State monad works: we replace the old type with put :: s -> Stream a -> s, which takes the original state of the iteratee (s = Step a b) and some input, and transforms it into a new state. The final definition put :: Step a b -> Stream a -> m (Step a b) also accomodates the fact that an iteratee may have some other side-effects when it receives data, but we are under no compulsion to use this monad instance; if we set it to the identity monad our iteratee has no side effects (StateT may be the more apt term here). In fact, this is precisely the accessor for the field in the Continue constructor.
We would like to design an object that takes an iteratee and feeds it input. It’s pretty simple, just a function which mutates its input:
void Enumerator(Iteratee<A,B>);
What does the type of an enumerator have to say on the matter?
type Enumerator a b = Step a b -> m (Step a b)
If we interpret this as a state transition function, it’s clear that an enumerator is a function that transforms an iteratee from one state to another, much like the put. Unlike the put, however, the enumerator takes no input from a stream and may possibly cause multiple state transitions: it’s a big step, with all of the intermediate states hidden from view.
The nature of this transformation is not specified, but a common interpretation is that the enumerator repeatedly feeds an input to the continuation in step. An execution might unfold to something like this:
-- s :: Step a b
-- x0, x1 :: Stream a
case s of
Yield r -> return (Yield r)
Continue k -> do
s' <- k x0
case s' of
Yield r -> return (Yield r)
Continue k -> do
s'' <- k x1
return s''
Notice that our type signature is not:
type Enumerator a b = Step a b -> m ()
as the imperative API might suggest. Such a function would manage to run the iteratee (and trigger any of its attendant side effects), but we’d lose the return result of the iteratee. This slight modification wouldn’t do either:
type Enumerator a b = Step a b -> m (Maybe b)
The problem here is that if the enumerator didn’t actually manage to finish running the iteratee, we’ve lost the end state of the iteratee (it was never returned!) This means you can’t concatenate enumerators together.
It should now be clear why I have unfolded all of the Iteratee definitions: in the enumerator library, the simple correspondence between enumerators and side-effectful state transformers is obscured by an unfortunate type signature:
type Enumerator a b = Step a b -> Iteratee a bOleg’s original treatment is much clearer on this matter, as he defines the steps themselves to be the iteratees.
At last, we are now prepared to tackle the most complicated structure, the enumeratee. Our imperative hat tells us a class like this might work:
interface Enumeratee<O,A,B> implements Iteratee<O,B> {
Enumeratee(Iteratee<A,B>);
bool done();
// inherited from Iteratee<O,B>
void put(Stream<O>);
Maybe<B> result();
}
Like our original Iteratee class, it sports a put and result operation, but upon construction it wraps another Iteratee: in this sense it is an adapter from elements of type O to elements of type A. A call to the outer put with an object of type O may result in zero, one or many calls to put with an object of type A on the inside Iteratee; the call to result is simply passed through. An Enumeratee may also decide that it is “done”, that is, it will never call put on the inner iteratee again; the done method may be useful for testing for this case.
Before we move on to the types, it’s worth reflecting what stateful objects are involved in this imperative formulation: they are the outer Enumeratee and the inner Iteratee. We need to maintain two, not one states. The imperative formulation naturally manages these for us (after all, we still have access to the inner iteratee even as the enumeratee is running), but we’ll have to manually arrange for this is in the purely functional implementation.
Here is the type for Enumeratee:
type Enumeratee o a b = Step a b -> Step o (Step a b)
It’s easy to see why the first argument is Step a b; this is the inner iteratee that we are wrapping around. It’s less easy to see why Step o (Step a b) is the correct return type. Since our imperative interface results in an object which implements the Iteratee<O,B> interface, we might be tempted to write this signature instead:
type Enumeratee o a b = Step a b -> Step o b
But remember; we need to keep track of two states! We have the outer state, but what of the inner one? In a situation similar reminiscent of our alternate universe Enumerator earlier, the state of the inner iteratee is lost forever. Perhaps this is not a big deal if this enumeratee is intended to be used for the rest of the input (i.e. done always returns false), but it is quite important if we need to stop using the Enumeratee and then continue operating on the stream Step a b.
By the design of iteratees, we can only get a result out of an iteratee once it finishes. This forces us to return the state in the second parameter, giving us the final type:
type Enumeratee o a b = Step a b -> Step o (Step a b)
“But wait!” you might say, “If the iteratee only returns a result at the very end, doesn’t this mean that the inner iteratee only gets updated at the end?” By the power of inversion of control, however, this is not the case: as the enumeratee receives values and updates its own state, it also executes and updates the internal iteratee. The intermediate inner states exist; they are simply not accessible to us. (This is in contrast to the imperative version, for which we can peek at the inner iteratee!)
Another good question is “Why does the enumerator library have an extra monad snuck in Enumeratee?”, i.e.
Step a b -> m (Step o (Step a b))My understanding is that the monad is unnecessary, but may be useful if your Enumeratee needs to be able to perform a side-effect prior to receiving any input, e.g. for initialization.
Unfortunately, I can’t claim very much novelty here: all of these topics are covered in Oleg’s notes. I hope, however, that this presentation with reference to the imperative analogues of iteratees makes the choice of types clearer.
There are some important implications of using this pure encoding, similar to the differences between using IORefs and using the state monad:
These assurances would not be possible in the case of simple mutable references. There is one important caveat, however, which is that while the pure component of an iteratee is easily reversed, we cannot take back any destructive side-effects performed in the monad. In the case of forking, this means any side-effects must be atomic; in the case of backtracking, we must be able to rollback side-effects. As far as I can tell, the art of writing iteratees that take advantage of this style is not well studied but, in my opinion, well worth investigating. I’ll close by noting that one of the theses behind the new conduits is that purity is not important for supporting most stream processing. In my opinion, the jury is still out.
by Edward Z. Yang at January 04, 2012 01:00 PM
For the first six months after joining Stripe, I slept on a mattress in the entryway bedroom of our five-bedroom house. We were getting a great bargain on rent, since the landlord was under the mistaken impression it was only a two-bedroom house. But what she didn’t realize was that room with the refrigerator in it is a perfectly valid bedroom. And that space where one might be tempted to put a dining room table or something equally wasteful also made a perfect living space. My room moonlighted as both a vestibule and a living room, which meant that people were always transiting through it, but having grown up next to train tracks I’ve developed the ability to sleep through arbitrary amounts of noise so it wasn’t a big deal. Admittedly, there wasn’t much space in the house to do anything but sleep, but that was all we wanted to use it for in any case.
The vast majority of my waking hours were spent at the office. I’d typically show up at 10 in the morning and leave at 2 (also in the morning). On days when I was feeling lazy and wanted to leave early, I’d usually head out at midnight.
I was loving every moment of this. At last, I was surrounded by exactly the kinds of people I wanted to be surrounded by, all the time. I was doing the kind of work I had been dreaming of doing. And best of all, it all felt like it was building up to some higher purpose, and that every incremental hour I put into Stripe was meaningfully increasing our probability of success.
My first day at work was spent being credentialed. Patrick created user accounts for me on the various third-party services we were using and gave me access on our servers. One thing I noticed was that the shared passwords for some of these third-party services were kept in cleartext on one of these servers. The security nerd in me cried out in protest, and I spent the next day building a service, password-vault, for encrypting shared passwords to people’s GPG keys on the client side and storing the encrypted password on the server. We started using password-vault immediately, and it’s still in use today. It was amazing to me that I could go from identifying a problem needing solving to having an adopted solution in so little time.
I worked in an analogous way on a day-to-day basis. There were a number of systems that clearly needed building, our server architecture needed some serious love, and we did not have T-shirts. Each day, I would decide the order of priority of these problems, and I’d choose when and how they would be solved. Sometimes another Stripe would push a task onto my plate, if it pertained to my area of interest or expertise, and sometimes I’d push a task onto theirs. On the whole, we operated as a decentralized network of independent nodes, with loose central coordination as we pushed for a single higher-level goal.
I think this workflow and work dynamic are really easy to have in a small startup, but they are just as easy to lose over time. As a company grows, communication becomes harder, and even knowing what other people are working on becomes a challenge. A lot of companies try to solve this by pigeonholing their employees into narrower roles, with direction and oversight from a manager. We’ve opted for a different approach, instead continuing to hire people who work well as independent nodes, and to push really hard to make sure people are communicating and have a clear, shared vision of where we need to go.
We don’t all live in the same house anymore, but we try to maintain the same feeling of closeness and camaraderie. One of the standards we apply when interviewing a potential Stripe employee is whether this person is someone who, if you knew he or she was alone in the office, would make you want to go to the office just to be around him or her.
I still spend basically every waking moment at the office working on Stripe. I’ll take a weekend every so often to hack on a side project. But I’ve worked hard to make sure that Stripe remains the most enjoyable and incrementally useful application of my time, and so my days remain happily ensconced in working on my latest project for Stripe. As we continue to grow, I think we’ll have to continue to work hard to maintain our culture and all the properties that make Stripe an awesome place to work, but I also am confident that being cognizant of this challenge is half the battle.
by gdb at December 19, 2011 05:56 PM

Do you remember your first computer program? When you had finished writing it, what was the first thing you did? You did the simplest possible test: you ran it.
As programs increase in size, so do the amount of possible tests. It’s worth considering which tests we actually end up running: imagine the children’s game Battleship, where the ocean is the space of all possible program executions, the battleships are the bugs that you are looking for, and each individual missile you fire is a test you run (white if the test passes, red if the test fails.) You don’t have infinite missiles, so you have to decide where you are going to send them.

In the case of “your first computer program,” the answer seems pretty obvious: there’s only one way to run the program, only a few cases to test.
But this fantasy is quickly blown away by an encounter with real software. Even if your program has no inputs, hardware, operating system, development environment, and other environmental factors immediately increase the space of tests. Add explicit inputs and nondeterminism to the application, and you’re looking at the difference between a swimming pool and an ocean.

How do we decide what to test? What is our strategy—where do we send more missiles, where do we send less? Different testing strategies result in different distributions of tests on the space of all possible executions. Even though we may not be thinking about the distribution of test cases when we write up tests or run the whole system in an integration test, different test strategies result in different coverage.
For example, you might decide not to do any tests, and rely on your users to give you bug reports. The result is that you will end up with high coverage in frequently used areas of your application, and much less coverage in the rarely used areas. In some sense, this is an optimal strategy when you have a large user base willing to tolerate failure—though anyone who has run into bugs using software in unusual circumstances might disagree!

There is a different idea behind regression testing, where you add an automatic test for any bug that occurred in the past. Instead of focusing coverage on frequently used area, a regression test suite will end up concentrated on “tricky” areas of the application, the areas where the most bugs have been found in the past. The hypothesis behind this strategy is that regions of code that historically had bugs are more likely to have bugs in the future.

You might even have some a priori hypotheses about where bugs in applications occur; maybe you think that boundary cases in the application are most likely to have bugs. Then you might reasonable focus your testing efforts on those areas on the outset.

Other testing strategies might focus specifically on the distribution of tests. This is especially important when you are concerned about worst-case behavior (e.g. security vulnerabilities) as opposed to average-case behavior (ordinary bugs.) Fuzz testing, for example, involves randomly spattering the test space without any regard to such things as usage frequency: the result is that you get a lot more distribution on areas that are rarely used and don’t have many discovered bugs.

You might notice, however, that while fuzz testing changes the distribution of tests, it doesn’t give any guarantees. In order to guarantee that there aren’t any bugs, you’d have to test every single input, which in modern software engineering practice is impossible. Actually, there is a very neat piece of technology called the model checker, designed specifically with all manner of tricks for speed to do this kind of exhaustive testing. For limited state spaces, anyway—there are also more recent research projects (e.g. Alloy) which perform this exhaustive testing, but only up to a certain depth.

Model checkers are “dumb” in some sense, in that they don’t really understand what the program is trying to do. Another approach we might take is to take advantage of the fact that we know how our program works, in order to pick a few, very carefully designed test inputs, which “generalize” to cover the entire test space. (We’ll make this more precise shortly.)

The diagram above is a bit misleading, however: test-cases rarely generalize that readily. One might even say that the ability to generalize behavior of specific tests to the behavior of the program is precisely what distinguishes a good program from a bad one. A bad program is filled with many, many different cases, all of which must be tested individually in order to achieve assurance. A good program is economical in its cases, it tries to be as complex as the problem it tries to solve, and no more.

What does it mean to say that a test-case generalizes? My personal belief is that chunks of the test input space which are said to be equivalent to each other correspond to a single case, part of a larger mathematical proof, which can be argued in a self-contained fashion. When you decompose a complicated program into parts in order to explain what it does, each of those parts should correspond to an equivalence partition of the program.
The corollary of this belief is that good programs are easy to prove correct.

This is a long way from “running the program to see if it works.” But I do think this is a necessary transition for any software engineer interested in making correct and reliable software (regardless of whether or not they use any of the academic tools like model checkers and theorem provers which take advantage of this way of thinking.) At the end of the day, you will still need to write tests. But if you understand the underlying theory behind the distributions of tests you are constructing, you will be much more effective.
Postscript. The relationship between type checking and testing is frequently misunderstood. I think this diagram sums up the relationship well:

Types eliminate certain regions of bugs and fail to affect others. The idea behind dependent types is to increase these borders until they cover all of the space, but the benefits are very tangible even if you only manage to manage a subset of the test space.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
by Edward Z. Yang at December 19, 2011 04:04 PM
An “easy”, two-step process:
Hope this helps someone else. In case you were wondering why I was building glibc, it's because I was reporting these two bugs in iconv:
by Edward Z. Yang at December 18, 2011 11:03 PM
For the final project in our theoretical computer science and philosophy class taught by Scott Aaronson, Karen Sittig and I decided to create an interactive demonstration of zero-knowledge proofs. (Sorry, the picture below is not clickable.)

For the actually interactive demonstration, click here: http://web.mit.edu/~ezyang/Public/graph/svg.html (you will need a recent version of Firefox or Chrome, since we did our rendering with SVG.)
by Edward Z. Yang at December 17, 2011 06:56 PM
Someone recently asked on haskell-beginners how to access an lazy (and potentially infinite) data structure in C. I failed to find some example code on how to do this, so I wrote some myself. May this help you in your C calling Haskell endeavours!
The main file Main.hs:
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types
import Foreign.StablePtr
import Control.Monad
lazy :: [CInt]
lazy = [1..]
main = do
pLazy <- newStablePtr lazy
test pLazy -- we let C deallocate the stable pointer with cfree
chead = liftM head . deRefStablePtr
ctail = newStablePtr . tail <=< deRefStablePtr
cfree = freeStablePtr
foreign import ccall test :: StablePtr [CInt] -> IO ()
foreign export ccall chead :: StablePtr [CInt] -> IO CInt
foreign export ccall ctail :: StablePtr [CInt] -> IO (StablePtr [CInt])
foreign export ccall cfree :: StablePtr a -> IO ()
The C file export.c:
#include <HsFFI.h>
#include <stdio.h>
#include "Main_stub.h"
void test(HsStablePtr l1) {
int x = chead(l1);
printf("first = %d\n", x);
HsStablePtr l2 = ctail(l1);
int y = chead(l2);
printf("second = %d\n", y);
cfree(l2);
cfree(l1);
}
And a simple Cabal file to build it all:
Name: export Version: 0.1 Cabal-version: >=1.2 Build-type: Simple Executable export Main-is: Main.hs Build-depends: base C-sources: export.c
Happy hacking!
by Edward Z. Yang at December 15, 2011 10:18 PM
In Thunderbird, if you read a message on a mailing list and want to reference a post in a blog or elsewhere, you may often want to access a copy of the mailing list posting on the web. While you can often accomplish this by searching for the subject or manually finding it in the specific archives of that list, if you’re using full mail headers, there’s a built-in way to find a message on the web.
Each email or Usenet message has a unique Message-ID. These are indexed at various providers which archive mailing lists, with Gmane being the most notable in the Free Software community.
If you right-click on the Message-ID of a message in Thunderbird, you can choose to “Open Browser with Message-ID”. By default this menu item opens sensible-browser with a Google Groups page, which, while indexing of Usenet, remains ignorant about most mailing lists. Most people reading this will probably find that Gmane carries their favourite mailing lists, while Google does not.
This is configurable, but sadly you have to choose one or another. In Thunderbird, go to Edit → Preferences → Advanced → Config Editor, and set the “mailnews.messageid_browser.url” property to http://mid.gmane.org/%mid. (default of http://groups.google.com/groups?selm=%mid&rnum=1)
by Luke Faraone at December 14, 2011 11:33 PM
I made the decision to drop out of MIT on a Thursday.
I spent most of that Friday meeting with my professors. I wanted to explain to them why they weren’t going to be seeing me in classes any more, and for those whom I knew well to say goodbye. Their reactions were mixed, but all were very heartfelt. My advisor thought I was making a huge mistake, that now was the time for me to be in academia, and that I could always go do a startup in the future. My AI professor was enthusiastically supportive, commenting that now is the time for me to be learning and exploring and that a startup would be a great place to do that. Another of my professors told me about his own experiences, both good and bad, with startups he’d founded; he gave me advice about what to watch out for. I also met with a dean at Student Support Services, who actually submitted my request to go on a leave of absence.
Saturday was the day I packed up my life. I grabbed some boxes from the UPS store, and within the next eight hours there was no sign I had even inhabited my room. For some reason I happen to have a picture of me with said boxes:
Sunday I spent hanging out with the friends I was going to be leaving behind.
And then on Monday, I was on a plane to California.
It struck me as insane how I was able to so completely alter the course of my life in such a short window. Within the span of four days, I had transitioned from being a MIT student, thinking of little more than how I was going to complete the next problem set, to working full time on a startup in Silicon Valley. Usually lives are much less malleable than that. Going abroad after high school, starting college, and transferring to MIT were all major life events that had required months of advance planning. But somehow I managed to pull off the transition smoothly.
Looking back, I’m still quite surprised that I left school at all. I had always pictured myself as the kind of person who wanted absolute certainty in decisions, who would try to gather all available data and spend weeks pondering it before making a decision. But the timescales here had been far too small to make a fully rational decision, and I had made my decision off of the sense that the others at /dev/payments were exactly the kinds of people I had been trying for years to find. For the first time in my life, I had explicitly made a choice based off of gut rather than rationality.
Since then, I think my views on gut instinct have changed significantly. My decision to join Stripe (which is what we renamed /dev/payments to) turned out to be the best decision I’ve made thus far in my life. Looking back at the other life choices I’ve made, I’ve realized that for the most part the outcome after all my thought and information gathering matched my initial gut reactions. I’ve also been running recruiting at Stripe for a while now, and I’ve noticed that my initial feelings about a candidate tend to translate extremely well into whether they end up being a good fit for us or not.
To some extent, this mentality is necessary in a startup. Being able to rapidly make decisions and act quickly is perhaps the chief advantage that you have by being a startup. Knowing when to tune out the desire for full rationality and instead trust one’s instincts is a skill that I think I’m still honing, but all evidence seem to indicate that it’s well worth having.
by gdb at December 12, 2011 12:04 AM
Fall 2010 marked the beginning of my junior year and second semester at MIT. At the end of the summer, I had sat down and planned my goals for the next few years. I had a breakdown for the next semester, the next year, and the next few after that as well, and I had compiled the results in a notes file. (This codification of who I am and what I am trying to accomplish turned out to be hugely beneficial, and since then I’ve made sure to do the same every six months or so.)
My short-term plan was to start working on a research project, start getting more involved in major open-source projects, and build more side projects. On the whole, I wrote, “I think the plan is still to get as good as I can at the things that I do.” I wanted to learn, but I had recognized that class was not the place that I was going to learn how to build.
In the longer term, I wanted to do a startup. I figured it’d be a long time before I was confident enough in my abilities that it would make sense to do so. And so in the meanwhile, I was going to finish up my undergraduate, do a masters at MIT, and then start grad school, where hopefully I would come up with a cool technology to build a company around.
I began my preparations for the semester. I came up with an idea for a research project that I was extremely excited about (static buffer overrun detection), and I recruited three of my friends to work on this with me. And I was going through the course catalog picking out classes for the next semester, I was overjoyed when I noticed a startup class (entitled “Founder’s Journey”). I immediately signed up for the class.
My computer science classes that semester were far better than the ones I had been taking previously, but still they left me hungering for more. I would typically work intensely for a few days finishing the assignments for the next few weeks, which left me a lot of time to work on my own projects.
My research project, unfortunately, never really got off the ground. My friends and I would read papers and meet weekly with a graduate student, but one by one my friends found that they had other obligations which ended up taking priority over the project. By the end of the first month, I was the only one left. I was disappointed, because I had really been looking forward to working with others towards building something awesome, but I continued onwards anyway.
The startup class also ended up falling short of my hopes. I had been expecting it to be a cookbook for how to start a company. I anticipated practical advice, such as here’s when you start talking to a lawyer and this is what you have to watch out for when taking investment. Instead, it was a much fuzzier class than I was looking for, talking a lot about entrepreneurial thinking and reasoning. After a few weeks, I came to the realization that even outside of computers, the things that I wanted to learn were not going to be taught to me in a classroom; I ended up dropping the course.
But I had not given up on learning the parts of doing a startup outside of writing code. I resolved to spend more time meeting people who were working on companies and to talk with them and how they were approaching problems. I also planned to try tracking how these approaches correlated with success of these people’s startups.
So when I received a Facebook message from a person doing a startup who wanted to meet up, I accepted. I had no intention of joining the guy’s company, but I wanted to get started with my practical learning about how to build a startup as soon as possible.
The next day, we met up in the MIT student center. He began to weave the story of the startup he was building. They were building a developer-focused payments company, called /dev/payments at the time. He showed me the product they had built thus far, and demonstrated running a curl request to create a sample payment. I really could not have been more underwhelmed. I didn’t really get it–why was building a curlable API an interesting company? I could build a curlable API from my dormroom.
And then he mentioned the investors who were involved–Peter Thiel, Elon Musk, Sequoia, Y Combinator, and a host others. I hadn’t heard of all the names he mentioned, but I’ll admit that the ones who I had heard of made me rescind my judgement that he was not working on anything interesting. I didn’t have any payments experience, but they had some well-reputed investors who clearly did. Presumably those investors would be able to distinguish just another API from something truly revolutionary. (Incidentally, I’ve since decided that this was wrong–investors are very good at evaluating team and market, but they don’t tend to judge based on product nearly as much as I assumed.)
When I returned to my room, I began searching for all of the information I could find on /dev/payments (which was not much; they had done a pretty good job of staying under the radar) and the founders (of which there was a considerable amount; they had already built and sold a previous company and had a lot of cool code on Github). I also pinged Corey, the only person I knew in the startup scene in Silicon Valley, and asked if he had heard of the company or the founders before. A few days later, he got back to me, saying that he had reached out into his network and heard a bunch of extremely positive things. As well, it turned out the founders and I had some mutual friends; they all confirmed the positivity.
It was at that point that I began seriously considering this startup as a potential place to work rather than just a guinea pig. I accepted an offer to fly out for a weekend. And within five minutes of stepping into their office, I was flattened by the realization that these were the exact kinds of people I wanted to work with. I can’t remember a word of the conversation we had, but I can recall very clearly the feeling of belonging it generated, like I had finally found my people.
Everything about the company seemed perfect. It was still unlaunched, though it had some beta customers who were loving the product. It had only one full-time employee (plus a part-time office manager), and I would be the first engineering hire. And best of all, rather than putting me up in a hotel for the weekend, I was invited to crash with the founders. My ideal picture of a startup had long been finding a group of people who I loved spending time around and focusing singlemindedly on solving a real problem. I wanted to spend my time living and breathing the problem, working and hanging out and existing with my coworkers, and generally letting loose the intensity had been lacking an outlet in school. And it was clear that this company was exactly that.
I went back to school in time for class on Monday and started thinking very hard. I consulted with tons of people. Many thought I was crazy to even think about leaving school for a startup. Many thought I was crazy to leave school for this startup (especially knowing so little about the product). But a few people realized what I was after and saw that I wasn’t finding it at MIT and thought I might not be completely insane.
By Thursday night, I was about ready to make a decision. I was leaning towards, but I wasn’t quite sure I was ready to make the leap. One of the founders happened to be in Boston, and we met up at the Asgard. We spent a long time chatting about the company, about life, about the universe, and really about everything. And I realized then that what really mattered to me and made me willing to take this leap was that it was clear that the people from /dev/payments were looking out for not just their best interests, but my own as well. And so, my fears allayed, I decided to put school on hold and enter the startup world.
by gdb at December 03, 2011 09:53 AM
Things I should be working on: graduate school personal statements.
What I actually spent the last five hours working on: transparent xmobar.

It uses the horrible “grab Pixmap from root X window” hack. You can grab the patch here but I haven’t put in enough effort to actually make this a configurable option; if you just compile that branch, you’ll get an xmobar that is at 100/255 transparency, tinted black. (The algorithm needs a bit of work to generalize over different tints properly; suggestions solicted!) Maybe someone else will cook up a more polished patch. (Someone should also drum up a more complete set of XRender bindings!)
This works rather nicely with trayer, which support near identical tint and transparency behavior. Trayer also is nice on Oneiric, because it sizes the new battery icon sensibly, whereas stalonetray doesn’t. If you’re wondering why the fonts look antialiased, that’s because I compiled with XFT support.
(And yes, apparently I have 101% battery capacity. Go me!)
Update. Feature has been prettified and made configurable. Adjust alpha in your config file: 0 is transparent, 255 is opaque. I’ve submitted a pull request.
by Edward Z. Yang at November 28, 2011 10:09 AM
My first semester at MIT was one of the most eye-opening experiences of my life. Having transferred from Harvard, I felt like I had just moved into either a playground or an alien planet (or possibly an alien playground). There was some amount of becoming acclimated to a new physical campus, a new way of registering for classes (which culminated in me wandering around the Sloan Business School for hours, trying to figure out why no one could find my registration), and a fresh set of faces. But those weren’t the interesting bits. For me, the truly fascinating part of being at MIT was finding all the parts of the college experience I hadn’t even imagined I’d been missing.
For example, I soon realized that each living group had its own persistent subculture, usually with tight alumni ties. One floor in my dorm was full of gamers, another populated by mathematicians. At East Campus, I found two halls with cultures that strongly appealed to me, and I started spending decent amounts of time hanging around those halls trying to passively experience what they had to offer. I avoided the West Campus dorms at all costs, remembering well the experience that had driven me away from MIT in the first place.
In the circles I frequented, geek culture was rampant, if not dominant. It was rare that a conversation did not include a computer science joke of some form. At parties people would shoot each other with nerf guns. I found that with very little effort, I could surround myself by computer scientists and spend all my time living and breathing tech.
My biggest dissatisfaction with the move was that I did not have a research project. Since first semester of freshman year, I had been planning on doing research, but for one reason or another there was always something in the way. Research had always struck me as a fascinating concept (“how could one spin knowledge out of nothing?”), and CS research all the more so (“they’ll let me publish a paper on a coding project I’m hacking on for fun?!?”), and it would also be necessary for me to get into a good graduate school. So I began speaking with professors and found one whom I really liked and who had a project that seemed quite interesting.
About this time, Jessica and I had SIPB chair and vice-chair initiation. This was nothing formal–really just an evening of hanging out with former chairs and vice-chairs and listening to their stories of the past few years. As they told the stories and dispensed advice on running the organization and its projects, I realized that these people were exactly the kinds of people I had transferred to MIT to learn from and surround myself with. I had already known from reputation that they were technically brilliant, but I hadn’t expected they’d be willing to pass on their knowledge, nor that I’d enjoy spending time with them quite so much.
As it turned out, these alumni were all working at a startup called Ksplice, which had been founded by a group of SIPB members a few years earlier. One of the Ksplicers had been trying to recruit me as an intern, but I had fended him off because between research and classes and becoming acclimated to MIT I knew I’d have no time to work at a startup. But after that evening, I knew I had to do whatever it took to work with them. Even if it meant giving up research for the semester–graduate school positioning could wait.
So I told my professor I would not in fact be able to work on the project, and I began working part-time at Ksplice. My initial project was a fancy testing framework for rebootless updates. Over the course of a month, I wrote it in a very magical way, having tons of library functions that would apply heuristics to their arguments to coerce them into the correct types, and I think I even worked in some uses of getattr().
As I submitted my work for review, I felt quite proud of myself. So it was a bit of a shock when my reviewer’s comments came back saying that my code was basically unusable. As I read his comments in detail, I was even more shocked to realize how right he was. He pointed out that my interfaces, by being so broad, made it basically impossible to tell what kinds of arguments they actually accept, much less what they are intended to do. And as well, he pointed out that I should clean up my commit history, which was something that I had not even known was possible. So I spent another week going back and cleaning up my code and revision history, polishing it until no signs of my previous design were visible.
The rest of my time at Ksplice proceeded in much the same manner. (I ended up staying through the summer.) Every time I submitted code for review, every time I interacted with a member of the team, I felt myself learning and gaining new perspective on what it means to write good code. It wasn’t always smooth sailing, but from a technical perspective it was exactly what I had been looking for.
I was at the same time working on projects, sometimes on my own and sometimes with other SIPB members. My largest project of the semester was Anygit, which attempted to reverse the Git object model and support queries mapping a SHA1 to all parent objects or repositories containing that SHA1. I spent many hours architecting and rearchitecting its components, trying to build it in a way that would scale to hold all of the Git objects in the world. I taught myself a lot of Git internals as well as researched various datastores, learned how to diagnose database performance issues, and just generally explored an ecosystem I’d never really touched before.
In comparison, my technical learning through classes was felt paltry. The classes I was taking were good, but they weren’t great. I would voraciously wolf down the material, and find myself disappointed that there was no more to consume. I was required to take the introductory computer science class, 6.01, and though I was fortunate enough to be able to convince the professor to let me get credit for being a Lab Assistant, I still found myself deeply unsatisfied with how I was spending my time academically.
On the whole though, I don’t think I had ever been happier. I had finally found a home, surrounded myself with the kinds of people I had been seeking, and was completely immersed in computer science. I was exploring a new environment, learning the physics of this new world, and at the same time learning more about myself. My one regret was that I still hadn’t picked up a research project, but that’s what next semester was for, right?
by gdb at November 26, 2011 09:12 AM
Someone wrote a Stripe plugin for WordPress. Stripe gets a lot of requests for a WordPress plugin, so I’m pretty excited about it. I’m writing this post to test it out:
by gdb at November 23, 2011 08:01 PM
Once I had made the decision to transfer to MIT, my planning turned from high level strategy to implementation details. First and foremost, I needed a place to live. The MIT housing office sent pointed me to an application form with four dropdowns for you to select your top four dorms. I put East Campus for all four of them, and I added a heartfelt note about how much I wanted to be in EC.
Secondly, I needed to decide how to spend my January. Harvard had just shifted their schedule so that finals were before winter break, adding a J-term that coincided with MIT’s IAP. I looked into doing on some CS research at Harvard, but I realized that all I wanted to with my J-term was start poking around MIT and get acclimated to my new home.
About this time, an email arrived on the hcs-jobs mailing list from an MIT professor looking for some students to build a prototype of his startup idea over IAP. It sounded perfect–I would get to spend time around MIT, I would get to know a professor, and I would do all this while working on a cool project. I sent in an extremely excited application email, and I was overjoyed when I found out that I had been hired.
Once I was fully committed to joining the physical community, I wanted to join the virtual one as well. I started hanging around Zephyr, an MIT-internal chat system. I was amazed at how people whom I had briefly met at SIPB meetings started subscribing to my personal class (kind of the analog of a Facebook wall) and there engaging in fascinating conversations, technical and otherwise.
My MIT housing didn’t kick in until the end of IAP, so my original IAP plan had been to stay in the Harvard dorms, where I had procured special permission to be allowed to remain on campus (due to budget constraints, Harvard was trying to kick everyone out of the dorms for J-term). The day before I was going to head home for Christmas break, the “J-Term Housing Committee” emailed me to say that, because I had no housing for the spring term, and because students were to stay in their spring-time housing, I was actually not allowed to stay in the dorms over J-term.
I began to panic. I thought about appealing the decision, but since I was about to head home, I had no time to do so. I brought up my conundrum on Zephyr, and I was inspired by the number of people who jumped in to say that I should swing by their living group, which would be happy to take me in over IAP. It was at that moment that I knew I had made the right decision in transferring. Harvard had chosen to kick me out even when it had agreed to let me stay, and the decision maker was the faceless “J-Term Housing Committee”. At MIT, my soon-to-be peers were the gatekeepers, and they were all so excited about their living groups that they wanted to show them off to a total stranger.
That night, I stopped by pika for dinner. The residents took me on a tour, and they told me they’d be in touch shortly about whether I could stay over IAP. And sure enough, a few days later, I received an email saying that I was welcome to do so.
My Christmas break was full of SIPB project work. Mitch, a maintainer of the scripts project, set me up with a project with which I found myself enthralled. Mitch taught me various technical skills, such as how to do Fedora packaging, and he was on the whole a font of knowledge and advice–exactly what I had been looking for at MIT. I also started hacking on XVM, trying to make its packages usable outside of MIT’s infrastructure.
Having found myself spending nearly every waking moment working on SIPB projects or Zephyring with members of the SIPB community, I knew that I had to run for an officer position in SIPB whenever the first opportunity to do so would be. I wanted to make a difference, to improve the community as much as it was already improving me, and to help other people join and become as involved as I. I started asking around and was told that officer elections were coming up at the end of January. The one problem–only full members were eligible to run, which would mean I’d have to be elected to membership before then.
Soon the beginning of January rolled around and I headed back to MIT. On weekdays, I’d work 8-10 hours at the professor’s office, go hang out in the SIPB office for another few hours, and then head back to pika. On weekends I got to spend the entire day at SIPB. I made a point of spending some time in the SIPB office every single day. I loved being there, meeting the people I had barely dared to dream existed, and the worst part of every day was when I had to leave the office and head back home.
After a few weeks, I was elected to full membership. I was thrilled–I could now run for office! The next week was officer elections. When nominations for chair were called, someone called out my name. Along with two other candidates, I went up to the front of the room and answered a barrage of questions about how I would run SIPB. We were sent to the hall while the votes were tallied, and Jessica (incidentally the person who had told me to check out pika) ended up winning. The next position was vice-chair. I was again nominated, and this time after the votes were tallied I was shocked to see that I had in fact won.
On this happy note, the next semester began. I had finally found the community I had been looking for. I was working on awesome projects, and once again there were mentors to teach and guide me along the way. And I was at the same time in a position to give back, to help cultivate and shape the community that was at the same time molding me. I had truly found paradise.
by gdb at November 21, 2011 08:41 AM
On Monday, November 21, 2011, all of the scripts.mit.edu servers will be upgraded from Fedora 13 to Fedora 15, which was released on May 25. We strongly encourage you to test your website as soon as possible, and to contact us at scripts@mit.edu or come to our office in W20-557 if you experience any problems. The easiest way to test your site is to run the following commands at an Athena workstation and then visit your website in the browser that opens, but see this page for more details and important information: athena% add scripts
athena% firefox-test
by Alexander Chernyakhovsky at November 06, 2011 02:00 AM
For those of you who do a lot of Debian upgrade path testing…
I got annoyed at equivs not supporting Breaks: lines at all (Debian bug #571638), so I wrote a small shell script to spit out the minimal debhelper 7 rules file-using package. You can then edit debian/control the way you’d edit an equivs control file and run debuild, or you can continue writing a package for actual software, since it’s a real package and not just an equivs control file.
I also got annoyed at apt not really being able to install packages files on local disk, so I wrote an even tinier shell script to add all packages (recursively) in the current directory to an apt repo and add it via the file protocol to sources.list. It turns out dpkg-scanpackages does this, it’s just easier to have the two-line shell script to remember for you what the command is and what the sources.list line should be.
newpackage and fakerepo are in my Athena home directory. See the comments at the top for usage instructions. I might find a better place for these at some point, but I don’t anticipate ever having to update them…
by geofft at August 16, 2011 01:01 AM
I’ve posted the final slides from my talk this year at DEFCON and Black Hat, on breaking out of the KVM Kernel Virtual Machine on Linux.
[Edited 2011-08-11] The code is now available. It should be fairly well-commented, and include links to everything you’ll need to get the exploit up and running in a local test environment, if you’re so inclined.
In addition, as I mentioned, this bug was found by a simple KVM fuzzer I wrote. I’m also going to clean that up and release it, but don’t expect it too soon.
I had a great time meeting lots of interesting people at BlackHat and DEFCON, some that I’d met online and others I hadn’t. If any of you are ever in Boston, drop me a note and we can grab a beer or something.
by nelhage at August 08, 2011 05:32 PM
Our whitelist of file types served directly to the web has for a long time included .doc, .xls, and .ppt. With the advent of new XML-based Microsoft Office formats, and with the popularity of LibreOffice and OpenOffice.org, there have been requests for whitelisting these additional file types. As of yesterday, the new Office XML filetypes — .docx, .xlsx, .pptx, etc. — as well as ODF file types — .odt, .ods, .odp, etc. — will also be served directly to the web.
In addition to files you place in your locker, this also affects files uploaded to your website via the standard upload feature of apps such as MediaWiki and WordPress.
The full list of whitelisted extensions is posted in our FAQ.
by geofft at June 09, 2011 04:24 AM
The answer to this was much less obvious than I expected it to be, so I figured I’d blog it. If you’re using Google’s ClientLogin API for turning a username and a password into an auth token, what do you do with the token? The result of POSTing to /accounts/ClientLogin gets you three values, and Google says to “reference” the Auth=foo line and discard the rest.
The answer is to send back a header of the form “Authorization: GoogleLogin auth=foo”. (That is to say, your authentication token doesn’t go anywhere in the POST data, which is what I tried for a while…)
Incidentally, I’m doing this because I’m playing with Google Cloud Print. I successfully created a cloud printer printed a PDF from my phone to, uh, a shell wget command. More on that later.
by geofft at May 13, 2011 06:41 AM
If you program in Python, you’re probably familiar with the
pickle serialization library, which provides for efficient
binary serialization and loading of Python datatypes. Hopefully,
you’re also familiar with the warning printed prominently near the
start of pickle‘s documentation:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
Recently, however, I stumbled upon a project that was accepting and unpacking untrusted pickles over the network, and a poll of some friends revealed that few of them were aware of just how easy it is to exploit a service that does this. As such, this blog post will describe exactly how trivial it is to exploit such a service, using a simplified version of the code I recently encountered as an example. Nothing in here is novel, but it’s interesting if you haven’t seen it.
The vulnerable code was a Twisted server that listened over SSL. The code looked roughly like the following:
class VulnerableProtocol(protocol.Protocol): def dataReceived(self, data): # Code to actually parse incoming data according to an # internal state machine # If we just finished receiving headers, call verifyAuth() to check authentication def verifyAuth(self, headers): try: token = cPickle.loads(base64.b64decode(headers['AuthToken'])) if not check_hmac(token['signature'], token['data'], getSecretKey()): raise AuthenticationFailed self.secure_data = token['data'] except: raise AuthenticationFailed
So, if we just send a request that looks something like:
AuthToken: <pickle here>
The server will happily unpickle it.
So, what can we do with that? Well, pickle is supposed to allow us
to represent arbitrary objects. An obvious target is Python’s
subprocess.Popen objects — if we can trick the target
into instantiating one of those, they’ll be executing arbitrary
commands for us! To generate such a pickle, however, we can’t just
create a Popen object and pickle it; For various mostly-obvious
reasons, that won’t work. We could read up on the “pickle” format and
construct a stream by hand, but it turns out there is no need to.
pickle allows arbitrary objects to declare how they should be
pickled by defining a __reduce__ method, which should
return either a string or a tuple describing how to reconstruct this
object on unpacking. In the simplest form, that tuple should just
contain
pickle will pickle each of these pieces separately, and then on
unpickling, will call the callable on the provided arguments to
construct the new object.
And so, we can construct a pickle that, when un-pickled, will execute
/bin/sh, as follows:
import cPickle import subprocess import base64 class RunBinSh(object): def __reduce__(self): return (subprocess.Popen, (('/bin/sh',),)) print base64.b64encode(cPickle.dumps(RunBinSh()))
At this point, we’ve basically won. We can run arbitrary shell commands on the target, and there are any number of ways we could bootstrap from here up to an interactive shell and whatever else we might want.
For completeness, I’ll explain what I did, since it’s a moderately
cute trick. subprocess.Popen lets us select which file descriptors
to attach to stdin, stdout, and stderr for the new process by passing
integers for the stdin and similarly-named arguments, so we can open
our /bin/sh on arbitrarily-numbered fd’s.
However, as mentioned above, the target server uses Twisted, and it serves all requests in the same thread, using an asynchronous event-driven model. This means we can’t necessarily predict which file descriptor on the server will correspond to our socket, since it depends on how many other clients are connected.
It also means, however, that every time we connect to the server, we’ll open a new socket inside the same server process. So, let’s guess that the server has fewer than, say, 20 concurrent connections at the moment. If we connect to the server’s socket 20 times, that will open 20 new file descriptors in the server. Since they’ll get assigned sequentially, one of them will almost certainly be fd 20. Then, we can generate a pickle like so, and send it over:
import cPickle import subprocess import base64 class Exploit(object): def __reduce__(self): fd = 20 return (subprocess.Popen, (('/bin/sh',), # args 0, # bufsize None, # executable fd, fd, fd # std{in,out,err} )) print base64.b64encode(cPickle.dumps(Exploit()))
We’ll open a /bin/sh on fd 20, which should be one of our 20
connections, and if all goes well, we’ll see a prompt printed to one
of those. We’ll send some junk on that fd until we manage to get the
original server to error out and close the connection, and we’ll be
left talking to /bin/sh over a socket. Game over.
Again, nothing here should be novel, nor would I expect any of these
pieces to take a competent hacker more than few minutes to figure out,
given the problem. But if this blog post teaches someone not to use
pickle on untrusted data, then it will be worth it.
by nelhage at March 20, 2011 10:38 PM
reptyr (announced recently on this blog) takes a
process that is currently running in one terminal, and transplants it
to a new terminal. reptyr comes from a proud family of similar
hacks, and works in the same basic way: We use ptrace(2)
to attach to a target process and force it to execute code of our own
choosing, in order to open the new terminal, and dup2(2) it over
stdout and stderr.
The main special feature of reptyr is that it actually changes the
controlling terminal of the target process. The “controlling terminal”
is a concept maintained by UNIX operating systems that is independent
of a process’s file descriptors. The controlling terminal governs
details like where ^C gets delivered, and how applications are
notified of changes in window size.
Processes are grouped into two levels of hierarchical groups: sessions, and process groups. Each group is named by an ID, which is the PID of the initial leader (either “session leader” or “process group leader”). Even if the leader exits, that number is still the ID for the group. Sessions are used for terminal management — Every process in a session has the same controlling terminal, and each terminal belongs to at most one session. Process groups are a sub-division within sessions, and are used primarily for job control within the shell. For a more in-depth explanation, see part 3 of my earlier series on termios.
If you check out tty_ioctl(4), you’ll find that Linux has an
ioctl, TIOCSCTTY, that can be used to set the controlling terminal
of a process, and you could be forgiven for thinking that all we need
is to make the target call that ioctl, and we’re done.
However, if we read closer, we find that it has several restrictions. In particular:
The calling process must be a session leader and not have a controlling terminal already. If this terminal is already the controlling terminal of a different session group then the ioctl fails with EPERM […]
In the typical case, where I’m trying to attach a (say) mutt that
you spawned from your shell, mutt won’t be a session leader — your
shell will be the session leader, and mutt will be the process group
leader for a process group containing only itself.
So, we need to make the target a session leader. Conveniently, there’s
a system call for that: setsid(2).
However, reading that man page, we find a new caveat: setsid(2)
fails with EPERM if
The process group ID of any process equals the PID of the calling process. Thus, in particular, setsid() fails if the calling process is already a process group leader.
The shell creates a new process group for every job you launch, and so
our target mutt will be a process group leader, and unable to
setsid(). The usual solution for programs that want to setsid is to
fork(), so that the child is still in the parent’s session and
process group, and then setsid() in the child. However, fork()ing
our mutt and killing off the parent seems potentially disruptive, so
let’s see if we can avoid that.
So, we’re going to need to change mutt‘s process group ID, so that
there are no processes with process group IDs equal to its
PID. Following some trusty SEE ALSO links, we get to
setpgid(2). There’s a bunch of text in that man page, but the key
bit is:
If setpgid() is used to move a process from one process group to another, both process groups must be part of the same session (see setsid(2) and credentials(7)). In this case, the pgid specifies an existing process group to be joined and the session ID of that group must match the session ID of the joining process.
We need to find a process group in the same session as mutt to move
our mutt into, and then we’ll be able to setsid. We could try to
find one — the shell is a plausible candidate, for instance — but
there’s an alternate, more direct route: Create one.
While we have mutt captured with ptrace, we can make it fork(2)
a dummy child, and start tracing that child, too. We’ll make the child
setpgid to make it into its own process group, and then get mutt
to setpgid itself into the child’s process group. mutt can then
setsid, moving into a new session, and now, as a session leader, we
can finally ioctl(TIOCSCTTY) on the new terminal, and we win.
It turns out I didn’t invent this technique — injcode and neercs work the same way. But I did discover it independently of them, and it was a fun little hunt through unix arcana.
by nelhage at February 09, 2011 03:06 AM
Over the last week, I’ve written a nifty tool that I call reptyr. reptyr is a utility for taking an existing running program and attaching it to a new terminal. Started a long-running process over ssh, but have to leave and don’t want to interrupt it? Just start a screen, use reptyr to grab it, and then kill the ssh session and head on home.
You can grab the source, or read on for some more details.
There’s a shell script called screenify that’s been going
around the internet for nigh on 10 years now that is supposed to use
gdb to accomplish the same thing. There’s also a project called
retty that tries to do the same thing, in C using ptrace()
directly.
The difference between those programs and reptyr is that reptyr works much, much, better.
If you attach a less using screenify or retty, it will still take
input from the old terminal. If you attach an ncurses program, and
resize the window, the program probably won’t resize correctly. ^C
and ^Z will still be processed on the old terminal — typing them in
the new terminal won’t do anything useful.
reptyr fixes all of these problems and more, and is the only such tool I know of that does so. I’ve never seen a program that doesn’t behave noticeably incorrectly after attaching with retty or screenify, whereas with reptyr most programs I have tried work flawlessly.
reptyr works in the same basic way as screenify and retty — it
attaches to the target process using the ptrace API, opens the new
terminal, and dup2s it over the old file descriptors. It also copies
the termios settings from the old terminal to the new terminal.
The main thing that reptyr does that no one else does is that it
actually changes the controlling terminal of the process you are
attaching. This is the detail that makes many things Just Work,
including ^C and ^Z and window resizing.
Switching the target’s controlling terminal is not easy and involves a
fair bit of trickery with ptrace and Linux’s terminal APIs. I will
probably do another blog post some time about the dirty details of how
I make this work, but for now you can check out
attach.c if
you really want to know.
reptyr still has a number of limitations — it doesn’t generally work, for example, if the target process has any children. I know how to fix most of these problems, though, so expect it to get better with time. Please let me know if you find it useful!
(Edited to add:) Nothing is really new. A commenter on reddit pointed out that injcode
and neercs both accomplish the same thing, even using the same trick
to change the CTTY. Ah well, I had run writing it anyways, and apparently I
wasn’t the only one who didn’t know about the existing alternatives. neercs is a full screen replacement, though, and I think that reptyr should be more robust than injcode — I use a different techique for ptrace-hijacking, for example — and so hopefully this tool still has a niche as a more robust standalone utility. Certainly, judging from the amount of enthusiasm I’ve seen for this tool, this still isn’t a problem that is solved to the average user’s satisfaction.
by nelhage at January 22, 2011 01:56 AM
I’ve spent a lot of time this last week staring at decompiled Dalvik assembly. In the process, I created a couple of useful tools that I figure are worth sharing.
I’ve been using dedexer instead of baksmali, honestly mainly because the former’s output has fewer blank lines and so is more readable on my netbook’s screen. Thus, these tools are designed to work with the output of dedexer, but the formats are simple enough that they should be easily portable to smali, if that’s your tool of choice (And it does look like a better tool overall, from what I can see).
ddx.elI’m an emacs junkie, and I can’t stand it when I have to work with a
file that doesn’t have an emacs mode. So, a day into staring at
un-highlighted .ddx files in fundamental-mode, I broke down and
threw together ddx-mode. It’s fairly minimal, but it
provides functional syntax highlighting, and a little support for
navigating between labels. One cute feature I threw in is that, if you
move the point over a label, any other instances of that label get
highlighted, which I found useful in keeping track of all the “lXXXXX”
labels dedexer generates.
ddx2dotDalvik assembly is, on the whole pretty easy to read, but occasionally you stumble on huge methods that clearly originated from multiple nested loops and some horrible chained if statements. And what you’d really like is to be able to see the structure of the code, as much as the details of the instructions.
To that end, I threw together a Python script that “parses” .ddx
files, and renders them to a control-flow graph using dot. As
an example, the parseToken method from the IMAP parser
in the k9mail application for Android looks like the following,
when disassembled and rendered to a CFG:
I use the term “parses” because it’s really just a pile of regexes, line.split() and line.startswith("..."), but it gets the job done, so I hope it might be of use to someone else. The biggest missing feature is that it doesn’t parse catch directives, so those just end up floating out to the side as unattached blocks.
You’ll also notice the rounded “return” blocks — either javac or dx merges all exits from a function to go through the same return block, but I found that preserving that feature in the CFG produces a lot of clutter and makes it hard to read, so I lift every edge that would go to that common block to go to a separate block.
Both tools live in my “reverse-android” repository on github, and are released under the MIT license. Please feel free to do whatever you want with them, although I’d appreciate it if you let me know if you make any improvements or find them useful.
by nelhage at December 27, 2010 08:26 PM
Dan Rosenberg recently released a privilege escalation bug for Linux, based on three different kernel vulnerabilities I reported recently. This post is about CVE-2010-4258, the most interesting of them, and, as Dan writes, the reason he wrote the exploit in the first place. In it, I’m going to do a brief tour of the various kernel features that collided to make this bug possible, and explain how they combine to turn an otherwise-boring oops into privilege escalation.
access_okWhen a user application passes a pointer to the kernel, and the kernel wants to read or write from that pointer, the kernel needs to perform various checks that a buggy or malicious userspace app hasn’t passed an “evil” pointer.
Because the kernel and userspace run in the same address space, the most important check is simply that the pointer points into the “userspace” part of the address space. User applications are protected by page table permissions from writing into kernel memory, but the kernel isn’t, and so must explicitly check that any pointers given to it by a user don’t point into the kernel region.
The address space is laid out such that user applications get the
bottom portion, and the kernel gets the top, so this check is a simple
comparison against that boundary. The kernel function that performs
this check is called access_ok, although there are various other
functions that do the same check, implicitly or otherwise.
get_fs() and set_fs()Occasionally, however, the kernel finds it useful to change the rules for what
access_ok will allow. set_fs()1 is an internal Linux function that is used to
override the definition of the user/kernel split, for the current process.
After a set_fs(KERNEL_DS), no checking is performed that user pointers
point to userspace — access_ok will always return
true. set_fs(KERNEL_DS) is mainly used to enable the kernel to wrap
functions that expect user pointers, by passing them pointers into the
kernel address space. A typical use reads something like this:
old_fs = get_fs(); set_fs(KERNEL_DS);
vfs_readv(file, kernel_buffer, len, &pos);
set_fs(old_fs);
vfs_readv expects a user-provided pointer, so without the set_fs(), the
access_ok() inside vfs_readv() would fail on our kernel buffer, so we use
set_fs() to effectively temporarily disable that checking.
When the kernel oopses, perhaps because of a NULL pointer
dereference in kernelspace, or because of a call to the BUG() macro
to indicate an assertion failure, the kernel attempts to clean up, and
then tries to kill the current process by calling the do_exit()
function to exit the current process.
When the kernel does so, it’s still running in the same process
context it was before the oops occured, including any set_fs()
override, if applicable. Which means that do_exit will get called
with access_ok disabled — not something anyone expected when they
wrote the individual pieces of this system.
clear_child_tidAs it turns out, do_exit contains a write to a user-controlled
address that expects access_ok to be working properly!
clear_child_tid is a feature where, on thread exit, the kernel can
be made to write a zero into a specified address in that thread’s
address space, in order to notify other threads of that exit.
This is implemented by simply storing a pointer to the to-be-zeroed
address inside struct task_struct (which represents a single thread
or process), and, on exit, mm_release, called from do_exit, does:
put_user(0, tsk->clear_child_tid);
This is normally safe, because put_user checks that its second
argument falls into the “userspace” segment before doing a write. But,
if we are running with get_fs() == KERNEL_DS, it will happily accept
any address at all, even one pointing into kernel space.
So, if we find any kernel BUG() or NULL dereference, or other page
fault, that we can trigger after a set_fs(KERNEL_DS), we can trick
the kernel into a user-controlled write into kernel memory!
splice() et. al.An obvious question at this point is: How much of the kernel can an
attacker cause to run with get_fs() == KERNEL_DS?
There are a number of small special cases. For example, the binary
sysctl compatibility code works by calling the normal /proc/ write
handlers from kernelspace, under set_fs(). handful of compat-mode
(32 on 64) syscalls work similarly.
By far the biggest source I’ve found, however, is the splice()
system call. The splice() system call is a relatively recent
addition to Linux, and allows for zero-copy transfer of pages between
a pipe and another file descriptor.
As of 2.6.31, attempts to splice() to or from an fd that doesn’t
support special handling to actually do zero-copy splice, will fall
back on doing an ordinary read(), write(), or sendmsg() on the
fd … from the kernel, using set_fs() in order to pass in kernel
buffers.
What that means it that by using splice(), an attacker can call the
bulk of the code in most obscure filesystems and socket types (which
tend not to have explicit splice() support) with a segment override
in place. Conveniently for an attacker, that is also exactly a
description of where the bulk of the random security bugs tend to be.
This is also exactly the technique Dan’s exploit uses. He uses
CVE-2010-3849, an otherwise boring NULL pointer dereference I
reported in the Econet network protocol. His exploit code does a
splice() to an econet socket, causing the econet_sendsmg handler to
get called under set_fs(KERNEL_DS). When it oopses, do_exit is
called, and he gets a user-controlled write into kernel
memory. Everything else is just details.
1 Back in Linux 1.x, this function actually set the %fs register on i386. It hasn’t in years, but it’s used in too many places for changing the name to be worth it.
by nelhage at December 10, 2010 04:02 PM
As much as I hate to admit it, mathematicians tend to deal with approximations. A lot of times, formulas are just too complicated to deal with the full complicated formula, and you have to simplify it. So here’s some handy approximations, as well as about where they’re valid.
for 
for
, and in particular
and 
for 
for
, especially for 

, where
is the Euler-Mascheroni constant
, where
is the number of primes less than 
, where
is the
th prime number.
What do you think every mathematician should know?
by Patrick Hurst at December 01, 2010 09:08 PM
Most of you reading this blog probably remember CVE-2010-3081. The bug got an awful lot of publicity when it was discovered an announced, due to allowing local privilege escalation against virtually all 64-bit Linux kernels in common use at the time.
While investigating CVE-2010-3081, I discovered that several of the commonly-believed facts about the CVE were wrong, and it was even more broadly exploitable than was publically documented. I’d like to share those observations here.
The bug arose from the compat_alloc_user_space function in Linux’s
32-bit compatibility support on 64-bit
systems. compat_alloc_user_space allocates and returns space on the
userspace kernel stack for the kernel to use:
static inline void __user *compat_alloc_user_space(long len)
{
struct pt_regs *regs = task_pt_regs(current);
return (void __user *)regs->sp - len;
}
This function is only called by compat-mode syscalls, so current is assumed to
be a 32-bit process, in which case regs->sp, the user stack pointer, will be a
32-bit quantity. This, if we subtract a small len, the result should still fit
in 32 bits, which, on a 64-bit system means it is guaranteed to fall within the
user address space.
Because of this, some callers of compat_alloc_user_space were lazy, and did
not call access_ok (or a function which called access_ok) to check that the
result of compat_alloc_user_space fell within the user address space.
However, it turned out that some call sites in the kernel called
compat_alloc_user_space with a user-controlled len value, allowing the
subtraction to wrap around. On a 64-bit system, the kernel lives in the top four
gigabytes of memory, and so this wraparound is enough for a user to cause
compat_alloc_user_space to return a pointer into the kernel’s address space.
Moreover, it turned out that the functions that used a user-controlled len
also did not check access_ok on the result of the allocation. In particular,
Linux 2.6.26 introduced the compat_mc_getsockopt function, which called
compat_alloc_user_space with a user-controlled length and then copied
user-controlled data to this pointer. It is this function which the public
exploit targetted.
When an exploit was released for this bug, many sources circulated a mitigation: Disable 32-bit binaries on a system. Prevent compat-mode processes from running, the logic goes, and you prevent anyone from making a compat-mode syscall that triggers the vulnerable path.
This mitigation indeed prevented the public exploit from working (it included 32-bit inline assembly, and so couldn’t even easily be recompiled as a 64-bit binary), and many observers seemed to believe it closed the bug entirely.
However, this was not the case! It turns out, on an amd64 system, a
64-bit process can still make a compat-mode system call using the int
$0x80 instruction, which is the traditional 32-bit syscall mechanism!
Even though the process is running in 64-bit mode, int $0x80
redirects to the compat-mode syscall table.
After realizing this, modifying the public exploit to work when compiled in 64-bit mode was a simple matter of porting the inline assembly, and changing a small handful of types. I’ve posted the modified exploit and the diff against the original for the curious.
Once you’ve realized that you can make compat-mode system calls from a 64-bit
process, a little bit of thought reveals something else
interesting. compat_alloc_user_space subtracts the len value off of the
userspace stack pointer. Previously, we relied on subtracting a large value from
a 32-bit stack pointer in order to end up with a kernel pointer. However, while
a 32-bit is limited to a 32-bit stack pointer, a 64-bit process can write a full
64-bit value into %rsp, and thus regs->sp! There’s no need for underflow at
all — you can just write a 64-bit value into %rsp and do an int $0x80, and
make compat_alloc_user_space return any value you please!
The condition for exploitability thus drops from “user-controlled
len and no access_ok” to simply “no access_ok“.
This is interesting, because it turns out that some very old kernels, before 2.6.11, including RHEL 4, have the following function:
int siocdevprivate_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
{
struct ifreq __user *u_ifreq64;
...
u_ifreq64 = compat_alloc_user_space(sizeof(*u_ifreq64));
/* Don't check these user accesses, just let that get trapped
* in the ioctl handler instead.
*/
copy_to_user(&u_ifreq64->ifr_ifrn.ifrn_name[0], &tmp_buf[0], IFNAMSIZ);
__put_user(data64, &u_ifreq64->ifr_ifru.ifru_data);
return sys_ioctl(fd, cmd, (unsigned long) u_ifreq64);
}
Remember, we can make compat_alloc_user_space return an arbitrary
value. The copy_to_user will call access_ok and fail, but that
return value will be discarded, and the __put_user will scribble 32
bits of user-controlled data at a user-controlled address. Bingo,
local root.
It turns out this function was present in Linux 2.4.x, too, meaning that this exploit even affected RHEL3 and anyone else still running a 2.4-based system!
Based on this exploit, I’ve produced a working proof-of-concept exploit for RHEL4, based on the released exploit for RHEL5. Contact me if you’re interested, but it’s pretty straightforward.
As far as I know, neither of these facts has been publically documented prior to this post. I shared this information with Red Hat, and they requested I keep it private until they released fixes for RHEL 3, which happened last week. I would not be at all surprised to learn that someone else has private exploits that incorporate either or both of these observations, though.
One important moral here is you must be very careful when declaring a system unaffected by a vulnerability, or declaring a mitigation to be complete. Software systems have gotten tremendously complex, and it’s often impossible to be totally confident you understand every last way an attacker could tickle a vulnerability.
by nelhage at November 30, 2010 04:58 PM
Much has been made in the news recently about Facebook’s relative lack of privacy controls, and the degree to which they’re hidden and made unintuitive to use. Naturally, people have been speculating about why they do this. The intuitive answer, and the one that I’ve heard a lot of people claim, is that it allows them to sell data to advertisers; the reasoning goes that they can’t sell your data to third parties and make money off of you if your profile settings are all set to private. But as far as I know this isn’t the case; I’ve got all my Facebook data pretty locked down; the only information about me if you’re not my friend is my name and a picture of me, which I figure is enough to allow people I know to friend me while being certain that they’re getting the right me. Yet I still get ads that I know are targeted to me because they mention my college, my location, etc.
Instead, I suspect that the real money that Facebook makes off of people who don’t have their information private is off advertising impressions. A lot of people, when they want to find more about someone, will immediately check Facebook to see if they have a profile, and if they do, they’ll spend a while ‘stalking’ them on it. If the ‘stalkee’ has their information private and the stalker isn’t friends with them, then they have to send a friend request, which means that most of the time they’ll back off. But if the stalkee’s information is public, then the stalker can spend large amounts of time looking at their information. Which means large amounts of pageviews, and large amounts of advertisements being displayed to them, which means more money for Facebook. And I think that that’s one point that a lot of people miss when they write about Facebook; not only do they want you to put all of your information on there so they can sell it, they want you to put your information on there so that other people will spend time on their site looking at your information.
by Patrick Hurst at November 22, 2010 11:34 PM
I’ve recently started playing with scons a little for some small
personal projects. It’s not perfect, but I’ve rapidly come to the
conclusion that it’s a probably far better choice than make in many
cases. The main exceptions would be cases where you need to integrate
into legacy build systems, or if asking or expecting developers to
have scons installed is unreasonable for some reason.
The main reason that scons is cool to me, and the thing that makes
it fundamentally different from make, is the introduction of actual
scoping.
make has a single global scope. This is one of the main reasons that
people write recursive Makefiles; By giving you one file per
directory, you get one scope per directory, which makes it possible to
have per-directory pattern rules, variables, and all that other stuff,
without driving yourself insane.
make‘s awful syntax, confusing varieties of variable,
whitespace-sensitivity, and all the other things that people love to
bitch about are annoying, but to my mind, the single scope that makes
recursive Makefiles the dominant (and, really, the only scalable)
paradigm is the one thing that really sucks.
scons solves this by baking various kinds of scoping into the
tool. scons lets you include sub-build-scripts (typically named
SConscript, by convention). Those scripts run in their own namespace
and can establish their own variables, rules, etc., but the end result
is then merged back into the global rule list (handling sub-directory
paths intelligently), so that the scheduler can work globally, instead
of having to recurse.
Furthermore, because of this explicit scoping, you can pass variables,
including targets, between build files, letting you explicitly set up
cross-directory dependencies or share CFLAGS or other variables,
making it easy for different directories to share exactly as much or
as little configuration as you want.
In addition, scons has the concept of “build environments”, which
are objects that include build rules, variables, and so on. By
reifying what make just represents as the global environment into
objects, it makes it much easier to scope and program things. For
example, if you have a set of targets that should be built using the
global default rules, except with debugging enabled, you can do:
myenv = env.Clone()
myenv.Append(CFLAGS = ['-g'])
myenv.Program(...)
By making it (optionally) explicit which sets of rules and variables
are being used in each place, it becomes much easier to share multiple
kinds of targets and rule sets in a single file, without necessitating
lots of sub-files just for scoping, like make tends to lead to.
scons is cool for a bunch of reasons. It eliminates most of the
stupid little annoyances you’ve probably had with make. But, in my
mind, this is the thing that makes it cool. They’ve added sane scoping
to the build tool, so that you can construct non-recursive build
systems without going insane.
I’ll definitely be considering scons for any new projects I write going
forward. I hate make, and this definitely feels like a path forward.
by nelhage at November 07, 2010 10:00 PM
I love VMware workstation. I keep VMs around for basically every version of every major Linux distribution, and use them heavily for all kinds of kernel testing and development.
This post is a quick writeup of my networking setup with VMware Workstation, using dnsmasq to assign my VMs addresses and provide a DNS server to resolve VM addresses.
I want to be able to resolve my VM’s hostnames so that I can ssh to
them, or run other network services and access them from the host. I
could just assign static addresses and put them in /etc/hosts, but
that’s totally lame, and liable to be a source of error and
frustration, because I have dozens of VMs, and add and remove them
frequently.
We’re going to set things up so that when VMs get addresses from DHCP,
their hostnames automatically become resolvable, using the .vmware
domain. To do this, we’re going to set up a piece of software called
dnsmasq, which is a flexible DNS and DHCP server, designed for
basically exactly this purpose.
Because I use my VMs for local testing, I just keep most of them on a
local NAT on my machine. I configure that virtual network inside
VMware as follows (run vmware-netconfig, or follow the appropriate
menus):
Note how I disable “Use local DHCP service to distribute IP addresses to VMs” — we’re going to set up dnsmasq to prove DHCP, so we don’t want it fighting with VMware’s.
Notice that the subnet I’m using here is 172.16.37.* — if you
choose a different one, you’ll need to adjust accordingly later.
dnsmasqThen, I install dnsmasq, and configure /etc/dnsmasq.conf as
follows:
listen-address=172.16.37.1
listen-address=127.0.0.1
no-dhcp-interface=lo
server=192.168.1.1
local=/vmware/
no-hosts
no-resolv
domain=vmware
dhcp-fqdn
dhcp-range=172.16.37.3,172.16.37.200,12h
dhcp-authoritative
dhcp-option=option:router,172.16.37.2
Here’s what each of those lines mean, in order:
listen-address=172.16.37.1
listen-address=127.0.0.1
no-dhcp-interface=lo
We don’t want dnsmasq serving DHCP or DNS to the outside world or
other virtual networks, so we only tell it to listen on the local
interface — so that we can talk to it from the host — and to the
virtual network we set up in the previous step. We don’t want it
serving DHCP to localhost, though, so we tell it not to.
server=192.168.1.1
local=/vmware/
Here we tell dnsmasq how to forward DNS requests to the outside
world. We’re going to be using dnsmasq as our primary nameserver,
and having it forward requests for things it doesn’t understand to a
real DNS server. In my case, that’s my LAN’s router, at
192.168.1.1. The local line tells dnsmasq that the .vmware
domain is local, and it should never forward requests to resolve
things in that domain.
If I needed something more complicated, it might be possible to use
the resolv-file option or similar, but I don’t, personally.
no-hosts
no-resolv
These options tell dnsmasq not to look at resolv.conf or
/etc/hosts when resolving names — we want it only to resolve VMs
itself, and to forward everything else.
domain=vmware
dhcp-fqdn
This tells dnsmasq to assign the .vmware domain to hosts it hands
out DHCP to, so that we can resolve VMs in the .vmware domain.
dhcp-range=172.16.37.3,172.16.37.200,12h
dhcp-authoritative
And finally, we configure the DHCP server. We give it a range of
addresses to assign on the subnet we created earlier. I stop at
.200, so that I can leave the last few open for static IPs if I need
for some reason, and we start at .3 — .1 is the host, and .2 is
the address of VMware’s router. dhcp-authoritative enables some
optimizations when dnsmasq knows it is the only DHCP server around.
dhcp-option=option:router,172.16.37.2
Finally, we need dhcp-option to tell DHCP clients to use the
VMware-provided router at .2 as their gateway, instead of using the
host, at .1. We could configure the host to be a NAT server using
Linux’s NAT, but that’s outside the scope of this document.
Now, we need to configure the host to use dnsmasq as our DNS
server. This is a simple matter of telling the host to use 127.0.0.1
as our DNS server, and to add .vmware to our search path. If we’re
editing resolv.conf directly, it would look like:
search vmware
nameserver 127.0.0.1
We need to configure our guests to send a hostname along with their
DHCP requests, so that dnsmasq can add them to its address
table. How to do this varies by OS, but most modern OSes do it
automatically. If they don’t, here are a few hints:
For RHEL-based distros, edit /etc/sysconfig/network-scripts/ifcfg-INTERFACE, and add a line like
DHCP_HOSTNAME=centos-5-amd64
For most other Linux distributions, you can often edit dhclient.conf
(usually in /etc/ or /etc/dhclient/) to include:
send host-name "centos-5-amd64";
Or, with a recent dhclient,
send host-name "<hostname>";
will make it look up the machine’s actual hostname.
That’s all there is to it. This is a pretty simple setup, but
hopefully someone else will find this useful. If you need dnsmasq to
do something more subtle, the documentation is mostly quite
good.
by nelhage at October 25, 2010 03:15 AM
A common problem in software engineering is avoiding confusion and errors when dealing with multiple types of data that share the same representation. Classic examples include differentiating between measurements stored in different units, distinguishing between a string of HTML and a string of plain text (one of these needs to be encoded before it can safely be included in a web page!), or keeping track of pointers to physical memory or virtual memory when writing the lower layers of an operating system’s memory management.
Unless we’re using a richly-typed language like Haskell, where we can
use newtype, the best solutions tend to just rely on
convention. The much-maligned Hungarian Notiation evolved
in part to try to combat this sort of problem — If you decide on a
convention that variables representing physical addresses start with
pa and virtual addresses start with va, then anyone who encounters
a uintptr_t pa_bootstack; can immediately decode the intent.
It turns out, though, that we can get something very much like
newtype in familiar old C. Suppose we’re writing some of the paging
code for a toy x86 architecture. We’re going to be passing around a
lot of physical and virtual addresses, as well as indexes of pages in
RAM, and it’s going to be easy to confuse them all. The traditional
solution is to use some typedefs, and then promise to be very
careful to mix them up:
typedef uint32_t physaddr_t;
typedef uint32_t virtaddr_t;
typedef uint32_t ppn_t;
We have to promise not to mess up, though — the compiler isn’t going
to notice if I pass a ppn_t to a function that wanted a
physaddr_t.
This example was inspired by JOS, a toy operating system used by MIT’s
Operating Systems Engineering class. JOS remaps all of physical memory
starting at a specific virtual memory address (KERNBASE), and so
provides the following macros:
/* This macro takes a kernel virtual address -- an address that points above
* KERNBASE, where the machine's maximum 256MB of physical memory is mapped --
* and returns the corresponding physical address. It panics if you pass it a
* non-kernel virtual address.
*/
#define PADDR(kva) \
({ \
physaddr_t __m_kva = (physaddr_t) (kva); \
if (__m_kva < KERNBASE) \
panic("PADDR called with invalid kva %08lx", __m_kva);\
__m_kva - KERNBASE; \
})
/* This macro takes a physical address and returns the corresponding kernel
* virtual address. It panics if you pass an invalid physical address. */
#define KADDR(pa) \
({ \
physaddr_t __m_pa = (pa); \
uint32_t __m_ppn = PPN(__m_pa); \
if (__m_ppn >= npage) \
panic("KADDR called with invalid pa %08lx", __m_pa);\
(void*) (__m_pa + KERNBASE); \
})
Because the typedefs are unchecked by the compiler, though, it is a common mistake to use a physical address where a virtual address is meant, and nothing will catch it until your kernel triple-faults, and a long, painful debugging session ensues.
Inspired by Haskell’s newtype, though, it turns out we can get the
compiler to check it for us, with a little more work, by using a
singleton struct instead of a typedef:
typedef struct { uint32_t val; } physaddr_t;
If we wanted to be overly cute, we could even use a macro to mimic
Haskell’s newtype:
#define NEWTYPE(tag, repr) \
typedef struct { repr val; } tag; \
static inline tag make_##tag(repr v) { \
return (tag){.val = v}; \
} \
static inline repr tag##_val(tag v) { \
return v.val; \
}
NEWTYPE(physaddr, uint32_t);
NEWTYPE(virtaddr, uint32_t);
NEWTYPE(ppn, uint32_t);
Given those definitions, PADDR and KADDR become:
#define PADDR(kva) \
({ \
if (virtaddr_val(kva) < KERNBASE) \
panic("PADDR called with invalid kva %08lx", virtaddr_val(kva)); \
make_physaddr(virtaddr_val(kva) - KERNBASE); \
})
#define KADDR(pa) \
({ \
uint32_t __m_ppn = physaddr_val(pa) >> PTXSHIFT; \
if (__m_ppn >= npage) \
panic("KADDR called with invalid pa %08lx", physaddr_val(pa)); \
make_virtaddr(physaddr_val(pa) + KERNBASE); \
})
We have to use some accessor and constructor functions, but in
exchange, we get strong type-checking: If you pass PADDR a physical
address (or anything other than a virtual address), the compiler will
catch it.
The wrapping and unwrapping is slightly annoying, but we can for the
most part avoid having to do it everywhere, by pushing the wrapping
and unwrapping down into some utility functions. For instance, a
relatively common operation at this point in JOS is creating a
page-table entry, given a physical address. If you want to construct
the PTE by hand, you need to use physaddr_val every time. But a
better plan is a simple utility function:
static inline pte_t make_pte(physaddr addr, int perms) {
return physaddr_val(addr) | perms;
}
In addition to losing the need to unwrap the physaddr everywhere, we
gain a measure of clarity and typechecking — if you remember to use
make_pte, you’ll never accidentally try to insert a virtual address
into a page table.
We can add similar functions for converting between types, as well a a struct
Page, used to track metadata for a physical page. As an experiment, I went and
reimplemented JOS’s memory management primitives using these definitions, and
only needed to use FOO_val or make_FOO a very few times outside of the
header files that defined KADDR and friends.
While the typechecking is nice, any C programmer implementing a
memory-management system is probably going to want to know: How much does it
cost me? You’re creating and unpacking these singleton structs everywhere –
does that have a cost?
The answer, though, in almost all cases is “no” — A half-decent compiler will
optimize the resulting code to be completely identical to the code without the
structs, in almost all cases.
Also, the in-memory representation of the struct is going to be exactly the
same as the bare value — it’s even guaranteed to have the same alignment and
padding constraints, so if you need to embed a physaddr inside another struct,
or into an array, the representation is identical to the physaddr_t typedef.
On i386, parameters are passed on the stack, so that means that passing the
struct is identical to passing the uint32_t. On amd64, as described last week,
small structures are passed in registers, and so, again, the calling convention
is identical.
Unfortunately, the i386 ABI specifies that returned structs always go on the
stack (while integers go in %eax), so you do pay slightly if you want to
return one of these typedef’d objects. amd64 will also break it down into a
register, though, so on a 64-bit machine it’s again identical.
If you’re worried, though, you can always use the preprocessor to make the checks vanish for a production build:
#ifdef NDEBUG
#define NEWTYPE(tag, repr) \
typedef repr tag; \
static inline tag make_##tag(repr v) { \
return v; \
} \
static inline repr tag##_val(tag v) { \
return v; \
}
#else
/* Same definition as above */
#endif
Because the types have identical representations, you can safely serialize your structs and exchange them between code compiled with either version. On amd64, you can probably even call between compilation units defined either way.
The next time you’re writing some subtle C code that has to deal with multiple types with the same representation, I encourage you to consider using this trick.
I didn’t invent this trick, although as far as I know the NEWTYPE macro is my
own invention (Edited to add: A commenter points out that I’m not the first
to use the newtype name in
C,
although I think I prefer my implementation).
. I learned this trick from the Linux kernel, which uses it for a
very similar application — distinguishing entries in different levels of the
x86 page tables. page.h on amd64 includes following definitions [Taken from an
old version, but the current version has equivalent ones):
/*
* These are used to make use of C type-checking..
*/
typedef struct { unsigned long pte; } pte_t;
typedef struct { unsigned long pmd; } pmd_t;
typedef struct { unsigned long pud; } pud_t;
typedef struct { unsigned long pgd; } pgd_t;
I claimed above that the struct and the bare type will have the same alignment and padding. I don’t believe this is guaranteed by C99, but the SysV amd64 and i386 ABI specifications both require:
Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object’s alignment.
(text quoted from the amd64 document, but the i386 one is almost identical).
And C99 requires (§6.7.2.1 para 13):
… A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
I believe these requirements, taken together, should be enough to ensure that
the struct and the bare type will have the same representation.
by nelhage at October 11, 2010 05:11 PM
On Sunday, September 26, 2010, all of the scripts.mit.edu servers will be upgraded from Fedora 11 to Fedora 13, which was released on May 25. We strongly encourage you to test your website as soon as possible, and to contact us at scripts@mit.edu or come to our office in W20-557 if you experience any problems. The easiest way to test your site is to run the following commands at an Athena workstation and then visit your website in the browser that opens, but see this page for more details and important information: athena% add scripts
athena% firefox-test
by ezyang at September 18, 2010 10:04 PM
Parkinson’s Law, the famous quote from his 1955 essay, claims, “Work expands so as to fill the time available for its completion.” It seems there is an analogous principle for disk space:
Date: Mon, 10 Sep 2007 23:05:25 -0400 (EDT)
To: accounts@mit.edu
Subject: AFS quota increaseHi,
Could I get my quota increased a little bit? I have a LOT of medium-size files for various projects… I tried to clear up my homedir about a month ago but it didn’t help more than about 100 MB. And today my usage jumped from 98% to 99% when I uploaded my first pset (for 6.170)…
Thanks,
Date: Thu, 4 Sep 2008 01:32:00 -0400 (EDT)
To: accounts@mit.edu
Subject: Quota bumpHi,
Classes already having started, it’s going to be hard to spend time cleaning out my AFS volume and e-mail this week, although I’ll do so soon (I should probably move the things I don’t need regular access to to other storage, anyway). Any chance I could please have a little bit of a quota bump on each so I can do the first assignment for my classes without worrying?
Thanks,
Date: Tue, 1 Sep 2009 15:05:00 -0400 (EDT)
To: accounts@mit.edu
Subject: quota bump for my lockerHi Accounts,
Can I get more Athena quota on the grounds that it would be helpful for continuing to work on Athena, or something? :)
Also, term is starting soon… I did look around for things that seem potentially likely to be removable, but nothing stands out. I can look harder or try moving things to other storage if you’d prefer I start with that.
Thanks a bunch,
Here are the current official disk policies:
- maximum volume sizes
- reference snapshots (eg unchanging CD/DVD images) — 25G
- swmaint/rel-eng — 12G (but negotiable)
- course/cwis/leased — 12G
- quotas (user homedirs and activity lockers)
- default: 2G
- if they ask for more: 4G
- and justify it: 6G
- and are Thesing and have a signed letter from the Pope: 8G
- (sufficient academic need may substitute for the Papal letter)
My freshman year, the second half was 1G/1.5G/2G/3G (unchanged since 2004), so at least there’s more breathing room. But still, note carefully,
$ fs listquota Volume Name Quota Used %Used Partition user.geofft 8000000 7586330 95%<< 78% <<WARNING
I guess it makes sense: in the normal case, there isn’t a lot of incentive for me to care about disk space, and it’s not a worthwhile use of my time until pressure appears. Even so, I find it amusing that no matter how many quota bumps I’ve already had or how many iterations of Moore’s Law have doubled my quota without even asking, it has become a tradition that every fall (except my freshman year), before classes start, I’ve used 95% of my quota. Files expand so as to fill the disk available for their storage.
Looking at the policy, I guess this time I don’t have a choice but to clean it up.
by geofft at August 30, 2010 06:22 AM
I wrote a little utility the other day before I found myself in need of it: it takes its argument, compiles it as the body of a C program, and runs it.
This is way more convenient than you’d think at first glance: it turns out that, for one- or two-line programs where you’re just testing how something works, when 80% of your program is include boilerplate and making a source file and running gcc followed by the program, you’re somewhat disinclined to test small things. When you have a utility like this, you end up using it a lot…
How many file descriptors can I open?
dr-wily:~ geofft$ gccrun 'int fd; while ((fd = open("/dev/null", 0)) > 0) printf("%d\n", fd);' | tail -1
1023
What error does that get me?
dr-wily:~ geofft$ gccrun 'while (open("/dev/null", 0) > 0); perror("open");'
open: Too many open files
Can I go higher by not having things open on lower fds, and just opening something on fd 1024?
dr-wily:~ geofft$ gccrun 'dup2(0, 1023); perror("dup2");'
dup2: Success
dr-wily:~ geofft$ gccrun 'dup2(0, 1024); perror("dup2");'
dup2: Bad file descriptor
What is errno 2?
dr-wily:~ geofft$ gccrun 'printf("%s\n", strerror(2));'
No such file or directory
What is my “session leader”?
dr-wily:~ geofft$ gccrun 'printf("%d\n", getsid(0));'
18291
dr-wily:~ geofft$ ps 18291
PID TTY STAT TIME COMMAND
18291 pts/28 Ss 0:00 /bin/bash
This is, of course, no substitute for reading formal docs or writing rigorous tests, but it’s great for getting an intuition about how something works.
gccrun is a very tiny shell script. Most of the convenience comes from it including a bunch of headers by default; recommendations on headers to add, and patches in general, welcome.
by geofft at August 23, 2010 09:58 AM
I wrote another post last week for the Ksplice blog: Strace — The Sysadmin’s Microscope. If you’re running a Linux system, or just writing or maintaining a complex program, sometimes strace is indispensable — it’s the tool that tells you what a program is really doing. My post explains why strace is so good at showing the interesting events in a program (hint: it gets to sit between the program and everything else in the universe), describes some of its key options, and shows a few ways you can use it to solve problems.
Unfortunately there’s only so much you can say in a blog post of reasonable length, so I had to cut some of my favorite uses down to bullet points. Here’s one such use, which I can’t bear to keep off of the Web, just because I thought I was so clever when I came up with it in real life a couple of months ago.
(If you haven’t already, I encourage you to go read the main post first. I’ll be here when you come back.)
Sometimes you start a command, and it turns out to take forever. It’s been three hours, and you don’t know if it’s going to be another three hours, or ten minutes, or a day.
This is what progress bars were invented for. But you didn’t know this command was going to need a progress bar when you started it.
Strace to the rescue. What work is your program doing? If it’s touching anything in the filesystem while it works, or anything on the network, then strace will tell you exactly what it’s up to. And in a lot of cases, you can deduce how far into its job it’s gotten.
For example, suppose our program is walking a big directory tree and doing something slow. Let’s simulate that with a synthetic directory tree and a find that just sleeps for each directory:
$ mkdir tree && cd tree $ for i in $(seq 1000); do mktemp -d -p .; done >/dev/null $ find . -exec sleep 1 \;
Well, this is taking a while. Let’s open up another terminal and ask strace what’s going on:
$ pgrep find
2714
$ strace -p 2714
[...]
open("tmp.HvbzfbbWSa", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
fchdir(5) = 0
getdents(5, /* 2 entries */, 4096) = 48
getdents(5, /* 0 entries */, 4096) = 0
close(5) = 0
open("..", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 5
fstat(5, {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
fchdir(5) = 0
close(5) = 0
newfstatat(AT_FDCWD, "tmp.MiHDWiBURu", {st_mode=02, st_size=17592186044416, ...}, AT_SYMLINK_NOFOLLOW) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb19c92a770) = 13044
wait4(13044, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13044
--- SIGCHLD (Child exited) @ 0 (0) ---
open("tmp.MiHDWiBURu", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
fchdir(5) = 0
[...]
The find just looked at tmp.HvbzfbbWSa, and now it’s going into tmp.MiHDWiBURu. How far is that into the total? ls will tell us the list of directories that the find is working from; we just have to tell it to give them to us in the raw, unsorted order that the parent directory lists them in, with the -U flag. And then grep -n will tell us where in that list the entry tmp.HvbzfbbWSa appears:
$ ls -U | grep -n tmp.HvbzfbbWSa 258:tmp.HvbzfbbWSa $ ls -U | grep -n tmp.MiHDWiBURu 259:tmp.MiHDWiBURu $ ls -U | wc -l 1000
So tmp.HvbzfbbWSa is entry 258 out of 1000 entries in this directory — we’re 25.8% of the way there. If it’s been four minutes so far, then we should expect about twelve more minutes to go.
I’d be remiss if I taught you this hackish approach without mentioning that if you realize you want a progress bar before you start the command, you can do it much better — after all, the ‘progress bar’ above doesn’t even have a bar, except in your head.
Check out pv, the pipe viewer. In my little example, you’d have the command itself print out where it is, like so:
$ find . -exec sh -c 'echo $1 && sleep 1' -- \{\} \;
.
./tmp.BToqLElOGC
./tmp.xnuhzmGbOP
[...]
and then you could get a real, live, automatically-updated progress bar, like so:
$ find . -exec sh -c 'echo $1 && sleep 1' -- \{\} \; \
| pv --line-mode --size=$(find . | wc -l) >/dev/null
175 0:02:57 [0.987/s ] [=====> ] 17% ETA 0:13:55
Here we’ve passed --line-mode to make pv count lines instead of its default of bytes, and --size with an argument to tell it how many lines to expect in total. Even if you can’t estimate the size, pv will cheerfully tell you how far you’ve gone, how long it’s been, and how fast it’s moving right now, which can still be handy. pv is a pretty versatile tool in its own right — explaining all the ways to use it could be another whole blog post. But the pv man page is a good start.
There’s lots of other ways to use strace — starting with the two I described in my main post, and the three more, besides this one, that I only mentioned there. I don’t really know anymore how I used to manage without it.
Liked this post? Subscribe and keep ‘em coming.
by Greg Price at August 16, 2010 04:50 AM
Powered by Planet Venus!
Last updated: May 16, 2012 07:37 PM