The CoW forking approach is clever but I'm curious about the memory
isolation story under load. When you have 1000 concurrent forks all
writing to pages, you're triggering a lot of CoW faults simultaneously.
How does that behave at the tail — is the p99 degradation mostly from
page fault handling or KVM VM creation overhead?
Also wondering about the snapshot staleness problem. If I pre-load numpy
into the template VM, the snapshot captures that state. But numpy itself
has internal state (random seeds, BLAS thread pools, etc.) — do forks
inherit that, and is there a story for re-snapshotting when the runtime
or dependencies update?
Not skeptical, genuinely asking — the 265KB memory footprint vs ~128MB
for E2B is the number that jumps out most. If that holds under real
workloads this is a meaningful architectural difference, not just a
benchmark optimization.
Don't forget about entropy! You've just created two identical copies of all of your random number generators, which could be very very bad for security.
The firecracker team wrote a very good paper about addressing this when they added snapshot support.
Good callout. We seed entropy before snapshot to unblock getrandom(), but forks still share CSPRNG state. The proper fix per Firecracker’s docs is RNDADDENTROPY + RNDRESEEDCRNG after each fork, plus reseeding userspace PRNGs like numpy separately. On the roadmap. https://github.com/firecracker-microvm/firecracker/blob/main...
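The userspace half of that is easy to demonstrate. A minimal sketch (using stdlib `random` as a stand-in for numpy's PRNG; the kernel-side RNDADDENTROPY/RNDRESEEDCRNG ioctls aren't shown):

```python
import copy
import os
import random

# Simulate the shared-CSPRNG problem: a "fork" duplicates PRNG state
# exactly, so both copies emit the same "random" stream.
template = random.Random(1234)      # stands in for the snapshotted VM's PRNG
fork_a = copy.deepcopy(template)
fork_b = copy.deepcopy(template)

# Identical state, identical output -- the security hazard.
assert [fork_a.random() for _ in range(3)] == [fork_b.random() for _ in range(3)]

# The fix: reseed each fork from fresh kernel entropy after restore.
fork_a.seed(os.urandom(32))
fork_b.seed(os.urandom(32))
assert [fork_a.random() for _ in range(3)] != [fork_b.random() for _ in range(3)]
```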
Re-seeding is easy. The hard parts are (a) finding everything which needs to be reseeded -- not just explicit RNGs but also things like keys used to pick outgoing port numbers in a pseudorandom order -- and (b) making sure that all the relevant code becomes aware that it was just forked -- not necessarily trivial given that there's no standard "you just got restarted from a snapshot" signal in UNIX.
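Worth noting that for a plain process fork() (though not for a VM snapshot restore) CPython does expose a "you were just forked" hook, which is one natural place to hang reseeding logic:

```python
import os
import random

# os.register_at_fork fires in the child after fork() -- a hook for
# reseeding userspace PRNGs. (No equivalent exists for snapshot restores.)
os.register_at_fork(after_in_child=lambda: random.seed(os.urandom(32)))

r, w = os.pipe()
random.seed(1234)              # parent and child share this state pre-fork
pid = os.fork()
if pid == 0:
    # Child: the hook has already reseeded `random` from fresh entropy.
    os.write(w, repr(random.random()).encode())
    os._exit(0)
os.waitpid(pid, 0)
child_value = float(os.read(r, 64))
# The parent, never reseeded, still holds the pre-fork stream.
assert child_value != random.random()
```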
Off the cuff, the first step toward ASLR is to not publish your images and to rotate your snapshots regularly.
The old FastCGI trick is to buffer the forking by keeping half a dozen or ten copies of the process idle, initializing new instances in the background while the existing pool services requests. By my count we are reinventing FastCGI for at least the fourth time.
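A minimal sketch of that prefork pattern (all names here are illustrative, not from the project under discussion): keep a pool of already-initialized workers idle and feed requests to whichever is free, so per-request startup cost disappears from the hot path.

```python
import multiprocessing as mp

def worker(task_q, result_q):
    # Expensive initialization (imports, model loads, ...) happens once,
    # before any request arrives.
    for task in iter(task_q.get, None):   # None is the shutdown sentinel
        result_q.put(task * 2)            # stand-in for real request handling

def serve(requests, pool_size=4):
    ctx = mp.get_context("fork")          # POSIX fork, as FastCGI assumed
    task_q, result_q = ctx.Queue(), ctx.Queue()
    pool = [ctx.Process(target=worker, args=(task_q, result_q))
            for _ in range(pool_size)]
    for p in pool:
        p.start()                         # warm workers now sit idle
    for r in requests:
        task_q.put(r)
    results = [result_q.get() for _ in requests]
    for _ in pool:
        task_q.put(None)                  # tell each worker to exit
    for p in pool:
        p.join()
    return results
```

For example, `sorted(serve([1, 2, 3]))` yields `[2, 4, 6]`, with none of the workers paying initialization cost on the request path.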
Long-running tasks are less sensitive to startup delays: we care a lot about a 4-second task taking an extra five seconds, and much less about a 1-minute task taking 1:05. It amortizes out, even under Little's Law.
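The arithmetic behind that amortization claim, using the comment's own numbers:

```python
# The same 5 s startup penalty is over half of a 4 s task's total
# latency, but under 8% of a 60 s task's.
def overhead_fraction(task_s, startup_s):
    return startup_s / (task_s + startup_s)

assert round(overhead_fraction(4, 5), 3) == 0.556    # 4 s task: ~56% overhead
assert round(overhead_fraction(60, 5), 3) == 0.077   # 60 s task: ~8% overhead
```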
Nice to see this work! I experimented with this for exe.dev before we launched. The VM itself worked really well, but there was a lot of setup to get the networking functioning. And in the end, our target use cases don't mind a ~1-second startup time, which meant doing a clean systemd start each time was easier.
That said, I have seen several use cases where people want a VM for something minimal, like a python interpreter, and this is absolutely the sort of approach they should be using. Lot of promise here, excited to see how far you can push it!
I’ve been a big fan of “what’s the thinnest this could be” interpretations of sandboxes. This is a great example of that. On the other end of the spectrum there’s just-bash from the Vercel folks.
The tricky part of doing this in production is cloning sandboxes across nodes. You would have to snapshot the resident memory, file system (or a CoW layer on top of the rootfs), move the data across nodes, etc.
Agreed, cross-node is the hard next step. For now single-node density gets you surprisingly far. 1000 concurrent sandboxes on one $50 box. When we need multi-node, userfaultfd with remote page fetch is the likely path.
More than the sub-ms startup time, the 258KB of RAM per VM is the huge part.
https://codesandbox.io/blog/how-we-clone-a-running-vm-in-2-s...
Are there parallels?