[JS SDK] uploadFile buffers entire build context in memory; large contexts (~2.5 GB+) cause memory pressure / OOM #1301

@rzgrw

Description

Bug Report

Summary

Follow-up to #1243. The fix for #1243 (buffer the tar archive so fetch
sets Content-Length and S3 doesn't 501 on chunked encoding) means
uploadFile in packages/js-sdk/src/template/buildApi.ts allocates the
entire build context as a single Buffer in memory before issuing the PUT.

For large contexts (we hit it at ~2.5 GB) this:

  • Allocates a multi-GB Buffer alongside the running tar process
  • Risks OOM on memory-constrained environments (CI runners, small cloud
    VMs, containers with low memory limits)
  • Causes long GC pauses on machines that do have the RAM

Current behavior

packages/js-sdk/src/template/buildApi.ts (lines ~131–141):

const { buffer } = await dynamicImport<
  typeof import('node:stream/consumers')
>('node:stream/consumers')
const uploadBody = await buffer(
  uploadStream as unknown as AsyncIterable<Buffer>
)

const res = await fetch(url, {
  method: 'PUT',
  body: uploadBody,
})

uploadStream (a tar.Pack) is fully drained into a single Buffer.

Reproduction

  1. Build context directory of ~2.5 GB or larger (e.g., a Dockerfile that
    bundles a large model or dataset).
  2. Run e2b template build (or call the SDK directly).
  3. Observe peak RSS during upload — the full tar is held in memory.
  4. On a ~2 GB-RAM runner, expect JavaScript heap out of memory.

The Python SDK has the same shape (tar_buffer.getvalue() → bytes), so
this is a cross-SDK parity concern, but the JS path is where it bites first.
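
For anyone who wants to see the allocation cost without running a full
template build, here is a minimal, SDK-independent sketch of the same
pattern (node:stream/consumers buffering a synthetic ~2.5 GB stream).
Chunk size, total size, and sampling interval are arbitrary; run it as an
ES module. On a small machine this will itself OOM, which is the point.

import { buffer } from 'node:stream/consumers'
import { Readable } from 'node:stream'

const GB = 1024 ** 3

// Synthesize ~2.5 GB in fresh 64 MiB chunks, mimicking tar output where
// every chunk is a distinct Buffer the consumer has to retain.
const source = Readable.from((async function* () {
  for (let sent = 0; sent < 2.5 * GB; sent += 64 * 1024 * 1024) {
    yield Buffer.alloc(64 * 1024 * 1024)
  }
})())

// Sample RSS once a second while the stream is drained into one Buffer.
const timer = setInterval(() => {
  console.log(`rss: ${(process.memoryUsage().rss / GB).toFixed(2)} GB`)
}, 1_000)

const body = await buffer(source)
clearInterval(timer)
console.log(`buffered ${(body.length / GB).toFixed(2)} GB in a single allocation`)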

Environment

  • e2b JS SDK: v2.18.0 and current main (commit at filing time)
  • Node.js: v22
  • Storage: any S3 / S3-compatible (independent of provider)

Why a streaming fix isn't trivial

The constraint from #1243 still applies: S3 presigned PUT requires
Content-Length and rejects Transfer-Encoding: chunked with 501. So
the body must have a known length before the request begins. Possible
directions, none obvious:

  • A. Tar to a temp file first, stat for size, then stream-PUT the
    file with explicit Content-Length (rough sketch after this list).
    Constant memory, ~2.5 GB transient disk. Closest to what curl /
    aws-cli do.
  • B. Two-pass tar: first pass counts bytes, second pass streams.
    Avoids tmpfile but reads the source twice and risks non-deterministic
    tar output between passes (mtimes, etc.).
  • C. S3 multipart upload. Bigger change — requires a different
    presigned-URL flow on the server side, not just an SDK change.
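
For concreteness, here is a rough sketch of direction A, assuming the
existing tar.Pack stream and presigned URL are in scope at the call site
in buildApi.ts. The helper name and temp-file handling are illustrative
only, and a real implementation would presumably gate the node: imports
behind the same dynamicImport pattern the SDK already uses. node:https
is used for the PUT because fetch falls back to chunked transfer
encoding for streaming bodies (the #1243 constraint), while
https.request honors an explicit Content-Length.

import { createReadStream, createWriteStream } from 'node:fs'
import { mkdtemp, rm, stat } from 'node:fs/promises'
import { request } from 'node:https'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { pipeline } from 'node:stream/promises'

// Illustrative only: uploadStream is the existing tar.Pack, url the presigned PUT URL.
async function uploadViaTempFile(
  uploadStream: NodeJS.ReadableStream,
  url: string
): Promise<void> {
  const dir = await mkdtemp(join(tmpdir(), 'e2b-build-ctx-'))
  const tarPath = join(dir, 'context.tar')
  try {
    // Pass 1: drain the tar to disk instead of into a single Buffer.
    await pipeline(uploadStream, createWriteStream(tarPath))
    const { size } = await stat(tarPath)

    // Pass 2: stream the file with an explicit Content-Length so S3 never
    // sees Transfer-Encoding: chunked.
    await new Promise<void>((resolve, reject) => {
      const req = request(
        url,
        { method: 'PUT', headers: { 'Content-Length': size } },
        (res) => {
          res.resume()
          if (res.statusCode && res.statusCode >= 200 && res.statusCode < 300) {
            resolve()
          } else {
            reject(new Error(`upload failed with status ${res.statusCode}`))
          }
        }
      )
      req.on('error', reject)
      createReadStream(tarPath).pipe(req)
    })
  } finally {
    // Clean up the context-sized temp file regardless of outcome.
    await rm(dir, { recursive: true, force: true })
  }
}

Memory stays roughly constant at stream-buffer size; the cost is a
context-sized temp file for the duration of the upload, which is the
trade-off curl and aws-cli make as well.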

Filing as the data point that #1243's fix didn't fully cover the
large-context case. Happy to discuss preferred direction before any PR.

Workaround

Run builds on a machine with roughly RAM ≥ 3 × build_context_size
(buffered Buffer + tar internals + Node heap headroom); for the ~2.5 GB
context above, that is roughly 8 GB.
