Engineering · Jun 22, 2026 · David Wang

Improved Context Management: Wayframe's New Agent Architecture

How we built an autonomous, tool-calling design agent with git-backed long-term memory on Cloudflare Workers

We rebuilt Wayframe's canvas agent from the ground up. The new architecture has two pillars:

A split client/server agent loop built on the Vercel AI SDK, so the model can both mutate the live document and run web search and memory tools in the same turn.
Context repositories for improved memory: long-term, versioned memory per design file, backed by Cloudflare Artifacts.

The memory model is inspired by Letta's Context Repositories and MemFS; the Worker-side git follows Cloudflare's own isomorphic-git on Artifacts guide. We stitched these ideas into something that fits a real-time, collaborative design editor.

The problem: an agent that edits a document it can't see

Wayframe is a visual editor. The document is a live, collaborative scene graph that lives in the browser. The model that drives the agent lives on a server. This split is the single most important constraint in the whole design.

The agent has roughly two kinds of tools:

Canvas tools (write_html, create_frame, update_styles, get_screenshot, …) that read and mutate the live in-browser document. These must run on the client, because that's where the document is.
Server tools (web_search and the new memory_* family) that run on the server. They have no document access; their output is just text fed back to the model.

The first thing we built was a loop that spans both sides cleanly.

Pillar 1: the split client/server agent loop

Each request to the agent runs one model step. The server streams back text, tool calls, and the messages to append; the client owns the loop. It sends the conversation, receives a step, executes any client tool calls against the live document, appends the results, and opens the next step. A turn ends when a step produces no client tool calls.
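The client-owned loop can be sketched in TypeScript as follows. This is a minimal sketch, not Wayframe's code: the message and step shapes, `runTurn`, `requestStep`, `executeClientTool`, and the `maxSteps` cap are all hypothetical names introduced for illustration.

```typescript
// Hypothetical wire shapes; the post does not specify the request/response format.
type ToolCall = { id: string; name: string; args: unknown };
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; result: unknown };

interface StepResponse {
  messages: Message[]; // messages the server asks the client to append
  clientToolCalls: ToolCall[]; // calls that must run against the live document
}

// The client owns the loop: request one model step, run any client tool calls
// in the browser, append their results, and repeat until a step has none.
async function runTurn(
  conversation: Message[],
  requestStep: (messages: Message[]) => Promise<StepResponse>,
  executeClientTool: (call: ToolCall) => Promise<unknown>,
  maxSteps = 25, // assumed safety cap, not taken from the post
): Promise<Message[]> {
  const messages = [...conversation];
  for (let step = 0; step < maxSteps; step++) {
    const res = await requestStep([...messages]);
    messages.push(...res.messages);
    if (res.clientToolCalls.length === 0) return messages; // the turn is over
    for (const call of res.clientToolCalls) {
      const result = await executeClientTool(call);
      messages.push({ role: "tool", toolCallId: call.id, result });
    }
  }
  throw new Error(`turn exceeded ${maxSteps} steps`);
}
```

The cap is a defensive choice for the sketch; the post only states that a turn ends when a step produces no client tool calls.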

This is the classic agent loop; the twist is where the tools run.

Server tools run inline, without a client round-trip

We don't want a web_search or a memory_read to require a full client → server → client → server bounce. It adds extra latency, and it pollutes the document-editing loop with calls the client can't even handle.

The AI SDK gives us the perfect seam. Tools that have an execute function run on the server; tools without one halt the run and get handed back to us. So:

Client tools have no execute. When the model calls one, the run stops and we relay the call to the client.
Server tools have an execute, so the SDK runs them inline and keeps the model going until it either answers or calls a client tool.
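To make the stopping rule concrete, here is a library-free sketch of what happens inside one server request. It does not use the AI SDK's API; `runServerStep`, `ToolDef`, `ModelOutput`, and the string transcript are hypothetical stand-ins that mimic the convention described above: a tool with `execute` runs inline, and a tool without one ends the run and is handed back.

```typescript
// Library-free model of the seam: a tool with `execute` runs on the server,
// a tool without one is handed back to the caller (the client).
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ToolDef = { execute?: (args: Record<string, unknown>) => Promise<string> };
// Hypothetical single model generation: optional text plus zero or more tool calls.
type ModelOutput = { text: string; toolCalls: ToolCall[] };

async function runServerStep(
  model: (transcript: string[]) => Promise<ModelOutput>,
  tools: Record<string, ToolDef>,
  history: string[],
  maxGenerations = 10, // assumed cap on inline server-tool rounds
): Promise<{ transcript: string[]; clientCalls: ToolCall[] }> {
  const transcript = [...history];
  for (let i = 0; i < maxGenerations; i++) {
    const out = await model(transcript);
    if (out.text) transcript.push(`assistant: ${out.text}`);
    const clientCalls: ToolCall[] = [];
    for (const call of out.toolCalls) {
      const execute = tools[call.name]?.execute;
      if (execute) {
        // Server tool: run inline and feed the text result back to the model.
        transcript.push(`tool ${call.name}: ${await execute(call.args)}`);
      } else {
        clientCalls.push(call); // no execute: stop and relay to the client
      }
    }
    // Stop when the model answered without tools or asked for a client tool.
    if (out.toolCalls.length === 0 || clientCalls.length > 0) {
      return { transcript, clientCalls };
    }
  }
  throw new Error("too many inline server-tool rounds");
}
```

In the real SDK this bookkeeping happens inside the library; the sketch only makes the "keep going until an answer or a client tool" rule explicit.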

When a step finishes, we filter out the server tool calls before handing anything back. They already ran inline and their results are already in the model's context, so the client only ever sees calls it can actually execute. A single request might internally run memory_read → web_search and then emit a write_html for the client, all surfaced to the user as smooth status updates.
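The filtering step might look like this, assuming server tools can be identified by name. `SERVER_TOOLS`, `STATUS_LABELS`, and `splitStepCalls` are hypothetical, and the status strings are placeholders rather than Wayframe's actual UI copy.

```typescript
type ToolCall = { id: string; name: string; args: unknown };

// Hypothetical set of server-side tool names and their status labels.
const SERVER_TOOLS = new Set(["web_search", "memory_list", "memory_read", "memory_write", "memory_log"]);
const STATUS_LABELS: Record<string, string> = {
  web_search: "Searching the web",
  memory_read: "Reading memory",
  memory_write: "Saving to memory",
};

// Server calls already ran inline and their results are in the model's context,
// so they become status lines; only client-executable calls are returned.
function splitStepCalls(calls: ToolCall[]): { clientCalls: ToolCall[]; statuses: string[] } {
  const clientCalls = calls.filter((c) => !SERVER_TOOLS.has(c.name));
  const statuses = calls
    .filter((c) => SERVER_TOOLS.has(c.name))
    .map((c) => STATUS_LABELS[c.name] ?? `Running ${c.name}`);
  return { clientCalls, statuses };
}
```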

Observability

Every step emits OpenTelemetry gen-ai spans through the AI SDK (model, token usage, finish reason, and one span per server-tool call), which our Sentry integration collects. We deliberately record metadata only, not prompt or completion text.
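In the AI SDK, tracing is controlled by the `experimental_telemetry` option on `generateText` and `streamText`. The post does not show its configuration; the fragment below assumes that "metadata only" corresponds to switching off `recordInputs` and `recordOutputs`, and the `functionId` value is a hypothetical label.

```typescript
// Assumed value for the AI SDK's `experimental_telemetry` call option.
// Spans still carry model id, token usage and finish reason; the two
// record* switches keep prompt and completion contents out of them.
const telemetry = {
  isEnabled: true,
  recordInputs: false,
  recordOutputs: false,
  functionId: "canvas-agent-step", // hypothetical name used to group spans
};
```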

Pillar 2: the context repository (long-term, versioned memory)

The context repository lets the design agent learn the user's design preferences over time.

The idea

Letta's Context Repositories reframed agent memory as a git-backed directory of markdown files rather than a blob of key-value memory or a vector store. We found the model compelling for three reasons:

Progressive disclosure. The agent sees a tree of files with one-line descriptions, and only loads a file's full contents when it needs them. The context window stays lean.
A system/ hierarchy. Files under system/ are always loaded into the system prompt. These are the things the agent must never forget (its identity, the user's durable design preferences). Everything else is load-on-demand.
Versioning. Every change is a git commit with a message. You get a complete, auditable changelog of what the agent learned and when, plus the ability to roll back a bad “learning.”
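The pinned versus load-on-demand split can be illustrated with a small prompt builder. This is a sketch under assumptions: `MemoryFile`, `buildMemoryPrompt`, and the prompt layout are hypothetical, not the format Wayframe actually sends to the model.

```typescript
// Hypothetical in-memory view of a context repository.
interface MemoryFile {
  path: string;
  description: string; // one-line summary from frontmatter
  body: string;
}

// Pinned files (under system/) are inlined in full. Everything else appears
// only as path + description, so the agent can choose to load it later.
function buildMemoryPrompt(files: MemoryFile[]): string {
  const isPinned = (f: MemoryFile) => f.path.startsWith("system/");
  const pinned = files.filter(isPinned).map((f) => `## ${f.path}\n${f.body}`);
  const index = files.filter((f) => !isPinned(f)).map((f) => `- ${f.path}: ${f.description}`);
  return [...pinned, "## Other memory files (load with memory_read)", ...index].join("\n\n");
}
```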

We mapped this onto Wayframe with one repository per design file. A file's agent remembers that file's brand tone, typographic preferences, naming conventions, and project decisions, across every session. The seeded layout looks like this:

README.md                      # overview
system/identity.md             # who the agent is on this file  (pinned)
system/design-preferences.md   # durable visual/UX preferences   (pinned)
project/notes.md               # file-specific facts & decisions (load on demand)

Each file carries a one-line description in its frontmatter, which is what powers the tree view without loading the file bodies.
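Reading that description might look like the sketch below. It assumes a `---`-delimited frontmatter block with a `description:` key and is not a full YAML parser; `readDescription` and `renderTree` are hypothetical helpers that produce a memory_list-style view without sending any file bodies to the model.

```typescript
// Minimal frontmatter reader: expects a leading `---` block containing a
// `description: ...` line. Not a YAML parser (no quoting, no multi-line values).
function readDescription(markdown: string): string {
  const block = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!block) return "";
  const line = block[1].split(/\r?\n/).find((l) => l.startsWith("description:"));
  return line ? line.slice("description:".length).trim() : "";
}

// memory_list-style view: one line per file with a pinned flag and description.
function renderTree(files: Record<string, string>): string {
  return Object.keys(files)
    .sort()
    .map((path) => {
      const pinned = path.startsWith("system/") ? " (pinned)" : "";
      return `${path}${pinned} - ${readDescription(files[path])}`;
    })
    .join("\n");
}
```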

The four memory tools

We expose memory to the model as four server tools (which, recall, run inline). Letta Code agents do this with a real shell against a real filesystem; Workers have no shell and no disk, so we model the same operations as explicit tools:

Tool	What it does
memory_list	List the file tree with pinned flags + descriptions. Discovery before reading.
memory_read	Load one markdown file's full contents.
memory_write	Create/update a file and commit it, with a required commit message.
memory_log	Show recent commits, so the agent can see what it has already learned and avoid recording it twice.
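The contracts of the four tools can be modeled with a plain in-memory class. This is a hypothetical stand-in, not the production implementation: it keeps files in a `Map` and commits in an array instead of git, and it omits descriptions from `memory_list` for brevity. It does enforce the one rule the table states explicitly, that every write carries a commit message.

```typescript
// Hypothetical in-memory stand-in for one design file's context repository.
// Production stores files and history in git; this only models the contracts.
interface Commit {
  message: string;
  paths: string[];
}

class MemoryTools {
  private files = new Map<string, string>();
  private commits: Commit[] = [];

  // Discovery: list paths, flagging files under system/ as pinned.
  memory_list(): string[] {
    return Array.from(this.files.keys())
      .sort()
      .map((p) => (p.startsWith("system/") ? `${p} (pinned)` : p));
  }

  // Load one file's full contents on demand.
  memory_read(path: string): string {
    const body = this.files.get(path);
    if (body === undefined) throw new Error(`no such memory file: ${path}`);
    return body;
  }

  // Create or update a file; a commit message is mandatory.
  memory_write(path: string, content: string, message: string): void {
    if (message.trim() === "") throw new Error("memory_write requires a commit message");
    this.files.set(path, content);
    this.commits.push({ message, paths: [path] });
  }

  // Most recent commits first, so the agent can see what it already learned.
  memory_log(limit = 10): string[] {
    return this.commits
      .slice(-limit)
      .reverse()
      .map((c) => `${c.message} (${c.paths.join(", ")})`);
  }
}
```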

Running git inside a Worker

Cloudflare Workers have no local filesystem and no git binary. isomorphic-git solves the git half (it's pure JS), but it still expects an fs-like object to read and write a working tree. Cloudflare's own guide recommends supplying an in-memory filesystem for exactly this.

So we wrote a small one, just enough of the Node fs.promises surface for git to clone, stage, commit, and push. It has no dependencies and does exactly what git needs and nothing more.
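A filesystem of this kind might look like the following sketch. `MemFS` is a hypothetical name; it follows Node's `fs.promises` conventions (`readFile`, `writeFile`, `unlink`, `mkdir`, `readdir`, `rmdir`, `stat`, `lstat`, and errors carrying a `code` such as `ENOENT`) and exposes itself as its own `promises` property, but it has not been checked against every call isomorphic-git makes. It stores bytes in a `Map`, has no symlinks, and does not resolve `..` segments.

```typescript
type Stats = {
  type: "file" | "dir";
  mode: number; size: number; ino: number; uid: number; gid: number; dev: number;
  mtimeMs: number; ctimeMs: number;
  isFile(): boolean; isDirectory(): boolean; isSymbolicLink(): boolean;
};

// Node-style error: callers branch on `code` (e.g. "ENOENT", "EEXIST").
function fsError(code: string, path: string): Error & { code: string } {
  return Object.assign(new Error(`${code}: ${path}`), { code });
}

// Absolute path without trailing slash or "." segments (no ".." handling).
function norm(p: string): string {
  return "/" + p.split("/").filter((s) => s !== "" && s !== ".").join("/");
}
const parentOf = (p: string) => norm(p.slice(0, p.lastIndexOf("/")));

// Hypothetical minimal in-memory filesystem for a transient git working tree.
class MemFS {
  private files = new Map<string, { data: Uint8Array; mtimeMs: number }>();
  private dirs = new Set<string>(["/"]);
  readonly promises: MemFS = this; // promise-style API lives on `fs.promises`

  async readFile(p: string, opts?: string | { encoding?: string }): Promise<Uint8Array | string> {
    const entry = this.files.get(norm(p));
    if (!entry) throw fsError("ENOENT", p);
    const encoding = typeof opts === "string" ? opts : opts?.encoding;
    return encoding ? new TextDecoder().decode(entry.data) : entry.data;
  }

  async writeFile(p: string, data: Uint8Array | string): Promise<void> {
    const path = norm(p);
    if (!this.dirs.has(parentOf(path))) throw fsError("ENOENT", p);
    if (this.dirs.has(path)) throw fsError("EISDIR", p);
    const bytes = typeof data === "string" ? new TextEncoder().encode(data) : data;
    this.files.set(path, { data: bytes, mtimeMs: Date.now() });
  }

  async unlink(p: string): Promise<void> {
    if (!this.files.delete(norm(p))) throw fsError("ENOENT", p);
  }

  async mkdir(p: string): Promise<void> {
    const path = norm(p);
    if (this.dirs.has(path) || this.files.has(path)) throw fsError("EEXIST", p);
    if (!this.dirs.has(parentOf(path))) throw fsError("ENOENT", p);
    this.dirs.add(path);
  }

  async readdir(p: string): Promise<string[]> {
    const dir = norm(p);
    if (!this.dirs.has(dir)) throw fsError(this.files.has(dir) ? "ENOTDIR" : "ENOENT", p);
    const prefix = dir === "/" ? "/" : `${dir}/`;
    return Array.from(this.dirs)
      .concat(Array.from(this.files.keys()))
      .filter((q) => q !== dir && q.startsWith(prefix) && !q.slice(prefix.length).includes("/"))
      .map((q) => q.slice(prefix.length))
      .sort();
  }

  async rmdir(p: string): Promise<void> {
    if ((await this.readdir(p)).length > 0) throw fsError("ENOTEMPTY", p);
    this.dirs.delete(norm(p));
  }

  async stat(p: string): Promise<Stats> {
    const path = norm(p);
    const isDir = this.dirs.has(path);
    const entry = this.files.get(path);
    if (!isDir && !entry) throw fsError("ENOENT", p);
    return {
      type: isDir ? "dir" : "file",
      mode: isDir ? 0o40000 : 0o100644,
      size: entry ? entry.data.length : 0,
      ino: 0, uid: 0, gid: 0, dev: 0,
      mtimeMs: entry ? entry.mtimeMs : 0,
      ctimeMs: entry ? entry.mtimeMs : 0,
      isFile: () => !isDir,
      isDirectory: () => isDir,
      isSymbolicLink: () => false,
    };
  }

  // No symlinks in this tree, so lstat is identical to stat.
  lstat(p: string): Promise<Stats> {
    return this.stat(p);
  }
}
```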

Storage: Cloudflare Artifacts as the git remote

Cloudflare Artifacts is “versioned storage that speaks Git.” You create a repo through a Workers binding and it hands you a remote URL and a short-lived token. We treat it as the durable home for each file's memory; the in-memory filesystem is just a transient working tree that exists for the duration of one operation.

A memory_write, end to end, is: mint a short-lived write token → shallow-clone the remote into a fresh in-memory tree → write the markdown and commit it → push back to Artifacts. Artifacts holds the history; our database holds the identity (a small table mapping each file to its repo), so we can resolve a file's memory on every step, enforce ownership, and never need to list repositories in Artifacts.
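The sequence can be sketched as a single function. The post does not show the Artifacts binding, its token API, or the database schema, so `GitOps`, `RepoRecord`, `MemoryWriteDeps`, `memoryWrite`, and the owner check are hypothetical seams. In a Worker, `GitOps` would plausibly wrap isomorphic-git's `clone` (with `depth: 1`), `add`, `commit`, and `push` over a fresh in-memory filesystem; here they are injected so the control flow reads (and tests) on its own.

```typescript
// Hypothetical seams: Artifacts binding, token API and DB schema are not shown in the post.
interface GitOps {
  clone(url: string, token: string, depth: number): Promise<void>;
  writeFile(path: string, content: string): Promise<void>;
  commit(path: string, message: string): Promise<string>; // stage + commit, returns the commit id
  push(url: string, token: string): Promise<void>;
}
interface RepoRecord { fileId: string; ownerId: string; remoteUrl: string }
interface MemoryWriteDeps {
  lookupRepo(fileId: string): Promise<RepoRecord | undefined>; // file -> repo table
  mintWriteToken(remoteUrl: string): Promise<string>; // short-lived write token
  freshGit(): GitOps; // new in-memory working tree per operation
}

// One memory_write: resolve the repo, mint a token, shallow-clone into a fresh
// tree, write + commit, push. The working tree is discarded afterwards.
async function memoryWrite(
  deps: MemoryWriteDeps,
  userId: string,
  fileId: string,
  path: string,
  content: string,
  message: string,
): Promise<string> {
  const repo = await deps.lookupRepo(fileId);
  if (!repo || repo.ownerId !== userId) throw new Error(`no memory repo for ${fileId}`);
  const token = await deps.mintWriteToken(repo.remoteUrl);
  const git = deps.freshGit();
  await git.clone(repo.remoteUrl, token, 1); // depth 1: shallow clone
  await git.writeFile(path, content);
  const commitId = await git.commit(path, message);
  await git.push(repo.remoteUrl, token);
  return commitId;
}
```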

Results

We've been running this internally and with early users. A few takeaways:

Memory changes the feel of the product. The biggest qualitative shift is that the agent stops re-litigating settled decisions. Once a preference is committed to memory, it shows up correctly on the first try in later sessions. The “I told you this last week” failure mode is gone for anything the agent chose to remember. The git log doubles as a surprisingly readable record of what the agent thinks it knows about your file, and when it gets something wrong you can read the commit and rewrite it.

isomorphic-git on Cloudflare Artifacts. Our worry was that cloning on every memory operation would be too slow. Because each repo holds one design file's worth of small markdown files and we shallow-clone, the working trees stay tiny and operations are fast. Treating Artifacts as a plain git remote means we get versioning, auth, and durability without building any of it.