AI agent workflows are still trapped in chat history. In this guest post, Raymond Weitekamp explains how OpenProse – it’s open sourced! – turns successful Claude Code and Codex sessions into reusable, reviewable programs written in logical English. Read along!

I thought that I was getting really good at using Claude Code. I wrote my own custom skills and CLIs. I set up my OpenClaw to learn from its past mistakes. I configured my destructive command guard. I was using teams of agents to bring my ideas to life quickly, but I couldn’t help but feel that now my job had become babysitting.

One day my team of agents built me a fully-featured SaaS application. The next day Claude Code accidentally emptied the entire contents of my Solana wallet.

Today’s agents are mismanaged geniuses. The rate-limiting factor in my daily collaboration with Claude Code and Codex is not a model capability issue – it is a trust issue.

I don’t need the agents to be any “smarter”; I need them to be more reliable, which is why I was so excited to discover OpenProse at the beginning of the year. I felt hopeful that with this new “agent language”, I might finally find a more repeatable way to get excellent work from AI. Perhaps this is the end of babysitting, and I can finally realize the true leverage that I know is possible with these mismanaged geniuses!

From the editor: OpenProse is a natural-language programming system for AI agent workflows. It lets developers describe multi-step work in logical English, turn that description into a .prose.md program, and run it through coding agents such as Claude Code or Codex. The goal is to make agent work reusable, reviewable, versioned, and inspectable instead of trapped in chat history.

The idea of specifying an entire multi-agent workflow in plain English was extremely enticing. If OpenProse works, it could become the “git for agent workflows”: a way to preserve, version, review, and reuse knowledge, instead of losing it in chat history.

Now imagine if your best-ever Claude Code session could become a reusable asset. In this article, I will show you exactly how to achieve that. 

What OpenProse is (not)

OpenProse is not an agent harness. You can run it in your favorite coding agent – whether that’s Claude Code, Codex, Hermes, or pi.

OpenProse is not a framework either. Everyone and their mother has an agent framework that they want you to adopt. In my personal experience, the useful half-life of these agent frameworks is typically shorter than the time it will take you to figure out how to use them for your project.

Technically, OpenProse is a programming language. But unlike all other programming languages I’ve ever heard of, it is not actually compiled by the computer. It is “compiled” by the coding agent.

| Category | What it does | Where OpenProse differs |
| --- | --- | --- |
| Prompt | Gives an agent instructions for one session | OpenProse makes the workflow reusable and reviewable |
| Agent skill | Gives an agent a capability | OpenProse declares when and where skills should run |
| Agent framework | Orchestrates agents from outside | OpenProse runs inside the coding agent |
| Workflow tool | Automates steps in a fixed system | OpenProse lets agents interpret logical English contracts |
| OpenProse program | Defines agent work as a .prose.md contract | It can be versioned, inspected, rerun, and improved |

To get a sense of how that works, you can think of it as one very large and very weird prompt.

OpenProse is packaged as an agent skill, which makes it easy to get started with:

npx skills add openprose/prose

At the same time, it is much more comprehensive than most skills, as it is forcing the agent to “become the virtual machine”. In other words, the OpenProse skill is incepting your coding agent into behaving like a compiler.

At the end of this article we’ll go much deeper under the hood of OpenProse, and you can skip ahead to the end if you really feel the itch, but for the purposes of understanding how we can use OpenProse to create reliable agentic workflows, the key idea is this:

OpenProse is logical natural language that both humans and large language models can understand – a shared contract to express and execute what needs to get done and how to do it.

Logical English is enough to get started

Don’t get intimidated by “the language” aspect of OpenProse. All you need to get started is to be able to express your workflow in logical English. The prose write command will handle the rest.

The prose write command takes your logical English as an argument. It outputs a .prose.md file that you can review and edit, then hand to your favorite coding agent, which runs the program with prose run.

Under the hood, prose write is actually its own .prose.md program, which will develop comprehensive OpenProse programs and test that they are valid before finishing. This can be a really great way to get a feel for the syntax and learn what is possible. In the same way that I would advise you to read your Claude Code or Codex “plans” before having a team of sub-agents race to implement them, it is probably a good idea to read the output <name>.prose.md file before running it, even if all of the details don’t make sense to you.

The other nice thing about the fact that OpenProse is “compiled” inside the LLM is that the agent can correct for any technically incorrect syntax. So if you want to start writing your own by hand, you don’t need to get too hung up on the syntactic accuracy to get a sense of how your programs will run.

If you want to stop reading and start doing:

You have hidden agent workflows on your computer

Do you have some really good Claude Code or Codex sessions? Turn them into reusable and repeatable programs that your agents can run!

Some of the most infuriating experiences I’ve had in the past year go like this: an agent absolutely nails something and everything goes perfectly. Then I start a new session expecting the same high standard of quality, only to be disappointed and frustrated that I can’t reproduce the magic of that golden session.

So I wrote session-to-prose, which turns a Claude Code, Codex, or Pi JSONL session log into a reusable OpenProse *.prose.md program. It does not merely summarize the session. It extracts the reusable workflow: phases, contracts, decision gates, loops, parallel work, strategies, errors, and validation evidence.

Even if you don’t know it, you are probably sitting on a goldmine of “implicit workflows”, buried in your session logs. The data is already there – OpenProse gives you the format to crystallize it into something you can run again on demand.

Explicitly declare skills as dependencies

Agent skills are amazing, and many workflows simply do not make sense without them. But at the same time, skills got some things wrong. Geoffrey Huntley argues that the content of skills should be deterministically allocated, not up to the agent to decide. In the standard implementation, skills are surfaced via progressive disclosure, which means the coding agent has to “find” the skill on its own – and it may not. When skills aren’t declared up front, your workflow’s success becomes a coin flip on whether the agent thought to reach for the right one.

I was recently using Claude Code to help a friend reformat an academic article from one journal format (which had rejected it) to another journal format for resubmission. Claude Code didn’t do a perfect job of actually reformatting everything, but the thing that blew this person’s mind was that Claude Code made comments and edits credited to Claude Code inside the .docx file itself.

Their immediate response was: “Well, how did you do that?”

I had to explain that Claude Code has a .docx skill. This is a very simple example of a workflow where you’re going to get completely different results with and without that skill. And if one of the stages of your multi-agent workflow is to have an agent edit a Word document, then you absolutely need to declare that skill so it is loaded before that sub-agent runs.

This is also a good place to explicitly call out that OpenProse is a powerful way to coordinate multiple agents – because the different sub-agents can have different skills. As a fun and arguably extreme example, I made auto-pocock. This is a fully headless OpenProse program that runs a deterministic sequence of Matt Pocock’s engineering skills, all from one input: a description of the feature you want built.

How and when we use skills is critical. Jude Gao at Vercel recently demonstrated that a carefully curated skill can actually be worse than a simple index of the documentation. OpenProse can give us the best of both worlds – use the skills that are trusted, with only the sub-agents that need them, in exactly the part of the workflow where they are called for.

Skills are capabilities. Prose programs are contracts for when those capabilities should run.

Under the hood of OpenProse

OK, I promised we’d go deeper, and if you’ve read this far you’ve earned it. This is the part where I try to explain how “a programming language compiled by a coding agent” actually works without hand-waving.

The one idea to hold onto is the one I keep circling back to: the coding agent itself is the compiler. There is no OpenProse server sitting between you and Claude Code, intercepting tool calls and orchestrating them from the outside. The agent reads the contract and becomes the virtual machine. Everything else – the file layout, the receipts – is just scaffolding that makes that role visible and keeps it from evaporating when the session closes. OpenProse runs with the agent, not around it.

The contract is the program

A Prose program is a Markdown file (*.prose.md) that declares a service in logical English. The two sections that matter most are ### Requires (the inputs the service needs before it can start) and ### Ensures (what must be true when it’s done). Around those you can declare ### Services it depends on, ### Strategies for how to exercise judgment, and an optional ### Execution block when you care about the order things happen in.

If you’ve ever written a function signature and a docstring and wished the agent would just honor them, that’s the feeling. Requires and Ensures are the contract. The agent is on the hook to satisfy them, and – this is the part that matters for trust – it has to leave evidence that it did.
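To make the contract idea concrete, here is a minimal Python sketch of a pre-flight check that a program file declares both halves of its contract. The section names come from the description above; treating them as literal level-3 Markdown headings, the helper names, and the sample program text are all assumptions made up for illustration, not the OpenProse specification.

```python
import re

# Section names as described in the article. Matching them as literal
# "### Name" headings is an assumption about the file format.
REQUIRED_SECTIONS = ("Requires", "Ensures")


def find_sections(program_text: str) -> set[str]:
    """Return the titles of all level-3 Markdown headings in the text."""
    pattern = re.compile(r"^###\s+(.+)$", re.MULTILINE)
    return {match.group(1).strip() for match in pattern.finditer(program_text)}


def missing_contract_sections(program_text: str) -> list[str]:
    """List the required contract sections that the program does not declare."""
    found = find_sections(program_text)
    return [name for name in REQUIRED_SECTIONS if name not in found]


# Hypothetical program text, invented for this example only.
SAMPLE_PROGRAM = """\
# summarize-report

### Requires
- report: a Markdown report to summarize

### Ensures
- summary: a five-bullet summary of the report
"""

if __name__ == "__main__":
    print(missing_contract_sections(SAMPLE_PROGRAM))
```

A check like this only confirms that the contract is present; whether the agent actually satisfied it is a separate question that the run evidence has to answer.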

Sessions, sub-agents, and the wall between them

Every service runs in its own isolated sub-agent session. This is the multi-agent part, and the isolation is the whole point: scratch work, half-formed reasoning, and dead-end files stay inside that session. The only thing that crosses back out is what you declared in ### Ensures, copied across an explicit binding. So a sub-agent can make a mess in private, and the workflow only inherits the clean, named artifact it promised. That boundary is what keeps a five-stage program from turning into one giant polluted context window.
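The boundary can be modelled in a few lines of Python. This is a conceptual sketch only: the ServiceSession class, its workspace dictionary, and the export method are hypothetical names invented here to illustrate the idea that only the declared outputs leave a session. It is not how OpenProse is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceSession:
    """Toy model of one isolated sub-agent session (hypothetical)."""

    name: str
    ensures: tuple[str, ...]  # names of the artifacts this service promises
    workspace: dict[str, str] = field(default_factory=dict)  # private scratch space

    def export(self) -> dict[str, str]:
        """Return only the promised artifacts; fail if any promise is unmet."""
        missing = [key for key in self.ensures if key not in self.workspace]
        if missing:
            raise ValueError(f"{self.name} did not produce: {missing}")
        return {key: self.workspace[key] for key in self.ensures}


if __name__ == "__main__":
    session = ServiceSession(name="research", ensures=("summary",))
    session.workspace["scratch_notes"] = "half-formed ideas, dead ends"
    session.workspace["summary"] = "three key findings"
    # Only the declared artifact crosses the boundary.
    print(session.export())
```

The scratch notes never reach the caller, and a session that fails to produce what it promised raises an error instead of silently handing back nothing.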

When you need real control: ProseScript and real tools

Declarative contracts get you surprisingly far, but sometimes you need to say “do this, then that, retry three times, and loop until the reviewer approves.” That’s what ProseScript is – an imperative layer inside the ### Execution block for explicit ordering, conditionals, loops, and retries.
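For readers who think in conventional code, the control flow in that sentence looks roughly like the Python below. This is an analogy, not ProseScript syntax: the function names, the callable parameters, and the round limit are assumptions chosen for illustration.

```python
from typing import Callable


def run_with_retries(step: Callable[[], str], max_attempts: int = 3) -> str:
    """Run a step, retrying on RuntimeError up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return step()
        except RuntimeError as error:
            last_error = error
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error


def loop_until_approved(
    draft: Callable[[], str],
    review: Callable[[str], bool],
    revise: Callable[[str], str],
    max_rounds: int = 5,
) -> str:
    """Revise a draft until the reviewer approves or the round limit is hit."""
    text = draft()
    for _ in range(max_rounds):
        if review(text):
            return text
        text = revise(text)
    raise RuntimeError("reviewer never approved the draft")
```

The round limit is a design choice worth keeping in any real workflow: a loop that waits on a reviewer needs an exit condition other than approval.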

And when you need genuinely deterministic behavior, you don’t ask the model to be deterministic – you declare a real tool. A ### Tools block lets a service depend on an actual executable on your PATH (cli:jq, a script of your own) or an MCP server. I tried this: a service that declared cli:jq, run against a malformed JSON blob, really did shell out to jq – a genuine exit-5 parse error, caught at the right column. The determinism came from jq, not from the model imitating jq. But here is the seam, and it matters: having a real tool wired in does not guarantee the agent calls it at the right moment. The determinism lives in the script; the decision to invoke it does not. I’ll come back to this in the limitations, because it’s the single most important caveat in the whole piece.
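The split between “the tool is deterministic” and “someone has to decide to call it” is easy to see in ordinary code. The sketch below is a hypothetical Python helper, not part of OpenProse: it resolves an executable on PATH, runs it, and reports the real exit code. The function name and its parameters are assumptions.

```python
import shutil
import subprocess


def run_declared_tool(
    tool: str, args: list[str], stdin_text: str = ""
) -> subprocess.CompletedProcess:
    """Run a real executable and return its actual exit code and output."""
    executable = shutil.which(tool)
    if executable is None:
        # Fail loudly rather than letting anything imitate the tool.
        raise FileNotFoundError(f"declared tool not found on PATH: {tool}")
    return subprocess.run(
        [executable, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=False,  # the caller inspects returncode itself
    )
```

Assuming jq is installed, a call such as run_declared_tool("jq", ["."], '{"a": ') hands the malformed input to jq and returns whatever non-zero status jq itself reports. Everything inside the function is deterministic; the decision to call it at the right moment is not, and in an OpenProse run that decision belongs to the agent.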

Receipts and state: where the trust actually comes from

This is the part that, for me, is the actual answer to the babysitting problem.

The design is that every run leaves a receipt under runs/{run-id}/ – the inputs, the outputs, the logs, the artifacts each service produced. An audit trail, so that when the agent claims it’s done I don’t have to take its word for it; I can read what it actually did. “Done” stops being a vibe and starts becoming something you can inspect.
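As a rough illustration of what “inspectable” can mean, here is a Python sketch that writes a receipt under runs/{run-id}/ and then checks that every output the receipt claims actually exists on disk. Only the runs/{run-id}/ location comes from the design described above; the receipt.json filename, its fields, and both function names are assumptions invented for this example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_receipt(project_root: Path, run_id: str, inputs: dict, outputs: dict) -> Path:
    """Record a run's inputs and claimed outputs under runs/<run_id>/."""
    run_dir = project_root / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite an old run
    receipt = {
        "run_id": run_id,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,
        "outputs": outputs,  # artifact name -> path relative to the run directory
    }
    receipt_path = run_dir / "receipt.json"  # hypothetical filename
    receipt_path.write_text(json.dumps(receipt, indent=2), encoding="utf-8")
    return receipt_path


def claimed_outputs_exist(receipt_path: Path) -> bool:
    """Check that every artifact named in the receipt is present on disk."""
    receipt = json.loads(receipt_path.read_text(encoding="utf-8"))
    run_dir = receipt_path.parent
    return all((run_dir / rel).is_file() for rel in receipt["outputs"].values())
```

The second function is the interesting one: it turns “the agent says it is done” into a check that anyone, or any CI job, can rerun against the files.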

Longer-lived goals – the standing responsibilities that have to stay true over time, not just answer once – keep their memory in state/ between runs.

The full filesystem layout is src/ (what you author), dist/ (the compiled manifest the runtime reads), runs/ (receipts), state/ (durable cross-run memory), deps/ (pinned dependencies), and a prose.lock. If that looks suspiciously like a normal software project – source, build output, lockfile – that’s the point. It’s meant to live in git and be reviewed like anything else you’d review.
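Because the layout is plain directories and a lockfile, it can be checked like any other project convention. The Python sketch below lists which of the entries named above are missing from a project root; the function name is hypothetical, and whether every directory must exist before a first run is an assumption.

```python
from pathlib import Path

# Directory and lockfile names as listed in the article.
EXPECTED_DIRS = ("src", "dist", "runs", "state", "deps")
LOCKFILE = "prose.lock"


def missing_layout_entries(project_root: Path) -> list[str]:
    """Return the expected directories and lockfile that are not present."""
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (project_root / name).is_dir()]
    if not (project_root / LOCKFILE).is_file():
        missing.append(LOCKFILE)
    return missing
```

Something this small fits in a pre-commit hook or CI step, which matches the spirit of treating a Prose project like any other repository under review.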

Why it can run anywhere

Because the agent is the compiler, the same .prose.md source runs on any harness that can play the part – Claude Code, Codex, OpenCode, Hermes, Pi (with the sub-agents extension), whatever you trust. OpenProse calls this being “Prose Complete”. Most coding agents that we’ve tested are “complete enough” to be useful, as long as they have a filesystem, a shell, and sub-agents. The upshot: your workflows get better as the models get better, without you rewriting anything. You wrote down the contract once; every future model that can satisfy it gets to.

How it compares

  • vs. DSPy. In many ways, OpenProse shares DSPy’s goals: create a layer of abstraction that enables you to author programs instead of writing prompts. However, the implementations couldn’t be more different. Where DSPy erects extremely strict scaffolding around the LLM calls, OpenProse asserts the entire language contract inside of the LLM.

  • vs. LangChain and CrewAI. I’ll just say it plainly: I am personally allergic to agent frameworks. Before it was agents it was RAG, and the pattern is always the same – you go deep enough to actually understand the framework, you hit the one thing it won’t do, and now you’re either monkeypatching it into a fork or rolling your own anyway. As a serious AI engineer I need control over the context and all the execution details, and generic frameworks abstract exactly those away. The reason OpenProse doesn’t trip my allergy is that it isn’t asking me to move into anything. The work still runs through the coding agent I already use; OpenProse just puts a contract around the workflow. That’s a different category from “adopt our orchestration layer and live there.”

Caveats and limitations

I’d rather you trust the honest version of this than the hype version, so here are the real edges:

  • The LLM is still non-deterministic. This is the big one. OpenProse can make a workflow far more explicit, inspectable, and repeatable – but it does not turn a language model into deterministic infrastructure. If you have genuinely mission-critical code that must run the same way every time, that code belongs in ordinary scripts and tests, orchestrated and verified outside OpenProse. You can declare a real tool and call out to tested, deterministic code, but as I said above, you are then trusting the agent to invoke it at the right point. The power of OpenProse – the thing that lets it not be a framework – comes precisely from the fact that the coding agent itself is the compiler. That is the magic and the trade-off in the same sentence.

  • A contract only encodes the judgment you put into it. A bad Prose program will faithfully and repeatably do the wrong thing. The operator still has to know what good looks like.

  • There is overhead. Not every prompt deserves to become a program. Reserve this for the workflows that are actually worth making repeatable.

  • The host matters, especially the model. What can actually run depends on the affordances of your coding agent – filesystem access, the ability to spawn isolated sub-agent sessions, environment-variable handling. Most importantly, only the best frontier models can currently do a good job running Prose programs. That will shift eventually, but as of today we recommend using the latest and greatest models and agent harnesses to run OpenProse.

Now author your first outcome

As I shared before, the bottleneck is not intelligence. The bottleneck is trust and reliability. 

I am still babysitting my agents. But I am also slowly starting to write the next page for how I want to collaborate with agents, not in prompts, but in prose.

I hope that this gives you a sense of what is possible with OpenProse! Here are some very easy ways to get started:

This guest post was written by Raymond Weitekamp. We thank OpenProse for supporting Turing Post’s mission to bring clarity to the AI landscape. We encourage you to try it – it’s open source.

FAQ

What is OpenProse?

OpenProse is a natural-language programming system for AI agent workflows. It lets developers describe multi-step work in logical English and turn it into reusable .prose.md programs that coding agents can run.

Is OpenProse an agent framework?

No. OpenProse is not an agent framework or harness. It does not replace Claude Code, Codex, or other coding agents. Instead, it gives those agents a structured contract for what work should be done and how success should be verified.

How does OpenProse work?

OpenProse uses prose programs written in Markdown. These programs define requirements, expected outcomes, services, tools, strategies, and execution steps. The coding agent reads the program and acts as the “compiler” or virtual machine that executes the workflow.

Why use OpenProse instead of prompts?

Prompts are usually temporary and hard to reproduce. OpenProse makes workflows reusable, reviewable, and versionable. A good agent session can become a durable workflow instead of disappearing into chat history.

Does OpenProse make AI agents deterministic?

No. OpenProse makes agent workflows more explicit and inspectable, but the LLM is still non-deterministic. For mission-critical deterministic behavior, keep that logic in ordinary scripts, tests, and tools. A Prose program can declare such a tool as a dependency, but you are still trusting the agent to invoke it at the right point.
