AI workflow patterns are repeatable arrangements of decisions, actions, and human checkpoints that turn an input into an output. In organizations, the workflow is the unit you can actually inspect, automate, and improve: not the model, not the agent brand, not a vague use case, but the path work takes from trigger to result.
This article is part of our The Org Age of AI series. It is co-written by Will Schenk (TheFocus.AI) and Ksenia Se. Previous episodes: #1: AI Feels Powerful. So Why Is the ROI Still Missing?, #2: The Unsexy Truth of AI Adoption, #3: How to Build an AI-Native Startup from Day One, #4: There Are No AI-Native Enterprises Yet.
If you need an unbiased view on your transition to becoming AI-native, you can schedule a 1-on-1 consultation with Will here. Will Schenk is a co-founder of TheFocus.AI, where he works directly with companies navigating these transitions.
What's in today's episode:
What an AI workflow actually is
The seven primitives of AI workflows
Eight AI workflow patterns that recur in production
Which AI workflows should you automate first?
How AI workflow patterns chain into pipelines
What AI workflows mean for AI adoption
What Is an AI Workflow? Definition and Core Concept
This is the fifth article in the series, and we've used the word "workflow" in every one of them without saying what we mean by it. Time to fix that.
The thesis underneath this whole series is that AI adoption conversations keep happening at the wrong unit. People debate models, agents, frameworks, use cases. The thing you can actually point to and change is smaller. It's the workflow.
A workflow is a repeating sequence of decisions and actions that turns an input into an output, with points along the way where a human exercises judgment.
Judgment points! Strip those out and you have a pipeline – a cron job, a script, plumbing. Keep them in and you have a workflow.
When we think about a workflow, it's something you have to tease apart through conversations about how information and processes move through the organization. So many of them are informal and under-specified, and in some ways happen in spite of the company. There are mid-level heroes every day finding ways to make something work.
These informal and under-specified parts can now be tackled with LLMs of varying sophistication and prompts of varying specificity. A lot of the gray area can now be addressed. We have intelligence on tap. Things that used to have to be extremely tightly fitted can now be loosely coupled.
The interesting question is always: which judgment points can an agent handle, which ones still need a human, and how does the human know when to step in?
Most organizations have dozens of workflows running in every department – support, finance, engineering, sales, etc. Most have never been written down, because the humans running them absorbed the complexity years ago. The first step is discovering them: not the automated pipelines, but the decision processes made of people working around flawed interactions.
That is the L0-to-L2 problem: making the organization legible to itself. And once you can see the workflows, the next question is which ones to pick.
Across roughly thirty production systems we operate at TheFocus.AI – content pipelines, financial reconciliation tools, engineering automation, deal-scoring platforms, API monitors, newsletter delivery – the same eight compositions keep appearing. Each one is a specific arrangement of primitives with a specific shape of human involvement.
But to understand them, we first need to learn the vocabulary →
The seven primitives: what an agent actually does in a single step
When you strip away the domain language – the invoices, the tickets, the pull requests, the deals – what an agent actually does in any single step reduces to seven actions:
| Primitive | What it does | Simple test | Example |
|---|---|---|---|
| Watch | Waits for a trigger or condition. | Has something happened yet? | A file appears, a threshold is crossed, a schedule fires. |
| Validate | Checks against known criteria. | Is this correct? | File headers match, invoice matches the purchase order, tests pass. |
| Classify | Assigns a category or route. | What kind of thing is this? | Billing vs. technical issue, data problem vs. code problem. |
| Enrich | Adds useful information to existing data. | What can we add to make this more useful? | Tag a transcript, score a deal, calculate spending from invoices. |
| Generate | Produces a new artifact. | What should be created? | Draft an email, write a report, build a slide deck. |
| Execute | Takes an action with consequences. | Should this action happen now? | Send the email, post the tweet, load data, deploy code. |
| Elicit | Asks a human to reduce ambiguity. | What do we still need to know? | Confirm scope, choose an approach, decide whether to include historical data. |
These seven show up in every workflow we have built or seen built. None of them are sufficient on their own. A single "validate" call is not a workflow. But chain a few together with branching logic and a human checkpoint or two, and you have one.
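One way to make the vocabulary concrete is to write a workflow down as a chain of primitives and mark where a human enters. The sketch below is illustrative only: the `Step` type, the `needs_human` flag, and the two example chains are our own naming for this example, not code from any of the systems described in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Primitive(Enum):
    WATCH = "watch"
    VALIDATE = "validate"
    CLASSIFY = "classify"
    ENRICH = "enrich"
    GENERATE = "generate"
    EXECUTE = "execute"
    ELICIT = "elicit"


@dataclass(frozen=True)
class Step:
    primitive: Primitive
    needs_human: bool = False  # hypothetical flag: a person must sign off before this step


def human_checkpoints(steps: list[Step]) -> list[str]:
    """Return the primitives in a chain that are gated on a human."""
    return [s.primitive.value for s in steps if s.needs_human]


# A sync-style chain: no human on the happy path.
sync_chain = [
    Step(Primitive.WATCH),
    Step(Primitive.VALIDATE),
    Step(Primitive.ENRICH),
    Step(Primitive.EXECUTE),
]

# An approval-style chain: the agent prepares, a human gates the action.
approval_chain = [
    Step(Primitive.GENERATE),
    Step(Primitive.EXECUTE, needs_human=True),
]
```

Writing a chain down this way makes the design question visible: which steps carry the flag, and why.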
Eight AI Workflow Patterns That Recur in Production
These eight patterns recur across every production system we run. We originally thought there were five – but a deeper audit of our codebase, plus operational patterns from Bloomberg, Zapier, Cursor, and OpenRouter, surfaced three more. Each one is a specific arrangement of primitives with a specific shape of human involvement.
| Pattern | Shape | Human role |
|---|---|---|
| Triage | Classify → route | Usually none |
| Investigation | Validate + enrich → recommend | Decides |
| Draft & review | Generate → review | Edits/approves |
| Approval | Propose → execute | Gates |
| Monitoring | Watch → escalate | Handles exceptions |
| Elicitation | Ask → refine | Supplies context |
| Sync | Transform → load | Usually none |
| Curation | Collect → synthesize → deliver | Receives |
It all comes from the real use cases. And it’s gold. Let’s discuss each in detail →
Pattern 1: Triage
Composition: classify → route
What it does: An item arrives. The agent decides what kind of item it is and sends it down the right path. No artifact is produced. No action is taken. The value is in the routing decision.
Human involvement: Usually none. Triage is the pattern most likely to run fully autonomously, because a misroute is cheap to fix – the downstream pattern catches the error.
Example – Tezlab operations. This is almost standard operations work by now: pulling in DevOps and support events to figure out what level of reaction is needed, whether an event belongs in a priority queue, and whether it is a temporary network error or something in between, such as a user whose credit card was declined, whose account was disabled as a result, and who does not know why.
Example – twitter-collator. We operate a system that watches lists of X accounts and classifies each tweet by immediate relevance. High-engagement tweets and "breaking news" get routed to a priority notification queue, while low-engagement tweets go into an archive for later retrieval. The classification happens without human input. If something gets misclassified, the digest just includes a less-interesting tweet – annoying, not catastrophic.
When to use it: When you have a stream of incoming items and the bottleneck is figuring out which bucket each one belongs in, not what to do once it is in the bucket.
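A minimal sketch of classify → route, loosely modeled on the tweet example above. The keyword check and the `engagement_threshold` value are stand-ins we made up for illustration; in a real system the classification step would more likely be a model call.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    likes: int


def classify(tweet: Tweet, engagement_threshold: int = 1000) -> str:
    """Assign a route. The rule here is a placeholder for a model-based classifier."""
    is_breaking = "breaking" in tweet.text.lower()
    if is_breaking or tweet.likes >= engagement_threshold:
        return "priority"
    return "archive"


def triage(tweets: list[Tweet]) -> dict[str, list[Tweet]]:
    """Route each tweet into a queue. Nothing is generated and nothing is executed."""
    queues: dict[str, list[Tweet]] = {"priority": [], "archive": []}
    for tweet in tweets:
        queues[classify(tweet)].append(tweet)
    return queues
```

Note that `triage` only sorts items into queues. Whatever happens to a queued item belongs to a downstream pattern.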
Pattern 2: Investigation and recommendation
Composition: validate + enrich → generate recommendation
What it does: The agent gathers context from multiple sources, analyzes what it finds, and produces a recommendation for a human to act on. The recommendation is not a draft of anything – it is a judgment call presented with evidence.
Human involvement: The human reviews the recommendation and decides what to do. The agent does the legwork; the human makes the call.
Example – hedge fund. We subscribe to SEC filings and run content analysis on, for example, a press release to decide whether an executive departure was an orderly, planned event or a response to some scandal, then route the event to different places based on the model's understanding of the press release.
Example – qbsync reconciliation reports. We built a system that syncs data between QuickBooks and Google Sheets for a construction firm. The interesting part is the reconciliation. The agent pulls spending data from QB, compares it against the project budget in Sheets, calculates burn curves by cost code, and generates a variance report. "Demolition is 40% over budget at the 20% completion mark. Finishing work has not started spending yet, which is normal at this stage. Plumbing shows an unusual spike in soft costs – three invoices from a new supplier categorized differently than the existing ones."
The project manager reads that report and decides whether to call the subcontractor. The agent did thirty minutes of cross-referencing in two minutes. But the decision to act is still human.
When to use it: When the bottleneck is not "what should we do?" but "we do not have time to gather and cross-reference all the information we would need to make a good decision."
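The arithmetic behind a statement like "40% over budget at the 20% completion mark" can be sketched as follows. This assumes a linear burn curve (expected spend = budget × completion), which is our simplification; the actual qbsync burn-curve logic is not described in this article, and the field names and the 10% reporting threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostCode:
    name: str
    budget: float      # total budget for this cost code
    spent: float       # actual spend to date
    completion: float  # fraction of this cost code's work complete, 0.0 to 1.0


def variance(code: CostCode) -> Optional[float]:
    """Fraction over (+) or under (-) expected spend; None if no spend is expected yet."""
    expected = code.budget * code.completion  # assumption: linear burn curve
    if expected == 0:
        return None
    return (code.spent - expected) / expected


def variance_report(codes: list[CostCode], threshold: float = 0.10) -> list[str]:
    """One line per cost code whose variance exceeds the threshold."""
    lines = []
    for code in codes:
        v = variance(code)
        if v is not None and abs(v) > threshold:
            direction = "over" if v > 0 else "under"
            lines.append(
                f"{code.name} is {abs(v):.0%} {direction} expected spend "
                f"at the {code.completion:.0%} completion mark."
            )
    return lines
```

With a budget of 50,000, spend of 14,000, and 20% completion, expected spend is 10,000 and the variance is 40% over. The report only presents the evidence; deciding whether to call the subcontractor stays with the project manager.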
Pattern 3: Draft and review
Composition: generate artifact → human reviews → optionally revise
What it does: The agent produces a complete artifact – a document, an email, a report, a slide deck – and presents it to a human for review before it goes anywhere.
Human involvement: The human reads the draft, edits it or approves it, and decides when it ships. The agent handles the first-draft labor; the human provides taste and judgment.
Example – deal decks at a PE firm with $2B under management. We built a system for a private equity firm that generates draft Go-To-Offer PowerPoint decks from deal materials. The inputs are meeting notes captured in Markdown, confidential information memoranda as PDFs, and financial exports from QuickBooks. The agent parses all of this, extracts the key business metrics, and produces a two-page deck draft: business overview on page one, investment merits and risks on page two, plus a backup Excel workbook with P&L rollups, balance sheet summaries, and customer concentration analysis.
The deal lead reviews the deck. They will always edit it – the tone, the emphasis, the risk framing – but they start from something that took the agent minutes instead of the hours it used to take an analyst. The system validates its output against three known-good historical decks to make sure the format is right. The content judgment is the human's.
When to use it: When a human needs to produce an artifact regularly, the structure of that artifact is predictable, and most of the effort is assembly rather than invention.
Pattern 4: Execution with approval
Composition: propose action → human approves → execute
What it does: The agent is ready to take an action that has real-world consequences – sending an email, posting to social media, deploying code, loading data into a production system – and it waits for a human to say "go."
Human involvement: The human is the gate. They see what the agent intends to do, and they approve or reject it. The key design question is how granular the approval is: per item, per batch, or per parameter.
Example – email pipeline. We use a system built on the Buttondown API that lets an agent draft newsletter emails from Markdown source files. The agent creates the draft and pushes it to Buttondown as an unsent email. A human reviews the draft in the Buttondown console – checking tone, checking links, checking that the content is actually what they want to send to their subscriber list – and then schedules it for a specific send time. The agent does the formatting, the API calls, and the scheduling mechanics. The human decides whether and when the thing actually goes out. Drafts are editable; scheduled sends can be canceled before the send time. The reversibility window is clear.
When to use it: When the action is consequential enough that you want a human to see it before it happens, but routine enough that the agent can do all the preparation.
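The gate itself can be very small. Below is a sketch of propose → approve → execute in which the approval decision is passed in as a function, so it could be backed by a console prompt, a chat button, or a review queue. All names here are illustrative assumptions; this is not the Buttondown integration described above and it calls no real API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProposedAction:
    description: str         # what the human sees before deciding
    run: Callable[[], str]   # the consequential action, deferred until approval
    status: str = "pending"


def execute_with_approval(
    action: ProposedAction,
    approve: Callable[[ProposedAction], bool],
) -> Optional[str]:
    """Run the action only if the approver says yes; otherwise record the rejection."""
    if approve(action):
        action.status = "approved"
        return action.run()
    action.status = "rejected"
    return None
```

Because `run` is only called after approval, the agent can do all the preparation up front without anything irreversible happening before the human says "go."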
Pattern 5: Monitoring and escalation
Composition: watch → validate → branch (ok → silent / problem → escalate to human)
What it does: The agent watches for conditions on a schedule. If everything is normal, nothing happens. If something breaks or drifts outside acceptable bounds, the agent surfaces the problem to a human with enough context to act.
Human involvement: None on the happy path. The human only shows up when something goes wrong. Their job is to handle the exception the agent cannot resolve.
Example – usage-monitor. We track API keys and credit balances across twenty-seven AI providers – OpenAI, Anthropic, Google, Mistral, and 23 others – for multiple organizations. Every hour, a cron job checks each key: is it valid, is the balance above the warning threshold, is the rate limit healthy? Results are sorted by severity in a dashboard. Errors surface first. If a key fails or a balance drops below the critical threshold, alerts fire to Slack, Discord, or email, depending on the organization's preference.
Nobody looks at the dashboard when everything is green. The whole point is that you do not have to. The value is in the two AM alert that says "your Anthropic key for the production org just failed authentication" – with the specific key, the error message, and a link to fix it.
When to use it: When the cost of checking is low, the cost of missing a problem is high, and the human does not need to be involved unless something is actually wrong.
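A sketch of the watch → validate → branch shape, using the key-and-balance example. The threshold values, the severity names, and the `KeyStatus` fields are assumptions made for illustration; fetching the statuses from each provider and delivering the alerts are left out.

```python
from dataclasses import dataclass

# Lower number = shown first on the dashboard. Errors surface first.
SEVERITY_ORDER = {"error": 0, "critical": 1, "warning": 2, "ok": 3}


@dataclass
class KeyStatus:
    provider: str
    valid: bool
    balance: float


def severity(status: KeyStatus, warning: float = 20.0, critical: float = 5.0) -> str:
    """Validate one key against hypothetical thresholds."""
    if not status.valid:
        return "error"
    if status.balance < critical:
        return "critical"
    if status.balance < warning:
        return "warning"
    return "ok"


def check_all(statuses: list[KeyStatus]) -> tuple[list[KeyStatus], list[KeyStatus]]:
    """Return (dashboard order, items that should trigger an alert)."""
    ranked = sorted(statuses, key=lambda s: SEVERITY_ORDER[severity(s)])
    alerts = [s for s in ranked if severity(s) in ("error", "critical")]
    return ranked, alerts
```

When every key is healthy, `alerts` is empty and nobody is interrupted, which is the point of the pattern.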
Pattern 6: Elicitation
Composition: elicit → human answers → elicit → … → hand off to generate or execute
What it does: The agent and the human co-construct a specification through a structured back-and-forth. The agent asks questions one at a time, refuses to act until it has enough shared understanding, and only then produces a spec or begins work.
Human involvement: The human is the source material, not the reviewer. They are not evaluating an output – they are providing the input that makes a good output possible.
Example – LOI drafting for investment. We built a system for a private equity firm that drafts Letters of Intent from conversational deal-term input. The deal lead describes terms the way they would brief an associate: "Purchase price is $12M, $2M rollover, 3-year earnout tied to EBITDA, standard non-compete." The agent does not immediately generate a draft. First it asks clarifying questions – drawing on a corpus of every executed LOI the firm has ever done. "The last five LOIs with earnouts also included an acceleration clause on change of control – do you want that here?" "Your non-compete language in healthcare services deals has been different from your standard – which version?" "This rollover percentage is lower than your typical range for deals of this size – is that intentional?"
The document generation is the easy part. The value is in the questions. A junior associate pulling up comparable LOIs would take an hour and miss half the relevant precedent. The agent surfaces it in a conversation. By the time the draft exists, it is right, because the ambiguity was eliminated before any language was written.
When to use it: When the human has domain expertise but has not fully specified what they want – and when the cost of discovering missing requirements during review is higher than the cost of asking upfront. Legal documents, architectural decisions, complex configurations, anything where "I forgot to mention the edge case" creates expensive rework.
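The control flow of elicitation, reduced to its skeleton: ask one question at a time and refuse to hand off until the spec is complete. The list of required terms and the question wording are hypothetical. The system described above goes much further by drawing its questions from precedent; this sketch only shows the loop.

```python
from typing import Callable, Optional

# Hypothetical checklist of terms that must be settled before drafting.
REQUIRED_TERMS = ("purchase_price", "rollover", "earnout", "non_compete")


def next_missing(spec: dict[str, str]) -> Optional[str]:
    """Return the first required term the spec does not cover yet, if any."""
    for term in REQUIRED_TERMS:
        if term not in spec:
            return term
    return None


def elicit(spec: dict[str, str], ask: Callable[[str], str]) -> dict[str, str]:
    """Ask one question at a time until nothing required is missing."""
    spec = dict(spec)  # do not mutate the caller's copy
    while (term := next_missing(spec)) is not None:
        spec[term] = ask(f"What should the {term.replace('_', ' ')} be?")
    return spec
```

Only the completed spec returned by `elicit` would be passed on to a generate or execute step.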
Pattern 7: Sync and transform
Composition: watch (or trigger) → validate → enrich → execute (load)
What it does: Data moves from system A to system B, transformed along the way. There is no human judgment on the happy path. The transform rules are known in advance. The agent's contribution is doing it reliably, catching errors, and handling the edge cases that used to require manual intervention.
Human involvement: None in normal operation. The human gets involved when the sync breaks, when a new data format appears, or when the transform rules need updating. This is the pattern most likely to run entirely unattended for weeks at a time.
Example – marketing research data loading. We built a media analytics platform that ingests data from five different providers: show ratings as CSVs, streaming platform metrics as Excel workbooks, celebrity awareness data from SPSS research survey files, social follower counts from analytics providers, and audience affinity scores from yet another surveying platform. Each source has its own format, its own column naming conventions, and its own update schedule. Files land in various S3 buckets at various times.
The load pipeline for each source does four things. First, it validates the file: right extension, right headers, not already processed. Second, it renormalizes column names as positions, since things will change over time. Third, it normalizes the data – title names are encoded differently across platforms, and Spanish-language content can get deduced. Fourth, it cross-references across sources: a celebrity name from one database gets matched to a talent ID, show names get cross-referenced, and special events that warp expectations (the presidential debates!) get tagged, building a unified universe of data.
After every load, a suite of data integrity tests runs – does the top-shows-by-platform list look right for this month? Do the historical aggregates still hold? Are the month-over-month calculations consistent? When a test fails, the human examines the logs, identifies whether it is a normalization bug, a column mapping change, or a genuinely bad file from the provider, and either fixes the config and reruns or pushes the problem back to the data source.
The human is not involved in any individual row. They are not reviewing each celebrity match or each title normalization. They set up the column mappings, they define the validation tests, and they handle the exceptions when something changes upstream – a new column appears in an export, a provider changes their naming convention, a file comes in with an unexpected sheet structure. The rest runs without them.
When to use it: When you have data arriving from multiple external sources on a regular schedule, each source has its own format and conventions, the transform logic is complex but deterministic once defined, and the main risks are silent schema drift and cross-source matching errors.
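The first step of the load pipeline described above (right extension, right headers, not already processed) might look like the sketch below for a CSV source. The function name, the problem strings, and the idea of tracking processed files in a set are illustrative assumptions, not the platform's actual code.

```python
import csv
import io


def validate_file(
    name: str,
    content: str,
    expected_headers: list[str],
    processed: set[str],
) -> list[str]:
    """Return a list of problems; an empty list means the file is safe to load."""
    problems = []
    if not name.lower().endswith(".csv"):
        problems.append("wrong extension")
    if name in processed:
        problems.append("already processed")
    # Read only the first row and compare it to the headers we expect.
    header = next(csv.reader(io.StringIO(content)), [])
    if [h.strip() for h in header] != expected_headers:
        problems.append("unexpected headers")
    return problems
```

Returning every problem rather than stopping at the first gives the human a fuller picture when a file is rejected, which helps in telling a renamed column apart from a genuinely bad file.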
Pattern 8: Curation and scheduled delivery
Composition: watch (clock) → classify + enrich → generate → execute (deliver)
What it does: On a schedule, the agent collects material from multiple sources, synthesizes it, and delivers a finished product to subscribers. No human triggers it. No human reviews it before delivery. The human's only decision is whether to subscribe or unsubscribe.
Human involvement: The human is the audience, not the operator. They receive the output. If they do not like it, they unsubscribe or change their topic preferences. The quality control is built into the pipeline – the agent curates by design, not by human review.
Example – distill email digests. We operate an email digest platform where users select topics – predefined categories, custom keywords, specific Twitter accounts, specific people, newsletters, YouTube channels – and receive twice-daily curated digests at 7 AM and 6 PM in their local timezone. Every hour, a cron job checks which users should receive a digest based on their timezone. For each eligible user, the system fetches recent tweets matching their topics from the Twitter API, deduplicates and groups them by conversation, passes the grouped tweets to Claude for summarization, and sends the digest as an HTML email via Resend.
Nobody reviews the digest before it ships. The quality comes from the topic selection (which the user controls), the summarization prompt (which we tuned), and the deduplication logic (which prevents repeats). If something is off, the user adjusts their topics or unsubscribes. The system processes thousands of digests without a human ever looking at an individual one.
When to use it: When the delivery is frequent enough that per-item human review would be impractical, the content sources are well-defined, the quality bar can be met by prompt engineering and filtering, and the cost of an occasional dud is low (the subscriber skips it, nobody gets fired).
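The hourly "who is due?" check can be sketched with the standard library alone. Here each user is mapped to a `tzinfo` object; the example uses fixed UTC offsets to stay self-contained, though a real system would more likely use named zones so daylight saving time is handled. The user names and the `users_due` function are hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo

DIGEST_HOURS = (7, 18)  # 7 AM and 6 PM local time


def users_due(users: dict[str, tzinfo], now_utc: datetime) -> list[str]:
    """Return users whose local hour right now is one of the digest hours."""
    return [
        name
        for name, tz in users.items()
        if now_utc.astimezone(tz).hour in DIGEST_HOURS
    ]


users = {
    "ana": timezone(timedelta(hours=-5)),   # 12:00 UTC is 07:00 local
    "ben": timezone.utc,                    # 12:00 local
    "chen": timezone(timedelta(hours=6)),   # 12:00 UTC is 18:00 local
}
due = users_due(users, datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc))
```

Because the job runs once per hour and compares only the local hour, each user matches at most once per digest slot.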
Selection criteria: which workflows to automate first
Not every workflow is worth automating. Some are too rare to justify the setup. Some are too consequential to trust. Some are too messy – too many exceptions, too little structure – to work reliably without a human in every step, which defeats the purpose.
Four variables tell you where to start.
Frequency
How often does this workflow run? An invoice that comes in once a quarter is not worth building a pipeline for. An invoice that comes in sixty times a week is. Frequency does two things:
it amortizes the cost of building the automation,
it creates enough volume for the agent to get good at the task – whether through prompt tuning, through discovering edge cases, or just through the human learning which parts of the output to trust.
The rule: Start with workflows that run at least weekly. Daily is better. Hourly is where full automation starts paying off.
Reversibility
What happens if the agent gets it wrong? If it misclassifies a support ticket, someone re-routes it and the customer waits an extra ten minutes. If it sends the wrong email to ten thousand subscribers, you have a real problem. If it loads corrupted data into the production database, you might have a very expensive weekend.
Reversibility determines where you put the human checkpoints.
High reversibility (a draft, a classification, a recommendation) means you can run with lighter oversight – review a sample, not every item.
Low reversibility (a send, a deploy, a financial transaction) means you need an explicit approval gate or a narrow blast radius.
The rule: Automate high-reversibility workflows first. Move toward low-reversibility workflows only after the high-reversibility ones have built trust.
Verifiability
Can you check whether the agent did it right? Some tasks are easy to verify: code either passes the tests or it does not. A reconciliation either matches the numbers or it does not. A classification can be spot-checked against historical decisions.
Other tasks are hard to verify: Is this essay well-written? Is this strategic recommendation sound? Is this the right tone for this customer? When verification is hard, the human stays in the loop longer – not because the agent cannot do the work, but because nobody can tell whether the agent did the work well without reading the whole output.
One of the most useful frameworks we have seen for this comes from Factory AI: the ease of training an agent to solve a task is proportional to how verifiable the task is.
Tasks with objective truth, fast verification, and strong signal are where agents get good fastest.
Tasks where quality is subjective and verification takes as long as doing the work – those are where agents stay assistants longest.
The rule: Start with workflows where "good" is defined before the agent runs, not after. If you cannot describe what a correct output looks like in advance, the agent is going to produce plausible-looking work that you cannot evaluate efficiently.
Exception rate
What percentage of inputs are weird? If ninety-five percent of invoices follow the standard format and five percent are handwritten notes on the back of a napkin, the agent handles the ninety-five percent and escalates the five percent. That is a good workflow to automate. If fifty percent of inputs are exceptions, the agent spends more time escalating than working, and the human is still doing half the job.
Exception rate interacts with the other three variables.
A high-frequency, high-exception-rate workflow can still be worth automating if the exceptions are classifiable – the agent triages the exceptions into categories, handles the ones it can, and escalates the rest with context that makes the human's job faster.
But a low-frequency, high-exception-rate workflow is almost never worth the setup cost.
The rule: Measure the exception rate before you build. If it is above 30%, consider whether the real problem is that the workflow needs to be redesigned before it can be automated.
How to Match Your Workflow to the Right AI Pattern
The four variables predict which pattern you will end up using:
| Frequency | Reversibility | Verifiability | Exception rate | Likely pattern |
|---|---|---|---|---|
| High | High | High | Low | Curation / Sync-and-transform (no human checkpoint) |
| High | High | Medium | Medium | Monitoring-and-escalation (human on exception only) |
| High | Low | High | Low | Execution-with-approval (human approves before action) |
| Medium | High | Low | Low | Draft-and-review (human reviews artifact) |
| Medium | Medium | Medium | High | Triage first, then investigation-and-recommendation |
| Low | Low | Low | High | Elicitation first – you probably need to co-specify before anything else |
If your workflow does not fit neatly into one row, it is probably a pipeline – multiple patterns chained together. That is normal. Most real workflows are.
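The table above can be read as a simple lookup. The sketch below encodes exactly those six rows and falls back to "pipeline" for any other combination, mirroring the paragraph above. The function name and the lowercase level strings are our own choices; the table is a heuristic, so treat the output as a starting point rather than a verdict.

```python
# Keys are (frequency, reversibility, verifiability, exception_rate).
PATTERN_TABLE = {
    ("high", "high", "high", "low"): "curation / sync-and-transform",
    ("high", "high", "medium", "medium"): "monitoring-and-escalation",
    ("high", "low", "high", "low"): "execution-with-approval",
    ("medium", "high", "low", "low"): "draft-and-review",
    ("medium", "medium", "medium", "high"): "triage, then investigation-and-recommendation",
    ("low", "low", "low", "high"): "elicitation first",
}


def likely_pattern(
    frequency: str,
    reversibility: str,
    verifiability: str,
    exception_rate: str,
) -> str:
    """Look up the likely pattern; anything off-table is probably a pipeline."""
    key = (frequency, reversibility, verifiability, exception_rate)
    return PATTERN_TABLE.get(key, "pipeline: chain several patterns")
```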
How AI Workflow Patterns Chain into Pipelines
In practice, most production workflows are not a single pattern. They are two to four patterns composed in sequence, with the output of one feeding the input of the next.
We built a theme-research scoring system for a hedge fund that illustrates this. The firm needed to evaluate five hundred private equity deals against a set of investment criteria. The pipeline:
Sync and transform: Ingest the deal log from an Excel export. Parse each deal into a structured record. (No human involved.)
Triage: Classify each deal into one of eighty-seven industry themes based on the company description, acquirer, and target. Run in parallel – ten deals at a time. (No human involved.)
Enrichment within execution: Score each classified deal on seven investment criteria – growth, market size, pricing power, return on tangible capital, fragmentation, valuation, risk. Run in parallel – five at a time, checkpointed so it can resume if interrupted. (No human involved.)
Draft and review: Aggregate scored deals by theme. Produce an Excel workbook with three tabs: theme rankings, individual deal scores, and the full deal log. The deal team reviews the output. (Human reviews the final artifact.)
The human sets the parameters at the start (how many deals, what concurrency, whether to resume from a checkpoint) and reviews the output at the end. Everything in between runs autonomously. The human's role has shifted from "score each deal" to "review the scoring and catch the ones that look wrong."
This is the general pattern we see when workflows mature: the human migrates from the middle of the workflow to the edges. They define the parameters at the start and review exceptions at the end. The middle – the repetitive, cross-referencing, formatting, classifying, loading work – is where the agent lives.
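The "run in parallel, five at a time, checkpointed so it can resume" mechanics of the scoring stage can be sketched with a thread pool and a JSON file. The `score_deal` placeholder, the checkpoint format, and the function names are assumptions made for illustration; the real scoring step would call a model against the firm's criteria.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def score_deal(deal: str) -> int:
    """Placeholder scorer; a real one would evaluate the deal against the criteria."""
    return len(deal)


def score_all(deals: list[str], checkpoint: Path, concurrency: int = 5) -> dict[str, int]:
    """Score deals in parallel, skipping any already recorded in the checkpoint."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    pending = [d for d in deals if d not in done]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in input order; save progress after each one.
        for deal, score in zip(pending, pool.map(score_deal, pending)):
            done[deal] = score
            checkpoint.write_text(json.dumps(done))
    return done
```

The human-facing parameters from the paragraph above map directly onto the arguments: how many deals (`deals`), what concurrency (`concurrency`), and whether to resume (whether the `checkpoint` file already exists).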
The marketing analytics QA pipeline we are building now goes further. It is monitoring-and-escalation as the outer frame: watch for a data file to land in a bucket. Inside that, validation: does the file match the expected schema? If not, enrichment: diagnose what is wrong. Then classification: is this a data problem or a code problem? Then escalation with context. On the happy path, the file gets loaded and analyzed, the output gets verified, and the results pass downstream without a human ever touching them. On the unhappy path – and there is always an unhappy path – the agent has already done the investigation, classified the problem, and handed the human a diagnosis instead of a symptom.
The agent does not replace the human. It changes when the human shows up and what they see when they get there. Instead of "something is broken, go figure it out," the human gets "here is what is broken, here is what I checked, here is why I think it is a data problem and not a code problem, and here are the three invoices that do not match." That is a fundamentally different starting point.
What AI Workflow Patterns Mean for Enterprise AI Adoption
Article #1 laid out three transformations: tacit knowledge into context, context into bounded action, human correction into feedback loops.
The workflow is where all three happen in practice.
Tacit knowledge → context is the elicitation primitive. It is the spec-building conversation. It is the six weeks we spent with the bookkeeping company writing down which suppliers put fuel service fees into soft costs and which ones do not. The knowledge existed. It was stored in people. The workflow is the structure that forces it into a form an agent can use.
Context → bounded action is the validate-and-execute chain. The agent has the context. It knows the rules. It takes the action – but within bounds. It loads the data, but only if the headers match. It sends the email, but only after the human approves. It scores the deal, but against criteria the firm defined in advance. The action is bounded because the workflow has checkpoints.
Human correction → feedback loops is the escalation path and the review gate. When the agent gets it wrong, the human corrects it – and that correction can feed back into the next run. The digest gets better topic filtering. The reconciliation report learns a new cost-code mapping. The deal scorer handles a new industry classification. The correction does not disappear into someone's head. It goes back into the system.
That is what a workflow is for. It is the unit where organizational knowledge becomes operational. It is the structure that turns "we use AI" into "AI does this specific thing, this often, with this level of oversight, and we know when it is working because we can check."
The workflow is the true adoption unit of AI. Find yours, name the pattern, identify the primitives, decide where the human enters, and build.
Next in the series: #6 – When the Loop Closes: What Happens When Workflows Run Themselves. Autoresearch, recursive self-learning, agents iterating on their own output. The work loop is closing inside the research labs right now. The org loop downstream of it hasn't caught up, and the verification, learning, and measurement infrastructure to absorb this inside companies doesn't exist yet.
FAQ
What is an AI workflow?
An AI workflow is a repeatable sequence of decisions, actions, and human checkpoints that turns an input into an output.
AI workflow vs pipeline: what is the difference?
A pipeline runs predefined steps with little judgment. An AI workflow includes judgment points, ambiguity, and explicit decisions about when humans or agents should intervene.
What are the main AI workflow patterns?
The article identifies eight recurring patterns: triage, investigation and recommendation, draft and review, approval, monitoring, elicitation, sync and transform, and curation and scheduled delivery.
When should companies automate AI workflows?
Companies should start with workflows that are frequent, reversible, verifiable, and have a manageable exception rate.
Where should humans stay in the loop?
Humans should stay at high-consequence approval gates, ambiguous specification points, and exception-handling moments where judgment matters more than throughput.










