HITL in Agentic AI: From Gatekeepers to Human-as-a-Tool


“But lo! men have become the tools of their tools.”

Walden by Henry David Thoreau

In our ongoing series exploring agentic workflows, we’ve covered reasoning, memory, reflection, action execution, tool integration, and such practical things as MCP (still a trending article on Hugging Face). Today, we turn to something equally foundational – how humans participate in these workflows, or Human-AI Co-Agency.

Thoreau wrote the phrase “men have become the tools of their tools” in 1845-1847, talking about 19th-century technologies like the railroad, telegraph, and farming equipment. These tools were meant to serve us, but he saw people reorganizing their lives around them. The fear wasn’t of new tech – it was of losing agency to the systems we create. His quote is a reminder that such fears are nothing new and have echoed through every wave of new technology. What’s different now is that, for the first time in history, our tools can actually reply meaningfully and make decisions. Which is why we need to think about co-agency – how we live, work, and decide with them.

There are two sides to that. Both are absolutely fascinating and have deep histories:

First, co-agency as something practical and structural: Human in the loop (HITL). Sometimes it becomes “human as a tool” in the context of tool calling. Yes, you heard that right – from ultimate decision maker to just another callable function in an agent’s toolbox. Not always, not in every industry – but this setup is becoming more common. That’s what we’ll dig into today.
Second, co-agency as something experiential and conversational – how we communicate with agentic workflows, and how new interfaces are evolving to support that (with due homage to Vannevar Bush and Douglas Engelbart). We’ll cover all that in the next episode.

Are you ready to unpack HITL and see where in the AI loop a human is? Let’s go.

What’s in today’s episode?

What is HITL in Agentic Workflows? And why we cover it after MCP
Key Milestones in HITL Evolution
- HITL 1.0: Humans as Gatekeepers
- HITL 2.0: The Crowd in the Loop
- HITL 3.0: From Labels to Feedback to Preferences
- HITL 4.0: The Human as a Tool. Wait, what?!
- HITL 5.0: Co-Agency
How HITL Shapes the Behavior of Modern AI Agents (two research papers)
Where Is HITL Going?
Concluding Thoughts
Resources to dive deeper

What is HITL in Agentic Workflows? (and why we cover it after MCP)

In the last two episodes, we showed how agents act (via UI/API tools) and how those actions are now structured, thanks to MCP (that article is still trending on Hugging Face!). Autonomy is cheap, action is easy – and that means orchestration is now a human problem.

Human-in-the-Loop (HITL) is the safety net that makes agentic AI systems usable in the real world. As AI agents take on more autonomous, multi-step tasks, they also run into familiar issues: hallucinations, shaky reasoning, and unpredictable decisions. HITL is the antidote.

It’s a design pattern (not a quick fix), where humans are built into the decision loop to validate outputs, steer actions, or override the machine when necessary. We traced the historical roots of human-machine collaboration – from Licklider's "man-computer symbiosis" to ambient intelligence – in our weekly digest: FOD#93: When AI Meant Ambient Intelligence. Think of a chatbot that pauses to ask for clarification (instead of making stuff up), a workflow where the AI waits for a human sign-off before pulling the trigger, or a self-driving car that navigates autonomously but allows human override in complex or unexpected scenarios. As an active user of a self-driving car, I don’t want to be out of the loop, but it’s also quite annoying when the car keeps asking me to take control every few minutes, interrupting what should be a smooth ride. So HITL is also very much about balance.

It might sound trivial, but it’s easy to forget to include HITL as a design element, especially in multi-agent systems.

HITL Evolution: 5 Stages From Gatekeepers to Co-Agency

I was reading J. C. R. Licklider’s Man-Computer Symbiosis from 1960 and found myself thinking again: “Man, we need to reread things from the past.” What strikes me is how precisely right Licklider’s focus was. He acknowledged that machines might one day surpass human cognitive abilities, but saw symbiosis as an essential interim phase – potentially the most intellectually rich and productive in human history. Why don’t we talk more about this, instead of fussing so much over a vaguely defined AGI? Anyway, Licklider was in the pre-HITL era, forming the vision.

Let’s discuss what followed as computer science and machine learning started to evolve →

Apologies for my Claude not being able to fit the words into the frames. I tried.

HITL 1.0: Humans as Gatekeepers

The first wave of Human-in-the-Loop AI emerged in the era of symbolic AI and expert systems in the 1970s and 1980s. MYCIN was a pioneering AI expert system from the 1970s that guided doctors through diagnosing and treating bacterial infections using a rule-based dialogue. It included an interactive questioning mechanism – it would ask the physician a series of questions, gather data (e.g., symptoms, lab results), and then output its diagnosis and recommendations. Design philosophy at the time was clear: AI advises. Human decides.

This approach continued into the early days of machine learning in the 1990s, when human involvement was still central. Models needed labeled data, and it was people who provided it. Feature selection was manual, evaluation relied on human-defined metrics, and tuning hyperparameters meant running and rerunning training jobs. These were learning systems, but the human-in-the-loop dynamic remained strong.

Later, methods like active learning and interactive machine learning started to shift that dynamic. Systems began to ask which data points to label, making the process more efficient. Feedback loops became quicker and more targeted. Humans were still involved, but the systems began to adapt to their input in more thoughtful ways.
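To make "systems began to ask which data points to label" concrete, here is a minimal sketch of one common active learning strategy, uncertainty sampling. The item ids and probabilities are made up for illustration, and real systems use many variations on this idea.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` items the model is least sure about.

    `predictions` maps an item id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda item: entropy(predictions[item]), reverse=True)
    return ranked[:budget]

# Hypothetical model outputs for four unlabeled images (two classes each).
predictions = {
    "img_1": [0.98, 0.02],
    "img_2": [0.55, 0.45],
    "img_3": [0.80, 0.20],
    "img_4": [0.50, 0.50],
}
queue = select_for_labeling(predictions, budget=2)
print(queue)  # ['img_4', 'img_2']
```

The human still does the labeling, but only for the examples where the answer is likely to teach the model the most.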

HITL 2.0: The Crowd in the Loop

In the 2000s, the loop scaled. With the rise of Amazon Mechanical Turk and projects like ImageNet, we entered the age of the crowd-in-the-loop. Rather than relying on a single domain expert, AI development now enlisted thousands of human labelers to supply training data.

It resulted in phenomenal scale – and systems that could finally tackle perception tasks like vision and language at a new level. Deep learning models fed on this massive supply of human-labeled data, propelling the breakthroughs we saw in the 2010s. But the role of the human was still "offline": we helped train the models, but weren't involved once they were deployed.

That changed when AI started making decisions in the wild – and sometimes making the wrong ones.

HITL 3.0: From Labels to Feedback to Preferences

As reinforcement learning and neural networks advanced, a new realization emerged: not all desired behaviors could be expressed with a clean loss function. In real-world scenarios, especially where ethics, ambiguity, or open-ended goals were involved, it was easier to show a system what we wanted than to write it down.

That’s how Reinforcement Learning from Human Feedback (RLHF) was born.

Instead of programming a reward function, humans would rank outputs – "A is better than B" – and a model would learn to optimize for what people preferred. This shift put humans back in the loop, not as labelers, but as judges of quality, alignment, and intent.
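The "A is better than B" step can be sketched with a Bradley-Terry style preference loss, which is a common way to turn rankings into a reward signal. This is a deliberately tiny illustration: it learns one scalar reward per output, whereas a real RLHF pipeline trains a neural reward model and then optimizes the language model against it. All names and data here are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry negative log-likelihood that the chosen output wins."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))

def fit_rewards(comparisons, items, steps=500, lr=0.1):
    """Learn one scalar reward per item from (preferred, rejected) pairs."""
    rewards = {item: 0.0 for item in items}
    for _ in range(steps):
        for chosen, rejected in comparisons:
            # Gradient of the loss w.r.t. the reward gap is -(1 - sigmoid(gap)),
            # so gradient descent widens the gap in favor of the chosen item.
            push = 1.0 - sigmoid(rewards[chosen] - rewards[rejected])
            rewards[chosen] += lr * push
            rewards[rejected] -= lr * push
    return rewards

# Hypothetical human judgments: each pair is (preferred, rejected).
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
rewards = fit_rewards(comparisons, items=["A", "B", "C"])
ranking = sorted(rewards, key=rewards.get, reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

Nobody wrote down what makes A good. The reward is recovered entirely from which output people preferred.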

It’s what made ChatGPT possible. Its secret sauce.

Today’s best LLMs don’t only predict the next word – they’ve been trained to follow instructions, using thousands of hours of human feedback to shape their tone, helpfulness, and truthfulness. The preferences are diverse, evolving, and sometimes contradictory. Which means the loop can't be closed. The human is still essential. This same feedback loop is now critical for keeping synthetically generated training data grounded and safe from model collapse — where humans validate and rank outputs before they re-enter training. See how OpenAI, Microsoft, and NVIDIA apply HITL to synthetic data pipelines in practice.

But here’s the twist.

HITL 4.0: The Human as a Tool. Wait, what?!

Enter the agents.

The rise of agentic AI – systems that plan, reason, and act using tools – has led to a curious inversion of roles. In many modern frameworks (LangChain, AutoGPT, etc.), AI agents have access to a toolbox: a web browser, a calculator, a code interpreter, a database... and a human.

That’s right. The human is now a tool.

In LangChain, for example, the HumanInputRun tool allows an agent to ask the user a question mid-task. The agent pauses, collects the answer, and resumes. The same logic applies in the Bee Agent framework (IBM’s open source, no-code platform): if the agent gets stuck or uncertain, it calls the "human tool" to ask for help.

Hello there, my fellow human tool!

This is a wild shift from traditional thinking. From being the central authority, the human becomes just another resource – invoked only when needed. We’ve gone from supervisor to API.

Of course, this is not universal. In medicine, law, or finance, humans still hold ultimate authority. But in consumer apps, productivity agents, or low-stakes creative work, the human-as-tool model is spreading. And it works.

It reduces friction. It keeps agents autonomous until they really need help. It lets users inject high-value input (clarifications, constraints, missing facts) without needing to micromanage.
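Here is roughly what the human-as-a-tool pattern looks like, as a framework-free sketch. It does not use LangChain's HumanInputRun or the Bee Agent API. The function names, the toy "agent", and the scripted answer are all assumptions made for illustration.

```python
from typing import Callable, Dict

def make_human_tool(ask: Callable[[str], str] = input) -> Callable[[str], str]:
    """Wrap a person as a callable tool: question in, answer out."""
    def human(question: str) -> str:
        return ask(f"[agent needs help] {question}\n> ").strip()
    return human

def book_meeting(request: Dict[str, str], tools: Dict[str, Callable[[str], str]]) -> str:
    """Toy agent step: fill in missing fields by calling the human tool."""
    details = dict(request)
    for field in ("person", "day"):
        if not details.get(field):
            # The agent pauses here and resumes once the human answers.
            details[field] = tools["human"](f"What is the {field} for this meeting?")
    return f"Meeting with {details['person']} on {details['day']}"

# Scripted answers stand in for a real person so the example runs unattended.
scripted = iter(["Friday"])
tools = {"human": make_human_tool(ask=lambda prompt: next(scripted))}
result = book_meeting({"person": "Eric", "day": ""}, tools)
print(result)  # Meeting with Eric on Friday
```

Note that the human sits in the same dictionary as any other tool and is only invoked for the one field the agent could not fill itself. With the default `ask=input`, the same code would block and wait for a real person at the terminal.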

So, where does that leave us?

HITL 5.0: Co-Agency

All is good: rather than being just an occasional tool in the box, humans are increasingly treated as co-agents alongside AI. With that, we go all the way back to J. C. R. Licklider and his Man-Computer Symbiosis.

In multi-agent systems like CAMEL or AutoGen, humans can participate as one of the "agents" – a teammate among other autonomous systems. You’re not supervising. You’re collaborating. Maybe one agent gathers data, another proposes a plan, and you, the human, critique and refine.

These systems are constructed to be dialogue partners. They reflect a design philosophy of shared initiative: sometimes the AI leads, sometimes the human does. It’s a dance.

Mixed-initiative planning, first explored in the 90s, is now real.

What’s different is that today’s LLMs can reason through ambiguous inputs, generate plans, and even reflect on their own failures. Which means your role as a human collaborator in many cases is shifting from controller to creative partner.

If Henry David Thoreau were still around, he might exclaim, 'But look! Men have befriended their tools!'

That’s why human-AI communication becomes such an important discussion topic.

How HITL Shapes the Behavior of Modern AI Agents

KNOWNO: Teaching Robots to Know When They Don't Know

Robots leveraging LLMs to execute complex tasks based on natural language instructions are transforming robotics, promising intuitive human-robot interactions. Yet, a major limitation lurks behind these models’ impressive abilities: LLMs frequently hallucinate, confidently making incorrect decisions without awareness of their uncertainty.

Allen Z. Ren and colleagues from Princeton University and Google DeepMind propose KNOWNO, a method designed to ensure robots "know when they don’t know." KNOWNO leverages conformal prediction (CP) to align an LLM planner’s uncertainty, allowing robots to request human help precisely when uncertainty arises. Unlike typical prompt-engineering approaches, KNOWNO provides rigorous statistical guarantees, achieving a desired balance between autonomy and human assistance.

Image Credit: The Original Paper

Experimental results from diverse robotic setups – spanning spatial reasoning tasks, numeric ambiguities, and complex Winograd schemas – demonstrate KNOWNO’s effectiveness. The approach consistently outperforms baseline methods, significantly improving efficiency and reducing unnecessary human intervention by 10-24%.

KNOWNO offers a robust, lightweight solution that complements evolving LLM capabilities, paving the way for safer, more reliable human-robot collaboration.
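The core mechanism, conformal prediction deciding when to ask for help, can be sketched in a few lines. This is generic split conformal prediction over multiple-choice options, not the authors' code: the calibration numbers, option strings, and function names are invented, and I am assuming the planner already exposes a probability for each candidate action.

```python
import math

def calibrate_threshold(true_option_probs, error_rate):
    """Split conformal calibration.

    `true_option_probs` holds, for each calibration task, the probability the
    planner gave to the correct option. Returns the score threshold q_hat.
    """
    scores = sorted(1.0 - p for p in true_option_probs)  # nonconformity scores
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - error_rate))  # 1-based quantile rank
    if rank > n:
        return 1.0  # too few calibration points: keep every option
    return scores[rank - 1]

def prediction_set(option_probs, q_hat):
    """Keep every option whose nonconformity score is within the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= q_hat]

def decide(option_probs, q_hat):
    """Act only when exactly one option survives; otherwise ask a human."""
    options = prediction_set(option_probs, q_hat)
    return ("act" if len(options) == 1 else "ask_human", options)

# Made-up calibration data: probability assigned to the correct option.
calibration = [0.9, 0.8, 0.95, 0.3, 0.7, 0.85, 0.45, 0.9, 0.2]
q_hat = calibrate_threshold(calibration, error_rate=0.25)

clear = {"put bowl in microwave": 0.90, "put bowl in sink": 0.07, "put bowl in oven": 0.03}
ambiguous = {"take metal bowl": 0.50, "take plastic bowl": 0.45, "take cup": 0.05}
print(decide(clear, q_hat)[0])      # act
print(decide(ambiguous, q_hat)[0])  # ask_human
```

The threshold comes from held-out calibration data rather than from a hand-tuned confidence cutoff, which is where the statistical guarantee on how often the correct option lands in the set comes from.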

HULA – Human-In-the-Loop Software Development Agents

Human-in-the-loop software development agents are increasingly becoming the next leap forward in practical, AI-assisted software engineering. The recent paper, "Human-In-the-Loop Software Development Agents" by Wannita Takerngsaksiri et al., introduces the HULA framework, a system leveraging LLMs integrated directly within Atlassian's JIRA platform. I’m taking it as an example here, because the framework’s real-world deployment and extensive evaluations provide lots of insights for researchers and industry practitioners looking to leverage LLM-driven development agents effectively.

The core idea of HULA is to bring three key agents together – an AI Planner Agent for file localization and coding plan generation, an AI Coding Agent for source code creation and iterative refinement, and a Human Agent for feedback, oversight, and approval. Unlike traditional autonomous multi-agent systems, HULA emphasizes cooperative, human-guided interactions at each step, aligning AI outputs closely with practical software engineering needs.

Image Credit: The Original Paper

In extensive evaluations – both offline (using SWE-Bench and internal Atlassian datasets) and online (live deployment within Atlassian teams) – HULA shows impressive practical value. During live deployment, practitioners approved generated coding plans in 82% of cases, and about 59% of pull requests created with HULA-generated code were successfully merged into Atlassian's repositories.

There are, of course, some challenges: particularly regarding the completeness and quality of generated code. Practitioners highlighted that while HULA significantly reduced initial development effort, human intervention was often essential to achieve high-quality results, especially for complex tasks. The need for detailed task descriptions emerged as both a strength and challenge – while fostering good documentation practices, it also required considerable upfront effort from users.

Researchers list two lessons from working on HULA:

Lesson Learned 1: The performance of LLM-based software development agent heavily relies on a detailed input description, but what key information is needed?
Lesson Learned 2: Evaluating functional correctness should go beyond passing unit test cases.

In short, HULA represents a practical step towards integrating AI seamlessly into everyday software engineering workflows, positioning itself as a valuable assistant rather than an autonomous replacement.
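The planner, coder, and human interplay described above can be sketched as a loop with two approval gates. This is my own simplified illustration of that structure, not HULA's implementation: every function here is a hypothetical stub, and the "reviewer" is scripted so the example runs without a person.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WorkItem:
    description: str
    log: List[str] = field(default_factory=list)

def run_with_review(item, planner, coder, reviewer, max_rounds=3):
    """Plan -> human review -> code -> human review, with feedback loops."""
    def refine(stage, produce):
        feedback = ""
        for _ in range(max_rounds):
            draft = produce(item.description, feedback)
            approved, feedback = reviewer(stage, draft)
            item.log.append(f"{stage}: {'approved' if approved else 'revise'}")
            if approved:
                return draft
        return None  # the human never approved: stop instead of proceeding

    plan = refine("plan", planner)
    if plan is None:
        return None
    return refine("code", lambda desc, fb: coder(plan, fb))

# Stub agents standing in for LLM calls.
def planner(description, feedback):
    return f"plan for '{description}'" + (f" (revised: {feedback})" if feedback else "")

def coder(plan, feedback):
    return f"code implementing {plan}"

def reviewer(stage, draft):
    # Stand-in for a person: reject the first plan, accept everything after.
    if stage == "plan" and "revised" not in draft:
        return False, "also update the tests"
    return True, ""

item = WorkItem("add retry to upload client")
result = run_with_review(item, planner, coder, reviewer)
print(item.log)  # ['plan: revise', 'plan: approved', 'code: approved']
```

The design choice worth noticing is that code generation never starts from an unapproved plan, and human feedback is fed back into the next draft rather than discarded.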

The Future of HITL in Agentic AI Systems

Part of that is a discussion for the next episode. But let me just outline a few ideas that are becoming a reality as we speak. Here’s what Human-AI Co-Agency actually looks like:

- Interfaces Adapt: UIs shift to your skill level. Oversight becomes interaction.
- Language Is the API: Talk, don’t prompt. Agents take plain-language commands and explain themselves.
- Live Loops: Real-time corrections. Token-by-token collaboration across writing, coding, design.
- Proactive by Default: Agents anticipate. They pull you in – or step back – before things go wrong.
- Learning in Context: Every edit teaches. Less repetition, more autonomy over time.
- Collaborative Oversight: One human isn’t enough. Agents query panels, experts, even crowds.
- Trust = Ongoing: Regular check-ins. Transparent logs. No rubber-stamping.

In LangChain, there’s an actual HumanInputRun tool. In Bee Agent, there’s a "human consultation" action. Agents that encounter uncertainty or ambiguity don’t just guess anymore. They can pause and explicitly call for help, mid-chain.

It’s a major philosophical shift: humans are no longer the external auditors of a finished result. They are invoked during reasoning. The model says, "I don’t know Eric’s last name. Let me ask." And then it asks you.

That pattern – agents pulling humans into the loop when needed – scales well with the complexity and unpredictability of real-world tasks.

Concluding Thoughts

Human-in-the-loop integration in agentic systems is shaping the next era of AI – one where human-AI communication is continuous and cooperative, rather than one-off or adversarial.

As Licklider envisioned, the age of man-computer symbiosis may be the most intellectually productive era we’ve ever seen. We’re finally starting to build like we believe it.

Current implementations across various platforms demonstrate the viability and advantages of HITL: from LangChain agents asking users for missing info, to multi-agent research frameworks incorporating human supervisors, to real-world autonomous systems that wisely pause for human approval on critical actions.

Looking ahead, the line between “the AI’s job” and “the human’s job” will blur. We can expect HITL to evolve into more sophisticated human-AI co-agency.

We put this to Olga Megorskaya, CEO of Toloka — a company that has been building human-AI data pipelines for over a decade. Her definition of co-agency: "when an AI agent and a human agent are solving the same task together." And the hardest part, she says, isn't engineering — it's teaching humans when not to trust the plan. → Full interview

And that’s why we really need to learn how to communicate well with machines – to be understood and be effective. In the next episode, we’ll explore how this collaboration actually happens – how we communicate with agentic systems, and how that communication shapes the experience.


Sources: