Agent frameworks that actually work

Everyone's building agents. Most are demos. Here's what teams are actually shipping to production—and what's still just Twitter hype.

Most "agents" are just chains

Let's be honest: 90% of production "agents" are really just LLM chains with some tool calls. True autonomous agents that plan, execute, and recover from errors? Still rare outside demos.

That's not a bad thing. Chains are reliable. Agents are unpredictable. Most business problems are better solved by a well-designed chain than a fancy autonomous loop.

Here's how to think about it:

Fixed workflow, predictable steps → Chain (LangChain, plain code)
Dynamic routing, branching logic → Graph (LangGraph)
Multiple "personas" collaborating → Multi-agent (CrewAI, AutoGen)
Fully autonomous, open-ended → Still experimental

What teams are actually using

LangGraph
In production

If you're building something serious, this is probably what you want. LangGraph lets you define agent workflows as graphs—nodes are actions, edges are transitions. You get explicit control over the flow while keeping the flexibility to branch and loop.

The key insight: it's a state machine for LLM apps. You can checkpoint, resume, and debug. When something fails, you know exactly where.
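
Here's the shape in a minimal sketch. The node names, state fields, and stub logic are illustrative, not from any real deployment:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class State(TypedDict):
    question: str
    answer: str

def research(state: State) -> dict:
    # stub node: call your LLM and tools here
    return {"answer": f"findings for: {state['question']}"}

def review(state: State) -> dict:
    return {"answer": state["answer"] + " (reviewed)"}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("review", review)
graph.add_edge(START, "research")
graph.add_edge("research", "review")
graph.add_edge("review", END)

# a checkpointer is what buys you pause, resume, and replay per thread
app = graph.compile(checkpointer=MemorySaver())
result = app.invoke(
    {"question": "what broke?", "answer": ""},
    config={"configurable": {"thread_id": "demo-1"}},
)

Every node is a plain function over shared state, which is exactly what makes checkpointing and debugging tractable.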

"We switched from a ReAct loop to LangGraph and our error rate dropped 60%. The graph structure forced us to think about edge cases we'd been ignoring." — Engineering lead at a customer support startup
Stateful · Checkpointing · Human-in-loop

Used by: LinkedIn, Uber, Elastic, and a lot of YC companies. The LangChain ecosystem means tons of integrations.

Docs →
CrewAI
Popular

Multi-agent framework where you define "agents" with roles, goals, and backstories. They collaborate on tasks like a team. Sounds gimmicky, but the role-based prompting actually works well for complex workflows.

Think: researcher agent finds info, writer agent drafts content, editor agent reviews. Each has clear responsibilities. Less prompt engineering, more org design.

from crewai import Agent, Crew, Task
researcher = Agent(role="Researcher", goal="Find accurate data", backstory="A meticulous analyst")
writer = Agent(role="Writer", goal="Create clear content", backstory="A plain-language explainer")
research = Task(description="Gather facts on the topic", expected_output="Bulleted findings", agent=researcher)
draft = Task(description="Write a brief from the findings", expected_output="Three clear paragraphs", agent=writer)
crew = Crew(agents=[researcher, writer], tasks=[research, draft])
Role-based · Task delegation · Memory system

Good for: content pipelines, research workflows, anything where "team of specialists" is a natural mental model.

Docs →
OpenAI Agents SDK
New

OpenAI's answer to LangChain: a lightweight SDK built around agents, handoffs, and guardrails, with function calling, hosted tools (code interpreter, file search), and built-in tracing. If you're all-in on OpenAI models, this is the path of least resistance.

It's the production-ready successor to OpenAI's experimental Swarm project. Tight integration with OpenAI's models, but you're locked in.
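
The quickstart really is short. A sketch of the documented shape; the agent's name, instructions, and prompt are placeholders:

# pip install openai-agents
from agents import Agent, Runner

agent = Agent(
    name="Support bot",
    instructions="Answer billing questions concisely.",
)

result = Runner.run_sync(agent, "Why was I charged twice?")
print(result.final_output)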

Function calling · Code interpreter · File handling

Tradeoff: easiest to start, hardest to switch away from. Good for prototypes, think twice for production.

Docs →

The rest of the landscape

AutoGen
Microsoft · Multi-agent

Microsoft's multi-agent framework. Good for research, complex for production. Being merged into the Microsoft Agent Framework (coming 2026).

Docs →
LlamaIndex Agents
LlamaIndex · RAG-focused

Best if you're doing RAG. Query engines, document understanding, knowledge bases. Less general-purpose than LangGraph.

Docs →
Anthropic MCP
Anthropic · Protocol

Model Context Protocol. Standard for connecting LLMs to tools. Not a framework—a protocol. Growing ecosystem.
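
A tool server in the official Python SDK is a few lines. A hedged sketch; the tool itself is a stand-in:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Stand-in tool; swap in a real API call."""
    return f"Sunny in {city}"

if __name__ == "__main__":
    mcp.run()  # any MCP-capable client can now discover and call get_forecast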

Docs →
Semantic Kernel
Microsoft · Enterprise

Microsoft's enterprise SDK. Good C# support. Being merged with AutoGen into the Microsoft Agent Framework. If you're in Azure, worth considering.

Docs →

What's overhyped (for now)

Fully autonomous agents
Still hype

The "give it a goal and walk away" dream. Manus, Devin, AutoGPT—exciting demos, but production usage is limited. Error recovery is hard. Costs spiral. Reliability isn't there yet.

That said, it's improving fast. Worth experimenting, not worth betting your product on.

Agent swarms
Mostly demos

Hundreds of agents collaborating on complex tasks. Great for papers and Twitter threads. In practice, 2-4 specialized agents usually beat a swarm. Coordination overhead kills you.

What to actually build

Start simple. Plain Python + function calling. No framework. See if that solves your problem. Often it does.
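
What that looks like, sketched with the OpenAI Python SDK. The model name, tool schema, and dispatch are yours to define; lookup_order is made up:

import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Fetch an order's status by id",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where is order 123?"}],
    tools=tools,
)

# if the model chose a tool, dispatch it yourself; this loop is the whole "agent"
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(f"would call {call.function.name} with {args}")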

Add structure when you need it. Hitting reliability issues? Add LangGraph for explicit control. Need multiple perspectives? Try CrewAI's roles.

Don't go autonomous too early. Human-in-the-loop is your friend. Let the agent draft, human approves. Gradually reduce oversight as you build confidence.
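
The simplest gate is embarrassingly low-tech. Everything here is a hypothetical stand-in for your own steps:

def agent_draft(request: str) -> str:
    # hypothetical stand-in for the agent's drafting step
    return f"Dear customer, regarding: {request} ..."

def send(message: str) -> None:
    # hypothetical stand-in for the real side effect (email, refund, deploy)
    print("sent:", message)

draft = agent_draft("duplicate charge on order 123")
print("DRAFT:\n", draft)
if input("Approve and send? [y/N] ").lower() == "y":
    send(draft)  # the irreversible action only runs after a human says yes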

Instrument everything. LangSmith, Langfuse, or roll your own. You can't improve what you can't measure. Agent debugging is hard—make it visible.
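
Rolling your own can start as a decorator that logs every tool call. A sketch; a real setup would ship these records to a trace store instead of stdout:

import functools, json, time

def traced(fn):
    """Log each tool call's name, outcome, and latency as one JSON line."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            print(json.dumps({
                "tool": fn.__name__,
                "status": status,
                "ms": round((time.perf_counter() - start) * 1000, 1),
            }))
    return wrapper

@traced
def search(query: str) -> str:
    return f"results for {query}"

search("agent frameworks")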

Just starting → Plain Python + function calling
Need reliability → LangGraph
Content/research workflows → CrewAI
RAG-heavy → LlamaIndex
OpenAI-only → OpenAI Agents SDK
Enterprise/Azure → Semantic Kernel
