Agent frameworks that actually work

Everyone's building agents. Most are demos. Here's what teams are actually shipping to production—and what's still just Twitter hype.

Most "agents" are just chains

Let's be honest: 90% of production "agents" are really just LLM chains with some tool calls. True autonomous agents that plan, execute, and recover from errors? Still rare outside demos.

That's not a bad thing. Chains are reliable. Agents are unpredictable. Most business problems are better solved by a well-designed chain than a fancy autonomous loop.

Here's how to think about it:

Fixed workflow, predictable steps → Chain (LangChain, plain code)
Dynamic routing, branching logic → Graph (LangGraph)
Multiple "personas" collaborating → Multi-agent (CrewAI, AutoGen)
Fully autonomous, open-ended → Still experimental

What teams are actually using

LangGraph
In production

If you're building something serious, this is probably what you want. LangGraph lets you define agent workflows as graphs—nodes are actions, edges are transitions. You get explicit control over the flow while keeping the flexibility to branch and loop.

The key insight: it's a state machine for LLM apps. You can checkpoint, resume, and debug. When something fails, you know exactly where.
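
Here's the shape in a minimal sketch. The node names, state fields, and stub logic are illustrative, not from any real deployment:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class State(TypedDict):
    question: str
    answer: str

def research(state: State) -> dict:
    # stub node: call your LLM and tools here
    return {"answer": f"findings for: {state['question']}"}

def review(state: State) -> dict:
    return {"answer": state["answer"] + " (reviewed)"}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("review", review)
graph.add_edge(START, "research")
graph.add_edge("research", "review")
graph.add_edge("review", END)

# a checkpointer is what buys you pause, resume, and replay per thread
app = graph.compile(checkpointer=MemorySaver())
result = app.invoke(
    {"question": "what broke?", "answer": ""},
    config={"configurable": {"thread_id": "demo-1"}},
)

Every node is a plain function over shared state, which is exactly what makes checkpointing and debugging tractable.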

"We switched from a ReAct loop to LangGraph and our error rate dropped 60%. The graph structure forced us to think about edge cases we'd been ignoring." — Engineering lead at a customer support startup
Stateful · Checkpointing · Human-in-loop

Used by: LinkedIn, Uber, Elastic, and a lot of YC companies. The LangChain ecosystem means tons of integrations.

Docs →
CrewAI
Popular

Multi-agent framework where you define "agents" with roles, goals, and backstories. They collaborate on tasks like a team. Sounds gimmicky, but the role-based prompting actually works well for complex workflows.

Think: researcher agent finds info, writer agent drafts content, editor agent reviews. Each has clear responsibilities. Less prompt engineering, more org design.

from crewai import Agent, Crew, Task
researcher = Agent(role="Researcher", goal="Find accurate data", backstory="A meticulous analyst")
writer = Agent(role="Writer", goal="Create clear content", backstory="A plain-language explainer")
research = Task(description="Gather facts on the topic", expected_output="Bulleted findings", agent=researcher)
draft = Task(description="Write a brief from the findings", expected_output="Three clear paragraphs", agent=writer)
crew = Crew(agents=[researcher, writer], tasks=[research, draft])
Role-based · Task delegation · Memory system

Good for: content pipelines, research workflows, anything where "team of specialists" is a natural mental model.

Docs →
OpenAI Agents SDK
New

OpenAI's answer to LangChain: a lightweight SDK built around agents, handoffs, and guardrails, with function calling, hosted tools (code interpreter, file search), and built-in tracing. If you're all-in on OpenAI models, this is the path of least resistance.

It's the production-ready successor to OpenAI's experimental Swarm project. Tight integration with OpenAI's models, but you're locked in.
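
The quickstart really is short. A sketch of the documented shape; the agent's name, instructions, and prompt are placeholders:

# pip install openai-agents
from agents import Agent, Runner

agent = Agent(
    name="Support bot",
    instructions="Answer billing questions concisely.",
)

result = Runner.run_sync(agent, "Why was I charged twice?")
print(result.final_output)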

Function calling · Code interpreter · File handling

Tradeoff: easiest to start, hardest to switch away from. Good for prototypes, think twice for production.

Docs →

The rest of the landscape

AutoGen
Microsoft · Multi-agent

Microsoft's multi-agent framework. Good for research, complex for production. Being merged into the Microsoft Agent Framework (coming 2026).

Docs →
LlamaIndex Agents
LlamaIndex · RAG-focused

Best if you're doing RAG. Query engines, document understanding, knowledge bases. Less general-purpose than LangGraph.

Docs →
Anthropic MCP
Anthropic · Protocol

Model Context Protocol. Standard for connecting LLMs to tools. Not a framework—a protocol. Growing ecosystem.
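
A tool server in the official Python SDK is a few lines. A hedged sketch; the tool itself is a stand-in:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Stand-in tool; swap in a real API call."""
    return f"Sunny in {city}"

if __name__ == "__main__":
    mcp.run()  # any MCP-capable client can now discover and call get_forecast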

Docs →
Semantic Kernel
Microsoft · Enterprise

Microsoft's enterprise SDK. Good C# support. Being merged with AutoGen into the Microsoft Agent Framework. If you're in Azure, worth considering.

Docs →

What's overhyped (for now)

Fully autonomous agents
Still hype

The "give it a goal and walk away" dream. Manus, Devin, AutoGPT—exciting demos, but production usage is limited. Error recovery is hard. Costs spiral. Reliability isn't there yet.

That said, it's improving fast. Worth experimenting, not worth betting your product on.

Agent swarms
Mostly demos

Hundreds of agents collaborating on complex tasks. Great for papers and Twitter threads. In practice, 2-4 specialized agents usually beat a swarm. Coordination overhead kills you.

What to actually build

Start simple. Plain Python + function calling. No framework. See if that solves your problem. Often it does.
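
What that looks like, sketched with the OpenAI Python SDK. The model name, tool schema, and dispatch are yours to define; lookup_order is made up:

import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Fetch an order's status by id",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where is order 123?"}],
    tools=tools,
)

# if the model chose a tool, dispatch it yourself; this loop is the whole "agent"
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(f"would call {call.function.name} with {args}")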

Add structure when you need it. Hitting reliability issues? Add LangGraph for explicit control. Need multiple perspectives? Try CrewAI's roles.

Don't go autonomous too early. Human-in-the-loop is your friend. Let the agent draft, human approves. Gradually reduce oversight as you build confidence.
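
The simplest gate is embarrassingly low-tech. Everything here is a hypothetical stand-in for your own steps:

def agent_draft(request: str) -> str:
    # hypothetical stand-in for the agent's drafting step
    return f"Dear customer, regarding: {request} ..."

def send(message: str) -> None:
    # hypothetical stand-in for the real side effect (email, refund, deploy)
    print("sent:", message)

draft = agent_draft("duplicate charge on order 123")
print("DRAFT:\n", draft)
if input("Approve and send? [y/N] ").lower() == "y":
    send(draft)  # the irreversible action only runs after a human says yes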

Instrument everything. LangSmith, Langfuse, or roll your own. You can't improve what you can't measure. Agent debugging is hard—make it visible.
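
Rolling your own can start as a decorator that logs every tool call. A sketch; a real setup would ship these records to a trace store instead of stdout:

import functools, json, time

def traced(fn):
    """Log each tool call's name, outcome, and latency as one JSON line."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            print(json.dumps({
                "tool": fn.__name__,
                "status": status,
                "ms": round((time.perf_counter() - start) * 1000, 1),
            }))
    return wrapper

@traced
def search(query: str) -> str:
    return f"results for {query}"

search("agent frameworks")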

Just starting → Plain Python + function calling
Need reliability → LangGraph
Content/research workflows → CrewAI
RAG-heavy → LlamaIndex
OpenAI-only → OpenAI Agents SDK
Enterprise/Azure → Semantic Kernel
