The Open Source AI Agent Landscape in 2026
In short
Six open source AI agent frameworks dominate production work in 2026: LangGraph, CrewAI, AutoGen / AG2, LlamaIndex, Semantic Kernel, and Pydantic AI. All are MIT or Apache 2.0 licensed. LangGraph and CrewAI are the two we ship most often.
The open source agent space consolidated in 2025. By April 2026, six frameworks account for the overwhelming majority of production agent deployments we see.
All six are free to self-host. All ship under permissive licenses (MIT or Apache 2.0). The differences that matter are architectural — not licensing.
- LangGraph — graph-based state machines. 16K+ GitHub stars. MIT license. Built by the LangChain team.
- CrewAI — role-based multi-agent collaboration. 28K+ GitHub stars. MIT license. Independent of LangChain.
- AutoGen / AG2 — Microsoft Research origin. Split into AutoGen v0.4+ (Microsoft) and AG2 (community fork) in late 2024.
- LlamaIndex agents — agent primitives built on top of the LlamaIndex data framework. RAG-native by design.
- Semantic Kernel — Microsoft's enterprise SDK. C#, Python, and Java parity. Strongest fit for .NET stacks.
- Pydantic AI — type-safe Python agents from the Pydantic team. Released late 2024, gaining production traction in 2025-2026.
The rest of this article focuses on LangGraph vs CrewAI — the two frameworks we recommend for most new open source agent projects in 2026.
The Alice Labs Production Score Methodology
In short
The Alice Labs Production Score is a proprietary 0-10 score combining developer experience, production readiness, observability, scalability, and ecosystem maturity. It is calibrated against 18 real production agent deployments shipped between 2024 and 2026.
We needed a way to compare agent frameworks beyond GitHub stars and marketing claims. The Alice Labs Production Score is the result.
The score weights five dimensions equally: each contributes up to 2.0 points to a 10.0 total. Every score is calibrated against actual deployment data rather than impressions.
- Developer experience (DX) — 2.0 pts. How fast can a competent engineer ship the first working agent? Measured in days from kickoff to demo.
- Production readiness — 2.0 pts. Native support for retries, checkpoints, idempotency, durable execution, and graceful degradation.
- Observability — 2.0 pts. Quality of traces, evals, prompt versioning, replay, and integration with tools like LangSmith and Langfuse.
- Scalability — 2.0 pts. Performance under concurrent users, session isolation, multi-tenant safety, and the horizontal scaling story.
- Ecosystem maturity — 2.0 pts. Pre-built integrations, community size, release cadence, and the long tail of community recipes.
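The arithmetic behind the five equally weighted dimensions can be sketched in a few lines. This is only an illustration of the 5 x 2.0 = 10.0 structure described above: the class name, field names, and the example values are assumptions made for this sketch, not Alice Labs' actual scoring tool or any framework's real sub-scores.

```python
from dataclasses import dataclass, fields

MAX_PER_DIMENSION = 2.0  # each of the five dimensions contributes up to 2.0 points


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical container for the five dimension scores."""

    developer_experience: float
    production_readiness: float
    observability: float
    scalability: float
    ecosystem_maturity: float

    def total(self) -> float:
        """Sum the equally weighted dimensions into a 0-10 score."""
        total = 0.0
        for field in fields(self):
            value = getattr(self, field.name)
            if not 0.0 <= value <= MAX_PER_DIMENSION:
                raise ValueError(
                    f"{field.name} must be between 0.0 and {MAX_PER_DIMENSION}"
                )
            total += value
        return round(total, 1)


# Made-up inputs, chosen only to show the arithmetic.
example = DimensionScores(1.5, 2.0, 1.0, 1.5, 2.0)
print(example.total())  # 8.0
```

Rejecting out-of-range values keeps any single dimension from dominating the total, which is the point of equal weighting.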
The dataset behind the score: 18 production agent deployments shipped by Alice Labs between 2024 and 2026 across LangGraph, CrewAI, AutoGen, and Semantic Kernel. Industries covered include financial services, media, and the public sector.
Frameworks we have not personally shipped to production (LlamaIndex agents, Pydantic AI) are scored conservatively from prototype work and public references — and we label that distinction explicitly in our scoring.
Beyond LangGraph and CrewAI: The Other 4 Frameworks
In short
AutoGen / AG2, LlamaIndex agents, Semantic Kernel, and Pydantic AI each excel in narrower contexts: research-style conversations, RAG-native agents, .NET enterprise stacks, and type-safe Python respectively. They are situational picks rather than defaults.
LangGraph and CrewAI cover the majority of production needs. The remaining four frameworks each shine in a specific context.
AutoGen / AG2
Microsoft Research's multi-agent conversation framework. In late 2024 it split: Microsoft continued an AutoGen v0.4+ rewrite, while the community forked the proven v0.2 lineage as AG2 (ag2.ai).
Best for research-style agent conversations, code-execution agents, and group chat patterns. Picking between AutoGen v0.4+ and AG2 should be deliberate — the APIs diverge.
LlamaIndex (agents)
LlamaIndex started as a data framework for LLMs and now ships first-class agent primitives. The strongest pick when the agent's primary value is reasoning over indexed private data.
Best for RAG-first agents — agents whose job is to query and synthesize from your knowledge base, vector DB, or document store.
Microsoft Semantic Kernel
Microsoft's enterprise AI orchestration SDK with C#, Python, and Java parity. The best fit when your enterprise lives on .NET / Azure infrastructure.
Best for enterprises on Microsoft / Azure that need first-class C# support, tight Azure OpenAI integration, and a plugin model that maps to enterprise governance.
Pydantic AI
Type-safe agent framework from the Pydantic team. Released late 2024, it brings FastAPI-style ergonomics — strict types, dependency injection, structured responses — to agent code.
Best for Python teams that prioritize type safety and predictable IO. Newer than the other frameworks, with a smaller ecosystem but the cleanest DX in the type-safe niche.
Production Deployment Patterns from Alice Labs Engagements
In short
Across 18 Alice Labs production agent deployments, three patterns dominate: LangGraph + LangSmith for stateful workflows with human-in-the-loop (HITL) approvals, CrewAI for content and research crews, and Semantic Kernel for .NET-bound enterprises. Observability tooling is paired with the framework from day one.
Looking at the 18 production agents Alice Labs has shipped between 2024 and 2026, three deployment patterns recur often enough to be considered defaults.
Pattern 1: LangGraph + LangSmith for stateful workflows
Used in roughly 9 of 18 deployments — most often where the agent must survive crashes, escalate to humans, or resume across sessions.
LangGraph models the workflow as a graph; LangSmith captures every trace; checkpoints persist state to Postgres. Postgres-backed checkpointers are the production default.
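To make the idea of checkpointed, resumable workflows concrete, here is a framework-agnostic sketch. It is not LangGraph's API: `CheckpointStore`, `run`, and the step functions are hypothetical names, and `sqlite3` stands in for Postgres so the example runs with the standard library alone. It shows state being persisted after every step so that a rerun with the same thread ID resumes where the crash happened.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], State]


class CheckpointStore:
    """Keeps the latest state per thread; sqlite3 stands in for Postgres here."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT PRIMARY KEY, next_step INTEGER, state TEXT)"
        )

    def save(self, thread_id: str, next_step: int, state: State) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, next_step, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, thread_id: str) -> Tuple[int, State]:
        row = self.conn.execute(
            "SELECT next_step, state FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        return (row[0], json.loads(row[1])) if row else (0, {})


def run(steps: List[Step], store: CheckpointStore, thread_id: str) -> State:
    """Run steps in order, checkpointing after each one."""
    index, state = store.load(thread_id)
    while index < len(steps):
        state = steps[index](state)
        index += 1
        store.save(thread_id, index, state)
    return state


calls = {"draft": 0, "review": 0}


def draft(state: State) -> State:
    calls["draft"] += 1
    return {**state, "draft": "v1"}


def review(state: State) -> State:
    calls["review"] += 1
    if calls["review"] == 1:
        raise RuntimeError("simulated crash")  # first attempt fails
    return {**state, "approved": True}


store = CheckpointStore(sqlite3.connect(":memory:"))
try:
    run([draft, review], store, "thread-1")
except RuntimeError:
    pass  # the process "crashed" after the draft step was checkpointed

final_state = run([draft, review], store, "thread-1")  # resumes at review
```

The second call skips `draft` entirely because its result was already persisted, which is the behavior that makes crash recovery and cross-session resumption possible.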
Pattern 2: CrewAI for content and research crews
Used in roughly 5 of 18 deployments — content generation, research synthesis, and coordinated drafting workflows.
CrewAI defines researcher, writer, and reviewer agents declaratively. We pair it with Langfuse or AgentOps for observability — CrewAI's built-in logging is good enough for development, not for production debugging.
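The role-based pattern can be illustrated without any framework. The sketch below is not CrewAI's API: `RoleAgent` and `run_crew` are hypothetical names, and each `work` callable is a stand-in for an LLM call. It only shows the shape of the idea, where roles are declared as data and each role's output feeds the next.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RoleAgent:
    """A declaratively defined role; `work` stands in for an LLM call."""

    role: str
    goal: str
    work: Callable[[str], str]


def run_crew(agents: List[RoleAgent], brief: str) -> str:
    """Pass the output of each role to the next, in declaration order."""
    artifact = brief
    for agent in agents:
        artifact = agent.work(artifact)
    return artifact


crew = [
    RoleAgent("researcher", "collect facts", lambda text: f"notes({text})"),
    RoleAgent("writer", "draft the piece", lambda text: f"draft({text})"),
    RoleAgent("reviewer", "check the draft", lambda text: f"reviewed({text})"),
]

result = run_crew(crew, "agent frameworks")
print(result)  # reviewed(draft(notes(agent frameworks)))
```

Because the crew is just a list, adding or reordering roles does not touch the orchestration function, which is what makes this style quick to prototype.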
Pattern 3: Semantic Kernel for .NET enterprises
Used in roughly 3 of 18 deployments — all of them on Azure with C# back ends and enterprise governance requirements.
Semantic Kernel slots into existing .NET service architectures and integrates natively with Azure OpenAI, Azure AI Search, and Application Insights. The plugin model maps cleanly to enterprise approval workflows.
The remaining deployment used AutoGen for a research-style multi-agent prototype that we later rebuilt on LangGraph for production hardening.
When to Use Which: A Decision Tree
In short
Start from your dominant constraint. If you need explicit control and durable state, pick LangGraph. If you need a fast role-based prototype, pick CrewAI. If you're on .NET, pick Semantic Kernel. If you're RAG-first, pick LlamaIndex. If you prioritize Python type safety, pick Pydantic AI.
We use a single decision rule across client engagements: identify the dominant constraint, then pick the framework whose core abstraction matches it.
- Need explicit control, durable state, observability? LangGraph + LangSmith. The default for stateful production workflows.
- Need a working multi-agent prototype in days? CrewAI. Define roles, assign tasks, ship.
- Building research-style conversational agents? AutoGen v0.4+ or AG2. Pick deliberately, because the APIs diverge: go with the one your team's existing code and community resources target.
- Stack is .NET / C# / Azure? Semantic Kernel. C# parity, Azure integration, enterprise plugin model.
- Agent's main job is reasoning over private data? LlamaIndex. Retrievers, indexes, and query engines are first-class.
- Python team that values strict types? Pydantic AI. FastAPI ergonomics applied to agent code.
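The decision rule above amounts to a lookup from dominant constraint to framework. A minimal sketch, assuming the enum and function names below (they are illustrative, not part of any library), with the recommendations copied from the list:

```python
from enum import Enum


class Constraint(Enum):
    """Dominant constraints, paraphrased from the list above."""

    DURABLE_STATE = "explicit control, durable state, observability"
    FAST_PROTOTYPE = "working multi-agent prototype in days"
    RESEARCH_CONVERSATION = "research-style conversational agents"
    DOTNET_AZURE = ".NET / C# / Azure stack"
    PRIVATE_DATA = "reasoning over private data"
    STRICT_TYPES = "strict Python types"


RECOMMENDATION = {
    Constraint.DURABLE_STATE: "LangGraph + LangSmith",
    Constraint.FAST_PROTOTYPE: "CrewAI",
    Constraint.RESEARCH_CONVERSATION: "AutoGen v0.4+ or AG2",
    Constraint.DOTNET_AZURE: "Semantic Kernel",
    Constraint.PRIVATE_DATA: "LlamaIndex",
    Constraint.STRICT_TYPES: "Pydantic AI",
}


def pick_framework(dominant: Constraint) -> str:
    """Return the framework whose core abstraction matches the constraint."""
    return RECOMMENDATION[dominant]


print(pick_framework(Constraint.DOTNET_AZURE))  # Semantic Kernel
```

The function takes exactly one constraint on purpose: the rule only works if a team commits to a single dominant constraint rather than scoring several at once.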
One last rule: framework switching is expensive. Plan to commit to your pick for at least the first year of production. Prompts and tool definitions port between frameworks; orchestration code does not.
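One way to keep tool definitions portable is to write them as plain functions and derive a neutral description from the signature, leaving only a thin adapter per framework. The sketch below assumes a hypothetical `lookup_invoice` tool and a hypothetical `describe_tool` helper; the output format is invented for illustration and is not any framework's tool schema.

```python
import inspect
from typing import Any, Callable, Dict


def lookup_invoice(invoice_id: str, include_lines: bool = False) -> Dict[str, Any]:
    """Return a stubbed invoice record (hypothetical tool)."""
    record: Dict[str, Any] = {"id": invoice_id, "status": "paid"}
    if include_lines:
        record["lines"] = []
    return record


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a framework-neutral description from the function signature."""
    params = inspect.signature(func).parameters
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: {
                # Use the annotation's name when it has one, else its string form.
                "type": getattr(param.annotation, "__name__", str(param.annotation)),
                "required": param.default is inspect.Parameter.empty,
            }
            for name, param in params.items()
        },
    }


spec = describe_tool(lookup_invoice)
print(spec["name"])  # lookup_invoice
```

The function body and its description carry over to a new framework unchanged; only the adapter that registers `spec` with the orchestrator would need rewriting.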
Which should you choose?
Choose LangGraph if…
- Your workflow has explicit branching, retries, or human-in-the-loop approvals.
- You need durable state and the ability to resume agents after a crash or restart.
- Observability is non-negotiable — you need traces, evals, and replay (LangSmith).
- Your team ships in TypeScript / JavaScript as well as Python.
- You're building a long-running agent that must survive across sessions and tenants.
Choose CrewAI if…
- Your work decomposes cleanly into roles like researcher, writer, and reviewer.
- You need a working multi-agent prototype in days, not weeks.
- Your team prefers declarative agent definitions over graph modelling.
- You're shipping a Python-only stack and want the lightest dependency footprint.
- Your use case is content generation, research synthesis, or coordinated drafting.
Our verdict
Choose LangGraph when you need explicit control, durable state, and production-grade observability — it is the framework with the highest Alice Labs Production Score (8.9). Choose CrewAI when work decomposes naturally into roles and time-to-prototype is your dominant constraint — it scores 8.6 and ships demos faster than any other framework we use.
About the Authors & Reviewers

Co-Founder, Alice Labs
Co-Founder at Alice Labs. Author of 7 research reports on AI adoption, governance and labor markets cited across EU, OECD and US benchmarks.
- 8+ years in AI strategy & implementation
- Top-5 AI Speaker, Sweden (Mindley 2025)
- 100+ enterprise AI engagements
Further reading
- LangGraph documentation · langchain-ai.github.io
- CrewAI documentation · docs.crewai.com
- AG2 (community continuation of AutoGen v0.2) · ag2.ai
- Pydantic AI documentation · ai.pydantic.dev
- Semantic Kernel documentation · learn.microsoft.com
Sources
- LangGraph (langchain-ai/langgraph), official repository (accessed 2026-04-28)
- LangGraph documentation, LangChain AI (accessed 2026-04-28)
- CrewAI (crewAIInc/crewAI), official repository (accessed 2026-04-28)
- CrewAI documentation (accessed 2026-04-28)
- AG2 (ag2ai/ag2), community continuation of AutoGen v0.2 lineage (accessed 2026-04-28)
- Microsoft AutoGen (microsoft/autogen), v0.4+ rewrite (accessed 2026-04-28)
- LlamaIndex (run-llama/llama_index), official repository (accessed 2026-04-28)
- Microsoft Semantic Kernel, official documentation (accessed 2026-04-28)
- Pydantic AI (pydantic/pydantic-ai), official repository (accessed 2026-04-28)
- Alice Labs internal production scoring, 18 agent deployments (2024-2026) (accessed 2026-04-28)