Harvey Unveils Open-Source Legal Agent Benchmark LAB

Iris Coleman
May 06, 2026 15:57

Harvey launches LAB, a benchmark to evaluate AI performance in legal tasks, covering 24 practice areas with over 1,200 tasks.

Harvey, a company specializing in AI for the legal industry, has introduced the Legal Agent Benchmark (LAB), an open-source framework designed to evaluate and improve AI agent performance in legal work. LAB features over 1,200 tasks spanning 24 legal practice areas, graded against 75,000 expert-written rubric criteria. The benchmark aims to help law firms understand where AI can replace or supplement human labor in high-stakes legal environments.

Unlike prior benchmarks that focus on short, narrowly scoped tasks like contract review or document comparison, LAB emphasizes long-horizon legal work. Each task mimics real-life legal workflows, requiring agents to analyze complex client matters, synthesize relevant information, and produce deliverables like risk assessments or draft memos. The approach is modeled after the assignment and review processes in large law firms, where a senior lawyer assigns the work and then reviews the finished product.

How LAB Works

LAB’s structure breaks each task into four components:

Instructions: Brief task directives akin to a senior partner’s assignment to a junior associate.
Environment: A closed set of client matter documents, including contracts, templates, and related files.
Output: Deliverables such as memos or reports that meet professional legal standards.
Verification: Grading by expert rubrics with binary pass/fail criteria, ensuring rigorous evaluation of facts, analysis, and formatting.

The goal is all-pass grading, where agents must meet 100% of a task’s criteria to pass. For instance, a corporate M&A task might require an agent to analyze change-of-control provisions in a $458 million acquisition. The agent must identify risks, recommend mitigations, and prepare a detailed memorandum. Missing even one key risk means the task fails, reflecting the no-margin-for-error nature of legal work.
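
The all-pass rule can be illustrated with a short Python sketch. This is not Harvey's implementation or LAB's actual data format: the Criterion class, the function names, and the three sample criteria are hypothetical, chosen only to show how binary rubric criteria combine into a single pass/fail result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One binary rubric criterion (hypothetical structure, not LAB's schema)."""
    description: str
    passed: bool


def all_pass(criteria: list[Criterion]) -> bool:
    """Return True only if every criterion passed.

    Assumption: an empty rubric counts as a failure rather than a pass.
    """
    return len(criteria) > 0 and all(c.passed for c in criteria)


def failed_criteria(criteria: list[Criterion]) -> list[str]:
    """List the descriptions of the criteria that were not met."""
    return [c.description for c in criteria if not c.passed]


# Illustrative rubric for the M&A example; the criteria text is invented.
criteria = [
    Criterion("Identifies the change-of-control risks", passed=True),
    Criterion("Recommends mitigations for each risk", passed=False),
    Criterion("Delivers the analysis as a memorandum", passed=True),
]

print(all_pass(criteria))         # False: one missed criterion fails the task
print(failed_criteria(criteria))  # ['Recommends mitigations for each risk']
```

Under this rule an agent that meets two of three criteria scores the same as one that meets none, which is what makes all-pass grading stricter than an average pass rate.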

Why It Matters

AI adoption in law is still in its nascent stages, with firms cautiously exploring where automation can deliver value without compromising quality. LAB provides a transparent way to measure AI’s utility and limitations, helping firms calculate the ROI of AI systems. By identifying areas where agents excel or underperform, LAB enables more strategic deployments, such as delegating routine tasks to AI while reserving complex judgment calls for human lawyers.

The timing of LAB’s release is noteworthy. The past year has seen rapid advancements in AI benchmarks across industries, from software engineering (SWE-Bench Pro) to finance (FinanceAgent). Harvey’s decision to open-source LAB aligns with this trend, fostering collaboration among legal professionals, AI researchers, and law firms. The absence of a leaderboard in the initial release signals Harvey’s intent to refine the benchmark iteratively with community input, ensuring clarity and fairness in future evaluations.

Future Plans

Harvey plans to expand LAB significantly. Upcoming developments include broader coverage across BigLaw practice areas, in-house legal workflows, and adjacent fields like asset management and banking. Additionally, the company intends to enhance task diversity for fine-tuning AI models and improving their applicability in varied legal contexts.

Initial benchmarking results for open and closed-source AI models are expected in the coming weeks, offering insights into the current state of legal AI. Researchers, law firms, and technologists are invited to contribute by testing the benchmark, auditing rubrics, or proposing new tasks.

LAB represents a critical step in bridging AI research with practical legal applications. As law firms increasingly grapple with how to integrate artificial intelligence into their workflows, tools like LAB could serve as a compass, guiding both adoption strategies and technological development.

