Application Reports

Comprehensive legal technology platform providing advanced research, document analysis, and case management solutions for legal professionals


Counsel Stack

LAST UPDATED: 10/15/2025

Vendor: Counsel Stack
Founded: 2023
VLAIR Evaluation: Legal research evaluated

Counsel Stack offers multiple specialized LLMs for legal research, document review and client communications. In the VLAIR legal research study, their research product achieved the highest performance across all evaluation criteria, ranking first among all participants including legal AI products, generalist AI, and the lawyer baseline.


Performance Summary

Counsel Stack was evaluated on legal research capabilities across 200 U.S. legal research questions, with responses scored on three weighted criteria: accuracy (50% weight), authoritativeness (40% weight), and appropriateness (10% weight).

Aggregate Weighted Score
Overall performance across all criteria: weighted combination of accuracy (50%), authoritativeness (40%), and appropriateness (10%).
  Counsel Stack: 78%
  Lawyer baseline: 69%
  Avg of all participants: 76%

Accuracy
Substantive correctness without incorrect elements: whether the response answers all elements correctly without misinterpretations or factual errors.
  Counsel Stack: 81%
  Lawyer baseline: 71%
  Avg of all participants: 80%

Authoritativeness
Sufficiency and strength of cited sources: whether the response cites relevant primary sources that are valid and support the statements.
  Counsel Stack: 77%
  Lawyer baseline: 68%
  Avg of all participants: 73%

Appropriateness
Form, format and suitability for purpose: whether the response is easy to understand and could be immediately shareable with colleagues or clients.
  Counsel Stack: 71%
  Lawyer baseline: 60%
  Avg of all participants: 68%

Counsel Stack demonstrated strong performance across all evaluated criteria, achieving the highest scores among all participants with 81% accuracy, 77% authoritativeness, and 71% appropriateness, resulting in a 78% aggregate weighted score.
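The aggregate figure can be checked against the per-criterion scores. The following is a minimal sketch, assuming the aggregate is a plain weighted sum using the stated 50/40/10 weights; the function name `weighted_score` and the dictionary layout are illustrative, not part of the VLAIR methodology. Because the report publishes rounded percentages, a recomputation from them can only be expected to match after rounding.

```python
# Criterion weights as stated in the report.
WEIGHTS = {"accuracy": 0.50, "authoritativeness": 0.40, "appropriateness": 0.10}

# Per-criterion scores (percent) as published in the Performance Summary.
SCORES = {
    "Counsel Stack": {"accuracy": 81, "authoritativeness": 77, "appropriateness": 71},
    "Lawyer baseline": {"accuracy": 71, "authoritativeness": 68, "appropriateness": 60},
    "Avg of all participants": {"accuracy": 80, "authoritativeness": 73, "appropriateness": 68},
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted sum of per-criterion scores (hypothetical helper)."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


for name, scores in SCORES.items():
    # Round to whole percent to compare with the published aggregate.
    print(f"{name}: {round(weighted_score(scores))}%")
```

Under this assumption the rounded results agree with the published aggregates of 78%, 69%, and 76%.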


Key Strengths

Based on VLAIR evaluation results, Counsel Stack demonstrates several notable capabilities:

  1. Highest overall performance: Achieved the top score across all three evaluation criteria (accuracy, authoritativeness, and appropriateness) among all participants in the study, including other legal AI products, generalist AI, and the lawyer baseline.

  2. Superior accuracy: At 81%, Counsel Stack achieved the highest accuracy score among all participants, demonstrating strong capability in providing substantively correct responses with minimal misinterpretations or factual errors.

  3. Strong source citation: Scored 77% on authoritativeness, leading all participants in identifying and citing relevant and valid primary law sources, an essential differentiator for legal AI products over generalist alternatives.

  4. Comprehensive legal coverage: Provided responses to 196 of 200 questions, with the remaining 4 lost to technical timeouts, demonstrating broad coverage of U.S. legal research questions across federal and state jurisdictions.

  5. Specialized LLM architecture: Built with multiple specialized LLMs specifically designed for legal research, enabling the product to handle the unique requirements of legal source identification and citation.

  6. Consistent quality: Outperformed the lawyer baseline by 9 percentage points overall (78% vs 69%) and scored above it on each individual criterion, demonstrating the ability to augment legal research for practicing attorneys.

  7. Superior to generalist AI on sources: Achieved significantly higher authoritativeness scores compared to generalist AI products, reflecting the value of access to proprietary legal databases and specialized training on legal sources.
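The margins and coverage cited in the list above follow from simple arithmetic on the published figures. This is a small sketch using only numbers from the report; the helper name `point_gaps` and the variable names are assumptions made for illustration.

```python
# Published scores (percent) from the Performance Summary.
COUNSEL_STACK = {"accuracy": 81, "authoritativeness": 77, "appropriateness": 71, "aggregate": 78}
LAWYER_BASELINE = {"accuracy": 71, "authoritativeness": 68, "appropriateness": 60, "aggregate": 69}


def point_gaps(product, baseline):
    """Percentage-point difference per criterion (hypothetical helper)."""
    return {criterion: product[criterion] - baseline[criterion] for criterion in product}


gaps = point_gaps(COUNSEL_STACK, LAWYER_BASELINE)
for criterion, gap in gaps.items():
    print(f"{criterion}: +{gap} points over the lawyer baseline")

# Share of the 200 questions that received a response (4 timed out).
answered, total = 196, 200
response_rate = answered / total
print(f"response rate: {response_rate:.0%}")
```

The gaps are in percentage points, not relative percentages: a 9-point lead at 78% vs 69% is a different quantity from a 9% relative improvement.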


Based on the Vals Legal AI Report (VLAIR): Legal Research, October 2025, an independent benchmarking study of legal AI products on U.S. legal research questions contributed by Am Law 100 law firms.
