AI evaluation platforms: AI search visibility ranking (2026)
How AI search engines rank AI evaluation platforms by visibility and citations. We measure 20 brands monthly on Google AI Mode: which brands the AI names in answers, which domains it cites as sources, and how the leaders compare. AI evaluation platforms are used to benchmark LLM outputs, score quality, automate evals, and improve reliability before and after deployment. Composite score: 70% visibility (% of AI answers naming the brand) + 30% citation rate (% of AI answers citing the brand's domain).
Refreshed Jun 18, 2026
At a glance
What we observed in this category (auto-generated)
Braintrust leads the AI evaluation platforms category with a composite score of 43.8, roughly double second-ranked Galileo's 21.2. Both have a visibility score of 25.0 percent, ten times the category average of 2.5 percent, so only two brands are meaningfully present in AI-generated responses. Of the remaining 18 brands, Maxim AI scores on citations alone, and the other 17, including well-known names such as LangSmith, Arize AI, and Promptfoo, register a composite score of zero, making the gap between the top two and the rest nearly total.
The most striking divergence in this data is between visibility and citation. Braintrust has an 87.5 percent citation rate while matching Galileo on visibility at 25.0 percent. Galileo's citation rate is only 12.5 percent, meaning Google AI Mode names Galileo just as often but cites Braintrust as a source at seven times the rate. Maxim AI presents the inverse case: zero visibility but a 25.0 percent citation rate, indicating it is pulled in as a reference without being surfaced as a recommended brand.
Google AI Mode is the sole engine recorded across all 20 brands in this audit, so every figure in this category reflects that one engine. The top cited sources extend beyond brand domains to include reddit.com, youtube.com, purdue.edu, and gartner.com, suggesting the AI draws on community discussion and institutional authority alongside vendor content. Among the 20 ranked brands, only braintrust.dev and getmaxim.ai appear in the top cited sources list.
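The composite scores and category averages quoted above can be reproduced from the stated 70/30 weighting. The following is a minimal sketch, not the page's actual pipeline: the function and variable names are illustrative, the percentages are the ones quoted in this section, and every brand not listed is assumed to sit at 0% on both metrics.

```python
# Weights from the page's stated formula: 70% visibility + 30% citation rate.
# Integer weights keep the arithmetic exact for these inputs.
VISIBILITY_WEIGHT = 70
CITATION_WEIGHT = 30
TOTAL_BRANDS = 20  # brands tracked in this category


def composite_score(visibility_pct: float, citation_pct: float) -> float:
    """Weighted blend of visibility and citation rate, both in percent."""
    return (VISIBILITY_WEIGHT * visibility_pct
            + CITATION_WEIGHT * citation_pct) / 100


# (visibility %, citation %) as quoted on this page.
# Assumption: the other 17 brands are at 0% on both metrics.
nonzero_brands = {
    "Braintrust": (25.0, 87.5),
    "Galileo": (25.0, 12.5),
    "Maxim AI": (0.0, 25.0),
}

scores = {name: composite_score(v, c) for name, (v, c) in nonzero_brands.items()}

# Zero-valued brands add nothing to the sums but still count in the divisor.
avg_visibility = sum(v for v, _ in nonzero_brands.values()) / TOTAL_BRANDS
avg_citation = sum(c for _, c in nonzero_brands.values()) / TOTAL_BRANDS

for name, score in scores.items():
    print(f"{name}: {score}")
print(f"average visibility: {avg_visibility}, average citation: {avg_citation}")
```

Under these assumptions the unrounded composites are 43.75 for Braintrust and 21.25 for Galileo, which the page reports as 43.8 and 21.2, and the averages come out to 2.5 and 6.25. The same weighting gives Maxim AI a composite of 7.5 from citations alone.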
Movers & shakers since last refresh
Biggest visibility risers
- Galileo 12% → 25% · rank #2 → #2 · +12pp
Biggest visibility fallers
- LangSmith 12% → 0% · rank #3 → #4 · -12pp
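The mover figures are percentage-point differences between two refreshes, shown as whole numbers. A small sketch of that arithmetic follows, assuming the underlying values are 12.5 and 25.0 (as the per-brand notes state) and that display rounding is round-half-to-even, which is Python's default for float formatting. The page does not state its rounding rule, so that part is a guess that happens to reproduce the displayed numbers; `pp_change` and `format_mover` are hypothetical names.

```python
def pp_change(previous_pct: float, current_pct: float) -> float:
    """Change in percentage points (a plain difference, not a relative change)."""
    return current_pct - previous_pct


def format_mover(brand: str, previous_pct: float, current_pct: float) -> str:
    """Render a mover line with whole-number percentages and a signed pp delta."""
    delta = pp_change(previous_pct, current_pct)
    # ':.0f' rounds exact ties to the nearest even digit, so 12.5 renders as 12.
    return f"{brand} {previous_pct:.0f}% → {current_pct:.0f}% · {delta:+.0f}pp"


print(format_mover("Galileo", 12.5, 25.0))
print(format_mover("LangSmith", 12.5, 0.0))
```

Note that a move from 12.5% to 25.0% is a gain of 12.5 percentage points but a doubling in relative terms; the page reports the former.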
The ranking
| # | Brand | Visibility | Citation | Top engine | Insight |
|---|---|---|---|---|---|
| 1 | braintrust.dev | 25% | 88% | Google AI Mode | Braintrust leads all brands with a composite score of 43.8 and an 87.5 percent citation rate, making it the most trusted source in a category where the average citation rate is just 6.2 percent. |
| 2 | galileo.ai | 25% | 12% | Google AI Mode | Galileo matches Braintrust on visibility at 25.0 percent but converts that presence into only 12.5 percent citations, a named-but-not-trusted gap relative to the category leader. |
| 3 | getmaxim.ai | 0% | 25% | Google AI Mode | Maxim AI has zero visibility yet earns a 25.0 percent citation rate, the second highest in the category, indicating Google AI Mode references it as a source without surfacing it as a brand recommendation. |
| 4 | langchain.com | 0% | 0% | Google AI Mode | LangSmith fell from rank 3 to rank 4 after losing all 12.5 percentage points of its previous visibility, dropping to a composite score of zero with no citation presence to compensate. |
| 5 | humanloop.com | 0% | 0% | Google AI Mode | Humanloop sits at rank 5 with zero visibility, zero citations, and a composite score of zero, indistinguishable in the data from the 15 brands ranked below it. |
| 6 | arize.com | 0% | 0% | Google AI Mode | |
| 7 | promptfoo.dev | 0% | 0% | Google AI Mode | |
| 8 | parea.ai | 0% | 0% | Google AI Mode | |
| 9 | patronus.ai | 0% | 0% | Google AI Mode | |
| 10 | giskard.ai | 0% | 0% | Google AI Mode | |
| 11 | trulens.org | 0% | 0% | Google AI Mode | |
| 12 | athina.ai | 0% | 0% | Google AI Mode | |
| 13 | honeyhive.ai | 0% | 0% | Google AI Mode | |
| 14 | wandb.ai | 0% | 0% | Google AI Mode | |
| 15 | fiddler.ai | 0% | 0% | Google AI Mode | |
| 16 | comet.com | 0% | 0% | Google AI Mode | |
| 17 | openai.com | 0% | 0% | Google AI Mode | |
| 18 | vellum.ai | 0% | 0% | Google AI Mode | |
| 19 | explodinggradients.com | 0% | 0% | Google AI Mode | |
| 20 | confident-ai.com | 0% | 0% | Google AI Mode | |
Sources AI engines trust in this category
Across the 8 buyer-intent queries we ran on AI evaluation platforms, these are the domains Google AI Mode cited most often. If you're not on this list, or if your competitors are, that's a concrete PR / link-building target.
How to read this ranking
Four things worth knowing before you act on the numbers above. These definitions are the same on every industry page. For category-specific observations, see the "What we observed" section above (where available) and the per-brand insights inline in the ranking.
Visibility = being named
A brand's visibility % is the share of AI answers that mention it by name in the response prose. This is who AI engines actively recommend to the buyer.
Citation rate = being trusted
Citation rate is the share of AI answers that include the brand's domain as a clickable source link. This is what the AI treats as authoritative evidence — different from being named.
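To make the two definitions concrete, here is a minimal sketch of how both percentages could be computed from a batch of recorded answers. The `Answer` record shape, the function names, and the sample data (a placeholder brand "ExampleCo" with domain "example.com") are all hypothetical; the page does not describe how answers are stored or parsed.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """One AI answer to a buyer-intent query (hypothetical record shape)."""
    brands_named: set[str] = field(default_factory=set)   # brands mentioned in the prose
    cited_domains: set[str] = field(default_factory=set)  # domains linked as sources


def visibility_pct(answers: list[Answer], brand: str) -> float:
    """Share of answers that mention the brand by name, in percent."""
    if not answers:
        raise ValueError("need at least one answer")
    named = sum(1 for a in answers if brand in a.brands_named)
    return 100 * named / len(answers)


def citation_pct(answers: list[Answer], domain: str) -> float:
    """Share of answers that link the brand's domain as a source, in percent."""
    if not answers:
        raise ValueError("need at least one answer")
    cited = sum(1 for a in answers if domain in a.cited_domains)
    return 100 * cited / len(answers)


# Made-up sample of 8 answers: the brand is named in 2 and cited in 7.
answers = (
    [Answer({"ExampleCo"}, {"example.com"}) for _ in range(2)]
    + [Answer(set(), {"example.com"}) for _ in range(5)]
    + [Answer()]
)

print(visibility_pct(answers, "ExampleCo"))  # named in 2 of 8 answers
print(citation_pct(answers, "example.com"))  # cited in 7 of 8 answers
```

Keeping the two counts separate is what lets a brand be cited without being named, or named without being cited, as the ranking shows.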
Top engine differs by brand
The "top engine" column shows which AI surface each brand performs best on. Big gaps between a brand's scores across engines usually point to specific content or schema gaps.
Rankings move month to month
AI engines re-crawl and re-rank on shorter cycles than classical search. We re-audit every brand on this list at least every 30 days and refresh this page automatically.
Get your own AI evaluation platforms brand audited
The brands above were curated from public market-leader lists. Want the same measurement against your own brand — including the queries you appear on, which competitors get named instead, and a prioritised fix list? Run a free preview.
Frequently asked questions about AI visibility for AI evaluation platforms
Who leads AI visibility in AI evaluation platforms?
Braintrust leads with a composite score of 43.8, driven by an 87.5 percent citation rate that far exceeds the category average of 6.2 percent. Galileo is the only other brand with any meaningful visibility, at a composite score of 21.2.
What is the average AI visibility for brands in this category?
The average visibility across all 20 brands is just 2.5 percent, and the average citation rate is 6.2 percent. Only two brands, Braintrust and Galileo, exceed the average visibility figure at all.
Which brand has the biggest gap between being named and being cited?
Galileo shows the clearest gap, matching Braintrust on 25.0 percent visibility but achieving only 12.5 percent citations versus Braintrust's 87.5 percent. Maxim AI presents the reverse pattern, with zero visibility but a 25.0 percent citation rate.
What sources does Google AI Mode cite most when answering AI evaluation platform queries?
The top cited sources include braintrust.dev, reddit.com, youtube.com, augmentcode.com, getmaxim.ai, purdue.edu, gartner.com, and rhesis.ai. This mix shows the AI anchors on a combination of vendor content, community platforms, and institutional references.
Which brand has seen the biggest positive visibility change recently?
Galileo is the only recorded riser, gaining 12.5 percentage points in visibility to reach 25.0 percent, though its citation rate did not change alongside that gain.
Which brand lost the most AI visibility in this audit period?
LangSmith dropped 12.5 percentage points in visibility to reach zero, falling one rank from third to fourth. Its citation rate also remained at zero throughout.