Cohort intelligence: what it is, and why it beats stock prediction
Cohort intelligence is the practice of answering “what did this chart pattern do next?” by retrieving the cohort of historical analogs to a (symbol, date, timeframe) anchor and reporting the full distribution of what those analogs realized at 1, 5, and 10 days forward — together with the features that separated winners from losers, the regimes the cohort lived in, and conformal-calibrated probability bands.
It is the alternative to point-prediction stock forecasting (“NVDA: +2.3% over 5 days”) — a different shape of answer, designed for AI agents and quantitative researchers who need calibrated facts they can reason about, not opaque numbers they have to trust.
In one paragraph
Cohort intelligence retrieves the 300 historical chart patterns most similar to a stock at a given moment, then returns what those analogs actually did next: full forward-return distributions at 1, 5, and 10 days, the features that separated winning analogs from losing ones, and how outcomes split by market regime. It is built on 25M+ self-supervised pattern embeddings spanning 10 years of minute-bar data from a 19,000+ US-equity universe, and it is calibrated with split conformal prediction so that the 80% probability bands actually cover 80% of outcomes on held-out anchors. It is the methodology-honest alternative to AI stock prediction, designed to give AI agents structured facts they can reason about, not point forecasts they have to trust.
Independently validated — 50–0 in a blind paired AI-agent evaluation
The case for cohort intelligence over point prediction is not just methodological; it is measured. In a blind, LLM-judged paired evaluation across 50 out-of-sample scenarios (2024–2025), identical Claude Haiku agents (same model, same prompt, same scenarios) were given different toolkits. Agent A got OHLC bars and news headlines. Agent B got the same plus Chart Library’s cohort intelligence tools. Agent B won all 50 scenarios.
All six reasoning dimensions improved, each with a paired t-statistic greater than 10. Investigation quality: +2.75. Evidence use: +1.88. Reasoning rigor: +1.40. Probability of a 50–0 sweep under the null hypothesis: < 1 in 10^15. Methodology, results JSON, judge rationales, and case studies are open at chartlibrary.io/evaluation and github.com/grahammccain/chart-library-adqe.
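Both headline statistics can be reproduced in a few lines. The sketch below assumes that under the null each scenario is an independent fair coin flip with no ties, and uses the standard paired t-statistic on per-scenario score differences; the function names are illustrative and are not taken from the chart-library-adqe repository.

```python
import math
from statistics import mean, stdev


def sweep_probability(n_scenarios: int) -> float:
    """One-sided probability that a given agent wins all n paired scenarios
    when each scenario is an independent 50/50 coin flip (the null)."""
    return 0.5 ** n_scenarios


def paired_t_statistic(scores_b: list[float], scores_a: list[float]) -> float:
    """Paired t-statistic on per-scenario score differences (B minus A)."""
    diffs = [b - a for b, a in zip(scores_b, scores_a)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


p = sweep_probability(50)
print(f"P(50-0 sweep | null) = {p:.2e}")  # prints P(50-0 sweep | null) = 8.88e-16
```

The < 1 in 10^15 figure is the one-sided probability for Agent B; counting a sweep by either agent doubles it to about 1.8 × 10^-15.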
The takeaway: cohort intelligence is the right primitive for AI agents reasoning about markets. Anyone can replicate the test.
Why point forecasts fail
The point-prediction era of AI-for-stocks made three implicit claims, all of which turn out to be false:
- We can predict where a stock will go. Markets are weakly predictable in distribution. They are not predictable as a single point. The variance of single-day returns dominates any expected-value signal a model can extract.
- More data and bigger models will eventually make point forecasts precise. Better models reduce estimation error. They cannot reduce the irreducible market uncertainty that point forecasts ignore by construction.
- You should trust the prediction. A model that returns “+2.3%” with no error bars is asking you to trust a number you can’t audit, generated by a model you can’t introspect, against a future you can’t verify until weeks later.
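The first claim can be made concrete with a toy drift-plus-noise model. Assume, for illustration only, that a daily return is a small expected value plus zero-mean noise; the numbers below are hypothetical, not estimates from Chart Library data.

```latex
r = \mu + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2

\text{Best point forecast: } \hat{r} = \mu \;\Rightarrow\; \mathbb{E}\big[(r - \hat{r})^2\big] = \sigma^2

\text{Share of } \mathbb{E}[r^2] = \mu^2 + \sigma^2 \text{ captured by the forecast: } \frac{\mu^2}{\mu^2 + \sigma^2}

\mu = 0.05\%,\ \sigma = 2\% \;\Rightarrow\; \frac{(0.0005)^2}{(0.0005)^2 + (0.02)^2} = \frac{2.5 \times 10^{-7}}{4.0025 \times 10^{-4}} \approx 6.2 \times 10^{-4}
```

Even a perfect estimate of the drift leaves roughly 99.94% of the squared-return variation unexplained in this toy setting; a distribution keeps σ in the answer instead of discarding it.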
Cohort intelligence inverts each. It doesn’t predict — it retrieves. The 300-analog cohort is the answer; there’s no point estimate to be precise about. And every claim ties to a verifiable retrieval the user can independently audit.
Cohort intelligence vs the alternatives
Five common approaches to “what will this stock do next?” — and how cohort intelligence differs from each.
vs point-prediction stock forecasting
Point forecast: “NVDA +2.3%.” Cohort: “300 analogs, median +0.4%, p10/p90 -4.2%/+5.6%, win rate 54%, top winner-separating feature: low volatility regime (negative).” A point forecast hides uncertainty; a cohort surfaces it.
vs traditional technical analysis pattern recognition
TA pattern recognition labels charts (“bull flag,” “head and shoulders”) and applies generic rules. Cohort intelligence skips the label and goes straight to the empirical question: what did this specific shape do, in this specific market regime, in actual historical data?
vs LLM market commentary
An LLM asked “what happens after a chart like NVDA on 2024-08-05?” hallucinates plausibly. A cohort intelligence call grounds the LLM in 300 verifiable historical analogs and lets it reason from facts.
vs single-stock backtesting
Single-stock backtests have ~10 years of one symbol, maybe 2,500 trading days. Cohort intelligence draws from ~25 million cross-symbol patterns covering the same 10 years across the US-equity universe: 25,000,000 / 2,500 = 10,000× more analog density.
vs proprietary “AI stock pick” ranking models
Ranking models (K-score, AI grades, etc.) compress everything to an opaque scalar and hide the reasoning. Cohort intelligence externalizes the reasoning surface — the user (or the agent) can read the actual cohort, audit similarity, check regime context.
What a cohort intelligence response actually contains
Start with an anchor, a (symbol, date, timeframe) triple. Say: NVDA on August 5, 2024, at the 1-hour timeframe. The response has four components.
1. The cohort itself
The 300 historical chart patterns most similar in shape to NVDA on that date. Real patterns from real symbols on real historical dates — concrete things the user can audit. Could be PFE on 2019-03-12, RIO on 2022-08-08, AMD on 2017-04-14. Whatever was actually most similar in the embedding space.
2. The full forward-return distribution
What did those 300 analogs do over the next 1, 5, and 10 trading days? Median, mean (and trimmed mean, robust to outliers), p10 and p90, win rate. Not a single number — a distribution. For NVDA on 2024-08-05 at 1h, the actual 5-day distribution: median −1.3%, p10/p90 of −11.3%/+6.8%, win rate 44%.
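The distribution summary is plain descriptive statistics over the analogs’ realized returns. A minimal sketch, assuming forward returns in percent, a 10% trim for the trimmed mean, Python’s default (exclusive) quantile method, and "win" meaning a strictly positive return; Chart Library’s exact choices for these are not stated here.

```python
from statistics import mean, median, quantiles


def trimmed_mean(values: list[float], trim: float = 0.1) -> float:
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    xs = sorted(values)
    k = int(len(xs) * trim)
    return mean(xs[k:len(xs) - k])


def summarize_horizon(forward_returns: list[float]) -> dict:
    """Distribution summary for one horizon's analog forward returns (in %)."""
    deciles = quantiles(forward_returns, n=10)  # 9 cut points: p10, p20, ..., p90
    return {
        "n": len(forward_returns),
        "median": median(forward_returns),
        "mean": mean(forward_returns),
        "trimmed_mean": trimmed_mean(forward_returns),
        "p10": deciles[0],
        "p90": deciles[-1],
        # Assumption: a "win" is an analog with a strictly positive forward return.
        "win_rate": sum(r > 0 for r in forward_returns) / len(forward_returns),
    }


# Hypothetical 5-day returns for a tiny cohort, for illustration only.
print(summarize_horizon([-3.1, -1.3, -0.4, 0.2, 1.5, 2.8, -6.0, 4.4, 0.9, -2.2]))
```

Running the same function once per horizon (1, 5, 10 days) yields the three distributions the response reports.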
3. Feature attribution
Within the cohort, which features separated the winners from the losers? For our example anchor: tight credit spreads were a positive factor (analogs that occurred during tight credit outperformed), bullish macro state was positive, low vol regime was negative. This is conditioning information — the user can ask, does the live anchor share the positive features?
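The source does not say how winner-separating features are scored. One simple, transparent stand-in is win-rate lift: compare the win rate of analogs that were in a feature state against the win rate of those that were not. This is a swapped-in technique for illustration, and the field names (ret_5d, credit_spread_state) are hypothetical.

```python
def win_rate(returns: list[float]) -> float:
    """Share of returns that are strictly positive."""
    return sum(r > 0 for r in returns) / len(returns)


def feature_lift(analogs: list[dict], feature: str, value: str) -> float:
    """Win rate of analogs in a given feature state minus the win rate of the rest.

    Positive lift: analogs in that state won more often than the others.
    Assumes both groups are non-empty.
    """
    in_state = [a["ret_5d"] for a in analogs if a[feature] == value]
    others = [a["ret_5d"] for a in analogs if a[feature] != value]
    return win_rate(in_state) - win_rate(others)


# Hypothetical five-analog cohort; field names are illustrative.
cohort = [
    {"ret_5d": 2.1, "credit_spread_state": "tight"},
    {"ret_5d": -0.7, "credit_spread_state": "wide"},
    {"ret_5d": 1.4, "credit_spread_state": "tight"},
    {"ret_5d": -3.0, "credit_spread_state": "wide"},
    {"ret_5d": -0.5, "credit_spread_state": "tight"},
]
print(feature_lift(cohort, "credit_spread_state", "tight"))  # 2/3 - 0/2
```

A fuller attribution would likely also need to account for group size (a lift measured on a handful of analogs is mostly noise) and for correlated features.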
4. Regime stratification
How does the cohort split by market regime? In low-vol regimes this cohort had a 38% win rate; in high-vol, 51%. Same cohort, different stories depending on which regime we’re in today.
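Stratification is a group-by over the regime label attached to each analog’s own date. A minimal sketch, assuming each analog carries a vol_regime label (a filter name the FAQ mentions) and a hypothetical ret_5d field.

```python
from collections import defaultdict


def stratify_by_regime(analogs: list[dict], regime_key: str = "vol_regime") -> dict[str, dict]:
    """Group analogs by the regime label on each analog's own date and
    report the count and win rate (share of positive 5-day returns) per group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for a in analogs:
        groups[a[regime_key]].append(a["ret_5d"])
    return {
        regime: {"n": len(rets), "win_rate": sum(r > 0 for r in rets) / len(rets)}
        for regime, rets in groups.items()
    }
```

Reading the live anchor’s own regime then tells you which row of the table applies; the per-group n matters, since a split can leave one regime with very few analogs.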
The four engineering disciplines
Cohort intelligence is conceptually simple: vector search + outcome lookup. The work is in keeping it methodology-honest.
1. Embeddings
A useful similarity function for chart patterns is the load-bearing piece. Hand-engineered features fail — too many degrees of freedom. We trained 256-dimensional self-supervised embeddings on minute-bar data: ~25M chart pattern embeddings spanning 10 years of history, drawn from a 19,000+ US-equity universe. Critically, we don’t condition the embedding on forward returns — that would be a leak.
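At its core, retrieval is nearest-neighbor search in the embedding space. A brute-force sketch, assuming cosine similarity and an in-memory dict from a pattern key to its vector; the source specifies neither the metric nor the index, and a 25M-vector system would typically use an approximate nearest-neighbor index rather than a linear scan.

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_analogs(query: list[float], index: dict[str, list[float]], k: int = 300) -> list[tuple[str, float]]:
    """Return the k index keys most cosine-similar to `query`, best first.

    Keys are hypothetical pattern IDs such as "PFE|2019-03-12|1h".
    """
    scored = ((key, cosine(query, vec)) for key, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda kv: kv[1])
```

Outcome lookup is then a join from each returned key to that pattern’s stored forward returns.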
2. Cohort hygiene
When you retrieve nearest neighbors of NVDA · 2024-08-05 · 1h, several adjacent days for NVDA itself look very similar. Including those adjacent days produces a meaninglessly tight cohort that’s secretly almost-the-same-anchor. We exclude same-symbol matches within ±10 calendar days.
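The exclusion rule is a simple post-retrieval filter. A sketch, assuming candidates arrive as (symbol, date) pairs and that the ±10-day window is inclusive at its edges; the function name is hypothetical.

```python
from datetime import date


def apply_hygiene(
    anchor_symbol: str,
    anchor_date: date,
    candidates: list[tuple[str, date]],
    window_days: int = 10,
) -> list[tuple[str, date]]:
    """Drop same-symbol candidates within ±window_days calendar days of the anchor.

    Other symbols are always kept, even on the anchor date itself.
    """
    return [
        (sym, d)
        for sym, d in candidates
        if not (sym == anchor_symbol and abs((d - anchor_date).days) <= window_days)
    ]
```

One reasonable approach, assumed here rather than documented, is to over-fetch neighbors so the cohort still holds 300 analogs after exclusion.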
3. Calibration
Raw retrieval gives nominal probability bands, but their empirical coverage is usually off. A split conformal correction, fit on a held-out calibration set, widens the bands so that actual coverage hits 80% on held-out anchors.
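One standard way to do this is the conformalized-quantile-regression form of split conformal prediction: measure, on held-out calibration anchors, how far realized returns fell outside the raw p10/p90 band, then widen every band by a finite-sample-corrected quantile of that miss distance. The source does not say which conformal score Chart Library uses; this is a sketch of one common choice, with hypothetical function names.

```python
import math


def conformal_margin(lo: list[float], hi: list[float], y: list[float], coverage: float = 0.8) -> float:
    """Split-conformal margin from a held-out calibration set.

    Each calibration anchor is scored by how far its realized return fell
    outside its raw [lo, hi] band (negative if inside). The margin is the
    ceil((n + 1) * coverage)-th smallest score.
    """
    scores = sorted(max(l - t, t - h) for l, h, t in zip(lo, hi, y))
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)  # 1-based rank of the corrected quantile
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[rank - 1]


def calibrated_band(lo: float, hi: float, margin: float) -> tuple[float, float]:
    """Apply the margin to a new cohort's raw band.

    A positive margin widens the band; a negative one would narrow it.
    """
    return lo - margin, hi + margin
```

The margin is fit once on calibration anchors and then applied to every new cohort’s raw p10/p90; under exchangeability this gives at least 80% marginal coverage.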
4. Eval discipline
Symbol-disjoint splits (NVDA in train means no NVDA in test). 10-day embargo windows. Honest negatives published. And the paired-agent eval at /evaluation measures whether the layer actually improves agent reasoning — not just whether the embeddings recall similar charts.
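A sketch of the two split rules under stated assumptions: symbols are assigned to train or test by hashing, so NVDA can never land on both sides, and a single time cutoff is used with a 10-calendar-day embargo before it. The source does not specify how symbols are assigned, whether the embargo counts calendar or trading days, or how the time split is arranged; all of that, and the function names, are hypothetical.

```python
import hashlib
from datetime import date, timedelta


def is_test_symbol(symbol: str, test_fraction: float = 0.2) -> bool:
    """Deterministically assign each symbol to train or test, never both."""
    digest = hashlib.sha256(symbol.encode()).digest()
    return digest[0] / 256 < test_fraction


def symbol_disjoint_split(
    samples: list[tuple[str, date]], cutoff: date, embargo_days: int = 10
) -> tuple[list[tuple[str, date]], list[tuple[str, date]]]:
    """Symbol-disjoint, time-ordered split with an embargo before the cutoff.

    Train: train-symbol samples dated before (cutoff - embargo_days).
    Test:  test-symbol samples dated on or after the cutoff.
    Everything else (the embargo gap, wrong side of the cutoff) is dropped.
    """
    embargo_start = cutoff - timedelta(days=embargo_days)
    train = [s for s in samples if not is_test_symbol(s[0]) and s[1] < embargo_start]
    test = [s for s in samples if is_test_symbol(s[0]) and s[1] >= cutoff]
    return train, test
```

The embargo keeps training windows whose forward returns overlap the test period from leaking information across the boundary.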
When to use cohort intelligence
Cohort intelligence is the right primitive when the question has this shape:
- An AI agent needs to reason about a specific (symbol, date) anchor. Trading agents, research agents, alert-summary agents. The cohort response gives the LLM facts to ground in.
- You want a calibrated forward-return distribution, not a point estimate. Risk modeling, position sizing, scenario analysis.
- You want to explain a chart pattern’s historical behavior. Investor letters, research notes, sales conversations grounded in “here’s what 300 historical analogs actually did.”
- You’re building a screener that asks “what setups look interesting tomorrow?” Cohort-ranked discovery surfaces setups with cleanly defined historical analog density.
- You need a regime-aware view of a known pattern. Bull flags performed one way in 2021, another way in 2022. Regime stratification tells you which.
It is not the right primitive when you need a strict trading signal (cohort intelligence informs decisions; it doesn’t make them), a fundamental valuation (a different shape of data), or millisecond-latency order routing (cohort calls take ~280ms at the median).
Why this is the right primitive for AI agents
LLM-based trading and research agents need facts they can reason about. Cohort intelligence is fact-shaped:
- cohort_size: 300, so the agent can reason about sample size
- median_5d: -1.3%, win_rate: 0.44, so the agent can describe a distribution, not a guess
- credit_spread_state=tight (positive), so the agent can check whether that factor is currently present and update conditionally
- conformal coverage: 80% empirical, so the agent can express calibrated uncertainty
Compare to handing an agent “+2.3% NVDA forecast.” The agent has nothing to reason about. And in the paired-agent evaluation, the cohort-equipped agent didn’t just produce different answers — it produced answers a senior PM would actually defend. That’s the bar.
Try it
Cohort intelligence is exposed via a REST API and as an MCP server. The simplest entry point is the MCP tool. Install it once and it is available in Claude, Cursor, or any MCP-aware agent:
pip install chartlibrary-mcp
From any MCP-aware agent:
# In Claude Desktop, Cursor, or a custom MCP client:
> What's the historical cohort for NVDA on 2024-08-05 at 1h?
# The agent calls cohort_analyze and returns:
# - 300 historical analogs
# - 5d median return, p10/p90, win rate
# - top 3 features that separated winners from losers
# - regime stratification
Or as a direct REST call. The full cohort intelligence endpoints (/api/v1/cohort_analyze, narrative_pulse, cohort_compare) are part of the Builder tier ($29/mo). The free Sandbox tier covers text search, follow-through, and the public discovery surfaces. Grab a key at chartlibrary.io/developers:
curl -X POST https://chartlibrary.io/api/v1/cohort_analyze \
-H "Authorization: Bearer cl_..." \
-H "Content-Type: application/json" \
-d '{
"anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1h"},
"cohort_size": 300,
"horizons": [1, 5, 10]
}'
What cohort intelligence is not
- Not a trading signal. A 60% win-rate cohort doesn’t mean “buy this stock.” That’s information; what you do with it is your decision.
- Not a guarantee. Historical pattern similarity is a strong prior, not a forecast. Regime shifts can break the prior.
- Not a replacement for fundamental analysis. It’s a complement, not a substitute.
- Not a black box. Every claim in a cohort response ties to a verifiable retrieval — the user can inspect the actual 300 analogs and check the math.
Frequently asked questions
- What's the smallest useful cohort size?
- We default to 300 historical analogs. Below n=30 the distribution stats are too noisy to be meaningful; the API surfaces a warning when a filtered cohort drops below that floor.
- Can I filter the cohort by regime, sector, or news context?
- Yes. cohort_analyze accepts filters for vol_regime, days_since_earnings, days_since_ath, sector, has_news, macro_state, relative_volume, and realized_vol. Filters narrow the cohort to comparable historical situations.
- How fresh is the data?
- Daily bars are ingested nightly; new pattern embeddings are computed and indexed for every trading day. Same-day intraday queries use the most recent close.
- Does cohort intelligence work for crypto, forex, or commodities?
- Currently US equities only: 19,000+ tickers, including delisted ones, so there is no survivorship bias. Crypto and global equities are a future expansion.
- What's the latency on a cohort intelligence call?
- ~280ms median for /api/v1/cohort_analyze with default cohort_size=300. The full Layer 3 response (cohort + outcome distribution + feature importance + regime stratification + risk profile) is computed and returned in one round trip.
- How does cohort intelligence avoid look-ahead bias?
- Each retrieval respects an as_of_date — analogs are filtered to dates strictly before the anchor, and outcome distributions are computed only from those analogs' realized forward returns at the time. Same-symbol matches within ±10 calendar days are excluded to prevent trivially-similar adjacent days from collapsing the cohort.
- Can I commercially use cohort intelligence in my AI agent product?
- Yes. Sandbox (free) for evaluation; Builder ($29/mo) unlocks cohort_analyze and the full Layer 3 endpoints for commercial agent workloads; Scale ($99/mo) for higher throughput; Agent ($299/mo) for high-volume orchestration; Enterprise (from $2K/mo) for funds and embedded use cases.
- Has cohort intelligence been independently validated?
- Yes. A blind LLM-judged paired evaluation across 50 out-of-sample scenarios (2024-2025) showed AI agents with Chart Library's cohort intelligence tools beat identical agents without them 50-0 in winner consensus, with paired t-statistic > 10 on all six reasoning dimensions. Full methodology and code are open at chartlibrary.io/evaluation and github.com/grahammccain/chart-library-adqe.
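The look-ahead rule and the small-cohort floor from the answers above can be sketched together. This assumes each analog is a dict with a date and filter fields, approximates the 10-trading-day horizon with a generous calendar-day allowance, and reads "realized forward returns at the time" conservatively: an analog counts only if its whole forward window ended before the anchor. The function name and fields are hypothetical; the API applies its own version of this through as_of_date.

```python
import warnings
from datetime import date, timedelta

MIN_COHORT = 30  # the FAQ's floor: below this, distribution stats are too noisy


def point_in_time_cohort(
    analogs: list[dict], as_of: date, horizon_calendar_days: int = 16, **filters
) -> list[dict]:
    """Keep analogs whose forward window had fully elapsed before `as_of`,
    then apply exact-match filters such as vol_regime="high".

    horizon_calendar_days=16 is an assumed allowance covering 10 trading
    days plus holidays; a real system would use a trading calendar.
    """
    cutoff = as_of - timedelta(days=horizon_calendar_days)
    kept = [
        a
        for a in analogs
        if a["date"] < cutoff and all(a.get(k) == v for k, v in filters.items())
    ]
    if len(kept) < MIN_COHORT:
        warnings.warn(f"filtered cohort has n={len(kept)} < {MIN_COHORT}; stats are noisy")
    return kept
```

Checking the floor after filtering, not before, matters: a 300-analog cohort can shrink below 30 once several filters are combined.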
Run a cohort_analyze call.
Free Sandbox tier — 200 calls/day, no authentication. MCP install for Claude or Cursor takes 30 seconds.