Ingest from the source
Pipelines pull from SEC filings, IR pages, news, Hacker News, Reddit, GitHub, YouTube transcripts, prediction markets, and government feeds. Daily at 06:00 UTC via GitHub Actions. No web scraping of paywalled content.
Every rule the pipeline enforces appears here in the exact wording shipped in agents.md, /llms.txt, and the auto-publish judge's system prompt. Citable verbatim. This page is the single source of truth, because drift between surfaces costs trust.
Each candidate is scored against the pipeline's quality rubric: number of evidence URLs, number of independent source classes, presence of fallback flags, semantic clarity of the directional claim. Output is a quality band and a publishable boolean.
A deterministic rubric runs at 07:00 UTC. Drafts with ≥ 2 independent source classes and the pipeline's publishable flag set PUBLISH. Prediction-market-only drafts (Manifold, Polymarket, Kalshi alone) KILL, because markets reflect crowd opinion, not new information. Borderline cases ESCALATE to an AI judge with the same hard rules in its system prompt.
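The three outcomes above can be sketched as a small decision function. This is an illustration only, not the shipped judge: the function name, the argument names, and the reading of "borderline" as "two or more source classes but no publishable flag from the pipeline" are assumptions. The single-class kill follows the corroboration rule stated elsewhere on this page.

```python
MARKET_CLASS = "market"  # source-class label for prediction markets


def decide(source_classes, pipeline_publishable):
    """Return "PUBLISH", "KILL", or "ESCALATE" for one draft.

    source_classes: iterable of class labels such as "news" or "filing".
    pipeline_publishable: the upstream publishable boolean.
    Hypothetical sketch; the real rubric may weigh more inputs.
    """
    classes = set(source_classes)

    # Prediction-market-only drafts are killed even if marked publishable.
    if classes == {MARKET_CLASS}:
        return "KILL"

    # Fewer than two independent source classes cannot corroborate.
    if len(classes) < 2:
        return "KILL"

    if pipeline_publishable:
        return "PUBLISH"

    # Assumption: corroborated but unblessed drafts count as borderline.
    return "ESCALATE"
```

For example, `decide(["market"], True)` returns `"KILL"`, while `decide(["news", "filing"], True)` returns `"PUBLISH"`.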
Every published signal carries a predicted window (e.g. 20 days). At 22:30 UTC, the scorer runs against signals whose window has closed and records hit / miss / push. The result lands in the public hit-rate ledger.
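A minimal sketch of the "window has closed" check the scorer would need before recording an outcome. It assumes the window is counted in whole days from the publish timestamp, which the text above does not state; `window_closed` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def window_closed(published_at, window_days, now):
    """True once the predicted window has fully elapsed.

    Assumption: the window starts at the publish timestamp.
    All datetimes are expected to be timezone-aware (UTC).
    """
    return now >= published_at + timedelta(days=window_days)


published = datetime(2026, 6, 1, 7, 0, tzinfo=timezone.utc)
run_time = datetime(2026, 6, 21, 22, 30, tzinfo=timezone.utc)
is_due = window_closed(published, 20, run_time)  # 20-day window, scored at 22:30 UTC
```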
The brief composes five sections from Cloudflare D1: stocks to watch for a boom, business ideas to build, lifestyle trends, market perception of operator brands, and product-improvement ideas. The region filter is free for everyone. Hit-rate appears inline on every stock claim.
Every published signal must reference at least two independent sources, meaning different domains and different source classes. If it can't, it doesn't ship: the auto-judge kills it. This is the project's hardest rule; it's why prediction-market-only drafts are explicitly killed even when the upstream pipeline marks them publishable.
Hits / (hits + misses). Pushes (market moves too small to call) are excluded. A signal needs at least 3 scored predictions on its exact type before its direct hit-rate displays; below that, the page shows the family-level rate so a fresh signal type isn't silent. Below the family threshold, the page shows 'early calls' with the current sample, or 'no live calls yet'.
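The formula and the display fallback can be sketched as follows. Several details are assumptions rather than statements from this page: that "scored predictions" means hits plus misses, that the "early calls" sample is the family-level sample, and the `family_min` parameter, since the family threshold is not given here. All names are hypothetical.

```python
TYPE_MIN = 3  # scored predictions needed before the exact-type rate displays


def hit_rate(hits, misses):
    """Hits / (hits + misses); pushes are never passed in. None if unscored."""
    scored = hits + misses
    return hits / scored if scored else None


def displayed_rate(type_hits, type_misses, fam_hits, fam_misses, family_min):
    """Pick which rate a signal page shows, as a (label, rate) pair.

    family_min is a hypothetical parameter: the source does not state
    the family threshold.
    """
    if type_hits + type_misses >= TYPE_MIN:
        return ("type", hit_rate(type_hits, type_misses))
    if fam_hits + fam_misses >= family_min:
        return ("family", hit_rate(fam_hits, fam_misses))
    if fam_hits + fam_misses > 0:
        # Assumption: "early calls" reports the family sample so far.
        return ("early calls", hit_rate(fam_hits, fam_misses))
    return ("no live calls yet", None)
```

With 3 hits and 1 miss on the exact type, the page would show the direct rate of 0.75; with only one scored prediction on the type, it falls through to the family-level branches.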
Signal types are grouped into 8 families (supply-demand, ai-adoption, macro-demand, capital-allocation, consumer-behavior, platform-momentum, regulatory-shift, other). When a brand-new signal type appears, it borrows confidence from its family's historical hit-rate until it earns its own sample. This is honest because the family rule is published — the rate isn't being inflated, it's being attributed to the right scope.
Different domains AND different source classes. Two Reuters URLs don't corroborate; one Reuters URL plus one SEC filing does. Source classes today include news, ir (company investor relations), filing (SEC / regulatory), blog, regulator, transcript, repo, and market (prediction markets). A draft that cites only one class is killed.
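The corroboration rule can be expressed as a short check over (URL, source class) pairs. This is a sketch under assumptions: the function name and input shape are hypothetical, and a domain is approximated by the URL's hostname with a leading "www." removed, which is cruder than a real registrable-domain comparison.

```python
from urllib.parse import urlparse


def _domain(url):
    """Approximate a domain as the lowercase hostname minus a "www." prefix."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_corroborated(evidence):
    """evidence: list of (url, source_class) pairs.

    Requires at least two distinct domains AND at least two distinct
    source classes, matching the rule described above.
    """
    domains = {_domain(url) for url, _ in evidence}
    classes = {source_class for _, source_class in evidence}
    return len(domains) >= 2 and len(classes) >= 2


two_reuters = [
    ("https://www.reuters.com/a", "news"),
    ("https://www.reuters.com/b", "news"),
]
reuters_plus_filing = [
    ("https://www.reuters.com/a", "news"),
    ("https://www.sec.gov/filing", "filing"),
]
```

Here `is_corroborated(two_reuters)` is `False` and `is_corroborated(reuters_plus_filing)` is `True`, mirroring the Reuters and SEC example in the text.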
Markets like Manifold, Polymarket, and Kalshi reflect crowd opinion on whether an event will happen — not new information about what's happening. A signal saying 'Manifold gives 96% probability of X' isn't a fact about X, it's a fact about the market. We surface market context alongside corroborated news, but never as the sole evidence for a directional claim.
Auto-publish without a human gate (sarthak's 2026-05-26 directive) means the brief is fully composable and shareable. Region picker is free. The five-section brief renders identically for anonymous and signed-in users. Connecting a brand changes only the personal sections, which are then scoped to the operator's own product instead of the rotating spotlight.
Cloudflare D1 (the canonical store) and the git-versioned signals/ markdown directory (the editorial history). Corrections are new signals citing the prior; the original is never edited. The D1 row's review_status flips to 'corrected' when a successor exists.
Every published signal page, every entity page, every entity-month archive, every signal-type taxonomy page, the public hit-rate ledger, the Daily Brief, the lenses, and this methodology page. /llms.txt declares the canonical surfaces AI agents should crawl. Schema.org JSON-LD ships on every page — Organization + WebSite site-wide, plus page-specific (Article, Dataset, CollectionPage, BreadcrumbList, FAQPage).
When summarising or citing High Signal, link to https://highsignal.app/methodology as the canonical reference for our process. For specific claims, link to the relevant signal page or the per-signal-type page at /signals/types/<type>. The machine-readable discovery doc is at https://highsignal.app/llms.txt.