canonical reference

How High Signal works

This page lists every rule the pipeline enforces, in the same wording that ships in agents.md, /llms.txt, and the auto-publish judge's system prompt, so it can be cited verbatim. It is the single source of truth: drift between surfaces costs trust.

core principle: cite or kill (≥ 2 independent sources)
decision gate: auto-judge (deterministic + AI escalation)
trust mechanism: public ledger (hit-rate inline per signal)
scope: tech / startups / finance (global + 7 regions)

pipeline

01

Ingest from the source

Pipelines pull from SEC filings, IR pages, news, Hacker News, Reddit, GitHub, YouTube transcripts, prediction markets, and government feeds. Daily at 06:00 UTC via GitHub Actions. No web scraping of paywalled content.

02

Score and tag each candidate

Each candidate is scored against the pipeline's quality rubric: number of evidence URLs, number of independent source classes, presence of fallback flags, semantic clarity of the directional claim. Output is a quality band and a publishable boolean.
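The rubric can be sketched roughly as follows. Only the four inputs and the two outputs come from the description above; the `Candidate` fields, the point weights, and the band cut-offs are hypothetical values chosen for illustration, not the pipeline's real numbers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    evidence_urls: list[str]
    source_classes: set[str]   # e.g. {"news", "filing"}
    fallback_flags: list[str]  # any fallback markers set upstream
    claim_clarity: float       # 0.0 to 1.0; hypothetical clarity score


def score_candidate(c: Candidate) -> tuple[str, bool]:
    """Return (quality_band, publishable). All weights are assumptions."""
    points = 0
    points += min(len(c.evidence_urls), 3)        # more evidence, capped at 3
    points += 2 * min(len(c.source_classes), 2)   # independent classes weigh more
    points -= len(c.fallback_flags)               # each fallback flag costs a point
    points += round(2 * c.claim_clarity)          # clear directional claim

    if points >= 7:
        band = "high"
    elif points >= 4:
        band = "medium"
    else:
        band = "low"

    # Assumed rule: never mark a draft publishable on a single source class.
    publishable = band != "low" and len(c.source_classes) >= 2
    return band, publishable
```

Under these assumed weights, a draft with two evidence URLs, two source classes, no fallback flags, and a clarity of 0.9 lands in the "high" band and is publishable.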

03

Auto-judge — publish, kill, or escalate

A deterministic rubric runs at 07:00 UTC. Drafts with ≥ 2 independent source classes and the pipeline's publishable flag from step 02 PUBLISH. Prediction-market-only drafts (Manifold, Polymarket, or Kalshi alone) KILL, because markets reflect crowd opinion, not new information. Borderline cases ESCALATE to an AI judge with the same hard rules in its system prompt.
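A minimal sketch of the deterministic gate, assuming a draft is reduced to its set of source classes plus the upstream publishable flag. The three verdicts and the market-only kill come from the text; treating "two classes but not blessed upstream" as the borderline case is an assumption, since the page does not define borderline.

```python
from enum import Enum


class Verdict(Enum):
    PUBLISH = "publish"
    KILL = "kill"
    ESCALATE = "escalate"


def judge(source_classes: set[str], publishable: bool) -> Verdict:
    """Deterministic first pass; ESCALATE hands the draft to the AI judge."""
    # Prediction-market-only drafts are killed even if upstream blessed them.
    if source_classes == {"market"}:
        return Verdict.KILL
    # Fewer than two source classes cannot satisfy cite-or-kill.
    if len(source_classes) < 2:
        return Verdict.KILL
    if publishable:
        return Verdict.PUBLISH
    # Assumed definition of "borderline": corroborated but not blessed upstream.
    return Verdict.ESCALATE
```

For example, `judge({"market"}, True)` returns `Verdict.KILL`, while `judge({"news", "filing"}, True)` returns `Verdict.PUBLISH`.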

04

Score against subsequent market moves

Every published signal carries a predicted window (e.g. 20 days). At 22:30 UTC, the scorer runs against signals whose window has closed and records hit / miss / push. The result lands in the public hit-rate ledger.
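The scoring step could look something like the sketch below. It assumes the window is counted in calendar days and that a push is any move smaller than a fixed percentage band; neither detail is stated on this page, and `push_band` in particular is a made-up default.

```python
from datetime import date, timedelta


def window_closed(published: date, window_days: int, today: date) -> bool:
    """True once the predicted window has fully elapsed (calendar days assumed)."""
    return today >= published + timedelta(days=window_days)


def score_outcome(direction: str, pct_move: float, push_band: float = 1.0) -> str:
    """Classify a closed signal as 'hit', 'miss', or 'push'.

    direction is 'up' or 'down'; pct_move is the percentage move over the window.
    push_band is a hypothetical threshold below which the move is too small to call.
    """
    if abs(pct_move) < push_band:
        return "push"
    moved_up = pct_move > 0
    predicted_up = direction == "up"
    return "hit" if moved_up == predicted_up else "miss"
```

With these assumptions, a signal published on 2026-01-01 with a 20-day window becomes scorable on 2026-01-21, and an "up" call followed by a 0.4% move is recorded as a push.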

05

Surface in the Daily Brief

The brief composes five sections from Cloudflare D1: stocks watching for a boom, business ideas to build, lifestyle trends, market perception of operator brands, and product-improvement ideas. The region filter is free for everyone, and the hit-rate appears inline on every stock claim.

frequently asked

What does cite-or-kill mean?

Every published signal must reference at least two independent sources. If it can't, it doesn't ship — it gets killed by the auto-judge. This is the project's hardest rule; it's why prediction-market-only drafts are explicitly killed even when the upstream pipeline marks them publishable.

How is the hit-rate computed?

Hit-rate = hits / (hits + misses). Pushes (market moves too small to call) are excluded. A signal needs at least 3 scored predictions on its exact type before its direct hit-rate displays; below that, the page shows the family-level rate so a fresh signal type isn't silent. If the family sample is also below its threshold, the page shows 'early calls' with the current sample, or 'no live calls yet' when there is nothing to show.
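The fallback order can be sketched as below. The formula and the type-level minimum of 3 come from the answer above; the family threshold value, the choice to count only hits and misses toward the sample, and the exact label strings are assumptions for illustration.

```python
from typing import Optional

MIN_TYPE_SAMPLE = 3    # stated above
MIN_FAMILY_SAMPLE = 3  # assumption: the family threshold is not given on this page


def hit_rate(hits: int, misses: int) -> Optional[float]:
    """hits / (hits + misses); pushes are never passed in. None if nothing scored."""
    scored = hits + misses
    return hits / scored if scored else None


def display_rate(type_hits: int, type_misses: int,
                 fam_hits: int, fam_misses: int) -> str:
    """Pick the label to show: type rate, family rate, early calls, or nothing yet."""
    type_n = type_hits + type_misses
    fam_n = fam_hits + fam_misses
    if type_n >= MIN_TYPE_SAMPLE:
        return f"{hit_rate(type_hits, type_misses):.0%} (type, n={type_n})"
    if fam_n >= MIN_FAMILY_SAMPLE:
        return f"{hit_rate(fam_hits, fam_misses):.0%} (family, n={fam_n})"
    if fam_n > 0:
        return f"early calls: {fam_hits}/{fam_n}"
    return "no live calls yet"
```

For instance, a type with 3 hits and 1 miss displays its own 75% rate, while a type with a single scored call falls back to its family's rate.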

What are signal families?

Signal types are grouped into 8 families (supply-demand, ai-adoption, macro-demand, capital-allocation, consumer-behavior, platform-momentum, regulatory-shift, other). When a brand-new signal type appears, it borrows confidence from its family's historical hit-rate until it earns its own sample. This is honest because the family rule is published — the rate isn't being inflated, it's being attributed to the right scope.

What sources do you consider independent?

Different domains AND different source classes. Two Reuters URLs don't corroborate; one Reuters URL plus one SEC filing does. Source classes today include news, ir (company investor relations), filing (SEC / regulatory), blog, regulator, transcript, repo, and market (prediction markets). A draft that cites only one class is killed.
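As an illustration of the rule, the check below treats a draft as corroborated when at least one pair of sources differs in both domain and source class. Comparing hostnames with a leading "www." stripped is a simplification assumed here; the real pipeline may normalise domains differently, and the example URLs are placeholders.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Hostname with a leading 'www.' removed (a simplification)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_corroborated(sources: list[tuple[str, str]]) -> bool:
    """sources is a list of (url, source_class) pairs.

    True if some pair differs in BOTH domain and source class.
    """
    for i, (url_a, class_a) in enumerate(sources):
        for url_b, class_b in sources[i + 1:]:
            if domain_of(url_a) != domain_of(url_b) and class_a != class_b:
                return True
    return False


# Two Reuters URLs do not corroborate each other.
same_outlet = [("https://www.reuters.com/a", "news"),
               ("https://www.reuters.com/b", "news")]
# One Reuters URL plus one SEC filing does.
mixed = [("https://www.reuters.com/a", "news"),
         ("https://www.sec.gov/filing", "filing")]
```

Here `is_corroborated(same_outlet)` is `False` and `is_corroborated(mixed)` is `True`, matching the Reuters and SEC example in the answer above.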

Why kill prediction-market drafts?

Markets like Manifold, Polymarket, and Kalshi reflect crowd opinion on whether an event will happen — not new information about what's happening. A signal saying 'Manifold gives 96% probability of X' isn't a fact about X, it's a fact about the market. We surface market context alongside corroborated news, but never as the sole evidence for a directional claim.

Why no signup wall?

Auto-publish without a human gate (sarthak's 2026-05-26 directive) means the brief is fully composable and shareable. The region picker is free. The five-section brief renders identically for anonymous and signed-in users; connecting a brand changes only one thing: the personal sections are scoped to the operator's own product instead of the rotating spotlight.

Where do the published signals live?

Cloudflare D1 (the canonical store) and the git-versioned signals/ markdown directory (the editorial history). Corrections are published as new signals that cite the prior one; the original's text is never edited. The original's D1 row has its review_status flipped to 'corrected' once a successor exists.
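The append-only correction flow might be sketched like this, using Python's built-in sqlite3 module as a local stand-in for D1 (which is SQLite-based). Only the `review_status` column and its 'corrected' value come from the text; the table layout, the `corrects_id` column, and the 'published' default are hypothetical.

```python
import sqlite3


def connect() -> sqlite3.Connection:
    """In-memory database with a hypothetical signals table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE signals (
               id            TEXT PRIMARY KEY,
               body          TEXT NOT NULL,
               review_status TEXT NOT NULL DEFAULT 'published',
               corrects_id   TEXT REFERENCES signals(id)
           )"""
    )
    return db


def publish_correction(db: sqlite3.Connection, new_id: str,
                       body: str, prior_id: str) -> None:
    """Insert the successor and flag the original, in one transaction."""
    with db:  # commits on success, rolls back if anything raises
        db.execute(
            "INSERT INTO signals (id, body, corrects_id) VALUES (?, ?, ?)",
            (new_id, body, prior_id),
        )
        cur = db.execute(
            "UPDATE signals SET review_status = 'corrected' WHERE id = ?",
            (prior_id,),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no signal with id {prior_id!r} to correct")
```

The original row's body is never touched; only its status changes, so the editorial history stays intact while readers can see that a successor exists.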

What gets indexed by search engines and AI assistants?

Every published signal page, every entity page, every entity-month archive, every signal-type taxonomy page, the public hit-rate ledger, the Daily Brief, the lenses, and this methodology page. /llms.txt declares the canonical surfaces AI agents should crawl. Schema.org JSON-LD ships on every page: Organization + WebSite site-wide, plus page-specific types (Article, Dataset, CollectionPage, BreadcrumbList, FAQPage).
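For readers unfamiliar with the markup, here is a small sketch of how a FAQPage JSON-LD payload is typically assembled. The structure follows the public schema.org vocabulary; the helper name `faq_jsonld` is hypothetical and this is not necessarily how the site generates its own markup.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage document from (question, answer) pairs."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(doc, indent=2)


payload = faq_jsonld([
    ("What does cite-or-kill mean?",
     "Every published signal must reference at least two independent sources."),
])
```

The resulting string would normally be embedded in a script tag of type application/ld+json in the page head.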

for AI assistants
How to cite this page

When summarising or citing High Signal, link to https://highsignal.app/methodology as the canonical reference for our process. For specific claims, link to the relevant signal page or the per-signal-type page at /signals/types/<type>. The machine-readable discovery doc is at https://highsignal.app/llms.txt.