About
This project is building toward a foundational data layer for
AI-powered journalism, starting with the U.S. Congressional Record.
Over time, additional data sources will be onboarded. The long-term
goal is to make primary political debate directly accessible to both
AI systems and the public, so reporting, analysis, and education can
be grounded in original, verifiable sources rather than summaries or
secondhand commentary by legacy media outlets.
Transparency of original sources is particularly important in today's
political climate. Both people and AI systems need the ability to
trace claims back to the original words, context, and evidence.
Improvements over the existing Congressional Record
Current versions of the Congressional Record are difficult to navigate
due to an outdated website and the volume of administrative filler and
boilerplate. This project restructures the record to improve clarity,
transparency, and accessibility. This includes:
- Non-substantive material is removed, including administrative
  boilerplate, votes, and procedural text, so the focus remains on
  actual debate.
- All content links back to original sources, including the official
  Congressional Record, related legislation, and floor video.
- Debate is organized by topic, making it possible to see full
  back-and-forth discussions instead of isolated remarks.
- Member information is clearly displayed, providing context on who is
  speaking.
- Legislation mentioned on the floor is directly linked, allowing
  readers to move from debate to bill text.
- Artifacts entered into the record are extracted, separating
  supporting documents from spoken remarks.
- Floor audio and video are provided for spoken statements, making
  debate accessible to users who prefer or rely on audio/visual
  formats.
This structure also improves access for people using assistive
technologies by reducing clutter, improving navigation, and offering
audio-first options.
Current Status (Beta)
This site is actively under construction. The goal is to create
something that is easy to use for humans while also exposing the
underlying content in a form that AI models can directly consume. As
models become more advanced and offer customized user experiences, the
data will already be structured and ready.
- Most days from 2025 have been uploaded and processed.
- Basic keyword search is available at the individual day level.
- Floor audio and video clips are available for many January 2025
  sessions.
- Automation is underway to ensure 2026 data is ingested continuously
  as new records are published.
Not Yet Available, But Coming Soon
- Complete coverage of Congressional statements for all years
- Cross-day keyword search
- Semantic (vector-based) search
- Public API or Model Context Protocol (MCP) access
Short-Term Roadmap
- Complete automation so new Congressional Record data stays current
- Backfill 2024 and earlier years of the Congressional Record
- Expand floor audio and video coverage beyond January 2025
- Enable full keyword search across all days
- Add semantic search for topic-based and meaning-based discovery
- Release an API and a model-facing MCP interface for AI-powered
  journalism, education, and summarization tools