Data Catalog

The data catalog describes the kinds of public records the site is expected to support before those records exist at scale. It keeps the data section from becoming a dumping ground for CSV files and ad hoc metrics. Each data publication needs a schema, source boundary, update cadence, and correction process before it becomes citable.

The public data surface should favor small, well-described records over broad claims. Aggregate records can support transparency, but they can also be misread if the population, exclusions, and collection window are unclear. Every dataset should state what it measures and what it does not measure.

Planned Data Surfaces

| Surface | Record Type | Required Context |
|---|---|---|
| Advisory Index | Advisory summaries and canonical URLs | Status, affected product, publication date, remediation state |
| Report Index | Public report summaries | Methodology, publication state, revision history |
| Tool Registry | Browser and downloadable tool metadata | Local-only flag, version, input boundary, support status |
| Schema Registry | JSON schemas and schema versions | Version, compatibility notes, validation command |
| Aggregate Metrics | Counts and timelines derived from public records | Population, time window, exclusions, bias notes |
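
One way to make the Required Context column enforceable is a lookup that a publication step consults before a record is released. The Python sketch below is illustrative only: the surface keys, the field names, and the `missing_context` helper are assumptions derived from the table wording, not an established schema for this site.

```python
# Hypothetical field names derived from the "Required Context" column.
REQUIRED_CONTEXT = {
    "advisory_index": ["status", "affected_product", "publication_date", "remediation_state"],
    "report_index": ["methodology", "publication_state", "revision_history"],
    "tool_registry": ["local_only", "version", "input_boundary", "support_status"],
    "schema_registry": ["version", "compatibility_notes", "validation_command"],
    "aggregate_metrics": ["population", "time_window", "exclusions", "bias_notes"],
}


def missing_context(surface: str, record: dict) -> list:
    """Return the required context fields that are absent or blank in a record.

    A value of False counts as present, so a local-only flag set to False
    is not reported as missing. An unknown surface raises KeyError.
    """
    required = REQUIRED_CONTEXT[surface]
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or value == "":
            missing.append(name)
    return missing
```

A record would be held back from publication whenever `missing_context` returns a non-empty list. For example, an aggregate metric that states its population but not its time window, exclusions, or bias notes would not yet qualify.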

Citation Rule

A data page should be citable without private context. The reader should be able to identify the schema, the source boundary, the record date, the update cadence, and the correction path from the public page. If a record cannot supply those fields, it should remain internal or be published as narrative context instead of structured data.
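
The citation rule can be read as a gate with two outcomes. The sketch below shows one possible encoding, assuming the five fields are stored as plain strings; the `CitationContext` class and the `publication_form` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CitationContext:
    """The five fields the citation rule expects on a public data page."""
    schema: Optional[str] = None
    source_boundary: Optional[str] = None
    record_date: Optional[str] = None  # assumed to be a date string
    update_cadence: Optional[str] = None
    correction_path: Optional[str] = None


def is_citable(ctx: CitationContext) -> bool:
    """True only when every citation field is present and non-blank."""
    return all((getattr(ctx, f.name) or "").strip() for f in fields(ctx))


def publication_form(ctx: CitationContext) -> str:
    """Structured data if citable; otherwise keep internal or publish as narrative."""
    return "structured" if is_citable(ctx) else "internal_or_narrative"
```

The choice between staying internal and publishing as narrative context is left to an editor in this sketch; the code only decides whether structured publication is allowed.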

Correction Path

Data corrections should preserve trust by being explicit. A corrected record should identify the field changed, the reason for the change, the review date, and whether derived metrics were affected. Silent changes to public data are permitted only for non-semantic formatting errors; any change that alters what a record says requires a visible correction.
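
A correction entry with the four elements named above could be modeled as follows. This is a sketch under stated assumptions: the `Correction` class, the `apply_change` helper, and the `corrections` key are hypothetical, and "non-semantic formatting" is approximated here as a whitespace-only difference, which a real policy may define differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Correction:
    field_changed: str
    reason: str
    review_date: date
    derived_metrics_affected: bool


def is_formatting_only(old: str, new: str) -> bool:
    """Assumption: a change is non-semantic when only whitespace differs."""
    return old.split() == new.split()


def apply_change(record: dict, field_name: str, new_value: str,
                 correction: Optional[Correction] = None) -> dict:
    """Return an updated copy of the record, rejecting silent semantic changes."""
    old_value = str(record.get(field_name, ""))
    updated = dict(record)
    updated[field_name] = new_value
    if correction is not None:
        if correction.field_changed != field_name:
            raise ValueError("correction does not describe the field being changed")
        # Append to a new list so the original record's history is not mutated.
        updated["corrections"] = [*record.get("corrections", []), correction]
    elif not is_formatting_only(old_value, new_value):
        raise ValueError("semantic change requires a Correction entry")
    return updated
```

Returning a copy rather than editing in place keeps the prior state available for review, and the `derived_metrics_affected` flag gives downstream aggregates a signal that they may need to be recomputed.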