Conversion Model

The file converter is built around a conservative conversion model. It handles data transforms that can be represented safely inside the browser, without uploading input or invoking server-side parsers. The current model focuses on bytes, text, structured text, and record-shaped formats. It is intentionally not a universal document conversion service.

A universal converter is a different class of system. Office documents, PDFs, images, archives, and executable formats require parsers that can be large, complex, and vulnerable. They may also require rendering engines or decompression logic. Those capabilities can still be added later, but each format family needs dependency review, input limits, test fixtures, and a clear unsupported-case policy.

Format Families

| Family | Current Support | Expansion Rule |
| --- | --- | --- |
| Bytes | Base64, Base64URL, hex, data URL, JSON byte array | Keep whole-buffer limits visible |
| Text | UTF-8, percent encoding, plain text output | Preserve encoding assumptions in output metadata |
| JSON | Pretty, minified, envelope, NDJSON, CSV, TSV, Markdown table | Reject ambiguous objects when tabular output would lose data |
| Delimited Text | CSV, TSV to rows or objects | Keep delimiter and quoting behavior documented |
| Documents | Not supported | Require reviewed parser or WASM module |
| Archives | Not supported for extraction | Consider metadata-only listing first |
| Images | Not supported for transcoding | Require canvas/WASM review and metadata rules |
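
As an illustration of the Bytes row, here is a minimal TypeScript sketch of hex and unpadded Base64URL encoding with a visible whole-buffer limit. The function names and the `MAX_INPUT_BYTES` value are hypothetical and chosen for this example; they are not the converter's actual API or configured limit.

```typescript
// Hypothetical whole-buffer limit; the converter's real limit is not stated here.
const MAX_INPUT_BYTES = 1024 * 1024;

// URL-safe alphabet: "-" and "_" replace the "+" and "/" of standard Base64.
const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function assertWithinLimit(byteLength: number): void {
  if (byteLength > MAX_INPUT_BYTES) {
    throw new RangeError(
      "input is " + byteLength + " bytes; limit is " + MAX_INPUT_BYTES
    );
  }
}

function bytesToHex(bytes: Uint8Array): string {
  assertWithinLimit(bytes.length);
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    out += (b < 16 ? "0" : "") + b.toString(16);
  }
  return out;
}

function hexToBytes(hex: string): Uint8Array {
  // Reject odd lengths and non-hex characters instead of guessing.
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new SyntaxError("hex input must be an even number of hex digits");
  }
  assertWithinLimit(hex.length / 2);
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Encodes to Base64URL without "=" padding.
function bytesToBase64Url(bytes: Uint8Array): string {
  assertWithinLimit(bytes.length);
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += BASE64URL_ALPHABET.charAt(b0 >> 2);
    out += BASE64URL_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    if (i + 1 < bytes.length) {
      out += BASE64URL_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6));
    }
    if (i + 2 < bytes.length) {
      out += BASE64URL_ALPHABET.charAt(b2 & 0x3f);
    }
  }
  return out;
}
```

Both directions check the limit before doing any work, so an oversized input fails with an explicit error rather than a silent truncation.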

Trust Boundary

The converter should not fetch remote files, upload local input, write to persistent storage, or imply that a successful conversion proves the input is safe. It converts representation only. It cannot certify that the file is benign, complete, or semantically equivalent for every downstream tool. Output should be treated as a transformed artifact that still requires user review.

Expansion Criteria

A new format must ship with tests covering representative inputs, malformed inputs, large inputs, and expected failure behavior. The implementation should fail closed when a conversion would be lossy or ambiguous. If a dependency is required, its architecture decision record (ADR) should describe maintenance state, package size, parser risk, browser compatibility, and why native APIs are insufficient.
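
One way to picture "fail closed" is a JSON-to-CSV transform that returns an explicit failure rather than flattening or dropping data. The sketch below is illustrative only: `ConversionResult`, `jsonRecordsToCsv`, and the specific rejection rules (nested values, `null`, mismatched keys) are assumptions made for this example, not the converter's documented behavior. It also assumes that writing numbers and booleans as text is acceptable for CSV output.

```typescript
// Hypothetical result type: a conversion either succeeds or explains why it refused.
type ConversionResult =
  | { ok: true; output: string }
  | { ok: false; reason: string };

function isPlainRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Quote a field only when it contains a quote, comma, or line break.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

function jsonRecordsToCsv(records: unknown): ConversionResult {
  if (!Array.isArray(records) || records.length === 0) {
    return { ok: false, reason: "expected a non-empty array of objects" };
  }
  const first: unknown = records[0];
  if (!isPlainRecord(first)) {
    return { ok: false, reason: "row 0 is not an object" };
  }
  // The first record defines the columns; every later record must match them.
  const header = Object.keys(first);
  const lines: string[] = [header.map(escapeCsvField).join(",")];

  for (let i = 0; i < records.length; i++) {
    const record: unknown = records[i];
    if (!isPlainRecord(record)) {
      return { ok: false, reason: "row " + i + " is not an object" };
    }
    if (Object.keys(record).length !== header.length) {
      return { ok: false, reason: "row " + i + " has a different set of keys" };
    }
    const cells: string[] = [];
    for (let j = 0; j < header.length; j++) {
      const key = header[j];
      if (!Object.prototype.hasOwnProperty.call(record, key)) {
        return { ok: false, reason: "row " + i + " is missing key " + key };
      }
      const value = record[key];
      const kind = typeof value;
      // Nested objects, arrays, and null have no lossless single-cell form here.
      if (kind !== "string" && kind !== "number" && kind !== "boolean") {
        return { ok: false, reason: "row " + i + ", key " + key + ": unsupported value" };
      }
      cells.push(escapeCsvField(String(value)));
    }
    lines.push(cells.join(","));
  }
  return { ok: true, output: lines.join("\r\n") };
}
```

A caller checks `ok` before using `output`, so a refused conversion cannot be mistaken for an empty or partial result. Tests for a new format could pair each rejection reason with a malformed fixture that triggers it.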