AI & Provenance

Nearly all of the code in RustQC was written by AI coding agents, primarily Claude (Anthropic) and Seqera AI. Claude drove most architectural decisions, wrote tests, drafted documentation, and iterated on output-matching bugs. Seqera AI wrote the benchmarking pipeline and nf-test configs, ran benchmarks, analysed logs, traces, and metrics, and contributed code. Humans guided the AI, defined the validation strategy, and reviewed the results, outputs, and documentation.

RustQC is not a clean-room reimplementation. The source code of every upstream tool was used as a reference during development: humans read it, and it was provided as context to AI agents. This was necessary for matching output formats, numerical edge cases, and undocumented behaviour.

How correctness was validated

RustQC validates correctness by comparing its output against the original tools on real sequencing data. Integration tests run against reference files generated by the original R and Python tools, and CI runs the full test suite on every pull request. Outputs were also validated end-to-end using the nf-core/rnaseq pipeline test profiles.
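The comparison described above can be illustrated with a minimal sketch. It is not RustQC's actual test harness: the function name `compare_outputs`, the tab-delimited layout, and the relative tolerance are all assumptions made for illustration. The idea is that text fields must match the reference exactly, while numeric fields may differ by a small relative error to absorb floating-point formatting differences.

```rust
/// Compare two tab-delimited outputs field by field (hypothetical helper).
/// Fields that parse as numbers on both sides must agree within `rel_tol`;
/// all other fields must match exactly.
/// Returns a description of the first mismatch, if any.
fn compare_outputs(actual: &str, expected: &str, rel_tol: f64) -> Result<(), String> {
    let actual_lines: Vec<&str> = actual.lines().collect();
    let expected_lines: Vec<&str> = expected.lines().collect();
    if actual_lines.len() != expected_lines.len() {
        return Err(format!(
            "line count differs: {} vs {}",
            actual_lines.len(),
            expected_lines.len()
        ));
    }
    for (i, (a_line, e_line)) in actual_lines.iter().zip(expected_lines.iter()).enumerate() {
        let a_fields: Vec<&str> = a_line.split('\t').collect();
        let e_fields: Vec<&str> = e_line.split('\t').collect();
        if a_fields.len() != e_fields.len() {
            return Err(format!("line {}: field count differs", i + 1));
        }
        for (j, (a, e)) in a_fields.iter().zip(e_fields.iter()).enumerate() {
            let matches = match (a.parse::<f64>(), e.parse::<f64>()) {
                // Both numeric: equal, both NaN, or within relative tolerance.
                (Ok(x), Ok(y)) => {
                    x == y
                        || (x.is_nan() && y.is_nan())
                        || (x - y).abs() <= rel_tol * x.abs().max(y.abs())
                }
                // Otherwise fall back to exact string comparison.
                _ => a == e,
            };
            if !matches {
                return Err(format!(
                    "line {}, field {}: {:?} != {:?}",
                    i + 1,
                    j + 1,
                    a,
                    e
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    // Illustrative data only; real reference files come from the upstream tools.
    let expected = "gene\tcount\tfraction\nGENE1\t10\t0.3333333\n";
    let actual = "gene\tcount\tfraction\nGENE1\t10\t0.33333331\n";
    match compare_outputs(actual, expected, 1e-6) {
        Ok(()) => println!("outputs match within tolerance"),
        Err(msg) => eprintln!("mismatch: {}", msg),
    }
}
```

Whether a given output should be compared exactly or with a tolerance depends on the tool and file format; the tolerance used here is a placeholder, not a documented RustQC threshold.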

Per-tool comparison tables and methodology are in the benchmark documentation.

Known validation gaps

Benchmarks focus on human RNA-seq data (GRCh38). Other organisms, genome builds, and non-Illumina platforms have not been formally validated.
RustQC implements the subset of each tool used by the nf-core/rnaseq pipeline. Unsupported features (e.g., samtools mpileup, the full featureCounts option set) should produce an error.
Validation is pinned to specific upstream versions (see benchmark details). If upstream tools change their output in a new release, RustQC may diverge until re-validated.
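As a rough illustration of the second point, rejecting an unsupported feature with an explicit error (rather than silently ignoring it) might look like the sketch below. The enum, the error type, the function `parse_subcommand`, and the set of accepted names are all hypothetical and do not reflect RustQC's actual command-line interface or its list of supported features.

```rust
use std::fmt;

/// Hypothetical set of supported subcommands; not RustQC's actual list.
#[derive(Debug, PartialEq)]
enum Supported {
    Flagstat,
    Idxstats,
    Stats,
}

/// Error returned for anything outside the supported subset.
#[derive(Debug, PartialEq)]
struct UnsupportedFeature {
    name: String,
}

impl fmt::Display for UnsupportedFeature {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unsupported feature '{}': not part of the implemented subset",
            self.name
        )
    }
}

impl std::error::Error for UnsupportedFeature {}

/// Map a requested name to a supported subcommand, or fail loudly.
fn parse_subcommand(name: &str) -> Result<Supported, UnsupportedFeature> {
    match name {
        "flagstat" => Ok(Supported::Flagstat),
        "idxstats" => Ok(Supported::Idxstats),
        "stats" => Ok(Supported::Stats),
        other => Err(UnsupportedFeature {
            name: other.to_string(),
        }),
    }
}

fn main() {
    for name in ["flagstat", "mpileup"] {
        match parse_subcommand(name) {
            Ok(cmd) => println!("{}: ok ({:?})", name, cmd),
            Err(err) => eprintln!("{}", err),
        }
    }
}
```

Failing with a named error keeps a missing feature from being mistaken for a validated result.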

Reporting discrepancies

If you find a case where RustQC output differs from the upstream tool, please open an issue with a minimal reproducible example. The validation suite grows as new edge cases surface.

rewrites.bio

RustQC follows the rewrites.bio principles for AI-assisted bioinformatics tool rewrites.