Introduction
RustQC is a fast quality control toolkit for sequencing data, written in Rust. It reimplements several established bioinformatics QC tools as a single compiled binary with no runtime dependencies:
- dupRadar: RNA-seq PCR duplicate rate analysis
- featureCounts: gene-level read counting and biotype quantification
- RSeQC: 8 RNA-seq quality control modules (bam_stat, infer_experiment, read_duplication, read_distribution, junction_annotation, junction_saturation, inner_distance, TIN)
- preseq: library complexity extrapolation (lc_extrap)
- samtools: flagstat, idxstats, and full stats output including all histogram sections
- Qualimap: gene body coverage profiling and RNA-seq QC summary
Why RustQC?
RNA-seq quality control typically involves running multiple tools written in R and Python, each with their own dependencies, interpreters, and runtime overhead. RustQC consolidates these into a single fast binary.
Run time for a large paired-end RNA-seq BAM (~186M reads) on AWS. RSeQC TIN alone takes 9h 45m; RustQC completes everything in 14m 54s.
Key advantages:
- Processes ~186M reads in 14m 54s vs. ~15h 34m of sequential tool runtimes on AWS, including TIN which takes 9h 45m alone in the traditional workflow
- No runtime dependencies: no R, Python, or Bioconductor installation required
- The `rna` subcommand performs read counting and duplicate analysis simultaneously, eliminating the need for a separate featureCounts run
- Produces identical or near-identical results to the original tools (see benchmark details)
- Process several BAM files in a single command with automatic parallelisation
- Accepts SAM, BAM, and CRAM input files; GTF annotation files can be plain or gzip-compressed
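As a quick sanity check on the runtime figures quoted above, the implied speedup can be worked out directly. This is only arithmetic on the two published durations (about 15h 34m versus 14m 54s); the helper name `to_seconds` is made up for this illustration.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Durations quoted in the text for the ~186M-read BAM on AWS.
traditional = to_seconds(hours=15, minutes=34)  # sequential tool runtimes
rustqc = to_seconds(minutes=14, seconds=54)     # single RustQC run

speedup = traditional / rustqc
print(f"{speedup:.1f}x")
```

On these numbers the ratio comes out a little under 63x. It compares against the tools run one after another, so it says nothing about how a parallelised traditional workflow would fare.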
Available tools
rustqc rna — RNA-Seq quality control pipeline
Given a duplicate-marked BAM file and a GTF annotation, rustqc rna runs all of the following in a single pass:
| Tool | Equivalent | Description |
|---|---|---|
| dupRadar | dupRadar | RNA-seq PCR duplicate rate analysis |
| featureCounts | featureCounts | Gene-level read counting and biotypes |
| Qualimap rnaseq | Qualimap rnaseq | Gene body coverage and RNA-seq QC |
| preseq | preseq lc_extrap | Library complexity extrapolation |
| flagstat | samtools flagstat | Alignment flag statistics |
| idxstats | samtools idxstats | Per-chromosome read counts |
| stats | samtools stats | Full stats with all histogram sections |
A GTF file is required (--gtf) and all tools run automatically. Transcript-level structure is extracted from the GTF. GTF files can be plain or gzip-compressed (.gz); compression is detected automatically. Individual tools can be disabled via the YAML configuration file.
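The text says compression is detected automatically but does not say how. One common approach, shown here purely as an assumption and not as RustQC's actual implementation, is to check the first two bytes of the file for the gzip magic number (`0x1f 0x8b`) rather than trusting the file extension. The function name `open_gtf` is hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_gtf(path):
    """Open a GTF file as text, whether it is plain or gzip-compressed.

    Illustrative sketch: detection is by file content, not by the
    '.gz' extension.
    """
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "rt")
```

Checking the content means a compressed file with a missing or wrong extension is still read correctly.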
RSeQC tools — RNA-seq quality control
Eight RSeQC tools (including TIN) are reimplemented and integrated into the rustqc rna command, where they run automatically in the same single-pass analysis. See RSeQC for full details.
| Tool | Description |
|---|---|
| bam_stat | Alignment statistics (total reads, duplicates, mapping quality) |
| infer_experiment | Strandedness inference from splice reads |
| read_duplication | Sequence- and mapping-based duplication rates |
| read_distribution | Read distribution across genomic features (CDS, UTR, intron, intergenic) |
| junction_annotation | Classify splice junctions as known, partial novel, or complete novel |
| junction_saturation | Saturation analysis of detected splice junctions |
| inner_distance | Inner distance distribution between paired-end mates |
| tin | Transcript Integrity Number per gene |
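To make the junction_annotation categories in the table concrete, here is a minimal sketch of a site-based classification. It assumes a junction counts as known when both of its splice sites appear in the annotation, partial novel when exactly one does, and complete novel when neither does. The function names and the `(chrom, start, end)` intron tuples are hypothetical, and the source does not describe how RustQC implements this internally.

```python
def build_site_sets(annotated_introns):
    """Collect annotated splice sites from (chrom, start, end) introns."""
    starts = {(chrom, start) for chrom, start, _ in annotated_introns}
    ends = {(chrom, end) for chrom, _, end in annotated_introns}
    return starts, ends


def classify_junction(chrom, start, end, known_starts, known_ends):
    """Label one observed junction by how many of its sites are annotated."""
    start_known = (chrom, start) in known_starts
    end_known = (chrom, end) in known_ends
    if start_known and end_known:
        return "known"
    if start_known or end_known:
        return "partial_novel"
    return "complete_novel"


# Toy annotation with two introns on chr1.
annotated = [("chr1", 100, 200), ("chr1", 300, 400)]
known_starts, known_ends = build_site_sets(annotated)

print(classify_junction("chr1", 100, 200, known_starts, known_ends))
print(classify_junction("chr1", 100, 250, known_starts, known_ends))
print(classify_junction("chr1", 150, 250, known_starts, known_ends))
```

In this sketch a junction that joins two annotated sites is labelled known even if that exact pairing never occurs in the annotation, which is a consequence of classifying by site rather than by intron.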
Credits
RustQC builds on several established bioinformatics tools. If you use RustQC, please cite the original tools. See the Credits page for the full list and citations.