Introduction

RustQC is a fast quality control toolkit for sequencing data, written in Rust. It reimplements several established bioinformatics QC tools as a single compiled binary with no runtime dependencies:

  • dupRadar: RNA-seq PCR duplicate rate analysis
  • featureCounts: gene-level read counting and biotype quantification
  • RSeQC: 8 RNA-seq quality control modules (bam_stat, infer_experiment, read_duplication, read_distribution, junction_annotation, junction_saturation, inner_distance, TIN)
  • preseq: library complexity extrapolation (lc_extrap)
  • samtools: flagstat, idxstats, and full stats output including all histogram sections
  • Qualimap: gene body coverage profiling and RNA-seq QC summary

RNA-seq quality control typically involves running multiple tools written in R and Python, each with its own dependencies, interpreter, and runtime overhead. RustQC consolidates these into a single fast binary.

Key advantages:

  • Processes ~186M reads in 14m 54s, versus ~15h 34m for the equivalent tools run sequentially on AWS; TIN alone accounts for 9h 45m of the traditional workflow
  • No runtime dependencies: no R, Python, or Bioconductor installation required
  • The rna subcommand performs read counting and duplicate analysis simultaneously, eliminating the need for a separate featureCounts run
  • Produces identical or near-identical results to the original tools (see benchmark details)
  • Processes several BAM files in a single command with automatic parallelisation
  • Accepts SAM, BAM, and CRAM input files; GTF annotation files can be plain or gzip-compressed
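
To put the runtime figures above in perspective, the short calculation below works out the implied speedup and the share of the traditional runtime spent on TIN. It uses only the approximate durations quoted in the list; the helper name `to_seconds` is illustrative and not part of RustQC.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Durations as quoted above (both are approximate figures).
rustqc_s = to_seconds(minutes=14, seconds=54)       # RustQC, all tools
traditional_s = to_seconds(hours=15, minutes=34)    # sequential traditional tools
tin_s = to_seconds(hours=9, minutes=45)             # TIN in the traditional workflow

speedup = traditional_s / rustqc_s
tin_share = tin_s / traditional_s

print(f"Speedup: ~{speedup:.0f}x")
print(f"TIN share of traditional runtime: ~{tin_share:.0%}")
```

On these numbers the reduction is roughly 63-fold, and TIN makes up a little under two thirds of the traditional runtime.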

rustqc rna — RNA-Seq quality control pipeline

Given a duplicate-marked BAM file and a GTF annotation, rustqc rna runs all of the following in a single pass:

| Tool            | Equivalent        | Description                             |
|-----------------|-------------------|-----------------------------------------|
| dupRadar        | dupRadar          | RNA-seq PCR duplicate rate analysis     |
| featureCounts   | featureCounts     | Gene-level read counting and biotypes   |
| Qualimap rnaseq | Qualimap rnaseq   | Gene body coverage and RNA-seq QC       |
| preseq          | preseq lc_extrap  | Library complexity extrapolation        |
| flagstat        | samtools flagstat | Alignment flag statistics               |
| idxstats        | samtools idxstats | Per-chromosome read counts              |
| stats           | samtools stats    | Full stats with all histogram sections  |

A GTF file is required (--gtf) and all tools run automatically. Transcript-level structure is extracted from the GTF. GTF files can be plain or gzip-compressed (.gz); compression is detected automatically. Individual tools can be disabled via the YAML configuration file.
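
For scripted runs, the invocation described above can be assembled programmatically. The sketch below is a minimal example under stated assumptions: the `rna` subcommand and the `--gtf` flag come from this page, but passing the BAM file as a positional argument is an assumption, and the file names and the `build_rustqc_rna_command` helper are hypothetical. Check `rustqc rna --help` for the exact syntax before relying on it.

```python
import os
import shlex
import shutil
import subprocess


def build_rustqc_rna_command(bam_path, gtf_path):
    """Build the argument list for a single-sample `rustqc rna` run."""
    # ASSUMPTION: the alignment file is a positional argument after `rna`.
    # `--gtf` is the required annotation flag named in the documentation.
    return ["rustqc", "rna", str(bam_path), "--gtf", str(gtf_path)]


# Hypothetical input files: a duplicate-marked BAM and a gzip-compressed GTF.
cmd = build_rustqc_rna_command("sample.markdup.bam", "genes.gtf.gz")
print(shlex.join(cmd))

# Only execute when the binary is on PATH and both inputs actually exist.
inputs_exist = all(os.path.exists(p) for p in (cmd[2], cmd[4]))
if shutil.which("rustqc") and inputs_exist:
    subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.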

RustQC also includes eight reimplementations of RSeQC tools (including TIN). All are integrated into the rustqc rna command and run automatically in the same single-pass analysis. See RSeQC for full details.

| Tool                | Description                                                                |
|---------------------|----------------------------------------------------------------------------|
| bam_stat            | Alignment statistics (total reads, duplicates, mapping quality)            |
| infer_experiment    | Strandedness inference from splice reads                                   |
| read_duplication    | Sequence- and mapping-based duplication rates                              |
| read_distribution   | Read distribution across genomic features (CDS, UTR, intron, intergenic)   |
| junction_annotation | Classify splice junctions as known, partial novel, or complete novel       |
| junction_saturation | Saturation analysis of detected splice junctions                           |
| inner_distance      | Inner distance distribution between paired-end mates                       |
| tin                 | Transcript Integrity Number per gene                                       |
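
To illustrate the three junction categories reported by junction_annotation, the sketch below applies the rule RSeQC documents for that module: a junction is known when both of its splice sites are annotated, partial novel when only one is, and complete novel when neither is. This is a simplified illustration, not RustQC's implementation; the coordinates are invented, and the function and variable names are hypothetical.

```python
def classify_junction(chrom, start, end, known_starts, known_ends):
    """Classify a splice junction by how many of its two sites are annotated."""
    start_known = (chrom, start) in known_starts
    end_known = (chrom, end) in known_ends
    if start_known and end_known:
        return "known"
    if start_known or end_known:
        return "partial_novel"
    return "complete_novel"


# Toy annotation: two introns on chr1 (made-up coordinates).
annotated_introns = [("chr1", 100, 200), ("chr1", 300, 400)]
known_starts = {(c, s) for c, s, _ in annotated_introns}
known_ends = {(c, e) for c, _, e in annotated_introns}

# Toy observed junctions, one per category.
observed = [("chr1", 100, 200), ("chr1", 100, 250), ("chr1", 500, 600)]
for chrom, start, end in observed:
    label = classify_junction(chrom, start, end, known_starts, known_ends)
    print(chrom, start, end, label)
```

Splice sites are looked up independently here, so a junction that pairs two annotated sites in a combination absent from the annotation would still be labelled known under this simplified rule.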

RustQC builds on several established bioinformatics tools. If you use RustQC, please cite the original tools. See the Credits page for the full list and citations.