featureCounts

RustQC performs gene-level read counting in the same pass as the dupRadar duplication rate analysis and all RSeQC-equivalent metrics. During BAM processing, RustQC simultaneously:

  1. Counts reads assigned to each gene (using the same algorithm as featureCounts)
  2. Tracks duplication rates for the dupRadar analysis
  3. Computes RSeQC-equivalent quality metrics (strandedness, read distribution, junctions, etc.)
  4. Aggregates counts by gene biotype

RustQC follows the same algorithm as Subread featureCounts with these defaults:

  • Feature type: exon
  • Attribute: gene_id
  • Overlap detection: at least 1 base overlap
  • Multi-mapping reads: counted (for multi columns), excluded (for unique columns)
  • Strand-aware counting based on the -s / --stranded flag
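The defaults above can be illustrated with a minimal sketch of overlap-based assignment. This is not RustQC's implementation: the function name `assign_read`, the tuple layouts, and the `stranded` values 0/1/2 (borrowed from the featureCounts `-s` convention) are assumptions made for illustration only.

```python
def assign_read(read_blocks, read_strand, exons, stranded=0):
    """Classify one read against exons on the same chromosome.

    read_blocks: list of (start, end) aligned blocks, inclusive coordinates
    exons:       list of (start, end, strand, gene_id)
    stranded:    0 = unstranded, 1 = same strand, 2 = opposite strand
                 (assumed values, mirroring featureCounts -s)
    """
    hit_genes = set()
    for ex_start, ex_end, ex_strand, gene_id in exons:
        # Strand filter: skip exons on the wrong strand for this library type.
        if stranded == 1 and read_strand != ex_strand:
            continue
        if stranded == 2 and read_strand == ex_strand:
            continue
        # A single shared base is enough to count as an overlap.
        for b_start, b_end in read_blocks:
            if b_start <= ex_end and ex_start <= b_end:
                hit_genes.add(gene_id)
                break
    if not hit_genes:
        return "Unassigned_NoFeatures", None
    if len(hit_genes) > 1:
        return "Unassigned_Ambiguity", None
    return "Assigned", next(iter(hit_genes))


# Hypothetical exons for two overlapping genes on opposite strands.
exons = [(100, 200, "+", "geneA"), (150, 300, "-", "geneB")]
print(assign_read([(100, 120)], "+", exons))
print(assign_read([(190, 210)], "+", exons))
print(assign_read([(190, 210)], "+", exons, stranded=1))
```

Exons are grouped by gene through the `gene_id` stored with each exon, so a read that touches several exons of the same gene still counts once for that gene.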

All featureCounts output files use the BAM file stem as a prefix and are written to a featurecounts/ subdirectory under the output directory. Use --flat-output to write all files directly to the output directory instead. Each output can be individually enabled or disabled via the configuration file.

  • featurecounts/
    • sample.featureCounts.tsv: Per-gene read counts
    • sample.featureCounts.tsv.summary: Gene-level assignment summary
    • sample.featureCounts.biotype.tsv.summary: Biotype-level assignment summary
    • sample.biotype_counts.tsv: Per-biotype read counts
    • sample.biotype_counts_mqc.tsv: MultiQC biotype bargraph
    • sample.biotype_counts_rrna_mqc.tsv: MultiQC rRNA percentage
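For scripting around these outputs, the naming rule (BAM file stem plus suffix, inside featurecounts/ unless --flat-output is set) can be sketched as follows. The helper `expected_outputs` is hypothetical and only mirrors the listing above; it is not part of RustQC.

```python
from pathlib import Path

# Suffixes copied from the file listing above.
SUFFIXES = [
    ".featureCounts.tsv",
    ".featureCounts.tsv.summary",
    ".featureCounts.biotype.tsv.summary",
    ".biotype_counts.tsv",
    ".biotype_counts_mqc.tsv",
    ".biotype_counts_rrna_mqc.tsv",
]


def expected_outputs(bam_path, outdir, flat_output=False):
    """Return the paths the featureCounts outputs are expected to have."""
    stem = Path(bam_path).stem  # "sample.bam" -> "sample"
    # With flat output, files go directly into the output directory.
    base = Path(outdir) if flat_output else Path(outdir) / "featurecounts"
    return [base / (stem + suffix) for suffix in SUFFIXES]


for path in expected_outputs("data/sample.bam", "results"):
    print(path.as_posix())
```

Since each output can be disabled in the configuration file, a real pipeline should treat this list as an upper bound rather than a guarantee.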

File: <sample>.featureCounts.tsv

A tab-separated counts file compatible with the Subread featureCounts output format. Contains per-gene read counts with annotation columns:

| Column | Description |
| --- | --- |
| Geneid | Gene identifier |
| Chr | Chromosome(s) |
| Start | Start position(s) |
| End | End position(s) |
| Strand | Strand(s) |
| Length | Gene length |
| <sample> | Read count for this sample |

The file includes a header comment line with the command used to generate it:

# Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p
Geneid Chr Start End Strand Length sample.bam
ENSG00000000003 chrX 100627108;100629986;... 100636806;100637104;... -;-;... 3768 521
ENSG00000000005 chrX 100584936;100585053;... 100585091;100599885;... +;+;... 1339 0

This file is directly compatible with downstream tools that accept featureCounts output, such as DESeq2 and MultiQC.
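As an illustration of consuming this file from a script, the sketch below reads the table with the Python standard library only. It assumes the layout shown above: a leading `#` comment line, a header row, the gene identifier in the first column, and the count in the last column. The function name `read_feature_counts` is hypothetical.

```python
import csv
import io


def read_feature_counts(handle):
    """Parse a featureCounts-style table into (sample_name, {gene_id: count})."""
    # Drop the leading "# Program: ..." comment line(s).
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    counts = {}
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        counts[fields[0]] = int(fields[-1])
    return header[-1], counts


# The example rows from above, with the elided coordinates kept as-is.
example = (
    "# Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "ENSG00000000003\tchrX\t100627108;100629986;...\t100636806;100637104;...\t-;-;...\t3768\t521\n"
    "ENSG00000000005\tchrX\t100584936;100585053;...\t100585091;100599885;...\t+;+;...\t1339\t0\n"
)

sample, counts = read_feature_counts(io.StringIO(example))
print(sample, counts)
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.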

File: <sample>.featureCounts.tsv.summary

Gene-level assignment summary statistics in featureCounts format, matching featureCounts -g gene_id behaviour. A read overlapping multiple genes is counted as Ambiguous, regardless of whether those genes share the same biotype.

Status sample.bam
Assigned 22812
Unassigned_Unmapped 0
Unassigned_NoFeatures 1227
Unassigned_Ambiguity 2395
  • Assigned: reads successfully assigned to exactly one gene
  • Unassigned_Unmapped: unmapped reads
  • Unassigned_NoFeatures: reads not overlapping any gene
  • Unassigned_Ambiguity: reads overlapping multiple genes
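A common use of this summary is to compute the fraction of reads that were assigned. The sketch below assumes the two-column layout shown above, with a header row followed by one status per line; `parse_summary` and `assigned_fraction` are hypothetical helper names.

```python
def parse_summary(text):
    """Parse a featureCounts-style summary into {status: count}."""
    stats = {}
    for line in text.strip().splitlines()[1:]:  # skip the "Status" header row
        status, value = line.split()
        stats[status] = int(value)
    return stats


def assigned_fraction(stats):
    """Fraction of all summarised reads that ended up as Assigned."""
    total = sum(stats.values())
    return stats.get("Assigned", 0) / total if total else 0.0


summary_text = """Status\tsample.bam
Assigned\t22812
Unassigned_Unmapped\t0
Unassigned_NoFeatures\t1227
Unassigned_Ambiguity\t2395
"""

stats = parse_summary(summary_text)
print(f"{assigned_fraction(stats):.2%}")
```

With the example values above, 22,812 of 26,434 reads are assigned, which is roughly 86%.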

File: <sample>.featureCounts.biotype.tsv.summary

Biotype-level assignment summary, matching featureCounts -g gene_biotype behaviour. Reads overlapping multiple genes of the same biotype (e.g. two protein_coding genes) are counted as Assigned, not Ambiguous, because they map to a single biotype meta-feature. Only produced when the GTF contains the biotype attribute.

This file has the same format as the gene-level summary but will typically show more Assigned reads and fewer Ambiguous reads, since same-biotype multi-gene overlaps are resolved.
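The difference between the two summaries comes down to which identifier is used as the meta-feature. A minimal sketch, assuming a read has already been matched to a set of overlapping genes (the function names and the gene-to-biotype mapping are hypothetical):

```python
def status_for(meta_features):
    """Assignment status for a set of overlapped meta-features."""
    if not meta_features:
        return "Unassigned_NoFeatures"
    if len(meta_features) > 1:
        return "Unassigned_Ambiguity"
    return "Assigned"


def gene_and_biotype_status(overlapping_genes, gene_to_biotype):
    """Return (gene-level status, biotype-level status) for one read."""
    genes = set(overlapping_genes)
    # Collapsing genes to biotypes can turn several genes into one meta-feature.
    biotypes = {gene_to_biotype[g] for g in genes}
    return status_for(genes), status_for(biotypes)


# Hypothetical annotation: two protein_coding genes and one lncRNA gene.
biotype_of = {"geneA": "protein_coding", "geneB": "protein_coding", "geneC": "lncRNA"}

print(gene_and_biotype_status(["geneA", "geneB"], biotype_of))
print(gene_and_biotype_status(["geneA", "geneC"], biotype_of))
```

The first read is ambiguous at the gene level but assigned at the biotype level; the second is ambiguous in both summaries because the genes have different biotypes.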

File: <sample>.biotype_counts.tsv

A tab-separated file with per-biotype read counts, sorted alphabetically by biotype name:

lncRNA 678
protein_coding 12345
rRNA 90

The biotype is extracted from the GTF attribute specified by biotype_attribute (default: gene_biotype for Ensembl, gene_type for GENCODE).

  • <sample>.biotype_counts_mqc.tsv — Biotype counts formatted as a MultiQC bargraph data file, suitable for visualizing the distribution of reads across biotypes.
  • <sample>.biotype_counts_rrna_mqc.tsv — rRNA percentage formatted as a MultiQC general statistics value, reporting the fraction of assigned reads mapping to rRNA genes.
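To make the rRNA value concrete, the sketch below derives a percentage from per-biotype counts. It assumes the denominator is the sum of all biotype counts and that rRNA reads carry the biotype label `rRNA`; the exact definition RustQC uses may differ, and `rrna_percent` is a hypothetical helper.

```python
def rrna_percent(biotype_counts, rrna_label="rRNA"):
    """Percentage of biotype-assigned reads that fall in the rRNA biotype."""
    total = sum(biotype_counts.values())
    if total == 0:
        return 0.0  # avoid dividing by zero on empty input
    return 100.0 * biotype_counts.get(rrna_label, 0) / total


# Counts taken from the biotype_counts.tsv example above.
biotype_counts = {"protein_coding": 12345, "lncRNA": 678, "rRNA": 90}
print(f"{rrna_percent(biotype_counts):.2f}%")
```

For the example counts this is 90 out of 13,113 reads, or about 0.69%.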

RustQC auto-detects whether your GTF uses gene_biotype (Ensembl) or gene_type (GENCODE). You can override this with --biotype-attribute. See the CLI reference for details.
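One plausible way to perform this kind of detection is to scan GTF records until one of the known attribute keys appears. The sketch below is an assumption about how such a check could work, not a description of RustQC's internal logic; `detect_biotype_attribute` is a hypothetical name.

```python
def detect_biotype_attribute(gtf_lines, candidates=("gene_biotype", "gene_type")):
    """Return the first candidate key found in a GTF attribute column, or None."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        # The attribute column looks like: key "value"; key "value";
        keys = {
            part.strip().split(" ", 1)[0]
            for part in fields[8].split(";")
            if part.strip()
        }
        for name in candidates:
            if name in keys:
                return name
    return None


# Hypothetical GENCODE-style record.
gencode_line = (
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene";'
)
print(detect_biotype_attribute(["#!genome-build GRCh38", gencode_line]))
```

Returning `None` when neither key is present matches the note above that the biotype summary is only produced when the GTF contains the biotype attribute.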

RustQC includes built-in gene-level read counting that produces output compatible with the Subread featureCounts format. Benchmarks were run on AWS cloud infrastructure on 2026-03-09.

Large dataset (GM12878 REP1, ~186M reads)

| Tool | Wall time | CPU time | Peak RSS |
| --- | --- | --- | --- |
| Rsubread featureCounts | 3m 3s | 3m 3s | 210 MB |
| RustQC (all tools, single pass) | 14m 54s | 14m 54s | 11.4 GB |

Small dataset (~52K reads, chr6)

| Tool | Wall time | CPU time | Peak RSS |
| --- | --- | --- | --- |
| Rsubread featureCounts | 3m 9s | 1s | 3.1 MB |
| RustQC (all tools, single pass) | 3m 31s | 25.9s | 182.1 MB |

Note: RustQC runtime shown is for all tools combined in a single pass. See Benchmark Details for a full breakdown.

featureCounts needs only 1s of CPU time on the small dataset because the input is tiny; most of its 3m 9s wall time is container startup.

RustQC’s read counting uses the same algorithm as Subread featureCounts:

  • Feature type: exon-level features grouped by gene_id
  • Overlap detection: at least 1 base pair overlap
  • Strand awareness: configurable via the -s / --stranded flag
  • Multi-mapping: tracked separately for unique and multi-mapper columns

Assignment statistics - small dataset (test, ~52K reads)

| Category | Rsubread featureCounts | RustQC | Match |
| --- | --- | --- | --- |
| Assigned | 42,173 | 42,173 | Yes |
| Unassigned_MultiMapping | 5,496 | 5,496 | Yes |
| Unassigned_NoFeatures | 2,645 | 2,645 | Yes |
| Unassigned_Ambiguity | 2,525 | 2,525 | Yes |
| All others | 0 | 0 | Yes |

Assignment statistics - large dataset (GM12878 REP1, ~186M reads)

| Category | Rsubread featureCounts | RustQC | Match |
| --- | --- | --- | --- |
| Assigned | 127,819,327 | 127,819,327 | Yes |
| Unassigned_Unmapped | 12,490,977 | 12,490,977 | Yes |
| Unassigned_MultiMapping | 26,451,119 | 26,451,119 | Yes |
| Unassigned_NoFeatures | 29,885,481 | 29,885,481 | Yes |
| Unassigned_Ambiguity | 4,958,548 | 4,958,548 | Yes |
| All others | 0 | 0 | Yes |

All assignment statistics are identical across both datasets. Gene-level read counts match exactly for every gene. The output format is directly compatible with downstream tools such as DESeq2 and MultiQC.

Beyond the standard featureCounts counts file and summary, RustQC also produces:

  • Biotype summary (.featureCounts.biotype.tsv.summary): biotype-level assignment summary matching featureCounts -g gene_biotype
  • Biotype counts (.biotype_counts.tsv): per-biotype read count summaries
  • Biotype MultiQC bargraph (.biotype_counts_mqc.tsv): ready for MultiQC visualization
  • rRNA percentage (.biotype_counts_rrna_mqc.tsv): rRNA fraction for MultiQC general statistics

Generating these in the traditional workflow requires additional scripting after the featureCounts run.

Each output file can be individually enabled or disabled. See the Configuration page for details.