featureCounts

How it works

RustQC performs gene-level read counting in the same pass as the dupRadar duplication rate analysis and all RSeQC-equivalent metrics. During BAM processing, RustQC simultaneously:

- Counts reads assigned to each gene (using the same algorithm as featureCounts)
- Tracks duplication rates for the dupRadar analysis
- Computes RSeQC-equivalent quality metrics (strandedness, read distribution, junctions, etc.)
- Aggregates counts by gene biotype

RustQC follows the same algorithm as Subread featureCounts with these defaults:

- Feature type: exon
- Attribute: gene_id
- Overlap detection: at least 1 base overlap
- Multi-mapping reads: counted (for multi columns), excluded (for unique columns)
- Strandedness: strand-aware counting based on the -s / --stranded flag
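
The overlap rule above can be illustrated with a short Python sketch. This is a simplified model, not RustQC's implementation: the `Exon` tuple and `assign_read` function are hypothetical names, a read is reduced to a single interval, and strand and multi-mapping handling are left out.

```python
from typing import NamedTuple, Sequence

class Exon(NamedTuple):
    gene_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def assign_read(read_start: int, read_end: int, exons: Sequence[Exon]) -> str:
    """Classify one read by the set of genes whose exons it overlaps.

    Simplified sketch (assumption): strand and multi-mapping are ignored.
    """
    genes = {
        e.gene_id
        for e in exons
        if read_start <= e.end and e.start <= read_end  # at least 1 shared base
    }
    if not genes:
        return "Unassigned_NoFeatures"
    if len(genes) > 1:
        return "Unassigned_Ambiguity"
    return "Assigned:" + next(iter(genes))

exons = [Exon("geneA", 100, 200), Exon("geneA", 300, 400), Exon("geneB", 390, 500)]
print(assign_read(200, 250, exons))  # shares only base 200 with geneA
print(assign_read(395, 420, exons))  # overlaps exons of geneA and geneB
print(assign_read(210, 290, exons))  # falls between the two geneA exons
```

A linear scan over all exons is used here only for clarity; a real counter would use an interval index per chromosome.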

Output files

All featureCounts output files use the BAM file stem as a prefix and are written to a featurecounts/ subdirectory under the output directory. Use --flat-output to write all files directly to the output directory instead. Each output can be individually enabled or disabled via the configuration file.

featurecounts/
- sample.featureCounts.tsv Per-gene read counts
- sample.featureCounts.tsv.summary Gene-level assignment summary
- sample.featureCounts.biotype.tsv.summary Biotype-level assignment summary
- sample.biotype_counts.tsv Per-biotype read counts
- sample.biotype_counts_mqc.tsv MultiQC biotype bargraph
- sample.biotype_counts_rrna_mqc.tsv MultiQC rRNA percentage
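
For downstream scripts that need to locate these files, the naming scheme can be sketched as follows. The `expected_outputs` helper is hypothetical, and taking the stem with `Path.stem` (which strips only the final extension) is an assumption about how RustQC derives the prefix.

```python
from pathlib import Path

# File suffixes taken from the listing above.
SUFFIXES = [
    ".featureCounts.tsv",
    ".featureCounts.tsv.summary",
    ".featureCounts.biotype.tsv.summary",
    ".biotype_counts.tsv",
    ".biotype_counts_mqc.tsv",
    ".biotype_counts_rrna_mqc.tsv",
]

def expected_outputs(bam: str, outdir: str, flat: bool = False) -> list:
    """Return the output paths expected for one BAM file (hypothetical helper)."""
    stem = Path(bam).stem  # "sample.bam" -> "sample" (assumed stem rule)
    # flat=True mirrors --flat-output: no featurecounts/ subdirectory.
    base = Path(outdir) if flat else Path(outdir) / "featurecounts"
    return [base / f"{stem}{suffix}" for suffix in SUFFIXES]

for path in expected_outputs("data/sample.bam", "results"):
    print(path)
```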

Counts file

File: <sample>.featureCounts.tsv

A tab-separated counts file compatible with the Subread featureCounts output format. Contains per-gene read counts with annotation columns:

Column	Description
`Geneid`	Gene identifier
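`Chr`	Chromosome of each exon, semicolon-separated
`Start`	Start position of each exon, semicolon-separated
`End`	End position of each exon, semicolon-separated
`Strand`	Strand of each exon, semicolon-separated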
`Length`	Gene length
`<sample>`	Read count for this sample

The file includes a header comment line with the command used to generate it:

# Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p
Geneid  Chr     Start   End     Strand  Length  sample.bam
ENSG00000000003 chrX    100627108;100629986;...  100636806;100637104;... -;-;... 3768    521
ENSG00000000005 chrX    100584936;100585053;...  100585091;100599885;... +;+;... 1339    0

This file is directly compatible with downstream tools that accept featureCounts output, such as DESeq2 and MultiQC.
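
As a minimal sketch of reading this format without extra dependencies, the Python snippet below skips the comment line and takes the last column as the sample count. The `read_counts` function is a hypothetical helper, and the example rows are shortened to a single exon per gene for brevity.

```python
import csv
import io

def read_counts(handle) -> dict:
    """Return {Geneid: count} from a featureCounts-style counts file."""
    # Drop the leading "# Program: ..." comment line(s).
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    gene_col = header.index("Geneid")
    count_col = len(header) - 1  # the sample column comes last
    return {row[gene_col]: int(row[count_col]) for row in reader if row}

example = io.StringIO(
    "# Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "ENSG00000000003\tchrX\t100627108\t100636806\t-\t3768\t521\n"
    "ENSG00000000005\tchrX\t100584936\t100585091\t+\t1339\t0\n"
)
counts = read_counts(example)
print(counts["ENSG00000000003"])  # 521
```

With a real file, pass an open file handle instead of the `io.StringIO` object.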

Summary file

File: <sample>.featureCounts.tsv.summary

Gene-level assignment summary statistics in featureCounts format, matching featureCounts -g gene_id behaviour. A read overlapping multiple genes is counted as Ambiguous, regardless of whether those genes share the same biotype.

Status  sample.bam
Assigned        22812
Unassigned_Unmapped     0
Unassigned_NoFeatures   1227
Unassigned_Ambiguity    2395

- Assigned: reads successfully assigned to exactly one gene
- Unassigned_Unmapped: unmapped reads
- Unassigned_NoFeatures: reads not overlapping any gene
- Unassigned_Ambiguity: reads overlapping multiple genes
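
A common use of the summary is to compute the overall assignment rate. The sketch below parses the example above; `read_summary` is a hypothetical helper, not part of RustQC.

```python
def read_summary(text: str) -> dict:
    """Parse a featureCounts-style summary into {category: count}."""
    lines = text.strip().splitlines()[1:]  # skip the "Status <sample>" header
    return {name: int(value) for name, value in (ln.split() for ln in lines)}

summary = read_summary("""\
Status  sample.bam
Assigned        22812
Unassigned_Unmapped     0
Unassigned_NoFeatures   1227
Unassigned_Ambiguity    2395
""")

total = sum(summary.values())
rate = summary["Assigned"] / total
print(f"{summary['Assigned']} of {total} reads assigned ({rate:.1%})")
# 22812 of 26434 reads assigned (86.3%)
```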

Biotype summary file

File: <sample>.featureCounts.biotype.tsv.summary

Biotype-level assignment summary, matching featureCounts -g gene_biotype behaviour. Reads overlapping multiple genes of the same biotype (e.g. two protein_coding genes) are counted as Assigned, not Ambiguous, because they map to a single biotype meta-feature. Only produced when the GTF contains the biotype attribute.

This file has the same format as the gene-level summary but will typically show more Assigned reads and fewer Ambiguous reads, since same-biotype multi-gene overlaps are resolved.
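
The difference between the two summaries can be made concrete with a small Python sketch. It models only the rule described above, applied to the set of genes a read overlaps; the `classify` function and the gene names are hypothetical.

```python
def classify(hit_genes: set, biotype_of: dict) -> tuple:
    """Return (gene-level status, biotype-level status) for one read."""
    def status(features: set) -> str:
        if not features:
            return "Unassigned_NoFeatures"
        return "Assigned" if len(features) == 1 else "Unassigned_Ambiguity"

    # At the biotype level, genes sharing a biotype collapse into one meta-feature.
    biotypes = {biotype_of[g] for g in hit_genes}
    return status(hit_genes), status(biotypes)

biotype_of = {"geneA": "protein_coding", "geneB": "protein_coding", "geneC": "lncRNA"}
print(classify({"geneA", "geneB"}, biotype_of))  # same biotype
print(classify({"geneA", "geneC"}, biotype_of))  # different biotypes
```

The first read is ambiguous at the gene level but assigned at the biotype level; the second is ambiguous in both.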

Biotype counts

File: <sample>.biotype_counts.tsv

A tab-separated file with per-biotype read counts, sorted alphabetically by biotype name:

lncRNA  678
protein_coding  12345
rRNA  90

The biotype is extracted from the GTF attribute specified by biotype_attribute (default: gene_biotype for Ensembl, gene_type for GENCODE).
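
Per-biotype counts in this form are easy to post-process. As a minimal sketch, the snippet below parses the two-column file and computes the share of counted reads labeled rRNA. The function names are hypothetical, the file is assumed to have no header line (as in the example), and using the sum of all biotype counts as the denominator is an assumption; RustQC's own rRNA percentage may be computed differently.

```python
def read_biotype_counts(text: str) -> dict:
    """Parse "<biotype><whitespace><count>" lines into a dict."""
    counts = {}
    for line in text.splitlines():
        if line.strip():
            biotype, value = line.split()
            counts[biotype] = int(value)
    return counts

def rrna_percent(counts: dict) -> float:
    """Percentage of counted reads whose biotype is exactly "rRNA"."""
    total = sum(counts.values())  # assumed denominator: all biotype counts
    return 100.0 * counts.get("rRNA", 0) / total if total else 0.0

counts = read_biotype_counts("lncRNA\t678\nprotein_coding\t12345\nrRNA\t90\n")
print(f"{rrna_percent(counts):.2f}% rRNA")
```

Only the literal `rRNA` label is counted here; related labels such as mitochondrial rRNA would need to be added explicitly if they should be included.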

Biotype MultiQC files

<sample>.biotype_counts_mqc.tsv — Biotype counts formatted as a MultiQC bargraph data file, suitable for visualizing the distribution of reads across biotypes.
<sample>.biotype_counts_rrna_mqc.tsv — rRNA percentage formatted as a MultiQC general statistics value, reporting the fraction of assigned reads mapping to rRNA genes.

Biotype attribute detection

RustQC auto-detects whether your GTF uses gene_biotype (Ensembl) or gene_type (GENCODE). You can override this with --biotype-attribute. See the CLI reference for details.
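
One way such detection could work is to scan the GTF attribute column for either key. The sketch below is an illustration under that assumption, not RustQC's actual logic; `detect_biotype_attribute` and the candidate order are hypothetical.

```python
def detect_biotype_attribute(gtf_lines, candidates=("gene_biotype", "gene_type")):
    """Return the first candidate attribute key found in the GTF, else None."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        # Column 9 looks like: key "value"; key "value"; ...
        keys = {part.split()[0] for part in fields[8].split(";") if part.strip()}
        for name in candidates:
            if name in keys:
                return name
    return None

gencode_line = (
    'chrX\tHAVANA\texon\t100627108\t100629986\t.\t-\t.\t'
    'gene_id "ENSG00000000003"; gene_type "protein_coding";\n'
)
print(detect_biotype_attribute(["#!genome-build GRCh38\n", gencode_line]))
```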

Benchmarks

RustQC includes built-in gene-level read counting that produces output compatible with the Subread featureCounts format. Benchmarks were run on AWS cloud infrastructure on 2026-03-09.

Performance

Large dataset (GM12878 REP1, ~186M reads)

Tool	Wall time	CPU time	Peak RSS
Rsubread featureCounts	3m 3s	3m 3s	210 MB
RustQC (all tools, single pass)	14m 54s	14m 54s	11.4 GB

Small dataset (~52K reads, chr6)

Tool	Wall time	CPU time	Peak RSS
Rsubread featureCounts	3m 9s	1s	3.1 MB
RustQC (all tools, single pass)	3m 31s	25.9s	182.1 MB

Note: RustQC runtime shown is for all tools combined in a single pass. See Benchmark Details for a full breakdown.

featureCounts needs only 1s of CPU time on the small dataset because the input is tiny; most of its 3m 9s wall time is container startup.

Output equivalence

RustQC’s read counting uses the same algorithm as Subread featureCounts:

- Feature type: exon-level features grouped by gene_id
- Overlap detection: at least 1 base pair overlap
- Strand awareness: configurable via the -s / --stranded flag
- Multi-mapping: tracked separately for unique and multi-mapper columns

Assignment statistics - small dataset (test, ~52K reads)

Category	Rsubread featureCounts	RustQC	Match
Assigned	42,173	42,173	✓
Unassigned_MultiMapping	5,496	5,496	✓
Unassigned_NoFeatures	2,645	2,645	✓
Unassigned_Ambiguity	2,525	2,525	✓
All others	0	0	✓

Assignment statistics - large dataset (GM12878 REP1, ~186M reads)

Category	Rsubread featureCounts	RustQC	Match
Assigned	127,819,327	127,819,327	✓
Unassigned_Unmapped	12,490,977	12,490,977	✓
Unassigned_MultiMapping	26,451,119	26,451,119	✓
Unassigned_NoFeatures	29,885,481	29,885,481	✓
Unassigned_Ambiguity	4,958,548	4,958,548	✓
All others	0	0	✓

All assignment statistics are identical across both datasets. Gene-level read counts match exactly for every gene. The output format is directly compatible with downstream tools such as DESeq2 and MultiQC.
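
A comparison like this can be reproduced on parsed summaries with a few lines of Python. The `diff_summaries` helper is hypothetical, and the dictionaries below simply restate the small-dataset table; it is a sketch of the check, not the benchmark code itself.

```python
def diff_summaries(a: dict, b: dict) -> dict:
    """Return {category: (a_value, b_value)} for every category that differs."""
    return {
        key: (a.get(key, 0), b.get(key, 0))
        for key in sorted(set(a) | set(b))
        if a.get(key, 0) != b.get(key, 0)  # a missing category counts as 0
    }

featurecounts = {
    "Assigned": 42173,
    "Unassigned_MultiMapping": 5496,
    "Unassigned_NoFeatures": 2645,
    "Unassigned_Ambiguity": 2525,
}
rustqc = dict(featurecounts)  # values from the RustQC column of the table

print(diff_summaries(featurecounts, rustqc))  # {}
```

An empty result means every category matches; any mismatch is reported with both values side by side.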

Additional outputs

Beyond the standard featureCounts counts file and summary, RustQC also produces:

- Biotype summary (.featureCounts.biotype.tsv.summary): biotype-level assignment summary matching featureCounts -g gene_biotype
- Biotype counts (.biotype_counts.tsv): per-biotype read count summaries
- Biotype MultiQC bargraph (.biotype_counts_mqc.tsv): ready for MultiQC visualization
- rRNA percentage (.biotype_counts_rrna_mqc.tsv): rRNA fraction for MultiQC general statistics

Generating these in the traditional workflow requires additional scripting after the featureCounts run.

Configuring outputs

Each output file can be individually enabled or disabled. See the Configuration page for details.