featureCounts
How it works
RustQC performs gene-level read counting in the same pass as the dupRadar duplication rate analysis and all RSeQC-equivalent metrics. During BAM processing, RustQC simultaneously:
- Counts reads assigned to each gene (using the same algorithm as featureCounts)
- Tracks duplication rates for the dupRadar analysis
- Computes RSeQC-equivalent quality metrics (strandedness, read distribution, junctions, etc.)
- Aggregates counts by gene biotype
RustQC follows the same algorithm as Subread featureCounts with these defaults:
- Feature type: `exon`
- Attribute: `gene_id`
- Overlap detection: at least 1 base of overlap
- Multi-mapping reads: counted (for multi columns), excluded (for unique columns)
- Strand-aware counting based on the `-s`/`--stranded` flag
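The assignment rules above can be illustrated with a short Python sketch. This is not RustQC's implementation: it assumes a uniquely mapped single-end read given as aligned blocks with 1-based inclusive coordinates, and it uses hypothetical names (`Exon`, `assign_read`) and illustrative strandedness values (`"unstranded"`, `"forward"`, `"reverse"`) rather than the actual `-s`/`--stranded` syntax.

```python
from collections import namedtuple

# Hypothetical exon record with 1-based, inclusive coordinates (GTF convention).
Exon = namedtuple("Exon", ["chrom", "start", "end", "strand", "gene_id"])


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def assign_read(chrom, strand, blocks, exons, strandedness="unstranded"):
    """Classify one uniquely mapped single-end read.

    blocks: the read's aligned segments as (start, end) pairs.
    strandedness: "unstranded", "forward" or "reverse" (illustrative values,
    not RustQC's -s/--stranded syntax).
    Returns ("Assigned", gene_id), ("Unassigned_NoFeatures", None)
    or ("Unassigned_Ambiguity", None).
    """
    genes = set()
    for exon in exons:
        if exon.chrom != chrom:
            continue
        # Forward: read must be on the gene's strand; reverse: on the opposite one.
        if strandedness == "forward" and exon.strand != strand:
            continue
        if strandedness == "reverse" and exon.strand == strand:
            continue
        if any(overlaps(s, e, exon.start, exon.end) for s, e in blocks):
            # Exons are grouped by gene_id, so hits on several exons of one
            # gene still count as a single gene.
            genes.add(exon.gene_id)
    if not genes:
        return ("Unassigned_NoFeatures", None)
    if len(genes) > 1:
        return ("Unassigned_Ambiguity", None)
    return ("Assigned", genes.pop())


exons = [
    Exon("chr1", 100, 200, "+", "geneA"),
    Exon("chr1", 300, 400, "+", "geneA"),
    Exon("chr1", 350, 450, "-", "geneB"),
]
print(assign_read("chr1", "+", [(190, 240)], exons))  # ('Assigned', 'geneA')
print(assign_read("chr1", "+", [(360, 380)], exons))  # ('Unassigned_Ambiguity', None)
print(assign_read("chr1", "+", [(360, 380)], exons, "forward"))  # ('Assigned', 'geneA')
print(assign_read("chr1", "+", [(500, 550)], exons))  # ('Unassigned_NoFeatures', None)
```

The linear scan over every exon keeps the sketch short; a practical counter would index exons by chromosome (for example with an interval tree) instead of checking each exon for every read.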
Output files
All featureCounts output files use the BAM file stem as a prefix and are written to a `featurecounts/` subdirectory under the output directory. Use `--flat-output` to write all files directly to the output directory instead. Each output can be individually enabled or disabled via the configuration file.
featurecounts/
- sample.featureCounts.tsv: per-gene read counts
- sample.featureCounts.tsv.summary: gene-level assignment summary
- sample.featureCounts.biotype.tsv.summary: biotype-level assignment summary
- sample.biotype_counts.tsv: per-biotype read counts
- sample.biotype_counts_mqc.tsv: MultiQC biotype bargraph
- sample.biotype_counts_rrna_mqc.tsv: MultiQC rRNA percentage
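When wiring RustQC into a pipeline, it can help to predict where these files will land. The helper below is a hypothetical sketch (`featurecounts_outputs` is not part of RustQC): it assumes the stem is the BAM filename without its final extension, that all six files share the `featurecounts/` directory, and it does not check which outputs are enabled or whether the GTF provides biotypes.

```python
from pathlib import Path

# Suffixes from the file list above; "sample" is replaced by the BAM stem.
SUFFIXES = [
    ".featureCounts.tsv",
    ".featureCounts.tsv.summary",
    ".featureCounts.biotype.tsv.summary",
    ".biotype_counts.tsv",
    ".biotype_counts_mqc.tsv",
    ".biotype_counts_rrna_mqc.tsv",
]


def featurecounts_outputs(bam_path, outdir, flat_output=False):
    """Predict the featureCounts output paths for one BAM file."""
    stem = Path(bam_path).stem  # "sample.bam" -> "sample"
    # --flat-output writes straight into outdir instead of outdir/featurecounts.
    target = Path(outdir) if flat_output else Path(outdir) / "featurecounts"
    return [target / f"{stem}{suffix}" for suffix in SUFFIXES]


for path in featurecounts_outputs("data/sample.bam", "results"):
    print(path)
```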
Counts file
File: `<sample>.featureCounts.tsv`
A tab-separated counts file compatible with the Subread featureCounts output format. Contains per-gene read counts with annotation columns:
| Column | Description |
|---|---|
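| Geneid | Gene identifier |
| Chr | Chromosome(s) |
| Start | Start position(s), semicolon-separated, one per exon |
| End | End position(s), semicolon-separated, one per exon |
| Strand | Strand(s), semicolon-separated, one per exon |
| Length | Gene length |
| `<sample>` | Read count for this sample |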
The file includes a header comment line with the command used to generate it:
```
# Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p
Geneid	Chr	Start	End	Strand	Length	sample.bam
ENSG00000000003	chrX	100627108;100629986;...	100636806;100637104;...	-;-;...	3768	521
ENSG00000000005	chrX	100584936;100585053;...	100585091;100599885;...	+;+;...	1339	0
```

This file is directly compatible with downstream tools that accept featureCounts output, such as DESeq2 and MultiQC.
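As a minimal sketch of consuming this file downstream, the Python function below reads the counts into a dictionary. It assumes a single-sample file (count in the last column) and uses a trimmed copy of the example above with the `...` elisions dropped; `read_counts` is a hypothetical helper name.

```python
import csv
import io


def read_counts(handle):
    """Parse a featureCounts-style counts table into {gene_id: count}.

    Comment lines starting with '#' are skipped, the first remaining line is
    taken as the column header, and the count is read from the last column.
    """
    rows = csv.reader(
        (line for line in handle if not line.startswith("#")), delimiter="\t"
    )
    header = next(rows)
    if header[:6] != ["Geneid", "Chr", "Start", "End", "Strand", "Length"]:
        raise ValueError(f"unexpected header: {header}")
    return {row[0]: int(row[-1]) for row in rows}


example = io.StringIO(
    "# Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "ENSG00000000003\tchrX\t100627108;100629986\t100636806;100637104\t-;-\t3768\t521\n"
    "ENSG00000000005\tchrX\t100584936;100585053\t100585091;100599885\t+;+\t1339\t0\n"
)
counts = read_counts(example)
print(counts)  # {'ENSG00000000003': 521, 'ENSG00000000005': 0}
```

With pandas, `pd.read_csv(path, sep="\t", comment="#", index_col=0)` loads the same table as a DataFrame whose last column is the count vector.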
Summary file
File: `<sample>.featureCounts.tsv.summary`
Gene-level assignment summary statistics in featureCounts format, matching featureCounts -g gene_id behaviour. A read overlapping multiple genes is counted as Ambiguous, regardless of whether those genes share the same biotype.
```
Status	sample.bam
Assigned	22812
Unassigned_Unmapped	0
Unassigned_NoFeatures	1227
Unassigned_Ambiguity	2395
```

- Assigned: reads successfully assigned to exactly one gene
- Unassigned_Unmapped: unmapped reads
- Unassigned_NoFeatures: reads not overlapping any gene
- Unassigned_Ambiguity: reads overlapping multiple genes
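A common check on this file is the overall assignment rate. The sketch below parses the example summary and divides Assigned by the sum of all listed categories; `read_summary` is a hypothetical helper, and using every category as the denominator is a choice made here, not a RustQC definition.

```python
import io


def read_summary(handle):
    """Parse a featureCounts-style .summary file into {status: count}."""
    lines = [line.rstrip("\n") for line in handle if line.strip()]
    stats = {}
    for line in lines[1:]:  # the first line is the "Status <sample>" header
        status, count = line.split("\t")
        stats[status] = int(count)
    return stats


summary = io.StringIO(
    "Status\tsample.bam\n"
    "Assigned\t22812\n"
    "Unassigned_Unmapped\t0\n"
    "Unassigned_NoFeatures\t1227\n"
    "Unassigned_Ambiguity\t2395\n"
)
stats = read_summary(summary)
rate = stats["Assigned"] / sum(stats.values())
print(f"{rate:.1%}")  # 86.3%
```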
Biotype summary file
File: `<sample>.featureCounts.biotype.tsv.summary`
Biotype-level assignment summary, matching featureCounts -g gene_biotype behaviour. Reads overlapping multiple genes of the same biotype (e.g. two protein_coding genes) are counted as Assigned, not Ambiguous, because they map to a single biotype meta-feature. Only produced when the GTF contains the biotype attribute.
This file has the same format as the gene-level summary but will typically show more Assigned reads and fewer Ambiguous reads, since same-biotype multi-gene overlaps are resolved.
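The difference between the two summaries comes down to which meta-feature the overlapped genes are grouped into. This minimal sketch, with hypothetical gene names and a hypothetical `classify` helper, shows the same read being ambiguous at gene level but assigned at biotype level.

```python
def classify(overlapping_genes, meta_feature):
    """Classify a read from the genes it overlaps, grouped by meta_feature.

    meta_feature maps gene ID -> meta-feature: the gene itself for
    -g gene_id, or its biotype for -g gene_biotype.
    """
    features = {meta_feature[g] for g in overlapping_genes}
    if not features:
        return "Unassigned_NoFeatures"
    if len(features) > 1:
        return "Unassigned_Ambiguity"
    return "Assigned"


biotype = {"geneA": "protein_coding", "geneB": "protein_coding", "geneC": "lncRNA"}
by_gene = {g: g for g in biotype}

read = ["geneA", "geneB"]  # overlaps two protein_coding genes
print(classify(read, by_gene))  # Unassigned_Ambiguity
print(classify(read, biotype))  # Assigned
print(classify(["geneA", "geneC"], biotype))  # Unassigned_Ambiguity
```

A read spanning genes of different biotypes stays ambiguous in both summaries, which is why the biotype summary reduces, but does not eliminate, Unassigned_Ambiguity.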
Biotype counts
File: `<sample>.biotype_counts.tsv`
A tab-separated file with per-biotype read counts, sorted alphabetically by biotype name:
```
protein_coding	12345
lncRNA	678
rRNA	90
```

The biotype is extracted from the GTF attribute specified by `biotype_attribute` (by default auto-detected: `gene_biotype` for Ensembl, `gene_type` for GENCODE).
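Producing this table from per-gene counts amounts to a group-by and sum. The sketch below uses made-up gene counts chosen to reproduce the example totals; `biotype_counts` is a hypothetical helper, and the `"unknown"` bucket for genes without a biotype is an assumption of this sketch, not documented RustQC behaviour. Python's `sorted` orders names by code point, which puts `lncRNA` before `protein_coding` and `rRNA`.

```python
from collections import defaultdict


def biotype_counts(gene_counts, gene_biotype):
    """Sum per-gene counts into per-biotype totals, sorted by biotype name."""
    totals = defaultdict(int)
    for gene_id, count in gene_counts.items():
        # Assumption: genes with no biotype go into a placeholder bucket.
        totals[gene_biotype.get(gene_id, "unknown")] += count
    return sorted(totals.items())


gene_counts = {"geneA": 12000, "geneB": 345, "geneC": 678, "geneD": 90}
gene_biotype = {
    "geneA": "protein_coding",
    "geneB": "protein_coding",
    "geneC": "lncRNA",
    "geneD": "rRNA",
}
for name, count in biotype_counts(gene_counts, gene_biotype):
    print(f"{name}\t{count}")
```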
Biotype MultiQC files
- `<sample>.biotype_counts_mqc.tsv`: biotype counts formatted as a MultiQC bargraph data file, suitable for visualizing the distribution of reads across biotypes.
- `<sample>.biotype_counts_rrna_mqc.tsv`: rRNA percentage formatted as a MultiQC general statistics value, reporting the fraction of assigned reads mapping to rRNA genes.
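The rRNA percentage is a simple ratio. The sketch below computes it from per-biotype totals like the example counts above, assuming the denominator is the sum of all biotype-assigned reads; the `rrna_percentage` helper and its `rrna_biotypes` parameter are hypothetical (Ensembl, for instance, also has an `Mt_rRNA` biotype you may want to include).

```python
def rrna_percentage(biotype_totals, rrna_biotypes=("rRNA",)):
    """Percentage of biotype-assigned reads whose biotype counts as rRNA."""
    assigned = sum(biotype_totals.values())
    if assigned == 0:
        return 0.0
    rrna = sum(biotype_totals.get(b, 0) for b in rrna_biotypes)
    return 100.0 * rrna / assigned


totals = {"protein_coding": 12345, "lncRNA": 678, "rRNA": 90}
print(f"{rrna_percentage(totals):.2f}%")  # 0.69%
```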
Biotype attribute detection
RustQC auto-detects whether your GTF uses `gene_biotype` (Ensembl) or `gene_type` (GENCODE). You can override this with `--biotype-attribute`. See the CLI reference for details.
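One plausible way to implement such detection is to scan the attribute column (the ninth field) of the first GTF records for either key. The sketch below is an assumption about the approach, not RustQC's actual logic; `detect_biotype_attribute` and the example GTF lines are hypothetical.

```python
import re


def detect_biotype_attribute(gtf_lines, max_records=1000):
    """Guess whether a GTF uses gene_biotype (Ensembl) or gene_type (GENCODE).

    Scans the attribute column of up to max_records feature lines and
    returns None if neither attribute is found.
    """
    seen = 0
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = fields[8]
        # Anchor on the start of an attribute so e.g. transcript_type is ignored.
        if re.search(r"(^|;\s*)gene_biotype\s", attrs):
            return "gene_biotype"
        if re.search(r"(^|;\s*)gene_type\s", attrs):
            return "gene_type"
        seen += 1
        if seen >= max_records:
            break
    return None


gencode = ['chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "geneA"; gene_type "lncRNA";\n']
ensembl = ['1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "geneA"; gene_biotype "lncRNA";\n']
print(detect_biotype_attribute(gencode))  # gene_type
print(detect_biotype_attribute(ensembl))  # gene_biotype
```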
Benchmarks
RustQC includes built-in gene-level read counting that produces output compatible with the Subread featureCounts format. Benchmarks were run on AWS cloud infrastructure on 2026-03-09.
Performance
Large dataset (GM12878 REP1, ~186M reads)
| Tool | Wall time | CPU time | Peak RSS |
|---|---|---|---|
| Rsubread featureCounts | 3m 3s | 3m 3s | 210 MB |
| RustQC (all tools, single pass) | 14m 54s | 14m 54s | 11.4 GB |
Small dataset (~52K reads, chr6)
| Tool | Wall time | CPU time | Peak RSS |
|---|---|---|---|
| Rsubread featureCounts | 3m 9s | 1s | 3.1 MB |
| RustQC (all tools, single pass) | 3m 31s | 25.9s | 182.1 MB |
Note: RustQC runtime shown is for all tools combined in a single pass. See Benchmark Details for a full breakdown.
featureCounts uses only 1 s of CPU time on the small dataset because there is very little to count; most of its wall time is container startup.
Output equivalence
RustQC’s read counting uses the same algorithm as Subread featureCounts:
- Feature type: exon-level features grouped by `gene_id`
- Overlap detection: at least 1 base pair of overlap
- Strand awareness: configurable via the `-s`/`--stranded` flag
- Multi-mapping: tracked separately for unique and multi-mapper columns
Assignment statistics - small dataset (test, ~52K reads)
| Category | Rsubread featureCounts | RustQC | Match |
|---|---|---|---|
| Assigned | 42,173 | 42,173 | ✓ |
| Unassigned_MultiMapping | 5,496 | 5,496 | ✓ |
| Unassigned_NoFeatures | 2,645 | 2,645 | ✓ |
| Unassigned_Ambiguity | 2,525 | 2,525 | ✓ |
| All others | 0 | 0 | ✓ |
Assignment statistics - large dataset (GM12878 REP1, ~186M reads)
| Category | Rsubread featureCounts | RustQC | Match |
|---|---|---|---|
| Assigned | 127,819,327 | 127,819,327 | ✓ |
| Unassigned_Unmapped | 12,490,977 | 12,490,977 | ✓ |
| Unassigned_MultiMapping | 26,451,119 | 26,451,119 | ✓ |
| Unassigned_NoFeatures | 29,885,481 | 29,885,481 | ✓ |
| Unassigned_Ambiguity | 4,958,548 | 4,958,548 | ✓ |
| All others | 0 | 0 | ✓ |
All assignment statistics are identical across both datasets. Gene-level read counts match exactly for every gene. The output format is directly compatible with downstream tools such as DESeq2 and MultiQC.
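To reproduce the per-gene comparison on your own data, a short script can load two counts tables and list every gene whose count differs or that appears in only one of them. This is a sketch assuming both files use the featureCounts TSV layout with a single count column; the script and function names are hypothetical.

```python
import csv
import sys


def load_counts(path):
    """Read {Geneid: count} from a featureCounts-format table (last column)."""
    with open(path, newline="") as handle:
        rows = csv.reader(
            (line for line in handle if not line.startswith("#")), delimiter="\t"
        )
        next(rows)  # skip the column header
        return {row[0]: int(row[-1]) for row in rows}


def diff_counts(a, b):
    """Gene IDs whose counts differ or that appear in only one table."""
    return sorted(g for g in a.keys() | b.keys() if a.get(g) != b.get(g))


if __name__ == "__main__" and len(sys.argv) == 3:
    mismatches = diff_counts(load_counts(sys.argv[1]), load_counts(sys.argv[2]))
    print(f"{len(mismatches)} genes differ")
    for gene_id in mismatches[:20]:
        print(gene_id)
```

Saved as, say, `compare_counts.py`, it would be run as `python compare_counts.py featurecounts_counts.txt results/featurecounts/sample.featureCounts.tsv`; an exact match prints `0 genes differ`.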
Additional outputs
Beyond the standard featureCounts counts file and summary, RustQC also produces:
- Biotype summary (`.featureCounts.biotype.tsv.summary`): biotype-level assignment summary matching `featureCounts -g gene_biotype`
- Biotype counts (`.biotype_counts.tsv`): per-biotype read count summaries
- Biotype MultiQC bargraph (`.biotype_counts_mqc.tsv`): ready for MultiQC visualization
- rRNA percentage (`.biotype_counts_rrna_mqc.tsv`): rRNA fraction for MultiQC general statistics
Generating these in the traditional workflow requires additional scripting after the featureCounts run.
Configuring outputs
Each output file can be individually enabled or disabled. See the Configuration page for details.