Samtools
RustQC produces output files compatible with three core Samtools commands:
flagstat, idxstats, and stats. These are generated during the same single-pass
BAM scan as all other analyses, so they require no additional pass over the file.
The output files are drop-in replacements that downstream tools (particularly MultiQC) can parse as if they came directly from Samtools.
Output files
All Samtools-compatible output files use the BAM file stem as a prefix and are
written to a samtools/ subdirectory under the output directory. Use
--flat-output to write all files directly to the output directory instead.
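The naming rule above can be sketched as a small helper. This is an illustration only: `expected_samtools_outputs` is a hypothetical function, not part of RustQC, and it assumes the "BAM file stem" is the file name with its final extension removed (what `pathlib.Path.stem` returns).

```python
from pathlib import Path


def expected_samtools_outputs(bam_path, outdir, flat_output=False):
    """Return the three expected output paths for one BAM file.

    Hypothetical helper for illustration; assumes the stem is the
    file name without its last extension ("sample.bam" -> "sample").
    """
    stem = Path(bam_path).stem
    # With flat output, files go directly into the output directory;
    # otherwise they go into a samtools/ subdirectory.
    base = Path(outdir) if flat_output else Path(outdir) / "samtools"
    return [base / f"{stem}.{ext}" for ext in ("flagstat", "idxstats", "stats")]


default_paths = expected_samtools_outputs("data/sample.bam", "results")
flat_paths = expected_samtools_outputs("data/sample.bam", "results", flat_output=True)
```

A pipeline could use such a list to check that all three files exist before handing the directory to MultiQC.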
samtools/
- sample.flagstat Alignment flag summary statistics
- sample.idxstats Per-chromosome read counts
- sample.stats Full Samtools stats output (SN, histograms, checksums)
flagstat
File: <sample>.flagstat
A text file matching the samtools flagstat output format. Each line reports a
count with the format <count> + 0 <description>. The 16 standard metrics are:
| Line | Description |
|---|---|
| 1 | Total reads (QC-passed + QC-failed) |
| 2 | Primary reads |
| 3 | Secondary reads |
| 4 | Supplementary reads |
| 5 | Duplicates |
| 6 | Primary duplicates |
| 7 | Mapped (with percentage of total) |
| 8 | Primary mapped (with percentage of primary) |
| 9 | Paired in sequencing |
| 10 | Read 1 |
| 11 | Read 2 |
| 12 | Properly paired (with percentage of paired) |
| 13 | With itself and mate mapped |
| 14 | Singletons (with percentage of paired) |
| 15 | With mate mapped to a different chr |
| 16 | With mate mapped to a different chr (mapQ>=5) |
The QC-failed column is always 0 (RustQC does not separate QC-pass/fail counts).
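To make the line format concrete, here is a minimal parsing sketch. The function name `parse_flagstat` and the regular expression are illustrative choices, not RustQC or Samtools code. The input lines are taken from the example output on this page.

```python
import re

# One flagstat line: "<count> + <qc_failed> <description>"
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Return (passed, failed, description) tuples, one per matching line."""
    rows = []
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match:
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


example = """\
201605452 + 0 in total (QC-passed reads + QC-failed reads)
186289286 + 0 primary
189114475 + 0 mapped (93.80% : N/A)
"""

rows = parse_flagstat(example)
total = rows[0][0]
mapped = rows[2][0]
# The "mapped" percentage is reported relative to the total read count.
mapped_pct = f"{100 * mapped / total:.2f}%"
```

Recomputing the percentage from the parsed counts is a quick sanity check that the description text and the counts agree.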
Example:

    201605452 + 0 in total (QC-passed reads + QC-failed reads)
    186289286 + 0 primary
    15316166 + 0 secondary
    0 + 0 supplementary
    132703364 + 0 duplicates
    132703364 + 0 primary duplicates
    189114475 + 0 mapped (93.80% : N/A)
    173798309 + 0 primary mapped (93.29% : N/A)

idxstats
File: <sample>.idxstats
A tab-separated file matching samtools idxstats output format. Each line has
four columns:
| Column | Description |
|---|---|
| ref_name | Reference sequence name |
| seq_length | Reference sequence length |
| mapped | Number of mapped reads |
| unmapped | Number of unmapped reads |
All reference sequences from the BAM header are included, even those with zero
reads. A final line with * as the reference name reports unplaced unmapped reads.
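The four-column layout can be read with a few lines of code. The sketch below is illustrative (`parse_idxstats` is a hypothetical name) and assumes every non-empty line has exactly four tab-separated fields, as described above.

```python
def parse_idxstats(text):
    """Return (ref_name, seq_length, mapped, unmapped) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length, mapped, unmapped = line.split("\t")
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows


# Values taken from the example on this page (elided rows omitted).
example = "1\t248956422\t10019968\t0\n2\t242193529\t6988244\t0\n*\t0\t0\t0\n"

rows = parse_idxstats(example)
# The "*" row holds unplaced unmapped reads, so exclude it from
# per-reference totals.
placed = [row for row in rows if row[0] != "*"]
total_mapped = sum(row[2] for row in placed)
```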
Example:

    1 248956422 10019968 0
    2 242193529 6988244 0
    ...
    * 0 0 0

stats
File: <sample>.stats
Produces the full samtools stats output format, including the Summary Numbers
(SN) section and all histogram sections. The file includes a comment header
that MultiQC uses for format detection.
SN (Summary Numbers)
Key SN fields include:
| Metric | Description |
|---|---|
| raw total sequences | Primary reads (excluding supplementary/secondary) |
| reads mapped | Mapped primary reads |
| reads duplicated | Duplicate-flagged reads |
| reads properly paired | Properly paired reads |
| total length | Sum of all read lengths |
| bases mapped (cigar) | Bases consumed by M/I/=/X CIGAR operations |
| mismatches | Mismatches from NM auxiliary tags |
| error rate | Mismatches / bases mapped (cigar) |
| average length | Mean read length |
| average quality | Mean base quality |
| insert size average | Mean insert size (from TLEN) |
| insert size standard deviation | Insert size standard deviation |
| inward oriented pairs | FR-oriented read pairs |
| outward oriented pairs | RF-oriented read pairs |
| pairs on different chromosomes | Inter-chromosomal pairs |
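As a sketch of how these fields relate, the snippet below reads SN lines into a dictionary and recomputes the error rate from its two inputs. It assumes the tab-separated `SN<TAB>name:<TAB>value` layout used by samtools stats. `parse_sn` is a hypothetical helper, and the numbers are made up for illustration, not taken from the benchmarks.

```python
def parse_sn(text):
    """Collect Summary Numbers lines into a {name: value} dict."""
    sn = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # SN lines look like: SN <tab> name: <tab> value [<tab> # comment]
        if len(fields) >= 3 and fields[0] == "SN":
            sn[fields[1].rstrip(":")] = float(fields[2])
    return sn


# Illustrative values only.
example_stats = (
    "# comment lines are ignored\n"
    "SN\tbases mapped (cigar):\t100000\t# more accurate\n"
    "SN\tmismatches:\t250\t# from NM fields\n"
)

sn = parse_sn(example_stats)
# error rate = mismatches / bases mapped (cigar)
error_rate = sn["mismatches"] / sn["bases mapped (cigar)"]
```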
Histogram sections
In addition to SN, RustQC produces all standard histogram sections:
| Section | Description |
|---|---|
| FFQ | First Fragment Quality per cycle |
| LFQ | Last Fragment Quality per cycle |
| GCF | GC content distribution for first fragments |
| GCL | GC content distribution for last fragments |
| GCC | ACGT content per cycle |
| GCT | ACGT content per cycle (read oriented) |
| FBC | ACGT base content per cycle (first fragment percentages) |
| LBC | ACGT base content per cycle (last fragment percentages) |
| FTC | ACGT total counts (first fragment) |
| LTC | ACGT total counts (last fragment) |
| IS | Insert size distribution |
| RL | Read length distribution |
| FRL | First fragment read length distribution |
| LRL | Last fragment read length distribution |
| MAPQ | Mapping quality distribution |
| ID | Indel size distribution |
| IC | Indels per cycle |
| COV | Coverage depth distribution |
| GCD | GC depth distribution |
| CHK | CRC32 checksums |
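Each section can be pulled out of the file by its line prefix, much like `grep ^MAPQ | cut -f 2-` on the command line. The sketch below assumes tab-separated lines whose first field is the section tag. `extract_section` is a hypothetical helper and the MAPQ counts are illustrative, not benchmark data.

```python
def extract_section(text, tag):
    """Return the data columns of every line whose first field equals tag."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == tag:
            rows.append(fields[1:])  # drop the tag column
    return rows


# Illustrative content only.
example_stats = (
    "# Mapping qualities: MAPQ, count\n"
    "MAPQ\t0\t120\n"
    "MAPQ\t60\t880\n"
    "RL\t99\t1000\n"
)

mapq_rows = extract_section(example_stats, "MAPQ")
mapq_counts = {int(quality): int(count) for quality, count in mapq_rows}
```

Matching on the whole first field rather than a string prefix avoids confusing tags that share leading characters, such as `IS` and `ID`.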
Example:

    # This file was produced by samtools stats and RustQC
    SN raw total sequences: 186289286
    SN filtered sequences: 0
    SN sequences: 186289286
    SN reads mapped: 173798309
    ...
    FFQ 1 0 0 0 0 ...
    GCC 1 26.90 21.38 29.47 22.25
    IS 0 0 0.000000
    CHK 1a2b3c4d 5e6f7a8b 9c0d1e2f

Benchmarks
RustQC produces Samtools-compatible output files (flagstat, idxstats, and full stats output including SN and all histogram sections) as part of its single-pass BAM processing. This section compares the output of each tool against the originals, validated on two datasets from an AWS cloud run (nf-core AWS megatests, 2026-03-09).
Performance
Note: The RustQC runtime shown covers all of its analyses combined in a single pass, not only the Samtools-compatible outputs. See Benchmark Details for a full breakdown.
Individual Samtools tool times are shown for reference.
Small dataset (~52K reads, chr6 subset)
| Tool | Runtime | RSS |
|---|---|---|
| samtools flagstat | <1s | 3.1 MB |
| samtools idxstats | <1s | 3.1 MB |
| samtools stats | <1s | 3.2 MB |
| RUSTQC_RNA (all-in-one) | 25.9s | 182 MB |
Large dataset (~186M reads, GM12878)
| Tool | Runtime | RSS |
|---|---|---|
| samtools flagstat | 3m 5s | 5.1 MB |
| samtools idxstats | 16s | 4.8 MB |
| samtools stats | 9m 48s | 8.2 MB |
| RUSTQC_RNA (all-in-one) | 14m 54s | 11.4 GB |
flagstat comparison
Result: Identical
All flagstat metrics match exactly between samtools flagstat and RustQC on both
the small and large datasets.
Large dataset (GM12878, ~186M reads)
| Metric | Samtools | RustQC |
|---|---|---|
| Total reads | 201,605,452 | 201,605,452 |
| Primary | 186,289,286 | 186,289,286 |
| Secondary | 15,316,166 | 15,316,166 |
| Duplicates | 132,703,364 | 132,703,364 |
| Mapped | 189,114,475 (93.80%) | 189,114,475 (93.80%) |
| Primary mapped | 173,798,309 (93.29%) | 173,798,309 (93.29%) |
| Properly paired | 173,557,764 (93.17%) | 173,557,764 (93.17%) |
| Singletons | 240,545 | 240,545 |
Small dataset (test, ~52K reads, chr6 subset)
| Metric | Samtools | RustQC |
|---|---|---|
| Total reads | 52,839 | 52,839 |
| Primary | 49,573 | 49,573 |
| Secondary | 3,266 | 3,266 |
| Duplicates | 6,097 | 6,097 |
| Mapped | 52,839 (100.00%) | 52,839 (100.00%) |
| Primary mapped | 49,573 (100.00%) | 49,573 (100.00%) |
| Properly paired | 49,546 (99.95%) | 49,546 (99.95%) |
| Singletons | 27 | 27 |
The output format is fully compatible with MultiQC and other tools that parse Samtools flagstat output.
idxstats comparison
Result: Identical
Per-chromosome read counts match exactly across all reference sequences.
Both files include the same reference names, lengths, mapped counts, and
unmapped counts, plus the * row for unplaced reads.
stats comparison
Result: Near-identical
All SN (Summary Numbers) fields and 19 of 20 histogram sections match exactly
between samtools stats and RustQC on both datasets. The GCD (GC depth) section
shows minor differences (max absolute difference of 0.018) due to sampling
variation in the GC depth calculation.
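One way such a maximum absolute difference could be computed is sketched below. This is an assumption about the comparison method, not a description of how the benchmark was run. `max_abs_diff` is a hypothetical helper, and the rows are illustrative values rather than real GCD output.

```python
def max_abs_diff(rows_a, rows_b):
    """Largest absolute difference between matching numeric cells."""
    if len(rows_a) != len(rows_b):
        raise ValueError("sections have different numbers of rows")
    return max(
        (
            abs(float(a) - float(b))
            for row_a, row_b in zip(rows_a, rows_b)
            for a, b in zip(row_a, row_b)
        ),
        default=0.0,  # two empty sections differ by nothing
    )


# Illustrative rows (tag column already removed), one list per line.
samtools_rows = [["0.5", "0.25", "1.0"], ["35.0", "12.5", "2.0"]]
rustqc_rows = [["0.5", "0.125", "1.0"], ["35.0", "12.5", "2.0"]]

largest_gap = max_abs_diff(samtools_rows, rustqc_rows)
```

A tolerance-based check like this suits GCD, while the other sections can be compared for exact equality.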
Large dataset (GM12878)
| Metric | Samtools | RustQC |
|---|---|---|
| sequences | 186,289,286 | 186,289,286 |
| reads mapped | 173,798,309 | 173,798,309 |
| reads duplicated | 132,703,364 | 132,703,364 |
| reads unmapped | 12,490,977 | 12,490,977 |
| reads properly paired | 173,557,764 | 173,557,764 |
| error rate | 2.599194e-03 | 2.599194e-03 |
| average length | 99 | 99 |
| average quality | 36.3 | 36.3 |
| insert size average | 1,536.7 | 1,536.7 |
| insert size std dev | 2,263.8 | 2,263.8 |
Small dataset (test, chr6 subset)
| Metric | Samtools | RustQC |
|---|---|---|
| sequences | 49,573 | 49,573 |
| reads mapped | 49,573 | 49,573 |
| reads duplicated | 6,097 | 6,097 |
| reads unmapped | 0 | 0 |
| reads properly paired | 49,546 | 49,546 |
| error rate | 5.750919e-03 | 5.750919e-03 |
| average length | 145 | 145 |
| average quality | 38.6 | 38.6 |
| insert size average | 1,625.9 | 1,625.9 |
| insert size std dev | 2,305.4 | 2,305.4 |
RustQC produces the full samtools stats output, including the SN section and all
histogram sections (FFQ, LFQ, GCF, GCL, GCC, GCT, FBC, LBC, FTC, LTC, IS, RL,
FRL, LRL, MAPQ, ID, IC, COV, GCD, CHK). The output format includes the samtools stats
header comment required for MultiQC parsing.
Configuration
Each output (flagstat, idxstats, stats) can be individually enabled or disabled. See the Configuration page for details.
References
- Samtools: Danecek P, Bonfield JK, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. Samtools website