Samtools

RustQC produces output files compatible with three core Samtools commands: flagstat, idxstats, and stats. These are generated during the same single-pass BAM scan as all other analyses, with no additional runtime cost.

The output files are drop-in replacements that downstream tools (particularly MultiQC) can parse as if they came directly from Samtools.

All Samtools-compatible output files use the BAM file stem as a prefix and are written to a samtools/ subdirectory under the output directory. Use --flat-output to write all files directly to the output directory instead.

  • samtools/
    • sample.flagstat: Alignment flag summary statistics
    • sample.idxstats: Per-chromosome read counts
    • sample.stats: Full Samtools stats output (SN, histograms, checksums)
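The naming rule above can be sketched as a small helper. This is an illustration only: `samtools_output_paths` is a hypothetical function, not part of RustQC, and it assumes the "BAM file stem" is the file name with only its final extension removed.

```python
from pathlib import Path

def samtools_output_paths(bam_path, outdir, flat_output=False):
    """Return the expected flagstat/idxstats/stats paths for one BAM file."""
    # Assumption: the stem drops only the last extension ("sample.bam" -> "sample").
    stem = Path(bam_path).stem
    # Without --flat-output, files go into a samtools/ subdirectory.
    base = Path(outdir) if flat_output else Path(outdir) / "samtools"
    return {ext: base / f"{stem}.{ext}" for ext in ("flagstat", "idxstats", "stats")}

paths = samtools_output_paths("data/sample.bam", "results")
print(paths["flagstat"].as_posix())
```

Passing `flat_output=True` mirrors the effect of `--flat-output` by dropping the `samtools/` path component.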

File: <sample>.flagstat

A text file matching the samtools flagstat output format. Each line reports a count with the format <count> + 0 <description>. The 16 standard metrics are:

| Line | Description |
| --- | --- |
| 1 | Total reads (QC-passed + QC-failed) |
| 2 | Primary reads |
| 3 | Secondary reads |
| 4 | Supplementary reads |
| 5 | Duplicates |
| 6 | Primary duplicates |
| 7 | Mapped (with percentage of total) |
| 8 | Primary mapped (with percentage of primary) |
| 9 | Paired in sequencing |
| 10 | Read 1 |
| 11 | Read 2 |
| 12 | Properly paired (with percentage of paired) |
| 13 | With itself and mate mapped |
| 14 | Singletons (with percentage of paired) |
| 15 | With mate mapped to a different chr |
| 16 | With mate mapped to a different chr (mapQ>=5) |

The QC-failed column (the number after the +) is always 0, because RustQC does not count QC-passed and QC-failed reads separately.

Example:

201605452 + 0 in total (QC-passed reads + QC-failed reads)
186289286 + 0 primary
15316166 + 0 secondary
0 + 0 supplementary
132703364 + 0 duplicates
132703364 + 0 primary duplicates
189114475 + 0 mapped (93.80% : N/A)
173798309 + 0 primary mapped (93.29% : N/A)
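For readers who want to consume this file without MultiQC, the line format can be parsed with a short regular expression. The sketch below is a minimal illustration, not the parser used by RustQC or MultiQC; `parse_flagstat` and `EXAMPLE` are names introduced here, and the sample text is copied from the example above.

```python
import re

# "<passed> + <failed> <description>", as described above.
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing percentage annotation such as "(93.80% : N/A)"; it always contains a colon.
PERCENT_SUFFIX = re.compile(r"\s*\([^()]*:[^()]*\)$")

def parse_flagstat(text):
    """Map each description (percentage suffix removed) to (qc_passed, qc_failed)."""
    counts = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        passed, failed, description = match.groups()
        key = PERCENT_SUFFIX.sub("", description)
        counts[key] = (int(passed), int(failed))
    return counts

EXAMPLE = """\
201605452 + 0 in total (QC-passed reads + QC-failed reads)
186289286 + 0 primary
15316166 + 0 secondary
0 + 0 supplementary
132703364 + 0 duplicates
132703364 + 0 primary duplicates
189114475 + 0 mapped (93.80% : N/A)
173798309 + 0 primary mapped (93.29% : N/A)
"""

counts = parse_flagstat(EXAMPLE)
total = counts["in total (QC-passed reads + QC-failed reads)"][0]
mapped = counts["mapped"][0]
print(f"mapped: {100 * mapped / total:.2f}%")
```

Only annotations containing a colon are stripped, so the two "different chr" lines (with and without `(mapQ>=5)`) keep distinct keys.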

File: <sample>.idxstats

A tab-separated file matching samtools idxstats output format. Each line has four columns:

| Column | Description |
| --- | --- |
| ref_name | Reference sequence name |
| seq_length | Reference sequence length |
| mapped | Number of mapped reads |
| unmapped | Number of unmapped reads |

All reference sequences from the BAM header are included, even those with zero reads. A final line with * as the reference name reports unplaced unmapped reads.

Example:

1 248956422 10019968 0
2 242193529 6988244 0
...
* 0 0 0
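A four-column, tab-separated file like this is straightforward to read programmatically. The following is a minimal sketch under that assumption; `parse_idxstats` and `EXAMPLE` are hypothetical names, and the rows are taken from the example above with the elided lines left out.

```python
def parse_idxstats(text):
    """Return a list of (ref_name, seq_length, mapped, unmapped) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # ignore blank or malformed lines
        name, length, mapped, unmapped = fields
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows

EXAMPLE = "1\t248956422\t10019968\t0\n2\t242193529\t6988244\t0\n*\t0\t0\t0\n"

rows = parse_idxstats(EXAMPLE)
# The "*" row holds unplaced unmapped reads, so it is excluded from the mapped total.
total_mapped = sum(mapped for name, _, mapped, _ in rows if name != "*")
print(total_mapped)
```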

File: <sample>.stats

Produces the full samtools stats output format, including the Summary Numbers (SN) section and all histogram sections. The file includes a comment header that MultiQC uses for format detection.

Key SN fields include:

| Metric | Description |
| --- | --- |
| raw total sequences | Primary reads (excluding supplementary/secondary) |
| reads mapped | Mapped primary reads |
| reads duplicated | Duplicate-flagged reads |
| reads properly paired | Properly paired reads |
| total length | Sum of all read lengths |
| bases mapped (cigar) | Bases consumed by M/I/=/X CIGAR operations |
| mismatches | Mismatches from NM auxiliary tags |
| error rate | Mismatches / bases mapped (cigar) |
| average length | Mean read length |
| average quality | Mean base quality |
| insert size average | Mean insert size (from TLEN) |
| insert size standard deviation | Insert size standard deviation |
| inward oriented pairs | FR-oriented read pairs |
| outward oriented pairs | RF-oriented read pairs |
| pairs on different chromosomes | Inter-chromosomal pairs |
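The error rate row is a simple ratio of two other SN fields. As a worked sketch, with made-up input values chosen only for illustration (they do not come from either benchmark dataset), and with the zero-denominator fallback being a choice made for this example:

```python
def error_rate(mismatches, bases_mapped_cigar):
    """Return mismatches / bases mapped (cigar), or 0.0 when no bases were mapped."""
    if bases_mapped_cigar == 0:
        return 0.0  # avoid dividing by zero for an empty or fully unmapped input
    return mismatches / bases_mapped_cigar

# Hypothetical counts, for illustration only.
rate = error_rate(mismatches=26, bases_mapped_cigar=10_000)
# Scientific notation with six decimals, the same style as values such as 2.599194e-03.
print(f"{rate:e}")
```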

In addition to SN, RustQC produces all standard histogram sections:

| Section | Description |
| --- | --- |
| FFQ | First fragment quality per cycle |
| LFQ | Last fragment quality per cycle |
| GCF | GC content distribution for first fragments |
| GCL | GC content distribution for last fragments |
| GCC | ACGT content per cycle |
| GCT | ACGT content per cycle (read oriented) |
| FBC | ACGT base content per cycle (first fragment percentages) |
| LBC | ACGT base content per cycle (last fragment percentages) |
| FTC | ACGT total counts (first fragment) |
| LTC | ACGT total counts (last fragment) |
| IS | Insert size distribution |
| RL | Read length distribution |
| FRL | First fragment read length distribution |
| LRL | Last fragment read length distribution |
| MAPQ | Mapping quality distribution |
| ID | Indel size distribution |
| IC | Indels per cycle |
| COV | Coverage depth distribution |
| GCD | GC depth distribution |
| CHK | CRC32 checksums |

Example:

# This file was produced by samtools stats and RustQC
SN raw total sequences: 186289286
SN filtered sequences: 0
SN sequences: 186289286
SN reads mapped: 173798309
...
FFQ 1 0 0 0 0 ...
GCC 1 26.90 21.38 29.47 22.25
IS 0 0 0.000000
CHK 1a2b3c4d 5e6f7a8b 9c0d1e2f

RustQC produces Samtools-compatible output files (flagstat, idxstats, and full stats output including SN and all histogram sections) as part of its single-pass BAM processing. This section compares the output of each tool against the originals, validated on two datasets from an AWS cloud run (nf-core AWS megatests, 2026-03-09).

Note: RustQC runtime shown is for all tools combined in a single pass. See Benchmark Details for a full breakdown.

Individual Samtools tool times are shown for reference.

Small dataset (~52K reads, chr6 subset)

| Tool | Runtime | RSS |
| --- | --- | --- |
| samtools flagstat | <1s | 3.1 MB |
| samtools idxstats | <1s | 3.1 MB |
| samtools stats | <1s | 3.2 MB |
| RUSTQC_RNA (all-in-one) | 25.9s | 182 MB |

Large dataset (~186M reads, GM12878)

| Tool | Runtime | RSS |
| --- | --- | --- |
| samtools flagstat | 3m 5s | 5.1 MB |
| samtools idxstats | 16s | 4.8 MB |
| samtools stats | 9m 48s | 8.2 MB |
| RUSTQC_RNA (all-in-one) | 14m 54s | 11.4 GB |

Result (flagstat): Identical

All flagstat metrics match exactly between samtools flagstat and RustQC on both the small and large datasets.

Large dataset (GM12878, ~186M reads)

| Metric | Samtools | RustQC |
| --- | --- | --- |
| Total reads | 201,605,452 | 201,605,452 |
| Primary | 186,289,286 | 186,289,286 |
| Secondary | 15,316,166 | 15,316,166 |
| Duplicates | 132,703,364 | 132,703,364 |
| Mapped | 189,114,475 (93.80%) | 189,114,475 (93.80%) |
| Primary mapped | 173,798,309 (93.29%) | 173,798,309 (93.29%) |
| Properly paired | 173,557,764 (93.17%) | 173,557,764 (93.17%) |
| Singletons | 240,545 | 240,545 |

Small dataset (test, ~52K reads, chr6 subset)

| Metric | Samtools | RustQC |
| --- | --- | --- |
| Total reads | 52,839 | 52,839 |
| Primary | 49,573 | 49,573 |
| Secondary | 3,266 | 3,266 |
| Duplicates | 6,097 | 6,097 |
| Mapped | 52,839 (100.00%) | 52,839 (100.00%) |
| Primary mapped | 49,573 (100.00%) | 49,573 (100.00%) |
| Properly paired | 49,546 (99.95%) | 49,546 (99.95%) |
| Singletons | 27 | 27 |

The output format is fully compatible with MultiQC and other tools that parse Samtools flagstat output.

Result (idxstats): Identical

Per-chromosome read counts match exactly across all reference sequences. Both files include the same reference names, lengths, mapped counts, and unmapped counts, plus the * row for unplaced reads.

Result (stats): Near-identical

All SN (Summary Numbers) fields and 19 of 20 histogram sections match exactly between samtools stats and RustQC on both datasets. The GCD (GC depth) section shows minor differences (max absolute difference of 0.018) due to sampling variation in the GC depth calculation.
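One way to quantify a difference like this is to take the largest absolute difference over all numeric cells of the two GCD sections. The sketch below shows that calculation; it is an assumption about how such a figure could be computed, not a description of how the 0.018 value was obtained. `max_abs_diff` is a hypothetical helper and the rows are made-up numbers, not benchmark data.

```python
def max_abs_diff(rows_a, rows_b):
    """Largest absolute cell-wise difference between two equally shaped tables."""
    if len(rows_a) != len(rows_b):
        raise ValueError("sections have different numbers of rows")
    worst = 0.0
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows have different numbers of columns")
        for a, b in zip(row_a, row_b):
            worst = max(worst, abs(float(a) - float(b)))
    return worst

# Hypothetical GCD-style rows (values invented for illustration).
samtools_gcd = [["0.0", "0.001", "0.000", "0.000"], ["35.0", "12.500", "1.300", "1.600"]]
rustqc_gcd = [["0.0", "0.001", "0.000", "0.000"], ["35.0", "12.500", "1.318", "1.600"]]

print(f"{max_abs_diff(samtools_gcd, rustqc_gcd):.3f}")
```

Raising on a shape mismatch keeps a missing or extra row from being silently reported as a small numeric difference.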

Large dataset (GM12878)

| Metric | Samtools | RustQC |
| --- | --- | --- |
| sequences | 186,289,286 | 186,289,286 |
| reads mapped | 173,798,309 | 173,798,309 |
| reads duplicated | 132,703,364 | 132,703,364 |
| reads unmapped | 12,490,977 | 12,490,977 |
| reads properly paired | 173,557,764 | 173,557,764 |
| error rate | 2.599194e-03 | 2.599194e-03 |
| average length | 99 | 99 |
| average quality | 36.3 | 36.3 |
| insert size average | 1,536.7 | 1,536.7 |
| insert size std dev | 2,263.8 | 2,263.8 |

Small dataset (test, chr6 subset)

| Metric | Samtools | RustQC |
| --- | --- | --- |
| sequences | 49,573 | 49,573 |
| reads mapped | 49,573 | 49,573 |
| reads duplicated | 6,097 | 6,097 |
| reads unmapped | 0 | 0 |
| reads properly paired | 49,546 | 49,546 |
| error rate | 5.750919e-03 | 5.750919e-03 |
| average length | 145 | 145 |
| average quality | 38.6 | 38.6 |
| insert size average | 1,625.9 | 1,625.9 |
| insert size std dev | 2,305.4 | 2,305.4 |

RustQC produces the full samtools stats output, including the SN section and all histogram sections (FFQ, LFQ, GCF, GCL, GCC, GCT, FBC, LBC, FTC, LTC, IS, RL, FRL, LRL, MAPQ, ID, IC, COV, GCD, CHK). The output format includes the samtools stats header comment required for MultiQC parsing.

Each output (flagstat, idxstats, stats) can be individually enabled or disabled. See the Configuration page for details.

  • Samtools: Danecek P, Bonfield JK, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. Samtools website