Configuration File
The rustqc rna subcommand supports an optional YAML configuration file for
advanced settings that go beyond what CLI flags offer. Pass the config file with
--config / -c:

```bash
rustqc rna sample.bam --gtf genes.gtf -p -c config.yaml -o results/
```

All sections and fields are optional. Missing fields use their default values. Unknown fields are silently ignored, so config files remain forward-compatible.
The configuration file mirrors the CLI subcommand hierarchy. All RNA-Seq QC
settings live under a top-level rna: key, matching the rustqc rna subcommand.
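
As a minimal sketch of the points above, a config file can set only the fields it needs and leave everything else at its default. The file below is hypothetical and its values are illustrative, not recommendations:

```yaml
# Hypothetical minimal config.yaml: only two settings are overridden.
rna:
  paired: true
  junction_annotation:
    min_intron: 100   # illustrative value; the documented default is 50
```

Every section and field not listed here keeps its default value.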
Full example

```yaml
rna:
  # Library settings
  stranded: unstranded          # unstranded, forward, or reverse
  paired: true                  # Paired-end mode

  # Chromosome name remapping
  chromosome_prefix: "chr"
  chromosome_mapping:
    chrM: "MT"

  # Output settings
  sample_name: "MySample"       # Override BAM-derived sample name for output filenames
  flat_output: false            # Set to true to skip subdirectories

  # dupRadar output toggles
  dupradar:
    dup_matrix: true
    intercept_slope: true
    density_scatter_plot: true
    boxplot: true
    expression_histogram: true
    multiqc_intercept: true
    multiqc_curve: true

  # featureCounts output toggles
  featurecounts:
    counts_file: true
    summary_file: true
    biotype_summary_file: true
    biotype_counts: true
    biotype_counts_mqc: true
    biotype_rrna_mqc: true
    biotype_attribute: "gene_biotype"

  # RSeQC tool toggles and settings
  bam_stat:
    enabled: true
  infer_experiment:
    enabled: true
    sample_size: 200000
  read_duplication:
    enabled: true
  read_distribution:
    enabled: true
  junction_annotation:
    enabled: true
    min_intron: 50
  junction_saturation:
    enabled: true
    seed: 42
    min_coverage: 1
    percentile_floor: 5
    percentile_ceiling: 100
    percentile_step: 5
  inner_distance:
    enabled: true
    sample_size: 1000000
    lower_bound: -250
    upper_bound: 250
    step: 5

  # TIN (Transcript Integrity Number)
  tin:
    enabled: true
    seed: 12345
    sample_size: 100
    min_coverage: 10

  # Qualimap RNA-Seq QC
  qualimap:
    enabled: true

  # Library complexity (preseq lc_extrap)
  preseq:
    enabled: true
    max_extrap: 10000000000
    step_size: 1000000
    n_bootstraps: 100
    confidence_level: 0.95
    seed: 408
    max_terms: 100
    max_segment_length: 100000000
    defects: false

  # Samtools-compatible outputs
  flagstat:
    enabled: true
  idxstats:
    enabled: true
  samtools_stats:
    enabled: true
```

Library settings
stranded
Library strandedness for strand-aware read counting:

| Value | Meaning |
|---|---|
| unstranded | Count reads on either strand |
| forward | Forward stranded (read 1 maps to the transcript strand) |
| reverse | Reverse stranded (read 2 maps to the transcript strand) |
Default: unstranded
This can also be set via the -s / --stranded CLI flag, which takes
precedence over the config file value.
paired
Enable paired-end mode. When true, read pairs are counted as a single
fragment.
Default: false (single-end mode)
This can also be set via the -p / --paired CLI flag. If either the CLI flag
or the config file value is true, paired-end mode is enabled.

```yaml
rna:
  stranded: reverse
  paired: true
```

Output settings
sample_name
Override the sample name used for output filenames. By default, the sample name
is derived from the BAM file stem (e.g., sample.markdup.sorted.bam produces
files named sample.markdup.sorted.*). When set, the provided name is used
instead.
Default: derived from BAM filename
This can also be set via the --sample-name CLI flag, which takes precedence
over the config file value. Can only be used with a single input file.

```yaml
rna:
  sample_name: "MySample"
```

Chromosome name mapping
When the chromosome names in your alignment file differ from those in the GTF, RustQC can remap them automatically. Two mechanisms are available, and they can be combined.
chromosome_prefix
A string prefix to prepend to alignment file chromosome names before matching against GTF names. Applied first, before explicit mapping lookups.

```yaml
rna:
  # Alignment has "1", "2", "X"; GTF has "chr1", "chr2", "chrX"
  chromosome_prefix: "chr"
```

chromosome_mapping
An explicit mapping from GTF chromosome names (keys) to alignment file chromosome names (values). Applied after the prefix, so explicit entries can override the prefix for specific chromosomes.

```yaml
rna:
  # After adding "chr" prefix, override the mitochondrial chromosome
  chromosome_mapping:
    chrM: "MT"
```

A common use case is GENCODE GTFs (which use chr1, chr2, …) with Ensembl
alignments (which use 1, 2, …):

```yaml
rna:
  chromosome_prefix: "chr"
  chromosome_mapping:
    chrM: "MT"
```

dupRadar output toggles
The dupradar: section controls which dupRadar output files are generated.
All outputs are enabled by default.

```yaml
rna:
  dupradar:
    dup_matrix: true              # Duplication matrix TSV
    intercept_slope: true         # Intercept/slope fit results
    density_scatter_plot: true    # Density scatter plot (PNG + SVG)
    boxplot: true                 # Duplication rate boxplot (PNG + SVG)
    expression_histogram: true    # Expression histogram (PNG + SVG)
    multiqc_intercept: true       # MultiQC intercept/slope file
    multiqc_curve: true           # MultiQC fitted curve file
```

Set any field to false to skip generating that output:

```yaml
rna:
  dupradar:
    boxplot: false
    expression_histogram: false
```

featureCounts output toggles
The featurecounts: section controls which featureCounts-compatible output files
are generated, plus the biotype attribute setting.

```yaml
rna:
  featurecounts:
    counts_file: true                   # featureCounts-compatible counts TSV
    summary_file: true                  # Gene-level assignment summary
    biotype_summary_file: true          # Biotype-level assignment summary
    biotype_counts: true                # Biotype counts TSV
    biotype_counts_mqc: true            # Biotype counts MultiQC bargraph file
    biotype_rrna_mqc: true              # Biotype rRNA percentage MultiQC file
    biotype_attribute: "gene_biotype"   # GTF attribute for biotype grouping
```

biotype_attribute
The GTF attribute name used for biotype grouping. This controls how genes are categorized in the biotype output files.

| GTF source | Typical attribute |
|---|---|
| Ensembl | gene_biotype |
| GENCODE | gene_type |
Default: "gene_biotype"
This can also be set via the --biotype-attribute CLI flag, which takes
precedence over the config file value.
RustQC auto-detects the biotype attribute if the specified one is not found in
the GTF. If neither gene_biotype nor gene_type is present, a warning is
printed and biotype counting is skipped.
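
For instance, with a GENCODE annotation the attribute could be set explicitly rather than relying on auto-detection. A minimal sketch based on the table above:

```yaml
rna:
  featurecounts:
    biotype_attribute: "gene_type"   # GENCODE-style attribute name, per the table above
```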
RSeQC tool settings
Each of the 8 RSeQC tools has an enabled toggle (default true) and
tool-specific parameter overrides. Disabling a tool here prevents it from
running even when annotation is provided. CLI flags take precedence over
config file values for all parameters.
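
As an illustration, the following sketch turns off two tools and leaves the rest at their defaults. The choice of tools here is arbitrary:

```yaml
rna:
  read_duplication:
    enabled: false      # skip read_duplication
  junction_saturation:
    enabled: false      # skip junction_saturation
```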
bam_stat

```yaml
rna:
  bam_stat:
    enabled: true   # Set to false to skip bam_stat
```

No additional parameters. This tool does not require annotation.
infer_experiment

```yaml
rna:
  infer_experiment:
    enabled: true
    sample_size: 200000   # Number of reads to sample (default: 200000)
```

Requires annotation (--gtf). The sample_size can also be set via
--infer-experiment-sample-size.
read_duplication

```yaml
rna:
  read_duplication:
    enabled: true   # Set to false to skip read_duplication
```

No additional parameters. This tool does not require annotation.
CLI shortcut: Use --skip-read-duplication to disable without a config file.
read_distribution

```yaml
rna:
  read_distribution:
    enabled: true   # Set to false to skip read_distribution
```

No additional parameters. Requires annotation (--gtf).
junction_annotation

```yaml
rna:
  junction_annotation:
    enabled: true
    min_intron: 50   # Minimum intron length in bases (default: 50)
```

Requires annotation (--gtf). The min_intron can also be set via --min-intron.
junction_saturation

```yaml
rna:
  junction_saturation:
    enabled: true
    seed: 42                  # Random seed for reproducible sampling (default: 42)
    min_coverage: 1           # Minimum read count to consider a junction (default: 1)
    percentile_floor: 5       # Sampling start percentage (default: 5)
    percentile_ceiling: 100   # Sampling end percentage (default: 100)
    percentile_step: 5        # Sampling step size (default: 5)
```

Requires annotation (--gtf). These parameters can also be set via CLI flags:
--junction-saturation-seed, --junction-saturation-min-coverage,
--junction-saturation-percentile-floor, --junction-saturation-percentile-ceiling,
--junction-saturation-percentile-step.
inner_distance

```yaml
rna:
  inner_distance:
    enabled: true
    sample_size: 1000000   # Number of reads to sample (default: 1000000)
    lower_bound: -250      # Histogram lower bound (default: -250)
    upper_bound: 250       # Histogram upper bound (default: 250)
    step: 5                # Histogram bin width (default: 5)
```

Requires annotation (--gtf). These parameters can also be set via CLI flags:
--inner-distance-sample-size, --inner-distance-lower-bound,
--inner-distance-upper-bound, --inner-distance-step.
tin

```yaml
rna:
  tin:
    enabled: true
    seed: 12345        # Random seed for reproducible results (default: none, non-deterministic)
    sample_size: 100   # Equally-spaced positions to sample per transcript (default: 100)
    min_coverage: 10   # Minimum read-start count to compute TIN (default: 10)
```

Requires annotation (--gtf). The TIN (Transcript Integrity Number)
measures transcript integrity via Shannon entropy of read coverage uniformity.
The seed can also be set via --tin-seed.
CLI shortcut: Use --skip-tin to disable without a config file.
qualimap

```yaml
rna:
  qualimap:
    enabled: true   # Set to false to skip Qualimap RNA-Seq QC analysis
```

Requires annotation (--gtf only). Runs the Qualimap RNA-Seq QC analysis:
gene body coverage profiling (100 percentile bins, 5’ to 3’), 5’/3’ bias metrics,
read origin classification (exonic/intronic/intergenic), strand-specificity
estimation, and splice junction motif counting. Produces Qualimap-compatible
output files parseable by MultiQC.
preseq

```yaml
rna:
  preseq:
    enabled: true
    max_extrap: 10000000000         # Maximum extrapolation depth (default: 1e10)
    step_size: 1000000              # Step size between extrapolation points (default: 1e6)
    n_bootstraps: 100               # Bootstrap replicates for confidence intervals (default: 100)
    confidence_level: 0.95          # CI confidence level (default: 0.95)
    seed: 408                       # Random seed for reproducibility (default: 408, matching upstream preseq)
    max_terms: 100                  # Maximum terms in power series (default: 100)
    max_segment_length: 100000000   # Max merged PE fragment length in bp (default: 1e8)
    defects: false                  # Use defects model for problematic histograms (default: false)
```

Only needs BAM fragment info (no annotation required). The max_extrap,
step_size, n_bootstraps, seed, and max_segment_length can also be set via CLI
flags (--preseq-max-extrap, --preseq-step-size, --preseq-n-bootstraps,
--preseq-seed, --preseq-seg-len). Use --skip-preseq to disable entirely.
samtools
flagstat / idxstats / samtools_stats

```yaml
rna:
  flagstat:
    enabled: true   # samtools flagstat-compatible output
  idxstats:
    enabled: true   # samtools idxstats-compatible output
  samtools_stats:
    enabled: true   # samtools stats compatible output (full format including all histogram sections)
```

These produce samtools-compatible output files in the samtools/ subdirectory.
They share the same BAM statistics accumulator as bam_stat; enabling any of
them ensures the statistics are collected.