Output Files Reference¶
Complete guide to understanding pipeline outputs.
Directory Structure¶
Boltzgen Outputs¶
Final Ranked Designs¶
Description: Top-ranked protein designs in CIF format.
Contents: Complete atomic coordinates for designed complexes.
Intermediate Designs¶
results/{sample}/boltzgen/intermediate_designs/
├── generation_*.cif
├── inverse_fold_*.cif
└── refold_*.cif
Description: Intermediate structures from the generation, inverse-folding, and refolding stages of the design pipeline.
Log Files¶
Description: Complete execution log with design metrics.
PRODIGY Outputs¶
Summary CSV¶
Format:
sample_id,design_file,delta_g,kd,temperature,bsa,ics,charged_residues,charged_percentage,apolar_residues,apolar_percentage
sample1,design_1.cif,-11.2,5.4e-09,25.0,1543.21,89,15,16.85,48,53.93
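As a rough sanity check on the summary columns, the reported kd can be compared with the value implied by delta_g through the relation ΔG = RT ln(Kd). The sketch below assumes delta_g is in kcal/mol and temperature is in degrees Celsius, which the CSV itself does not state; the helper name `kd_from_delta_g` is hypothetical and not part of the pipeline.

```python
import csv
import io
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)

# Example rows copied from the format shown above.
SUMMARY_CSV = """\
sample_id,design_file,delta_g,kd,temperature,bsa,ics,charged_residues,charged_percentage,apolar_residues,apolar_percentage
sample1,design_1.cif,-11.2,5.4e-09,25.0,1543.21,89,15,16.85,48,53.93
"""


def kd_from_delta_g(delta_g_kcal, temperature_c):
    """Dissociation constant (M) implied by dG = RT ln(Kd).

    Assumes delta_g in kcal/mol and temperature in degrees Celsius.
    """
    temperature_k = temperature_c + 273.15
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


rows = list(csv.DictReader(io.StringIO(SUMMARY_CSV)))
for row in rows:
    implied = kd_from_delta_g(float(row["delta_g"]), float(row["temperature"]))
    reported = float(row["kd"])
    print(f"{row['design_file']}: reported Kd={reported:.2e} M, implied Kd={implied:.2e} M")
```

Small differences between the two values are expected because delta_g is rounded in the CSV. A gap of several orders of magnitude would more likely point to a unit mismatch.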
Full Results¶
Description: Complete PRODIGY output with all metrics.
ipSAE Outputs¶
Format:
design_id,interface_area,shape_comp,contact_density,h_bonds,salt_bridges,hydrophobic
design_1,1543.2,0.68,0.045,12,3,28
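The interface metrics above can be converted to numeric types and filtered in a few lines. This is a minimal sketch: the function names and the threshold values are illustrative assumptions, not recommendations from the pipeline.

```python
import csv
import io

# Example rows copied from the format shown above.
INTERFACE_CSV = """\
design_id,interface_area,shape_comp,contact_density,h_bonds,salt_bridges,hydrophobic
design_1,1543.2,0.68,0.045,12,3,28
"""

# Columns that hold counts; every other metric is parsed as a float.
INT_COLUMNS = {"h_bonds", "salt_bridges", "hydrophobic"}


def load_interface_rows(text):
    """Parse the CSV text into dicts with numeric metric values."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"design_id": raw["design_id"]}
        for key, value in raw.items():
            if key != "design_id":
                row[key] = int(value) if key in INT_COLUMNS else float(value)
        rows.append(row)
    return rows


def filter_designs(rows, min_shape_comp=0.6, min_h_bonds=5):
    """Keep designs meeting both thresholds (the defaults are hypothetical)."""
    return [
        row["design_id"]
        for row in rows
        if row["shape_comp"] >= min_shape_comp and row["h_bonds"] >= min_h_bonds
    ]


interface_rows = load_interface_rows(INTERFACE_CSV)
print(filter_designs(interface_rows))
```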
├── pockets/
│ ├── {sample}_pocket1.pdb
│ ├── {sample}_pocket2.pdb
│ └── ...
├── visualizations/
│ └── {sample}_pockets.pml
└── {sample}_predictions.csv
Predictions CSV¶
Format:
rank,score,size,center_x,center_y,center_z,residues
1,0.85,42,12.3,45.6,78.9,"10,11,12,45,46,47"
2,0.72,38,23.4,56.7,89.0,"20,21,22,65,66,67"
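The residues column is a quoted, comma-separated list inside a CSV field, so it needs a second split after the row is parsed. The sketch below shows one way to do that with the standard library. It assumes residue entries are plain integers without chain identifiers; `parse_predictions` is a hypothetical helper name.

```python
import csv
import io

# Example rows copied from the format shown above.
PREDICTIONS_CSV = '''\
rank,score,size,center_x,center_y,center_z,residues
1,0.85,42,12.3,45.6,78.9,"10,11,12,45,46,47"
2,0.72,38,23.4,56.7,89.0,"20,21,22,65,66,67"
'''


def parse_predictions(text):
    """Return one dict per pocket with typed fields and a residue list."""
    pockets = []
    for row in csv.DictReader(io.StringIO(text)):
        pockets.append({
            "rank": int(row["rank"]),
            "score": float(row["score"]),
            "size": int(row["size"]),
            "center": (
                float(row["center_x"]),
                float(row["center_y"]),
                float(row["center_z"]),
            ),
            # The csv module strips the quotes; split the inner list here.
            "residues": [int(r) for r in row["residues"].split(",") if r.strip()],
        })
    return pockets


pockets = parse_predictions(PREDICTIONS_CSV)
for pocket in pockets:
    print(pocket["rank"], pocket["score"], pocket["residues"])
```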
Target Mode Outputs¶
Generated Designs¶
results/{sample}/design_variants/
├── {sample}_len60_v1.yaml
├── {sample}_len60_v2.yaml
├── {sample}_len80_v1.yaml
└── ...
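The variant file names above appear to encode the sample name, the designed length, and a variant number. The sketch below recovers those parts with a regular expression. The pattern is inferred only from the example names listed here, so treat it as an assumption; `parse_variant_name` and `Variant` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from names such as {sample}_len60_v1.yaml.
VARIANT_PATTERN = re.compile(
    r"^(?P<sample>.+)_len(?P<length>\d+)_v(?P<version>\d+)\.yaml$"
)


class Variant(NamedTuple):
    sample: str
    length: int
    version: int


def parse_variant_name(filename: str) -> Optional[Variant]:
    """Split a variant file name into its parts, or return None if it does not match."""
    match = VARIANT_PATTERN.match(filename)
    if match is None:
        return None
    return Variant(match["sample"], int(match["length"]), int(match["version"]))


print(parse_variant_name("sample1_len60_v1.yaml"))
print(parse_variant_name("notes.txt"))
```

Because the sample group is greedy, sample names that themselves contain underscores are still handled: the match backtracks to the last `_len<digits>_v<digits>` suffix.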
Design Info¶
Contents: Summary of generated design variants.
Pipeline Info¶
Execution Report¶
Description: Interactive HTML report with:
- Pipeline execution summary
- Resource usage statistics
- Process completion status
- Error reports
Execution Timeline¶
Description: Visual timeline of process execution.
Execution Trace¶
Format: TSV file with detailed process information:
task_id  hash     native_id  name          status     exit  submit               duration  realtime  %cpu   rss      vmem
1        ab/cd12  12345      BOLTZGEN_RUN  COMPLETED  0     2024-01-15 10:00:00  1h 23m    1h 21m    95.2%  16.2 GB  24.1 GB
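Since the trace is described as a TSV file, it can be read with the standard `csv` module using a tab delimiter. The sketch below tallies tasks per status and lists any task whose status is not COMPLETED. It assumes real tab characters separate the columns and that the header names match those shown above; `summarize_trace` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# The example row from above, written with explicit tab separators.
TRACE_TSV = (
    "task_id\thash\tnative_id\tname\tstatus\texit\tsubmit\tduration\trealtime\t%cpu\trss\tvmem\n"
    "1\tab/cd12\t12345\tBOLTZGEN_RUN\tCOMPLETED\t0\t2024-01-15 10:00:00\t1h 23m\t1h 21m\t95.2%\t16.2 GB\t24.1 GB\n"
)


def summarize_trace(handle):
    """Count tasks per status and collect names of tasks that are not COMPLETED."""
    counts = Counter()
    not_completed = []
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[row["status"]] += 1
        if row["status"] != "COMPLETED":
            not_completed.append(row["name"])
    return counts, not_completed


counts, not_completed = summarize_trace(io.StringIO(TRACE_TSV))
print(dict(counts), not_completed)
```

For a real run, pass an open file handle for the trace file instead of the in-memory example.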
File Formats¶
CIF Files¶
Description: Crystallographic Information File format
Usage:
YAML Files¶
Description: Design specifications
Example:
name: design1
target:
  structure: target.pdb
  residues: [10, 11, 12]
designed:
  chain_type: protein
  length: [60, 100]
CSV Files¶
Description: Comma-separated analysis results
Usage:
Result Organization¶
By Sample¶
All outputs for each sample are grouped together:
By Analysis Type¶
Within each sample, outputs are organized by analysis type:
{sample}/
├── boltzgen/ # Primary designs
├── prodigy/ # Binding affinity
└── ipsae/ # Interface scoring
Accessing Results¶
Command Line¶
# List all final designs
find results/ -name "*.cif" -path "*/final_ranked_designs/*"
# Get best PRODIGY scores (most negative delta_g first)
cat results/*/prodigy/*_summary.csv | \
  grep -v "sample_id" | \
  sort -t',' -k3,3n | \
  head -5
# Count successful designs
find results/ -name "design_*.cif" | wc -l
Python¶
from pathlib import Path

import pandas as pd

# Load every PRODIGY summary CSV found under results/
results = []
for csv_path in Path('results').rglob('*_prodigy_summary.csv'):
    results.append(pd.read_csv(csv_path))

# Combine into one table and show the ten most negative delta_g values
combined = pd.concat(results, ignore_index=True)
print(combined.nsmallest(10, 'delta_g'))
R¶
library(tidyverse)

# Load PRODIGY results (pattern is a regular expression, not a shell glob)
results <- list.files(
  "results",
  pattern = "_summary\\.csv$",
  recursive = TRUE,
  full.names = TRUE
) %>%
  map_df(read_csv)

# Analyze: ten designs with the most negative delta_g
results %>%
  arrange(delta_g) %>%
  head(10)
Quality Control¶
Check Completion¶
# Count completed tasks (the trace has one row per task, not per sample)
grep -c "COMPLETED" results/pipeline_info/execution_trace.txt
# Check for failures
grep "FAILED" results/pipeline_info/execution_trace.txt
Validate Outputs¶
# Ensure all expected files exist
for sample in sample1 sample2; do
  if [ ! -d "results/${sample}/boltzgen/final_ranked_designs" ]; then
    echo "Missing designs for ${sample}"
  fi
done
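The shell loop above only checks that the directory exists. A slightly stricter variant, sketched here in Python, also treats a directory with no `.cif` files as missing. The function name `find_missing_designs` and the sample names are placeholders, and the demonstration runs against a temporary directory tree rather than real pipeline output.

```python
import tempfile
from pathlib import Path


def find_missing_designs(results_dir, samples):
    """Return samples with no .cif file under boltzgen/final_ranked_designs."""
    missing = []
    for sample in samples:
        design_dir = Path(results_dir) / sample / "boltzgen" / "final_ranked_designs"
        # A missing directory and an empty directory are both reported.
        if not design_dir.is_dir() or not any(design_dir.glob("*.cif")):
            missing.append(sample)
    return missing


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp) / "sample1" / "boltzgen" / "final_ranked_designs"
    complete.mkdir(parents=True)
    (complete / "design_1.cif").write_text("data_design_1\n")
    # sample2 has the directory but no designs; sample3 has nothing at all.
    (Path(tmp) / "sample2" / "boltzgen" / "final_ranked_designs").mkdir(parents=True)
    missing = find_missing_designs(tmp, ["sample1", "sample2", "sample3"])

print(missing)
```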
Export Results¶
Archive for Publication¶
# Create archive of final results
tar -czf protein_designs.tar.gz \
  results/*/boltzgen/final_ranked_designs/ \
  results/*/prodigy/*_summary.csv \
  results/pipeline_info/execution_report.html
Upload to Repository¶
Next Steps¶
Reproducibility
Always save the execution report and trace files for reproducibility and troubleshooting.