Example Workflows¶
Complete examples for common protein design use cases.
Example 1: Protein Binder Design¶
Design a protein to bind EGFR using a pre-made design specification.
Create Design YAML¶
egfr_protein_design.yaml
# Boltzgen design specification for protein binder
entities:
  # Designed protein entity
  - protein:
      id: C
      sequence: 80..120 # Length range for designed protein
  # Target structure entity
  - file:
      path: egfr_structure.cif
      include:
        - chain:
            id: A # Target chain to bind
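When several targets need near-identical specifications, the YAML can be generated rather than hand-edited. The following is a minimal sketch, not part of the pipeline: `render_design_yaml` is a hypothetical helper, and it assumes the specification uses the same keys as the file above with conventional nested YAML indentation.

```python
def render_design_yaml(binder_id, min_len, max_len, structure_path, target_chain):
    """Return design-spec YAML text using the keys from the example above."""
    if not 1 <= min_len <= max_len:
        raise ValueError("expected 1 <= min_len <= max_len")
    lines = [
        "entities:",
        "  - protein:",
        f"      id: {binder_id}",
        f"      sequence: {min_len}..{max_len}",
        "  - file:",
        f"      path: {structure_path}",
        "      include:",
        "        - chain:",
        f"            id: {target_chain}",
    ]
    return "\n".join(lines) + "\n"


# Reproduce the EGFR specification (without comments).
spec = render_design_yaml("C", 80, 120, "egfr_structure.cif", "A")
print(spec, end="")
```

Writing `spec` to `egfr_protein_design.yaml` would give a file equivalent to the one above.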
Create Samplesheet¶
egfr_samplesheet.csv
sample_id,design_yaml,structure_files,protocol,num_designs,budget
egfr_binder,egfr_protein_design.yaml,egfr_structure.cif,protein-anything,100,10
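Before launching, it can help to sanity-check the samplesheet. The sketch below is not part of the pipeline: `validate_samplesheet` is a hypothetical helper, and it assumes the six-column header shown above and that `num_designs` and `budget` must be positive integers.

```python
import csv
import io

# Column names taken from the samplesheet header shown above.
EXPECTED_COLUMNS = ["sample_id", "design_yaml", "structure_files",
                    "protocol", "num_designs", "budget"]


def validate_samplesheet(text):
    """Return a list of human-readable problems found in samplesheet text."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):
        sample_id = row["sample_id"]
        if sample_id in seen_ids:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen_ids.add(sample_id)
        for column in ("num_designs", "budget"):
            value = row[column] or ""  # missing trailing fields come back as None
            if not value.isdigit() or int(value) < 1:
                problems.append(
                    f"line {line_no}: {column} must be a positive integer, got {value!r}"
                )
    return problems


sheet = (
    "sample_id,design_yaml,structure_files,protocol,num_designs,budget\n"
    "egfr_binder,egfr_protein_design.yaml,egfr_structure.cif,protein-anything,100,10\n"
)
print(validate_samplesheet(sheet) or "samplesheet looks consistent")
```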
Run Pipeline¶
nextflow run seqeralabs/nf-proteindesign \
-profile docker \
--input egfr_samplesheet.csv \
--outdir egfr_designs \
--run_proteinmpnn \
--run_ipsae \
--run_prodigy \
--run_consolidation
Analyze Results¶
import pandas as pd
# Load consolidated metrics
results = pd.read_csv('egfr_designs/egfr_binder/consolidated_metrics.csv')
# Find top 5 candidates by binding affinity
top5 = results.nsmallest(5, 'prodigy_delta_g')
print(top5[['design_file', 'prodigy_delta_g', 'prodigy_kd', 'ipsae_score']])
Example 2: Peptide Binder Design¶
Design peptide binders for a target protein.
Create Design YAML¶
peptide_design.yaml
# Boltzgen design specification for peptide binder
entities:
  # Designed peptide entity
  - protein:
      id: P
      sequence: 12..25 # Peptide length range
  # Target structure
  - file:
      path: target.cif
      include:
        - chain:
            id: A
Create Samplesheet¶
peptide_samplesheet.csv
sample_id,design_yaml,structure_files,protocol,num_designs,budget
peptide_binder,peptide_design.yaml,target.cif,peptide-anything,100,10
Run Pipeline¶
nextflow run seqeralabs/nf-proteindesign \
-profile docker \
--input peptide_samplesheet.csv \
--protocol peptide-anything \
--outdir peptide_designs
Example 3: Nanobody Design¶
Design nanobodies to bind a specific target.
Create Design YAML¶
nanobody_design.yaml
# Boltzgen design specification for nanobody
entities:
  # Designed nanobody entity
  - protein:
      id: N
      sequence: 110..130 # Typical nanobody length range
  # Target structure
  - file:
      path: antigen.cif
      include:
        - chain:
            id: A
Create Samplesheet¶
nanobody_samplesheet.csv
sample_id,design_yaml,structure_files,protocol,num_designs,budget
nanobody_binder,nanobody_design.yaml,antigen.cif,nanobody-anything,100,10
Run Pipeline¶
nextflow run seqeralabs/nf-proteindesign \
-profile docker \
--input nanobody_samplesheet.csv \
--protocol nanobody-anything \
--outdir nanobody_designs
Example 4: Multiple Targets¶
Design binders for multiple targets in a single run.
Create Design YAMLs¶
target1_design.yaml
entities:
  - protein:
      id: C
      sequence: 80..120
  - file:
      path: target1.cif
      include:
        - chain:
            id: A
target2_design.yaml
entities:
  - protein:
      id: C
      sequence: 60..100
  - file:
      path: target2.cif
      include:
        - chain:
            id: B
Create Samplesheet¶
multi_target_samplesheet.csv
sample_id,design_yaml,structure_files,protocol,num_designs,budget
target1_binder,target1_design.yaml,target1.cif,protein-anything,100,10
target2_binder,target2_design.yaml,target2.cif,protein-anything,100,10
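For more than a handful of targets, the samplesheet can be generated instead of typed. This is a minimal sketch under the assumption that files follow the naming pattern used above (`<target>_design.yaml`, `<target>.cif`); `build_samplesheet` is a hypothetical helper, not a pipeline utility.

```python
import csv
import io


def build_samplesheet(targets, protocol="protein-anything", num_designs=100, budget=10):
    """Render samplesheet CSV text with one row per target name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample_id", "design_yaml", "structure_files",
                     "protocol", "num_designs", "budget"])
    for name in targets:
        # Assumed naming convention, mirroring the example rows above.
        writer.writerow([f"{name}_binder", f"{name}_design.yaml", f"{name}.cif",
                         protocol, num_designs, budget])
    return buffer.getvalue()


print(build_samplesheet(["target1", "target2"]), end="")
```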
Run Pipeline¶
nextflow run seqeralabs/nf-proteindesign \
-profile docker \
--input multi_target_samplesheet.csv \
--outdir multi_designs \
--run_consolidation
Example 5: Full Analysis Pipeline¶
Complete workflow with all analysis tools enabled.
Create Samplesheet¶
full_analysis_samplesheet.csv
sample_id,design_yaml,structure_files,protocol,num_designs,budget
full_analysis,my_design.yaml,target.cif,protein-anything,200,20
Run Pipeline¶
nextflow run seqeralabs/nf-proteindesign \
-profile docker \
--input full_analysis_samplesheet.csv \
--outdir full_analysis_results \
--num_designs 200 \
--budget 20 \
--run_proteinmpnn \
--mpnn_num_seq_per_target 10 \
--run_ipsae \
--ipsae_pae_cutoff 8 \
--run_prodigy \
--run_consolidation \
--report_top_n 20
Review Consolidated Report¶
# View consolidated metrics
cat full_analysis_results/full_analysis/consolidated_metrics.csv | column -t -s,
# Count successful designs
grep "SUCCESS" full_analysis_results/full_analysis/consolidated_metrics.csv | wc -l
# Find designs with best affinity (skips the header row; assumes the affinity value is in column 3)
tail -n +2 full_analysis_results/full_analysis/consolidated_metrics.csv | sort -t',' -k3,3n | head -10
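The same review can be done by column name rather than column position, which avoids depending on the column order. The sketch below assumes the `prodigy_delta_g` and `design_file` columns used in Example 1. The `status` column and all values in the demo text are made up for illustration, and `summarize` is a hypothetical helper.

```python
import csv
import io


def summarize(csv_text, affinity_column="prodigy_delta_g", top_n=10):
    """Count rows mentioning SUCCESS and return the top_n rows by lowest affinity value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Mirror the grep above: a row counts if any field contains "SUCCESS".
    n_success = sum(
        1 for row in rows
        if any("SUCCESS" in (value or "") for value in row.values())
    )
    scored = []
    for row in rows:
        try:
            scored.append((float(row[affinity_column]), row))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a numeric affinity value
    scored.sort(key=lambda pair: pair[0])  # more negative delta G first
    return n_success, [row for _, row in scored[:top_n]]


# Made-up demo rows; real files come from --run_consolidation.
demo = (
    "design_file,status,prodigy_delta_g\n"
    "a.cif,SUCCESS,-9.5\n"
    "b.cif,FAILED,\n"
    "c.cif,SUCCESS,-11.2\n"
)
n_success, best = summarize(demo, top_n=1)
print(n_success, best[0]["design_file"])
```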
Example 6: Using Test Profiles¶
The pipeline includes built-in test profiles for quick validation.
Test Protein Design¶
nextflow run seqeralabs/nf-proteindesign \
-profile test_design_protein,docker \
--outdir test_protein_results
Test Peptide Design¶
nextflow run seqeralabs/nf-proteindesign \
-profile test_design_peptide,docker \
--outdir test_peptide_results
Test Nanobody Design¶
nextflow run seqeralabs/nf-proteindesign \
-profile test_design_nanobody,docker \
--outdir test_nanobody_results
Example 7: Seqera Platform Deployment¶
Run the pipeline on Seqera Platform with GPU compute.
Via Seqera Platform UI¶
- Navigate to your workspace
- Click "Launch Pipeline"
- Select seqeralabs/nf-proteindesign
- Upload your samplesheet to a Data Link
- Configure parameters:
  - input: Path to samplesheet in Data Link
  - outdir: Output Data Link path
  - num_designs: 100
  - budget: 10
- Select GPU-enabled compute environment
- Click "Launch"
Via Seqera CLI¶
# Launch the pipeline; pipeline parameters such as outdir
# (e.g. s3://your-bucket/results) are read from params.json
tw launch seqeralabs/nf-proteindesign \
--workspace <your-workspace> \
--compute-env <gpu-compute-env> \
--params-file params.json
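The `params.json` file referenced above is a plain JSON object of pipeline parameters. A minimal sketch for producing one, using the parameter names listed in the UI steps; the `input` path is a placeholder, and `render_params` is a hypothetical helper:

```python
import json

# Parameter names follow the list in the UI steps; the paths are placeholders.
params = {
    "input": "s3://your-bucket/samplesheet.csv",
    "outdir": "s3://your-bucket/results",
    "num_designs": 100,
    "budget": 10,
}


def render_params(values):
    """Serialize pipeline parameters as indented JSON text."""
    return json.dumps(values, indent=2)


print(render_params(params))
```

Redirecting the output to `params.json` gives a file suitable for `--params-file`.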
Output Files¶
After pipeline completion, you'll find:
results/
└── {sample_id}/
├── boltzgen/
│ ├── final_ranked_designs/
│ │ ├── design_1.cif # Top ranked design
│ │ ├── design_2.cif
│ │ └── ...
│ └── intermediate_designs/
│ └── *.cif
├── proteinmpnn/ # If --run_proteinmpnn enabled
│ ├── design_1_sequences.fa
│ └── ...
├── ipsae/ # If --run_ipsae enabled
│ ├── design_1_ipsae_scores.csv
│ └── ...
├── prodigy/ # If --run_prodigy enabled
│ ├── design_1_prodigy_summary.csv
│ └── ...
└── consolidated_metrics.csv # If --run_consolidation enabled
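A quick way to confirm which optional stages produced output is to check for the directories in the tree above. This sketch assumes exactly that layout. `summarize_outputs` is a hypothetical helper, and the demo builds a throwaway directory so it runs without real results.

```python
import tempfile
from pathlib import Path


def summarize_outputs(sample_dir):
    """Report which of the documented outputs exist under one sample directory."""
    sample_dir = Path(sample_dir)
    ranked_dir = sample_dir / "boltzgen" / "final_ranked_designs"
    return {
        "ranked_designs": len(list(ranked_dir.glob("*.cif"))),
        "proteinmpnn": (sample_dir / "proteinmpnn").is_dir(),
        "ipsae": (sample_dir / "ipsae").is_dir(),
        "prodigy": (sample_dir / "prodigy").is_dir(),
        "consolidated_metrics": (sample_dir / "consolidated_metrics.csv").is_file(),
    }


# Demo on a temporary directory that mimics part of the tree above.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "egfr_binder"
    ranked = sample / "boltzgen" / "final_ranked_designs"
    ranked.mkdir(parents=True)
    (ranked / "design_1.cif").write_text("")
    (sample / "consolidated_metrics.csv").write_text("")
    summary = summarize_outputs(sample)

print(summary)
```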
Tips and Best Practices¶
Design YAML Tips¶
- Length ranges: Use 80..120 syntax for flexible design lengths
- Multiple chains: Specify multiple target chains for complex interfaces
- Chain IDs: Use descriptive chain IDs (A, B, C, etc.)
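To catch typos in a length range before submitting a run, the `min..max` form can be checked with a few lines of code. This sketch handles only the simple two-number form shown in the examples; `parse_length_range` is a hypothetical helper, and it makes no claim about other sequence syntaxes the design tool may accept.

```python
def parse_length_range(spec):
    """Parse a 'min..max' length range such as '80..120' into (min, max)."""
    low_text, separator, high_text = spec.strip().partition("..")
    if not separator:
        raise ValueError(f"expected 'min..max', got {spec!r}")
    low, high = int(low_text), int(high_text)  # int() raises ValueError on bad input
    if low < 1 or high < low:
        raise ValueError(f"invalid length range: {spec!r}")
    return low, high


print(parse_length_range("80..120"))
```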
Parameter Tuning¶
- Quick tests: Start with num_designs=10, budget=5 for fast validation
- Production runs: Use num_designs=100-200, budget=10-20 for quality results
- Large campaigns: Increase to num_designs=200+, budget=50+ for diversity
Resource Optimization¶
- GPU memory: Ensure 16 GB+ VRAM for standard runs
- Caching: Use --cache_dir to avoid re-downloading model weights
- Resume: Always use the -resume flag to recover from interruptions
Analysis Workflow¶
- Run Boltzgen to generate initial designs
- Enable ProteinMPNN for sequence optimization
- Use IPSAE for interface quality scoring
- Apply PRODIGY for binding affinity prediction
- Review consolidated metrics for top candidates
- Select top designs for experimental validation
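The last two steps can be sketched as a filter followed by a ranking: keep designs whose interface score passes a cutoff, then order them by predicted affinity. The column names come from Example 1. The 0.5 cutoff, the `shortlist` helper, and the demo values are illustrative assumptions, not recommendations from the pipeline.

```python
import csv
import io


def shortlist(csv_text, min_ipsae=0.5, top_n=5):
    """Keep designs with ipsae_score >= min_ipsae, then rank by prodigy_delta_g."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            name = row["design_file"]
            ipsae = float(row["ipsae_score"])
            delta_g = float(row["prodigy_delta_g"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric metrics
        if ipsae >= min_ipsae:
            kept.append((delta_g, name))
    kept.sort(key=lambda item: item[0])  # more negative delta G ranks first
    return [name for _, name in kept[:top_n]]


# Made-up demo values for illustration only.
demo = (
    "design_file,prodigy_delta_g,ipsae_score\n"
    "d1.cif,-12.0,0.30\n"
    "d2.cif,-10.5,0.72\n"
    "d3.cif,-9.0,0.81\n"
)
candidates = shortlist(demo, top_n=2)
print(candidates)
```

Here `d1.cif` has the most favorable affinity value but is dropped by the interface cutoff, which is the reason for filtering before ranking.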
Troubleshooting¶
Common Issues¶
GPU not detected:
# Verify GPU access
nvidia-smi
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Out of memory:
Pipeline fails: