Quick Start Guide
Get up and running with nf-proteindesign in minutes!
Prerequisites
Before running the pipeline, ensure you have:
Required Software
- Nextflow (>=23.04.0)
- Container Engine:
  - Docker (required)
Hardware Requirements
GPU Required
Boltzgen needs an NVIDIA GPU with CUDA support to run in a reasonable time. CPU execution is possible but extremely slow.
- GPU: NVIDIA GPU with CUDA 11.8+ support
- Memory: 16GB RAM minimum, 32GB+ recommended
- Storage: 50GB+ for dependencies and outputs
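A quick pre-flight check can confirm these prerequisites before the first run. The sketch below is not part of the pipeline: the helper names `version_ge` and `check_tool` are made up for illustration, and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# version_ge A B succeeds when version A >= version B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_tool NAME reports whether NAME is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 found"
  else
    echo "missing: $1"
  fi
}

check_tool nextflow
check_tool docker
check_tool nvidia-smi

# Compare the installed Nextflow version against the documented minimum.
if command -v nextflow >/dev/null 2>&1; then
  nf_version="$(nextflow -version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
  if version_ge "$nf_version" "23.04.0"; then
    echo "ok: Nextflow $nf_version"
  else
    echo "too old: Nextflow $nf_version (need >= 23.04.0)"
  fi
fi
```

The script only reports what it finds; it does not install anything or check RAM and disk space.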
Prepare Input Files
1. Design YAML Files (Design Mode)
Create a design specification file following Boltzgen format:
name: antibody_design_example
target:
  structure: data/target_protein.pdb
  residues: [10, 11, 12, 45, 46, 47, 89] # Binding site residues
designed:
  chain_type: protein
  length: [50, 80] # Range of acceptable lengths
global:
  n_samples: 10
  save_traj: true
2. Create Samplesheet
Create a CSV file with your design specifications:
sample_id,design_yaml,num_designs,budget
design1,/path/to/design1.yaml,10000,20
design2,/path/to/design2.yaml,5000,10
design3,/path/to/design3.yaml,15000,30
Column descriptions:
- sample_id: Unique identifier for the design
- design_yaml: Path to the design YAML file
- num_designs: Number of intermediate designs to generate (10,000-60,000 for production)
- budget: Number of final diversity-optimized designs to keep
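Checking the samplesheet before launching can catch typos early. The following is a minimal sketch, not the pipeline's own validation: the function name `validate_samplesheet` is hypothetical, it assumes a simple CSV without quoted fields, and the rule that `budget` should not exceed `num_designs` is an assumption drawn from the column descriptions above.

```shell
# validate_samplesheet FILE: print the first problem found and return non-zero.
# Assumes a plain CSV with no quoted fields or embedded commas.
validate_samplesheet() {
  awk -F',' '
    NR == 1 {
      if ($0 != "sample_id,design_yaml,num_designs,budget") {
        print "line 1: unexpected header: " $0; exit 1
      }
      next
    }
    NF != 4 { print "line " NR ": expected 4 columns, got " NF; exit 1 }
    $1 in seen { print "line " NR ": duplicate sample_id " $1; exit 1 }
    { seen[$1] = 1 }
    $3 !~ /^[1-9][0-9]*$/ || $4 !~ /^[1-9][0-9]*$/ {
      print "line " NR ": num_designs and budget must be positive integers"; exit 1
    }
    # Assumption: keeping more final designs than were generated makes no sense.
    ($4 + 0) > ($3 + 0) { print "line " NR ": budget exceeds num_designs"; exit 1 }
  ' "$1"
}

# Usage with two rows from the example samplesheet above.
sheet="$(mktemp)"
cat > "$sheet" <<'EOF'
sample_id,design_yaml,num_designs,budget
design1,/path/to/design1.yaml,10000,20
design2,/path/to/design2.yaml,5000,10
EOF
validate_samplesheet "$sheet" && echo "samplesheet OK"
```

The check does not verify that the YAML paths exist, since they may be resolved relative to wherever the pipeline is launched.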
Running the Pipeline
Basic Execution
Choose the appropriate profile for your system:
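A minimal sketch of a basic run, assuming the `docker` profile (the only container engine listed under Prerequisites) and using only flags that appear elsewhere in this guide. The wrapper name `run_basic` is hypothetical; it exists only so the samplesheet and output directory can be swapped easily.

```shell
# run_basic [SAMPLESHEET] [OUTDIR]: launch the pipeline with no optional modules.
run_basic() {
  nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input "${1:-samplesheet.csv}" \
    --outdir "${2:-results}"
}

# Call it once Nextflow and Docker are installed, for example:
# run_basic samplesheet.csv results
```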
With Analysis Modules
Enable optional analysis steps for comprehensive quality assessment:
nextflow run seqeralabs/nf-proteindesign \
  -profile docker \
  --input samplesheet.csv \
  --outdir results \
  --run_proteinmpnn \
  --run_protenix_refold \
  --run_ipsae \
  --run_prodigy \
  --run_foldseek \
  --foldseek_database /path/to/database_dir \
  --foldseek_database_name afdb \
  --run_consolidation
Common Options
Design Parameters
Customize design generation:
# --num_designs: number of intermediate designs
# --budget:      number of final designs to keep
# --protocol:    design protocol
nextflow run seqeralabs/nf-proteindesign \
  -profile docker \
  --input samplesheet.csv \
  --outdir results \
  --num_designs 10000 \
  --budget 20 \
  --protocol protein-anything
Resource Allocation
Adjust compute resources:
nextflow run seqeralabs/nf-proteindesign \
  -profile docker \
  --input samplesheet.csv \
  --outdir results \
  --max_cpus 16 \
  --max_memory 64.GB \
  --max_time 48.h
Understanding Outputs
After successful execution, your results/ directory will contain:
results/
├── boltzgen/                  # Main Boltzgen outputs
│   ├── sample1/
│   │   ├── final_ranked_designs/
│   │   ├── intermediate_designs/
│   │   └── boltzgen.log
│   └── sample2/
│       └── ...
├── ipsae/                     # IPSAE scores (if enabled)
│   └── sample1_ipsae_scores.csv
├── prodigy/                   # PRODIGY predictions (if enabled)
│   └── sample1_prodigy_predictions.csv
├── pipeline_info/             # Execution reports
│   ├── execution_report.html
│   ├── execution_timeline.html
│   └── execution_trace.txt
└── multiqc/                   # MultiQC report (if enabled)
    └── multiqc_report.html
Final Designs
The most important files are in boltzgen/*/final_ranked_designs/. They contain your ranked protein designs, ready for experimental validation.
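To confirm that every sample produced final designs, the layout above can be walked with a short script. This is a sketch under the assumption that the directory structure matches the tree shown; the function name `count_final_designs` is hypothetical, and because the file names inside `final_ranked_designs/` are not documented here, it simply counts all files.

```shell
# count_final_designs RESULTS_DIR: print "<sample><TAB><file count>" per sample.
count_final_designs() {
  for dir in "$1"/boltzgen/*/final_ranked_designs; do
    # Skip the unexpanded glob when no sample directories exist.
    [ -d "$dir" ] || continue
    sample="$(basename "$(dirname "$dir")")"
    n="$(find "$dir" -type f | wc -l | tr -d ' ')"
    printf '%s\t%s\n' "$sample" "$n"
  done
}

count_final_designs results
```

A sample with a count of 0, or one missing from the list, is a cue to look at that sample's boltzgen.log.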
Example Workflow
Here's a complete example from start to finish:
1. Prepare Design File
name: covid_spike_binder
target:
  structure: data/spike_protein.pdb
  residues: [417, 484, 501] # RBD key residues
designed:
  chain_type: nanobody
  length: [110, 130]
global:
  n_samples: 20
  timesteps: 100
  save_traj: true
2. Create Samplesheet
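A sketch of a samplesheet for this example. The file name spike_designs.csv and the sample ID spike_nb1 are taken from the commands in the following steps; the YAML path and the num_designs and budget values are placeholders chosen for illustration, not values prescribed by the pipeline.

```shell
# Placeholder path and counts: adjust to your own design file and budget.
cat > spike_designs.csv <<'EOF'
sample_id,design_yaml,num_designs,budget
spike_nb1,designs/covid_spike_binder.yaml,10000,20
EOF
```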
3. Run Pipeline
nextflow run seqeralabs/nf-proteindesign \
  -profile docker \
  --input spike_designs.csv \
  --outdir covid_binders \
  --run_prodigy true
4. Check Results
# View execution report (open is macOS; use xdg-open on Linux)
open covid_binders/pipeline_info/execution_report.html
# Check final designs
ls covid_binders/boltzgen/spike_nb1/final_ranked_designs/
# View binding predictions
cat covid_binders/prodigy/spike_nb1_prodigy_predictions.csv
Troubleshooting
Common Issues
GPU Not Detected
Error: CUDA device not found
Solution: Ensure NVIDIA drivers are installed and Docker has GPU access:
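Both conditions can be checked from the command line. The sketch below assumes the NVIDIA Container Toolkit is what gives Docker GPU access; the function name `check_gpu` is hypothetical, and the CUDA image tag is an assumption (any CUDA base image that includes `nvidia-smi` should do).

```shell
check_gpu() {
  # 1. Host driver: nvidia-smi ships with the NVIDIA driver.
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: install the NVIDIA driver"
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader || return 1

  # 2. Docker GPU access: --gpus needs the NVIDIA Container Toolkit on the host.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found"
    return 1
  fi
  docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
}

check_gpu || echo "GPU check failed: see the messages above"
```

If the first step passes but the second fails, the driver is fine and the problem lies in Docker's GPU configuration.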
Out of Memory
Error: CUDA out of memory
Solution: Reduce batch size or number of parallel samples:
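The pipeline-specific batch-size option is not shown here. As a generic Nextflow-level sketch (not a documented nf-proteindesign setting), a custom config can limit parallelism with the `process.maxForks` directive so fewer tasks compete for GPU memory. The file name low_memory.config is hypothetical.

```shell
# Write a small Nextflow config that allows one running task per process at a time.
cat > low_memory.config <<'EOF'
process {
    maxForks = 1
}
EOF

# Then pass it to the run with Nextflow's -c option:
# nextflow run seqeralabs/nf-proteindesign \
#   -profile docker \
#   --input samplesheet.csv \
#   --outdir results \
#   -c low_memory.config
```

Because `maxForks` applies per process, tasks from different processes can still overlap; it reduces contention rather than guaranteeing a single GPU job at a time.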
Container Pull Failed
Error: Error pulling container image
Solution: Pre-pull the container images, or reuse locally cached versions.
Next Steps
Now that you're up and running:
- Learn Basic Usage: Check the Usage Guide for detailed instructions
- Optimize Parameters: See the Parameters Reference
- Enable Analysis Modules: Learn about ProteinMPNN/Protenix, PRODIGY, and ipSAE
- Advanced Usage: Explore Architecture details
Need Help?
- Check the GitHub Issues
- Review example workflows
- See the Quick Reference