nf-proteindesign
Proof of Principle Pipeline
This pipeline was developed by Seqera as a proof of principle using Seqera AI. It demonstrates the capabilities of AI-assisted bioinformatics pipeline development but should be thoroughly validated before use in production environments.
Overview
nf-proteindesign is a Nextflow pipeline for high-throughput protein design using Boltzgen, an all-atom generative diffusion model. It designs proteins, peptides, and nanobodies that bind biomolecular targets, and it provides a suite of optional downstream analysis modules for evaluating the resulting designs.
Modular Analysis Pipeline
The pipeline combines Boltzgen design with optional sequence optimization (ProteinMPNN + Boltz-2), quality assessment (ipSAE, PRODIGY, Foldseek), and unified reporting (metrics consolidation).
Analysis Modules
🧬 ProteinMPNN
Sequence optimization for designed structures with configurable sampling temperature.
--run_proteinmpnn
🔄 Boltz-2
Structure prediction for ProteinMPNN sequences to validate refolding.
--run_boltz2_refold
📊 ipSAE
Interface quality scoring for Boltzgen and Boltz-2 structures.
--run_ipsae
⚡ PRODIGY
Binding affinity prediction (ΔG and Kd) for all structures.
--run_prodigy
🔍 Foldseek
Structural similarity search in AlphaFold/Swiss-Model databases.
--run_foldseek
📈 Consolidation
Unified CSV report combining all analysis metrics.
--run_consolidation
Key Features
- Parallel Processing: Run multiple design specifications simultaneously
- YAML-Based Design: Complete control with custom design specifications
- Comprehensive Analysis: Six optional analysis modules for quality assessment
- Sequence Optimization: ProteinMPNN + Boltz-2 validation workflow
- Container Support: Full Docker compatibility
- GPU Acceleration: Optimized for NVIDIA GPU execution
- Organized Outputs: Structured results with unified reporting
Pipeline Workflow
graph TB
A[Samplesheet<br/>Design YAMLs] --> B{Boltzgen<br/>Precomputed?}
B -->|No| C[Run Boltzgen Design]
B -->|Yes| D[Use Precomputed]
C --> E[Budget Designs<br/>CIF + NPZ]
D --> E
E --> F{ProteinMPNN<br/>Enabled?}
F -->|No| Z[Boltzgen Outputs Only]
F -->|Yes| G[Sequence Optimization<br/>Parallel per Design]
G --> H{Boltz-2<br/>Enabled?}
H -->|No| Y[MPNN Sequences Only]
H -->|Yes| I[Prepare Sequences<br/>Split + Target]
I --> J[Boltz-2 Structure<br/>Prediction]
J --> K[Boltz-2 Structures<br/>CIF + NPZ]
K --> L{Analysis<br/>Modules?}
L -->|ipSAE| M[Interface Scoring]
L -->|PRODIGY| N[Binding Affinity]
L -->|Foldseek| O[Structural Search]
M --> P{Consolidate?}
N --> P
O --> P
P -->|Yes| Q[Unified CSV + HTML<br/>Report]
Q --> R[Final Results]
K --> R
Z --> R
Y --> R
style C fill:#9C27B0,stroke:#9C27B0,color:#fff
style G fill:#8E24AA,stroke:#8E24AA,color:#fff
style J fill:#7B1FA2,stroke:#7B1FA2,color:#fff
style Q fill:#6A1B9A,stroke:#6A1B9A,color:#fff
classDef note fill:#FFF3E0,stroke:#FF9800,color:#000
class L note
Analysis Requirements
ipSAE, PRODIGY, and Foldseek require both --run_proteinmpnn and --run_boltz2_refold to be enabled. These modules analyze only the Boltz-2 refolded structures, not the original Boltzgen designs.
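The dependency described above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: `check_analysis_flags` is a hypothetical helper name, and it only encodes the rule stated here (analysis flags need both `--run_proteinmpnn` and `--run_boltz2_refold`). The flag names, profile, and paths are the ones used elsewhere on this page.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not shipped with nf-proteindesign).
# Succeeds unless an analysis flag is given without both prerequisite flags.
check_analysis_flags() {
  local has_mpnn=0 has_refold=0 wants_analysis=0 arg
  for arg in "$@"; do
    case "$arg" in
      --run_proteinmpnn)   has_mpnn=1 ;;
      --run_boltz2_refold) has_refold=1 ;;
      --run_ipsae|--run_prodigy|--run_foldseek) wants_analysis=1 ;;
    esac
  done
  if (( wants_analysis == 1 )) && (( has_mpnn == 0 || has_refold == 0 )); then
    echo "ipSAE/PRODIGY/Foldseek need --run_proteinmpnn and --run_boltz2_refold" >&2
    return 1
  fi
  return 0
}

# Example: collect the module flags, validate them, then print the command
# that would be run (printed rather than executed in this sketch).
module_flags=(--run_proteinmpnn --run_boltz2_refold --run_ipsae --run_prodigy)
if check_analysis_flags "${module_flags[@]}"; then
  echo "nextflow run seqeralabs/nf-proteindesign -profile docker" \
       "--input samplesheet.csv --outdir results ${module_flags[*]}"
fi
```

Dropping `--run_boltz2_refold` from `module_flags` makes the check fail with a message on stderr, which is the situation the note above warns about.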
Quick Start
Get started with nf-proteindesign in minutes:
# 1. Install Nextflow (>=23.04.0)
curl -s https://get.nextflow.io | bash
# 2. Run the pipeline
nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input samplesheet.csv \
    --outdir results
Need Help?
Check out the Quick Start Guide for detailed setup instructions and examples.
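Because the quick start asks for Nextflow 23.04.0 or newer, it can help to confirm the installed version first. The sketch below is illustrative only: `version_ge` is a hypothetical helper, it relies on `sort -V` (available in GNU coreutils and recent BSD/macOS), and it assumes that `nextflow -version` prints a line containing `version X.Y.Z`.

```bash
#!/usr/bin/env bash
# version_ge A B: succeeds when version string A is >= version string B.
# Sorts the two versions and checks that B comes first (or is equal).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="23.04.0"
if command -v nextflow >/dev/null 2>&1; then
  # Assumption: the banner contains "version X.Y.Z"; extract the digits and dots.
  installed="$(nextflow -version 2>/dev/null \
    | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)"
  if version_ge "${installed:-0}" "$required"; then
    echo "Nextflow $installed satisfies >= $required"
  else
    echo "Nextflow ${installed:-unknown} is older than $required" >&2
  fi
else
  echo "nextflow not found on PATH" >&2
fi
```

Note that the installer command above writes a `nextflow` launcher into the current directory, so it needs to be on `PATH` (or invoked as `./nextflow`) for this check and for the run command to find it.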
What Can You Design?
The pipeline leverages Boltzgen's capabilities to design:
- Proteins: Full-length protein binders targeting specific interfaces
- Peptides: Short peptide sequences for tight binding
- Nanobodies: Compact antibody fragments for therapeutic applications
- Multi-target Binders: Binders designed against multiple targets simultaneously

All with the flexibility to specify:
- Binding site residues
- Designed chain type (protein, peptide, nanobody)
- Chain length constraints
- Custom diffusion parameters
Documentation Structure
Getting Started
Installation, basic usage, and quick reference guides.
Pipeline Modes
Detailed documentation for each operational mode.
Analysis Tools
PRODIGY and ipSAE integration guides.
Architecture
Technical details and implementation notes.
Computing Requirements
Hardware Requirements
GPU: NVIDIA GPU with CUDA support (recommended for reasonable execution times)
Memory: Minimum 16GB RAM, 32GB+ recommended for large designs
Storage: 50GB+ for pipeline dependencies and outputs
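A quick way to compare a machine against these numbers is a small pre-flight script. This is a rough sketch under stated assumptions, not something the pipeline provides: `at_least` is a hypothetical helper, the RAM check reads `/proc/meminfo` and therefore only works on Linux, and the GPU check only tests whether `nvidia-smi` (installed with the NVIDIA driver) can list a device.

```bash
#!/usr/bin/env bash
# Thresholds copied from the hardware requirements above.
MIN_RAM_GB=16
MIN_DISK_GB=50

# at_least VALUE MIN: succeeds when integer VALUE >= MIN.
at_least() { [ "$1" -ge "$2" ]; }

# GPU: `nvidia-smi -L` lists detected NVIDIA GPUs and fails if none is usable.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
  echo "GPU: NVIDIA device detected"
else
  echo "GPU: none detected (expect long execution times)"
fi

# RAM (Linux only): MemTotal is reported in kB and sits slightly below the
# installed amount, so round to the nearest GB.
if [ -r /proc/meminfo ]; then
  ram_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)
  if at_least "$ram_gb" "$MIN_RAM_GB"; then
    echo "RAM: ${ram_gb} GB (ok)"
  else
    echo "RAM: ${ram_gb} GB is below the ${MIN_RAM_GB} GB minimum"
  fi
fi

# Disk: free space for the current directory, from POSIX `df -P` in 1 kB blocks.
disk_gb=$(df -Pk . | awk 'NR==2 {print int($4 / 1024 / 1024)}')
if at_least "$disk_gb" "$MIN_DISK_GB"; then
  echo "Disk: ${disk_gb} GB free (ok)"
else
  echo "Disk: ${disk_gb} GB free, ${MIN_DISK_GB} GB or more recommended"
fi
```

Run it from the directory that will hold the work and output folders, since the disk check measures the filesystem of the current directory.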
Contributing
We welcome contributions! The pipeline is designed with modularity and extensibility in mind.
License
This project is licensed under the MIT License - see the LICENSE file for details.