nf-proteindesign

Proof of Principle Pipeline

This pipeline was developed by Seqera as a proof of principle using Seqera AI. It demonstrates the capabilities of AI-assisted bioinformatics pipeline development but should be thoroughly validated before use in production environments.

Overview

nf-proteindesign is a Nextflow pipeline for high-throughput protein design using Boltzgen, an all-atom generative diffusion model. Use it to design proteins, peptides, and nanobodies that bind biomolecular targets, then assess the designs with a suite of optional downstream analysis modules.

Modular Analysis Pipeline

The pipeline combines Boltzgen design with optional sequence optimization (ProteinMPNN + Boltz-2), quality assessment (ipSAE, PRODIGY, Foldseek), and unified reporting (metrics consolidation).

Analysis Modules

🧬 ProteinMPNN

Sequence optimization for designed structures with configurable sampling temperature.

--run_proteinmpnn

🔄 Boltz-2

Structure prediction for ProteinMPNN sequences to validate refolding.

--run_boltz2_refold

📊 ipSAE

Interface quality scoring for Boltzgen and Boltz-2 structures.

--run_ipsae

⚡ PRODIGY

Binding affinity prediction (ΔG and Kd) for all structures.

--run_prodigy

🔍 Foldseek

Structural similarity search against the AlphaFold and SWISS-MODEL databases.

--run_foldseek

📈 Consolidation

Unified CSV report combining all analysis metrics.

--run_consolidation
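The switches listed above can be combined in a single run. The sketch below is a dry run that only prints the command. It assumes each flag acts as a boolean switch when passed without a value (the usual Nextflow behaviour for bare `--param` options) and reuses the pipeline name, profile, and paths from the Quick Start; `ANALYSIS_FLAGS` is a variable name chosen here for illustration.

```shell
#!/bin/sh
# All six optional module switches documented in the Analysis Modules section.
ANALYSIS_FLAGS="--run_proteinmpnn --run_boltz2_refold --run_ipsae --run_prodigy --run_foldseek --run_consolidation"

# Dry run: print the full command instead of launching it.
# $ANALYSIS_FLAGS is left unquoted on purpose so the shell splits it
# into separate arguments.
echo nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    $ANALYSIS_FLAGS
```

Remove the leading `echo` to launch the run.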

Key Features

Parallel Processing: Run multiple design specifications simultaneously
YAML-Based Design: Complete control with custom design specifications
Comprehensive Analysis: Six optional analysis modules for quality assessment
Sequence Optimization: ProteinMPNN + Boltz-2 validation workflow
Container Support: Full Docker compatibility
GPU Acceleration: Optimized for NVIDIA GPU execution
Organized Outputs: Structured results with unified reporting

Pipeline Workflow

graph TB
    A[Samplesheet<br/>Design YAMLs] --> B{Boltzgen<br/>Precomputed?}
    B -->|No| C[Run Boltzgen Design]
    B -->|Yes| D[Use Precomputed]
    C --> E[Budget Designs<br/>CIF + NPZ]
    D --> E

    E --> F{ProteinMPNN<br/>Enabled?}
    F -->|No| Z[Boltzgen Outputs Only]
    F -->|Yes| G[Sequence Optimization<br/>Parallel per Design]

    G --> H{Boltz-2<br/>Enabled?}
    H -->|No| Y[MPNN Sequences Only]
    H -->|Yes| I[Prepare Sequences<br/>Split + Target]

    I --> J[Boltz-2 Structure<br/>Prediction]
    J --> K[Boltz-2 Structures<br/>CIF + NPZ]

    K --> L{Analysis<br/>Modules?}
    L -->|ipSAE| M[Interface Scoring]
    L -->|PRODIGY| N[Binding Affinity]
    L -->|Foldseek| O[Structural Search]

    M --> P{Consolidate?}
    N --> P
    O --> P
    P -->|Yes| Q[Unified CSV + HTML<br/>Report]

    Q --> R[Final Results]
    K --> R
    Z --> R
    Y --> R

    style C fill:#9C27B0,stroke:#9C27B0,color:#fff
    style G fill:#8E24AA,stroke:#8E24AA,color:#fff
    style J fill:#7B1FA2,stroke:#7B1FA2,color:#fff
    style Q fill:#6A1B9A,stroke:#6A1B9A,color:#fff

    classDef note fill:#FFF3E0,stroke:#FF9800,color:#000
    class L note

Analysis Requirements

ipSAE, PRODIGY, and Foldseek require both --run_proteinmpnn and --run_boltz2_refold, because these modules analyze only the Boltz-2 refolded structures, not the original Boltzgen designs.
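The dependency described above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: `check_analysis_flags` is a hypothetical helper that inspects a list of command-line arguments and fails when an analysis switch is present without both prerequisites.

```shell
#!/bin/sh
# check_analysis_flags ARGS...
# Hypothetical pre-flight helper (not shipped with the pipeline).
# Succeeds when the arguments respect the Analysis Requirements rule:
# --run_ipsae, --run_prodigy and --run_foldseek need both
# --run_proteinmpnn and --run_boltz2_refold.
check_analysis_flags() {
    has_mpnn=0
    has_boltz2=0
    needs_refold=0
    for arg in "$@"; do
        case "$arg" in
            --run_proteinmpnn)   has_mpnn=1 ;;
            --run_boltz2_refold) has_boltz2=1 ;;
            --run_ipsae|--run_prodigy|--run_foldseek) needs_refold=1 ;;
        esac
    done
    if [ "$needs_refold" -eq 1 ]; then
        if [ "$has_mpnn" -eq 0 ] || [ "$has_boltz2" -eq 0 ]; then
            echo "Analysis modules need --run_proteinmpnn and --run_boltz2_refold" >&2
            return 1
        fi
    fi
    return 0
}
```

For example, `check_analysis_flags --run_proteinmpnn --run_boltz2_refold --run_ipsae` succeeds, while `check_analysis_flags --run_ipsae` prints the message and returns a non-zero status.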

Quick Start

Get started with nf-proteindesign in minutes:

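# 1. Install Nextflow (>=23.04.0). The installer writes a `nextflow` launcher
#    into the current directory; move it to a directory on your PATH.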
curl -s https://get.nextflow.io | bash

# 2. Run the pipeline
nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input samplesheet.csv \
    --outdir results
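Before the first run, it can help to confirm that the installed Nextflow meets the `>=23.04.0` requirement from step 1. The sketch below is one way to do that in POSIX shell. `version_ge` is a helper written for this example, and the parsing assumes `nextflow -v` prints a line of the form `nextflow version X.Y.Z.BUILD`; adjust the `sed` expression if your output differs.

```shell
#!/bin/sh
# version_ge A B: succeed when dot-separated numeric version A >= B.
# Only numeric fields are supported (no "-edge" style suffixes).
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}   # leading field of A
        y=${b%%.*}   # leading field of B
        # Drop the leading field; clear the string once no dot remains.
        if [ "$a" = "$x" ]; then a=""; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=""; else b=${b#*.}; fi
        x=${x:-0}    # a missing field counts as 0
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
    done
    return 0         # all fields equal
}

REQUIRED=23.04.0

if command -v nextflow >/dev/null 2>&1; then
    # Assumption: output looks like "nextflow version 24.04.4.5917".
    installed=$(nextflow -v 2>/dev/null \
        | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if [ -n "$installed" ] && version_ge "$installed" "$REQUIRED"; then
        echo "Nextflow $installed satisfies >= $REQUIRED"
    else
        echo "Nextflow ${installed:-unknown} is older than $REQUIRED or could not be parsed" >&2
    fi
else
    echo "nextflow not found on PATH" >&2
fi
```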

Need Help?

Check out the Quick Start Guide for detailed setup instructions and examples.

What Can You Design?

The pipeline leverages Boltzgen's capabilities to design:

Proteins: Full-length protein binders targeting specific interfaces
Peptides: Short peptide sequences for tight binding
Nanobodies: Compact antibody fragments for therapeutic applications
Multi-target Binders: Binders designed against several targets at once

For each design you can specify:
- Binding site residues
- Designed chain type (protein, peptide, nanobody)
- Chain length constraints
- Custom diffusion parameters

Documentation Structure

Getting Started

Installation, basic usage, and quick reference guides.

Pipeline Modes

Detailed documentation for each operational mode.

Analysis Tools

PRODIGY and ipSAE integration guides.

Architecture

Technical details and implementation notes.

Computing Requirements

Hardware Requirements

GPU: NVIDIA GPU with CUDA support (recommended for reasonable execution times)
Memory: Minimum 16GB RAM, 32GB+ recommended for large designs
Storage: 50GB+ for pipeline dependencies and outputs
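The requirements above can be turned into a quick pre-flight check. This is a rough sketch under several assumptions: it treats the presence of `nvidia-smi` as a proxy for a usable NVIDIA GPU, reads total RAM from `/proc/meminfo` (Linux only), and measures free space on the filesystem holding the current directory, which may not be where your work and output directories live. `report` is a helper defined here for illustration.

```shell
#!/bin/sh
# Thresholds taken from the Hardware Requirements list.
MIN_RAM_GB=16
MIN_DISK_GB=50

# report LABEL ACTUAL_GB MINIMUM_GB
# Print a status line; return non-zero when ACTUAL is below MINIMUM.
report() {
    if [ "$2" -ge "$3" ]; then
        echo "OK   $1: $2 GB (minimum $3 GB)"
    else
        echo "LOW  $1: $2 GB (minimum $3 GB)"
        return 1
    fi
}

# GPU: only checks that the NVIDIA driver tool is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "OK   GPU: nvidia-smi found"
else
    echo "WARN GPU: nvidia-smi not found"
fi

# RAM: MemTotal is in kB; round to the nearest GB so a nominal
# 16 GB machine is not reported as 15 GB.
if [ -r /proc/meminfo ]; then
    ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    report RAM $(( (ram_kb + 524288) / 1048576 )) "$MIN_RAM_GB" || true
fi

# Disk: available space (kB) for the current directory, POSIX df format.
disk_kb=$(df -Pk . | awk 'NR==2 {print $4}')
report Disk $(( disk_kb / 1048576 )) "$MIN_DISK_GB" || true
```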

Contributing

We welcome contributions! The pipeline is designed with modularity and extensibility in mind.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ using Nextflow and Material for MkDocs