
PRODIGY Binding Affinity Prediction

Overview

The pipeline includes optional PRODIGY (PROtein binDIng enerGY prediction) analysis for estimating the binding affinity of protein-protein complexes generated by BoltzGen.

What is PRODIGY?

PRODIGY is a fast, structure-based binding affinity predictor developed by the Bonvin lab (Utrecht University). It uses interface properties to estimate binding free energy (ΔG) and dissociation constant (Kd) from structural information.

Key Metrics

PRODIGY analyzes several interface properties:

Metric       Description                       Interpretation
ΔG           Binding free energy (kcal/mol)    More negative = stronger binding
Kd           Dissociation constant (M)         Lower = tighter binding
BSA          Buried Surface Area (Ų)           Larger = more extensive interface
ICs          Interface Contacts                More = more interactions
Charged %    Charged residues in interface     Electrostatic contribution
Apolar %     Apolar residues in interface      Hydrophobic contribution

Enabling PRODIGY

Add the --run_prodigy flag to enable analysis:

nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input samplesheet.csv \
    --run_prodigy \
    --outdir results

Chain Selection

# PRODIGY automatically identifies chains
nextflow run ... --run_prodigy

# Specify chains explicitly
nextflow run ... \
    --run_prodigy \
    --prodigy_selection 'A,B'

# For specific interfaces in multi-chain systems
nextflow run ... \
    --run_prodigy \
    --prodigy_selection 'A,B,C'

Output Structure

PRODIGY generates detailed results for each final ranked design:

results/
└── {sample_id}/
    └── prodigy/
        ├── design_1_prodigy_results.txt      # Full output
        ├── design_1_prodigy_summary.csv      # Parsed metrics
        ├── design_2_prodigy_results.txt
        ├── design_2_prodigy_summary.csv
        └── ...

Raw Results Format

The *_prodigy_results.txt files contain complete PRODIGY output:

design_1_prodigy_results.txt
[+] Reading structure file: design_1.cif
[+] Parsed structure file design_1 (1 model(s))
[+] Setting selection: A,B
[+] Found 2 chains in structure: A, B
[+] Calculating buried surface area...
[+] Buried Surface Area: 1543.21 A^2
[+] Number of interface contacts (ICs): 89
[+] Number of non-interacting surface residues: 34
[+] Number of charged residues in ICs: 15
[+] Percentage of charged residues in ICs: 16.85%
[+] Number of apolar residues in ICs: 48
[+] Percentage of apolar residues in ICs: 53.93%
[+] Predicted binding affinity (ΔG): -11.2 kcal/mol
[+] Predicted dissociation constant (Kd): 5.4e-09 M at 25.0°C
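
If only the raw text file is available, the key values can be pulled out with a few regular expressions. The following is a minimal sketch: the patterns are inferred from the example output above rather than from PRODIGY's source, and the names `PATTERNS` and `parse_prodigy_results` are illustrative, not part of the pipeline.

```python
import re

# Patterns inferred from the example output shown above (an assumption);
# adjust them if your PRODIGY version prints different labels.
PATTERNS = {
    "bsa": r"Buried Surface Area:\s*([-+0-9.eE]+)",
    "ics": r"Number of interface contacts \(ICs\):\s*(\d+)",
    "charged_percentage": r"Percentage of charged residues in ICs:\s*([-+0-9.eE]+)%",
    "apolar_percentage": r"Percentage of apolar residues in ICs:\s*([-+0-9.eE]+)%",
    "delta_g": r"Predicted binding affinity \(ΔG\):\s*([-+0-9.eE]+)",
    "kd": r"Predicted dissociation constant \(Kd\):\s*([-+0-9.eE]+)\s*M",
}


def parse_prodigy_results(text: str) -> dict:
    """Return the metrics found in one results file, converted to floats."""
    metrics = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:  # labels that are absent are simply skipped
            metrics[name] = float(match.group(1))
    return metrics


example = """\
[+] Buried Surface Area: 1543.21 A^2
[+] Number of interface contacts (ICs): 89
[+] Percentage of charged residues in ICs: 16.85%
[+] Percentage of apolar residues in ICs: 53.93%
[+] Predicted binding affinity (ΔG): -11.2 kcal/mol
[+] Predicted dissociation constant (Kd): 5.4e-09 M at 25.0°C
"""
print(parse_prodigy_results(example))
```

In most cases the parsed summary CSV described next is the easier starting point; a parser like this is mainly useful for spot-checking a single design.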

Parsed Summary Format

The *_prodigy_summary.csv files provide the same metrics in machine-readable form, one row per design:

design_1_prodigy_summary.csv
sample_id,design_file,delta_g,kd,temperature,bsa,ics,charged_residues,charged_percentage,apolar_residues,apolar_percentage
sample1,design_1.cif,-11.2,5.4e-09,25.0,1543.21,89,15,16.85,48,53.93
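
The summary can be read with the Python standard library alone. This sketch assumes the column names shown in the header above; `NUMERIC_COLUMNS` and `read_summary` are hypothetical helper names introduced for illustration.

```python
import csv
import io

# Columns that hold numbers, taken from the header line shown above.
NUMERIC_COLUMNS = (
    "delta_g", "kd", "temperature", "bsa", "ics",
    "charged_residues", "charged_percentage",
    "apolar_residues", "apolar_percentage",
)


def read_summary(handle):
    """Yield one dict per design, with numeric columns converted to float."""
    for row in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            row[column] = float(row[column])
        yield row


# In-memory copy of the example file; pass an open file handle in real use.
sample = io.StringIO(
    "sample_id,design_file,delta_g,kd,temperature,bsa,ics,"
    "charged_residues,charged_percentage,apolar_residues,apolar_percentage\n"
    "sample1,design_1.cif,-11.2,5.4e-09,25.0,1543.21,89,15,16.85,48,53.93\n"
)
rows = list(read_summary(sample))
print(rows[0]["design_file"], rows[0]["delta_g"], rows[0]["kd"])
```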

Interpreting Results

Binding Affinity (ΔG)

Strong Binders

ΔG < -10 kcal/mol: Excellent predicted affinity
Typical of high-affinity antibodies and tight binders

Good Binders

-10 ≤ ΔG < -8 kcal/mol: Good predicted affinity
Suitable for many therapeutic applications

Weak Binders

ΔG ≥ -8 kcal/mol: Moderate to weak affinity
May require optimization
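
These bands are easy to apply programmatically. The sketch below uses a hypothetical helper, `classify_delta_g`, and places values that fall exactly on -10 or -8 kcal/mol in the weaker of the two neighbouring bands; the thresholds themselves are the rule-of-thumb cut-offs from this page, not values defined by PRODIGY.

```python
def classify_delta_g(delta_g: float) -> str:
    """Map a predicted ΔG (kcal/mol) to the bands described above."""
    if delta_g < -10.0:
        return "strong"
    if delta_g < -8.0:
        return "good"
    return "weak"


for value in (-11.2, -9.0, -6.5):
    print(value, classify_delta_g(value))
```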

Dissociation Constant (Kd)

The Kd value provides an alternative measure of binding strength:

Kd Range       Binding Strength    Example
< 1 nM         Extremely tight     High-affinity antibodies
1-10 nM        Very strong         Therapeutic antibodies
10-100 nM      Strong              Many functional binders
100-1000 nM    Moderate            Weak binders
> 1 μM         Weak                May not be functional
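
ΔG and Kd are two views of the same quantity, linked by the standard thermodynamic relation ΔG = RT ln(Kd). The sketch below shows the conversion in both directions, assuming ΔG in kcal/mol, Kd in mol/L, and a default temperature of 25 °C as in the example output; the function names are illustrative and are not taken from the pipeline.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal: float, temp_celsius: float = 25.0) -> float:
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M)."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(delta_g_kcal / (R_KCAL * temp_kelvin))


def delta_g_from_kd(kd_molar: float, temp_celsius: float = 25.0) -> float:
    """Convert a dissociation constant (M) to a binding free energy (kcal/mol)."""
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(kd_molar)


for dg in (-8.0, -10.0, -12.0):
    print(f"ΔG = {dg} kcal/mol -> Kd = {kd_from_delta_g(dg):.2e} M")
```

Because the relation is exponential, rounding ΔG to one decimal place shifts the recomputed Kd noticeably, so a value derived from a rounded ΔG will not exactly match the Kd reported alongside it. At 25 °C, the -10 and -8 kcal/mol cut-offs used above correspond to Kd values of roughly 50 nM and 1.4 μM respectively.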

Interface Properties

Balanced Interfaces

Good binders typically show:

  • BSA: 1200-2000 Ų
  • ICs: 60-120 contacts
  • Charged: 10-25% (electrostatic interactions)
  • Apolar: 40-60% (hydrophobic core)
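
A quick way to apply these ranges is to list which metrics fall outside them. The following sketch assumes a dict keyed by the summary CSV column names; `BALANCED_RANGES` and `out_of_range` are hypothetical names, and the ranges are the rule-of-thumb values listed above rather than hard limits.

```python
# Typical ranges for a balanced interface, copied from the list above.
BALANCED_RANGES = {
    "bsa": (1200.0, 2000.0),               # Å²
    "ics": (60.0, 120.0),                  # number of contacts
    "charged_percentage": (10.0, 25.0),    # %
    "apolar_percentage": (40.0, 60.0),     # %
}


def out_of_range(metrics: dict) -> list:
    """Return the names of metrics that fall outside the typical ranges."""
    return [
        name
        for name, (low, high) in BALANCED_RANGES.items()
        if not low <= metrics[name] <= high
    ]


design = {"bsa": 1543.21, "ics": 89, "charged_percentage": 16.85,
          "apolar_percentage": 53.93}
print(out_of_range(design))
```

A design that falls outside one range is not necessarily a poor binder; the list is meant as a prompt for closer inspection.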

Comparing Designs

Use PRODIGY results to rank and select designs:

Example Analysis Workflow

from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt

# Load all PRODIGY summaries (one CSV per design)
summaries = []
for f in Path('results/sample1/prodigy/').glob('*_summary.csv'):
    df = pd.read_csv(f)
    summaries.append(df)

# Stack into one table with a fresh 0..n-1 index
combined = pd.concat(summaries, ignore_index=True)

# Rank by binding affinity
ranked = combined.sort_values('delta_g')
print("Top 5 designs by predicted affinity:")
print(ranked[['design_file', 'delta_g', 'kd', 'bsa']].head())

# Plot affinity distribution
plt.figure(figsize=(10, 6))
plt.hist(combined['delta_g'], bins=20, color='#9C27B0', alpha=0.7)
plt.xlabel('ΔG (kcal/mol)')
plt.ylabel('Number of Designs')
plt.title('Distribution of Predicted Binding Affinities')
plt.axvline(-10, color='red', linestyle='--', label='Strong binding threshold')
plt.legend()
plt.savefig('affinity_distribution.png', dpi=300)

Quick Command-Line Analysis

# Find top 10 designs by affinity
cat results/*/prodigy/*_summary.csv | \
    grep -v "sample_id" | \
    sort -t',' -k3,3n | \
    head -10 | \
    column -t -s','

# Count designs with ΔG < -10 kcal/mol (skip header rows; print 0 if none match)
cat results/*/prodigy/*_summary.csv | \
    awk -F',' '$1 != "sample_id" && $3 < -10 {count++} END {print "Strong binders:", count+0}'

Limitations

Important Considerations

PRODIGY predictions are computational estimates. Always consider:

  • Structure quality: PRODIGY assumes high-quality input structures
  • Experimental validation: Predicted values should guide, not replace, experiments
  • Context-dependent: Predictions don't account for solvent, pH, or cellular environment
  • Relative ranking: Better for comparing designs than absolute predictions

When PRODIGY Works Best

Protein-protein interfaces
Binary complexes
Well-defined binding sites
Comparative analysis of designs

When to Be Cautious

⚠️ Flexible/disordered regions
⚠️ Multi-chain complexes (>2 chains)
⚠️ Small interfaces (<800 Ų BSA)
⚠️ Unusual binding modes
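
Two of these caution criteria can be checked automatically from values the pipeline already provides. This is a minimal sketch under stated assumptions: `caution_flags` is a hypothetical helper, the BSA comes from the summary CSV, and the chain selection is the comma-separated string passed to --prodigy_selection. Flexible regions and unusual binding modes still need manual inspection.

```python
def caution_flags(bsa: float, selection: str) -> list:
    """Return caution messages for a design based on BSA and chain count."""
    flags = []
    # Count non-empty chain identifiers in a selection such as 'A,B,C'.
    chains = [chain for chain in selection.split(",") if chain.strip()]
    if len(chains) > 2:
        flags.append("multi-chain complex (>2 chains)")
    if bsa < 800.0:
        flags.append("small interface (<800 Å² BSA)")
    return flags


print(caution_flags(1543.21, "A,B"))
print(caution_flags(650.0, "A,B,C"))
```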

References

Citation

PRODIGY: a web server for predicting the binding affinity of protein-protein complexes
Xue LC, Rodrigues JP, Kastritis PL, Bonvin AMJJ, Vangone A. Bioinformatics (2016)
doi:10.1093/bioinformatics/btw514

Integration Example

Complete pipeline run with PRODIGY:

# Run full design workflow with affinity prediction
nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input samplesheet.csv \
    --outdir protein_designs \
    --run_prodigy \
    --prodigy_selection 'A,B' \
    --n_samples 50 \
    -resume

# Analyze results
python analyze_prodigy_results.py \
    --input protein_designs/*/prodigy/*_summary.csv \
    --output analysis_report.html \
    --threshold -10.0

Next Steps


Combine Analyses

Use both PRODIGY and ipSAE for comprehensive design evaluation:

nextflow run ... --run_prodigy --run_ipsae