PRODIGY Binding Affinity Prediction
Overview
The pipeline includes optional PRODIGY (PROtein binDIng enerGY prediction) analysis for evaluating the predicted binding affinity of protein-protein complexes generated by Boltzgen.
What is PRODIGY?
PRODIGY is a fast, structure-based binding affinity predictor developed by the Bonvin lab (Utrecht University). It uses interface properties to estimate binding free energy (ΔG) and dissociation constant (Kd) from structural information.
Key Metrics
PRODIGY analyzes several interface properties:
| Metric | Description | Interpretation |
|---|---|---|
| ΔG | Binding free energy (kcal/mol) | More negative = stronger binding |
| Kd | Dissociation constant (M) | Lower = tighter binding |
| BSA | Buried Surface Area (Ų) | Larger = more extensive interface |
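| ICs | Number of interface contacts between the chains | More contacts = more extensive interaction network |
| Charged % | Percentage of charged residues among interface contacts | Indicates electrostatic contribution |
| Apolar % | Percentage of apolar residues among interface contacts | Indicates hydrophobic contribution |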
Enabling PRODIGY
Add the --run_prodigy flag to enable analysis:
nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input samplesheet.csv \
    --run_prodigy \
    --outdir results
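Chain Selection
Use --prodigy_selection to choose which chains define the analyzed interface, given as a comma-separated list such as 'A,B'.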
Output Structure
PRODIGY generates detailed results for each final ranked design:
results/
└── {sample_id}/
    └── prodigy/
        ├── design_1_prodigy_results.txt    # Full output
        ├── design_1_prodigy_summary.csv    # Parsed metrics
        ├── design_2_prodigy_results.txt
        ├── design_2_prodigy_summary.csv
        └── ...
Raw Results Format
The *_prodigy_results.txt files contain complete PRODIGY output:
[+] Reading structure file: design_1.cif
[+] Parsed structure file design_1 (1 model(s))
[+] Setting selection: A,B
[+] Found 2 chains in structure: A, B
[+] Calculating buried surface area...
[+] Buried Surface Area: 1543.21 A^2
[+] Number of interface contacts (ICs): 89
[+] Number of non-interacting surface residues: 34
[+] Number of charged residues in ICs: 15
[+] Percentage of charged residues in ICs: 16.85%
[+] Number of apolar residues in ICs: 48
[+] Percentage of apolar residues in ICs: 53.93%
[+] Predicted binding affinity (ΔG): -11.2 kcal/mol
[+] Predicted dissociation constant (Kd): 5.4e-09 M at 25.0°C
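If only the raw text file is at hand, the headline metrics can be recovered with a few regular expressions. The sketch below is illustrative only: the patterns mirror the example output shown above, the names `PATTERNS` and `parse_prodigy_text` are hypothetical, and the log wording may differ between PRODIGY versions.

```python
import re

# Assumption: line wording matches the example output in this section.
PATTERNS = {
    "bsa": r"Buried Surface Area:\s*([-+0-9.eE]+)",
    "ics": r"Number of interface contacts \(ICs\):\s*(\d+)",
    "delta_g": r"Predicted binding affinity \(ΔG\):\s*([-+0-9.eE]+)",
    "kd": r"Predicted dissociation constant \(Kd\):\s*([-+0-9.eE]+)",
}


def parse_prodigy_text(text):
    """Return a dict with every metric whose pattern is found in the text."""
    metrics = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            metrics[name] = float(match.group(1))
    return metrics


example = """\
[+] Buried Surface Area: 1543.21 A^2
[+] Number of interface contacts (ICs): 89
[+] Predicted binding affinity (ΔG): -11.2 kcal/mol
[+] Predicted dissociation constant (Kd): 5.4e-09 M at 25.0°C
"""
parsed = parse_prodigy_text(example)
print(parsed)
```

Metrics whose line is absent are simply left out of the result, so a caller can detect a truncated or failed run by checking for missing keys.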
Parsed Summary Format
Each *_prodigy_summary.csv file provides the same metrics in machine-readable form, one row per design:
sample_id,design_file,delta_g,kd,temperature,bsa,ics,charged_residues,charged_percentage,apolar_residues,apolar_percentage
sample1,design_1.cif,-11.2,5.4e-09,25.0,1543.21,89,15,16.85,48,53.93
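A downstream script may want to confirm that a summary file has the expected columns before trusting its numbers. The following is a minimal sketch using only the standard library; the column list is copied from the header shown above, while `read_summary`, `EXPECTED_COLUMNS`, and `TEXT_COLUMNS` are hypothetical names introduced for illustration.

```python
import csv
import io

# Column order copied from the example header in this section.
EXPECTED_COLUMNS = [
    "sample_id", "design_file", "delta_g", "kd", "temperature", "bsa",
    "ics", "charged_residues", "charged_percentage",
    "apolar_residues", "apolar_percentage",
]
TEXT_COLUMNS = {"sample_id", "design_file"}


def read_summary(handle):
    """Read a summary CSV, check its header, and convert numeric fields."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        rows.append({
            key: value if key in TEXT_COLUMNS else float(value)
            for key, value in row.items()
        })
    return rows


sample = io.StringIO(
    "sample_id,design_file,delta_g,kd,temperature,bsa,ics,charged_residues,"
    "charged_percentage,apolar_residues,apolar_percentage\n"
    "sample1,design_1.cif,-11.2,5.4e-09,25.0,1543.21,89,15,16.85,48,53.93\n"
)
rows = read_summary(sample)
print(rows[0]["design_file"], rows[0]["delta_g"], rows[0]["kd"])
```

For a file on disk, pass an open handle instead, for example `open(path, newline="")`.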
Interpreting Results
Binding Affinity (ΔG)
Strong Binders
ΔG < -10 kcal/mol: Excellent predicted affinity
Typical of high-affinity antibodies and tight binders
Good Binders
-10 ≤ ΔG < -8 kcal/mol: Good predicted affinity
Suitable for many therapeutic applications
Weak Binders
ΔG ≥ -8 kcal/mol: Moderate to weak affinity
May require optimization
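The three tiers above can be applied programmatically when triaging many designs. This is a sketch under one stated assumption: a value lying exactly on a cutoff is assigned to the weaker tier. The function name `classify_delta_g` is hypothetical.

```python
def classify_delta_g(delta_g):
    """Map a predicted ΔG (kcal/mol) onto the tiers described above.

    Assumption: values exactly at -10 or -8 go to the weaker tier.
    """
    if delta_g < -10.0:
        return "strong"
    if delta_g < -8.0:
        return "good"
    return "weak"


for value in (-11.2, -9.0, -7.5):
    print(value, classify_delta_g(value))
```

Because the cutoffs are rules of thumb rather than physical boundaries, designs that land close to one are better compared directly than sorted into tiers.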
Dissociation Constant (Kd)
Kd expresses the same prediction as a concentration. It is derived from ΔG at the reported temperature, so both metrics rank designs in the same order:
| Kd Range | Binding Strength | Example |
|---|---|---|
| < 1 nM | Extremely tight | High-affinity antibodies |
| 1-10 nM | Very strong | Therapeutic antibodies |
| 10-100 nM | Strong | Many functional binders |
| 100-1000 nM | Moderate | Weak binders |
| > 1 μM | Weak | May not be functional |
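ΔG and Kd are linked by the standard thermodynamic relation ΔG = RT ln(Kd), which makes it possible to move between the two scales. The sketch below assumes the gas constant in kcal/(mol·K) and a default of 25 °C; the function names are hypothetical, and the exact constant used inside the pipeline is not confirmed here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g, temp_c=25.0):
    """Convert ΔG (kcal/mol) to Kd (M) via Kd = exp(ΔG / RT)."""
    temp_k = temp_c + 273.15
    return math.exp(delta_g / (R_KCAL * temp_k))


def delta_g_from_kd(kd, temp_c=25.0):
    """Convert Kd (M) to ΔG (kcal/mol) via ΔG = RT ln(Kd)."""
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd)


for dg in (-10.0, -8.0):
    print(f"ΔG = {dg} kcal/mol -> Kd = {kd_from_delta_g(dg):.2e} M")
```

Under this relation, ΔG values of -10 and -8 kcal/mol correspond to roughly 47 nM and 1.4 μM at 25 °C, which shows how the ΔG tiers and the Kd ranges in the table line up.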
Interface Properties
Balanced Interfaces
Good binders typically show:
- BSA: 1200-2000 Ų
- ICs: 60-120 contacts
- Charged: 10-25% (electrostatic interactions)
- Apolar: 40-60% (hydrophobic core)
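These ranges can be turned into a quick screen that reports which interface properties of a design fall outside the typical window. This is a minimal sketch: the keys follow the summary CSV column names, the bounds are the ones listed above treated as inclusive (an assumption), and `out_of_range` is a hypothetical helper.

```python
# Typical ranges from the list above, keyed by summary CSV column name.
BALANCED_RANGES = {
    "bsa": (1200.0, 2000.0),
    "ics": (60.0, 120.0),
    "charged_percentage": (10.0, 25.0),
    "apolar_percentage": (40.0, 60.0),
}


def out_of_range(metrics):
    """Return the names of metrics that fall outside the typical ranges."""
    flagged = []
    for name, (low, high) in BALANCED_RANGES.items():
        if not low <= metrics[name] <= high:
            flagged.append(name)
    return flagged


design = {
    "bsa": 1543.21,
    "ics": 89,
    "charged_percentage": 16.85,
    "apolar_percentage": 53.93,
}
print(out_of_range(design))
```

A non-empty result is a prompt to inspect the interface, not a reason to reject the design outright, since the ranges describe typical binders rather than hard limits.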
Comparing Designs
Use PRODIGY results to rank and select designs:
Example Analysis Workflow
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# Load all PRODIGY summaries for one sample
summaries = []
for f in sorted(Path('results/sample1/prodigy/').glob('*_summary.csv')):
    summaries.append(pd.read_csv(f))
combined = pd.concat(summaries, ignore_index=True)

# Rank by binding affinity (most negative ΔG first)
ranked = combined.sort_values('delta_g')
print("Top 5 designs by predicted affinity:")
print(ranked[['design_file', 'delta_g', 'kd', 'bsa']].head())

# Plot affinity distribution
plt.figure(figsize=(10, 6))
plt.hist(combined['delta_g'], bins=20, color='#9C27B0', alpha=0.7)
plt.xlabel('ΔG (kcal/mol)')
plt.ylabel('Number of Designs')
plt.title('Distribution of Predicted Binding Affinities')
plt.axvline(-10, color='red', linestyle='--', label='Strong binding threshold')
plt.legend()
plt.savefig('affinity_distribution.png', dpi=300)
Quick Command-Line Analysis
# Find top 10 designs by affinity (most negative ΔG first)
cat results/*/prodigy/*_summary.csv | \
    grep -v '^sample_id,' | \
    sort -t',' -k3,3n | \
    head -10 | \
    column -t -s','
# Count designs with ΔG < -10 kcal/mol (header rows skipped; prints 0 if none match)
awk -F',' '$1 != "sample_id" && $3 < -10 {count++} END {print "Strong binders:", count + 0}' \
    results/*/prodigy/*_summary.csv
Limitations
Important Considerations
PRODIGY predictions are computational estimates. Always consider:
- Structure quality: PRODIGY assumes high-quality input structures
- Experimental validation: Predicted values should guide, not replace, experiments
- Context-dependent: Predictions don't account for solvent, pH, or cellular environment
- Relative ranking: Better for comparing designs than absolute predictions
When PRODIGY Works Best
✅ Protein-protein interfaces
✅ Binary complexes
✅ Well-defined binding sites
✅ Comparative analysis of designs
When to Be Cautious
⚠️ Flexible/disordered regions
⚠️ Multi-chain complexes (>2 chains)
⚠️ Small interfaces (<800 Ų BSA)
⚠️ Unusual binding modes
References
Citation
PRODIGY: a web server for predicting the binding affinity of protein-protein complexes
Xue LC, Rodrigues JP, Kastritis PL, Bonvin AM, Vangone A. Bioinformatics (2016)
doi:10.1093/bioinformatics/btw514
Integration Example
Complete pipeline run with PRODIGY:
# Run full design workflow with affinity prediction
nextflow run seqeralabs/nf-proteindesign \
    -profile docker \
    --input samplesheet.csv \
    --outdir protein_designs \
    --run_prodigy \
    --prodigy_selection 'A,B' \
    --n_samples 50 \
    -resume
# Analyze results
python analyze_prodigy_results.py \
    --input protein_designs/*/prodigy/*_summary.csv \
    --output analysis_report.html \
    --threshold -10.0
Next Steps
- Learn about ipSAE scoring for complementary analysis
- Explore output files organization
- See examples of complete workflows