Protein-Protein Interfaces (PPI)
- Impact: Proteins interact with each other in many scenarios—for example, our antibody proteins recognize diseases by binding to antigens. A critical problem in understanding these interactions is to identify which amino acids of two given proteins will interact upon binding.
- Dataset description: For training, we use the Database of Interacting Protein Structures (DIPS), a comprehensive dataset of protein complexes mined from the PDB (Townshend et al., 2019). We predict on the Docking Benchmark 5 (Vreven et al., 2015), a smaller gold standard dataset.
- Task: We predict if two amino acids will come into contact when their respective proteins bind.
- Splitting criteria: We split protein complexes by sequence identity at 30%.
- Downloads: The full dataset, split data, and split indices are available for download via Zenodo (doi:10.5281/zenodo.4911102)
Townshend, R. J. L., Bedi, R., et al. (2019). Advances in Neural Information Processing Systems (Vol. 32, pp. 15616–15625).
Vreven, T., Moal, I. H., et al. (2015). Journal of Molecular Biology, 427(19), 3031–3041.
This dataset is licensed under a Creative Commons CC-BY license.