ATOM3D - Tasks on Molecules in Three Dimensions


A collection of benchmark datasets for machine learning in structural biology

Project on GitHub: drorlab/atom3d



ATOM3D is a unified collection of datasets on the three-dimensional structure of biomolecules, including proteins, small molecules, and nucleic acids. The datasets are designed to benchmark machine learning methods that operate on 3D molecular structure, and they represent a variety of important structural, functional, and engineering tasks. All datasets are provided in a standardized format, along with a Python package containing processing code, utilities, models, and dataloaders for common machine learning frameworks such as PyTorch. ATOM3D is designed to be a living database: datasets are updated and tasks are added as the field progresses.


ATOM3D currently contains eight datasets, which can be roughly grouped into four categories. Together they represent a wide range of problems, spanning single molecular structures, interactions between biomolecules, molecular function, and design/engineering tasks. Click on a dataset for its details and download links.