ATOM3D is a unified collection of datasets concerning the three-dimensional structure of biomolecules, including proteins, small molecules, and nucleic acids. These datasets are specifically designed to provide a benchmark for machine learning methods that operate on 3D molecular structure, and they represent a variety of important structural, functional, and engineering tasks. All datasets are provided in a standardized format along with a Python package containing processing code, utilities, models, and dataloaders for common machine learning frameworks such as PyTorch. ATOM3D is designed to be a living database, where datasets are updated and tasks are added as the field progresses.
Repository and Python Package: All code for dataset processing, training, and benchmarking can be found at https://github.com/drorlab/atom3d. Our Python package can be installed from source using the GitHub repository or using pip:
pip install atom3d
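After installing, it can be useful to confirm that the package is visible to the Python environment you intend to use. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and is not part of the ATOM3D package, and the distribution name "atom3d" is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "atom3d") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only queries the metadata of
    installed distributions and does not import the package itself.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("atom3d is not installed in this environment")
else:
    print(f"atom3d version {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that pip installed it into a different environment than the one running the script.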
Documentation: Documentation for our Python package, including loading, creating, and interacting with ATOM3D datasets, is available at atom3d.readthedocs.io.
Paper: Please see our NeurIPS Datasets and Benchmarks paper for further details on the datasets and benchmarks.
ATOM3D currently contains eight datasets, which can be roughly grouped into four categories that represent a wide range of problems, spanning single molecular structures, interactions between biomolecules, and molecular function and design/engineering tasks. Click on a dataset for details and download links.