The Cambridge Crystallographic Data Centre (CCDC) is using Intel processors and AWS cloud infrastructure to speed up data curation in structural chemistry. The CCDC is a nonprofit organization that collates and organizes the Cambridge Structural Database (CSD), a catalog of over 1.1 million enhanced small-molecule structures. The CSD serves as a vital resource for pharmaceutical development, providing insight into small-molecule structural features and properties. By collaborating with Intel and AWS, the CCDC has transformed its data curation workflow, significantly reducing processing time and compute costs. Researchers can now access predictions of protonation states in protein structures, saving time for life sciences researchers worldwide.
One of the primary challenges the CCDC faced was keeping pace with the Protein Data Bank, which continuously adds new protein structures. Automatically interpreting these structures, particularly determining the positions of hundreds of hydrogen atoms, was a daunting task. Working with Intel and AWS, the CCDC used Intel's high-performance processors and AWS's scalable cloud infrastructure to develop a flexible curation workflow. This collaboration enabled the creation of a public dataset and accurate binding-site examples, benefiting medicinal chemists and significantly reducing the time and cost of small-molecule drug development.
The collaboration not only streamlined data curation but also set the stage for ongoing improvements to the Cambridge Structural Database.