An article by our staff, "The CSD and knowledge databases: from answers to questions", has been included in the CrystEngComm HOT Articles themed collection. The paper addresses several problems related to machine learning and database screening.
CrystEngComm is a peer-reviewed online scholarly journal of the Royal Society of Chemistry that publishes original research and review articles on crystal engineering, crystal properties and polymorphism, and crystalline materials and nanomaterials. The journal publishes issues twice a month. Its Impact Factor for 2018 was 3.382.
The article was co-authored by Aleksander Shevchenko, a Senior Research Scientist at the Laboratory of Crystal Chemistry and Crystal Design of SCTMS; Roman Eremin, a Senior Research Scientist at the Laboratory of Mathematical Modeling of Materials of SCTMS; and Prof. Vladislav Blatov, Director of SCTMS.
The article describes a general scheme for extracting information about crystal structures from crystallographic databases. The scheme is illustrated by building a database of structural descriptors that reflect the geometric and topological properties of coordination compounds. Crystallographic data on 7690 crystal structures were retrieved, mainly from the Cambridge Structural Database (CSD), and processed with the ToposPro software package. The authors compared several machine learning methods for building a predictive scheme and showed that the Random Forest method gives the best predictions of the overall topological properties of coordination networks, namely their dimensionality and underlying topology. They also showed that the resulting knowledge database and predictive scheme can be regarded as a prototype of an artificial intelligence system capable of answering typical questions that arise in the design of coordination compounds.