The task is to predict a target compound’s property from its molecular structure. These property prediction models can be used to screen different molecules and identify candidates with high activity against SARS-CoV-2, which will then be tested in the lab. Note that the predictor will be applied to safe compounds (e.g. FDA-approved drugs).
We are experimenting with graph-based methods, fingerprint-based methods, and their combinations. The code for all our work can be found here.
In the fingerprint-based approach, the SMILES strings of molecules are converted to 2D fingerprints. Put simply, fingerprints are kernel functions that are applied to molecules to extract their features and hash them into a bit vector or count vector. These vectors act as a representation of the molecules and are used as input to binary classifiers. Predictions are made by voting on the probabilities obtained from an ensemble of classifiers such as multilayer perceptrons and random forests.
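The hashing and voting steps described above can be sketched in a few lines. This is a toy illustration, not the project's actual pipeline: `toy_fingerprint` hashes raw character substrings of the SMILES string as a stand-in for the substructure enumeration a real cheminformatics toolkit would perform, and `soft_vote` assumes that "voting using probabilities" means averaging the per-classifier probabilities and thresholding the mean. Both function names, the vector size, and the example probabilities are hypothetical.

```python
import zlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash every substring of length 1..max_len into a fixed-size bit vector.

    Assumption: real fingerprints hash chemical substructures, not raw
    SMILES characters; this only illustrates the hash-into-bits idea.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike Python's built-in hash()
            bits[zlib.crc32(fragment.encode()) % n_bits] = 1
    return bits


def soft_vote(probabilities, threshold=0.5):
    """Average the per-classifier probabilities of 'active' and threshold the mean."""
    mean_prob = sum(probabilities) / len(probabilities)
    return mean_prob, mean_prob >= threshold


fp = toy_fingerprint("CCO")  # ethanol
# Hypothetical probabilities from three classifiers for one molecule
mean_prob, is_active = soft_vote([0.9, 0.6, 0.3])
```

Averaging probabilities lets a confident classifier outweigh two uncertain ones, which a simple majority vote on hard labels would not allow.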
In the graph-based approach, we convert the SMILES strings into graphs with atoms as nodes and bonds as edges. Each node has a set of features such as atomic number and chirality, and each edge has a set of features such as bond type (single, double, etc.) and whether the bond is in an aromatic ring. We leverage recent advances in graph neural networks, which accept graphs of different structures as input, to classify molecules as active or inactive.
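To make the graph representation concrete, here is a minimal sketch of a container for node and edge features. The `MolGraph` class and its field names are hypothetical, and the ethanol graph is built by hand; in practice a cheminformatics toolkit would parse the SMILES string and supply these features.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    node_features: list = field(default_factory=list)  # one dict per atom
    edges: list = field(default_factory=list)          # directed (src, dst) pairs
    edge_features: list = field(default_factory=list)  # one dict per directed edge

    def add_atom(self, atomic_number, chirality="none"):
        self.node_features.append(
            {"atomic_number": atomic_number, "chirality": chirality}
        )
        return len(self.node_features) - 1  # index of the new atom

    def add_bond(self, a, b, bond_type="single", aromatic=False):
        feats = {"bond_type": bond_type, "aromatic": aromatic}
        # Store each bond in both directions so messages can flow either way
        for src, dst in ((a, b), (b, a)):
            self.edges.append((src, dst))
            self.edge_features.append(dict(feats))

    def neighbors(self, atom):
        return [dst for src, dst in self.edges if src == atom]


# Ethanol (SMILES "CCO"), heavy atoms only; hydrogens are left implicit
g = MolGraph()
c1 = g.add_atom(6)
c2 = g.add_atom(6)
o = g.add_atom(8)
g.add_bond(c1, c2)
g.add_bond(c2, o)
```

Because molecules differ in atom count, each graph has a different size, which is why the downstream model must handle variable-structure input.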
In the combined approach, we use both of the molecular representations discussed above. We convert each SMILES string into a graph and also extract molecular properties such as fingerprints and descriptors. We train a Graph Convolutional Network (GCN) to first convert the molecular graph into a fixed-length feature representation. A GCN layer essentially performs message passing over all the neighboring atoms and bonds and then applies a fully connected layer, thereby encoding local chemical information. Once we have this feature representation, we concatenate it with the molecule’s fingerprints and descriptors and give the result as input to the binary classifier, which is a simple feedforward neural network. We combine these two models and train them simultaneously.
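The forward pass of such a combined model can be sketched in plain Python. This is a simplified illustration under several assumptions: the layer sums neighbor features (bond features are ignored), the readout is a sum over atoms, the classifier is a single logistic unit rather than a multi-layer network, and all weights are fixed by hand instead of being trained. Every function name and number below is hypothetical.

```python
import math


def matvec(weights, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def message_passing_layer(node_feats, neighbors, weights):
    """Sum each atom's features with its neighbors', then apply a linear map and ReLU."""
    updated = []
    for atom, feats in enumerate(node_feats):
        agg = list(feats)
        for nb in neighbors[atom]:
            agg = [a + b for a, b in zip(agg, node_feats[nb])]
        updated.append([max(0.0, h) for h in matvec(weights, agg)])
    return updated


def readout(node_feats):
    """Sum over atoms to obtain a fixed-length vector regardless of molecule size."""
    return [sum(column) for column in zip(*node_feats)]


def classify(graph_vec, fingerprint, out_weights, bias=0.0):
    """Concatenate graph and fingerprint features, then apply a logistic unit."""
    combined = graph_vec + fingerprint  # list concatenation
    logit = sum(w * x for w, x in zip(out_weights, combined)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Ethanol with one-hot atom features [is_carbon, is_oxygen]
node_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity map, chosen for readability

hidden = message_passing_layer(node_feats, neighbors, weights)
graph_vec = readout(hidden)
fingerprint = [1.0, 0.0, 1.0]  # hypothetical 3-bit fingerprint
prob = classify(graph_vec, fingerprint, out_weights=[0.1, -0.2, 0.3, 0.0, -0.1])
```

Training both parts simultaneously would mean backpropagating the classification loss through `classify` into the message passing weights, which in practice is handled by an automatic differentiation framework.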
• Neural Message Passing for Quantum Chemistry, Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl
• RDKit: Open-source cheminformatics, Greg Landrum