Bill Tatsis, Matt Seddon, Dan Mason, Dan O’Donovan, Gio Cincilla, Azedine Zoufir and Nath Brown
Why?
A common computational task in drug design is exploring the chemical space around an ‘initial’ hit molecule. Commercial software from vendors such as OpenEye, Cresset, and Schrödinger is available for this purpose. Recently, an open-source alternative called VSFlow [1] has been introduced, which facilitates the preparation of a commercial compound database and provides three different ligand-based virtual screening modes.
Here at Healx, we have developed an end-to-end computational toolbox for 3D virtual screening based on shape and electrostatic similarity to a reference (hit) compound. A drug designer supplies a reference molecule and a (commercial) compound library of interest as inputs. The final output is a set of chemically diverse compounds with high 3D similarity to the reference compound.
What?
Lig3DLens (L3DL) uses RDKit [2] to generate 3D conformers of the library compounds and align them to the reference compound. We use the ESPSim package [3] to include an electrostatics term in the 3D scoring function. Additionally, we provide a tool for clustering the top-ranking hits obtained from 3D virtual screening (VS) and selecting a final set of compounds to purchase and test.
L3DL consists of three modules: i) preparing a commercial library that will be used as the chemical atlas, ii) generating the 3D conformers and overlaying them onto the reference molecule using one of the shape alignment methods in RDKit’s rdMolAlign module [4], iii) clustering the top-ranking compounds and selecting a predefined number of compounds that can be ordered and tested in an assay relevant to the project.
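To make module ii) more concrete, here is a minimal sketch of conformer generation and shape overlay with RDKit. It is not the Lig3DLens implementation: the helper names `embed` and `best_overlay` are hypothetical, and the choice of Open3DAlign (`rdMolAlign.GetO3A`) as the alignment method, the shape Tanimoto score and the conformer counts are assumptions made for illustration. The electrostatics term from ESPSim is left out, so this covers the shape part only.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign, rdShapeHelpers


def embed(smiles, n_confs=10, seed=42):
    """Build a molecule with explicit hydrogens and embed 3D conformers."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed)
    return mol


def best_overlay(probe, ref, ref_cid=0):
    """Align each probe conformer onto the reference and keep the best one.

    Returns (conformer id, shape Tanimoto similarity in [0, 1]).
    """
    best_cid, best_sim = -1, -1.0
    for conf in probe.GetConformers():
        cid = conf.GetId()
        # Open3DAlign is an assumption here; rdMolAlign offers other methods too.
        o3a = rdMolAlign.GetO3A(probe, ref, prbCid=cid, refCid=ref_cid)
        o3a.Align()  # moves the probe conformer in place
        dist = rdShapeHelpers.ShapeTanimotoDist(
            probe, ref, confId1=cid, confId2=ref_cid
        )
        sim = 1.0 - dist
        if sim > best_sim:
            best_cid, best_sim = cid, sim
    return best_cid, best_sim


# Illustrative molecules only: a reference and one library compound.
ref = embed("c1ccccc1CCN", n_confs=1)
probe = embed("c1ccccc1CCO", n_confs=5)
cid, sim = best_overlay(probe, ref)
print(f"best conformer {cid}, shape similarity {sim:.2f}")
```

Scoring every conformer and keeping the best one is a simple design choice; in a real screen the per-compound best score would then be used to rank the library.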
Please note that this technical walk-through focuses on the code’s functionality, and we do not delve into theoretical insights behind the methods employed in this computational toolbox.
Lig3DLens’ open source code can be found here.
How?
1. Compound library preparation
At this step, two main tasks are performed: i) standardising the chemical structures in the compound library and ii) filtering compounds based on predefined ranges in the physicochemical property space. For the pre-processing of molecular structures, we employ the datamol [5] package. The user can provide input files in SD, CSV, or ASCII format, which should contain the SMILES and IDs of the compounds. The code can also filter out compounds with undesired physicochemical properties; the allowed ranges for these properties are defined in the input YAML file. The resulting output is an SD file containing the 2D structures and any ID columns found in the input file.
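The range-based filtering described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Lig3DLens code: the property names, the numeric ranges and the helpers `passes_filters` and `filter_library` are all hypothetical, the dictionary stands in for whatever the parsed YAML file contains, and the property values are assumed to have been computed beforehand (for example with a descriptor calculator).

```python
# Hypothetical ranges standing in for the parsed YAML configuration.
# Key names and limits are illustrative only.
PROPERTY_RANGES = {
    "mol_weight": (200.0, 500.0),
    "logp": (-1.0, 5.0),
    "hbd": (0, 5),
}


def passes_filters(props, ranges=PROPERTY_RANGES):
    """Return True if every ranged property lies within [low, high].

    A property that is missing from `props` counts as a failure.
    """
    for name, (low, high) in ranges.items():
        value = props.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


def filter_library(records, ranges=PROPERTY_RANGES):
    """Keep (compound_id, props) records whose properties pass all ranges."""
    return [(cid, props) for cid, props in records if passes_filters(props, ranges)]


# Made-up compounds with precomputed properties.
library = [
    ("CPD-1", {"mol_weight": 312.4, "logp": 2.1, "hbd": 1}),
    ("CPD-2", {"mol_weight": 655.8, "logp": 4.0, "hbd": 2}),  # too heavy
    ("CPD-3", {"mol_weight": 280.3, "logp": 6.2, "hbd": 0}),  # too lipophilic
]
kept = filter_library(library)
print([cid for cid, _ in kept])
```

Treating a missing property as a failure is a conservative choice; a more permissive filter could skip properties that were not computed.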