Tuberculosis, caused by the elusive Mycobacterium tuberculosis, is responsible for 10 million cases and 1.5 million deaths around the world each year, which makes M. tuberculosis one of the deadliest pathogens known to man. Patient symptoms include persistent cough, chest pain, weakness and fatigue. Around 25% of the world population carries the bacteria, but only 5-15% of carriers manifest the disease and its symptoms (WHO 2020 Report).
The BCG vaccine is known to have an efficacy of only 50%, the drugs currently available require long dosage periods and have multiple side effects, and the presence of multidrug-resistant and totally drug-resistant strains has rendered most of them ineffective (Colditz et al.). Multidrug-resistant (MDR) strains are those that are resistant to the first-line agents isoniazid (INH) and rifampicin (RIF). Extensively drug-resistant (XDR) strains are MDR strains that are also resistant to any fluoroquinolone and to at least one of the injectable second-line drugs: kanamycin, capreomycin or amikacin (Zhang et al.). Moreover, totally drug-resistant strains, which show resistance to almost all drugs tested in the laboratory, are emerging all over the world, especially in TB-endemic regions (World Health Organization (2010). Treatment of Tuberculosis: Guidelines. Geneva.).
Drug discovery is a long and arduous process that takes decades to bring a new lead compound from bench to bedside. New, effective drugs become increasingly important as antibiotic-resistant strains spread. The goal of this project is to provide a bioinformatic tool that aids in discovering potent drug molecules against a protein of interest using the ChEMBL database. The project reports the Lipinski descriptors of all compounds tested against the target and uses these descriptors to distinguish active from inactive molecules with high statistical significance.
The mycolyl-arabinogalactan-peptidoglycan (mAGP) complex is the core structure of the cell wall and a distinguishing feature of mycobacteria. Our protein of interest, 3-oxoacyl-[acyl-carrier-protein] synthase III, catalyzes the condensation reaction of fatty acid synthesis by adding two carbons from malonyl-ACP to an acyl acceptor. Because this is the first condensation reaction, which initiates fatty acid synthesis, the enzyme may play a role in governing the total rate of fatty acid production. Its substrate specificity is critical for the biosynthesis of the fatty acid chains of mycolic acids, which makes it an attractive target for identifying new drug molecules (National Center for Biotechnology Information (2022), https://pubchem.ncbi.nlm.nih.gov/protein/P9WNG3; Scarsdale et al.).
Methodology -
The ChEMBL database is our main source of potential drug compounds against the target molecule. The database is accessed through the official ChEMBL web resource client. Target searches can be restricted to the proteins listed for Mycobacterium tuberculosis. We then identify the ChEMBL ID of our protein of interest and retrieve its bioactivity data, filtering the results to keep only those molecules that have a reported IC50. IC50, the half-maximal inhibitory concentration, is a standard measure of the potency of drug molecules.
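The filtering step can be sketched offline as follows. This is a minimal illustration, not the project's actual code: the records are hypothetical placeholders, the field names (molecule_chembl_id, canonical_smiles, standard_type, standard_value) are assumed to match the activity fields returned by the ChEMBL client, and the helper name keep_reported_ic50 is invented for this example.

```python
# Hypothetical records shaped like rows of a bioactivity query result.
# IDs, SMILES and values are placeholders, not real ChEMBL data.
records = [
    {"molecule_chembl_id": "MOL_A", "canonical_smiles": "SMILES_A",
     "standard_type": "IC50", "standard_value": "250.0"},
    {"molecule_chembl_id": "MOL_B", "canonical_smiles": "SMILES_B",
     "standard_type": "Ki", "standard_value": "80.0"},
    {"molecule_chembl_id": "MOL_C", "canonical_smiles": "SMILES_C",
     "standard_type": "IC50", "standard_value": None},
    {"molecule_chembl_id": "MOL_D", "canonical_smiles": "SMILES_D",
     "standard_type": "IC50", "standard_value": "15000"},
]


def keep_reported_ic50(rows):
    """Keep only rows whose measurement is an IC50 with a usable value.

    Values are assumed to be reported in nM (an assumption of this sketch).
    """
    kept = []
    for row in rows:
        if row.get("standard_type") != "IC50":
            continue  # a different measurement type, e.g. Ki
        value = row.get("standard_value")
        if value in (None, ""):
            continue  # IC50 listed but no value reported
        kept.append({
            "molecule_chembl_id": row["molecule_chembl_id"],
            "canonical_smiles": row["canonical_smiles"],
            "ic50_nM": float(value),
        })
    return kept


filtered = keep_reported_ic50(records)
```

With the placeholder rows above, only MOL_A and MOL_D survive: MOL_B is not an IC50 measurement and MOL_C has no reported value.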
The drug molecules are classified as active, inactive or intermediate based on their IC50. The cut-offs can be adjusted to the IC50 threshold chosen for active compounds; in our case study, compounds with an IC50 below 1000 nM are considered bioactive. A new dataframe is then built with the canonical SMILES, bioactivity class and IC50 value of each compound, giving us a list of potential drug candidates. For each molecule we then list the Lipinski descriptors (molecular weight, LogP, number of H donors and number of H acceptors) alongside its IC50. IC50 values reach down to the 10^-9 M range and vary widely. To normalize them, we convert IC50 to pIC50, the negative base-10 logarithm of the IC50 expressed in molar units, and eliminate compounds with intermediate IC50 from the dataframe to simplify further analysis.
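The classification and the pIC50 conversion can be sketched as below, assuming IC50 values are given in nM. The 1000 nM threshold for active compounds comes from the text; the 10000 nM cut-off for inactive compounds is an assumption made only for this illustration, and the function names and input rows are hypothetical.

```python
import math

ACTIVE_BELOW_NM = 1000.0    # from the text: IC50 below 1000 nM is bioactive
INACTIVE_FROM_NM = 10000.0  # ASSUMPTION: inactive cut-off is not stated in the text


def classify(ic50_nm):
    """Label a compound by its IC50 (nM)."""
    if ic50_nm < ACTIVE_BELOW_NM:
        return "active"
    if ic50_nm >= INACTIVE_FROM_NM:
        return "inactive"
    return "intermediate"


def to_pic50(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); since 1 nM = 1e-9 M this is 9 - log10(nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def prepare(rows):
    """Classify each (smiles, ic50_nm) pair, convert to pIC50, drop intermediates."""
    table = []
    for smiles, ic50_nm in rows:
        label = classify(ic50_nm)
        if label == "intermediate":
            continue
        table.append({"smiles": smiles, "class": label, "pIC50": to_pic50(ic50_nm)})
    return table


# Placeholder input; the strings and values are invented for illustration.
rows = [("SMILES_A", 50.0), ("SMILES_B", 5000.0), ("SMILES_C", 20000.0)]
table = prepare(rows)
```

On this scale a lower IC50 gives a higher pIC50, so the 1000 nM threshold corresponds to a pIC50 of 6.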
Exploratory data analysis of the chemical space is performed next. The descriptors are plotted to compare their distributions in the active and inactive compounds. We then perform the Mann-Whitney U test to determine whether the bioactivity of active compounds differs significantly from that of inactive compounds. The test is repeated for each descriptor to see whether individual descriptors are also significantly associated with activity. This exercise finally gives us a list of compounds that should show activity against 3-oxoacyl-[acyl-carrier-protein] synthase III based on their Lipinski descriptors.
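For readers unfamiliar with the test, the following is a self-contained sketch of a two-sided Mann-Whitney U test using the normal approximation with tie correction and no continuity correction. It is an illustration rather than the project's implementation (in practice a library routine such as scipy.stats.mannwhitneyu would normally be used), and the two samples are hypothetical pIC50 values.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using average ranks and a normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Hypothetical pIC50 values for the two classes, invented for illustration.
active = [7.1, 6.8, 7.5, 6.9, 7.3]
inactive = [4.2, 4.9, 5.1, 4.4, 4.7]
u_stat, p_value = mann_whitney_u(active, inactive)
significant = p_value < 0.05
```

The same function can be applied to each descriptor column in turn. The normal approximation is rough for very small samples, where an exact method is preferable.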