FindTargetsWEB: A User-Friendly Tool for Identification of Potential Therapeutic Targets in Metabolic Networks of Bacteria


TECHNOLOGY AND CODE
Published: 04 July 2019
doi: 10.3389/fgene.2019.00633
Frontiers in Genetics, www.frontiersin.org

Thiago Castanheira Merigueti 1, Marcia Weber Carneiro 4, Ana Paula D’A. Carvalho-Assef 3, Floriano Paes Silva-Jr 2* and Fabricio Alves Barbosa da Silva 1*

1 Scientific Computing Program, Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
2 Laboratory of Experimental and Computational Biochemistry of Drugs (LaBECFar), Oswaldo Cruz Institute, Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
3 Research Laboratory in Hospital Infection (LAPIH), Oswaldo Cruz Institute, Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, Brazil
4 Graduate Program in Biotechnology for Health and Investigative Medicine, Oswaldo Cruz Foundation (FIOCRUZ), Bahia, Brazil

Edited by: Helder Nakaya, University of São Paulo, Brazil
Reviewed by: Priyanka Baloni, Institute for Systems Biology (ISB), United States; Leandro Marcio Moreira, Universidade Federal de Ouro Preto, Brazil
*Correspondence: Floriano Paes Silva-Jr, floriano@ioc.fiocruz.br; Fabricio Alves Barbosa da Silva, fabricio.silva@fiocruz.br
Specialty section: This article was submitted to Systems Biology, a section of the journal Frontiers in Genetics
Received: 27 March 2019; Accepted: 17 June 2019; Published: 04 July 2019
Citation: Merigueti TC, Carneiro MW, Carvalho-Assef APD’A, Silva-Jr FP and Silva FAB (2019) FindTargetsWEB: A User-Friendly Tool for Identification of Potential Therapeutic Targets in Metabolic Networks of Bacteria. Front. Genet. 10:633. doi: 10.3389/fgene.2019.00633

Background: Healthcare-associated infections (HAIs) are a serious public health problem. They can be associated with morbidity and mortality and are responsible for an increase in patient hospitalization. Antimicrobial resistance among pathogens causing HAI has increased at alarming levels.
In this paper, a robust method for analyzing genome-scale metabolic networks of bacteria is proposed in order to identify potential therapeutic targets, along with its corresponding web implementation, dubbed FindTargetsWEB. The proposed method assumes that every metabolic network presents fragile genes whose blockade will impair one or more metabolic functions, such as biomass accumulation. FindTargetsWEB automates the identification of such fragile genes using flux balance analysis (FBA), flux variability analysis (FVA), extended Systems Biology Markup Language (SBML) file parsing, and queries to three public repositories, i.e., KEGG, UniProt, and DrugBank. The web application was developed in Python using COBRApy and Django.

Results: The proposed method was demonstrated to be robust enough to process even non-curated, incomplete, or imprecise metabolic networks, in addition to integrated host-pathogen models. A list of potential therapeutic targets and their putative inhibitors was generated by analyzing Pseudomonas aeruginosa metabolic networks available in the literature and a curated version of the metabolic network of a multidrug-resistant P. aeruginosa strain belonging to a clone endemic in Brazil (P. aeruginosa ST277). Genome-scale metabolic networks of other gram-positive and gram-negative bacteria, such as Staphylococcus aureus, Klebsiella pneumoniae, and Haemophilus influenzae, were also analyzed using FindTargetsWEB. Multiple potential targets were found in all metabolic networks, including some shared by two or more pathogens. Among the potential targets, several have been previously reported in the literature as targets for antimicrobial development, and many targets have approved drugs. Despite similarities in the metabolic network structure of closely related bacteria, we show that the method is able to selectively identify targets in pathogenic versus non-pathogenic organisms.

Conclusions: This new computational system can give insights into the identification of new candidate therapeutic targets for pathogenic bacteria and the discovery of new antimicrobial drugs through genome-scale metabolic network analysis and heterogeneous data integration, even for non-curated or incomplete networks.

Keywords: systems biology, flux balance analysis, metabolic network, COBRA analysis, Python (programming language)

BACKGROUND

Healthcare-associated infections (HAIs), previously called hospital infections, are a serious public health problem and can develop either as a direct result of medical or surgical treatment or from contact with a healthcare setting. These infections include central line-associated bloodstream infections, catheter-associated urinary tract infections, ventilator-associated pneumonia (VAP), and surgical site infections. Among the pathogens related to HAI, bacteria stand out. More than 2 million HAIs occur each year in the USA (Stone et al., 2005), with 50–60% being caused by antimicrobial-resistant bacteria. In 2014, the World Health Organization (WHO) published the report "Antimicrobial resistance: global report on surveillance" (WHO, 2014), warning of the growth of antimicrobial resistance worldwide. Antimicrobial resistance among hospital pathogens has increased at alarming levels, in both developed and developing countries, and untreatable infections are expected to spread worldwide, both inside and outside hospitals. According to a bulletin published by WHO in 2017 (WHO, 2017), there are 12 major antibiotic-resistant bacteria that deserve attention and urgently need more research and development (R&D) of new and effective antibiotic treatments. Gram-negative bacteria are the most involved in HAI (carbapenem-resistant Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacteriaceae), and R&D on new antibiotics against them is considered a critical priority (WHO, 2017). In humans, P. aeruginosa is an opportunistic pathogen that causes severe infections in immunocompromised individuals. It is the main cause of morbidity and mortality in patients with cystic fibrosis (Kerr and Snelling, 2009) and a major cause of VAP.

Given the potential severity of multidrug-resistant bacteria and the lack of treatment options, the identification and implementation of effective strategies to prevent such infections are urgent priorities.

The integration of mathematical, statistical, and computational methods for biological data analysis to enable the discovery of new therapeutic targets for any bacterium is therefore highly relevant. The combination of bioinformatics, systems modeling, and heterogeneous data integration can be a powerful tool for this purpose.

Several strategies have been proposed to search for drug targets in genome-scale models of bacterial metabolism. Most often, essential genes are identified from single virtual knockouts, in which flux balance analysis (FBA) (Orth et al., 2010) is used to assess whether a gene deletion halts a selected function of bacterial metabolism, usually biomass production (Rienksma et al., 2014). Other criteria can be combined to prioritize genes among candidate drug targets, such as the existence of druggable pockets (Kozakov et al., 2015) or specificity to the bacterium compared with host proteins.

The construction of a genome-scale metabolic network is a laborious endeavor that combines automated steps with manual curation. The most widely used protocol, proposed by Thiele and Palsson (2010), lists a total of 94 steps. Even so, the process is error-prone, and the resulting network typically predicts some phenomena correctly while disregarding others that are less relevant to the study for which it was reconstructed.

The BioCyc database (Caspi et al., 2015) classifies pathway/genome databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, into three tiers. Tier 1 corresponds to PGDBs that have received at least 1 year of manual curation and are updated continuously. Tier 2 includes PGDBs that have received a moderate amount of review (less than a year) and are usually not updated on an ongoing basis. Finally, Tier 3 refers to PGDBs that were created computationally and received no subsequent manual review or updating.

In this work, the same classification is adopted for genome-scale metabolic network models. The focus here is on models that can be classified as Tier 2 and Tier 3 according to the BioCyc classification. Draft metabolic reconstructions are considered Tier 3 models. Published curated metabolic models are classified as Tier 2, unless the model is identified in the literature as Tier 1.

Herein, a method for analyzing genome-scale metabolic networks of bacteria is proposed in order to identify potential therapeutic targets, along with its corresponding web implementation, dubbed FindTargetsWEB. The proposed method is computationally efficient, user-friendly, and robust to errors in reconstructed genome-scale metabolic networks, which are more frequent in Tier 3 (draft) networks. The web interface is straightforward, and results are sent directly to an e-mail address provided by the user. To demonstrate the flexibility of FindTargetsWEB, 10 genome-scale metabolic networks of bacterial strains are analyzed in this paper. Nine of the 10 networks are available in the literature, all classified as Tier 2 models in this work: P. aeruginosa PAO1, version 2008 (Oberhardt et al., 2008), P. aeruginosa PAO1, version 2017 (Bartell et al., 2017), P. aeruginosa PA14 (Bartell et al., 2017), Klebsiella pneumoniae (Liao et al., 2011), Haemophilus influenzae (Schilling and Palsson, 2000), a host-pathogen genome-scale reconstruction based on the Mycobacterium tuberculosis metabolic network (Bordbar et al., 2010), Staphylococcus aureus (Becker and Palsson, 2005), and Pseudomonas putida (Puchałka et al., 2008). Results are also presented for two metabolic networks of P. aeruginosa CCBH4851, a multidrug-resistant strain belonging to a clone endemic in Brazil (P. aeruginosa ST277) (Silveira et al., 2014). Both reconstructions of P. aeruginosa CCBH4851 were made by our group: one can be classified as Tier 3, and the other, its curated version, as Tier 2.

The web application proposed in this work combines FBA, flux variability analysis (FVA) (Gudmundsson and Thiele, 2010), extended Systems Biology Markup Language (SBML) parsing, and heterogeneous data integration in order to identify the most promising therapeutic targets. All SBML files processed in this work are available as Supplementary Material. The underlying hypothesis related to FVA is that reactions for which the maximum flux equals the minimum flux (i.e., flux range equal to zero), given optimal biomass production, are less robust to potential perturbations. Indeed, high rigidity of a reaction flux (flux range equal to zero) may indicate that the flux through this reaction is crucial for sustaining optimal growth, whereas lower rigidity (flux range greater than zero) indicates that alternate pathways might carry the reaction flux (Oberhardt et al., 2010). Flux ranges fall into three categories: i) inflexible fluxes (flux range equal to zero), ii) fluxes with bounded flexibility (flux range greater than zero, but bounded), and iii) infinitely flexible fluxes (flux range greater than zero, unbounded).
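The three categories can be made concrete with a short Python sketch. It assumes that FVA results are already available as a mapping from reaction ID to a (minimum, maximum) pair, and it uses two hypothetical parameters that are not taken from FindTargetsWEB: a numerical tolerance for "equal to zero" (LP solvers such as GLPK return floating-point values) and a large bound treated as "unbounded", since genome-scale models commonly encode open reactions with a large finite bound such as 1000 rather than infinity. It illustrates the classification only and is not the tool's code.

```python
from typing import Dict, Set, Tuple

# Hypothetical thresholds (assumptions for this sketch, not FindTargetsWEB values).
ZERO_TOL = 1e-9     # a range with |max - min| <= ZERO_TOL counts as zero
BIG_BOUND = 1000.0  # a flux reaching this magnitude is treated as unbounded


def classify_flux_range(minimum: float, maximum: float,
                        tol: float = ZERO_TOL,
                        big_bound: float = BIG_BOUND) -> str:
    """Assign one FVA interval [minimum, maximum] to one of three categories."""
    if abs(maximum - minimum) <= tol:
        return "inflexible"  # i) flux range equal to zero
    if max(abs(minimum), abs(maximum)) >= big_bound:
        return "unbounded"   # iii) range > 0, limited only by the artificial bound
    return "bounded"         # ii) range > 0 but bounded


def inflexible_reactions(fva: Dict[str, Tuple[float, float]],
                         tol: float = ZERO_TOL) -> Set[str]:
    """IDs of reactions whose flux is fixed at the optimum (zero FVA range)."""
    return {rid for rid, (lo, hi) in fva.items()
            if classify_flux_range(lo, hi, tol) == "inflexible"}


# Hypothetical FVA output (reaction ID -> (minimum, maximum)).
example_fva = {
    "R_a": (4.2, 4.2),           # fixed flux
    "R_b": (0.0, 3.5),           # alternate routes exist, bounded
    "R_c": (-1000.0, 1000.0),    # free within the default bounds
    "R_d": (1.0, 1.0 + 1e-12),   # zero range up to solver precision
}
```

Zero range is tested first, so a flux pinned exactly at a large bound still counts as inflexible. In COBRApy, cobra.flux_analysis.flux_variability_analysis returns a pandas DataFrame indexed by reaction ID with minimum and maximum columns, and fraction_of_optimum=1.0 computes the ranges at the optimal objective value, so a mapping like example_fva could be built from its rows; column names and defaults should be checked against the installed version.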
The FVA carried out by FindTargetsWEB aims to identify potential targets associated with inflexible fluxes, i.e., a flux range equal to zero. The genome-scale metabolic network analysis is combined with queries to multiple public repositories, such as KEGG (Ogata et al., 1999), UniProt (UniProt, 2018), and DrugBank (Wishart et al., 2008), to assess the druggability and toxicology of potential targets. FindTargetsWEB identified potential targets in all networks. Several of these targets have been described in the literature; others are candidates for future experimental investigation.

IMPLEMENTATION

The main requirements for implementing the general method described in this work, dubbed FindTargetsWEB, were ease of use, availability, robustness, and performance. After careful consideration, Python was selected as the implementation language. Python is a high-level, interpreted, imperative, object-oriented, dynamically and strongly typed programming language created by Guido van Rossum (Van Rossum and Drake, 2003). Its many advantages favor the fulfillment of the main requirements of the application. Another advantage is the availability of the COBRApy package. COnstraint-Based Reconstruction and Analysis (COBRA) methods (Hyduke et al., 2011) are widely used for genome-scale modeling of metabolic networks in prokaryotes and eukaryotes. The COBRA Toolbox for MATLAB is a leading software package for analyzing metabolism on a genomic scale. COBRApy (Ebrahim et al., 2013), in turn, is a Python package that provides support for basic COBRA methods. COBRApy has an object-oriented design, which facilitates the representation of the complex biological processes of metabolism. COBRApy does not require MATLAB; however, it includes an interface to the COBRA Toolbox for MATLAB to facilitate the use of legacy code. To improve performance, COBRApy supports parallel processing for computationally intensive tasks.

FindTargetsWEB is implemented as a web application; therefore, the user only needs a web browser to access the system. The interface is intuitive: the user provides the SBML file describing the metabolic network reconstruction, the organism species associated with that reconstruction (which defines a filter for KEGG queries), and information such as name and e-mail address (Figure 1). The FindTargetsWEB list of analyzable species is easily expandable and can include gram-negative bacteria, gram-positive bacteria, and bacteria that cannot be classified as either.

FIGURE 1 FindTargetsWEB user interface: SBML file input.

On the following screen, the user decides whether to analyze the network using FBA alone or a combination of FBA and FVA (Figure 2). The FBA+FVA method pinpoints reactions, and their associated genes, whose knockout completely stops (zeroes) biomass generation and whose FVA range is zero. The FBA+FVA method is therefore more restrictive than the FBA-only option: the targets it finds form a subset of the targets found by the FBA-only method. Robustness is provided by the design of the method itself, as described in the following paragraphs.
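The two analysis options can be stated compactly in standard constraint-based notation. The formulation below is the textbook FBA/FVA linear program and is a sketch under that assumption; the symbols (stoichiometric matrix S, flux vector v, lower and upper bounds l and u, objective vector c selecting the biomass reaction) and the set names T_FBA and T_FBA+FVA are introduced here for illustration and do not come from the paper.

```latex
\begin{align*}
Z^{*} &= \max_{v}\; c^{\top} v
  \quad \text{s.t.} \quad S v = 0,\;\; l \le v \le u
  && \text{(FBA)}\\
Z^{*}_{-k} &= \max_{v}\; c^{\top} v
  \quad \text{s.t.} \quad S v = 0,\;\; l \le v \le u,\;\; v_k = 0
  && \text{(knockout of reaction } k\text{)}\\
v_j^{\min},\; v_j^{\max} &= \min_{v} v_j,\;\; \max_{v} v_j
  \quad \text{s.t.} \quad S v = 0,\;\; l \le v \le u,\;\; c^{\top} v = Z^{*}
  && \text{(FVA at the optimum)}\\
T_{\mathrm{FBA}} &= \{\, k : Z^{*}_{-k} = 0 \,\}\\
T_{\mathrm{FBA+FVA}} &= \{\, k \in T_{\mathrm{FBA}} : v_k^{\max} - v_k^{\min} = 0 \,\}
  \;\subseteq\; T_{\mathrm{FBA}}
\end{align*}
```

The inclusion holds by construction, because the FVA condition can only remove reactions from the FBA-only set; whether the inclusion is strict depends on the network. In floating-point practice, both "= 0" tests are made against a small tolerance.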

FIGURE 2 FindTargetsWEB user interface: choice of analysis method.

Target identification is carried out through a computational workflow that runs the metabolic network analysis and pinpoints genes whose virtual knockout interrupts the generation of biomass. Therefore, the minimum level of curation required for a metabolic network model to be processed by FindTargetsWEB is a biomass reaction flux greater than zero. The list of potential targets is filtered using FVA (if the user chooses to do so), and the workflow retrieves possible inhibitors for the identified genes, verifies whether such inhibitors are available as approved drugs, and evaluates their toxicity to humans by querying several repositories.

The workflow was implemented in Python, version 3.6.3, using the COBRApy framework, version 0.9.0. This framework provides the methods needed to read the SBML (Hucka et al., 2015) file that describes the genome-scale metabolic network of the bacterium under analysis. The solver used for FBA and FVA is GLPK (https://www.gnu.org/software/glpk/), the COBRApy default solver, which is easily deployable on Linux platforms. The system is deployed on an Ubuntu 18.04 server with 64 GB of RAM. Prior to processing, when needed, SBML files were converted to the SBML Level 3 format using the COBRApy function cobra.io.sbml3.write_sbml_model. The SBML files processed in this manuscript were retrieved from the BioModels repository (Glont et al., 2017) or directly from the supplementary material of the associated reference. The main steps of the method are described below. The whole method is depicted in Figure 3.

1. Validation of the SBML file describing the genome-scale metabolic network. The system first creates a table containing gene/reaction/metabolite data obtained from the SBML file and then checks whether the metabolic network reconstruction generates biomass. This is done with FBA, using the biomass reaction as the maximization objective. If the biomass value is zero, the system reports an error to the user and halts processing. If the maximum flux of the biomass reaction is greater than zero, the workflow proceeds to the next step.
2. Use of FVA to filter reactions. After the metabolic network is validated, reactions are filtered using FVA if the user has chosen the combination of FBA and FVA. The objective is to consider, in the following processing steps, only those reactions whose range of possible flux values is zero, given the optimal biomass value determined in the previous step. The underlying assumption is that reactions with a range equal to zero are less robust, i.e., more susceptible to perturbations, as stated in the Background. FVA can be implemented in a computationally efficient way (Gudmundsson and Thiele, 2010), and its contribution to the overall execution time of FindTargetsWEB is negligible.
3. Simulation of reaction knockouts. Single reaction knockouts are performed by setting both the minimum and maximum flux bounds of each reaction in the network to zero and running FBA again. If a knockout zeroes biomass generation, the reaction's information is stored in a list for further processing. If gene IDs are available in the SBML file, the workflow proceeds to step 4; otherwise, it jumps directly to step 6b.
4. Simulation of gene knockouts. The system performs a single knockout of each gene described in the model: the COBRApy framework queries the reactions linked to the selected gene and zeroes the minimum and maximum bounds of each reaction that depends on it, taking gene-protein-reaction (GPR) relations into account (a reaction is blocked only if its GPR rule can no longer be satisfied without the gene). As in the previous step, if biomass generation is zeroed, the corresponding gene information is stored in a second list. Note that one gene can be associated with more than one reaction, and one reaction may require the expression of several genes.
5. Consolidation/unification of knockout results. The two lists generated in the previous steps, i.e., the list of reactions from step 3 and the list of genes from step 4, are unified. For a gene to be included in the final list, it must appear in the list of step 4 and be associated with at least one reaction stored in step 3 (see Algorithm 1). These are the candidate genes that the workflow considers in the following steps. If the user selects the FBA+FVA option, the final list is also filtered according to the FVA processing performed in step 2.

Algorithm 1: Consolidation of knockout results (SBML with mapped genes)
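The knockout and consolidation logic of steps 1 to 5 can be sketched in plain Python. To keep the example self-contained, FBA is abstracted as a caller-supplied function that returns the maximal biomass flux when a given set of reactions is blocked (in the deployed system this is an LP solve with COBRApy and GLPK); GPR rules are assumed to be in disjunctive normal form (a list of enzyme complexes, any one of which can catalyze the reaction); and the toy network, gene names, and tolerance are hypothetical. This is not the FindTargetsWEB source code, and the repository queries that follow step 5 are not covered.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# A GPR rule in disjunctive normal form: the reaction can be catalysed if ANY
# complex in the list has ALL of its genes present (OR of ANDs).
GPR = List[FrozenSet[str]]
# Stand-in for FBA: maximal biomass flux when the given reactions are blocked.
BiomassSolver = Callable[[FrozenSet[str]], float]


def reactions_disabled_by(gene: str, gprs: Dict[str, GPR]) -> Set[str]:
    """Reactions whose GPR can no longer be satisfied once `gene` is deleted."""
    return {rxn for rxn, complexes in gprs.items()
            if complexes and all(gene in complex_ for complex_ in complexes)}


def find_targets(reactions: Iterable[str], gprs: Dict[str, GPR],
                 solve_biomass: BiomassSolver,
                 fva_ranges: Optional[Dict[str, Tuple[float, float]]] = None,
                 tol: float = 1e-9) -> Tuple[Set[str], Dict[str, Set[str]]]:
    # Step 1: validation; the intact network must produce biomass.
    if solve_biomass(frozenset()) <= tol:
        raise ValueError("biomass flux is zero: the model cannot be analyzed")

    # Step 2 (FBA+FVA option only): keep reactions with a zero FVA range.
    candidates = set(reactions)
    if fva_ranges is not None:
        candidates = {r for r in candidates
                      if r in fva_ranges
                      and abs(fva_ranges[r][1] - fva_ranges[r][0]) <= tol}

    # Step 3: single reaction knockouts (both bounds set to zero, FBA re-run).
    essential_rxns = {r for r in candidates
                      if solve_biomass(frozenset({r})) <= tol}

    # With no GPR data, no genes are found and only reactions are reported.
    genes = sorted({g for complexes in gprs.values() for c in complexes for g in c})

    # Step 4: single gene knockouts, mapped to reactions through GPR rules.
    lethal_genes: Dict[str, Set[str]] = {}
    for gene in genes:
        blocked = reactions_disabled_by(gene, gprs)
        if blocked and solve_biomass(frozenset(blocked)) <= tol:
            lethal_genes[gene] = blocked

    # Step 5 (consolidation): keep genes linked to at least one step-3 reaction.
    targets = {gene: blocked & essential_rxns
               for gene, blocked in lethal_genes.items()
               if blocked & essential_rxns}
    return essential_rxns, targets


# Hypothetical toy network: biomass needs R1 and at least one of R2/R3.
TOY_REACTIONS = ["R1", "R2", "R3", "R4"]
TOY_GPRS: Dict[str, GPR] = {
    "R1": [frozenset({"g1", "g2"})],  # one complex of two subunits
    "R2": [frozenset({"g5"})],
    "R3": [frozenset({"g5"})],        # g5 catalyses both alternatives
    "R4": [frozenset({"g4"})],
}


def toy_biomass(blocked: FrozenSet[str]) -> float:
    """Toy stand-in for an FBA solve on the network above."""
    if "R1" in blocked or {"R2", "R3"} <= blocked:
        return 0.0
    return 1.0
```

Calling find_targets(TOY_REACTIONS, TOY_GPRS, toy_biomass) yields the essential reaction set {"R1"} and the targets {"g1": {"R1"}, "g2": {"R1"}}. Gene g5 is lethal in step 4 because it blocks R2 and R3 together, but the consolidation step drops it because neither reaction is essential on its own. Passing fva_ranges can only shrink the result, mirroring the subset relation between the FBA+FVA and FBA-only options.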
