IMMUNOINFORMATICS: Bioinformatics Challenges in Immunology

Bioinformatics 1 -- Lecture 22
IMMUNOINFORMATICS: Bioinformatics Challenges in Immunology
Most slides courtesy of Julia Ponomarenko, San Diego Supercomputer Center, or Oliver Kohlbacher, WSI/ZBIT, Eberhard-Karls-Universität Tübingen

The Immune Reaction

Vaccines have been made for 36 of 400 human pathogens (Immunological Bioinformatics, 2005, The MIT Press).
HPV & Rotavirus

Epitope
- Quantum unit of immunity.
- Surface on which an antibody binds.
- Comprises antigenic matter.
- Linear epitope consists of a short AA sequence.
- Conformational epitope depends on tertiary structure.

Two branches of immunity
Innate immunity:
- Recognizes molecules called pathogen-associated molecular patterns (~10^3), or PAMPs, shared by groups of related microbes; e.g. LPS from the gram-negative cell wall, RNA from viruses, flagellin, and glucans from fungal cell walls.
- Is antigen-nonspecific.
- Immediate or within several hours response.
- Involves body defense cells that have pattern-recognition receptors: leukocytes (neutrophils, eosinophils, basophils and monocytes); cells that release inflammatory mediators (macrophages and mast cells); natural killer cells (NK cells); and complement proteins and cytokines.
- Does not improve with repeated exposure to a given infection.
Adaptive immunity:
- Recognizes epitopes, which are specific B- and T-cell recognition sites on antigens.
- Is antigen-specific.
- 3 to 10 days response.
- Involves the following: antigen-presenting cells (APCs) such as macrophages and dendritic cells; antigen-specific B-lymphocytes (~10^9); antigen-specific T-lymphocytes (~10^12); and cytokines.
- Improves with repeated exposure and becomes protective.

Components of the immune system

B and T cells recognize different epitopes of the same protein
T-cell epitope:
- Denatured antigen.
- Linear (often) peptide, 8-37 aa.
- Internal (often).
- Binding to T cell receptor: Kd 10^-5 to 10^-7 M (low affinity).
- Slow on-rate, slow off-rate (once bound, peptide may stay associated for hours to many days).
B-cell epitope:
- Native or denatured (rare) antigen.
- Sequential (continuous) or conformational (discontinuous).
- Accessible, hydrophilic, mobile, usually on the surface or could be exposed as a result of physicochemical change.
- Binding to antibody: Kd 10^-7 to 10^-11 M (high affinity).
- Rapid on-rate, variable off-rate.

B cell (magenta, orange) and T cell epitopes (blue, green, red) of hen egg-white lysozyme (PDB: 1dpx)

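MHC class I pathway
[Figure: intracellular pathogen (virus, mycobacteria) -> cytosolic protein -> proteasome -> peptides -> TAP -> ER -> MHC I. The CD8 epitope is presented by MHC I on any cell and recognized by the TCR and CD8 of a CTL (CD8+ T cell).]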

Xenoreactive complex AHIII 12.2 TCR bound to P1049 (ALWGFFPVLS) / HLA-A2.1 (PDB: 1lp9)
[Figure labels: T-cell receptor, MHC class I, β-2 microglobulin; shown next to the MHC class I pathway diagram.]

MHC class I pathway
Bioinformatics approaches at epitope prediction:
(1) Prediction of proteasomal cleavage sites (several methods exist, based on a small amount of in vitro data).
(2) Prediction of peptide-TAP binding (ibid.).
(3) Prediction of peptide-MHC binding.
(4) Prediction of pMHC-TCR binding.
[Figure: the MHC class I pathway diagram, from cytosolic protein through proteasome, TAP, ER and MHC I to recognition by the TCR and CD8 of a CTL.]

MHC class I epitope prediction: Challenges
- High rate of pathogen mutations. Pathogens evolve to escape: proteasomal cleavage (HIV); TAP binding (HIV, HSV type I); MHC binding.
- MHC genes are highly polymorphic (2,292 human alleles, up from 1,670 two years ago).
- MHC polymorphism is essential to protect the population from invasion by pathogens. But it creates a problem for epitope-based vaccine design: a vaccine needs to contain a unique epitope binding to each MHC allele.
- Every normal (heterozygous) human expresses six different MHC class I molecules on every cell, containing α-chains derived from the two alleles of each of the HLA-A, HLA-B and HLA-C genes inherited from the parents.
- Every human has 10^12 lymphoid cells with a T-cell receptor repertoire of 10^7, depending on her immunological status (vaccinations, disease history, environment, etc.).

Prediction of MHC class I binding peptides: potential epitopes
- MHC allele or allele supertype specific (alleles similar in sequence bind similar peptides).
- Peptide length (8-, 9-, 10-, 11-mers) specific.
- Sequence-based approaches:
  - Gibbs sampling (when the training peptides are of different lengths)
  - Hidden Markov Models (ibid.)
  - Sequence motifs, position weight matrices
  - Artificial Neural Networks (require a large number of training examples)
  - SVM
Peptides known to bind to the HLA-A*0201 molecule: ALAKAAAAM, ALAKAAAAN, ALAKAAAAT, GMNERPILT, ALAKAAAAV, GILGFVFTM, TLNAWVKVV, KLNEPVLLL, AVVPFIVSV.
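To make the "position weight matrices" item concrete, here is a minimal sketch built from the nine HLA-A*0201 binders listed on this slide. It assumes a uniform background over the 20 amino acids and a pseudocount of 1; those choices, and the names `build_pwm` and `score`, are illustrative assumptions and not something the slides specify. With only nine training peptides (four of them nearly identical), such a matrix is far too small for real prediction.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# 9-mers listed on the slide as known HLA-A*0201 binders.
BINDERS = [
    "ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAT", "GMNERPILT", "ALAKAAAAV",
    "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV",
]

def build_pwm(peptides, pseudocount=1.0):
    """Per-position log2 odds versus a uniform background (an assumption)."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm

def score(pwm, peptide):
    """Sum the log-odds of each residue at its position."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))

pwm = build_pwm(BINDERS)
# A training peptide should score higher than a peptide made of a residue
# that never occurs in the training set.
print(score(pwm, "ALAKAAAAV") > score(pwm, "DDDDDDDDD"))
```

A higher score means the peptide looks more like the training binders; turning scores into binder/non-binder calls still requires a threshold chosen on independent data.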

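Performance measures for prediction methods
- Each peptide gets a predicted score (binding affinity value); a score threshold splits the predictions into TP, FP, FN and TN.
- TP + FN: actual binders (based on a defined threshold on binding affinity values).
- TN + FP: actual non-binders (ibid.).
- Sensitivity = TP / (TP + FN) = 6/7 = 0.86
- Specificity = TN / (TN + FP) = 6/8 = 0.75
- ROC curve: true positive rate, TP / (TP + FN), plotted against false positive rate, FP / (FP + TN). The area under the curve is the AUC (or A_ROC).
[Figure: scored peptides split by the score threshold, and a ROC curve with both axes running from 0 to 1.]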
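The counts on this slide (6 of 7 binders found, 6 of 8 non-binders rejected) can be reproduced with a small sketch. The scores and the 0.5 threshold below are made-up values chosen only so that the counts match the slide, and `confusion_counts` and `roc_auc` are hypothetical helper names. The AUC is computed here as the fraction of binder/non-binder pairs that are ranked correctly, which equals the area under the ROC curve.

```python
def confusion_counts(scores, labels, threshold):
    """Return (TP, FN, FP, TN); label 1 = actual binder, 0 = non-binder."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp, fn, fp, tn

def roc_auc(scores, labels):
    """Fraction of (binder, non-binder) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted scores: 7 actual binders, then 8 actual non-binders.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.30,
          0.75, 0.55, 0.45, 0.40, 0.35, 0.25, 0.20, 0.10]
labels = [1] * 7 + [0] * 8

tp, fn, fp, tn = confusion_counts(scores, labels, threshold=0.5)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc = roc_auc(scores, labels)

print(round(sensitivity, 2))  # 0.86
print(round(specificity, 2))  # 0.75
print(round(auc, 3))          # 0.875
```

Sensitivity and specificity depend on the chosen threshold, while the AUC summarizes the ranking over all thresholds.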

Performance measures for prediction methods (cont.)
- Pearson's correlation coefficient between the predicted scores p_i (binding affinity values) and the actual binding affinity values a_i.
- TP + FN: actual binders (based on a defined threshold on actual binding affinity values a_i).
- TN + FP: actual non-binders (ibid.).
- Sensitivity = TP / (TP + FN) = 6/7 = 0.86
- Specificity = TN / (TN + FP) = 6/8 = 0.75
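Pearson's correlation coefficient compares the predicted scores p_i with the measured affinities a_i directly, with no binder/non-binder threshold. The sketch below implements the standard formula r = sum((p_i - mean_p)(a_i - mean_a)) / sqrt(sum((p_i - mean_p)^2) * sum((a_i - mean_a)^2)). The function name `pearson` and the sample values are illustrative assumptions, not data from the lecture.

```python
import math

def pearson(predicted, actual):
    """Pearson's r between predicted scores p_i and actual values a_i."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

# Hypothetical values on an arbitrary scale.
predicted = [1.0, 2.0, 3.0, 4.0]
actual = [1.0, 3.0, 2.0, 4.0]
r = pearson(predicted, actual)
print(round(r, 2))  # 0.8
```

In practice the affinities are often log-transformed before correlating, since measured IC50 values span several orders of magnitude; that preprocessing step is not shown here.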

MHC class II pathway
[Figure: extracellular protein -> endosome -> (?) -> peptides -> MHC II. The CD4 epitope is presented by MHC II on a B-cell, macrophage, or dendritic cell and recognized by the TCR and CD4 of a CD4+ T cell. Shown next to the MHC class I pathway diagram for comparison.]

Complex of a human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT) and MHC class II
[Figure labels: T-cell receptor (Vα, Vβ), MHC class II α and β chains; shown next to the MHC class I and class II pathway diagrams.]

Epitope and T cell epitope
- Epitope, or antigenic determinant, is defined as the site of an antigen recognized by immune response molecules (antibodies, MHC, TCR).
- T cell epitope: a short linear peptide or other chemical entity (native or denatured antigen) that binds MHC (class I binds 8-10 aa peptides; class II binds 11-25 aa peptides) and may be recognized by the T-cell receptor (TCR).
- T cell recognition of antigen involves the ternary complex "antigen-TCR-MHC".
[Figures: complex of a human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT) and MHC class II (PDB: 1fyt), labeled T-cell receptor (Vα, Vβ), MHC class II α and β; xenoreactive complex AHIII 12.2 TCR bound to P1049 (ALWGFFPVLS) / HLA-A2.1 (PDB: 1lp9), labeled T-cell receptor (Vα, Vβ), MHC class I, β-2 microglobulin.]

MHC class II epitope prediction: Challenges
- MHC class II genes are as highly polymorphic as MHC class I (1,012 human alleles to date).
- The repertoire of T-cell receptors is 10^7 and depends on an individual's immunological status (vaccinations, disease history, environment, etc.).
- The epitope length is 9-37 aa.
- The peptide may have a non-linear conformation.
- The MHC binding groove is open at both ends, and residues outside the groove are known to affect peptide binding.
[Figure: complex of a human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT) and MHC class II, next to the MHC class II pathway diagram.]

MHC class II epitope prediction: Challenges (cont.)
- The processing of MHC class II epitopes is still a mystery and likely depends on the antigen structure, the cell type and other factors.
[Figure: MHC class II pathway diagram, with the processing step between endosome and MHC II marked "?".]

Prediction of MHC class II binding peptides: potential epitopes
- MHC allele or allele supertype specific (alleles similar in sequence bind similar peptides).
- Predictions for peptides of length 9 aa (the peptide-MHC binding core).
- Sequence-based approaches:
  - Gibbs sampling
  - Sequence motifs, position weight matrices
  - Machine learning: SVM, HMM, evolutionary algorithms
(Nielsen et al., Bioinformatics 2004)
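Because class II peptides are longer than the 9-residue binding core, a matrix-based predictor has to decide which 9-mer window to score. A minimal sketch of that scanning step follows, using the influenza HA peptide from the earlier slides. The matrix `TOY_MATRIX` is entirely made up for illustration (a few favorable residues at core positions 1, 4, 6 and 9) and is not a measured or published matrix; `score_core` and `best_core` are hypothetical names.

```python
CORE_LENGTH = 9

# Hypothetical toy scores: {core index: {residue: score}}; missing entries = 0.
TOY_MATRIX = {
    0: {"Y": 2.0, "F": 2.0, "W": 2.0, "L": 1.0, "I": 1.0, "V": 1.0, "M": 1.0},
    3: {"A": 0.5, "Q": 0.5, "L": 0.5, "M": 0.5},
    5: {"T": 0.5, "S": 0.5, "A": 0.5, "G": 0.5},
    8: {"L": 1.0, "I": 1.0, "V": 1.0, "M": 1.0, "A": 0.5},
}

def score_core(core, matrix):
    """Sum the matrix scores for a 9-residue candidate core."""
    return sum(matrix.get(i, {}).get(aa, 0.0) for i, aa in enumerate(core))

def best_core(peptide, matrix):
    """Slide a 9-residue window along the peptide; return the best window."""
    candidates = [
        (score_core(peptide[start:start + CORE_LENGTH], matrix),
         start,
         peptide[start:start + CORE_LENGTH])
        for start in range(len(peptide) - CORE_LENGTH + 1)
    ]
    value, start, core = max(candidates, key=lambda c: c[0])
    return start, core, value

offset, core, value = best_core("PKYVKQNTLKLAT", TOY_MATRIX)
print(offset, core, value)  # 2 YVKQNTLKL 4.0
```

The peptide's overall score is taken from its best-scoring window, so a wrong matrix can give a plausible binding score while still placing the core in the wrong register.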

Benchmarking predictions of peptide binding to MHC II (Wang et al. PLoS Comput Biol. 2007)
- Data: pairs {peptide, affinity value as IC50 in nM} for a given MHC allele.
- 16 different mouse and human MHC class II alleles; 10,017 data points.
- 9 different methods were evaluated: 6 matrix-based, 2 SVM, 1 QSAR-based.
- AUC values varied from 0.5 (random prediction) to 0.83, depending on the allele.
- Comparison with 29 X-ray structures of peptide-MHC II complexes (14 different alleles): the success level of binding core recognition was 21%-62%, with the exception of the TEPITOPE method (100%), which is based on structural information and measured affinity values for mutant variants of MHC class II and peptides (Sturniolo et al., Nature Biotechnology, 1999).
- Structural information together with peptide-MHC binding data should improve the prediction.

Outstanding problems in TAP/Proteasome/MHC peptide prediction
- High throughput data generation. Currently few verified peptide sequences. Overfitting likely.
- Something better than the usual machine learning approach? Structure-based motifs, length preferences? Combine multiple motifs, reflecting multiple enzymes?
- Vaccine design to incorporate cleavage motifs. Optimize recombinant vaccine for cleavage?

Cellular immunity

Epitope-based vaccines
- A major reason for analyzing and predicting epitopes is that they may lead to the development of peptide-based synthetic vaccines.
- Thousands of peptides have been pre-clinically examined; over 100 of them have progressed to phase I clinical trials and about 30 to phase II, including vaccines for foot-and-mouth virus infection, influenza, HIV.
- However, not a single peptide vaccine has passed phase III and become available to the public.
- The only successful synthetic peptide vaccine has been made against canine parvovirus (causing enteritis and myocarditis in dogs and minks). It consists of several peptides from the N-terminal region of the viral VP2 protein (residues 1–15, 7– 1, and 3–19) coupled to a carrier, and it induced an immune response in dogs and minks. However, it is expensive and has lower coverage than the conventional vaccine (attenuated virus).

Immunoinformatics
The goals:
- Modeling of the immune system at the population and individual levels (in silico immune system).
- Design of medical diagnostics and therapeutic/prophylactic vaccines for cancers, allergies, autoimmune and infectious diseases.
- Epitope discovery: how the antigens are recognized by the cells of the immune system.
- Data collection and analysis: IEDB, HIV database, AntiJen (UK), IMGT (France).
- Evolution of the adaptive and innate immune system: Gary Litman (FL), Louis Du Pasquier (Switzerland).
- Evolution of pathogens and co-evolution of host and pathogen.
- Modeling of host-pathogen interactions: Leor Weinberger (UCSD).
- Deciphering regulatory networks in APCs, lymphocytes and other cells.
