Machine Learning To Identify Lupus Phenotype In Electronic Health Records

Transcription

Machine learning to identify lupus phenotype in electronic health records
Yuan Luo, PhD, FAMIA
Chief AI Officer
Northwestern University Clinical and Translational Sciences Institute
Institute for Augmented Intelligence in Medicine
Associate Professor
Department of Preventive Medicine
Northwestern
3/26/2022

11 Hospitals
200 Clinics and Rehabs
4000 Physicians
Yuan Luo (Northwestern) | ML to identify lupus phenotype in EHR | 3/26/2022

Northwestern Medicine Enterprise Data Warehouse

Northwestern Medicine Enterprise Data Warehouse
Data Engineering
- Data Integration: Integrate data within the Enterprise Data Warehouse, and create custom data structures and cubes to support analytics for specific business functions
- Data Modeling: Develop and maintain an enterprise data model that illustrates attributes and relationships for all data stored within the EDW
- Data Architecture: Develop and implement data marts, tabular models, and data structures which enable t…
- KPI Development & Reporting: Collaborate and develop reports for FSM researchers to identify patient cohorts for research studies, and develop self-service reports for NUCATS and FSM Administration
- Research App Dev: Custom development of applications used to track research studies, patient cohorts, and research outcomes
- Reporting Portal & Self-Service BI: Develop and maintain new features on the NMEDW Reporting Portal. Administer Tableau Server, the self-service BI tool for the Health System
- Customer Support: Work with FSM researchers to create SLAs, respond to ad-hoc research requests, conduct educational sessions for power users, and administer the FSM exception policy
- Customer Support & Maintenance: Service tickets related to new feature requests or incidents, and maintain an inventory of 35 applications for NMHC & FSM
- Advanced Analytics: Develop custom data structures in a variety of data models to enable data sharing amongst AMC peers, and support key research initiatives. Develop predictive analytic applications to support outcomes research
Data Warehouse Operations
- Source System Loads: Work with vendors and customers across the organizations to load new data sources into the EDW. Maintain, monitor, and troubleshoot source system loads daily
- Infrastructure: Oversee EDW infrastructure (server, storage, network, database) and collaborate with NMIS to ensure high availability and quick query response times
- Manage Epic Reporting Environments: Manage, monitor, and troubleshoot all Clarity environments on campus, as well as Caboodle (Epic's DW)
Courtesy of NMEDW Director

Electronic Medical Records and Genomics (eMERGE) Network

AI to Integrate Multimodal Biomedical Data
- Clinical narrative text
- Medical imaging
- Omics data
- Time series
- Structured data

Bulk NLP to power R&D and BI

Lupus nephritis computational phenotyping
- Lupus nephritis is one of the major risk factors for systemic lupus erythematosus (SLE) mortality
- Chibnik et al.: ICD codes; claims data; positive predictive value (PPV) 88%; sensitivity and specificity not mentioned; not externally validated
- Li et al.: used ICD codes; good sensitivity and specificity; low PPV, 63.4%
- A need to use natural language processing (NLP) methods to improve performance
- A need to validate the algorithm internally and externally
Chibnik et al 2010; Li et al 2020

Objectives
- Develop NLP-based algorithms to identify the lupus nephritis phenotype among SLE patients
- Validate the algorithms using an internal dataset and an external dataset

SLE patient cohorts
- Definite SLE in CLD, defined by meeting at least 3 ACR criteria: 1052
- Definite SLE in CLD that meet both ACR classification and SLICC classification: 878
- Definite SLE in both CLD and EDW: 818
- Patients who had at least 4 encounters in EDW: 472

SLE patient cohorts
- NU dataset (472 SLE total; 178 cases; 294 controls): training (4/5) with L2-regularized logistic regression; testing (1/5); evaluation (PPV, NPV, sensitivity, specificity)
- VUMC dataset (13 cases; 37 controls): apply pretrained model; evaluation (PPV, NPV, sensitivity, specificity)
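The four evaluation measures named on this slide can all be read off a confusion matrix. The sketch below is a minimal illustration, not the study's code: the function names and the toy labels are hypothetical, and the logistic regression that would produce the predictions is not shown.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Count true/false positives and negatives for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": ratio(tp, tp + fn),  # cases correctly flagged
        "specificity": ratio(tn, tn + fp),  # controls correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged patients who are cases
        "npv": ratio(tn, tn + fn),          # cleared patients who are controls
    }


# Hypothetical labels for eight patients (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

PPV and NPV depend on the case/control mix, so the same model can show different PPV on the NU and VUMC datasets even if sensitivity and specificity were unchanged.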

Lupus nephritis phenotype: baseline model
Type | Vocabulary | Code | Note
Diagnosis | ICD-10 | N28.9 |
Diagnosis | ICD-10 | R80 | This is a top-level code that contains a tree of other codes. R80 is not used for diagnosis
Diagnosis | ICD-10 | R80.9 |
Diagnosis | ICD-10 | R82.99 | This is a top-level code that contains a tree of other codes. R82.99 is not used for diagnosis
Laboratory | LOINC | 2889-4 | value 500mg (/24H), 0.5g (/24H)
Laboratory | LOINC | 21482-5 | value 500mg (/24H), 0.5g (/24H)
Laboratory | LOINC | 51790-4 | value 0/hpf or /lpf
Laboratory | LOINC | … | value 0/hpf or /lpf
Laboratory | LOINC | … | value 0/hpf or /lpf

Lupus nephritis phenotype: NLP for feature extraction
Models, features and hyperparameters
- Baseline: any positive mention of lupus nephritis in ICD9/10 or laboratory tests; hyperparameters: NA
- Full MetaMap (binary): all MetaMap concepts (exclude negated concepts); hyperparameters: C (0.0001-10000); solver (newton-cg, lbfgs, sag and saga)
- Full MetaMap (counts): all MetaMap concepts (exclude negated concepts); hyperparameters: C (0.0001-10000); solver (newton-cg, lbfgs, sag and saga)
- MetaMap mixed: hyperparameters: C (0.0001-10000); solver (newton-cg, lbfgs, sag and saga)
Mixed filtered MetaMap/regex/ICD model
- Structured data feature (1 variable): positive mention in ICD or laboratory test
- NLP features
  - MetaMap (7 variables): C0024143, C0268757, C0268758, C4053955, C4053958, C4053959, C4054543
  - Regular expression (5 variables)
    - Mention of any keywords related to nephritis class II
    - Mention of any keywords related to nephritis class III
    - Mention of any keywords related to nephritis class IV
    - Mention of any keywords related to nephritis class V
    - Mention of any keywords related to and proteinuria
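The five regular-expression variables could be produced along the following lines. This is a sketch under stated assumptions: the slide does not list the actual expressions, so every pattern and the name `regex_features` are hypothetical, and negation handling is left out.

```python
import re
from typing import Dict

# Hypothetical keyword patterns; the expressions used in the study are not given.
PATTERNS = {
    "class_II": re.compile(r"\bclass\s+(?:II|2)\b", re.IGNORECASE),
    "class_III": re.compile(r"\bclass\s+(?:III|3)\b", re.IGNORECASE),
    "class_IV": re.compile(r"\bclass\s+(?:IV|4)\b", re.IGNORECASE),
    "class_V": re.compile(r"\bclass\s+(?:V|5)\b", re.IGNORECASE),
    "proteinuria": re.compile(r"\bproteinuria\b", re.IGNORECASE),
}


def regex_features(note: str) -> Dict[str, int]:
    """Return one binary variable per pattern: 1 if the note mentions it."""
    return {name: int(bool(p.search(note))) for name, p in PATTERNS.items()}


# Made-up sentence, not taken from a real clinical note.
note = "Renal biopsy consistent with lupus nephritis class IV; persistent proteinuria."
features = regex_features(note)
print(features)
```

The trailing `\b` keeps "class II" from matching inside "class III", which matters because each class is a separate variable.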

CUIs and their definition in mixed model
CUI | Definition
C0024143 | Glomerulonephritis in the context of systemic lupus erythematosus.
C0268757 | Lupus nephritis - WHO Class IV
C0268758 | Lupus nephritis - WHO Class V
C4053955 | Systemic lupus erythematosus nephritis, with active or inactive diffuse, segmental or global endo- or extracapillary glomerulonephritis involving greater than or equal to 50% of all glomeruli, typically with diffuse subendothelial immune deposits, with or without mesangial alterations.
C4053958 | Systemic lupus erythematosus nephritis exhibiting mesangial hypercellularity or mesangial expansion by light microscopy, with mesangial immune deposits. Isolated subepithelial or subendothelial deposits may be visible by immunofluorescence or electron microscopy, but not by light microscopy.
C4053959 | Systemic lupus erythematosus nephritis with active or inactive focal, segmental or global endo- or extracapillary glomerulonephritis involving less than 50% of all glomeruli, typically with focal subendothelial immune deposits with or without mesangial alterations.
C4054543 | Membranous nephritis associated with systemic lupus erythematosus.

Phenotype | Dataset | Model | Sensitivity | Specificity | PPV | NPV | F Measure
Lupus nephritis | NU (testing set) | Baseline | 0.43 | 0.6 | 0.39 | 0.64 | 0.41
Lupus nephritis | NU (testing set) | Full MetaMap (binary) | 0.63 | 0.92 | 0.82 | 0.81 | 0.71
Lupus nephritis | NU (testing set) | Full MetaMap (counts) | 0.6 | 0.95 | 0.88 | 0.8 | 0.71
Lupus nephritis | NU (testing set) | MetaMap mixed | 0.74 | 0.92 | 0.84 | 0.86 | 0.79
Lupus nephritis | VUMC | Baseline | 0.92 | 0.61 | 0.46 | 0.96 | 0.62
Lupus nephritis | VUMC | MetaMap mixed | 1 | 0.97 | 0.93 | 1 | 0.96
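The F measure in the last column is the harmonic mean of PPV and sensitivity. As a hedged check, the sketch below recomputes it from the rounded values reported on this slide; the dictionary layout and names are illustrative only, and small differences are expected because the inputs are already rounded to two decimals.

```python
def f_measure(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# (sensitivity, PPV, reported F measure) as listed on the slide.
reported = {
    ("NU", "Baseline"): (0.43, 0.39, 0.41),
    ("NU", "Full MetaMap (binary)"): (0.63, 0.82, 0.71),
    ("NU", "Full MetaMap (counts)"): (0.60, 0.88, 0.71),
    ("NU", "MetaMap mixed"): (0.74, 0.84, 0.79),
    ("VUMC", "Baseline"): (0.92, 0.46, 0.62),
    ("VUMC", "MetaMap mixed"): (1.00, 0.93, 0.96),
}

recomputed = {
    key: f_measure(ppv, sens) for key, (sens, ppv, _) in reported.items()
}
for key, value in recomputed.items():
    print(key, round(value, 3), "reported:", reported[key][2])
```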

Results
Area under curve (AUC) for Full MetaMap (binary), Full MetaMap (counts), and MetaMap mixed model in NU testing set

SHAP plot for Full MetaMap (binary) model
- SHAP feature importance measured as the mean absolute Shapley values
  - C0005558 was the most important feature, changing the predicted absolute lupus nephritis probability on average by 0.45
  - C0005558: Patient required removal of tissue or fluid specimen to establish a diagnosis
  - C0024143: Glomerulonephritis in the context of systemic lupus erythematosus
  - C0027697: Inflammation of renal tissue
  - C1318439: A quantitative measurement of the total amount of creatinine present in a sample of urine
  - C1962972: Proteinuria, CTCAE 3.0
  - C0022658: Kidney Disorder
  - C0033687: Proteinuria
  - C0428283: Urine creatinine level finding
  - C0194073: Kidney Biopsy
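"Mean absolute Shapley value" can be made concrete for a linear model. The sketch below assumes features are treated as independent, in which case the Shapley value of feature j for one patient is w_j * (x_j - mean(x_j)) on the log-odds scale. The weights and feature matrix are hypothetical, and the slide reports changes in probability rather than log-odds, so this illustrates the ranking idea rather than the study's computation.

```python
from typing import List


def linear_shap(weights: List[float], X: List[List[float]]) -> List[List[float]]:
    """Per-patient Shapley values for a linear model with independent features."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(weights))]
    return [
        [w * (x - m) for w, x, m in zip(weights, row, means)]
        for row in X
    ]


def mean_abs_importance(shap_values: List[List[float]]) -> List[float]:
    """Global importance: average of |Shapley value| over patients, per feature."""
    n = len(shap_values)
    k = len(shap_values[0])
    return [sum(abs(row[j]) for row in shap_values) / n for j in range(k)]


# Hypothetical coefficients and binary concept indicators (not study data).
weights = [2.0, -0.5]
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
importance = mean_abs_importance(linear_shap(weights, X))
print(importance)
```

Features are then ranked by this average, which is how a single "most important feature" is read off the plot.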

SHAP plot for MetaMap (counts) model
- SHAP feature importance measured as the mean absolute Shapley values
  - C1561535 was the most important feature, changing the predicted absolute lupus nephritis probability on average by 20.51
  - C1561535: Creatinine, CTCAE
  - C0022646: Kidney
  - C0024143: Lupus Nephritis
  - C0033687: Proteinuria
  - C1962972: Proteinuria, CTCAE 3.0
  - C1707664: Delayed Release Dosage Form
  - C0027697: Nephritis
  - C0262923: Urine protein test
  - C0149745: Oral Ulcer

SHAP plot for MetaMap Mixed model
- SHAP feature importance measured as the mean absolute Shapley values
  - C0024143 (glomerulonephritis in the context of systemic lupus erythematosus) was the most important feature, changing the predicted absolute lupus nephritis probability on average by 0.9
  - RENAL: renal indicator from structured data
  - renal C4054543: Membranous Lupus Nephritis
  - renal C0268758: SLE glomerulonephritis syndrome, WHO class V
  - renal C4053955: Systemic Lupus Erythematosus Nephritis Class IV
  - renal C4053959: Systemic Lupus Erythematosus Nephritis Class III

Discussion
- Developed and validated three NLP algorithms to identify the lupus nephritis phenotype among SLE patients in EHR
- All three NLP algorithms outperformed the baseline algorithm in the internal validation dataset
- The MetaMap/regex/ICD mixed model (NLP based) outperformed the baseline model in both the internal and external validation sets (F measure 0.79 vs 0.41; 0.96 vs 0.62)
- Limitations: limited sample size of the external validation dataset
Y Deng, J Pacheco, A Chung, C Mao, J Smith, J Zhao, WQ Wei, A Barnado, C Weng, C Liu, A Cordon, J Yu, Y Tedla, A Kho, R Ramsey-Goldman, T Walunas, Y Luo. Natural language processing to identify lupus nephritis phenotype in electronic health records. Under review.

Latent Class Analysis for SLE Sub-phenotype Identification
- SLE is a systemic autoimmune disease
- SLE affects many organs
- Heterogeneous in nature
- Clinical trials of lupus treatments have been challenging
- Stratification of SLE may potentially improve clinical trial results

ACR (American College of Rheumatology) criteria
ACR Criteria [1,2] | Explanation
Malar Rash | Flat or raised erythema, often sparing the nasolabial folds
Discoid Rash | Raised erythematous patches with keratotic scaling, follicular plugging, and atrophic scarring
Photosensitivity | By patient history or physician observation
Oral Ulcers | Oral or nasopharyngeal ulceration, usually painless
Arthritis | Involving ≥ 2 peripheral joints, with tenderness and swelling
Serositis | Pleuritic pain or rub or evidence of pleural effusion; OR pericarditis confirmed by ECG, rub or evidence of pericardial effusion
Renal Disorder | Persistent proteinuria > 0.5 g per day; OR cellular casts (RBC, granular, mixed)
Neurologic Disorder | Seizures, in the absence of other causes or drugs; OR psychosis, in the absence of other causes or drugs
Hematologic Disorder | Hemolytic anemia with reticulocytosis; OR leukopenia: < 4000/mm³; OR lymphopenia: < 1500/mm³; OR thrombocytopenia
Immunological Disorder | Positive anti-DNA; OR positive anti-Sm; OR positive test for antiphospholipid antibodies
Antinuclear antibodies | By immunofluorescence or ELISA, at any point in time

Objectives
- To evaluate whether latent class analysis (LCA) [3] could identify distinct SLE subtypes based on ACR criteria
- To evaluate the potential association of the identified clusters with the risk of mortality

Methods
- Latent class analysis (LCA)
- Model selection: optimal number of clusters (BIC)
- Model evaluation: bootstrapping Jaccard score
- Resulting clusters: Cluster 1, Cluster 2
- Cluster clinical interpretation
  - Patient characteristics comparison among clusters: Chi-square test/ANOVA
  - Survival outcome comparison: Kaplan-Meier

The LCA Model
- Within the same latent class, individuals have similar response patterns
- Between latent classes, response patterns are different
[Figure: a latent class linked to Item 1 through Item 4; Latent class 1 and Latent class 2 each with their own response pattern over Item 1 through Item 4]

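The LCA Model
- Assumption:
  - Mixture of C classes [3]: $P(y) = \sum_{x=1}^{C} P(X = x)\, P(y \mid X = x)$
  - Independence within classes, for K items: $P(y \mid X = x) = \prod_{k=1}^{K} P(y_k \mid X = x)$
- Likelihood, over N individuals: $L = \prod_{i=1}^{N} P(y_i) = \prod_{i=1}^{N} \sum_{x=1}^{C} P(X = x) \prod_{k=1}^{K} P(y_{ik} \mid X = x)$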
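To make the mixture and within-class independence assumptions concrete, here is a small sketch that evaluates P(y), the log-likelihood and the posterior class membership for binary items. All parameter values are made up for illustration, and fitting the parameters (for example by EM, as an LCA package would do) is not shown.

```python
import math
from typing import List, Sequence


def class_conditional(y: Sequence[int], item_probs: Sequence[float]) -> float:
    """P(y | X = x): product over items, using independence within a class."""
    p = 1.0
    for y_k, p_k in zip(y, item_probs):
        p *= p_k if y_k == 1 else (1.0 - p_k)
    return p


def marginal(y, class_weights, item_probs_by_class) -> float:
    """P(y) = sum over classes of P(X = x) * P(y | X = x)."""
    return sum(
        w * class_conditional(y, probs)
        for w, probs in zip(class_weights, item_probs_by_class)
    )


def log_likelihood(Y, class_weights, item_probs_by_class) -> float:
    """Sum of log P(y_i) over all individuals."""
    return sum(math.log(marginal(y, class_weights, item_probs_by_class)) for y in Y)


def posterior(y, class_weights, item_probs_by_class) -> List[float]:
    """P(X = x | y) for each class, by Bayes' rule."""
    joint = [
        w * class_conditional(y, probs)
        for w, probs in zip(class_weights, item_probs_by_class)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class model over two binary items.
class_weights = [0.6, 0.4]
item_probs_by_class = [[0.9, 0.2], [0.1, 0.8]]

p_10 = marginal([1, 0], class_weights, item_probs_by_class)
post_10 = posterior([1, 0], class_weights, item_probs_by_class)
all_patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
total_prob = sum(marginal(y, class_weights, item_probs_by_class) for y in all_patterns)
print(p_10, post_10, total_prob)
```

Assigning each patient to the class with the largest posterior is one common way to turn the fitted model into clusters.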

Model stability evaluation
- Cluster on the original sample
- Draw a bootstrap sample and cluster it (cluster 1, cluster 2); repeat 1000 times
- Compute the Jaccard coefficient for each cluster

Model stability: calculate Jaccard score
- For every cluster in the original clustering, find the most similar cluster in the new clustering and calculate the Jaccard coefficient [4]
- Jaccard coefficient: Intersection over Union
- Over 1000 repetitions, take the average Jaccard coefficient for each cluster
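The matching-and-averaging procedure can be sketched as follows. This is a simplified illustration: clusters are represented as sets of hypothetical patient IDs, the bootstrap resampling and re-clustering steps are assumed to have been done elsewhere, and the restriction of the original cluster to patients present in each bootstrap sample is omitted.

```python
from typing import List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two clusters."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_jaccard(original: List[Set[int]], new: List[Set[int]]) -> List[float]:
    """For each original cluster, the Jaccard score of its most similar new cluster."""
    return [max(jaccard(o, n) for n in new) for o in original]


def mean_stability(original: List[Set[int]], replicates: List[List[Set[int]]]) -> List[float]:
    """Average best-match Jaccard per original cluster across replicates."""
    per_rep = [best_match_jaccard(original, rep) for rep in replicates]
    return [sum(col) / len(per_rep) for col in zip(*per_rep)]


# Hypothetical clusterings of six patients (two replicates instead of 1000).
original = [{1, 2, 3, 4}, {5, 6}]
replicates = [
    [{1, 2, 3}, {4, 5, 6}],
    [{1, 2, 3, 4}, {5, 6}],
]
stability = mean_stability(original, replicates)
print(stability)
```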

Results: Patient cohort
- 856 SLE patients in CLD
- Linkage with Northwestern Data Warehouse
- Obtain death data from death master file: available till 2014-03-21
- 846 SLE patients
- Exclude patients with missing data
- Exclude patients having SLE after 2014-03-21
- END: 826 SLE patients

Patient cohort characteristics
- 826 patients in both NMEDW and CLD
- The average onset age of SLE is 30
- As of 2014-03-21, 68/826 patients had died
- The mortality rate was 8.2%
Variable | Percentage (%)
Male | 8.23
Caucasian | 54.48
African American (AA) | 27.00
Young onset (1-16) | 10.77
Adult onset (17-50) | 83.66
Late onset | …
…sitis | 63.56
Renal disease | 34.75
Neurological disease | 7.51
Hematological disorder | 52.78
Immunological disorder | 73.12
Oral ulcer | 44.67
Rash | 63.00
Antinuclear antibodies | 94.79

LCA on our data: two clusters generated
- Performed LCA on the features mentioned above
- The estimation was repeated 30 times to automate the search for the global maximum of the log-likelihood function
- The BIC was lowest when the number of clusters was 2
- Average Jaccard coefficients (1000 bootstrap repetitions) for clusters 1 and 2 were 0.876 and 0.906, respectively
[Figure: model comparison across Model 1 through Model 6]
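Selecting the number of clusters by BIC amounts to penalizing the log-likelihood by model size. In the sketch below the log-likelihoods and the count of 11 items are hypothetical, chosen only so that two classes wins; the parameter count assumes every item is binary, which may not match the features actually used.

```python
import math
from typing import Dict


def lca_num_params(n_classes: int, n_binary_items: int) -> int:
    """Free parameters: (C - 1) class weights plus C * K item probabilities."""
    return (n_classes - 1) + n_classes * n_binary_items


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


n_obs = 826      # cohort size reported on the slides
n_items = 11     # hypothetical number of binary items
# Hypothetical maximized log-likelihoods per number of classes.
log_liks: Dict[int, float] = {1: -5200.0, 2: -5050.0, 3: -5030.0}

bics = {
    c: bic(ll, lca_num_params(c, n_items), n_obs) for c, ll in log_liks.items()
}
best = min(bics, key=bics.get)
print(bics, "best number of classes:", best)
```

Going from two to three classes improves the log-likelihood here, but not by enough to pay for the 12 extra parameters, which is the trade-off BIC encodes.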

Between cluster characteristics comparison
Parameter | Cluster value, n (%) | P value
Total Number | … | NA
Young Onset | … | 0.04216
Adult Onset | … | 0.04216
Late Onset | … | 0.04216
Race (Caucasian) | 189 (40.40%) | < 2.2e-16
Race (AA) | 157 (33.57%) | < 2.2e-16
Race (others) | 122 (26.03%) | < 2.2e-16
SEX | 63 (13.37%) | 2.467e-06
Second table: surviving parameter labels are Total Number, …eural, HEME, ANA, IMMUN, Ulcer. P values in slide order: NA, < 2.2e-16, < 2.2e-16, 0.0006997, 0.0499, < 2.2e-16, 0.0005999, 4.974e-05, 0.003458, < 2.2e-16, < 2.2e-16

Heatmap plot for two clusters
- Cluster 1 was enriched in patients with organ involvement and had fewer patients with skin manifestations (49.66% rash, 45.21% photosensitivity, 29.67% oral ulcer)
- Cluster 2 was enriched in patients with skin manifestations (80.35% rash, 91.77% photosensitivity, 64.31% oral ulcer) but had less organ involvement
- Cluster 1 also included more non-white (59.60% vs 27.07%), male (12.14% vs 3.12%) and younger-onset patients (13.42% vs 7.32%)
[Figure: heatmap, cluster 1 vs cluster 2]

Survival curve
- Of the 475 SLE patients in cluster 1, 51 died; the mortality rate was 10.7%
- Of the 351 patients in cluster 2, 26 died; the mortality rate was 7.4%
- A log-rank test showed that the survival curves of the two clusters differed significantly (P < 0.05)
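The survival curves compared here are Kaplan-Meier estimates. The sketch below shows the estimator on made-up follow-up data and reproduces the crude mortality rates from the counts on this slide. The follow-up times are hypothetical, and the log-rank test itself is not implemented.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up times in years (not study data).
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print(curve)

# Crude mortality from the counts reported on the slide.
rate_cluster1 = round(100 * 51 / 475, 1)
rate_cluster2 = round(100 * 26 / 351, 1)
print(rate_cluster1, rate_cluster2)
```

Unlike the crude rates, the Kaplan-Meier curve accounts for patients with different lengths of follow-up, which is why the comparison between clusters is made on survival curves.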

Conclusion
- LCA identified 2 clusters (a primarily organ involvement group and a skin involvement group) with distinct clinical manifestations and survival outcomes
- Although the results from this study are preliminary, they show strong potential for unsupervised learning in identifying homogeneous SLE groups

Ongoing and future work
Machine learning to integrate multi-modal healthcare data
- Clinical narrative text
- Medical imaging
- Omics data
- Time series
- Structured data

Ongoing and future work
Current status of multi-modal machine learning in healthcare
Current limitations in the ML data fusion pipeline
- Model building: complex and time-consuming multi-modal models, creating a barrier to creation; unclear which fusion models are superior; merging complementary and correlated data
- Model training: training multiple models; weighting of data interaction; voting issues for multiple models
- Model testing: lack of comparison with single modality; lack of comparison with alternative fusion strategies
- Data flow-through: digitally recorded data/retrospective; missing data; single site
- Translational support: lack of FDA approved tools (0%); ease of use for clinical partnerships; clinical relevance is unclear

Ongoing and future work
Creating flagship dataset
- Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning
- Credentialed access and federated access
- Sites: Northwestern (182k patients), WUSTL (68k patients), Tufts (45k patients), UAB (73k patients)
- MIT: Technology Advisory from the MIMIC Team
- Data shown: discharge summaries, pathology reports
[Figure: example per-site patient tables of SBP, DBP, Na, K, Cl, Glucose and Ca with missing (NA) entries]

Ongoing and future work
Modeling missingness and bias in distributed setting
- Global Bias and Missing data Modeling for longitudinal health data (BMM)
- Secure central aggregation: encrypted local parameters, secure aggregation, global model updates, encrypted global parameters, local model updates, aggregated results
- Institutions 1, 2 and 3: each holds local data and a local BMM and produces intermediary results
- Model parameters can be protected using encryption

Ongoing and future work
Growing the tree: AI4H Clinic
- The event is open to clinicians who want to discuss a clinical problem or challenge they face that might be addressable through Artificial Intelligence
- Integrated with classroom teaching
  - Students, guided by the instructor, brainstorm solutions to the clinical problem
  - Students also get the opportunity to work on projects
- Abel N Kho, MD, Internal Medicine
- Leena Mithal, MD, Pediatrics
- Sadiya Khan, MD, MSc, Cardiology
- Ceylan Z Cankurtaran, MD, Radiology
- James Adams, MD, Emergency Medicine
- David Liebovitz, MD, Internal Medicine
- Srikanth Divi, MD, Orthopaedic Surgery
- Scott Dresden, MD, Emergency Medicine

Vision
The pandemic as a stress test for machine learning in healthcare

Vision
Moving from reactive to proactive machine learning
Luo Y, Wunderink RG, Lloyd-Jones D. Proactive vs Reactive Machine Learning in Health Care: Lessons From the COVID-19 Pandemic. JAMA. 2022.

References
[1] M.C. Hochberg, Updating the American College of Rheumatology revised criteria for the classification of systemic lupus erythematosus, Arthritis Rheum 40 (1997), 1725.
[2] E.M. Tan, A.S. Cohen, and R.J. Winchester, The 1982 revised criteria for the classification of systemic lupus erythematosus, Arthritis Rheum 25 (1982), 1271-1277.
[3] D.L. Linzer, J, poLCA: An R Package for Polytomous Variable Latent Class Analysis, Journal of Statistical Software 42 (2011), 29.
[4] Daniel Oberski, Latent class analysis, lecture notes, Dept of Methodology and Statistics, Tilburg University, delivered 2015.

Thank you! Collaboration welcome
yuan.luo@northwestern.edu
@yuanhypnosluo
We are hiring, multiple postdoc positions ab/
