Developing Data Quality And Data Sharing Tools For A Global . - Harmonist

Transcription

Developing Data Quality and Data Sharing Tools for a Global HIV Research Consortium
Judy Lewis, PhD
Application Developer, Vanderbilt Institute for Clinical and Translational Research, VUMC
Adjoint Assistant Professor, Department of Biomedical Engineering, Vanderbilt University

Harmonist Team at Vanderbilt
- Stephany Duda, Principal Investigator
- Eva Bascompte Moragas, Hub Lead
- Judy Lewis, Toolkit Lead
- Jeremy Stephens, Toolkit Developer
- Hilary Vansell, Grant Coordinator
Harmonist: Developing informatics solutions to harmonize observational data in a global research consortium

Today’s Agenda
1. IeDEA research consortium
2. Challenges in IeDEA multiregional data sharing, merging, and analysis
3. Harmonist software tools: design and implementation
4. Example workflow
5. Initial feedback and results
6. Lessons learned

International epidemiologic Databases to Evaluate AIDS (IeDEA)
[Map of IeDEA regions, including North America (NA-ACCORD), West Africa, East Africa, and Southern Africa]
- Established in 2005
- Funded by NIH
- 7 regions
- 46 countries
- 400 clinics
- 2 million patients
- Hundreds of publications

Flow of IeDEA Data
[Diagram: IeDEA sites → regional data centers (#1, #2, ...) for each IeDEA region → global “multi-regional” IeDEA projects]
IeDEA sites generate the data. Regional data centers combine all the data from one region. Researchers can get data from multiple regions for a global IeDEA project.

Data Considerations
- Data from every clinic can be different.
- Data at every regional data center can be different.
- Global IeDEA data are not stored centrally: subsets of the data are merged for specific projects.
- Sites and regions have the ultimate say in whether their data are included in a specific project.
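Because global data are not stored centrally, each project assembles its own merged dataset from the regions that agree to take part. The Python sketch below illustrates that idea under loud assumptions: the `RegionalDataset` type, its `included` flag, and the column names are hypothetical and are not part of Harmonist or the IeDEA DES.

```python
from dataclasses import dataclass, field


@dataclass
class RegionalDataset:
    """Hypothetical container for one region's contribution to a project."""
    region: str
    included: bool  # assumption: True if the region agreed to share data for this project
    rows: list = field(default_factory=list)


def merge_for_project(datasets, variables):
    """Merge only the requested variables, only from regions that opted in.

    Each merged row is tagged with its region so analysts can trace provenance.
    Variables a region did not provide come through as None.
    """
    merged = []
    for ds in datasets:
        if not ds.included:
            continue  # the region declined, so none of its rows enter the project dataset
        for row in ds.rows:
            subset = {var: row.get(var) for var in variables}
            subset["region"] = ds.region
            merged.append(subset)
    return merged


if __name__ == "__main__":
    datasets = [
        RegionalDataset("Region A", True, [{"patient_id": "A1", "sex": 1, "clinic": "A-03"}]),
        RegionalDataset("Region B", False, [{"patient_id": "B1", "sex": 2}]),
        RegionalDataset("Region C", True, [{"patient_id": "C7"}]),
    ]
    for row in merge_for_project(datasets, ["patient_id", "sex"]):
        print(row)
```

Dropping columns the project did not request (here `clinic`) mirrors the point that only subsets of the data are merged, and skipping Region B mirrors each region's right to decline.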

In the Early Days of IeDEA
- We had no standardized way to share data for global projects.
- Multi-regional projects (projects with 3 IeDEA regions) were very slow, in part because it was difficult to merge the data.
[Figure: cumulative number of IeDEA publications by publication year (figure from Constantin Yiannoutsos)]

IeDEA Data Harmonization Challenges
- Data from multiple regions must be merged.
  Need: a common data model that can evolve and is easy to share and access.
- Meaningful research requires quality data.
  Need: data quality checking algorithms, plus report generation to summarize dataset quality and characteristics.
- Datasets must be transferred from regions to investigators.
  Need: a secure method for submitting and receiving datasets.
- Regions must communicate to track requests and submit votes.
  Need: a project management hub.
- Computing resources vary across regions, and data managers are busy.
  Need: software tools that all require minimal user resources and maintenance.
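To make "data quality checking algorithms" and "report generation" concrete, here is a minimal Python sketch that runs a few row-level checks and counts failures per check, the kind of tally a summary report could be built from. The checks and the field names `birth_date` and `visit_date` are hypothetical examples, not the checks Harmonist actually implements.

```python
from collections import Counter
from datetime import date


def check_row(row, today):
    """Return the names of the hypothetical quality checks that this row fails."""
    problems = []
    birth = row.get("birth_date")
    visit = row.get("visit_date")
    if birth is None:
        problems.append("missing birth_date")
    if visit is not None and visit > today:
        problems.append("visit_date in the future")
    if birth is not None and visit is not None and visit < birth:
        problems.append("visit_date before birth_date")
    return problems


def quality_report(rows, today):
    """Count how many rows fail each check, as input for a dataset summary report."""
    counts = Counter()
    for row in rows:
        counts.update(check_row(row, today))
    return counts


if __name__ == "__main__":
    rows = [
        {"birth_date": date(1980, 5, 1), "visit_date": date(2018, 3, 2)},  # passes every check
        {"birth_date": None, "visit_date": date(2017, 1, 9)},              # missing birth date
        {"birth_date": date(1990, 1, 1), "visit_date": date(1989, 6, 1)},  # visit before birth
        {"birth_date": date(1975, 2, 2), "visit_date": date(2030, 1, 1)},  # visit in the future
    ]
    report = quality_report(rows, today=date(2019, 12, 31))
    for check, n in sorted(report.items()):
        print(f"{check}: {n} of {len(rows)} rows")
```

Passing `today` in explicitly keeps the checks reproducible, so the same dataset always produces the same report.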

Common Data Model

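What happens when everyone has a different data format or coding? (Example: sex at birth)
[Figure: the same variable coded differently across sites, e.g., SEX as Male/Female/Other/Unknown, SEX as M/F/X, Sex as 1/2, MALE with values such as Y, 0, 1, and sex as a set of numeric codes]
Requires a common data model? With 400 sites in IeDEA, this could be difficult.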
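The sex-at-birth example can be sketched as a recoding step: each site gets one mapping onto a shared code list. Everything in this Python sketch is hypothetical, including the site names, the per-site codings, and the shared codes (1, 2, 3, 9); they are not the actual IeDEA DES code list.

```python
# Hypothetical shared code list for sex at birth (not the actual IeDEA DES codes).
COMMON_CODES = {"male": 1, "female": 2, "other": 3, "unknown": 9}

# Hypothetical site-specific codings, each mapped once onto the shared labels.
SITE_MAPPINGS = {
    "site_a": {"Male": "male", "Female": "female", "Other": "other", "Unknown": "unknown"},
    "site_b": {"M": "male", "F": "female", "X": "other"},
    "site_c": {1: "male", 2: "female"},
}


def to_common_code(site, value):
    """Translate one site-specific value into the shared code list.

    Values with no mapping raise an error so they can be flagged for review
    instead of being silently recoded.
    """
    mapping = SITE_MAPPINGS[site]
    if value not in mapping:
        raise ValueError(f"{site}: no mapping for sex value {value!r}")
    return COMMON_CODES[mapping[value]]


if __name__ == "__main__":
    print(to_common_code("site_a", "Female"))  # 2
    print(to_common_code("site_b", "M"))       # 1
    print(to_common_code("site_c", 2))         # 2
```

With a common model, adding a site means writing one new mapping rather than a conversion to every other site's format.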

IeDEA Data Exchange Standard (DES)
The IeDEA DES defines the variable names, variable definitions, and code lists for data sharing in global IeDEA projects.
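A standard of variable names, definitions, and code lists is easiest to picture as a data structure that software can check datasets against. This is a simplified Python sketch under stated assumptions: the `Variable` and `Table` classes, the table name, and the codes are hypothetical and do not reproduce the real DES format.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One variable in a hypothetical, simplified DES-style definition."""
    name: str
    definition: str
    codes: dict = field(default_factory=dict)  # empty dict means the variable has no code list


@dataclass
class Table:
    name: str
    variables: dict  # maps variable name to Variable


def conformance_errors(table, rows):
    """List messages for unknown variable names and values outside a variable's code list."""
    errors = []
    for i, row in enumerate(rows):
        for name, value in row.items():
            var = table.variables.get(name)
            if var is None:
                errors.append(f"row {i}: {name} is not defined in {table.name}")
            elif var.codes and value not in var.codes:
                errors.append(f"row {i}: {name}={value!r} is not in the code list")
    return errors


if __name__ == "__main__":
    patients = Table(
        name="example_patient_table",
        variables={
            "patient_id": Variable("patient_id", "Unique patient identifier"),
            "sex": Variable("sex", "Sex at birth", {1: "Male", 2: "Female", 9: "Unknown"}),
        },
    )
    rows = [{"patient_id": "A1", "sex": 1}, {"patient_id": "A2", "sex": "F", "gender": 2}]
    for message in conformance_errors(patients, rows):
        print(message)
```

Here the second row is flagged twice: once for a sex value outside the code list and once for a variable name the standard does not define.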

DES Growth Over Time
Change from 2015 to 2019, by IeDEA DES version:

DES Feature    2015   2017   2019
Data Tables       9     25     29
Variables        60    215    269

New variables are related to pregnancy, mental health, substance use, hospitalizations, diagnoses, etc. We plan to work on additional variable types (e.g., TB, cervical cancer) in 2020.

Maintaining the IeDEA DES
- Challenges with MS Word documents:
  - Multiple versions, potentially conflicting edits
  - Hard to find the latest version in files and email
  - A single copy is not group-editable
  - Not machine-readable
- We needed a machine-readable solution that was easy to edit and didn’t require technical training.
- Solution: use REDCap to create human-readable forms that produce machine-readable structures.
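The REDCap approach can be pictured as a small transformation: editors fill in one form record per variable, and a script turns those flat records into a nested, machine-readable structure. This Python sketch is illustrative only; the field names (`table_name`, `variable_name`, `description`, `code_list`) and the `code, label | code, label` format are assumptions, not the actual Harmonist forms or REDCap export format.

```python
import json


def parse_code_list(text):
    """Parse an assumed 'code, label | code, label' string into {code: label}."""
    codes = {}
    for item in text.split("|"):
        item = item.strip()
        if not item:
            continue  # tolerate empty strings and trailing separators
        code, label = item.split(",", 1)
        codes[code.strip()] = label.strip()
    return codes


def build_des(records):
    """Group flat, one-record-per-variable form data into {table: {variable: spec}}."""
    des = {}
    for rec in records:
        table = des.setdefault(rec["table_name"], {})
        table[rec["variable_name"]] = {
            "description": rec["description"],
            "codes": parse_code_list(rec.get("code_list", "")),
        }
    return des


if __name__ == "__main__":
    records = [
        {"table_name": "example_patient_table", "variable_name": "patient_id",
         "description": "Unique patient identifier"},
        {"table_name": "example_patient_table", "variable_name": "sex",
         "description": "Sex at birth", "code_list": "1, Male | 2, Female | 9, Unknown"},
    ]
    print(json.dumps(build_des(records), indent=2))
```

In this sketch, editors only ever touch the form fields, while downstream tools read only the generated JSON.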
