TriNetX Data Dictionary - University Of Washington

TriNetX Data Dictionary
June 2019
Copyright 2019 TriNetX, Inc. All Rights Reserved. CONFIDENTIAL

Table of Contents

TriNetX Data Overview
Data Tables Relationship
Patient Demographic Table
Encounter Table
Diagnosis Table
Procedure Table
Medication Table
Lab Result Table
Vital Sign Table
Tumor Properties Table
Oncology Treatment Table
Tumor Table
Chemotherapy Lines of Treatment Table
Genomics Table
Cohort Details Table
Dataset Details Table
Patient Cohort Table
Standardized Terminology Table

TriNetX Data Overview

TriNetX datasets provide researchers access to de-identified patient data from networks of healthcare organizations (HCOs) and other data providers.

Below is a set of questions and answers that describe the overall characteristics of TriNetX data.

What kind of data comes in a TriNetX dataset?

TriNetX datasets comprise clinical patient data such as demographics, diagnoses, procedures, labs, and medications. This is commonly referred to as real-world data (RWD).

The data in TriNetX datasets are:
• Primarily from HCOs’ electronic medical record (EMR) systems
• Collected for the primary purpose of providing care to patients

The data in TriNetX datasets are not:
• Claims data (data primarily collected for the purposes of billing)
• Data collected for the purposes of randomized clinical trials

Where does the data in a TriNetX dataset originate?

Data in TriNetX datasets comes from HCOs and other data providers. The data these entities provide primarily comes from:
• EMR systems
   o Structured data
   o Unstructured data processed by Natural Language Processing (NLP) technology
• Cancer registries
• Other sources (e.g., genomic data from third-party genomic testing labs)

What are the characteristics of the HCOs that provide TriNetX with data?

The majority of the HCOs are large academic medical institutions with both inpatient and outpatient facilities. Most of these HCOs are adult acute-care hospitals with multiple facilities and locations. All HCOs are currently located within the United States.

HCOs provide TriNetX with both inpatient and outpatient data. The data they provide is representative of the entire patient population at the HCO. Most HCOs provide an average of seven years of historical data.

How is data transformed from its original source?

TriNetX typically receives data from HCOs and other data providers in one of two ways:
1. TriNetX ingests data directly from an HCO’s research repository (e.g., i2b2) into the TriNetX environment
2. An HCO or data provider sends TriNetX data extracts in the form of CSV files

TriNetX maps the data to a standard and controlled set of clinical terminologies. The data is then transformed into a proprietary data schema. This transformation process includes an extensive data quality assessment, including ‘data cleaning’ that rejects records that do not meet the TriNetX quality standards.

How fresh (up to date) is the data?

One of the distinguishing characteristics of the TriNetX dataset is that it is continuously refreshed. HCOs and other data providers update their data at various times, with over 80% refreshing at 1-, 2-, or 4-week intervals. The average lag time for an HCO’s source data refresh is one month.

Data Tables Relationship

[Figure: diagram of the relationships between the data tables; not recoverable from the extracted text.]

Patient Demographic Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
sex | VARCHAR | M | The biological sex of the patient. Possible values are M, F, Unknown.
race | VARCHAR | White | The race of the patient. Possible values are American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, White, Unknown.
ethnicity | VARCHAR | Hispanic | The ethnicity (cultural background) of the patient. Possible values are Hispanic or Latino, Not Hispanic or Latino, Unknown.
year of birth | BIGINT | 1958 | The birth year of the patient. May be blank if the birth year occurred more than 90 years before the year the dataset was created.
age at death | BIGINT | 57 | The age at the time of the patient’s death. If a patient’s age at death is above 90, the age at death is rounded to 90 in order to protect patient privacy.
postal code | VARCHAR | | The postal code of the patient. Only available on Diamond network data.
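
The privacy rules for year of birth and age at death affect how a consumer should read these fields. The following is a minimal Python sketch, not a TriNetX tool. It assumes the table arrives as a CSV file whose headers match the element names exactly as written in this dictionary (real extracts may use different headers, for example with underscores), that a blank year of birth appears as an empty cell, and that age at death is empty when no death is recorded. The names `Patient`, `parse_patient`, and `AGE_CAP` are hypothetical, and the sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

AGE_CAP = 90  # the dictionary states that ages at death above 90 are reported as 90


@dataclass
class Patient:
    patient_id: str
    sex: str
    year_of_birth: Optional[int]  # None when the source cell is blank
    age_at_death: Optional[int]   # None when the cell is empty (assumed: no death recorded)
    age_at_death_capped: bool     # True when the value may stand for "90 or older"


def _to_int(cell: Optional[str]) -> Optional[int]:
    """Convert a CSV cell to int, treating blank or missing cells as None."""
    cell = (cell or "").strip()
    return int(cell) if cell else None


def parse_patient(row: dict) -> Patient:
    """Build a Patient from one CSV row keyed by the dictionary's element names."""
    age = _to_int(row.get("age at death"))
    return Patient(
        patient_id=row["patient id"],
        sex=row["sex"],
        year_of_birth=_to_int(row.get("year of birth")),
        age_at_death=age,
        age_at_death_capped=(age is not None and age >= AGE_CAP),
    )


# Made-up rows for illustration only.
SAMPLE = """patient id,sex,year of birth,age at death
123456789,M,1958,57
223456789,F,,90
"""

patients = [parse_patient(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Keeping a separate `age_at_death_capped` flag avoids treating a reported 90 as an exact age in downstream calculations.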

Encounter Table

Data Element | Data Type | Sample Data | Description
encounter id | VARCHAR | 987654321 | The unique ID for the encounter (de-identified).
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
start date | DATETIME (YYYYMMDD) | 20110315 | The date the encounter began.
end date | DATETIME (YYYYMMDD) | 20110318 | The date the encounter ended.
type | VARCHAR | AMB | The care setting of the encounter. Possible values are Ambulatory (AMB), Emergency (EMER), Field (FLD), Home Health (HH), Inpatient Encounter (IMP), Inpatient Acute (ACUTE), Inpatient Non-acute (NONAC), Observation (OBSENC), Pre-admission (PRENC), Short Stay (SS), Virtual (VR). These values are based on HL7 v3 Value Set ActEncounterCode.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the encounter start or end date was derived by TriNetX. Possible values are T for TRUE and F for FALSE.
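
Dates throughout this dictionary use the YYYYMMDD form, and the derived-by-TriNetX flag uses T and F. The sketch below shows one way to read those encodings in Python, using the sample encounter dates from the table. It assumes the values arrive as plain strings; the helper names (`parse_yyyymmdd`, `parse_flag`, `encounter_days`) and the `ENCOUNTER_TYPES` mapping are hypothetical conveniences, not part of any TriNetX API.

```python
from datetime import date, datetime

# Encounter type codes and labels as listed in the Encounter table.
ENCOUNTER_TYPES = {
    "AMB": "Ambulatory",
    "EMER": "Emergency",
    "FLD": "Field",
    "HH": "Home Health",
    "IMP": "Inpatient Encounter",
    "ACUTE": "Inpatient Acute",
    "NONAC": "Inpatient Non-acute",
    "OBSENC": "Observation",
    "PRENC": "Pre-admission",
    "SS": "Short Stay",
    "VR": "Virtual",
}


def parse_yyyymmdd(text: str) -> date:
    """Parse a YYYYMMDD string such as '20110315' into a date."""
    return datetime.strptime(text.strip(), "%Y%m%d").date()


def parse_flag(text: str) -> bool:
    """Map the dictionary's T/F flag encoding to a Python bool."""
    value = text.strip().upper()
    if value == "T":
        return True
    if value == "F":
        return False
    raise ValueError(f"unexpected flag value: {text!r}")


def encounter_days(start: str, end: str) -> int:
    """Number of days between the encounter start and end dates."""
    return (parse_yyyymmdd(end) - parse_yyyymmdd(start)).days


# Sample values taken from the Encounter table.
stay = encounter_days("20110315", "20110318")
label = ENCOUNTER_TYPES["AMB"]
derived = parse_flag("T")
```

Because the flag marks start or end dates that TriNetX derived, an analysis of length of stay may want to treat rows with `derived` set to true separately.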

Diagnosis Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
encounter id | VARCHAR | 987654321 | The unique ID for the encounter (de-identified).
code system | VARCHAR | ICD-10-CM | The name of the code system in which this diagnosis is coded. Possible code systems are ICD-9-CM, ICD-10-CM.
code | VARCHAR | E11 | The diagnosis code.
date | DATETIME (YYYYMMDD) | 20110315 | The date the diagnosis was recorded.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the diagnosis was derived by TriNetX. Possible values are T for TRUE and F for FALSE.
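
Because each diagnosis row carries both a code system and a code, a query should match on the pair rather than on the code alone. The following Python sketch finds patients with a diagnosis under a given code prefix. It assumes rows are dictionaries keyed by the element names above and that child codes share the parent's prefix (as in ICD-10-CM, where E11.9 falls under E11); how a real extract formats codes, for instance with or without the dot, is not stated in this dictionary. The function name `patients_with_diagnosis` and the sample rows are made up.

```python
from typing import Iterable, Set


def patients_with_diagnosis(rows: Iterable[dict], code_system: str, code_prefix: str) -> Set[str]:
    """Return the patient ids with at least one diagnosis row in the given
    code system whose code starts with code_prefix."""
    return {
        row["patient id"]
        for row in rows
        if row["code system"] == code_system and row["code"].startswith(code_prefix)
    }


# Made-up rows for illustration only.
diagnosis_rows = [
    {"patient id": "P1", "encounter id": "E1", "code system": "ICD-10-CM", "code": "E11"},
    {"patient id": "P2", "encounter id": "E2", "code system": "ICD-10-CM", "code": "E11.9"},
    {"patient id": "P3", "encounter id": "E3", "code system": "ICD-10-CM", "code": "I10"},
    {"patient id": "P4", "encounter id": "E4", "code system": "ICD-9-CM", "code": "250.00"},
]

matched = patients_with_diagnosis(diagnosis_rows, "ICD-10-CM", "E11")
```

A prefix shorter than a full category (for example `E1`) would match several unrelated categories, so the prefix should be at least a complete three-character ICD-10-CM category.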

Procedure Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
encounter id | VARCHAR | 987654321 | The unique ID for the encounter (de-identified).
code system | VARCHAR | ICD-10-PCS | The name of the code system in which this procedure is coded. Possible code systems are ICD-9-CM, ICD-10-PCS, CPT.
code | VARCHAR | 03CJ0ZZ | The procedure code.
date | DATETIME (YYYYMMDD) | 20150314 | The date the procedure was recorded.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the procedure was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Medication Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
encounter id | VARCHAR | 987654321 | The unique ID for the encounter (de-identified).
code system | VARCHAR | RxNorm | The name of the code system in which this medication is coded. The code system is RxNorm.
code | VARCHAR | 26225 | The medication code.
start date | DATETIME (YYYYMMDD) | 20120914 | The date the medication order, prescription, or administration was recorded.
route | VARCHAR | Oral Product | The route of administration. Possible values are Drug implant, Inhalant, Injectable, Intraperitoneal, Nasal, Ophthalmic, Oral, Otic, Rectal, Topical, Urethral, Vaginal, Unknown.
brand | VARCHAR | Zofran | The medication brand.
strength | VARCHAR | 4 mg | The medication strength.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the medication was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Lab Result Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
encounter id | VARCHAR | 987654321 | The unique ID for the encounter (de-identified).
code system | VARCHAR | LOINC | The name of the code system in which this lab observation is coded. The code system is LOINC.
code | VARCHAR | 2885-2 | The code representing the lab test.
date | DATETIME (YYYYMMDD) | 20120914 | The date the test result was recorded.
lab result num val | DECIMAL | 7 | The lab result for numeric results.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the lab result was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Vital Sign Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
encounter id | VARCHAR | 987654321 | The unique ID for the encounter (de-identified).
code system | VARCHAR | LOINC | The name of the code system in which this vital sign is coded. The code system is LOINC.
code | VARCHAR | 8302-2 | The code representing the vital sign.
date | DATETIME (YYYYMMDD) | 20120914 | The date the vital sign was recorded.
value | VARCHAR | 72 | The value of this vital sign.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the vital sign was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Tumor Properties Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
diagnosis date | DATETIME (YYYYMMDD) | 20120914 | The date of the original primary cancer diagnosis.
observation date | DATETIME (YYYYMMDD) | 20121114 | The date the property was recorded.
tumor site code system | VARCHAR | ICD-O | The name of the code system in which the tumor site is coded. The code system is ICD-O.
tumor site code | VARCHAR | C50 | The tumor site code.
morphology code system | VARCHAR | ICD-O | The name of the code system in which morphology is coded. The code system is ICD-O.
morphology code | VARCHAR | 8500/3 | The morphology code.
tumor property code system | VARCHAR | TriNetX – TumorProperty | The name of the code system in which the tumor property is coded. The code system is TriNetX. This is a code system created by TriNetX for oncology specific factors.
tumor property code | VARCHAR | CSF07 Colon 060 | The code that indicates the type of tumor property.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the tumor property was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Oncology Treatment Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
diagnosis date | DATETIME (YYYYMMDD) | 20120914 | The date of the primary cancer diagnosis.
tumor site code system | VARCHAR | ICD-O | The name of the code system in which the tumor site is coded. The code system is ICD-O.
tumor site code | VARCHAR | C50 | The tumor site code.
morphology code system | VARCHAR | ICD-O | The name of the code system in which morphology is coded. The code system is ICD-O.
morphology code | VARCHAR | 8500/3 | The morphology code.
oncology treatment start date | DATETIME (YYYYMMDD) | 20121001 | The start date of the course of oncology treatment.
oncology treatment code system | VARCHAR | TriNetX – OncologyTreatment | The name of the code system in which the oncology treatment is coded. The code system is TriNetX. This is a code system created by TriNetX for oncology treatment.
oncology treatment code | VARCHAR | 1390 1 | The code for the oncology treatment.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the oncology treatment was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Tumor Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
diagnosis date | DATETIME (YYYYMMDD) | 20120914 | The date of the original primary cancer diagnosis.
observation date | DATETIME (YYYYMMDD) | 20121114 | The date the property was recorded.
tumor site code system | VARCHAR | ICD-O | The name of the code system in which the tumor site is coded. The code system is ICD-O.
tumor site code | VARCHAR | C50 | The tumor site code.
morphology code system | VARCHAR | ICD-O | The name of the code system in which morphology is coded. The code system is ICD-O.
morphology code | VARCHAR | 8500/3 | The morphology code.
stage code system | VARCHAR | TriNetX – OncologyStage | The name of the code system in which the tumor stage is coded. The code system is TriNetX. This is a code system created by TriNetX for tumor stages.
stage code | VARCHAR | 2b | The code for the tumor stage.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the tumor entry was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Chemotherapy Lines of Treatment Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
start date | DATETIME (YYYYMMDD) | 20150314 | The date the chemotherapy line of treatment was determined to start.
line | BIGINT | 1 | The sequential order of chemotherapy regimens. Possible values are 1, 2, 3, 4, or 5 with 1 the first regimen and 5 the last regimen. These lines are derived by TriNetX.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the chemotherapy line of treatment was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Genomics Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
code system | VARCHAR | HGVS | The name of the code system in which genomic data is coded. The syntax of the code conforms to HGVS.
code | VARCHAR | BRAF p.V600E c.1799T>A | Variant description.
test date | DATETIME (YYYYMMDD) | 20120914 | The date the genetic test was recorded.
derived by TriNetX | BOOLEAN | T | Flag that indicates whether the genomic data was derived by TriNetX. Possible values are T for TRUE and F for FALSE.

Cohort Details Table

Data Element | Data Type | Sample Data | Description
cohort name | VARCHAR | Diabetes women aged 18-45 | The name of the cohort included in the dataset.
cohort number | BIGINT | 1 | The number of the cohort included in the dataset.
total patient records | BIGINT | 20,000 | The total number of patient records in the cohort in the dataset.

Dataset Details Table

Data Element | Data Type | Sample Data | Description
total number unique patients | BIGINT | 19,000 | The total number of unique patient records across multiple cohorts in the dataset. A patient’s record could be in a single cohort multiple times if the patient visited more than one HCO that contributed data to a cohort.
total number HCOs | BIGINT | 7 | The total number of healthcare organizations contributing data to the dataset.
date created | DATETIME (YYYYMMDD) | 20180316 | The date the dataset was created.

Patient Cohort Table

Data Element | Data Type | Sample Data | Description
patient id | VARCHAR | 123456789 | The unique ID for the patient (de-identified).
cohort name | VARCHAR | Diabetes women aged 18-45 | The name of the cohort in which the patient’s record is included.
cohort number | BIGINT | 1 | The number of the cohort in which the patient’s record is included.
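
The Cohort Details and Dataset Details tables distinguish patient records per cohort from unique patients across the dataset. The Python sketch below illustrates that distinction on made-up Patient Cohort rows. It rests on two assumptions that this dictionary does not confirm: that a patient who appears more than once (in one cohort via several HCOs, or in several cohorts) shows up as repeated rows, and that those rows share the same patient id. The variable names are hypothetical.

```python
from collections import Counter

# Made-up Patient Cohort rows for illustration only.
patient_cohort_rows = [
    {"patient id": "P1", "cohort name": "Cohort A", "cohort number": "1"},
    {"patient id": "P1", "cohort name": "Cohort A", "cohort number": "1"},  # same patient, second HCO (assumed)
    {"patient id": "P2", "cohort name": "Cohort A", "cohort number": "1"},
    {"patient id": "P1", "cohort name": "Cohort B", "cohort number": "2"},
]

# Records per cohort: every row counts, including repeats of one patient.
records_per_cohort = Counter(row["cohort number"] for row in patient_cohort_rows)

# Unique patients across the whole dataset: each patient id counts once.
unique_patients = len({row["patient id"] for row in patient_cohort_rows})
```

Under these assumptions the per-cohort record counts can sum to more than the number of unique patients, which is consistent with the sample values of 20,000 records and 19,000 unique patients shown in the tables.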

Standardized Terminology Table

Data Element | Data Type | Sample Data | Description
code system | VARCHAR | RxNORM | The name of the code system in which the data element is coded.
code | VARCHAR | 1191 | The code for the data element.
code description | VARCHAR | Aspirin | The textual description of the [...]
[...] | [...] | [...]00029133/N0000029135/1191 | The terms the data element is mapped to and the path in which those terms exist.
unit | VARCHAR | inches | The unit of measurement for a code value. This field only applies to codes in the Lab Result table and the Vital Sign table.
