Creating The Golden Record - DAMA NY

Transcription

Creating the Golden Record
Better Data through Chemistry
Donald J. Soulsby
metaWright.com

Agenda
- The Golden Record
- Master Data
- Discovery, Integration, Quality
- Master Data Strategy

DAMA – LinkedIn Group
C. Lwanga Yonke, Information Quality Practitioner:
Spewak advocated using data dependency to determine the ideal sequence in which applications should be developed and implemented: “Develop the applications that create data before those that need to use that data” (p. 10).
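One way to apply the "create before use" rule is a topological sort over data dependencies. The following is a minimal sketch, not a method from the presentation: the application names and their dependencies are hypothetical, chosen only to illustrate the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical applications, each mapped to the applications whose data it uses.
consumes_from = {
    "customer_master": set(),
    "product_master": set(),
    "order_entry": {"customer_master", "product_master"},
    "billing": {"order_entry"},
    "sales_reporting": {"order_entry", "billing"},
}


def development_sequence(dependencies):
    """Order applications so that data creators come before data consumers."""
    return list(TopologicalSorter(dependencies).static_order())


sequence = development_sequence(consumes_from)
print(sequence)
```

Any order the sorter returns places the two master applications ahead of order entry, and order entry ahead of billing and reporting. A circular dependency raises `graphlib.CycleError`, which is a useful signal that two applications each assume the other owns the data.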

Architecture Advocates
- William Smith: Entity Lifecycle
- Clive Finkelstein: Information Engineering, CRUD
- Ron Ross: Resource Life Cycle Analysis

CRUD in a Perfect World

Canonical Synthesis
Broadly speaking, materials scientists investigate two types of phenomena. Both are based on the microstructures of materials: ii. How do these microstructures influence the properties of the material (such as strength, electrical conductivity, or high frequency electromagnetic absorption)?
http://www.its.caltech.edu/ matsci/WhatIs2.html

Business vs. Development Life Cycles
Zachman Framework for Enterprise Architecture
[Figure: Zachman Framework grid. Columns: What, How, Where, Who, When, Why. Rows: Contextual (Scope), Conceptual (Business Model), Logical (System Model), Physical (Technology Model), Out-of-Context (Components), Product (Functioning System). Contextual cells: List of Things, List of Processes, List of Locations, List of Organization Units, List of Events, List of Business Goals/Stat. Conceptual cells: Business Entity Model, Business Process Model, Logistics Network, Work Flow Model, Master Schedule, Business Plan.]

Tibetan Proverb
“If upstream is dirty, downstream will be muddy”

Data Warehousing - ETL

Transmutation

The Secret
Bernard Trevisan, a 15th-century alchemist, spent much of his life and a sizable fortune in search of the secret of turning base metals into gold. What he realized while dying was: “To make gold, one must start with gold.”

DAMA Guide to the Data Management Body of Knowledge
Reference and Master Data Management: Managing golden versions and replicas

Data Management Body of Knowledge 2008 DAMA International

DAMA Dictionary
Reference Data: Any data used to categorize other data, or for relating data to information beyond the boundaries of the enterprise. See master data.
Master Data: Synonymous with reference data. The data that provides the context for transaction data. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions. Includes data about customers, products, employees, vendors, and controlled domains (code values).

DAMA Dictionary
Master Data Management (MDM): Processes that ensure that reference data is kept up to date and coordinated across an enterprise. The organization, management and distribution of corporately adjudicated data with widespread use in the organization.
Reference & Master Data Management: Ensuring consistency with a “golden version” of data values. One of nine data management functions identified in the DAMA-DMBOK Functional Framework.

Top 13 MDM buzzwords
1. Metadata (Don’t say metadata)
2. Product information management (PIM)
3. Enterprise master patient index (EMPI)
4. Data governance
5. Customer data integration (CDI)
6. MDM hub
7. MDM architecture
8. "Collaborative" vs. "analytical" MDM
9. MDM return on investment (ROI)
10. MDM stakeholders
11. Enterprise hierarchy management
12. MDM metrics
13. Data competency center
02 Jul 2009, Justin Aucoin, Assistant Site Editor, searchdatamanagement.com

The Quest
Discovery (Bloor Research): the discovery of relationships between data elements, regardless of where the data is stored.
Quality (DAMA): The degree to which data is accurate, complete, timely, consistent with all requirements and business rules, and relevant for a given use.
Integration (DAMA): The planned and controlled transformation and flow of data across databases, for operational and/or analytical use.
(Diagram labels: D, I, Q, MDM)

Atomic Data of the Enterprise
Provide and maintain a consistent view of an enterprise's core business information assets
Mendeleev
ALCHEMY VS. CHEMISTRY

What is Master Data?
A Transaction is a relationship of:
- Master Data
- Reference Files
- time stamp
- volumes
- secondary descriptors
(Diagram labels: Point In Time, Currency, Product, Customer, Country, Transaction)
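The idea that a transaction relates master data records at a point in time can be sketched as a small data structure. This is an illustration under assumptions: all class and field names below are hypothetical, and the split between master data, volumes, and secondary descriptors is one reading of the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class Customer:          # master data
    customer_id: str
    name: str


@dataclass(frozen=True)
class Product:           # master data
    product_id: str
    description: str


@dataclass(frozen=True)
class Country:           # reference data
    code: str


@dataclass(frozen=True)
class Currency:          # reference data
    code: str


@dataclass(frozen=True)
class Transaction:
    """A transaction only links master and reference records; it owns none of them."""
    customer: Customer
    product: Product
    country: Country
    currency: Currency
    timestamp: datetime  # point in time
    quantity: int        # volume
    amount: Decimal      # volume
    note: str = ""       # secondary descriptor


txn = Transaction(
    customer=Customer("C1", "Acme Corp"),
    product=Product("P9", "Widget"),
    country=Country("US"),
    currency=Currency("USD"),
    timestamp=datetime(2009, 7, 2, 9, 30),
    quantity=3,
    amount=Decimal("29.97"),
)
print(txn.customer.name, txn.product.description, txn.amount)
```

Everything that gives the transaction its meaning (who, what, where, in which currency) comes from the linked records, which is why the quality of master data determines the quality of every transaction that references it.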

Master Data - oss
(Diagram labels: Reference Data, Reference Data, ISO ode, Operational, Analytical, Universal, Local, Unified)

Master Data Types
Operational Master Data: Definition, creation, and synchronization of Master Data required for transactional systems and delivered via service-oriented architecture (SOA).
Analytical Master Data: Definition, creation, and integration, including multiple historical versions, of Master Data required for Enterprise Reporting or Data Warehousing applications.
Unified Master Data: Master Data that is defined and created to apply to both Operational and Analytical applications. Implementation of the “single view of the truth”. It requires achieving agreement on a complex topic among a group of people.

Master Data Integration
Universal: Master Data that is created and maintained by an external organization. For reference data it is often developed by a standards organization such as the ISO. Master Data Models are often available to members of an industry trade or professional association.
Enterprise: Master Data that represents the common business information assets that need to be agreed on and shared throughout the enterprise.
Community: Master Data that is shared between two or more applications. Typically data is replicated from the application that contains the system of record for the Master Data, by federation (key linkage) or propagation (materialized views).
Local: Master or Reference Data that is created and maintained for a single operational application. Much of the Reference Data will remain local, such as Status Codes, for the transactions within the application.

Master Data - cess
(Diagram labels: Period, Person)

Atomic EPARTYCDI

Master Data ss
(Diagram labels: Activity, Service, Geo-Political, Seasonality, Go To Market)

MDM – Integration Techniques
Data Propagation (Today): Copies Master Data from one location to another (materialized view).
Data Consolidation: Captures data from multiple data sources and integrates into a single Master Data hub.
Data Federation: Provides a single virtual Master Data view of one or more data sources. No data is stored in the Master Data hub.
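The three integration techniques differ mainly in where the data lives after integration. The sketch below contrasts them using in-memory dictionaries; the source systems, keys, and field names are hypothetical, and a real hub would add matching, survivorship, and persistence.

```python
import copy

# Two hypothetical source systems keyed by a shared customer id.
crm = {"C1": {"name": "Acme Corp", "phone": "555-0100"}}
billing = {
    "C1": {"name": "Acme Corp", "tax_id": "98-7654321"},
    "C2": {"name": "Globex", "tax_id": "12-3456789"},
}


def propagate(source):
    """Data propagation: copy master data from one location to another."""
    return copy.deepcopy(source)


def consolidate(*sources):
    """Data consolidation: merge records from several sources into one stored hub."""
    hub = {}
    for source in sources:
        for key, record in source.items():
            hub.setdefault(key, {}).update(record)
    return hub


class FederatedView:
    """Data federation: a virtual view; nothing is stored in the hub."""

    def __init__(self, *sources):
        self._sources = sources

    def get(self, key):
        merged = {}
        for source in self._sources:
            merged.update(source.get(key, {}))
        return merged


replica = propagate(crm)
hub = consolidate(crm, billing)
view = FederatedView(crm, billing)

# A later change in the source reaches only the federated view.
crm["C1"]["phone"] = "555-0199"
print(replica["C1"]["phone"], hub["C1"]["phone"], view.get("C1")["phone"])
```

The replica and the hub keep the old phone number until they are refreshed, while the federated view reads the source at query time. That is the trade-off in miniature: stored copies give stability and history at the cost of latency, and federation gives currency at the cost of depending on source availability.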

VS

Architecture choices
- Independent Data Mart
- Data Mart Bus Architecture
- Centralized Data Warehouse (Enterprise Data Warehouse)
- Hub & Spoke Architecture (Corporate Information Factory)
- Federated Data Stores
(Diagram label: Ref File)

Metadata Repository
- Are master data files metadata?
- Need for central repository
(Diagram: two options, Master Data inside the Metadata Repository or Master Data outside the Metadata Repository)

Master Data Architecture – Repository
- Comprehensive, single source for Master Data
- Centrally managed, highly effective data governance
- Fixed data model, typically Customer & Product
- Extensive and sophisticated data integration process
- All applications need to change to conform to Hubs
- Expensive to modify to new business requirements
- Hard to adapt to rapid
(Diagram labels: CES, Legacy Application, New Application)

Master Data Architecture – Registry
- Linkages to other data stores with transform logic
- Low cost, small footprint, only unique key kept in DB
- Minimal disruption of current applications
- Limited Data Governance, no merged Master Data
- No resolution of semantic disintegrity
- No History, System of Record linked to Data Source
- Query performance - data availability from
(Diagram labels: tion, Data Source, New Application)
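A registry-style hub can be pictured as a table of keys and linkages plus per-source transform logic. The sketch below assumes two hypothetical source systems with different field names; none of the names come from the presentation.

```python
# Hypothetical source systems; the registry stores none of their attributes.
legacy_app = {"0042": {"CUST_NM": "ACME CORP"}}
new_app = {"a-42": {"customer_name": "Acme Corp"}}

SOURCES = {"legacy": legacy_app, "new": new_app}

# Transform logic per source: map local field names onto a common shape.
TRANSFORMS = {
    "legacy": lambda rec: {"name": rec["CUST_NM"].title()},
    "new": lambda rec: {"name": rec["customer_name"]},
}

# The registry keeps only the master key and its linkages (source, local key).
registry = {"M1": [("legacy", "0042"), ("new", "a-42")]}


def lookup(master_key):
    """Resolve a master key by reading each linked source at request time."""
    views = []
    for source_name, local_key in registry[master_key]:
        record = SOURCES[source_name][local_key]
        views.append({"source": source_name, **TRANSFORMS[source_name](record)})
    return views


print(lookup("M1"))
```

The lookup returns one view per source rather than a single merged record, which mirrors the "no merged Master Data" point: the registry can tell you where a customer lives, but reconciling conflicting values is left to the caller. Every lookup also depends on each source being reachable, which is where the query performance concern comes from.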

Master Data Architecture – Hybrid
- Relatively low cost solution
- Extensible Data Model based on industry templates
- “Best we have” Master Data Model
- Generalized governance & maintenance facilities
- ETL to load HUB may be complex
- Data redundancy may lead to update conflicts and data latency
(Diagram labels: ication, New Application)

Technology Adoption

Product Life Cycle
1. Market introduction
- Costs are high
- Slow sales volumes to start
- Demand has to be created
- Customers have to be prompted to try the product
- Makes no money at this stage
2. Growth
- Costs reduced due to economies of scale
- Sales volume increases significantly
- Public awareness increases
- Profitability begins to rise

Market Strategy
(Diagram labels: Other Subject Area, MDM Proof of Concept, More of same Subject Area)

Master Data Harmonization
(Diagram labels: Industry Model, Governance Council, Enterprise Model, Data Architect, Logical Data Model, Data Modeler, DB Administrator, Physical Data Model, Physical Data Method, Reverse Engineering)

By any other name.
Analytical Master Data

Subject Area Integration
POWER ASSISTED DATA RATIONALIZATION
Primary Data Elements: PRODUCT-ID (Product Identification)
(Diagram labels: Prod-ID Char 9, PROD ID Packed 9, Install base Char 9, Prod Num Packed 5, Unique Data Elements, Source Data Structures, Field Occurrences)

Data Modeling vs Data Profiling
- Data Modeling documents metadata or columns
- Data Profiling analyzes instance data or rows

Traditional Meta-driven approach
Design does not always match Reality!
(Diagram label: Data Architects)

Data Discovery
Data Profiler
- Discover primary/foreign keys
- Identify orphan rows
- Find overlapping columns
- Profile data
Data Modeler
- Import inferred structures from Data Profiler
- Visualize information in an intuitive data model
- Integrate with other modeling efforts
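The profiler tasks listed above (key discovery, orphan rows, overlapping columns) can be sketched over plain rows of instance data. This is a simplified illustration, not the behaviour of any particular profiling tool; the tables, column names, and function names are hypothetical.

```python
# Hypothetical instance data from two tables.
customers = [
    {"cust_id": 1, "name": "Acme"},
    {"cust_id": 2, "name": "Globex"},
    {"cust_id": 3, "name": "Acme"},
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 9},
]


def candidate_keys(rows):
    """Columns whose values are all present and unique: possible primary keys."""
    keys = []
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        if None not in values and len(set(values)) == len(values):
            keys.append(col)
    return keys


def orphan_rows(child_rows, fk, parent_rows, pk):
    """Child rows whose foreign key value has no matching parent row."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


def overlap(rows_a, col_a, rows_b, col_b):
    """Share of distinct values in col_a that also occur in col_b."""
    a = {row[col_a] for row in rows_a}
    b = {row[col_b] for row in rows_b}
    return len(a & b) / len(a) if a else 0.0


print(candidate_keys(customers))
print(orphan_rows(orders, "cust_id", customers, "cust_id"))
print(overlap(orders, "cust_id", customers, "cust_id"))
```

A high overlap between a child column and a candidate key in another table suggests a foreign key relationship, and the rows that fall outside the overlap are the orphans. Uniqueness in a sample only makes a column a candidate: the inferred structure still needs review by a modeler before it is treated as design.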

Data Profiling – Match, Merge and Cleanse
Master Data Quality Analysis
- Identify critical data elements
- Review the cardinality, range, mode, and null rules
- Review the value, pattern & length frequencies
- Review parent-child relationships
- Identify orphaned rows and values
Cross-System Analysis
- Identify reference data overlap between data sources
- Identify data content discrepancies between data sources
- Validate mappings to MDM target
Master Data Error Types
- Synonyms (False negatives): two representations of the same entity are incorrectly defined as two separate entities
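The column-level measures named above (cardinality, range, mode, nulls, and value, pattern, and length frequencies) can be computed with a few lines of standard-library code. This is a minimal sketch with hypothetical sample values; the pattern notation (A for a letter, 9 for a digit) is an assumed convention, not one stated in the presentation.

```python
import re
from collections import Counter


def pattern_of(value):
    """Reduce a value to its shape: letters become A, digits become 9."""
    text = re.sub(r"[A-Za-z]", "A", str(value))
    return re.sub(r"[0-9]", "9", text)


def profile_column(values):
    """Summarize one column of instance data; None is treated as null."""
    present = [v for v in values if v is not None]
    value_freq = Counter(present)
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "cardinality": len(value_freq),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mode": value_freq.most_common(1)[0][0] if present else None,
        "value_freq": value_freq,
        "pattern_freq": Counter(pattern_of(v) for v in present),
        "length_freq": Counter(len(str(v)) for v in present),
    }


# Hypothetical product identifiers with a null and two off-pattern values.
product_ids = ["AB-1001", "AB-1002", "AB-1001", None, "ab1003", "ZZ-9"]
profile = profile_column(product_ids)
for measure, result in profile.items():
    print(measure, result)
```

The pattern frequencies are often the quickest way to spot trouble: a dominant shape with a few outliers points to values that need cleansing, and values that differ only in case or punctuation are likely synonym candidates for the match and merge step.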

Appearing next month
Why re-invent the wheel?
... that apply to well over 50 percent of most data model constructs and that can be reused. It is this 50 percent that I address in this presentation.

Universal Models
The Data Model Resource Book: Len Silverston

Architecture Models
- Process based: Hierarchy model
- Topical or subject based: Network model
(Diagram labels: 1:M, M:M)

Data Governance
‘The act or process of directing, leading and assuring that information is managed effectively as an enterprise resource, including resolving information conflicts, across the enterprise.’
Larry English

Data Governance – Building Trust in Data
People
- Data Stewards
- Data Architects
- Data Modelers
Process
- Data Security / Access
- Data Quality Assurance
- Master Data Management
- Metadata Management
- Enterprise Model Management

Data Quality Assurance
- Data Stewardship
- Data Profiling/Cleansing
- System of Record – (Definition) – Data Model
- Versioning
- Change Management
- Health and Welfare
- System of Reference – (Lifecycle) – Workflow
- Synchronization and Integration
- Business Rules and Standards
- External Referencing

You can’t manage what you don’t measure!

Scorecard – TDWI.org

Data Stewardship
- Difficult to assign, not process based
- Cross-functional
- Not a Single ility, Authority, Responsibility
- Structure vs. Content
(Diagram labels: Columns, Rows, Cells)

Master Data Stewardship
- Difficult to assign, not process based
- Infrastructure: cross-functional
- Business stewards - Content: Descriptions
- Technical stewards - Structure: Codes and structure


References
- Managing Reference Data in Enterprise Databases: Binding Corporate Data to the Wider World. Malcolm Chisholm, Morgan Kaufmann.
- The Data Model Resource Book: A Library of Universal Data Models for All Enterprises (Vol. 1, 2). Len Silverston, John Wiley & Sons.
- The Data Model Resource Book: Universal Patterns for Data Modeling. Len Silverston, John Wiley & Sons.
- Master Data Management & Customer Data Integration for a Global Enterprise. Alex Berson & Larry Dubov, McGraw Hill.

Donald J. Soulsby
DSoulsby@MetaWright.com
