Transcription
Creating the Golden Record: Better Data through Chemistry. Donald J. Soulsby, metaWright.com
Agenda: The Golden Record; Master Data; Discovery; Integration; Quality; Master Data Strategy
DAMA – LinkedIn Group. C. Lwanga Yonke, Information Quality Practitioner: Spewak advocated using data dependency to determine the ideal sequence in which applications should be developed and implemented: “Develop the applications that create data before those that need to use that data” (p. 10).
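Spewak's rule can be read as a topological ordering over a CRUD matrix. The following is a minimal Python sketch, not a method taken from the slides. The application and entity names are invented for illustration, and it assumes each entity has exactly one creating application.

```python
from graphlib import TopologicalSorter

# Hypothetical CRUD matrix: application -> {entity: operations performed}.
crud = {
    "OrderEntry": {"Customer": "R", "Product": "R", "Order": "C"},
    "CustomerOnboarding": {"Customer": "CRUD"},
    "ProductCatalog": {"Product": "CRU"},
    "Billing": {"Order": "R", "Customer": "R", "Invoice": "C"},
}

def development_sequence(crud_matrix):
    """Order applications so that creators of data come before its users."""
    # Find the application that creates each entity (assumed: one creator).
    creators = {}
    for app, usage in crud_matrix.items():
        for entity, ops in usage.items():
            if "C" in ops:
                creators[entity] = app

    # An application depends on the creator of every entity it uses
    # without creating it itself.
    depends_on = {app: set() for app in crud_matrix}
    for app, usage in crud_matrix.items():
        for entity, ops in usage.items():
            creator = creators.get(entity)
            if creator is not None and creator != app and "C" not in ops:
                depends_on[app].add(creator)

    # static_order() yields predecessors (creators) before dependents.
    return list(TopologicalSorter(depends_on).static_order())

sequence = development_sequence(crud)
print(sequence)
```

Several valid orderings usually exist; the sketch only guarantees that every creating application precedes the applications that read its data. A circular dependency makes `graphlib` raise `CycleError`, which is itself a useful finding.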
Architecture Advocates: William Smith – Entity Lifecycle; Clive Finkelstein – Information Engineering – CRUD; Ron Ross – Resource Life Cycle Analysis
CRUD in a Perfect World
Canonical Synthesis. Broadly speaking, materials scientists investigate two types of phenomena. Both are based on the microstructures of materials: ii. How do these microstructures influence the properties of the material (such as strength, electrical conductivity, or high frequency electromagnetic absorption)? http://www.its.caltech.edu/ matsci/WhatIs2.html
Business VS Development Life Cycles
Zachman Framework for Enterprise Architecture
Columns: WHAT, HOW, WHERE, WHO, WHEN, WHY
CONTEXTUAL (Scope): List of things; List of Processes; List of Locations; List of Organization Units; List of Events; List of Business Goals/Stat.
CONCEPTUAL (Business Model): Business Entity Model; Business Process Model; Logistics Network; Work Flow Model; Master Schedule; Business Plan
LOGICAL (System Model)
PHYSICAL (Technology Model): Presentation Architecture; Control Structure
OUT-OF-CONTEXT (Components)
PRODUCT (Functioning System)
Tibetan Proverb: “If upstream is dirty, downstream will be muddy”
Data Warehousing - ETL
Transmutation
The Secret. Bernard Trevisan, a 15th-century alchemist, spent much of his life and a sizable fortune in search of the secret of turning base metals into gold. What he realized while dying was: “To make gold, one must start with gold.”
DAMA Guide to the Data Management Body of Knowledge. Reference and Master Data Management: Managing golden versions and replicas
Data Management Body of Knowledge 2008 DAMA International
DAMA Dictionary
Reference Data: Any data used to categorize other data, or for relating data to information beyond the boundaries of the enterprise. See master data.
Master Data: Synonymous with reference data. The data that provides the context for transaction data. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions. Includes data about customers, products, employees, vendors, and controlled domains (code values).
DAMA Dictionary
Master Data Management (MDM): Processes that ensure that reference data is kept up to date and coordinated across an enterprise. The organization, management and distribution of corporately adjudicated data with widespread use in the organization.
Reference & Master Data Management: Ensuring consistency with a “golden version” of data values. One of nine data management functions identified in the DAMA-DMBOK Functional Framework.
Top 13 MDM buzzwords
1. Metadata (Don’t say metadata)
2. Product information management (PIM)
3. Enterprise master patient index (EMPI)
4. Data governance
5. Customer data integration (CDI)
6. MDM hub
7. MDM architecture
8. "Collaborative" vs. "analytical" MDM
9. MDM return on investment (ROI)
10. MDM stakeholders
11. Enterprise hierarchy management
12. MDM metrics
13. Data competency center
02 Jul 2009, Justin Aucoin, Assistant Site Editor, searchdatamanagement.com
The Quest: MDM
Discovery. Bloor Research: the discovery of relationships between data elements, regardless of where the data is stored.
Quality. DAMA: The degree to which data is accurate, complete, timely, consistent with all requirements and business rules, and relevant for a given use.
Integration. DAMA: The planned and controlled transformation and flow of data across databases, for operational and/or analytical use.
Atomic Data of the Enterprise: Provide and maintain a consistent view of an enterprise's core business information assets. Mendeleev. ALCHEMY VS. CHEMISTRY
What is Master Data? Point In Time; Currency; Product; Customer; Country; Transaction. A Transaction is a relationship of Master Data and Reference Files, plus time stamp, volumes, and secondary descriptors.
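One way to picture this slide is a transaction record that holds only keys into master and reference files, plus its own time stamp, volumes, and descriptors. The sketch below is illustrative Python; the field names, the lookup tables, and the `unresolved_keys` helper are hypothetical and not part of the presentation.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

# Hypothetical master and reference files, keyed by identifier.
customers = {"C001": "Acme Ltd"}
products = {"P100": "Widget"}
countries = {"CA": "Canada"}
currencies = {"CAD": "Canadian dollar"}

@dataclass(frozen=True)
class Transaction:
    # Relationships to master and reference data (keys only).
    customer_id: str
    product_id: str
    country_code: str
    currency_code: str
    # Data owned by the transaction itself.
    occurred_at: datetime      # time stamp (point in time)
    quantity: int              # volume
    amount: Decimal            # volume
    note: str = ""             # secondary descriptor

def unresolved_keys(txn):
    """Return the names of keys that do not resolve to master data."""
    checks = [
        ("customer_id", txn.customer_id, customers),
        ("product_id", txn.product_id, products),
        ("country_code", txn.country_code, countries),
        ("currency_code", txn.currency_code, currencies),
    ]
    return [name for name, key, master in checks if key not in master]

good = Transaction("C001", "P100", "CA", "CAD",
                   datetime(2009, 7, 2, 10, 30), 3, Decimal("29.97"))
bad = Transaction("C001", "P999", "CA", "CAD",
                  datetime(2009, 7, 2, 10, 31), 1, Decimal("9.99"))
print(unresolved_keys(good), unresolved_keys(bad))
```

Seen this way, the transaction carries no descriptive master data of its own, so its quality depends on the master files it points to.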
Master Data - oss: Reference Data; Reference Data; ISO ode; Operational; Analytical; Universal; Local; Unified
Master Data Types
Operational Master Data: Definition, creation, and synchronization of Master Data required for transactional systems and delivered via service-oriented architecture (SOA).
Analytical Master Data: Definition, creation, and integration, including multiple historical versions, of Master Data required for Enterprise Reporting or Data Warehousing applications.
Unified Master Data: Master Data that is defined and created to apply to both Operational and Analytical applications. Implementation of the “single view of the truth”. It requires achieving agreement on a complex topic among a group of people.
Master Data Integration
Universal: Master Data that is created and maintained by an external organization. For reference data it is often developed by a standards organization such as the ISO. Master Data Models are often available to members of an industry trade or professional association.
Enterprise: Master Data that represents the common business information assets that need to be agreed on and shared throughout the enterprise.
Community: Master Data that is shared between two or more applications. Typically data is replicated from the application that contains the system of record for the Master Data, by federation (key linkage) or propagation (materialized views).
Local: Master or Reference Data that is created and maintained for a single operational application. Much of the Reference Data will remain local, such as Status Codes, for the transactions within the application.
Master Data - cess: Period; Person
Atomic E: PARTY; CDI
Master Data ss: Activity; Service; Geo-Political; Seasonality; Go To Market
MDM – Integration Techniques
Data Propagation (Today): Copies Master Data from one location to another (materialized view).
Data Consolidation: Captures data from multiple data sources and integrates into a single Master Data hub.
Data Federation: Provides a single virtual Master Data view of one or more data sources. No data is stored in the Master Data hub.
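The three techniques differ mainly in where the data lives after integration. The following Python sketch contrasts them using two hypothetical source systems (`crm` and `erp`) held as in-memory dictionaries. The "earlier source wins" survivorship rule is an assumption made for illustration, not something the slides prescribe.

```python
import copy

# Hypothetical source systems keyed by customer identifier.
crm = {"C001": {"name": "Acme Ltd", "phone": "555-0100"}}
erp = {
    "C001": {"name": "ACME LIMITED", "credit_limit": 5000},
    "C002": {"name": "Globex", "credit_limit": 900},
}

def propagate(source):
    """Propagation: copy master data to another location (a snapshot)."""
    return copy.deepcopy(source)

def consolidate(*sources):
    """Consolidation: merge several sources into one stored hub.

    Assumed survivorship rule: the first source to supply an attribute wins.
    """
    hub = {}
    for source in sources:
        for key, record in source.items():
            merged = hub.setdefault(key, {})
            for attr, value in record.items():
                merged.setdefault(attr, value)
    return hub

class FederatedView:
    """Federation: a virtual view; each lookup reads the sources directly."""

    def __init__(self, *sources):
        self._sources = sources  # references only, no data is copied

    def get(self, key):
        merged = {}
        for source in self._sources:
            for attr, value in source.get(key, {}).items():
                merged.setdefault(attr, value)
        return merged

replica = propagate(crm)
hub = consolidate(crm, erp)
view = FederatedView(crm, erp)
```

If `crm["C001"]["phone"]` changes afterwards, `replica` and `hub` keep the old value until they are refreshed, while `view.get("C001")` reflects the change immediately. That trade-off between latency and query-time dependence on the sources is what separates the approaches.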
VS
Architecture choices: Independent Data Mart; Data Mart Bus Architecture; Ref File; Centralized Data Warehouse (Enterprise Data Warehouse); Hub & Spoke Architecture (Corporate Information Factory); Federated Data Stores
Metadata Repository. Are master data files metadata? Need for central repository. Inside Metadata Repository: Metadata Repository; Master Data. Outside Metadata Repository: Metadata Repository; Master Data; MetaData
Master Data Architecture – Repository
Comprehensive, single source for Master Data
Centrally managed, highly effective data governance
Fixed data model – typically Customer & Product
Extensive and sophisticated data integration process
All applications need to change to conform to Hubs
Expensive to modify to new business requirements
Hard to adapt to rapid ...
Diagram labels: Legacy Application; New Application
Master Data Architecture – Registry
Linkages to other data stores with transform logic
Low Cost, small footprint, only unique key kept in DB
Minimal disruption of current applications
Limited Data Governance, no merged Master Data
No resolution of semantic disintegrity
No History, System of Record linked to Data Source
Query performance - data availability from ...
Diagram labels: Data Source; New Application
Master Data Architecture – Hybrid
Relatively low cost solution
Extensible Data Model based on industry templates
“Best we have” Master Data Model
Generalized governance & maintenance facilities
ETL to load HUB may be complex
Data redundancy may lead to update conflicts and data latency
Diagram labels: New Application
Technology Adoption
Product Life Cycle
1. Market introduction
Costs are high
Slow sales volumes to start
Demand has to be created
Customers have to be prompted to try the product
Makes no money at this stage
2. Growth
Costs reduced due to economies of scale
Sales volume increases significantly
Public awareness increases
Profitability begins to rise
Market Strategy: Other Subject Area; MDM Proof of Concept; More of same Subject Area
Master Data Harmonization: Industry Model; Governance Council; Enterprise Model; Data Architect; Logical Data Model; Data Modeler; DB Administrator; Physical Data Model; Physical Data; Method; Reverse Engineering
By any other name. Analytical Master Data
Subject Area Integration: POWER ASSISTED DATA RATIONALIZATION
Primary Data Elements: PRODUCT-ID (Product Identification)
Unique Data Elements, Source Data Structures, Field Occurrences: Prod-ID (Char 9); PROD ID (Packed 9); Install base (Char 9); Prod Num (Packed 5)
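Part of the rationalization on this slide can be approximated by normalizing field names and grouping the variants under a primary data element. The Python sketch below uses the four field occurrences from the slide. The abbreviation table (including treating "Num" as "ID") is an assumption made for illustration, and the sketch is not the tooling the presentation refers to.

```python
import re
from collections import defaultdict

# Assumed abbreviation expansions; a real project would curate this list.
ABBREVIATIONS = {"PROD": "PRODUCT", "NUM": "ID", "NO": "ID"}

def normalize(field_name):
    """Upper-case, split on punctuation/whitespace, expand abbreviations."""
    tokens = re.split(r"[^A-Za-z0-9]+", field_name.upper())
    return "-".join(ABBREVIATIONS.get(token, token) for token in tokens if token)

def rationalize(field_occurrences):
    """Group (name, format) occurrences under their normalized element name."""
    groups = defaultdict(list)
    for name, data_format in field_occurrences:
        groups[normalize(name)].append((name, data_format))
    return dict(groups)

# Field occurrences as listed on the slide.
occurrences = [
    ("Prod-ID", "Char 9"),
    ("PROD ID", "Packed 9"),
    ("Install base", "Char 9"),
    ("Prod Num", "Packed 5"),
]

groups = rationalize(occurrences)
for element, fields in groups.items():
    print(element, fields)
```

Name matching places three of the four fields under PRODUCT-ID. "Install base" does not match by name, so linking it to PRODUCT-ID would have to come from profiling its values or from a steward's decision. The differing formats (Char 9, Packed 9, Packed 5) also show that a shared name does not guarantee a shared domain.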
Data Modeling vs Data Profiling. Data Modeling documents metadata or columns. Data Profiling analyzes instance data or rows.
Traditional Meta-driven approach. Design does not always match Reality! Data Architects
Data Discovery
Data Profiler: Discover primary/foreign keys; Identify orphan rows; Find overlapping columns; Profile data
Data Modeler: Import inferred structures from Data Profiler; Visualize information in an intuitive data model; Integrate with other modeling efforts
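The profiler tasks on this slide can be illustrated on rows held as plain dictionaries. The Python sketch below is a simplified illustration, not the behavior of any particular profiling product. The sample tables are invented, and using Jaccard similarity as the overlap measure is an assumption.

```python
def candidate_keys(rows):
    """Columns whose values are all present and unique (possible primary keys)."""
    if not rows:
        return []
    keys = []
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        if None not in values and len(set(values)) == len(values):
            keys.append(column)
    return keys

def orphan_rows(child_rows, foreign_key, parent_rows, primary_key):
    """Child rows whose foreign key has no matching parent row."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_keys]

def column_overlap(values_a, values_b):
    """Share of distinct values common to both columns (Jaccard similarity)."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical sample data.
products = [
    {"product_id": "P1", "name": "Widget"},
    {"product_id": "P2", "name": "Gadget"},
]
orders = [
    {"order_id": 1, "product_id": "P1"},
    {"order_id": 2, "product_id": "P9"},
    {"order_id": 3, "product_id": "P1"},
]

print(candidate_keys(orders))
print(orphan_rows(orders, "product_id", products, "product_id"))
print(column_overlap([o["product_id"] for o in orders],
                     [p["product_id"] for p in products]))
```

On a small sample, uniqueness only suggests a key; `name` also qualifies in `products` here, which is why inferred structures still go to the data modeler for review.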
Data Profiling – Match, Merge and Cleanse
Master Data Quality Analysis: Identify critical data elements; Review the cardinality, range, mode, and null rules; Review the value, pattern & length frequencies; Review parent-child relationships; Identify orphaned rows and values
Cross-System Analysis: Identify reference data overlap between data sources; Identify data content discrepancies between data sources; Validate mappings to MDM target
Master Data Error Types: Synonyms (False negatives) - two representations of the same entity are incorrectly defined as two separate entities
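The column-level measures listed under Master Data Quality Analysis (cardinality, range, mode, nulls, pattern and length frequencies) are straightforward to compute. The following is a minimal Python sketch under stated assumptions: empty strings count as nulls, and patterns use the common convention of "A" for letters and "9" for digits. The sample postal-code column is invented.

```python
import re
from collections import Counter

def value_pattern(value):
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    text = re.sub(r"[A-Za-z]", "A", str(value))
    return re.sub(r"[0-9]", "9", text)

def profile_column(values):
    """Basic profile of one column; None and '' are treated as null."""
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "cardinality": len(counts),                      # distinct values
        "min": min(present) if present else None,        # range, lower bound
        "max": max(present) if present else None,        # range, upper bound
        "mode": counts.most_common(1)[0][0] if counts else None,
        "patterns": Counter(value_pattern(v) for v in present),
        "lengths": Counter(len(str(v)) for v in present),
    }

# Hypothetical postal-code column mixing two formats and missing values.
postal_codes = ["K1A 0B1", "M5V 2T6", None, "90210", "K1A 0B1", ""]
profile = profile_column(postal_codes)
for measure, result in profile.items():
    print(measure, result)
```

A pattern frequency table like this one makes mixed formats visible at a glance: most values follow one shape and a minority follow another, which is the kind of discrepancy a steward would review before mapping the column to an MDM target.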
Appearing next month: Why re-invent the wheel? ... that apply to well over 50 percent of most data model constructs and that can be reused. It is this 50 percent that I address in this presentation.
Universal Models. The Data Model Resource Book: Len Silverston
Architecture Models: Process based - Hierarchy model (1:M); Topical or subject based - Network model (M:M)
Data Governance: ‘The act or process of directing, leading and assuring that information is managed effectively as an enterprise resource, including resolving information conflicts, across the enterprise.’ Larry English
Data Governance – Building Trust in Data
People: Data Stewards; Data Architects; Data Modelers
Process: Data Security / Access; Data Quality Assurance; Master Data Management; Metadata Management; Enterprise Model Management
Data Quality Assurance: Data Stewardship; Data Profiling/Cleansing; System of Record – (Definition) – Data Model; Versioning; Change Management; Health and Welfare; System of Reference – (Lifecycle) – Workflow; Synchronization and Integration; Business Rules and Standards; External Referencing
You can’t manage what you don’t measure!
Scorecard – TDWI.org
Data Stewardship: Difficult to assign, not process based; Cross-functional; Not a Single ility, Authority, Responsibility; Structure vs. Content: Columns, Rows, Cells
Master Data Stewardship: Difficult to assign, not process based. Infrastructure - cross-functional. Business stewards - Content: Descriptions. Technical stewards - Structure: Codes and structure
References
Managing Reference Data in Enterprise Databases: Binding Corporate Data to the Wider World. Malcolm Chisholm - Morgan Kaufmann
The Data Model Resource Book: A Library of Universal Data Models for All Enterprises (Vol. 1, 2). Len Silverston - John Wiley & Sons
The Data Model Resource Book: Universal Patterns for Data Modeling. Len Silverston - John Wiley & Sons
Master Data Management & Customer Data Integration for a Global Enterprise. Alex Berson & Larry Dubov - McGraw Hill
Donald J. Soulsby, DSoulsby@MetaWright.com