White Paper Information Governance With TopBraid EDG

Transcription

Maturing Information Governance with TopBraid EDG
White Paper

INFORMATION MANAGEMENT AND GOVERNANCE IN THE ENTERPRISE CONTEXT

Data and, in a more general sense, information is the lifeblood of a modern enterprise. It is used in every decision, it is produced by every activity, and it flows across all systems. Maturity of information management is, therefore, essential to the success, profitability and even survivability of every enterprise.

TopQuadrant’s comprehensive data governance solution, TopBraid EDG, with its semantic standards-based foundation, can manage the entire range of enterprise information assets and the cross connections between them.

In recent years, organizations large and small have come to realize that maturing enterprise information management (EIM) requires establishing and maturing information governance. Information governance “is the oversight and management” discipline leveraging people, process, and technology to ensure information assets are suitable for use to run the business. As modern enterprises depend on technology for all day-to-day operations, their technology and data landscape has grown increasingly rich and complex. Most organizations can’t govern this landscape without sophisticated tools. Simply put, the complexity of today’s information landscape makes the technology part of information governance increasingly critical.

Just as the words data and information are sometimes used interchangeably, depending on the context, the terms data governance and information governance may also be used interchangeably by some practitioners. Other practitioners may want to be clear on the difference between them. In short, information governance encompasses data governance as one of its key components.

Data governance is about creating and using policies for maximizing availability, integrity, security, and usability of structured and unstructured information available to an organization. Information governance brings into the picture the lifecycle and business context of the information. This context includes regulatory, legal, risk, environmental, and operational requirements. TopBraid EDG supports both types of governance in an integrated way: the more tactical and detail-oriented data governance together with the more strategic, business policy and context-oriented information governance.

As the first semantic technology based information governance and EIM platform, TopBraid EDG provides flexible tooling to handle the complexity of information assets. It enables staff to begin in one area, shift to another, fill gaps in another, and add relationships between them, all incrementally. It can bring information together through direct data entry, by importing from a set of common data formats, or through a standard API.

As a result, continued management of data and metadata within silos, integration of the business across information silos, and the ability to automate directly on top of the metadata are all supported.

Figure 1 (see below) provides a view of EIM capabilities and components. The legend provides some examples of components with further descriptions, along with a brief indication of the support that TopBraid EDG provides for the respective component or capability.

[Figure 1. Enterprise Information Management Components and Capabilities]

LEGEND FOR FIGURE 1: ENTERPRISE INFORMATION MANAGEMENT (EIM)

1. Enterprise Information Governance: Guidance, Oversight, Policy Decisions.
In TopBraid EDG, the “governance asset collection” supports capturing policy decisions and documenting governance processes, guidelines, metrics and best practices.

2. Information Architecture: Data Organization.
TopBraid EDG can help users organize all their model and data assets.

3. Context Management: Business Friendly Views and Viewpoints.
Manages the separation of information architecture layers that deliver business perspectives from the data architecture physically storing data. TopBraid EDG handles enterprise-designed information layers, as customized collections, that provide business context implemented by architects to organize data, information, and service layers for business use. This describes the “Information Layer,” that is, the business facing views of data warehouses, federated distributed data, business services and their data exchange formats, and the business centric meta-models from analytics tool vendors. Semantic mappings of meaning and preferred terms make the information consistent across these and other layers. Data Lineage mapping relationships demonstrate how the information was developed from many data sources.

4. Information Security: Data Classification, Data Access and Protection.
TopBraid EDG provides a rich set of pre-built properties to capture compliance information, including compliance with information security and privacy guidelines and regulations, all to the finest security grain at the atomic “field” level.

5. Semantics and Metadata Management: Ontologies, Concepts, Preferred Terms, Descriptions, Definitions, Mappings.
TopBraid EDG uses standard ontology models to define all metadata and other aspects of its functional behavior, views, etc. These models are user configurable: they can be enriched or modified as needed.

6. Analysis Modeling and Design: Business Models, Conceptual Models, Data Models, Data Exchange Format Models, Data Specifications.
One of the strongest features of TopBraid EDG is that it can capture and connect any type of models, specifications and formats, be they about data or about the broader enterprise landscape such as process models.

7. Reference Data Management: Permissible Values.
RDM is a focus of one of EDG’s modular packages. Data elements governed within data asset collections refer to reference data as permissible values.

8. Master Data Management: Identity, Standardization.
Depending on the tools you already have, TopBraid EDG can store and govern master data or it can integrate with dedicated master data management systems.

9. Information Quality: Profiles, Measurement, Metrics.
TopBraid EDG supports information quality initiatives by providing capabilities for defining business rules and quality metrics, integrating with data profiling tools, and storing outputs of data scanning and profiling.

10. Operational and Partner Data Management: Formats, Mappings, Contracts.
TopBraid EDG can capture contracts with partners and mappings from metadata and controlled values used internally to those used by partners and vendors. Once this information is captured, TopBraid EDG can be used to perform the required translations.

11. Document, Content and Records Management: Document Metadata, Taxonomies, Tags, Indexes.
TopBraid EDG can be used to govern structured and unstructured data. It includes support for taxonomies, content corpora and content tags. Automated tagging is available through the TopBraid Tagger and Autoclassifier module.

12. Information Systems: Business Systems, IT Systems, Software Platforms, Software Components.
Information about these assets is captured using TopBraid EDG technical asset collections.

13. Integration and Interoperability: Data Exchange, Data Flows, Translations and Transformations.
The Data Lineage asset collection of TopBraid EDG captures information about data flows and transformations.

14. Big Data Assets: High Volume, High Velocity, Platforms.
In TopBraid EDG, big data asset collections are used to govern information not only about the datasets in data lakes, but also about the overall big data infrastructure such as nodes, controllers and jobs. EDG can also be used to support movement of data from the operational data stores into data lakes, for example, by automatically generating AVRO schemas and capturing what data can be archived and when this can happen.

15. Business Intelligence and Data Warehousing: Traditional BI Platforms, Reports, Analytics.
TopBraid EDG makes analytics more powerful by governing BI hierarchies and assisting with data aggregation and data flows across all data sources.

16. Stewardship: Asset Level Responsibilities.
The RACI matrix capability in TopBraid EDG captures responsibilities at the desired level of granularity: either an entire asset collection or an individual asset.

17. Information Services: Data Services, Business Services, Metadata Services.
TopBraid EDG offers many pre-built services. It also offers powerful tools to make it easy for users to configure additional services.

18. Data Storage and Operations: the Data itself, its Data Structure, Data Architecture, Operational Metadata, Measurements, Metrics, Jobs, DevOps.
TopBraid EDG is able to support governance of all data sources and technology components.
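Legend item 14 mentions generating AVRO schemas automatically from governed metadata. As a rough illustration of the idea only, and not of TopBraid EDG's actual generator or API, the sketch below builds an Avro record schema from a small list of column descriptions. The `ColumnMeta` structure, the type mapping and the sample `Customer` columns are assumptions made for this example; the output follows the public Avro schema conventions for records, enums and nullable unions.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from catalog-level logical types to Avro primitive types.
AVRO_PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "decimal": "double",
    "boolean": "boolean",
}


@dataclass
class ColumnMeta:
    """Hypothetical column description, as a metadata catalog might hold it."""
    name: str
    logical_type: str
    nullable: bool = False
    description: str = ""
    permissible_values: Optional[list] = None  # codes from a reference dataset


def avro_field(column: ColumnMeta) -> dict:
    """Build one Avro field definition from a column description."""
    if column.permissible_values:
        # Columns constrained by reference data become Avro enums.
        avro_type = {
            "type": "enum",
            "name": f"{column.name}_code",
            "symbols": list(column.permissible_values),
        }
    else:
        avro_type = AVRO_PRIMITIVES[column.logical_type]

    field = {"name": column.name, "doc": column.description}
    if column.nullable:
        # Optional values are a union; "null" comes first when the default is null.
        field["type"] = ["null", avro_type]
        field["default"] = None
    else:
        field["type"] = avro_type
    return field


def avro_record_schema(name: str, namespace: str, columns: list) -> dict:
    """Assemble an Avro record schema for a whole table or dataset."""
    return {
        "type": "record",
        "name": name,
        "namespace": namespace,
        "fields": [avro_field(c) for c in columns],
    }


# Illustrative metadata; none of these names come from the white paper.
CUSTOMER_COLUMNS = [
    ColumnMeta("customer_id", "integer", description="Surrogate key"),
    ColumnMeta("full_name", "string", nullable=True, description="Display name"),
    ColumnMeta("country", "string", description="Country of residence",
               permissible_values=["DE", "JP", "US"]),
]

if __name__ == "__main__":
    schema = avro_record_schema("Customer", "example.governance", CUSTOMER_COLUMNS)
    print(json.dumps(schema, indent=2))
```

Driving the enum symbols from the governed code set is the point of the example: when the reference data changes, the schema can be regenerated rather than edited by hand.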

As an application, TopBraid EDG has implemented key use cases for information governance and EIM. Its user interface is consistent across all assets and becomes more powerful as a user comes to know it. It is a power tool for an enterprise to know, model, manage, and “implement itself.” (See Sidebar I, below.)

Because TopBraid EDG uses layered models for each type of asset with rich metadata based on standard ontologies, users can ask questions directly against the model, against the data instances, and/or against the model plus instances by using EDG’s powerful search and visualization capabilities, or by using the semantic query standard called SPARQL. Further, advanced stewards can easily extend TopBraid EDG to support their unique requirements by working with the integrated development environment (IDE) provided with it, TopBraid Composer Maestro Edition.

Users can start small and rapidly grow into more advanced capabilities (see Sidebar II). Enterprise priorities change often and staff need to adapt quickly. TopBraid EDG makes it easy to take a bite at a time, shift direction, stop for a while, add some automation, and not lose a thing. It keeps full history so you can easily follow through later. While auditors will love this, it is the practitioners themselves who will use it regularly as they identify quality issues, fill gaps, fix mistakes, measure effectiveness, and automate upon it.

In the rest of this white paper we will focus on a specific subset of information, reference data, and explore how TopBraid EDG can help enterprises to mature RDM.

SIDEBAR I
TopBraid EDG: An Agile Data Governance Solution

Actionable understanding of enterprise information requires connecting business, technical and operational metadata into a single data landscape. Many data governance solutions, based on proprietary approaches, are limited and can’t effectively meet this goal. Instead, they may simply create more data silos.

TopBraid Enterprise Data Governance (TopBraid EDG) is a new type of agile data governance solution. It uses a non-proprietary, graph standards-based, model-driven approach to capturing and preserving the meaning of data, something we call semantic information management.*

In contrast to traditional approaches, the semantic approach can capture both business and technical metadata. It is flexible and can adapt to changes in the data, metadata, business needs and the organization itself, because connections are important!

* See also our whitepaper: How Can Semantic Information Management Help to Preserve Meaning in a Dynamic Data Environment?, available here: topquadrant.com/resources/whitepapers
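The main text of this section notes that users can ask questions against the model, against the data instances, or against both, for example with SPARQL. The following sketch imitates that idea in plain Python without any RDF library: statements are held as (subject, predicate, object) triples, and a small pattern matcher plays the role of a basic graph pattern. The `ex:` vocabulary and all sample data are invented for illustration and are not taken from TopBraid EDG's ontologies.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# A tiny graph mixing a model layer and an instance layer (all names hypothetical).
TRIPLES: list[Triple] = [
    # Model layer: a class and a property declared for it.
    ("ex:Country", "rdf:type", "owl:Class"),
    ("ex:isoCode", "rdfs:domain", "ex:Country"),
    ("ex:isoCode", "rdfs:label", "ISO code"),
    # Instance layer: data described by that model.
    ("ex:Germany", "rdf:type", "ex:Country"),
    ("ex:Germany", "ex:isoCode", "DE"),
    ("ex:Japan", "rdf:type", "ex:Country"),
    ("ex:Japan", "ex:isoCode", "JP"),
]


def match(triples: list[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for triple in triples:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def properties_of_class(triples: list[Triple], cls: str) -> list[str]:
    """Question against the model: which properties are declared for a class?"""
    return sorted(s for s, _, _ in match(triples, p="rdfs:domain", o=cls))


def instances_of(triples: list[Triple], cls: str) -> list[str]:
    """Question against the instances: which resources have this type?"""
    return sorted(s for s, _, _ in match(triples, p="rdf:type", o=cls))


def describe_instances(triples: list[Triple], cls: str) -> dict:
    """Question against model plus instances: values of the declared properties."""
    props = properties_of_class(triples, cls)
    return {
        inst: {p: [o for _, _, o in match(triples, s=inst, p=p)] for p in props}
        for inst in instances_of(triples, cls)
    }


if __name__ == "__main__":
    print(properties_of_class(TRIPLES, "ex:Country"))
    print(instances_of(TRIPLES, "ex:Country"))
    print(describe_instances(TRIPLES, "ex:Country"))
```

Because model and data live in the same graph, the third function needs no separate schema lookup: the model triples tell it which instance triples to fetch.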

THE IMPORTANCE OF REFERENCE DATA AND ITS EFFECTIVE MANAGEMENT

Reference data is found in every application used by an enterprise, including back-end systems, front-end commerce applications, data exchange formats, outsourced and hosted systems, big data platforms, and data warehouses. It can easily be 20 to 50% of the tables in a data store. And the values are used throughout the transactional and mastered data sets to make the system internally consistent. How well it is managed has a major impact on every aspect of an organization’s use of data, from the integrity of its business intelligence reports to the success or failure of its system integration efforts.

In the whitepaper The Foundations of Successful Reference Data Management, Malcolm Chisholm discussed the challenges associated with implementing a reference data management solution, and the essential components of any vision for the governance and management of reference data. Malcolm addressed the following topics: What is reference data? Why is reference data management important? What are the challenges of reference data management? What are some best practices for the governance and management of reference data? What capabilities should you look for in a reference data solution?

Reference Data Management (RDM) is an integral part of overall Enterprise Information Management (EIM). Managing reference data well requires aligning its governance with other key functions of EIM. If done well, this has a high payback. Done poorly, it has a high cost.

In the whitepaper Maturing Reference Data Management, we describe a Reference Data Management road map using a maturity model, with the five levels shown in Figure 2 (see below), as one likely path an enterprise can take to mature RDM, and the benefits it will gain as a result.

Both of these and other whitepapers are available for download from TopQuadrant’s website: topquadrant.com/resources/whitepapers

In this whitepaper, we further discuss connections between RDM and EIM and then describe how TopBraid EDG can help enterprises move to the higher levels of RDM maturity and, more generally, support best practices for information management.

We have also mapped the best practices and key capabilities for RDM described in The Foundations of Successful Reference Data Management to the RDM maturity levels described in the Maturing Reference Data Management whitepaper (see Table 1).

[Figure 2. Maturity levels and high level maturation steps. Levels: 0 Unaware, 1 Aware, 2 Reactive, 3 Proactive, 4 Managed, 5 Optimizing. Annotations: “No enterprise reference data management”, “Direction and funding for enterprise RDM”, “Enterprise RDM is established”.]

HOW TOPBRAID EDG HELPS WITH REALIZING RDM MATURITY

RDM as Part of EIM Requires Flexible, Comprehensive Support

As an enterprise evolves its RDM, its Enterprise Information Management (EIM) processes coevolve. Business and data analyses move closer together by seeking common terminology and meaning through semantic analysis and mapping. Data modeling shifts up to standards-based, layered models to enable connected abstract conceptual models, through logical models, and on into physical models of data implementation, data storage and movement, with a clear connection to conceptual business meanings. RDM evolves into governance of controlled business terminologies and lists of meanings within a conceptual reference domain. Enterprise Standard Code sets become a hub of meanings that map to physical implementations of systems’ code sets.

[Figure 3. High level EIM assets and roles.]
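The idea that Enterprise Standard Code sets act as a hub of meanings can be pictured as one crosswalk per system, from that system's local codes to the enterprise code. The code below is a minimal Python illustration under assumed data: the system names (`crm`, `billing`) and all code values are hypothetical, and this is not a description of how TopBraid EDG stores or executes crosswalks.

```python
# One crosswalk per system: local code -> enterprise standard code.
# All systems and codes here are invented for the example.
TO_ENTERPRISE = {
    "crm": {"GER": "DE", "JPN": "JP", "USA": "US"},
    "billing": {"276": "DE", "392": "JP", "840": "US"},
}


def invert(crosswalk: dict) -> dict:
    """Turn local->standard into standard->local, rejecting ambiguous mappings."""
    inverse = {}
    for local, standard in crosswalk.items():
        if standard in inverse:
            raise ValueError(
                f"{standard!r} maps back to both {inverse[standard]!r} and {local!r}"
            )
        inverse[standard] = local
    return inverse


def translate(code: str, source: str, target: str) -> str:
    """Translate a code between two systems by going through the enterprise hub."""
    to_hub = TO_ENTERPRISE[source]
    if code not in to_hub:
        raise LookupError(f"{code!r} is not a known code in system {source!r}")
    standard = to_hub[code]

    from_hub = invert(TO_ENTERPRISE[target])
    if standard not in from_hub:
        raise LookupError(f"system {target!r} has no code for {standard!r}")
    return from_hub[standard]


if __name__ == "__main__":
    print(translate("GER", "crm", "billing"))
    print(translate("840", "billing", "crm"))
```

With a hub, each system needs a single crosswalk to the enterprise standard instead of a separate mapping to every other system, which is what keeps the number of mappings manageable as systems are added.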

This hub of meanings enables business semantic integrity (no loss or confusion of meanings) between people and systems or systems to systems. It allows for more efficient and quicker integrations between systems. It increases business agility by making integration with trading partners faster and easier. By optimizing the enterprise’s management of its assets, it enables the enterprise to effectively execute its business models.

While having a stand-alone focused RDM solution is better than having no solution, moving to the higher maturity levels requires the leverage provided by flexible solutions that can manage inventories of any asset type, and the business and technical relationships between them. Such solutions must have a view into a full enterprise scope of business assets and technical assets. Figure 3 provides a very high level information-centric development lifecycle view of the assets and roles commonly used to implement information systems. These assets have traditionally been captured in a variety of different metadata and configuration management tools.

But at an enterprise level, EIM requires tooling that can support an interconnected inventory of all enterprise capabilities and information assets (e.g., code sets in the case of RDM), their knowledge capture, access and contribution to them by the broad range of stakeholders, and the ability to use automation to make information-intense processes practical and inexpensive.

SIDEBAR II
Governance Packages Available in TopBraid EDG

In ramping up a data governance program, different organizations may have differing priorities and starting points. With TopBraid EDG, you can start incrementally. For example, your first focus may be on governing just business glossaries, just reference data, or on metadata management. No matter where you start using EDG, you can always extend your scope to governing other assets when you are ready to do so.

To support this comprehensive but staged approach, TopBraid EDG provides the following packages, any of which are available as an initial configuration of EDG. Each package can also be used in combination with the other packages toward your targeted scope of data governance. ks/ for details on available EDG packages and additional modules.

Vocabulary Management: Create, connect, and use taxonomies and ontologies for improving search, clarifying enterprise terminology and enriching unstructured data.

Metadata Management: Govern technical metadata about data assets including databases, datasets, logical, and physical data models. Combine with Business Glossaries to map key data elements to business terms.

Reference Data Management: Profile, govern, update, and provision reference datasets. Use comprehensive metadata to document the meaning of reference data.

Business Glossaries: Define and connect glossary terms or combine with metadata management and establish connections between terms and technical metadata.

TopBraid EDG organizes catalogs by type of asset, with built-in metadata covering complete business definitions and descriptions, unlimited annotations, semantic mapping between business metadata and technical metadata, the configuration relationships between technical assets, data lineage, and the ability to extend all of this as needed. Practitioner-role silos disappear as practitioners collaborate on the information describing the assets and on mutual management of the quality of the data itself. The completeness, validity, timeliness/currency, and accuracy of that information ensure the business can trust the information assets enough to act on them and to automate processes on them.

RDM CAPABILITIES IN TOPBRAID EDG

In support of RDM, EDG can model simple to complex code sets, including any format (e.g., JSON, XML, csv, tsv, ttl) of reference data coming from external standards bodies, the wide variety of vendor applications, and enterprise developed code sets.

In TopBraid EDG, the code sets are standard, controlled vocabularies within various contexts such as the “enterprise,” “industry,” “system,” “capability,” “ad hoc,” and many other possible contexts. EDG lets users manage these code sets within and across contexts. Establishing semantic relationships from business terms and/or ontologies to other objects ensures that they are well understood in their use and that they move between system boundaries without loss of meaning (i.e., that semantic integrity is maintained).

Integrating metadata management with RDM, reference datasets in TopBraid EDG can be used to specify “permissible values” (with the associated “permissible meanings”) possible for certain data elements. Such values are often called “lookups,” “enumerations,” “enumerated values,” “list-of-values,” or “data element constraints”; but in use, they are a “code set” that can be well managed, enabling data design, data integration, reporting and analytics. These are fundamental information assets. When an enterprise manages these well and views them as a component to be understood, modeled, mapped, moved, translated/transformed, and validated against, all EIM activities become easier. IT activities become easier. The business becomes more agile.

The import capability allows you to map an extract of data against a model for import. In fact, the import can run against different extracts to bring in different data elements, from different sources, as needed, to build (or master) a complete “golden” record of the asset. Because the underlying models are flexible, standards-based ontologies, they can capture relationships between modeled concepts and their data instance values, bridging across different catalogs and data stores, including the custom catalogs enterprises may design.

When information flows across systems, the semantic mapping to each of the system code sets and to the data exchange formats needs to be based on common meaning and reflect the transformation logic and the resultant change in meaning as it moves through data lineage pathways. A large percentage of such translations and transformations occur on reference data, based on the rules defined for using the reference data. They participate heavily in integration logic. Integration, one of the highest-cost and most fragile parts of enterprise systems, requires semantic analysis, consistency, and understanding of each code set across system boundaries.

While enterprises want to minimize, over time, the number of individual reference datasets, achieving this goal will require analysis and harmonization. The collaborative semantic analysis, modeling, and mapping done during harmonization and the mastering into Enterprise Standard Code Sets significantly improve reuse, quality, and time to market. Once done, this work can be leveraged repeatedly. TopBraid EDG provides pre-built services to translate data coded using one reference dataset into data coded using another reference dataset.

As the quality of reference data gets better and its use more consistent, data practitioners and other stakeholders can take advantage of enterprise information services such as:

- metadata driven integration
- impact analysis/dependency analysis
- information lookup services
- information directories
- system data dictionaries

TopBraid EDG provides many such services “out of the box” and makes it very easy to create additional services. Like a hologram, as the pixels are rendered and become denser, the picture becomes clearer, and understanding, transparency, and the ability to leverage the assets for the enterprise become possible.

Table 1 describes how RDM capabilities discussed in “The Foundations of Successful Reference Data Management” are supported by TopBraid EDG. It also maps the capabilities to the RDM maturity levels discussed in the Maturing Reference Data Management whitepaper.
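Returning to the import behaviour described earlier in this section, where different extracts contribute different data elements to one “golden” record, the process can be pictured as a metadata-driven merge. In the Python sketch below, a mapping declares which source column feeds which model property. The extract names, column names, property names and the rule that the first non-empty value wins are all assumptions for illustration; they do not describe TopBraid EDG's import engine.

```python
import csv
import io

# Hypothetical import mappings: for each extract, the key column and
# which source column feeds which property of the governed record.
MAPPINGS = {
    "iso_extract": {
        "key": "alpha2",
        "columns": {"alpha2": "code", "name": "label"},
    },
    "finance_extract": {
        "key": "ctry",
        "columns": {"ctry": "code", "currency": "currency"},
    },
}

# Two small CSV extracts standing in for files from different sources.
ISO_CSV = "alpha2,name\nDE,Germany\nJP,Japan\n"
FINANCE_CSV = "ctry,currency\nDE,EUR\nJP,JPY\nUS,USD\n"


def import_extract(golden: dict, source: str, text: str) -> dict:
    """Merge one extract into the golden records, guided by its mapping."""
    mapping = MAPPINGS[source]
    for row in csv.DictReader(io.StringIO(text)):
        key = row[mapping["key"]].strip()
        record = golden.setdefault(key, {})
        for column, prop in mapping["columns"].items():
            value = row[column].strip()
            # Assumed survivorship rule: keep the first non-empty value seen.
            if value and prop not in record:
                record[prop] = value
    return golden


if __name__ == "__main__":
    golden = {}
    import_extract(golden, "iso_extract", ISO_CSV)
    import_extract(golden, "finance_extract", FINANCE_CSV)
    for code, record in sorted(golden.items()):
        print(code, record)
```

Keeping the column-to-property mapping as data rather than code is what makes the import metadata driven: onboarding another source means adding a mapping entry, not writing another loader.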

TABLE 1. Malcolm Chisholm’s RDM capabilities and related best practices, RDM maturity and TopBraid EDG support

Capability: Ability to create a profile of an external reference data standard.
Description: External reference data is maintained by authorities outside the enterprise. It needs to be discovered, selected and understood before an enterprise decides to use it.
Matures in level: 3-Proactive
TopBraid EDG Support: TopBraid EDG offers a flexible approach to cataloging assets. With it, users can capture and describe an external reference data standard. Then, through analysis and discussion, decide whether to use it directly, map it to the reference data currently in use in the enterprise, and so on. This capability is also used to track all interactions with the external authority and to assess its reliability.

Capability: Ability to create a profile of a reference data set maintained by an external authority.
Description: Once external reference data has been set up, it needs to be kept current. Capturing information such as “update frequency” is key to be able to keep up to date with changes and new developments. Subscription management also ensures that changes are detected and assimilated as rapidly as possible.
Matures in level: 3-Proactive
TopBraid EDG Support: Each reference dataset in TopBraid EDG has rich metadata. It includes information about onboarding an external reference dataset, its update frequency, relevant subscription information and more. If built-in metadata fields provided are insufficient, users can easily extend the metadata with their own fields.

Capability: Ability to perform semantic analysis of each element in the dataset and identify the business concepts that it maps to.
Description: Metadata is required to describe the dataset and each element in it.
Matures in level: 3-Proactive
TopBraid EDG Support: With TopBraid EDG, each element in the dataset (column) exists as a property in an ontology. It can be mapped to glossary terms, technical metadata and other managed assets.

Capability: Ability to properly document the semantic analysis after it has been performed.
Description: This may include facts about the reference dataset or individual codes. Such facts help users of the reference data understand how to interpret and use it.
Matures in level: 3-Proactive
TopBraid EDG Support: Each element has description fields, connections to data and application requirements and other metadata. These are specific metadata fields. The “fact” field can be used for more general, “catch all” statements. Facts are supported at both the dataset and individual code level.

Capability: Ability to import external or internal reference data into a central repository.
Description: Such import must include capabilities for extraction, filtering, transformation, and enrichment. As much as possible, this should be metadata driven.
Matures in level: 3-Proactive
TopBraid EDG Support: TopBraid EDG offers rich import capabilities that are highly flexible and user configurable. Any transformation can be included as part of an import.

Capability: Establish and Manage Enterprise Standard Code sets.
Description: Begins harmonization across systems and standards with strategic choice of what to work on based on contextual scope, usefulness, project priority, and other factors.
Matures in level: 3-Proactive
TopBraid EDG Support: TopBraid EDG lets users identify some reference datasets as “Enterprise Standard Code sets” for a given entity, e.g., country. Users can then map between the code set used by each specific system and “Enterprise Standard Code sets” by creating crosswalks. Semantic analysis may involve managing decisions by appropriate SMEs and stakeholders about what these mapping decisions are, all of which generates even more reference data metadata that needs to be captured.

TABLE 1. (continued)

Capability: Ability to assign accountabilities for all aspects of reference data management per reference dataset, particularly for internal reference datasets.
Description: Internal reference data is for business concepts that are completely specific to the enterprise. It requires a federated approach, because it is created and managed by many different subject matter experts (SMEs). The central RDU must ensure that groups accountable for internal reference data use a standardized approach.
Matures in level: 4-Managed
TopBraid EDG Support: TopBraid EDG lets you define a RACI matrix. RACI can be specified for an asset collection such as a reference dataset or, alternatively, for an asset such as an individual code. Task assignments and configurable, targeted notifications support collaboration across the enterprise. This achieves the federated governance model needed for internal reference data. Obviously, this capability requires a rich set of metadata elements for reference data.

Capability: Ability to track changes to reference data.
Description: For example, if an external reference dataset changes or new values are added into operational systems.
Matures in level: 4-Managed (in catalog); 5-Optimizing (in data use)
TopBraid EDG Support: All changes are audit trailed in TopBraid EDG. Users can see history of changes and, if desired, revert them.

Capability: Profile Reference Data Quality in Usage.
Description: Profiles actual data where code values are used to categorize it to see if new, incorrect, invalid, or other non-permissible values are in actual use. Errors result in Data Quality triage to disposition activities which may include a cycle of analysis and management in RDM.
Matures in level: 5-Optimizing
TopBraid EDG Support: Since many applications store reference data locally, invalid use of reference data is typically due to the local reference data being out of date with the enterprise standard. TopBraid EDG offers pre-built services for verifying local reference data against the code sets governed in EDG. Results of such periodic verifications are stored over time, letting users see any issues with the quality of local reference data and how these issues have been resolved.

Capability: Ability to distribute reference data.
Description: Reference data is used widely throughout the enterprise. It is vital that all applications have synchronized copies, so distribution must be addressed. This requires a variety of approaches ranging from the fully automated to the fully manual. However, these approaches must be chosen carefully to maintain operational efficiency. A variety of distribution mechanisms such as exports, web services and ESB integration should be provided.
Matures in level: 5-Optimizing
TopBraid EDG Support: Most capabilities of TopBraid EDG that are available through its user interfaces are also available as RESTful web services. This includes data exports. Reference data can be provisioned to applications as a service that exports an entire reference dataset, a subset of it, or information about a single code. Services can be scheduled or accessed on demand. There is also a pre-built integration with Enterprise Service Buses (ESB).
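The verification of local reference data against governed code sets, listed in the table above, amounts at its core to a set comparison. The following Python sketch is one possible way to express it; the function name, the report fields and the sample codes are hypothetical and do not reflect the interface of TopBraid EDG's pre-built services.

```python
from dataclasses import dataclass


@dataclass
class VerificationReport:
    """Outcome of comparing a local code table with the governed code set."""
    missing_locally: list    # governed codes the application does not have
    not_in_standard: list    # local codes unknown to the governed code set
    label_mismatches: dict   # code -> (governed label, local label)

    @property
    def is_clean(self) -> bool:
        return not (self.missing_locally or self.not_in_standard
                    or self.label_mismatches)


def verify_local_codes(governed: dict, local: dict) -> VerificationReport:
    """Compare code->label mappings and report every kind of discrepancy."""
    missing = sorted(governed.keys() - local.keys())
    unknown = sorted(local.keys() - governed.keys())
    mismatches = {
        code: (governed[code], local[code])
        for code in sorted(governed.keys() & local.keys())
        if governed[code] != local[code]
    }
    return VerificationReport(missing, unknown, mismatches)


if __name__ == "__main__":
    # Invented sample data: the governed standard and one application's copy.
    governed = {"DE": "Germany", "JP": "Japan", "US": "United States"}
    local = {"DE": "Germany", "JP": "Nippon", "XX": "Unknown"}

    report = verify_local_codes(governed, local)
    print("missing locally:", report.missing_locally)
    print("not in standard:", report.not_in_standard)
    print("label mismatches:", report.label_mismatches)
    print("clean:", report.is_clean)
```

Storing one such report per run would give the history over time that the table describes, so stewards can see when an application's local copy drifted and when it was corrected.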

SIDEBAR III
A Practitioner’s View, from David Chasteen

As an enterprise practitioner, not a
