White Paper Of Successful Reference Data Management


Successful Reference Data Management
White Paper

The Foundations of Successful Reference Data Management

INTRODUCTION

Data management is becoming more and more central to the business model of enterprises. The time when data was looked at as little more than the by-product of automation is long gone, and today we see enterprises vigorously engaged in trying to unlock maximum value from their data, even to the extent of directly monetizing it. Yet, many of these efforts are hampered by immature data governance and management practices stemming from a legacy that did not pay much attention to data. Part of this problem is a failure to understand that there are different types of data, and each type of data has its own special characteristics, challenges and concerns.

Reference Data Management Overview

Reference data is a special type of data. It is essentially codes whose basic job is to turn other data into meaningful business information and to provide an informational context for the wider world in which the enterprise functions.

Reference data is also the most widely shared class of data in an enterprise; applications as different as Human Resources and Trade Settlement will need the same state table, postal code table, and currency table. Yet, while reference data is very important for modern enterprises, it is rarely managed well, which has significant associated costs (see The Costs of Poor Reference Data Management).

A major reason is lack of clarity about the specific governance and management needs involved. A further reason is that until recently there have been no dedicated tools to help enterprises deal with the large number of specialized tasks and wide scope involved in reference data management. Enterprises have been left to themselves to cope as best they can, using generalized products such as Excel.

As we shall see in this paper, the challenges that need to be addressed to enable effective reference data management cannot be solved with such limited technologies. Equally important are the organizational response to the reference data challenge and the need for an effective methodology.

We will explore all of these themes, focusing on the most important areas of reference data management that an enterprise must address. We shall also identify the problems that can arise if reference data is not managed well. The overall objective is to outline the capabilities that are required to achieve modern reference data management and to provide the enterprise with a foundation to mature its practices.

The Costs of Poor Reference Data Management

The ultimate cost of reference data management problems for a business varies widely. In financial services, and elsewhere, a great deal of back office staff is dedicated to correcting problems that have their origin in mismanagement of reference data. In almost every enterprise, analytics are hampered by misunderstandings about reference data that lead to unsafe results that cannot be used for decision support. Enterprises that rely on large-scale data entry, such as the healthcare industry, find an abundance of data quality problems due to “miscodings” of reference data. So, while each individual reference data issue may seem insignificant, in the aggregate they are very costly.

But it is not just errors in reference data that can be costly. Immature management practices can be incredibly inefficient. Consider a country code table in a medium-sized enterprise: such an enterprise may have 100 different applications, each with a country code table. Suppose it takes one hour every three months for each team that uses each application to check if the country code table is up to date. For the entire enterprise, this adds up to 400 hours per annum.

Now suppose there is an average of 20 reference data tables per application, and the average time to check each one is the same as for the country code table. The enterprise will spend some 8,000 hours per year checking whether its reference data tables are fully updated. This is the equivalent of roughly four full-time staff. Additionally, checking that a table is up to date is only one task out of many in reference data management.

Given that there are not enough resources in any enterprise to do this, something has to give, and what usually happens is that many of the necessary tasks of reference data management are simply not done.
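
The checking-cost estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the figure of 2,000 working hours per full-time staff member per year is an assumption introduced here only to make the "roughly four full-time staff" conversion explicit.

```python
# Worked version of the checking-cost estimate in the text.
applications = 100            # applications, each with its own country code table
checks_per_year = 4           # one check every three months
hours_per_check = 1           # time for one team to check one table
tables_per_application = 20   # average number of reference data tables

# Hours per year spent checking the country code table alone.
country_table_hours = applications * checks_per_year * hours_per_check

# Hours per year if every reference data table takes the same effort.
all_tables_hours = country_table_hours * tables_per_application

# Assumption (not from the paper): about 2,000 working hours per person per year.
HOURS_PER_FTE_YEAR = 2000
fte_equivalent = all_tables_hours / HOURS_PER_FTE_YEAR

print(country_table_hours)  # 400
print(all_tables_hours)     # 8000
print(fte_equivalent)       # 4.0
```

Changing any one input, such as the number of applications or the check frequency, scales the result linearly, which is why the cost grows quickly in larger enterprises.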

What is Reference Data?

Many define reference data as “codes”, “lookup tables”, “domains”, or “static data”, but it can be formally defined as follows:

    Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise.

Enterprise applications typically implement reference data as database tables that have just a couple of columns (a code and a description) and which contain a few hundred rows at most and change slowly over time.

Figure 1 shows an example of the beginning of a typical code table, the UN Country List. Applications in multiple domains use such standard country codes to categorize other data, for instance to indicate the location of a business office or customer address.

Because of its perceived structural simplicity, relatively low volume, and slow rate of change, reference data is often overlooked. On the other side of the ledger, however, are these facts:

- Anywhere from 20% to 50% of the tables in a database are reference data tables.
- Any data quality issue in reference data can have widespread results, such as errors in reporting and data integration.
- Tables covering the same or similar reference data get widely duplicated across many applications.

While in the context of a single application, the implementation, maintenance and use of reference data are fairly simple, in the broader context of an enterprise they are complex. Figure 2 illustrates some of these facts.

[Figure 1: Example of Reference Data: Fragment of UN Country List. Columns: Numerical Code, Country or Area Name, ISO ALPHA-3 Code. Rows visible in the fragment include Åland Islands, Albania, Algeria, American Samoa, Andorra, Angola and Anguilla.]
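
The code-and-description structure described above can be sketched with an in-memory database. This is an illustration only: the table names `country` and `customer`, their columns, and the sample customer are hypothetical and not taken from the paper; the four country rows use ISO 3166-1 alpha-3 codes.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reference data table: just a code and a description.
    CREATE TABLE country (
        code        TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    -- Hypothetical business table that the reference data categorizes.
    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        last_name    TEXT NOT NULL,
        country_code TEXT NOT NULL REFERENCES country(code)
    );
""")

conn.executemany(
    "INSERT INTO country (code, description) VALUES (?, ?)",
    [("ALB", "Albania"), ("DZA", "Algeria"), ("AND", "Andorra"), ("AGO", "Angola")],
)
conn.execute("INSERT INTO customer VALUES (1, 'Example', 'DZA')")

# The code stored on the customer row only becomes meaningful business
# information once it is resolved against the reference table.
rows = conn.execute("""
    SELECT cu.customer_id, cu.last_name, co.description
    FROM customer AS cu
    JOIN country AS co ON co.code = cu.country_code
""").fetchall()
print(rows)  # [(1, 'Example', 'Algeria')]
```

Every application that stores a country code needs its own copy of a table like `country`, which is how the duplication described in the list above arises.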

In Figure 2, the fragment of the Customer record shown has six reference data fields. These require six tables in each system that has a Customer table. Typically, due to constraints on speed of delivery or development resources, not every system implements all the tables, but instead creates custom shortcuts in coding.

Tables implemented by different systems may also contain different code values for the same business concepts. In Figure 2, the Gender table may be implemented differently in each system, where ideally each table should code for “Male”, “Female”, and “Unknown” in the same way.

Thus, even in this simple example we see several key challenges:

- Many tables are needed to represent reference data.
- These tables must be implemented in many different systems.
- Discrepancies can easily arise in reference data across systems.

Figure 3 shows how reference data compares to other kinds of data found in databases. Each kind of data has its own particular characteristics and governance and management needs.

We will now look at what these characteristics and needs are for reference data, starting with what makes reference data important.

Why is Reference Data Management Important?

Enterprises that manage reference data well gain significant benefits. The benefits include:

- Agile responsiveness to new data requirements: The ability to implement a new business concept in an application without database restructuring, for example adding a new code for a new Customer Type, as opposed to changing the Customer table to add a new column.
- Description of the external world: The ability to easily capture data about the world outside the enterprise, even expected future changes. In one example, many enterprises populated their Currency tables with a code for the Euro years before they started to use it in transactions.
- Aid to analytics: Reference data provides a fast way to categorize data, even for very short term needs, such as when a six-week marketing campaign for a charity needs to classify donors according to their likely attitudes towards a particular cause.

Conversely, organizations that fail to effectively manage reference data face critical operational risks. The risks and costs of poor reference data management include (see also The Costs of Poor Reference Data Management):

- Coding errors: If reference data is misunderstood, data entry operators can make “coding errors.” For example, if a data entry operator onboarding an institutional customer does not understand the difference between “HF-Hedge Fund” and “AM-Asset Manager”, they can invoke the wrong compliance checks, which the prospective customer cannot possibly comply with.
- Miscommunications across enterprise systems: If different systems do not share the same reference data, they cannot communicate effectively. In Figure 2, System 1 may send System 2 records with Gender having values of “M”, “F”, or “9”, where System 2 expects either “1” or “2”. Such problems often lead to transaction rejection.
- High costs and errors in data integration: Differences in implementation of reference data create the need for “mappings” in systems that integrate data, such as data warehouses. If System 1 and System 2 in Figure 2 both feed into a data warehouse, the warehouse must “map” the Gender codes between the two sources. Perhaps “1” is equivalent to “M” and “2” to “F”, but the problem still remains of what to do with the “9” from System 1.
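
The Gender mapping problem described in the last two risks can be made concrete with a small sketch. The System 1 and System 2 codes come from the text; the assignment of "1" to Male and "2" to Female follows the "perhaps" in the text and is not confirmed; the standard codes "M", "F" and "U" and all function names are hypothetical choices made for this illustration.

```python
# Mappings from each system's local Gender code to a hypothetical standard code.
# System 2's assignments ("1" = Male, "2" = Female) are the text's "perhaps".
TO_STANDARD = {
    "system_1": {"M": "M", "F": "F", "9": "U"},
    "system_2": {"1": "M", "2": "F"},
}

def to_standard(system, code):
    """Map a local code to the standard code, or None if it is unmapped."""
    return TO_STANDARD[system].get(code)

def from_standard(system, standard_code):
    """Map a standard code to a system's local code, or None if none exists."""
    for local_code, std in TO_STANDARD[system].items():
        if std == standard_code:
            return local_code
    return None

def translate(source, target, code):
    """Translate a code between systems by way of the standard code."""
    std = to_standard(source, code)
    if std is None:
        return None
    return from_standard(target, std)

# "M" and "F" translate cleanly; "9" has no System 2 equivalent, which is the
# situation that leads to a rejected transaction or an unresolved mapping.
for code in ("M", "F", "9"):
    print(code, "->", translate("system_1", "system_2", code))
```

Returning `None` instead of guessing a value keeps the gap visible, so someone accountable for the Gender codes can decide how “Unknown” should be represented in each target.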

[Figure 2: A Typical Reference Data Scenario. A Customer record with the fields Customer ID, Customer First Name, Customer Last Name, Customer Type, Credit Level, Gender, Residence State, Age Band and Loyalty Status, shown alongside the tables implemented in System 1, System 2 and System 3. Gender records shown include “M” Male, “F” Female and “9” Unknown in one system, and “m” Male and “f” Female in another.]

[Figure 3: Putting Reference Data in Context. The figure ranks six kinds of data between “More” and “Less” on semantic value, data quality importance, duration without change, data volume and performance requirements. The kinds of data are:
- Metadata: Data that specifies the tables and columns of a database.
- Reference Data: “Codes” that classify other data or represent things found outside the enterprise.
- Transaction Structure Data: Data that represents the parties to the transactions of the enterprise, e.g., Customer, Product. Traditional Master Data.
- Enterprise Structure Data: Data that describes business activity by business responsibility, e.g., Organizational Structure, Chart of Accounts. Similar to Reference Data but more hierarchical.
- Transaction Activity Data: The transactions of the enterprise, e.g., Sales, Trades, Employee Actions.
- Transaction Audit Data: Data that tracks change of state of transactions, including the data in database logs and web logs.]

Meeting the Challenges of Reference Data Management

Once an enterprise understands the purpose of reference data and decides that it wants to effectively govern and manage its reference data, what does it do next? Here we meet the problem of immature practices head on. There are no pre-existing templates into which reference data management can be fitted.

It might seem that either Information Technology (IT) or Operations could do this work, but it is a poor fit for both. IT tends to think only in terms of time-bound projects that deliver something to business users, after which IT moves on to the next project. Operations, typically the business focus of data management, is inherently tactical and suffers from its goals being oriented to quantity and timeliness, rather than quality.

Given that neither IT nor Operations is a natural home for reference data management, executive management needs to provide leadership. However, doing this requires a vision of how an enterprise can achieve effective reference data governance and management. Such a vision must account for organizational, process and infrastructure needs in order to provide an integrated solution that meets the challenges of reference data management.

Today, many enterprises have a Chief Data Officer (CDO) who can provide the leadership to implement this vision, but if not, this leadership can also come from a Chief Information Officer (CIO) or Chief Operations Officer (COO).

What are the best practices for reference data management (RDM), and what capabilities must be present in the RDM solutions to effectively support these best practices?

BEST PRACTICES

Best Practices for Managing Reference Data summarizes the best practices that are essential components of any vision for the governance and management of reference data. We will now consider each of these in further detail.

Establishing a Central Reference Data Unit (RDU)

Today, the best practice is to establish a central reference data unit (RDU) that governs both internal and external reference data and plays a strong role in the management of external reference data. (We will discuss the differences between external and internal reference data later.) This unit is staffed with personnel who have expertise in reference data management and governance.

The central RDU should be responsible for creating all the policies needed for reference data management and ensuring compliance with them. It also ensures that appropriate reference data standards are chosen for the enterprise, and it deals with regulatory issues where these exist, such as in healthcare and finance. Sometimes, the central RDU may also undertake a certain amount of management of external reference data.

Important stakeholders should agree upon a charter for the central RDU that reflects the overall vision. The charter should state the benefits that the central RDU will provide to the enterprise, which will help with judging its success in the long term. It should also define the scope of the unit, which will help prevent arguments about where the central RDU has authority and where it does not.

Locating the Central Reference Data Unit

The next question is where the RDU should be located. A stand-alone RDU is most advisable, and is indispensable in large global enterprises. Ideally, such an RDU will be located close to similar functions. These could be Data Governance (usually located in the Office of the CDO) or Master Data Management (often in Operations). It is less advisable to have a central RDU in an analytics environment, since this is too far downstream in terms of data flow to be effective. Likewise, housing a central RDU in IT is not a good idea, because IT is often shunned by the rest of the enterprise, particularly Operations.

If a suitable organizational location for the RDU cannot be found, then an alternative is a highly federated model. As we shall see below, some degree of federation will always be needed. However, in a highly federated model, a group such as Data Governance takes responsibility only for creating policies and standards and ensuring compliance with these. The actual reference data management tasks get divided up amongst the groups most willing and best able to take them on. If Data Governance is chosen for the governance aspects of reference data, then Data Governance must not undertake operational reference data management tasks, particularly for external reference data, as this would create a conflict of interest with its role of setting policies.

Each enterprise must work out how it can implement a central RDU or a highly federated model. Figure 4 illustrates how one such arrangement might work.

[Figure 4: High-Level View of How a Central Reference Data Unit Can Interface with the Enterprise. Units shown include the Reference Data Unit, Analytics, Operations and General Business. Relationship labels: provides overall governance framework for reference data; reports metrics on activities; ensures IT delivery aligns to reference data standards; ensures reference data is managed well; allocates accountabilities for reference data and ensures it is managed well; provides support for reference data usage.]

External Reference Data Management

Once accountability for reference data governance has been assigned, the focus must shift to management practices. A good place to start is with external reference data: reference data that is created and maintained by an authority outside the enterprise, such as ISO 3166-1 Country Codes, NACE Codes, and SIC Codes.

A sample of the tasks to carry out includes:

- Discovery of external standards that exist for a given business concept, such as Country names.
- Creating a profile of the external standards authority for use in interacting with the external authority and assessing its reliability.
- Creating a profile of the reference data set maintained by the authority.
- Deciding which of the external reference data sets for a given business concept (for example, Country names) will be adopted by the enterprise to represent the business concept.
- Semantic analysis of the chosen reference dataset.
- Documentation of the semantics of the chosen reference dataset (especially in some kind of tool or repository).

Best Practices for Managing Reference Data

1. Establishing a Central Reference Data Unit (RDU). A central RDU oversees reference data management across the enterprise to achieve overall goals, especially standardization, quality, and operational efficiency.
2. Locating the Central Reference Data Unit. It needs to be decided where in the organization a central RDU is located. Ideally it will be close to similar functions such as Data Governance and Master Data Management, perhaps in the office of the Chief Data Officer (CDO). It is less desirable to locate it in IT or in areas responsible for Business Intelligence.
3. Managing External Reference Data. External reference data is maintained by authorities outside the enterprise. It needs to be discovered, selected, understood and ingested. Standard practices are a major help in doing this.
4. Managing Subscriptions for External Reference Data. Once external reference data has been set up, it needs to be kept current. Subscription management does this by ensuring that changes are detected and assimilated as rapidly as possible.
5. Governing Internal Reference Data. Internal reference data is for business concepts that are completely specific to the enterprise. It requires a federated approach, because it is created and managed by many different subject matter experts (SMEs). The central RDU must ensure that groups accountable for internal reference data use a standardized approach.
6. Governing Reference Data in Operational Environments. Operational units are challenged by changes to the business that often require rapid changes to reference data in application systems. This can create discrepancies and inconsistencies, and the central RDU must find ways to deal with local needs for change without creating difficulties at the enterprise level.
7. Distribution of Reference Data. Reference data is used widely throughout the enterprise. It is vital that all applications have synchronized copies, so distribution must be addressed. This requires a variety of approaches ranging from the fully automated to the fully manual. However, these approaches must be chosen carefully to maintain operational efficiency.
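
Practices 4 and 7 both depend on noticing when one copy of a code table differs from another, whether the comparison is between an application and the external authority or between two applications. A minimal sketch of such a comparison follows. The function name `diff_code_tables` and the dictionary layout are hypothetical, and the sample rows are illustrative country-code entries rather than a complete or current list.

```python
def diff_code_tables(current, incoming):
    """Compare two code tables given as {code: description} dictionaries."""
    # Codes present only in the incoming copy.
    added = {c: incoming[c] for c in incoming.keys() - current.keys()}
    # Codes present only in the current copy.
    removed = {c: current[c] for c in current.keys() - incoming.keys()}
    # Codes present in both copies but with different descriptions.
    changed = {
        c: (current[c], incoming[c])
        for c in current.keys() & incoming.keys()
        if current[c] != incoming[c]
    }
    return {"added": added, "removed": removed, "changed": changed}

# Illustrative data: an application copy that has drifted from the authority's copy.
application_copy = {
    "ANT": "Netherlands Antilles",
    "SWZ": "Swaziland",
    "DZA": "Algeria",
}
authority_copy = {
    "SWZ": "Eswatini",
    "DZA": "Algeria",
    "SSD": "South Sudan",
}

result = diff_code_tables(application_copy, authority_copy)
for kind, entries in result.items():
    print(kind, entries)
```

A report of added, removed and changed codes is only the detection step. Deciding when and how each application adopts the change is the governance question this paper is concerned with.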

Further tasks in external reference data management include:

- Assigning responsibility for the ingest of the chosen reference dataset.
- Executing the ingest of the chosen reference dataset.
- Checking the ingest of the chosen reference dataset.
- Establishing a means by which the rest of the enterprise can access the chosen reference dataset.
- Communicating the availability of the chosen reference dataset to the rest of the enterprise.
- Onboarding the chosen reference dataset, which includes:
  - Setting up the environment to house the chosen reference dataset.
  - Setting up the mechanism to ingest the chosen reference dataset.
  - Deciding what to filter out of the chosen reference dataset.
  - Deciding what transformations to apply to the chosen reference dataset.
  - Deciding how to enrich the chosen reference dataset.
  - Establishing criteria for testing the success of the ingest of the chosen reference dataset.

This list is not exhaustive, but does illustrate the care needed for successfully ingesting an external reference dataset. It also gives a sense of the extensive need for reference data metadata to manage all of these tasks for maintenance, auditing, and other purposes.

In the past, many enterprises did not govern this work, with the result that individual application development teams simply did it for themselves on an as-needed basis, often using generalized tools like Excel. Such efforts typically involved just enough effort for the reference data to be “good enough” for the particular application involved and did not include an enterprise perspective.

Small wonder, then, that multiple standards are implemented in many enterprises for concepts such as Country or Industrial Sector, leading to misalignments and errors when data must be shared or integrated.

External Reference Data Subscription Management

After the initial ingest of an external reference dataset, it needs ongoing maintenance. External reference data may change slowly, but it does change. For instance, country codes change an average of 3 to 5 times per year, and many more in some years. Currency codes change at an average rate of 5 to 10 times per year, again with some years seeing significantly more.

However, this maintenance is rarely dealt with in the absence of a central RDU. If there is no central RDU, an application development team may well be forced to perform an initial ingest of a reference dataset just to be able to test an application they are building. However, once the application is in production the IT team rarely worries about updating the reference data; they see that as a task for the users.

Business users, usually Operations staff, typically lack the time and understanding to track changes in an external reference data standard. Even where they can, it is often isolated teams that make their own decisions about what changes should be included in an application, when this should be done, and how it should be done.

Such lack of governance means that individual applications inevitably drift away from synchronization with the external standard, and from each other. As a result, no matter how well an individual application functions, the data it produces will become increasingly difficult to share, integrate, and understand outside of the context of the application itself.

A central RDU can ensure that subscriptions are established with external reference data authorities. Very often these subscriptions are free, such as a free newsletter of changes. In other cases, the subscriptions cost money, and some may be quite expensive. Sometimes, no subscription is available, and no one can determine if a change in an external reference data standard has occurred except by periodically examining the actual data maintained by the external authority.

It makes sense to centralize subscriptions, for both operational efficiency and reduction of overall subscription costs. If you adopt a more federated model, the central RDU can oversee the management of subscriptions by other units. However, a single technical environment that houses subscription information is highly desirable. Modern reference data management tools are beginning to support this requirement, especially the reference data metadata requirements involved.

A more significant element of the work to be done is what happens when a change is announced in a subscription. This must be carefully processed in order to be ready for adoption by the relevant applications in the enterprise. There will be metadata and documentation that needs to be updated to ensure that business users have the correct and up-to-date information on how to properly use the reference data. However, besides the actual detection of the change in a timely manner, the greatest challenge is to distribute the reference data to the rest of the enterprise, which we will discuss in more detail below.

Governing Internal Reference Data

Internal reference data is reference data for which no external authority exists and which must be managed entirely within the enterprise. Typically, enterprises have their own Customer Types, Product Lines, and so on. Certain groups produce more of this type of reference data than others. For instance, Marketing is always looking at new ways of “segmenting” customers, products, and markets. Such classifications may only apply to short-lived marketing campaigns, but they should still be well-managed.

A central RDU is much less likely to manage internal reference data, because it is generated by SMEs within the enterprise who understand the business concepts involved. The central RDU must concentrate on good governance for the development of reference data content, which is not the case for external reference data, where the RDU has no influence over the external authority.
Examples of the tasks involved in good governance of internal reference data include:

- Ensuring that each business concept represented in reference data has formal assigned accountabilities, and that these are not duplicated.
- Ensuring that the accountabilities for internal reference data management are known across the enterprise.
- Ensuring that each internal reference dataset is semantically analyzed.
- Ensuring that the semantics of each internal reference dataset are documented in a standardized manner that is accessible across the enterprise.
- Ensuring that the content of each internal reference dataset is of the highest quality and does not contain defects.
- Ensuring that internal reference datasets that have relationships or dependencies are harmonized, especially in terms of update cycles.
- Ensuring the effective distribution of updates to internal reference datasets across the enterprise.
- Ensuring that each internal reference dataset is periodically reviewed for relevance and kept up to date with business changes.
- Ensuring that information on material changes to the enterprise is channeled to the accountable parties for internal reference datasets so that these can be updated in a proactive manner.
- Ensuring that accountable parties for internal reference datasets provide adequate support for these datasets.

As we can see, governance of internal reference data inevitably requires a federated model, whereas governance of external reference data can, if desired and practical, be more centralized.

There is no doubt that federated environments require greater governance skills than centralized ones, because many parties must work in harmony and adopt standardized ways of working. This is why the option for Data Governance to take on these tasks, rather than a central RDU, is a real one. However, this must be balanced with the need for specialized domain knowledge of what is required for good governance of internal reference data. Take, for instance, the following point in the list above:

    Ensuring that the content of each internal reference dataset is of the highest quality and does not contain defects.

This implies that reference data governance teams understand the quality criteria and have adequate tools to address reference data identification, defects, and maintenance. The individuals performing these tasks must have specific domain knowledge about reference data that cannot be substituted for by general knowledge of data management.

So, if a Data Governance unit will govern the reference data environment in an enterprise, it should contain one or more individuals who really understand reference data management and provide adequate tool support to the enterprise for managing reference data issues.

Combining Methodology with Adequate Infrastructure is Key to Effective Management

Having a functioning central reference data unit (RDU) means that a single overall methodology can be developed for all aspects of reference data management, thereby improving efficiency and raising reference data quality.

Accountabilities can be distributed in a formal manner so that everyone knows just what they have to do and whom they must communicate with. Most importantly, enterprise-level infrastructure can be built to support this aspect of reference data management and integrated with the methodology.

This is particularly important today as more and more reference data tools are appearing that provide functionality that is difficult for enterprises to develop by themselves. By doing this, the enterprise can achieve greatly increased efficiency gains.

A sound methodology coupled with adequate tooling will also ensure that the enterprise can be agile and adapt to new demands, because all applications use the same set of external reference data, rather than a cacophony of different standards that is wildly expensive and difficult to integrate.

Governing Reference Data in Operational Environments

Staff in operational environments find from time to time that a particular reference data table is no longer adequate for business needs. This is because the enterprise changes in small ways every day, and very often these small changes manifest themselves as the content of a reference data table no longer being adequate for business needs.

Since most applications allow operational staff to update reference data tables, it is quite natural for operational staff to do so when the need a
