Data Masking Best Practices


An Oracle White Paper
July 2010

Oracle White Paper—Data Masking Best Practices

Executive Overview
Introduction
The Challenges of Masking Data
Implementing Data Masking
Comprehensive Enterprise-wide Discovery of Sensitive Data
Enforcing Referential Relationships during Data Masking
Rich and Extensible Mask Library
Sophisticated Masking Techniques
High Performance Mask Execution
Integrated Testing with Application Quality Management solutions
Oracle's Comprehensive Solutions for Database Security
Customer Case Studies
Conclusion

Executive Overview

Enterprises need to share production data with various constituents while also protecting sensitive or personally identifiable aspects of the information. As the number of applications increases, more and more data gets shared, further increasing the risk of a data breach in which sensitive data is exposed to unauthorized parties. Oracle Data Masking addresses this problem by irreversibly replacing the original sensitive data with realistic-looking scrubbed data that has the same type and characteristics as the original. This enables organizations to share the information in compliance with information security policies and government regulations.

This paper describes best practices for deploying Oracle Data Masking to protect sensitive information in Oracle databases and in other heterogeneous databases, such as IBM DB2 and Microsoft SQL Server.

Introduction

Enterprises share data from their production applications with other users for a variety of business purposes. Most organizations copy production data into test and development environments so that application developers can test application upgrades. Retail companies share customer point-of-sale data with market researchers to analyze customer buying patterns. Pharmaceutical and healthcare organizations share patient data with medical researchers to assess the efficacy of clinical trials or medical treatments.

Numerous industry studies on data privacy have concluded that almost all companies copy tens of millions of sensitive customer and consumer records to non-production environments for testing, development, and other uses. Very few companies do anything to protect this data, even when sharing it with outsourcers and third parties. Almost 1 in 4 companies responded that live data used for development or testing had been lost or stolen, and 50% said they had no way of knowing whether data in non-production environments had been compromised.

The Challenges of Masking Data

Organizations have tried to address these issues, and the problem of sharing sensitive information with non-production users, with custom hand-crafted solutions or by repurposing existing data manipulation tools within the enterprise. Take, for example, the most common solution: database scripts. At first glance, scripts appear to have an advantage: they specifically address the unique privacy needs of the particular database they were designed for. They may even have been tuned by the DBA to run as fast as possible. Let's look at the issues with this approach.

1. Reusability: Because of the tight association between a script and its database, the script would have to be rewritten from scratch to apply it to another database. Scripts have no common capabilities that can be easily leveraged across other databases.
2. Transparency: Because scripts tend to be monolithic programs, auditors have no transparency into the masking procedures they contain. Auditors would find it extremely difficult to judge whether the masking process built into a script is secure and gives the enterprise the appropriate degree of protection.
3. Maintainability: When enterprise applications are upgraded, new tables and columns containing sensitive data may be added as part of the upgrade. With a script-based approach, the entire script has to be revisited and updated to accommodate the new tables and columns added by an application patch or upgrade.

Implementing Data Masking

Based on Oracle Data Masking, Oracle has developed a comprehensive four-step approach to implementing data masking called Find, Assess, Secure, and Test (FAST). The steps are:

• Find: This phase involves identifying and cataloging sensitive or regulated data across the entire enterprise.
  This exercise, typically carried out by business or security analysts, aims to produce a comprehensive list of the sensitive data elements specific to the organization and to discover the associated tables and columns across enterprise databases that contain the sensitive data.
• Assess: In this phase, developers or DBAs, in conjunction with business or security analysts, identify the masking algorithms that best replace the original sensitive data. Developers can leverage the existing masking library or extend it with their own masking routines.
• Secure: This step and the next may be iterative. The security administrator executes the masking process to secure the sensitive data during masking trials. Once the masking process has completed and has been verified, the DBA hands the environment over to the application testers.

• Test: In the final step, production users execute application processes to test whether the resulting masked data can be turned over to other non-production users. If the masking routines need further tweaking, the DBA restores the database to its pre-masked state, fixes the masking algorithms, and re-executes the masking process.

Comprehensive Enterprise-wide Discovery of Sensitive Data

To begin the process of masking data, the data elements in the application that need to be masked must be identified. The first step any organization must take is to determine what is sensitive, because what counts as sensitive depends on the specific government regulations and industry standards that govern how the data can be used or shared. Thus, the security administrator should first publish what constitutes sensitive data and get agreement from the company's compliance or risk officers.

A typical list of sensitive data elements may include:

• Person Name
• Bank Account Number
• Maiden Name
• Card Number (Credit or Debit Card Number)
• Business Address
• Tax Registration Number or National Tax ID
• Business Telephone Number
• Person Identification Number
• Business Email Address
• Welfare Pension Insurance Number
• Custom Name
• Unemployment Insurance Number
• Employee Number
• Government Affiliation ID
• User Global Identifier
• Military Service ID
• Party Number or Customer Number
• Social Insurance Number
• Account Name
• Pension ID Number
• Mail Stop
• Article Number
• GPS Location
• Civil Identifier Number
• Student Exam Hall Ticket Number
• Credit Card Number
• Club Membership ID
• Social Security Number
• Library Card Number
• Trade Union Membership Number

Oracle Data Masking provides several easy-to-use mechanisms for isolating the sensitive data elements.

• Data Model driven: Typical enterprise applications, such as E-Business Suite, PeopleSoft, and Siebel, publish their application data models as part of their product documentation or support knowledge base. By leveraging the published data models, data masking users can easily associate the relevant tables and columns with mask formats to create the mask definition.
• Application Masking Templates: Oracle Data Masking supports application masking templates, which are XML representations of the mask definition. Software vendors or service providers can generate these predefined templates and make them available to enterprises, which can import them into Oracle Data Masking and thus accelerate the data masking implementation process.
• Ad hoc search: Oracle Data Masking has a robust search mechanism that lets users quickly search the database with ad hoc search patterns to identify the tables and columns that are sources of sensitive data. Because Enterprise Manager has all the database management capabilities built in, including the ability to query sample rows from tables, Oracle Data Masking can help enterprise users rapidly construct the mask definition, the prerequisite for masking the sensitive data.

For deeper searches, Oracle provides the Oracle Data Finder tool during data masking implementations to search across the enterprise for data patterns, such as NNN-NN-NNNN for social security numbers, or 15- or 16-digit sequences beginning with 3, 4, or 5 for credit card numbers.

By combining schema and data patterns and augmenting them with published application metadata models, enterprises can develop a comprehensive data privacy catalog that captures the sensitive data elements that exist across enterprise databases. To be clear, this is not a static list.
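Data-pattern discovery of this kind can be illustrated with a small sketch. The Python below is not Oracle Data Finder; it is a hypothetical classifier that assumes sample column values have already been fetched (for example, by querying sample rows). The pattern names, the `classify_column` and `discover` helpers, and the 80% match threshold are illustrative assumptions.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Illustrative patterns based on the examples in the text (assumptions,
# not Oracle Data Finder's actual rules):
#   SSN:          NNN-NN-NNNN
#   CREDIT_CARD:  15 or 16 digits beginning with 3, 4, or 5
PATTERNS = {
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "CREDIT_CARD": re.compile(r"[345]\d{14,15}"),
}

ColumnKey = Tuple[str, str]  # (table name, column name)


def classify_column(samples: Iterable[str], threshold: float = 0.8) -> List[str]:
    """Return the patterns that fully match at least `threshold` of the non-empty samples."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return []
    matches = []
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            matches.append(name)
    return matches


def discover(samples_by_column: Dict[ColumnKey, List[str]]) -> Dict[ColumnKey, List[str]]:
    """Build a catalog entry for every column whose samples look sensitive."""
    catalog = {}
    for column, samples in samples_by_column.items():
        found = classify_column(samples)
        if found:
            catalog[column] = found
    return catalog


if __name__ == "__main__":
    sample_rows = {
        ("EMPLOYEES", "NATIONAL_ID"): ["123-45-6789", "987-65-4321"],
        ("EMPLOYEES", "CARD_NO"): ["4111111111111111", "371449635398431"],
        ("EMPLOYEES", "LAST_NAME"): ["Smith", "Jones"],
    }
    print(discover(sample_rows))
```

A threshold below 100% tolerates a few malformed values. A real discovery pass would typically add further checks, such as a Luhn checksum for card numbers, to reduce false positives.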
The catalog is a dynamic, living one, managed by security administrators, and it needs to be refreshed as business rules and government regulations change, and whenever applications are upgraded or patched and new data elements containing sensitive data are discovered.

Enforcing Referential Relationships during Data Masking

In relational databases (RDBMS), data is stored in tables related by key columns: a primary key column in one table is referenced by foreign key columns in other tables, which allows application data to be stored efficiently without duplicating it. For example, an EMPLOYEE ID generated by a human capital management (HCM) application may be used in sales force automation (SFA) application tables, through foreign key columns, to keep track of sales reps and their accounts. When deploying a masking solution, business users are often concerned with referential integrity: the relationship between the primary key and foreign key columns, within a database or across databases.

Figure 1: The Importance of Referential Integrity. EMPLOYEES (EMPLOYEE ID, FIRST NAME, LAST NAME) is referenced by CUSTOMERS (CUSTOMER ID, SALES REP ID, COMPANY NAME) and SHIPMENTS (SHIPMENT ID, SHIPPING CLERK ID, CARRIER) through database-enforced and application-enforced relationships.

Oracle Data Masking automatically identifies referential integrity relationships as part of mask definition creation. When a business user chooses to mask a key column such as EMPLOYEE ID, Oracle Data Masking discovers all the related foreign key relationships in the database and applies the same mask format to the related foreign key columns. This ensures that the relationships between the various application tables are preserved while privacy-related elements are masked. In applications where referential integrity is enforced in the application rather than in the database, Oracle Data Masking allows these relationships to be registered as related columns in the mask definition, so that the same masking rules applied to the database-enforced foreign key columns are applied to them as well.
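To make the idea concrete, here is a minimal in-memory sketch showing how masking a key column and applying the same replacement values to its related columns keeps joins intact. It uses the tables from Figure 1 with hypothetical column names (EMPLOYEE_ID, SALES_REP_ID, SHIPPING_CLERK_ID) and a hypothetical helper, `mask_with_related_columns`. It is not how Oracle Data Masking is implemented internally. Rows are plain Python dictionaries, and the `related` list stands in for both discovered foreign keys and registered application-enforced relationships.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, object]
ColumnRef = Tuple[str, str]  # (table name, column name)


def build_mapping(original_ids: List[int], seed: int = 0) -> Dict[int, int]:
    """Map each distinct original key to a distinct six-digit replacement key."""
    distinct = sorted(set(original_ids))
    rng = random.Random(seed)
    # Sampling without replacement keeps the mapping one-to-one,
    # so masked primary keys stay unique.
    replacements = rng.sample(range(100_000, 1_000_000), len(distinct))
    return dict(zip(distinct, replacements))


def mask_with_related_columns(tables: Dict[str, List[Row]], key: ColumnRef,
                              related: List[ColumnRef], seed: int = 0) -> Dict[int, int]:
    """Mask the key column in place and apply the same mapping to every related column.

    The returned mapping links masked values to originals, so it is as sensitive
    as the source data and should be discarded once masking is complete.
    """
    key_table, key_column = key
    mapping = build_mapping([row[key_column] for row in tables[key_table]], seed)
    for table, column in [key, *related]:
        for row in tables[table]:
            value = row[column]
            if value is not None:  # NULL foreign keys stay NULL
                # A KeyError here signals an orphaned reference with no parent row.
                row[column] = mapping[value]
    return mapping
```

Because the parent key and every registered child column are rewritten from one mapping, any join that matched before masking still matches afterward.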

Figure 2: Automatic enforcement of referential integrity

Rich and Extensible Mask Library

Oracle Data Masking provides a centralized library of out-of-the-box mask formats for common types of sensitive data, such as credit card numbers, phone numbers, and national identifiers (the social security number in the US, the national insurance number in the UK). By leveraging the Format Library in Oracle Data Masking, enterprises can apply data privacy rules to sensitive data across enterprise-wide databases from a single source and thus ensure consistent compliance with regulations. Enterprises can also extend this library with their own mask formats to meet their specific data privacy and application requirements.
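The idea of a central, extensible library can be sketched as a registry that maps format names to masking functions. Everything here is hypothetical and for illustration only: the `MaskFormatLibrary` class, the format names, and the `EMPLOYEE_NUMBER` layout are assumptions, not Oracle's Format Library or its API. The point of the design is that every database applies the same named rule, and new formats are added by registering them rather than by editing scripts.

```python
import random
import string
from typing import Callable, Dict

# A mask format takes the original value and a random generator and returns a replacement.
MaskFormat = Callable[[str, random.Random], str]


class MaskFormatLibrary:
    """A hypothetical central registry of named mask formats."""

    def __init__(self) -> None:
        self._formats: Dict[str, MaskFormat] = {}

    def register(self, name: str, fmt: MaskFormat) -> None:
        # Refuse silent overrides so every team keeps using the same rule for a name.
        if name in self._formats:
            raise ValueError(f"mask format {name!r} is already registered")
        self._formats[name] = fmt

    def apply(self, name: str, value: str, rng: random.Random) -> str:
        return self._formats[name](value, rng)


def random_digits_keep_layout(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit, keeping separators such as '-' or ' '."""
    return "".join(rng.choice(string.digits) if ch.isdigit() else ch for ch in value)


def employee_number(value: str, rng: random.Random) -> str:
    """Hypothetical site-specific format: 'E' followed by six random digits."""
    return "E" + "".join(rng.choice(string.digits) for _ in range(6))


library = MaskFormatLibrary()
# Formats in the style of out-of-the-box entries (illustrative).
library.register("US_SSN", random_digits_keep_layout)
library.register("PHONE_NUMBER", random_digits_keep_layout)
# An enterprise-specific extension.
library.register("EMPLOYEE_NUMBER", employee_number)

if __name__ == "__main__":
    rng = random.Random(42)
    print(library.apply("US_SSN", "123-45-6789", rng))
    print(library.apply("EMPLOYEE_NUMBER", "E000123", rng))
```

Random-digit formats like these are not deterministic: masking the same value twice gives different results.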

Figure 3: Rich and extensible Mask Format Library

Oracle Data Masking also provides mask primitives, which serve as building blocks for creating a nearly unlimited range of custom mask formats, whether numeric, alphabetic, or date/time-based. Recognizing that real-world masking needs require a high degree of flexibility, Oracle Data Masking allows security administrators to create user-defined masks. These user-defined masks, written in PL/SQL, let administrators create unique mask formats for sensitive data, for example generating a unique email address from fictitious first and last names so that business applications can send test notifications to fictitious email addresses.

Sophisticated Masking Techniques

Data masking is in general a trade-off between security and reproducibility. A test database that is identical to the production database scores 100% on reproducibility and 0% on security, because it exposes the original data. A masking technique that replaces all data in sensitive columns with a single fixed value scores 100% on security and 0% on reproducibility. It is important to keep this trade-off in mind when selecting masking algorithms.

Oracle Data Masking provides a variety of sophisticated masking techniques to meet application requirements while ensuring data privacy. These techniques ensure that applications continue to operate without errors after masking. For example:

• Condition-based masking: This technique makes it possible to apply different mask formats to the same data set depending on which rows match the conditions, for example applying different national identifier masks based on country of origin.
• Compound masking: This technique masks a set of related columns as a group so that the masked data across the related columns retains the same relationship; for example, city, state, and zip code values need to remain consistent after masking.
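Condition-based and compound masking can be sketched together in a few lines. In this hypothetical example, the mask format for NATIONAL_ID is chosen from the row's COUNTRY value, and CITY, STATE, and ZIP are replaced as one unit from a small, US-only reference list (kept short for brevity) so that they stay consistent. The column names, the reference list, and the identifier layouts (NNN-NN-NNNN for the US; two letters, six digits, and a suffix letter for the UK) are illustrative assumptions. The generated values only imitate each layout and are not validated against real numbering rules.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical reference data: each tuple is an internally consistent (city, state, zip).
LOCATIONS: List[Tuple[str, str, str]] = [
    ("Springfield", "IL", "62701"),
    ("Austin", "TX", "73301"),
    ("Portland", "OR", "97201"),
]

UK_LETTERS = "ABCEGHJKLMNPRSTWXYZ"


def mask_national_id(row: Row, rng: random.Random) -> str:
    """Condition-based masking: pick the mask format from the row's COUNTRY."""
    country = row["COUNTRY"]
    if country == "US":
        # NNN-NN-NNNN layout
        return f"{rng.randint(100, 999)}-{rng.randint(10, 99)}-{rng.randint(1000, 9999)}"
    if country == "UK":
        # Two letters, six digits, one suffix letter
        return (rng.choice(UK_LETTERS) + rng.choice(UK_LETTERS)
                + f"{rng.randint(0, 999_999):06d}" + rng.choice("ABCD"))
    # Fallback for other countries: a fixed-character mask of the same length
    return "X" * len(row["NATIONAL_ID"])


def mask_address(row: Row, rng: random.Random) -> None:
    """Compound masking: replace CITY, STATE and ZIP together from one consistent tuple."""
    row["CITY"], row["STATE"], row["ZIP"] = rng.choice(LOCATIONS)


def mask_rows(rows: List[Row], seed: int = 0) -> List[Row]:
    """Apply both techniques to every row, in place."""
    rng = random.Random(seed)
    for row in rows:
        row["NATIONAL_ID"] = mask_national_id(row, rng)
        mask_address(row, rng)
    return rows
```

Masking CITY, STATE, and ZIP independently could produce a combination that does not exist, which is exactly the kind of inconsistency compound masking is meant to avoid.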

Deterministic Masking

Deterministic masking is an important technique that enterprises must consider when masking key data that is referenced across multiple applications. Take, for example, three applications: a human capital management application, a customer relationship management application, and a sales data warehouse. Some key fields, such as EMPLOYEE ID, are referenced in all three applications and need to be masked in the corresponding test systems: an employee identifier for each employee in the human resources management application; customer service representative identifiers, which may also be EMPLOYEE IDs, in the customer relationship management application; and sales representative IDs, which may also be EMPLOYEE IDs, in the sales data warehouse.

To ensure that data relationships are preserved across systems even as privacy-related elements are removed, deterministic masking techniques mask data consistently across the various systems. It is vital that the deterministic masking technique produce the replacement value consistently, yet in a way that the original data cannot be derived from the masked value.

One way to think of a deterministic masking technique is as a function applied to the original value that consistently generates a unique value with the same format, type, and characteristics as the original: a deterministic function $f(x)$ where $f(x_1)$ always produces $y_1$ for a given value $x_1$. For deterministic masking to be applied successfully, it is important that $f(x)$ not be reversible; that is, it should not be possible to compute the inverse $f^{-1}(y_1)$ and recover $x_1$, so that the original sensitive data stays secure.

Deterministic masking techniques can be used with numeric entries, such as social security numbers or credit card numbers, as well as with text entries, for example to generate names.
For example, organizations may require that names always be masked to the same set of masked names to ensure consistency of data across runs. Testers may find it disruptive if production refreshes change the underlying test data and they can no longer locate the particular employee or customer records that served as examples for specific test cases. Enterprises can therefore use the deterministic masking functions provided by Oracle Data Masking to consistently generate the same replacement value for any type of sensitive data element.

Deterministic masking becomes critical when testing data feeds from external systems, such as employee expense data provided by credit card companies. In production environments, the feed containing real credit card numbers is processed by the accounts payable application, which holds each employee's matching credit card information, and is used to reconcile employee expenses. In test systems, the employee credit card numbers have been obfuscated and can no longer be matched against the flat files containing employees' real credit card numbers. To address this requirement, enterprises pre-load the flat file into standard tables using tools such as SQL*Loader, mask the sensitive columns using the deterministic masking provided by Oracle Data Masking, and then extract the masked data back into a flat file. The application can then process the flat files correctly, just as it would in production systems.
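A keyed hash is one common way to build a deterministic, non-reversible masking function of the kind described above. The sketch below uses HMAC-SHA-256 from the Python standard library; it is an illustration under stated assumptions, not the function Oracle Data Masking uses. The secret key, the fictitious name list, and the helper names are hypothetical. Because the same input and key always give the same output, a card number masked in an application table and the same number masked in a loaded flat file end up identical, which is what the reconciliation scenario requires. Note that this simple digit mapping can map two different inputs to the same output; keys that must stay unique would need a one-to-one scheme, such as a mapping table.

```python
import hashlib
import hmac

# Hypothetical secret. Anyone holding it could re-run the function on guessed
# inputs, so it should be kept out of the test environments being provisioned.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical pool of fictitious replacement names.
FAKE_NAMES = ["Alex Morgan", "Casey Lee", "Jordan Patel", "Riley Chen", "Sam Novak"]


def _digest(value: str, key: bytes) -> bytes:
    """HMAC-SHA-256 of the value: deterministic, and infeasible to invert without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace each digit deterministically, keeping length and separators.

    Only the digits feed the hash, so "4111-1111-1111-1111" and
    "4111111111111111" receive the same replacement digits.
    """
    digits = "".join(ch for ch in value if ch.isdigit())
    digest = _digest(digits, key)
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            # A byte modulo 10 is slightly non-uniform; values with more than
            # 32 digits reuse digest bytes. Both are acceptable for a sketch.
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_name(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a real name to a fictitious name, always the same one for the same input."""
    digest = _digest(value.strip().lower(), key)
    return FAKE_NAMES[int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)]


if __name__ == "__main__":
    # The same card number masked in the application table and in the loaded flat file.
    table_value = mask_digits("4111111111111111")
    feed_value = mask_digits("4111-1111-1111-1111")
    print(table_value, feed_value, table_value == feed_value.replace("-", ""))
```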

High Performance Mask Execution

Once the mask definition is complete, Oracle Data Masking can execute the masking process to replace all the sensitive data. Oracle Enterprise Manager offers several options for cloning the production database:

• Recover from backup: Using the Oracle Managed Backups functionality, Oracle Enterprise Manager can create a test database from an existing backup.
• Clone Live Database: Oracle Enterprise Manager can clone a live production database into any non-production environment in a few clicks. The clone database capability also provides the option to create a clone image, which can then be used for other cloning operations.

With the cloned (non-production) database ready for masking, Oracle Data Masking builds a work list of the tables and columns chosen for masking. Tables that do not need to be masked are not touched. The tables selected for masking are processed in an optimal order, and each table is processed in a single pass even if multiple columns from it are selected for masking. Typically, the tables with the primary keys are masked first, followed by the dependent tables containing foreign keys.

Once the work list is ready, Oracle Data Masking generates mapping tables for all the sensitive fields and their corresponding masked values. These are temporary tables created as part of the masking process, and they are dropped once all the data has been masked successfully.

Using a highly efficient bulk data mechanism, Oracle Data Masking rapidly recreates the masked replacement table from the original table and the mapping tables, and restores all related database elements, such as indexes, constraints, grants, and triggers, so that they are identical to those of the original table. Compare this with the typical data masking process, which usually involves performing table row updates.
Because rows in a table are usually scattered across the disk, the update process is extremely inefficient: the storage system must locate each row in data files stored on extremely large disks. The bulk mechanism used by Oracle Data Masking instead lays down the new rows of the masked table in rapid succession on disk. This efficiency makes the masked table available to users in a fraction of the time an update-driven masking process takes. For large tables, Oracle Data Masking automatically invokes SQL parallelism to further speed up the masking process.

Other performance enhancements include using the NOLOGGING option when recreating the table with the masked data. Typical database operations such as row inserts or updates generate redo logs, which the database uses to record changes. These redo logs are unnecessary in a data masking operation, because the non-production database does not require the continuous availability and recoverability of a production environment. Using the NOLOGGING option, Oracle Data Masking bypasses the logging mechanism to further accelerate the masking process.
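Two of the ideas above, ordering the work list so that parent tables are processed before their dependents, and recreating a table in bulk from a mapping table with NOLOGGING and parallelism, can be sketched as follows. The Python only builds the ordering and the SQL text; it is a hypothetical illustration, not the PL/SQL that Oracle Data Masking generates. The mapping-table layout (ORIG_VALUE and MASKED_VALUE columns), the `_MASKED` naming convention, and the helper names are assumptions. After such a statement runs, the masked table would still need to replace the original and have its indexes, constraints, grants, and triggers restored.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def masking_order(parents: Dict[str, Set[str]]) -> List[str]:
    """Order tables so every parent (primary key) table comes before its dependents.

    `parents` maps a table to the tables its foreign keys reference.
    """
    return list(TopologicalSorter(parents).static_order())


def ctas_sql(table: str, columns: List[str], mapping_tables: Dict[str, str]) -> str:
    """Build a CREATE TABLE ... AS SELECT that swaps masked columns in from mapping tables.

    Each mapping table is assumed to have ORIG_VALUE and MASKED_VALUE columns.
    The original table is read once, however many of its columns are masked.
    """
    select_items, joins = [], []
    for i, column in enumerate(columns):
        if column in mapping_tables:
            alias = f"m{i}"
            select_items.append(f"{alias}.MASKED_VALUE AS {column}")
            # LEFT JOIN keeps rows whose value is NULL (the masked value stays NULL).
            joins.append(f"LEFT JOIN {mapping_tables[column]} {alias} "
                         f"ON {alias}.ORIG_VALUE = t.{column}")
        else:
            select_items.append(f"t.{column}")
    lines = [
        # NOLOGGING skips most redo generation; PARALLEL lets Oracle split the work.
        f"CREATE TABLE {table}_MASKED NOLOGGING PARALLEL AS",
        "SELECT " + ", ".join(select_items),
        f"FROM {table} t",
        *joins,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(masking_order({"CUSTOMERS": {"EMPLOYEES"}, "SHIPMENTS": {"EMPLOYEES"}}))
    print(ctas_sql("EMPLOYEES", ["EMPLOYEE_ID", "FIRST_NAME", "LAST_NAME"],
                   {"EMPLOYEE_ID": "MAP_EMPLOYEE_ID"}))
```

Writing a new table in one set-based statement is what lets the rows be laid down sequentially, instead of seeking to each scattered row as an UPDATE would.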

In internal tests run on a single-core Pentium 4 (Northwood) system with 5.7 GB of memory, the following performance results were reported.

Criteria             Baseline                           Metric (masking time)
Column scalability   215 columns, 100 tables of 60 GB   20 minutes
Row scalability      100 million rows, 6 columns        1 hour 20 minutes

Figure 4: Oracle Data Masking Performance scalability tests

These results indicate that Oracle Data Masking can handle significant volumes of sensitive data, both in terms of the number of sensitive columns and in terms of tables with large numbers of rows.

Oracle Data Masking is also integrated with Oracle Provisioning and Patch Automation in Oracle Enterprise Manager to clone and mask via a single workflow. The secure, high-performance nature of Oracle Data Masking, combined with this end-to-end workflow, lets enterprises provision test systems from production rapidly, instead of taking the days or weeks required by separate manual processes.

Optimized for Oracle databases

Oracle Data Masking leverages key capabilities of Oracle databases to enhance the overall manageability of the masking solution. These include:

• Flashback: Administrators can optionally configure Oracle databases to enable flashback to a pre-masked state if they encounter problems with the masked data.
• PL/SQL: Unlike other solutions, Oracle Data Masking generates DBA-friendly PL/SQL that allows DBAs to tailor the masking process to their needs. This PL/SQL script can also be easily integrated into any cloning process.

Support for heterogeneous databases

Oracle Data Masking supports masking of sensitive data in heterogeneous databases, such as IBM DB2 and Microsoft SQL Server, through the use of Oracle Database Gateways.

Figure 5: Data masking support for heterogeneous databases

Integrated Testing with Application Quality Management solutions

The final step of the masking process is to test that the application performs correctly after masking has completed. Oracle Enterprise Manager's Application Quality Management (AQM) solutions provide high-quality testing for all tiers of the application stack. Thorough testing can help you identify application quality and performance issues prior to deployment. Testing is one of the most challenging and time-consuming parts of deploying an application, but it is also one of the most critical to the project's success. Oracle Enterprise Manager's AQM solutions provide a unique combination of test capabilities that enable you to:

• Test infrastructure changes: Real Application Testing is designed and optimized for testing database-tier infrastructure changes, using real application workloads captured in production to validate database performance in your test environment.

• Test application changes: Application Testing Suite helps you ensure application quality and performance with complete end-to-end application testing solutions that let you automate functional and regression testing, execute load tests, and manage the test process.

Oracle's Comprehensive Solutions for Database Security

Oracle provides a comprehensive portfolio of security solutions to ensure data privacy, protect against insider threats, and enable regulatory compliance. With Oracle's powerful privileged user and multifactor access control, data classification, transparent data encryption, auditing, monitoring, and data masking, customers can deploy reliable data security solutions that do not require any changes to existing applications, saving time and money.

Customer Case Studies

Customers have had a variety of business needs that drove their decision to adopt Oracle Data Masking for their sensitive enterprise data. One example is a major global telecommunications products company that implemented the methodology described above. Its database administrators (DBAs) had developed custom scripts to mask sensitive data in the test and development environments of its human resources (HR) application. As the company grew and offered new services, its IT infrastructure also grew, placing an increasing burden on the DBAs. By implementing Oracle Data Masking, the organization was able to use role-based separation of duties to let HR analysts define the security policies for masking sensitive data. The DBAs then automated the implementation of these masking policies when provisioning new test or development environments.
As a result, the telecommunications company was able to let business users ensure the compliance of their non-production environments while eliminating another manual task for the DBAs through automation.

The need for data masking can also come from internal compliance requirements. At one UK-based government organization, the internal audit and compliance team had found that the non-production copies of human resource management systems used for testing, development, and reporting did not meet the established standards for privacy and confidentiality. In joint consultations with its IT service provider, the organization quickly identified Oracle Data Masking as ideally suited to its business needs, because it was integrated with the day-to-day systems management operations provided by Oracle Enterprise Manager. Within a few weeks, the service provider deployed the mask definitions for the organization's Oracle E-Business Suite HR application and thereby rapidly brought the internal non-production systems into compliance.

Some organizations that developed data masking solutions internally have discovered that custom scripts ultimately have their limits and cannot scale as enterprise data volumes grow. A Middle East-based real estate company found that its data masking scripts were running for several hours and slowing down as data volumes increased. Because of a stringent requirement to make production copies available for testing within short time frames, the company evaluated Oracle Data Masking among other commercial solutions. Upon deploying Oracle Data Masking, it reduced masking time from 6 hours with its old scripts to 6 minutes, a 60x improvement in performance.

Conclusion

Staying compliant with policy and government regulations while sharing production data with non-production users has become a critical business imperative for all enterprises. Oracle Data Masking is designed and optimized for today's high-volume enterprise applications running on Oracle databases. Leveraging the power of Oracle Enterprise Manager to manage all enterprise databases and systems, Oracle Data Masking accelerates sensitive data identification and executes the masking process through a simple, easy-to-use web interface that puts the power of masking in the hands of business users and administrators.

Organizations that have implemented Oracle Data Masking to protect sensitive data in test and development environments have realized significant benefits in the following areas:

• Reducing Risk through Compliance: By protecting sensitive information when sharing production data with developers and testers, organizations have been able to ensure that non-production databases remain compliant with IT security policies while enabling developers to conduct production-class testing.
• Increasing Productivity through Automation: By automating the masking process, organizations have been able to reduce the burden on DBAs, who previously had to maintain manually developed masking scripts.

Data Masking Best Practices
July 2010
Author: Jagan R. Athreya
Contributing Authors:

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: 1.650.506.7000

Copyright 2010, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademar
