Deep Dive Into RedPoint Data Management


RedPoint Data Management software (RPDM) empowers organizations to ingest, cleanse, transform, and integrate data through a robust yet easy-to-use visual interface. From design to execution, data flows can be operationalized to enable business-critical applications, fuel analytic teams and processes, and generate insights across the enterprise. All of this is done at unparalleled speed, scale, and levels of productivity.

RedPoint provides a single point of control over all your data, connecting all types and sources of customer data – batch or streaming, structured or unstructured, 1st-, 2nd-, and 3rd-party – at a high speed and scale.

RPDM enables business success through connected data and streamlined processes. It provides workflow and data flow; user controls for design, automation, and data management; and the detailed functionality needed to ingest, validate, cleanse, and merge customer information into consolidated and robust data views.

RPDM also helps to reduce operational costs while improving data management. It provides enterprise-level operational capabilities for handling sensitive customer information with appropriate compliance, security, and privacy while providing the performance, flexibility, and quality results needed to engage customers at scale in real time, either on premise or in the cloud.

Broad Function Data Management: Any Data, Any Source

RedPoint Data Management offers an open garden approach by integrating all data sources, types, and formats, thanks to hundreds of standard, easily configured connectors. As a result, marketers and data scientists will achieve unprecedented speed, efficiency, and accuracy as they extract, transform, and aggregate any data.
RPDM goes beyond just bringing in data; it reconciles, cleanses, and validates customer details automatically, so marketers can focus on customer engagement.

A typical, automated data flow for ingesting data from multiple sources, normalizing and validating customer details, and creating customer “Golden Records” is shown in Figure 1. The interface provides an easy drag-and-drop model for defining data flows, and allows access to predefined flows, connections, and processing tools with direct configuration of controls – no developer coding or complex scripting is required.

The platform is highly configurable to each client’s specific environment and needs. Below we drill down into:

- Data Ingestion: Connections and ETL.
- Identity Processing: Match, merge, and integration.
- Data quality and enrichment: Validation, profiling, data hygiene, and geo-spatial analysis.
- Big Data Processing: Storage, access, and computation in Hadoop environments.
- Data Storage and Access: Persistent key management and updates.
- Process design: Macros, automations and notifications.
- Operational controls and reporting.
- Scalability, architecture, and usability.

Figure 1: RedPoint Data Management Data Flows

PAGE 1 REDPOINT GLOBAL

Data Read/Write

RedPoint Data Management provides the ability to read and write single and multi-byte data with full Unicode support from a variety of sources:

- Basic read/write: access to flat and delimited text files, with support for single and multi-byte data with full Unicode support and EBCDIC, with or without EOL, and with any delimiter and surround character.
- Support for parsing structured files including XML and JSON.
- Traditional DBF files and proprietary DLD files, which provide a highly compressed file format for high throughput, check-point restart, and indexed query/append.
- Database access: many database types are accessed using native drivers and most others using OLEDB or ODBC drivers. Supported databases include Microsoft SQL Server, Azure SQL Server, Azure SQL Data Warehouse, IBM Netezza, Teradata, Oracle, Sybase, Postgres/Greenplum, MySQL, DB2, AWS Redshift, AWS Aurora, Snowflake, Access/Excel, and others.
- EDI data formats: including X12, EDIFACT, Electronic Health Records and others.
- NoSQL / Document databases such as MongoDB and Cosmos DB.
- Hadoop databases and file types such as HBase, Hive, Avro, and Parquet (see Hadoop section below).
- Message queues including IBM MQ, Kafka, AWS SQS, and Azure Service Bus.
- Salesforce.com CRM.

Figure 2: RPDM supports a wide range of the commonly used extract and load types

Transformation

RPDM includes an extensive set of data transformation capabilities. This set covers the full range of functions needed to transform and integrate data from any source for use in any target. The set of functions includes:

- More than 40 date/time functions that can be used to transform or synchronize date/time attributes.
- More than 30 numeric and arithmetic functions.
- More than 60 string transformation functions for managing text, including managing unstructured block text, comment data, and “white-space” analysis.
- Extensive support for regular expressions (including regular expressions in string transformations) and for building/classifying tokens and pattern matching based on regular expressions.
- Logical processing functions including data filtering, contingent value processing, and data-based branching.
- Support for local and global variables that can be used in single data flow projects and across data flow projects.
- Conversion of data types and transliteration across Unicode code pages.
- Removal of non-printing characters (single and multi-byte).

Figure 3: RPDM includes an extensive set of transformation capabilities

All of the RedPoint transformation functions are available in both automated expression building and in an expression language. This allows users to nest/stack functions as deeply as desired without sacrificing usability or performance.

RedPoint also supports high performance sorting, joining and filtering (including single step file splitting and identification of unique records).

Web Services

RPDM allows users to integrate the full range of ETL, data quality/hygiene, and customer data integration functions with web-based data sources. RPDM web services capabilities allow for:

- Capturing session and cookie information to create secure multi-task (complex) service processes.
- Encoding session data in headers, URL parameters, XML, or JSON.
- Processing HTTP posts and gets, allowing for both push and pull exchanges.
- Full support for OAuth2.
- Defining execution and retry strategy.

Figure 4: RPDM Web Services

RedPoint allows service calls within any project or data flow. Additionally, RPDM jobs can be deployed as SOAP-based web services with advanced web service load balancing capabilities.
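To make the "execution and retry strategy" item in the web services list concrete, here is a minimal Python sketch of one common strategy: retry with exponential backoff. It is an illustration only and not RPDM's implementation; RPDM exposes these settings through its own tooling, and the names `call_with_retry`, `flaky_service`, `max_attempts`, and `base_delay` are hypothetical. A stand-in function replaces the real HTTP call so the example runs without a network.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `request`, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical stand-in for a web service call: fails twice, then succeeds.
attempts = {"count": 0}
waits = []  # records the delays instead of actually sleeping


def flaky_service() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary outage")
    return "ok"


result = call_with_retry(flaky_service, sleep=waits.append)
print(result, attempts["count"], waits)
```

Passing `sleep` as a parameter keeps the sketch testable: the delays are recorded rather than waited out. A production strategy would usually also cap the delay and decide which error types are worth retrying.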

Solve Customer Identity Challenges

Identity resolution helps marketers gain a better understanding of customers by building an accurate and usable representation of them (anonymous or known). This insight allows organizations to predict and shape customers’ behavior, improve retention, increase sales, reduce friction in the customer experience, and maximize customer profitability. In addition, identity resolution through RedPoint Data Management supports digital interactions that are primarily real time. Accurate recognition is the single most important step in providing a tailored and relevant experience for the customer in their moment of truth.

RedPoint Data Management:

- Provides the most powerful set of advanced data quality, identity resolution, matching, and master data management capabilities available on the market today. Using its advanced probabilistic and deterministic matching (with more than 375 built-in functions), marketers can easily identify, match, link, and de-duplicate files and standardize and correct data for more than 200 countries.
- Deciphers and relates individuals, households, cookies, IP addresses, IoT smart devices, and more to form a clear and complete picture of every customer. Using RPDM, marketers can create and enhance a customer “Golden Record” – a singular, accurate, and continuously updated view of each customer that is maintained with a persistent key, in minutes, seconds, or on demand.
- Includes a full range of data quality and cleansing capabilities. These are native to the product. Users need not separately license and integrate a data quality tool to perform normalization, validation, and probabilistic matching along with the ETL functions described above.

The example Identity Management data flow below includes normalization, validation, and probabilistic matching for name, address, email, and phone to produce accurate matches across a range of sources and interactions.

Figure 5: Matching data flow handles validation, normalization, and deterministic and probabilistic matching

Identity Matching Approach

RedPoint Data Management handles matching with a broad set of capabilities:

- Out-of-the-Box Matching Tools for matching at the household, person, address, email, URL, and account levels without coding or developing match algorithms.
- Probabilistic and Deterministic Matching that combines statistically based matching with customer data integration business rules to create highly accurate match results.
- Completeness and Accuracy with tools and reference data to cleanse and standardize names, addresses, phone numbers, and email addresses in North America and 240 other territories.

- Alias Resolution for name-variance mapping (e.g., Lewis, Louis, Lew, and Lou or Street, St, and Str).
- Iterative Cycles mirror person/business natural changes and handle new data becoming available over time. This includes support for adding records or breaking apart existing person, household, or business groups.
- Performance and Scalability is a key requirement for the deep and iterative matching. RedPoint has been benchmarked to process high volumes of data 500 percent to 1,900 percent faster than leading alternatives.
- Flexibility, with multiple matching levels, from very tight matches with close to zero over-matching rates for compliance uses to simultaneous looser matching for marketing or fraud detection.

Address Standardization

RPDM includes tools to parse, standardize, correct, complete, and certify addresses from around the world. The following options are available:

- US address standardization that is certified by the United States Postal Service, with certifications in Coding Accuracy Support System (CASS), Delivery Point Validation (DPV), Locatable Address Conversion System (LACS), and SuiteLink.
- Appending of Enhanced Line of Travel (eLOT) data, also known as carrier walk sequence.
- Geocoding using US Census TIGER Geospatial directories.
- Canadian address standardization that is certified by Canada Post with a certification in the Software Evaluation and Recognition Program (SERP).
- International addressing standardization based on relevant local laws/regulations.

Figure 6: RPDM address analysis and transformation

Name Parsing

RPDM includes a full range of name parsing capabilities, including:

- Splitting full names into component parts (including name fields that have been overloaded with multiple persons such as John and Jane Smith).
- Adding custom names files to account for cultural name differences and known aliases.

RPDM name parsing can be extended to include pattern matching for roles (e.g., guardian, beneficiary) and “extended data” (e.g., notes, deceased indicators, record references) that are sometimes overloaded into name fields. Name parsing can also be extended to account for name patterns and distribution for international locations. This allows users to customize the name processing to account for regional and ethnic variances and user-specific data requirements.

Matching

RPDM uses a mix of probabilistic and deterministic techniques to easily identify, match, link, and de-duplicate files. The platform includes out-of-the-box consumer (B2C) and business (B2B) matching with highly configurable tools that combine flexibility and best practices into a single package.

Figure 8: RedPoint matching automates keys, segments, and reports

RPDM supports simultaneous matching across multiple match levels (tightness or looseness of matching) and multiple match types (e.g., name/address, name/phone number, name/account number). Simultaneous matching allows different match criteria and confidence factors to be specified for each pass/comparison, allowing for matches to layer without overmatching or having to write complex integration rules.

Duplication Handling

RPDM can roll up duplicate records to master records based on data quality/completeness, data frequency, or user-specified rules. These capabilities support cross-matching data from multiple sources with common entities (e.g., customers, suppliers) that do not have shared keys. Other uses include grouping account-level data at the person/household/group or business level to create “rollups” or common keys; creating unique records in files (de-duplication); and rolling up data across multiple records based on user specifications.

Data Classification and Standardization

RedPoint Data Management provides out-of-the-box standardization for other field types, including:

- Phone (North American and international, area-code validity, area-code state, letter-number conversion, etc.).
- Standardization of name prefixes and salutations (e.g., Dr., Mr., Mrs., the Honorable).
- Social Security Number formatting and validation (sequence validity, area and group validity); assignment cannot be validated.
- Probabilistic gender assignment (based on census name/gender assignment).
- Email format.
- Business name processing, including keyword and alternative name identification.
- Social media handle format (and validation via external web service as provided by the various social media channels).
- URL format and URL encoding.
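Two of the field checks listed above, phone letter-number conversion and Social Security Number structure validation, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not RPDM's logic or reference data: it applies only the public North American Numbering Plan format rule (area code and exchange each start with 2-9) and the published SSN structure rules (area not 000, 666, or 900-999; group not 00; serial not 0000). The function names `standardize_phone` and `ssn_is_well_formed` are hypothetical. As the list notes, a structure check cannot confirm that a number was actually assigned.

```python
import re
from typing import Optional

# Standard telephone keypad groups, used for letter-to-number conversion.
KEYPAD_GROUPS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
KEYPAD = {
    letter: digit
    for digit, letters in KEYPAD_GROUPS.items()
    for letter in letters
}


def standardize_phone(raw: str) -> Optional[str]:
    """Return a 10-digit North American number, or None if the format fails."""
    converted = "".join(KEYPAD.get(ch, ch) for ch in raw.upper())
    digits = re.sub(r"\D", "", converted)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    # Format check only: area code and exchange must each start with 2-9.
    if re.fullmatch(r"[2-9]\d{2}[2-9]\d{6}", digits):
        return digits
    return None


def ssn_is_well_formed(raw: str) -> bool:
    """Check area/group/serial structure; this cannot confirm assignment."""
    match = re.fullmatch(r"(\d{3})-?(\d{2})-?(\d{4})", raw.strip())
    if match is None:
        return False
    area, group, serial = match.groups()
    return (
        area not in ("000", "666")
        and area < "900"  # 900-999 is never issued as an SSN area
        and group != "00"
        and serial != "0000"
    )


print(standardize_phone("1-800-FLOWERS"))   # 8003569377
print(standardize_phone("(123) 555-0100"))  # None: area code starts with 1
print(ssn_is_well_formed("078-05-1120"))    # True
print(ssn_is_well_formed("666-12-3456"))    # False
```

Checking whether an area code is actually in service, or which state it belongs to, needs reference data that this sketch does not include.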

Data Profiling

RPDM offers robust data profiling capabilities, including:

- Source validation tests input data against one or more specified formats (data layout, type, values, etc.) for acceptance testing and flow control; matching is based on the percentage of records that meet defined criteria and files can be accepted or rejected in whole or in part.
- Profiling input data sources for counts (by value and null), uniqueness (by field or record), data characteristics (min/max, longest/shortest, etc.), and compliance with pattern templates or masks (data type and data pattern).
- Table and column compliance with user-defined sets of rules (e.g., pattern templates and masks), range (by value or other column), specific values (by value or another column), etc.
- Key constraints (e.g., uniqueness within a table or value/key constraints across tables), including support for multi-column or compound keys.

Figure 9: RPDM Profiling

Geospatial Analysis

RPDM provides many spatial analyses and transformations, including:

- Shapefile import/export.
- MID/MIF (MapInfo) import/export.
- Point-in-polygon (market penetration, political analysis, heat map generation) analysis.
- Spatial join (territory overlap, productivity, flood plains).
- Find nearest neighbors (store location, service areas).
- Transform polygons to/from raw data lists (custom region manipulation logic).
- Spatial object operations such as inflate, cut, intersect, union, convex hull.
- Spatial summarize (aggregation of spatial objects).
- Grid cell (mapping of areas onto uniform grids for analysis).

Figure 10: RPDM geospatial analysis

Persistent Key Management

RedPoint Data Management includes tools that allow users to easily manage persistent keys – creating, updating, merging, and splitting as data changes over time.

The platform is designed to deal with the complexities of constant key persistence (new/additional records, record splits/reassignment, historic key management, etc.) and householding (new/additional persons, household break-apart, maintaining head-of-house across match-groups, etc.). Persistent keys can also be assigned at the group level and source level, allowing customers to track performance at these levels.

Important capabilities and features relating to key management include:

- Handling new/updated data sources (new/additional data being added).
- Support for natural changes in household structure, including merging groups (such as marriage or cohabitation), new group members (such as births and adoptions), and splits (such as divorce or deaths).
- Managing persistent keys at the person, household, group, address, and business levels.
- Enforcing unique key constraints.
- Compiling master records from multiple sources based on completeness, frequency, and validation rules.

Figure 11: Persistent Key Support and Key Matching

Process Automation and Notifications

RedPoint Data Management allows users to automate both the processing and monitoring of data management jobs and projects, resulting in highly streamlined and effective operations.

Automations

With RPDM, users can build complex workflows (automations) that combine the various data-flow projects – ETL, data quality, customer data integration, and key management functions. Other automation capabilities include:

- Scheduling jobs based on calendar, presence of files, or changes to a database.

- Creating webs of interdependent jobs that are managed from a single control point.
- Executing external programs.
- Waiting for user review and acceptance.
- Transferring files via FTP.
- Determining source format and validity.
- Notifying users and operations of job outcomes.

Automations are used to:

- Wait for files to appear in an upload directory, and automatically process them when they do.
- Break a long transformation process into smaller steps, so a single failure does not require a complete restart.
- Include other tools (e.g., compression, encryption, specialized file transfer tools, client proprietary data processes) in RedPoint processes.
- Integrate other tools through standard in-process data transfer (e.g., XML, data pipes and APIs, structured data files).
- Let the user enter parameters to control execution at startup such as filenames, filter options, or report options.
- Suspend execution midway and let the user review and edit data directly, for data stewardship approval, special-case corrections, and fuzzy-matching review.
- Loop over all files in a directory and run the same set of steps on each file/data feed.
- Validate a file against many possible formats, and when a format matches, take appropriate action.
- Remain “live” continuously, waiting for external factors to trigger execution (such as the appearance of a file, SQL query results, a calendar event, or an FTP data transfer).

Notifications

Automations use success/failure logic for error handling and can send notifications to operators via email or SMS. Automations can loop over files and sets of values, and they have broad support for variables and parameters. Finally, automations automatically support checkpoint and restart, so that if any steps fail in a multi-step procedure, only the failed steps need be restarted.

- [...] managed through RPDM’s integrated version control. Version control tracks user, update time, comments, and notes. Version control includes full rollback capabilities.
Alternatively, RPDM jobs can be stored externally and managed by any tool capable of handling XML files.
- Multi-server job distribution – RedPoint software can run in a centrally controlled environment with numerous processing nodes executing jobs.
- Execution monitoring and logging – All jobs can be monitored by administrators and operators to determine both overall progress and the actions of specific job steps. Developers and operators can drill down to the specific working load (e.g., CPU, memory allocation, disk use) of each component in a job for monitoring and optimization.
- Command-line execution and web-service interfaces – Allow RPDM to be integrated into other operational control systems.

Reporting

RedPoint users maintain oversight of data management functions with summarization and reporting capabilities, including:

- A project execution log that tracks records read, processed by various steps, and written to various outputs (including database load counts, non-processed record counts, etc.).
- Aggregate and summary functions (e.g., number of records by source or by criteria, min/max values) that can either be written to reports or out to databases
