Using Oracle GoldenGate For Big Data

Oracle Fusion Middleware
Using Oracle GoldenGate for Big Data
Release 21c (21.1.0.0.0)
F26378-06
May 2022

Oracle Fusion Middleware Using Oracle GoldenGate for Big Data, Release 21c (21.1.0.0.0)

F26378-06

Copyright © 2015, 2022, Oracle and/or its affiliates.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms governing the U.S. Government’s use of Oracle cloud services are defined by the applicable contract for such services. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle, Java, and MySQL are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.

Contents

Preface
Audience
Documentation Accessibility
Conventions
Related Information

1 Introducing Oracle GoldenGate for Big Data
1.1 What’s Supported in Oracle GoldenGate for Big Data?
1.2 Configuring Oracle GoldenGate for Big Data
1.2.1 Running with Replicat
1.2.1.1 Configuring Replicat
1.2.1.2 Adding the Replicat Process
1.2.1.3 Replicat Grouping
1.2.1.4 About Replicat Checkpointing
1.2.1.5 About Initial Load Support
1.2.1.6 About the Unsupported Replicat Features
1.2.1.7 How the Mapping Functionality Works
1.2.2 Overview of Logging
1.2.2.1 About Replicat Process Logging
1.2.2.2 About Java Layer Logging
1.2.3 About Schema Evolution and Metadata Change Events
1.2.4 About Configuration Property CDATA[] Wrapping
1.2.5 Using Regular Expression Search and Replace
1.2.5.1 Using Schema Data Replace
1.2.5.2 Using Content Data Replace
1.2.6 Scaling Oracle GoldenGate for Big Data Delivery
1.2.7 Configuring Cluster High Availability
1.2.8 Using Identities in Oracle GoldenGate Credential Store
1.2.8.1 Creating a Credential Store
1.2.8.2 Adding Users to a Credential Store
1.2.8.3 Configuring Properties to Access the Credential Store

2 Getting Started with Oracle GoldenGate (Classic) for Big Data
2.1 Verifying Certification, System, and Interoperability Requirements
2.2 What are the Additional Support Considerations?
2.3 About Oracle GoldenGate Properties Files
2.3.1 Parameter Files
2.4 Setting Up the Java Runtime Environment
2.5 Configuring Java Virtual Machine Memory
2.6 Using GGSCI
2.7 Grouping Transactions
2.8 Controlling Oracle GoldenGate (Classic) Processes

3 Getting Started with Oracle GoldenGate (MA) for Big Data
3.1 Working With Deployments
3.2 About Oracle GoldenGate Properties Files
3.2.1 Parameter Files
3.3 Using the Admin Client
3.4 Controlling Oracle GoldenGate (MA) Processes

4 Dependency Downloader
4.1 Dependency Downloader Setup
4.2 Running the Dependency Downloader Scripts
4.3 Dependency Downloader Scripts

5 Using the BigQuery Handler
5.1 Detailing the Functionality
5.1.1 Data Types
5.1.2 Metadata Support
5.1.3 Operation Modes
5.1.4 Operation Processing Support
5.1.5 Proxy Settings
5.1.6 Mapping to Google Datasets
5.2 Setting Up and Running the BigQuery Handler
5.2.1 Schema Mapping for BigQuery
5.2.2 Understanding the BigQuery Handler Configuration
5.2.3 Review a Sample Configuration
5.2.4 Configuring Handler Authentication

6 Using the Cassandra Handler
6.1 Overview
6.2 Detailing the Functionality
6.2.1 About the Cassandra Data Types
6.2.2 About Catalog, Schema, Table, and Column Name Mapping
6.2.3 About DDL Functionality
6.2.3.1 About the Keyspaces
6.2.3.2 About the Tables
6.2.3.3 Adding Column Functionality
6.2.3.4 Dropping Column Functionality
6.2.4 How Operations are Processed
6.2.5 About Compressed Updates vs. Full Image Updates
6.2.6 About Primary Key Updates
6.3 Setting Up and Running the Cassandra Handler
6.3.1 Understanding the Cassandra Handler Configuration
6.3.2 Review a Sample Configuration
6.3.3 Configuring Security
6.4 About Automated DDL Handling
6.4.1 About the Table Check and Reconciliation Process
6.4.2 Capturing New Change Data
6.5 Performance Considerations
6.6 Additional Considerations
6.7 Troubleshooting
6.7.1 Java Classpath
6.7.2 Write Timeout Exception
6.7.3 Datastax Driver Error

7 Using the Elasticsearch Handler
7.1 Overview
7.2 Detailing the Functionality
7.2.1 About the Elasticsearch Version Property
7.2.2 About the Index and Type
7.2.3 About the Document
7.2.4 About the Primary Key Update
7.2.5 About the Data Types
7.2.6 Operation Mode
7.2.7 Operation Processing Support
7.2.8 About the Connection
7.3 Setting Up and Running the Elasticsearch Handler
7.3.1 Configuring the Elasticsearch Handler

7.3.1.1 Common Configurable Properties
7.3.1.2 Transport Client Configurable Properties
7.3.1.3 Transport Client Setting Properties File
7.3.1.4 Classpath Settings for Transport Client
7.3.1.5 REST Client Configurable Properties
7.3.1.6 Authentication for REST Client
7.3.1.7 Classpath Settings for REST Client
7.4 Troubleshooting
7.4.1 Incorrect Java Classpath
7.4.2 Elasticsearch Version Mismatch
7.4.3 Transport Client Properties File Not Found
7.4.4 Cluster Connection Problem
7.4.5 Unsupported Truncate Operation
7.4.6 Bulk Execute Errors
7.5 Performance Consideration
7.6 About the Shield Plug-In Support
7.7 About DDL Handling
7.8 Known Issues in the Elasticsearch Handler

8 Using the File Writer Handler
8.1 Overview
8.1.1 Detailing the Functionality
8.1.1.1 Using File Roll Events
8.1.1.2 Automatic Directory Creation
8.1.1.3 About the Active Write Suffix
8.1.1.4 Maintenance of State
8.1.2 Configuring the File Writer Handler
8.1.3 Stopping the File Writer Handler
8.1.4 Review a Sample Configuration
8.1.5 File Writer Handler Partitioning
8.1.5.1 File Writer Handler Partitioning Precondition
8.1.5.2 Path Configuration
8.1.5.3 Partitioning Configuration
8.1.5.4 Partitioning Effect on Event Handler

9 Using the HDFS Event Handler
9.1 Detailing the Functionality
9.1.1 Configuring the Handler

9.1.2 Configuring the HDFS Event Handler

10 Using the Optimized Row Columnar Event Handler
10.1 Overview
10.2 Detailing the Functionality
10.2.1 About the Upstream Data Format
10.2.2 About the Library Dependencies
10.2.3 Requirements
10.3 Configuring the ORC Event Handler

11 Using the Oracle Cloud Infrastructure Event Handler
11.1 Overview
11.2 Detailing the Functionality
11.3 Configuring the Oracle Cloud Infrastructure Event Handler
11.4 Configuring Credentials for Oracle Cloud Infrastructure
11.5 Troubleshooting

12 Using the Parquet Event Handler
12.1 Overview
12.2 Detailing the Functionality
12.2.1 Configuring the Parquet Event Handler to Write to HDFS
12.2.2 About the Upstream Data Format
12.3 Configuring the Parquet Event Handler

13 Using the S3 Event Handler
13.1 Overview
13.2 Detailing Functionality
13.2.1 Resolving AWS Credentials
13.2.1.1 Amazon Web Services Simple Storage Service Client Authentication
13.2.2 About the AWS S3
Configuring the S3 Event Handler

14 Using the Command Event Handler
14.1 Overview - Command Event Handler
14.2 Configuring the Command Event Handler

14.3 Using Command Argument Template Strings

15 Using the Redshift Event Handler
15.1 Detailed Functionality
15.2 Operation Aggregation
15.2.1 Aggregation In Memory
15.2.2 Aggregation using SQL post loading data into the staging table
15.3 Unsupported Operations and Limitations
15.4 Uncompressed UPDATE records
15.5 Error During the Data Load Process
15.6 Troubleshooting and
15.9 Redshift COPY SQL Authorization

16 Using the Autonomous Data Warehouse Event Handler
16.1 Detailed Functionality
16.2 ADW Database Credential to Access OCI ObjectStore File
16.3 ADW Database User Privileges
16.4 Unsupported Operations/Limitations
16.5 Troubleshooting and
16.7.1 Automatic Configuration
16.7.2 File Writer Handler Configuration
16.7.3 OCI Event Handler Configuration
16.7.4 ADW Event Handler Configuration
16.7.5 End-to-End Configuration

17 Using the HBase Handler
17.1 Overview
17.2 Detailed Functionality
17.3 Setting Up and Running the HBase Handler
17.3.1 Classpath Configuration
17.3.2 HBase Handler Configuration
17.3.3 Sample Configuration
17.3.4 Performance Considerations
17.4 Security
17.5 Metadata Change Events

17.6 Additional Considerations
17.7 Troubleshooting the HBase Handler
17.7.1 Java Classpath
17.7.2 HBase Connection Properties
17.7.3 Logging of Handler Configuration
17.7.4 HBase Handler Delete-Insert Problem

18 Using the HDFS Handler
18.1 Overview
18.2 Writing into HDFS in SequenceFile Format
18.2.1 Integrating with Hive
18.2.2 Understanding the Data Format
18.3 Setting Up and Running the HDFS Handler
18.3.1 Classpath Configuration
18.3.2 HDFS Handler Configuration
18.3.3 Review a Sample Configuration
18.3.4 Performance Considerations
18.3.5 Security
18.4 Writing in HDFS in Avro Object Container File Format
18.5 Generating HDFS File Names Using Template Strings
18.6 Metadata Change Events
18.7 Partitioning
18.8 HDFS Additional Considerations
18.9 Best Practices
18.10 Troubleshooting the HDFS Handler
18.10.1 Java Classpath
18.10.2 Java Boot Options
18.10.3 HDFS Connection Properties
18.10.4 Handler and Formatter Configuration

19 Using the Java Database Connectivity Handler
19.1 Overview
19.2 Detailed Functionality
19.2.1 Single Operation Mode
19.2.2 Oracle Database Data Types
19.2.3 MySQL Database Data Types
19.2.4 Netezza Database Data Types
19.2.5 Redshift Database Data Types
19.3 Setting Up and Running the JDBC Handler

19.3.1 Java Classpath
19.3.2 Handler Configuration
19.3.3 Statement Caching
19.3.4 Setting Up Error Handling
19.4 Sample Configurations
19.4.1 Sample Oracle Database Target
19.4.2 Sample Oracle Database Target with JDBC Metadata Provider
19.4.3 Sample MySQL Database Target
19.4.4 Sample MySQL Database Target with JDBC Metadata Provider

20 Using the Java Message Service Handler
20.1 Overview
20.2 Setting Up and Running the JMS Handler
20.2.1 Classpath Configuration
20.2.2 Java Naming and Directory Interface Configuration
20.2.3 Handler Configuration
20.2.4 Sample Configuration Using Oracle WebLogic Server

21 Using the Kafka Handler
21.1 Overview
21.2 Detailed Functionality
21.3 Setting Up and Running the Kafka Handler
21.3.1 Classpath Configuration
21.3.2 Kafka Handler Configuration
21.3.3 Java Adapter Properties File
21.3.4 Kafka Producer Configuration File
21.3.5 Using Templates to Resolve the Topic Name and Message Key
21.3.6 Kafka Configuring with Kerberos
21.3.7 Kafka SSL Support
21.4 Schema Propagation
21.5 Performance Considerations
21.6 About Security
21.7 Metadata Change Events
21.8 Snappy Considerations
21.9 Kafka Interceptor Support
21.10 Kafka Partition fy the Kafka Setup
21.11.2 Classpath Issues

21.11.3 Invalid Kafka Version
21.11.4 Kafka Producer Properties File Not Found
21.11.5 Kafka Connection Problem

22 Using the Kafka Connect Handler
22.1 Overview
22.2 Detailed Functionality
22.3 Setting Up and Running the Kafka Connect Handler
22.3.1 Kafka Connect Handler Configuration
22.3.2 Using Templates to Resolve the Topic Name and Message Key
22.3.3 Configuring Security in the Kafka Connect Handler
22.4 Connecting to a Secure Schema Registry
22.5 Kafka Connect Handler Performance Considerations
22.6 Kafka Interceptor Support
22.7 Kafka Partition Selection
22.8 Troubleshooting the Kafka Connect Handler
22.8.1 Java Classpath for Kafka Connect Handler
22.8.2 Invalid Kafka Version
22.8.3 Kafka Producer Properties File Not Found
22.8.4 Kafka Connection Problem

23 Using the Kafka REST Proxy Handler
23.1 Overview
23.2 Setting Up and Starting the Kafka REST Proxy Handler Services
23.2.1 Using the Kafka REST Proxy Handler
23.2.2 Downloading the Dependencies
23.2.3 Classpath Configuration
23.2.4 Kafka REST Proxy Handler Configuration
23.2.5 Review a Sample g a Keystore or Truststore
23.2.7.1 Setting Metacolumn Output
23.2.8 Using Templates to Resolve the Topic Name and Message Key
23.2.9 Kafka REST Proxy Handler Formatter Properties
23.3 Consuming the Records
23.4 Performance Considerations
23.5 Kafka REST Proxy Handler Metacolumns Template Property

24 Using the Kinesis Streams Handler
24.1 Overview
24.2 Detailed Functionality
24.2.1 Amazon Kinesis Java SDK
24.2.2 Kinesis Streams Input Limits
24.3 Setting Up and Running the Kinesis Streams Handler
24.3.1 Set the Classpath in Kinesis Streams Handler
24.3.2 Kinesis Streams Handler Configuration
24.3.3 Using Templates to Resolve the Stream Name and Partition Name
24.3.4 Resolving AWS Credentials
24.3.4.1 AWS Kinesis Client Authentication
24.3.5 Configuring the Proxy Server for Kinesis Streams Handler
24.3.6 Configuring Security in Kinesis Streams Handler
24.4 Kinesis Handler Performance Considerations
24.4.1 Kinesis Streams Input Limitations
24.4.2 Transaction Batching
24.4.3 Deferring Flush at Transaction a Classpath
24.5.2 Kinesis Handler Connectivity Issues
24.5.3 Logging

25 Using the MongoDB Handler
25.1 Overview
25.2 MongoDB Wire Protocol
25.3 Supported Target Types
25.4 Detailed Functionality
25.4.1 Document Key Column
25.4.2 Primary Key Update Operation
25.4.3 MongoDB Trail Data Types
25.5 Setting Up and Running the MongoDB Handler
25.5.1 Classpath Configuration
25.5.2 MongoDB Handler Configuration
25.5.3 Using Bulk Write
25.5.4 Using Write Concern
25.5.5 Using Three-Part Table Names
25.5.6 Using Undo Handling
25.6 Reviewing Sample Configurations
25.7 MongoDB to AJD/ATP Migration
25.7.1 Overview

25.7.2 Configuring MongoDB handler to Write to AJD/ATP
25.7.3 Steps for Migration
25.7.4 Best Practices

26 Using the Metadata Providers
26.1 About the Metadata Providers
26.2 Avro Metadata Provider
26.2.1 Detailed Functionality
26.2.2 Runtime Prerequisites
26.2.3 Classpath Configuration
26.2.4 Avro Metadata Provider Configuration
26.2.5 Review a Sample Configuration
26.2.6 Metadata Change ng
26.2.8.1 Invalid Schema Files Location
26.2.8.2 Invalid Schema File Name
26.2.8.3 Invalid Namespace in Schema File
26.2.8.4 Invalid Table Name in Schema File
26.3 Java Database Connectivity Metadata Provider
26.3.1 JDBC Detailed Functionality
26.3.2 Java Classpath
26.3.3 JDBC Metadata Provider Configuration
26.3.4 Review a Sample Configuration
26.4 Hive Metadata Provider
26.4.1 Detailed Functionality
26.4.2 Configuring Hive with a Remote Metastore Database
26.4.3 Classpath Configuration
26.4.4 Hive Metadata Provider Configuration Properties
26.4.5 Review a Sample a Change Event
26.4.8 Limitations
26.4.9 Additional

27 Using the Oracle NoSQL Handler
27.1 Overview
27.2 On-Premise Connectivity

27.2.1 Server Authentication
27.2.2 Client Authentication
27.2.3 Sample On-Premise Oracle NoSQL Configuration
27.3 OCI Cloud Connectivity
27.3.1 Server Authentication
27.3.2 Client Authentication
27.3.3 Sample Cloud Oracle NoSQL Configuration
27.3.4 Sample OCI Configuration file
27.4 Oracle NoSQL Types
27.5 Oracle NoSQL Handler Configuration
27.6 Performance Considerations
27.7 Operation Processing Support
27.8 Column Processing
27.9 Table Check and Reconciliation Process
27.9.1 Full Image Data Requirements

28 Using the Pluggable Formatters
28.1 Using Operation-Based versus Row-Based Formatting
28.1.1 Operation Formatters
28.1.2 Row Formatters
28.1.3 Table Row or Column Value States
28.2 Using the Avro Formatter
28.2.1 Avro Row Formatter
28.2.1.1 Operation Metadata Formatting Details
28.2.1.2 Operation Data Formatting Details
28.2.1.3 Sample Avro Row Messages
28.2.1.4 Avro Schemas
28.2.1.5 Avro Row Configuration Properties
28.2.1.6 Review a Sample Configuration
28.2.1.7 Metadata Change Events
28.2.1.8 Special Considerations
28.2.2 The Avro Operation Formatter
28.2.2.1 Operation Metadata Formatting Details
28.2.2.2 Operation Data Formatting Details
28.2.2.3 Sample Avro Operation Messages
28.2.2.4 Avro Schema
28.2.2.5 Avro Operation Formatter Configuration Properties
28.2.2.6 Review a Sample Configuration
28.2.2.7 Metadata Change Events
28.2.2.8 Special Considerations

28.2.3 Avro Object Container File Formatter
28.2.3.1 Avro OCF Formatter Configuration Properties
28.2.4 Setting Metacolumn Output
28.3 Using the Delimited Text Formatter
28.3.1 Using the Delimited Text Row Formatter
28.3.1.1 Message Formatting Details
28.3.1.2 Sample Formatted Messages
28.3.1.3 Output Format Summary Log
28.3.1.4 Configuration
28.3.1.5 Metadata Change Events
28.3.1.6 Setting Metacolumn Output
28.3.1.7 Additional Considerations
28.3.2 Delimited Text Operation Formatter
28.3.2.1 Message Formatting Details
28.3.2.2 Sample Formatted Messages
28.3.2.3 Output Format Summary Log
28.3.2.4 Delimited Text Formatter Configuration Properties
28.3.2.5 Review a Sample Configuration
28.3.2.6 Metadata Change Events
28.3.2.7 Setting Metacolumn Output
28.3.2.8 Additional Considerations
28.4 Using the JSON Formatter
28.4.1 Operation Metadata Formatting Details
28.4.2 Operation Data Formatting Details
28.4.3 Row Data Formatting Details
28.4.4 Sample JSON Messages
28.4.4.1 Sample Operation Modeled JSON Messages
28.4.4.2 Sample Flattened Operation Modeled JSON Messages
28.4.4.3 Sample Row Modeled JSON Messages
28.4.4.4 Sample Primary Key Output JSON Message
28.4.5 JSON Schemas
28.4.6 JSON Formatter Configuration Properties
28.4.7 Review a Sample Configuration
28.4.8 Metadata Change Events
28.4.9 Setting Metacolumn Output
28.4.10 JSON Primary Key Updates
28.4.11 Integrating Oracle Stream Analytics
28.5 Using the Length Delimited Value Formatter
28.5.1 Formatting Message Details
28.5.2 Sample Formatted Messages
28.5.3 LDV Formatter Configuration Properties

28.5.4 Additional Considerations
28.6 Using the XML Formatter
28.6.1 Message Formatting Details
28.6.2 Sample XML Messages
28.6.2.1 Sample Insert Message
28.6.2.2 Sample Update Message
28.6.2.3 Sample Delete Message
28.6.2.4 Sample Truncate Message
28.6.3 XML Schema
28.6.4 XML Formatter Configuration Properties
28.6.5 Review a Sample Configuration
28.6.6 Metadata Change Events
28.6.7 Setting Metacolumn Output
28.6.8 Primary Key Updates

29 Using Oracle GoldenGate Capture for Cassandra
29.1 Overview
29.2 Setting Up Cassandra Change Data Capture
29.2.1 Setup SSH Connection to the Cassandra Nodes
29.2.2 Data Types
29.2.3 Cassandra Database Operations
29.3 Deduplication
29.4 Topology Changes
29.5 Data Availability in the CDC Logs
29.6 Using Extract Initial Load
29.7 Using Change Data Capture Extract
29.8 Replicating to RDBMS Targets
29.9 Partition Update or Insert of Static Columns
29.10 Partition Delete
29.11 Security and Authentication
29.11.1 Configuring SSL
29.12 Cleanup of CDC Commit Log Files
29.12.1 Cassandra CDC Commit Log Purger
29.12.1.1 How to Run the Purge Utility
29.12.1.2 Sample config.properties for Local File System
29.12.1.3 Argument cassCommitLogPurgerConfFile
29.12.1.4 Argument purgeInterval
29.13 Multiple Extract Support
29.14 CDC Configuration Reference

29.15 Troubleshooting

30 Using Oracle GoldenGate Capture for MongoDB
30.1 Overview
30.2 Setting up MongoDB
30.3 MongoDB Database Operations
30.4 Using Extract Initial Load
30.5 Using Change Data Capture Extract
30.6 Positioning the Extract
30.7 Security and Authentication
30.7.1 SSL Configuration Setup
30.8 Mongo DB Configuration Reference
30.9 Columns in Trail File
30.10 Update Operation Behavior
30.11 Oplog Size Recommendations
30.12 Troubleshooting

31 Using Oracle GoldenGate Capture for Kafka
31.1 Overview
31.2 General Terms and Functionality of Kafka Capture
31.2.1 Kafka Streams
31.2.2 Kafka Message Order
31.2.3 Kafka Message Timestamps
31.2.4 Kafka Message Coordinates
31.2.5 Start Extract Modes
31.2.5.1 Start Earliest
31.2.5.2 Start Timestamp
31.2.6 General Configuration Overview
31.2.7 GLOBALS File
31.2.8 The Extract Parameter File
31.2.9 Kafka Consumer Properties File
31.2.10 Generic Mutation Builder
31.3 Kafka Connect Mutation Builder
31.3.1 Functionality and Limitations of the Kafka Connect Mutation Builder
31.3.2 Primary Key
31.3.3 Kafka Message Key
31.3.4 Kafka Connect Supported Types
31.3.5 How to Enable the Kafka Connect Mutation Builder
31.4 Example Configuration Files

31.4.1 Example GLOBALS File
31.4.2 Example kc.prm file
31.4.3 Example Kafka Consumer Properties File

32 Connecting to Microsoft Azure Data Lake

33 Connecting to Microsoft Azure Data Lake Gen 2 (or Microsoft Azure Blob Storage)

34 Using the Microsoft Azure Synapse Analytics Event Handler
34.1 Detailed Functionality
34.1.1 Database User Privileges
34.1.2 Merge SQL ion
34.2.1 Automatic Configuration
34.2.1.1 File Writer Handler Configuration
34.2.1.2 Parquet Event Handler Configuration
34.2.1.3 Synapse Event Handler Configuration
34.2.2 Synapse Database Credentials
34.2.3 Classpath lasspath
34.2.4 Initial load Performance
34.2.5 Large Object (LOB) Performance
34.2.6 End-to-End Configuration
34.3 Troubleshooting and Diagnostics

35 Connecting to Microsoft Azure Event Hubs

36 Stage and Merge Data Warehouse Replication
36.1 Steps for Stage and guration of Handlers

36.1.4 File Writer Handler
36.1.5 Operation Aggregation
36.1.6 Object Store Event handler
36.1.7 JDBC Metadata Provider
36.1.8 Command Event handler Merge Script
36.1.9 Stage and Merge Sample Configuration
36.1.10 Variables in the Merge Script
36.1.11 SQL Statements in the Merge Script
36.1.12 Merge Script tions
36.2 Hive Stage and Merge
36.2.1 Data Flow
36.2.2 Configuration
36.2.3 Merge Script Variables
36.2.4 Prerequisites

37 Connecting to Oracle Streaming Service

38 Using the Azure Blob Storage Event orage Account, Container, and Objects
38.4 Configuration
38.4.1 Classpath ntication
38.4.3.1 Azure Tenant ID, Client ID, and Client Secret
38.4.4 Proxy Configuration
38.4.5 Sample Configuration
38.4.6 Azure Government Cloud Configuration
38.5 Troubleshooting and Diagnostics

39 Using the Snowflake Event Handler
39.1 Overview
39.2 Detailed Functionality
39.2.1 Staging Location
39.2.2 Database User Privileges

.1.1 File Writer Handler Configuration
39.3.1.2 S3 Handler Configuration
39.3.1.3 HDFS Event Handler Configuration
39.3.1.4 Google Cloud Storage Event Handler Configuration
39.3.1.5 Snowflake Event Handler Configuration
39.3.2 Snowflake Storage Integration
39.3.3 Classpath Configuration
Dependencies
39.3.4 Proxy Configuration
39.3.5 Initial load Performance
39.3.6 Snowflake Key Pair Authentication
39.3.7 Mapping Source JSON/XML to Snowflake VARIANT
39.3.8 End-to-End Configuration
39.4 Troubleshooting and Diagnostics

Using the Google Cloud Storage Event ckets and Objects
40.4 Authentication and Authorization
40.4.1 Bucket Permissions
40.4.2 Object h Configuration
40.5.1.1 Automatic 0.5.2 Proxy Configuration
40.5.3 Sample Configuration

41 Using the Google BigQuery Stage and Merge
41.1 Overview
41.2 Detailed ces between BigQuery Handler and Stage and Merge BigQuery Event Handler
41.5 Authentication or Authorization
41.5.1 BigQuery Permissions
41.6 Configuration

41.6.1 Automatic Configuration
41.6.1.1 File Writer Handler Configuration
41.6.1.2 GCS Event Handler Configuration
41.6.1.3 BigQuery Event Handler Configuration
41.6.2 Classpath Configuration
41.6.3 Proxy Configuration
41.6.4 Initial load performance
41.6.5 End-to-End Configuration
41.7 Troubleshooting and Diagnostics

A Template Keywords

B Metacolumn Keywords

C Google BigQuery Dependencies
C.1 BigQuery 1.135.4

D Oracle NoSQL SDK Dependencies
D.1 Oracle NoSQL SDK Dependencies 5.2.27

E Cassandra Handler Client Dependencies
E.1 Cassandra Datastax Java Driver 4.12.0
E.2 Cassandra Datastax Java Driver 4.9.0

F Cassandra Capture Client Dependencies

G Elasticsearch Handler Transport Client Dependencies
G.1 Elasticsearch 7.13.3 with X-Pack 7.13.3
G.2 Elasticsearch 7.10.0 with X-Pack 7.10.0
G.3 Elasticsearch 6.8.13 with X-Pack 6.8.13

H Elasticsearch High Level REST Client Dependencies
H.1 Elasticsearch 7.13.3

H.2 Elasticsearch 7.6.1

I HBase Handler Client Dependencies
I.1 HBase 2.4.4
I.2 HBase 2.3.3
I.3 HBase 2.2.0
I.4 HBase 2.1.5
I.5 HBase 2.0.5
I.6 HBase 1.4.10
I.7 HBase 1.3.3
I.8 HBase 1.2.5
I.9 HBase 1.1.1
I.10 HBase 1.0.1.1

J HDFS Handler Client Dependencies
J.1 Hadoop Client Dependencies
J.1.1 HDFS 3.3.0
J.1.2 HDFS 3.2.0
J.1.3 HDFS 3.1.4
J.1.4 HDFS 3.0.3
J.1.5 HDFS 2.9.2
J.1.6 HDFS 2.8.5
J.1.7 HDFS 2.7.7
J.1.8 HDFS 2.6.0
J.1.9 HDFS 2.5.2
J.1.10 HDFS 2.4.1
J.1.11 HDFS 2.3.0
J.1.12 HDFS 2.2.0

K Kafka Handler Client Dependencies
K.1 Kafka 2.8.0
K.2 Kafka 2.7.0
K.3 Kafka 2.6.0
K.4 Kafka 2.5.1
K.5 Kafka 2.4.1
K.6 Kafka 2.3.1
K.7 Kafka 2.2.1
K.8 Kafka 2.1.0
K.9 Kafka 2.0.0

K.10 Kafka 1.1.1
K.11 Kafka 1.0.2
K.12 Kafka 0.11.0.0
K.13 Kafka 0.10.2.0
K.14 Kafka 0.10.1.1
K.15 Kafka 0.10.0.1
K.16 Kafka 0.9.0.1

L Kafka Connect Handler Client Dependencies
L.1 Kafka 2.8.0
L.2 Kafka 2.7.1
L.3 Kafka 2.6.0
L.4 Kafka 2.5.1
L.5 Kafka 2.4.1
L.6 Kafka 2.3.1
L.7 Kafka 2.2.1
L.8 Kafka 2.1.1
L.9 Kafka 2.0.1
L.10 Kafka 1.1.1
L.11 Kafka 1.0.2
L.12 Kafka 0.11.0.0

1 Introducing Oracle GoldenGate for Big Data 1.1 What's Supported in Oracle GoldenGate for Big Data? 1-1 1.2 Configuring Oracle GoldenGate for Big Data 1-1 1.2.1 Running with Replicat 1-1 1.2.1.1 Configuring Replicat 1-2 1.2.1.2 Adding the Replicat Process 1-2 1.2.1.3 Replicat Grouping 1-2 1.2.1.4 About Replicat Checkpointing 1-3