Install Guide


DMX Install Guide
Version 9.10

Copyright 1990, 2020 Syncsort Incorporated. All rights reserved.

This document contains unpublished, confidential, and proprietary information of Syncsort Incorporated. No disclosure or use of any portion of the contents of this document may be made without the express written consent of Syncsort Incorporated.

Getting technical support: Customers with a valid maintenance contract can get technical assistance via MySupport. There you will find product downloads and documentation for the products to which you are entitled, as well as an extensive knowledge base.

Version 9.10
Last Update: 15 May 2020

Contents

DMX Overview
Installing DMX/DMX-h
DMX-h Overview
Prerequisites
Step-by-Step Installation
Configuring the DMX Run-time Service
Applying a New License Key to an Existing Installation
Running DMX
Graphical User Interfaces
DMX Help
Connecting to Databases from DMX
Amazon Redshift
Azure Synapse Analytics (formerly SQL Data Warehouse)
Databricks
DB2
Greenplum
Hive data warehouses
Apache Impala
Microsoft SQL Server
Netezza
NoSQL Databases
Oracle
Snowflake
Sybase
Teradata
Vertica
Other DBMSs
Defining ODBC Data Sources
Connecting to Message Queues from DMX
IBM WebSphere MQ
Connecting to Salesforce from DMX
Connecting to SAP from DMX
Registering DMX in SAP SLD
Connecting to HDFS from DMX
Connecting to Connect:Direct nodes from DMX
Security
Installation and Configuration
Connecting to CyberArk Enterprise Password Vault
CyberArk Licenses
Connecting to Protegrity Data Security Gateway
Connecting to QlikView data eXchange files from QlikView or Qlik Sense
QlikView desktop installation overview
Qlik Sense desktop installation overview
Connecting to Tableau Data Extract files from Tableau
Tableau desktop installation overview
Removing DMX/DMX-h from Your System
DMX installation component options
DMX Management Service installation and configuration
DMX DataFunnel run-time service install and configuration
Technical Support

Documentation Conventions

The following conventions are used in the format sections of the command options in this manual.

Convention: Regular type
Explanation: Items in regular type must be entered literally using either lowercase or uppercase letters. Items may be abbreviated.
Example: ASCII, ascending

Convention: Italics (non-bold)
Explanation: Items in italics (non-bold) represent variables. You must substitute an appropriate numerical or text value for the variable.
Example: file name

Convention: Braces { }
Explanation: Braces indicate that a choice must be made among items contained in the braces. The choices may be presented in an aligned column, or on one line separated by a vertical bar ( | ).
Example:
    {"a"   }
    {X"xx" }
    OR
    {AND | OR}

Convention: Brackets [ ]
Explanation: Brackets indicate that an item is optional. A choice may be made among multiple items contained in brackets.
Example: [alias] OR [ -]

Convention: Slash /
Explanation: A slash identifies a DMX option keyword. The slash must be included when an option keyword is specified.
Example: /INFILE, /infile

Convention: Double quotes " "
Explanation: Double quotation marks that appear in a format statement must be specified literally.
Example: "b"-"e"

Convention: Ellipsis ...
Explanation: An ellipsis indicates that the preceding argument or group of arguments may be repeated.
Example: [expression ...]

Convention: Sequence number
Explanation: A sequence number indicates that a series of arguments or values may be specified. The sequence number itself must never be specified.
Example: field2

DMX Overview

DMX is a high-performance data transformation product. With DMX you can design, schedule, and control all your data transformations from a simple graphical interface on your Windows desktop.

Data records can be input from many types of sources such as database tables, SAP systems, Salesforce.com objects, flat files, XML files, pipes, etc. The records can be aggregated, joined, sorted, merged, or just copied to the appropriate target(s). Before output, records can be filtered, reformatted, or otherwise transformed.

Metadata, including record layouts, business rules, transformation definitions, run history and data statistics, can be maintained either within a specific task or in a central repository. The effects of making a change to your application can be analyzed through impact and lineage analysis.

You can run your data transformations directly from your desktop, on any UNIX or Windows server, or schedule them for later execution, embed them in batch scripts, or invoke them from your own programs.

Installing DMX/DMX-h

Installed DMX components are dependent on your license key:

• DMX server license key installs components based on whether you select a Standard, Full, Classic, or Custom installation. See DMX installation component options.
• DMX workstation license key installs the development client, Job and Task Editors; the DMX engine, dmxjob/dmexpress; and the service for development client, which is the DMX Run-time Service, dmxd.

The version of DMX server software must be at least as high as the version of the DMX client software that is used to develop jobs and connect to the server. Thus, when installing a new version of DMX, ensure that you install the same release of DMX on your client and server machines. If you are upgrading and unable to install both the client and the server at the same time, you need to upgrade the server prior to upgrading the client.

DMX-h Overview

DMX-h is the Hadoop-enabled edition of DMX, providing the following Hadoop functionality:

• ETL Processing in Hadoop – Develop a DMX-h ETL application entirely in the DMX GUI to run seamlessly in the Hadoop MapReduce framework, with no Pig, Hive, or Java programming required. Currently, jobs can be run in either MapReduce or Spark. See the online DMX Help topic "DMX-h".
• Hadoop Sort Acceleration – Seamlessly replace the native sort within Hadoop MapReduce processing with the high-speed DMX engine sort, providing performance benefits without programming changes to existing MapReduce jobs. See the DMX-h Sort User Guide, which is included in the Documentation folder under your DMX software installation directory.
• Apache Spark Integration – Use the Spark mainframe connector to transfer mainframe data to HDFS. See the online DMX Help topic "Spark Mainframe Connector".
• Apache Sqoop Integration – Use the Sqoop mainframe import connector to transfer mainframe data into HDFS. See the online DMX Help topic "Sqoop Mainframe Import Connector".

DMX-h Requirements

DMX-h requires the following:

• DMX-h Edition
• A supported Hadoop MapReduce and/or Spark distribution:
    o MapReduce
        - Cloudera CDH 5.x (5.2 and higher) – YARN (MRv2)
        - Hortonworks Data Platform (HDP) 2.x (2.3 and higher) – YARN
        - Apache Hadoop 2.x (2.2 and higher) – YARN
        - MapR, Community Edition and Enterprise Edition only (previously termed M5 and M7, respectively), 6.x – YARN
        - Pivotal HD 3.0 – YARN
      DMX-h is certified as ODPi (1.0 and higher) interoperable.
    o Spark
        - Spark on YARN on the following Hadoop distributions:
            - Cloudera CDH 5.x (5.5 and higher)
            - Hortonworks Data Platform (HDP) 2.3.4, 2.x (2.4 and higher)
            - MapR 5.x (5.1 and higher), Community Edition and Enterprise Edition only (previously named M5 and M7, respectively)
        - Spark on Mesos 0.21.0
        - Spark Standalone 1.5.2 and higher

DMX-h Component Setup and Operation

A DMX-h setup consists of the following:

• Windows workstation
    o DMX must be installed as described in Step-by-Step Installation, Windows Systems.
    o DMX Job and Task Editors are used for MapReduce job development.
    o MapReduce jobs are submitted to Hadoop via the ETL server from the Job Editor.
• Linux ETL server (edge node)
    o DMX must be installed as described in Step-by-Step Installation, UNIX Systems.
    o The Hadoop client must be installed and configured to connect to the Hadoop cluster.
    o The DMX Run-time Service, dmxd, must be running to respond to jobs run via the Windows workstation; it calls dmxjob with the /HADOOP option, which ultimately calls hadoop to submit jobs to the cluster.
• Hadoop cluster
    o DMX must be installed without dmxd on all nodes in the Hadoop cluster as described in Step-by-Step Installation, Hadoop Cluster.
    o Each mapper and reducer runs the map side or reduce side task(s), respectively.
    o All file descriptors for sources, targets, and intermediate files are carefully connected so they fit into the Hadoop MapReduce flow.

Prerequisites

Before you install DMX on your system, ensure that the following are available:

• DMX software: This is generally downloaded from Syncsort’s web site as a self-extracting executable file (Windows) or a tar file (UNIX).

• DMX license key: License keys are sent via e-mail as an attachment file called DMExpressLicense.txt. If you need specific system information to obtain a license key, refer to the section below on Getting DMX License Information.
  If you have a DMX server license key and plan to install DMX installation components, the type of user that you set up depends on whether impersonation privileges are extended. See DMX installation user setup considerations.
• Operating system: DMX runs on the following operating systems, with the listed release being the minimum supported. Both 32-bit and 64-bit versions are supported, unless otherwise stated: AIX release 6.1 64-bit; HP-UX release 11.31 IA64 64-bit; Linux kernel version 2.6.18 to 2.6.31 with C library version 2.5 to 2.11 on Pentium-class x86_64 64-bit machines; Linux kernel version 2.6.16 with C library version 2.4 on IBM System z 64-bit mainframes; SunOS 5.10 SPARC 64-bit; Windows Vista; Windows 7; Windows 8.x; Windows 10; and Windows Server 2008, 2012, and 2012 R2.
• Java version requirements: On Windows and UNIX/Linux systems, DMX requires Java runtime version 1.7 or higher unless you are only running DMX Sort, which does not use Java. DMX requires JDK 7.
• Communication security protocol: On Windows and UNIX/Linux systems, DMX supports Transport Layer Security (TLS) up to and including TLS version 1.2.
• User rights: Sufficient privileges to install and start Windows Services for Windows platforms and root privileges to install and start UNIX daemons on UNIX platforms. A umask setting of 022 is required so that other users can run the installed executables. The installation procedure sets and resets umask if required.
• Pluggable Authentication Modules (PAM): If you want to use PAM for authentication on UNIX or Linux platforms, PAM must be installed and configured on the system.
• Database client software: If you want DMX to access data in database tables (either as data source or target), then the appropriate database client software must be on the system and accessible via the appropriate shared library or dynamic link library (dll) paths.
  For example, to access an Oracle database, Oracle Client must be installed on the system where you run DMX; to access a database via ODBC, an ODBC data source must be defined on the system where you run DMX. For details on how to connect to a specific Database Management System (DBMS), refer to the section Connecting to Databases from DMX.
• Message queue client software: If you want DMX to access data in a message queue, then the appropriate message queue client software must be on the system and accessible via the appropriate shared library or dynamic link library (dll) paths.
  For example, to access an IBM WebSphere MQ queue, IBM WebSphere MQ client must be installed on the system where you run DMX. For details on how to connect to a specific message queue type, refer to the section Connecting to Message Queues from DMX.
• SAP client software: If you want DMX to access data in an SAP system, then the appropriate SAP client software must be installed on the system where you run DMX and accessible via the appropriate shared library or dynamic link library (dll) paths. For details on how to connect to an SAP system, refer to the section Connecting to SAP from DMX.
• Hadoop software – If you want DMX to access data in a Hadoop Distributed File System (HDFS), or you want to run DMX-h ETL MapReduce jobs, then a Hadoop distribution configured to access the cluster must be installed on the edge/ETL node from which you run DMX. For details on how to connect to HDFS, refer to the section Connecting to HDFS from DMX.
• Connect:Direct software – If you want DMX to access data using a Connect:Direct connection, a Connect:Direct server and client (CLI/API) must be installed on the system where you run DMX and must be configured to access the required Connect:Direct nodes. For details on how to connect to a Connect:Direct node, refer to Connecting to Connect:Direct nodes from DMX.
• QlikView software – DMX supports QlikView data eXchange (QVX) files as targets. To access QVX files as sources from QlikView or Qlik Sense, refer to Connecting to QlikView data eXchange files from QlikView or Qlik Sense.

• Tableau software – DMX supports Tableau Data Extract (TDE) files as targets. To access TDE files as sources from Tableau, refer to Connecting to Tableau Data Extract files from Tableau.

DMX installation user setup considerations

The type of user that you set up to install DMX installation components depends on whether impersonation privileges are extended:

• If you plan to use impersonation when running the DMX Run-time Service, dmxd, you must install as root.
• When running the DataFunnel Run-time Service, dmxrund, considerations exist for the type of user that installs components.

User setup when running dmxrund

If you do not plan to use impersonation when running dmxrund, set up a non-administrative user to install and run on Windows or set up a service user to install and run on Linux.

Set up a non-administrative/service user

Windows
As the administrative user has impersonation privileges by default, set up a new user who does not have administrative rights.

Linux
To install and run job requests without impersonation, create a service user, dmxuser, and run the installation as dmxuser.

Set up impersonation

If you plan to use impersonation when running dmxrund, no user setup is required to install and run on Windows; set up an impersonated user to install and run on Linux.

Windows
As the administrative user has impersonation privileges by default, no setup is required.

Linux
DMX installation impersonation considerations on Linux follow:

• No impersonation – Running jobs without impersonation does not require root access. Upon receipt of a job submission request from the DMX management service, dmxmgr, dmxrund calls the DMX engine, dmxdfnl, to run the submitted job as the service user, dmxuser.
• Impersonation – Running jobs with impersonation requires root access to impersonate the specified user. While dmxrund is never granted root access, another installed component, dmxexecutor, can enable impersonation. When dmxrund detects that dmxexecutor is installed in the required directory with the correct permissions, dmxrund calls dmxexecutor to impersonate the specific user that calls the DMX engine, dmxdfnl, which runs the submitted jobs.

To install and run job requests with impersonation, do the following:

• Create a service user, dmxuser.
• Create a service group, dmexpress.
  Note: If you choose to change the name of the service group, you must update the SERVICE GROUP property of the DMX custom impersonation configuration properties file.

• Add dmxuser to the service group.
• Run the installation as dmxuser.
• Ensure that the following files are in the specified directories with the specified permissions:

  Directory and file: <DMX installation>/bin/dmxexecutor
  Permissions: -rwsr-x---
  Notes: The ‘s’ represents the set-user identification (setuid) bit and indicates that dmxexecutor is granted impersonation privileges to run submitted jobs as a specific user.

  Directory and file: <DMX installation>/conf/dmxexecutor.conf
  Permissions: -rwx------
  Notes: Updates to dmxexecutor.conf are required only if you choose to customize the impersonation.

Getting DMX License Information

To obtain a license key, you need the computer name, the hardware model, the number of processors, and the operating system of each system on which DMX is to run. You can gather the system information by running the DMX License Information program.

Windows Systems

You can run the DMX License Information program in the following ways:

• From Syncsort’s web site at: http://ww

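The permissions that this guide requires for dmxexecutor and dmxexecutor.conf when impersonation is used on Linux can be verified with a short script. The following is a minimal sketch in Python and is not part of DMX: the function names and the example installation directory /opt/dmexpress are hypothetical, and the expected permission strings are copied from the permissions table in this guide.

```python
import os
import stat

# Expected permission strings, copied from the permissions table in this guide.
# Paths are relative to the DMX installation directory.
EXPECTED_MODES = {
    os.path.join("bin", "dmxexecutor"): "-rwsr-x---",
    os.path.join("conf", "dmxexecutor.conf"): "-rwx------",
}


def mode_string(st_mode):
    """Render a numeric file mode in 'ls -l' style, e.g. '-rwsr-x---'."""
    return stat.filemode(st_mode)


def check_permissions(install_dir):
    """Compare the actual mode of each file with the expected mode.

    Returns a dict mapping each relative path to a tuple
    (matches, expected, actual); actual is None if the file cannot be read.
    """
    results = {}
    for rel_path, expected in EXPECTED_MODES.items():
        full_path = os.path.join(install_dir, rel_path)
        try:
            actual = mode_string(os.stat(full_path).st_mode)
        except OSError:
            actual = None  # missing file or inaccessible directory
        results[rel_path] = (actual == expected, expected, actual)
    return results


if __name__ == "__main__":
    # Hypothetical installation directory; replace with your own.
    for path, (ok, expected, actual) in check_permissions("/opt/dmexpress").items():
        status = "OK" if ok else "MISMATCH"
        print(f"{status}: {path} expected {expected}, found {actual}")
```

The check only reports differences and does not change any file. Ownership and group membership (dmxuser and the service group) are not covered by this sketch and would need to be checked separately.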