Install Guide - Microsoft


DMX Install Guide
Version 9.10

DMX Install Guide
Copyright 1990, 2020 Syncsort Incorporated. All rights reserved.

This document contains unpublished, confidential, and proprietary information of Syncsort Incorporated. No disclosure or use of any portion of the contents of this document may be made without the express written consent of Syncsort Incorporated.

Getting technical support: Customers with a valid maintenance contract can get technical assistance via MySupport. There you will find product downloads and documentation for the products to which you are entitled, as well as an extensive knowledge base.

Version 9.10
Last Update: 21 October 2020

Contents
DMX Overview
Installing DMX/DMX-h
DMX-h Overview
Prerequisites
Step-by-Step Installation
Configuring the DMX Run-time Service
Applying a New License Key to an Existing Installation
Running DMX
Graphical User Interfaces
DMX Help
Connecting to Databases from DMX
Amazon Redshift
Azure Synapse Analytics (formerly SQL Data Warehouse)
Databricks
DB2
Greenplum
Hive data warehouses
Apache Impala
Microsoft SQL Server
Netezza
NoSQL Databases
Oracle
Snowflake
Sybase
Teradata
Vertica
Other DBMSs
Defining ODBC Data Sources
Connecting to Message Queues from DMX
IBM WebSphere MQ
Connecting to Salesforce from DMX
Connecting to SAP from DMX
Registering DMX in SAP SLD
Connecting to HDFS from DMX
Connecting to Connect:Direct nodes from DMX
Security
Installation and Configuration
Connecting to Databricks File Systems (DBFSs)
Databricks File System (DBFS) connection requirements
Defining Databricks File System (DBFS) connections
Connecting to CyberArk Enterprise Password Vault
CyberArk Licenses
Connecting to Protegrity Data Security Gateway
Connecting to QlikView data eXchange files from QlikView or Qlik Sense
QlikView desktop installation overview
Qlik Sense desktop installation overview
Connecting to Tableau Data Extract files from Tableau
Tableau desktop installation overview
Removing DMX/DMX-h from Your System
DMX installation component options
DMX Management Service installation and configuration
DMX DataFunnel run-time service install and configuration
Technical Support

Documentation Conventions
The following conventions are used in the format sections of the command options in this manual.

Convention: Regular type
Explanation: Items in regular type must be entered literally using either lowercase or uppercase letters. Items may be abbreviated.
Example: ASCII
         ascending

Convention: Italics (non-bold)
Explanation: Items in italics (non-bold) represent variables. You must substitute an appropriate numerical or text value for the variable.
Example: file name

Convention: Braces { }
Explanation: Braces indicate that a choice must be made among items contained in the braces. The choices may be presented in an aligned column, or on one line separated by a vertical bar (|).
Example: {"a" }
         {X"xx" }
         OR
         {AND | OR}

Convention: Brackets [ ]
Explanation: Brackets indicate that an item is optional. A choice may be made among multiple items contained in brackets.
Example: [alias]
         OR
         [ -]

Convention: Slash /
Explanation: A slash identifies a DMX option keyword. The slash must be included when an option keyword is specified.
Example: /INFILE
         /infile

Convention: Double quotes ""
Explanation: Double quotation marks that appear in a format statement must be specified literally.
Example: "b"-"e"

Convention: Ellipsis ...
Explanation: An ellipsis indicates that the preceding argument or group of arguments may be repeated.
Example: [expression ...]

Convention: Sequence number
Explanation: A sequence number indicates that a series of arguments or values may be specified. The sequence number itself must never be specified.
Example: field2

DMX Overview
DMX is a high-performance data transformation product. With DMX you can design, schedule, and control all your data transformations from a simple graphical interface on your Windows desktop.

Data records can be input from many types of sources, such as database tables, SAP systems, Salesforce.com objects, flat files, XML files, and pipes. The records can be aggregated, joined, sorted, merged, or just copied to the appropriate target(s). Before output, records can be filtered, reformatted, or otherwise transformed.

Metadata, including record layouts, business rules, transformation definitions, run history, and data statistics, can be maintained either within a specific task or in a central repository. The effects of making a change to your application can be analyzed through impact and lineage analysis.

You can run your data transformations directly from your desktop or on any UNIX or Windows server, schedule them for later execution, embed them in batch scripts, or invoke them from your own programs.

Installing DMX/DMX-h
Installed DMX components are dependent on your license key:
• A DMX server license key installs components based on whether you select a Standard, Full, Classic, or Custom installation. See DMX installation component options.
• A DMX workstation license key installs the development client (the Job and Task Editors); the DMX engine, dmxjob/dmexpress; and the service for the development client, which is the DMX Run-time Service, dmxd.

The version of the DMX server software must be at least as high as the version of the DMX client software that is used to develop jobs and connect to the server. Thus, when installing a new version of DMX, ensure that you install the same release of DMX on your client and server machines. If you are upgrading and unable to install both the client and the server at the same time, you need to upgrade the server prior to upgrading the client.

DMX-h Overview
DMX-h is the Hadoop-enabled edition of DMX, providing the following Hadoop functionality:
• ETL Processing in Hadoop – Develop a DMX-h ETL application entirely in the DMX GUI to run seamlessly in the Hadoop MapReduce framework, with no Pig, Hive, or Java programming required. Currently, jobs can be run in either MapReduce or Spark. See the online DMX Help topic "DMX-h".
• Hadoop Sort Acceleration – Seamlessly replace the native sort within Hadoop MapReduce processing with the high-speed DMX engine sort, providing performance benefits without programming changes to existing MapReduce jobs. See the DMX-h Sort User Guide, which is included in the Documentation folder under your DMX software installation directory.
• Apache Spark Integration – Use the Spark mainframe connector to transfer mainframe data to HDFS. See the online DMX Help topic "Spark Mainframe Connector".
• Apache Sqoop Integration – Use the Sqoop mainframe import connector to transfer mainframe data into HDFS. See the online DMX Help topic "Sqoop Mainframe Import Connector".
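The version rule stated under Installing DMX/DMX-h (the server release must be at least as high as the client release, and the server is upgraded first) can be illustrated with a small comparison. This is only a sketch: the function names are hypothetical, DMX does not ship them, and it assumes release numbers are plain dotted integers such as 9.10.

```python
def parse_version(text):
    """Turn a dotted release string such as '9.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def server_supports_client(server_version, client_version):
    """Return True when the server release is at least as high as the client release."""
    return parse_version(server_version) >= parse_version(client_version)


def upgrade_order(server_version, client_version, target_version):
    """List the machines that still need the target release, server first."""
    target = parse_version(target_version)
    steps = []
    if parse_version(server_version) < target:
        steps.append("server")
    if parse_version(client_version) < target:
        steps.append("client")
    return steps
```

Comparing integer tuples rather than raw strings matters here: as text, "9.10" sorts before "9.9", which would wrongly mark a 9.10 server as older than a 9.9 client.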

DMX-h Requirements
DMX-h requires the following:
• DMX-h Edition
• A supported Hadoop MapReduce and/or Spark distribution:
  o MapReduce
    - Cloudera CDH 5.x (5.2 and higher) – YARN (MRv2)
    - Hortonworks Data Platform (HDP) 2.x (2.3 and higher) – YARN
    - Apache Hadoop 2.x (2.2 and higher) – YARN
    - MapR, Community Edition and Enterprise Edition only (previously termed M5 and M7, respectively), 6.x – YARN
    - Pivotal HD 3.0 – YARN
    DMX-h is certified as ODPi (1.0 and higher) interoperable.
  o Spark
    - Spark on YARN on the following Hadoop distributions:
      Cloudera CDH 5.x (5.5 and higher)
      Hortonworks Data Platform (HDP) 2.3.4, 2.x (2.4 and higher)
      MapR 5.x (5.1 and higher), Community Edition and Enterprise Edition only (previously named M5 and M7, respectively)
    - Spark on Mesos 0.21.0
    - Spark Standalone 1.5.2 and higher

DMX-h Component Setup and Operation
A DMX-h setup consists of the following:
• Windows workstation
  o DMX must be installed as described in Step-by-Step Installation, Windows Systems.
  o DMX Job and Task Editors are used for MapReduce job development.
  o MapReduce jobs are submitted to Hadoop via the ETL server from the Job Editor.
• Linux ETL server (edge node)
  o DMX must be installed as described in Step-by-Step Installation, UNIX Systems.
  o The Hadoop client must be installed and configured to connect to the Hadoop cluster.
  o The DMX Run-time Service, dmxd, must be running to respond to jobs run via the Windows workstation; it calls dmxjob with the /HADOOP option, which ultimately calls hadoop to submit jobs to the cluster.
• Hadoop cluster
  o DMX must be installed without dmxd on all nodes in the Hadoop cluster as described in Step-by-Step Installation, Hadoop Cluster.
  o Each mapper and reducer runs the map side or reduce side task(s), respectively.
  o All file descriptors for sources, targets, and intermediate files are carefully connected so they fit into the Hadoop MapReduce flow.

Prerequisites
Before you install DMX on your system, ensure that the following are available:

• DMX software: This is generally downloaded from Syncsort's web site as a self-extracting executable file (Windows) or a tar file (UNIX).
• DMX license key: License keys are sent via e-mail as an attachment file called DMExpressLicense.txt. If you need specific system information to obtain a license key, refer to the section below on Getting DMX License Information.
  If you have a DMX server license key and plan to install DMX installation components, the type of user that you set up depends on whether impersonation privileges are extended. See DMX installation user setup considerations.
• Operating system: DMX runs on the following operating systems, with the listed release being the minimum supported. Both 32-bit and 64-bit versions are supported, unless otherwise stated: AIX release 6.1 64-bit; HP-UX release 11.31 IA64 64-bit; Linux kernel version 2.6.18 to 2.6.31 with C library version 2.5 to 2.11 on Pentium-class x86_64 64-bit machines; Linux kernel version 2.6.16 with C library version 2.4 on IBM System z 64-bit mainframes; SunOS 5.10 SPARC 64-bit; Windows Vista; Windows 7; Windows 8.x; Windows 10; and Windows Server 2008, 2012, and 2012 R2.
• Java version requirements: On Windows and UNIX/Linux systems, DMX requires Java runtime version 1.7 or higher unless you are only running DMX Sort, which does not use Java. DMX requires JDK 7.
• Communication security protocol: On Windows and UNIX/Linux systems, DMX supports Transport Layer Security (TLS) up to and including TLS version 1.2.
• User rights: Sufficient privileges to install and start Windows Services on Windows platforms, and root privileges to install and start UNIX daemons on UNIX platforms. A umask setting of 022 is required so that other users can run the installed executables. The installation procedure sets and resets umask if required.
• Pluggable Authentication Modules (PAM): If you want to use PAM for authentication on UNIX or Linux platforms, PAM must be installed and configured on the system.
• Database client software: If you want DMX to access data in database tables (either as data source or target), then the appropriate database client software must be on the system and accessible via the appropriate shared library or dynamic link library (dll) paths.
  For example, to access an Oracle database, Oracle Client must be installed on the system where you run DMX; to access a database via ODBC, an ODBC data source must be defined on the system where you run DMX. For details on how to connect to a specific Database Management System (DBMS), refer to the section Connecting to Databases from DMX.
• Message queue client software: If you want DMX to access data in a message queue, then the appropriate message queue client software must be on the system and accessible via the appropriate shared library or dynamic link library (dll) paths.
  For example, to access an IBM WebSphere MQ queue, IBM WebSphere MQ client must be installed on the system where you run DMX. For details on how to connect to a specific message queue type, refer to the section Connecting to Message Queues from DMX.
• SAP client software: If you want DMX to access data in an SAP system, then the appropriate SAP client software must be installed on the system where you run DMX and accessible via the appropriate shared library or dynamic link library (dll) paths. For details on how to connect to an SAP system, refer to the section Connecting to SAP from DMX.
• Hadoop software: If you want DMX to access data in a Hadoop Distributed File System (HDFS), or you want to run DMX-h ETL MapReduce jobs, then a Hadoop distribution configured to access the cluster must be installed on the edge/ETL node from which you run DMX. For details on how to connect to HDFS, refer to the section Connecting to HDFS from DMX.
• Connect:Direct software: If you want DMX to access data using a Connect:Direct connection, a Connect:Direct server and client (CLI/API) must be installed on the system where you run DMX and must be configured to access the required Connect:Direct nodes. For details on how to connect to a Connect:Direct node, refer to Connecting to Connect:Direct nodes from DMX.
• QlikView software: DMX supports QlikView data eXchange (QVX) files as targets. To access QVX files as sources from QlikView or Qlik Sense, refer to Connecting to QlikView data eXchange files from QlikView or Qlik Sense.
• Tableau software: DMX supports Tableau Data Extract (TDE) files as targets. To access TDE files as sources from Tableau, refer to Connecting to Tableau Data Extract files from Tableau.

DMX installation user setup considerations
The type of user that you set up to install DMX installation components depends on whether impersonation privileges are extended:
• If you plan to use impersonation when running the DMX Run-time Service, dmxd, you must install as root.
• When running the DataFunnel Run-time Service, dmxrund, considerations exist for the type of user that installs components.

User setup when running dmxrund
If you do not plan to use impersonation when running dmxrund, set up a non-administrative user to install and run on Windows, or set up a service user to install and run on Linux.

Set up a non-administrative/service user
Windows
As the administrative user has impersonation privileges by default, set up a new user who does not have administrative rights.
Linux
To install and run job requests without impersonation, create a service user, dmxuser, and run the installation as dmxuser.

Set up impersonation
If you plan to use impersonation when running dmxrund, no user setup is required to install and run on Windows; set up an impersonated user to install and run on Linux.
Windows
As the administrative user has impersonation privileges by default, no setup is required.
Linux
DMX installation impersonation considerations on Linux follow:
• No impersonation – Running jobs without impersonation does not require root access.
  Upon receipt of a job submission request from the DMX management service, dmxmgr, dmxrund calls the DMX engine, dmxdfnl, to run the submitted job as the service user, dmxuser.
• Impersonation – Running jobs with impersonation requires root access to impersonate the specified user. While dmxrund is never granted root access, another installed component, dmxexecutor, can enable impersonation. When dmxrund detects that dmxexecutor is installed in the required directory with the correct permissions, dmxrund calls dmxexecutor to impersonate the specific user that calls the DMX engine, dmxdfnl, which runs the submitted jobs.

To install and run job requests with impersonation, do the following:
1. Create a service user, dmxuser.
2. Create a service group, dmexpress.
   Note: If you choose to change the name of the service group, you must update the SERVICE GROUP property of the DMX custom impersonation configuration properties file.
3. Add dmxuser to the service group.
4. Run the installation as dmxuser.
5. Ensure that the following files are in the specified directories with the specified permissions:

   Directory and file: <DMX installation>/bin/dmxexecutor
   Permissions: -rwsr-x---
   Notes: The 's' represents the set-user identification (setuid) bit and indicates that dmxexecutor is extended impersonation privileges to run submitted jobs as a specific user.

   Directory and file: <DMX installation>/conf/dmxexecutor.conf
   Permissions: -rwx------
   Notes: Updates to dmxexecutor.conf are required only if you choose to customize the impersonation.

Getting DMX License Information
To obtain a license key, you need the computer name, the hardware model, the number of processors, and the operating system of each system on whi
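As a rough illustration of collecting the details named above (computer name, hardware model, number of processors, operating system), the following sketch uses only the Python standard library. It is not the guide's own procedure, and the function name and dictionary keys are hypothetical. Note the assumptions: platform.machine() reports the CPU architecture rather than a vendor hardware model, and os.cpu_count() counts logical CPUs, which may differ from the processor count a license request expects.

```python
import os
import platform


def collect_system_info():
    """Gather system details for a license request, as far as the standard library exposes them."""
    return {
        "computer_name": platform.node(),
        # CPU architecture (for example x86_64); only an approximation
        # of a vendor hardware model.
        "hardware": platform.machine(),
        # Logical CPU count; may be None if it cannot be determined.
        "logical_cpus": os.cpu_count(),
        "operating_system": f"{platform.system()} {platform.release()}",
    }


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```

The values should be checked by hand against the actual hardware before they are sent with a license request.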
