Qlik Compose Setup And User Guide

Transcription

Qlik Compose Setup and User GuideQlik ComposeTMAugust 2021Last updated: March 22, 2022Copyright 1993-2021 QlikTech International AB. All rights reserved.HELP.QLIK.COM

2022 QlikTech International AB. All rights reserved. All company and/or product names may be tradenames, trademarks and/or registered trademarks of the respective owners with which they are associated.

Contents1 What's new?81.1 What's new in Data Warehouse projects?8Uniform source consolidationEnvironment variablesSnowflake VARIANT data type supportSupport for data profiling and data quality rules when using Google Cloud BigQueryAttributes case sensitivity supportAssociating a Replicate task that writes to a Hadoop targetPerformance improvementsData mart UX improvement1.2 What's new in Data Lake projects?889999101010Support for excluding deleted records from ODS viewsDatabricks on Google CloudGenerating projects using the CLIThe "Associate with Replicate task option" is now mandatory1.3 All Projects1011111111Task RecoveryNew Project title settingSupport for multiple Replicate ServersLicensePerformance1.4 Support for Microsoft Edge Browser1113131314141.5 Compose Support in Enterprise Manager142 Introduction152.1 Data warehouse projects15Data warehouse projects architectureKey features2.2 Data lake projects151616Easy data structuring and transformationContinuous updatesHistorical data storeData lake project architecture3 Qlik Compose installation and setup16161617183.1 Preparing your system for Compose18Hardware prerequisitesSoftware and network prerequisitesRequired permissions for the Compose serviceReserved system names3.2 Installing or upgrading Compose1819191919Installation InstructionsUpgrade Instructions3.3 Installing and upgrading Compose silently192020Silently installing ComposeSilently upgrading ComposeSilently uninstalling Compose3.4 Determining the required number of database connections212222223.5 Accessing Qlik Compose23Setup and User Guide - Qlik Compose, August 20213

Contents4 Security considerations254.1 Setting up HTTPS for the Compose console25Checking if an SSL certificate is installedUsing the self-signed certificateReplacing the self-signed certificate on Windows4.2 Setting the hostname and changing the HTTPS port252628294.3 Setting up HSTS on Compose29Enabling HSTSDisabling HSTS4.4 Setting Single Sign-On Authentication with Kerberos3030304.5 Changing the master user password315 Data Warehouse projects335.1 Defining a Qlik Replicate task34PrerequisitesLimitations and considerationsSetting up the task5.2 Adding and managing data warehouse projects34343535Adding data warehouse projectsManaging and monitoring projectsProject settingsResetting projectsProject deploymentMigrating objects as CSV filesExporting and importing projects using the CLIWorking with environment variablesGenerating projects using the CLIExporting project documentationViewing and downloading DDL scriptsProject versioningCreating a diagnostics package5.3 Getting started with Data Warehouse projects3637384445467582929394959697High-level flowConsole elementsData warehouse project tutorial5.4 Setting up a data warehouse connection9797100104Using Microsoft SQL Server as a data warehouseUsing Oracle as a data warehouseUsing Snowflake as a data warehouseUsing Amazon Redshift as a data warehouseUsing Microsoft Azure Synapse Analytics as a data warehouseUsing Google Cloud BigQuery as a Data WarehouseManaging databases5.5 Setting up Landing Zone and Data Source connections105108111115117120123124PermissionsData type mappingsDefining landing zonesDefining Replicate data source connectionsSetup and User Guide - Qlik Compose, August 20211241251321344

ContentsManaging databases5.6 Creating and managing the model141141Generating the modelModel limitationsValidating the modelDisplaying the modelManaging the modelCreating expressionsOpening the expression builderDefining reusable transformations5.7 Creating and managing the data warehouse142149149150153168169176177Data warehouse tasksManaging tasksViewing and exporting task statementsModifying task settingsValidating the data warehouseClearing the data warehouse metadata cache5.8 Creating and managing data marts178187202203208210211Adding data marts and star schemasDisplaying data in a pivot tableManaging data martsExample of a Valid Table Creation ModifierExample of a Valid Table Creation ModifierPre and Post Loading ETLsViewing and exporting task statementsValidating the data martModifying data mart settingsThe "Obsolete" indicator5.9 Creating and managing command tasks211217219225230233234235236238238Defining command tasksManaging command tasksControlling and monitoring command tasks5.10 Controlling and monitoring tasks and workflows239239239240Viewing information in the monitorViewing missing referencesControlling tasksNotificationsWorkflowsMonitoring and controlling Qlik Replicate tasks6 Data Lake projects2402422442462482522556.1 Defining a Qlik Replicate task255PrerequisitesLimitations and ConsiderationsSetting up the task6.2 Adding and managing Data Lake projects255256256257PrerequisitesData Lake project guidelines257259Setup and User Guide - Qlik Compose, August 20215

ContentsAdding data lake projectsManaging and monitoring projectsProject settingsResetting projectsProject deploymentExporting and importing projects using the CLIGenerating projects using the CLIViewing and downloading DDL scriptsProject versioningCreating a diagnostics package6.3 Getting started with Data Lake l flowConsole elements6.4 Setting up landing and storage connections281281283Defining a Storage ZoneDefining Landing ZonesManaging Landing and Storage connections6.5 Selecting source tables and managing metadata284292294294Selecting and adding the source tablesValidating the metadata and storageManaging the metadataSchema evolutionCreating transformationsReusable transformations6.6 Creating and Managing Storage Zone Tasks294298300304307313315Defining and running data storage tasksManaging task definitionsClearing the metadata cacheViewing and exporting task statementsModifying task settings6.7 Creating and managing command tasks316319328329329330Defining Command tasksManaging Command tasksControlling and monitoring Command tasks6.8 Controlling and monitoring tasks and workflows330331331331Viewing information in the monitorRunning and controlling tasksNotificationsWorkflowsMonitoring and controlling Replicate tasks7 Managing Compose3323333353373413437.1 License settings343License enforcementRegistering a license7.2 Viewing a license3433433447.3 Logging settings344Setting the logging levelSetup and User Guide - Qlik Compose, August 20213446

ContentsSetting automatic roll over and cleanupViewing and downloading Compose log files7.4 Mail server settings3453463477.5 Running tasks on a remote Compose server3477.6 Replicate Server settings3477.7 User permissions348Default user permissions according to roleGranular access controlManaging user permissions7.8 Audit trails349350352355Audit trail informationExporting Audit Trail filesConfiguring Audit Trail size and retentionDecoding an encoded payload8 Setting up Compose on a Windows HA cluster3553563573573588.1 Step 1: Installing Compose in the cluster358PreparationPrimary Node SetupSecondary Node Setup8.2 Step 2: Adding the Compose service3583583593598.3 Step 3: Defining the service dependencies3598.4 Step 4: Defining the URL for the cluster3608.5 Upgrading Compose on the cluster361A Impact of DST change on Qlik Compose362B Support matrix363B.1 Supported Windows platforms363B.2 Supported browsers363B.3 Supported Qlik Replicate and Enterprise Manager versions363B.4 Supported Databases for Data Warehouse Projects364Supported data sourcesSupported data warehousesB.5 Supported databases for Data Lake projects364364366C Cron format and examples367C.1 Cron format367C.2 Special characters367C.3 Usage examples368D Supported characters370E Glossary371Setup and User Guide - Qlik Compose, August 20217

1 What's new?1What's new?The following section describes the enhancements and new features introduced in Qlik Compose August2021.1.1What's new in Data Warehouse projects?The following section describes the enhancements and new features introduced in Qlik Compose DataWarehouse projects.Uniform source consolidationRequires Compose August 2021 Service Release 03 or later.To enable uniform source consolidation configuration, a new Consolidation tab has been added to thedata warehouse task settings.When the Consolidate uniform sources option is enabled, Compose will read from the selected datasources and write the data to one consolidated entity. This is especially useful if your source data ismanaged across several databases with the same structure, as instead of having to define multiple datawarehouse tasks (one for each source), you only need to define a single task that consolidates the datafrom the selected data sources.Consolidation tab showing selected data sourcesEnvironment variablesRequires Compose August 2021 Service Release 03 or later.Setup and User Guide - Qlik Compose, August 20218

1 What's new?Environment variables allow developers to build more portable expressions, custom ETLs, and Composeconfigurations, which is especially useful when working with several environments, as is the case withDTAP (Development, Testing, Acceptance and Production). Different environments (for example,development and production) often have environment-specific settings such as database names, schemanames, and Replicate task names. Variables allow you to easily move projects between differentenvironments without needing to manually configure the settings for each environment. This is especiallyuseful if many settings are different between environments. For each project, you can use the predefinedenvironment variables or create your own environment variables.Snowflake VARIANT data type supportWhen creating a Landing Zone, the new Discover the VARIANT data type as option lets you choosewhether columns of the VARIANT data type will be created as JSON or XML in the Snowflake database.VARIANT data type support in Replicate is planned for the November 2021 release. Until then,this feature can be used to discover tables with this data type that were created manually in theLanding Zone, or to manually set certain columns in the model to the JSON/XML data type.Support for data profiling and data quality rules when using GoogleCloud BigQueryRequires Compose August 2021 Service Release 01 or later.Added support for data profiling and data quality rules when using Google Cloud BigQuery as a datawarehouse.Attributes case sensitivity supportRequires Compose August 2021 Service Release 01 or later.In previous versions, attempting to create several Attributes with the same name but a different case wouldresult in a duplication error. Now, such attributes will be created with an integer suffix that increasesincrementally for each attribute added with the same name. For example: Sales, SALES 01, and Sales02 .Associating a Replicate task that writes to a Hadoop targetRequires Compose August 2021 Service Release 01 or later.This version introduces support for associating a Replicate task that writes to a Hadoop target.Setup and User Guide - Qlik Compose, August 20219

1 What's new?Performance improvementslThe "Discover" queries have been optimized to return results much faster than in previous versionslWhen loading the data mart tables in Microsoft Azure Synapse Analytics, the 'FORCE ORDER' hinthas been added to the statements that populate the temporary DIM MUT tables and to thestatements that populate the fact tables. For each dimension, the statistics are updated the momentbefore they are used to load the fact tables. lThe New Star Schema wizard has been scaled to allow a significantly larger number of tables to beselected.Performance improvements that require Compose August 2021 Service Release01 or laterlValidating a model with self-referencing entities is now significantly faster than in previous versions.For instance, it now takes less than a minute (instead of up to two hours) to validate a model with5500 entities.lThe time it takes to "Adjust" the data warehouse has been significantly reduced. For instance, it nowtakes less than three minutes (instead of up to two hours) to adjust a data warehouse with 5500entities.lSignificantly improved the loading speed of data mart Type 2 dimensions with more than twoentities. In order to benefit from this improvement, customers upgrading with existing data martsneeds to regenerate their data mart ETLs.Data mart UX improvementRequires Compose August 2021 Service Release 01 or later.The Data Mart Dimensions tree and the Star Schema Fact tab were redesigned to provide a better userexperience.1.2What's new in Data Lake projects?The following section describes the enhancements and new features introduced in Qlik Compose DataLake projects.Support for excluding deleted records from ODS viewsRequires Compose August 2021 Service Release 03 or later.A Deleted records in ODS views section has been added to the General tab of the project settings, withthe following options:lExclude the corresponding record from the ODS views - This is the default option as recordsmarked as deleted should not usually be included in ODS views.Setup and User Guide - Qlik Compose, August 202110

1 What's new?lInclude the corresponding record in the ODS views - Although not common, in some cases, youmight want include records marked as deleted in the ODS views in order to analyze the number ofdeleted records and investigate the reason for their deletion. Also, regulatory compliance mightrequire you to be able to retrieve the past record status (which requires change history as well).As this was the default behavior in previous versions, you might need to select thisoption to maintain backward compatibility.Databricks on Google CloudRequires Replicate May 2021 Patch Release 02 or later.Databricks on Google Cloud 8.x is now supported.Generating projects using the CLIPreviously available only for Data Warehouse projects, this version introduces support for generating DataLake projects using the Compose CLI. This can be especially useful when deploying projects betweendifferent environments. When the generate project command is run, Compose:lValidates the metadata.lCreates any storage tables that do not exist.lValidates the storage.lAdjusts the storage if needed.lGenerates all storage tasks.The "Associate with Replicate task option" is now mandatoryWhen defining a Landing Zone for Data Lake projects, associating a Replicate task (by selecting theAssociate with Replicate task check box) is now mandatory. Therefore, after upgrading to this version,make sure to set this option if it is not already set.1.3All ProjectsThe following section describes the enhancements and new features that are common to both Data Lakeprojects and Data warehouse projects.Task RecoveryThe ability to set SQL state classes and error codes for triggering task recovery has been added to theproject settings' Environment tab.Setup and User Guide - Qlik Compose, August 202111

1 What's new?Limitations and considerations:lSchema evolution retries are not supported.lODBC statements comprise a small part of the task execution sequence. However, asthe task retry mechanism is JDBC-based, ODBC statements will not be retried even ifthe specified SQL state/error code is encountered.Setup and User Guide - Qlik Compose, August 202112

1 What's new?New Project title settingRequires Compose August 2021 Patch Release 12 or later.A new Project title setting had been added to the Environment tab of the project settings. The project titlewill be shown in the console banner. If both an Environment Title and a Project Title are defined, theproject title will be displayed to the right of the environment title. Unlike the Environment title andEnvironment type, which are unique for each environment, the project title is environment independent.This means that the project title will always be retained, even when deploying to a different environment.The following image shows the banner with both an Environment title and a Project title:The banner text is shown without the Environment title and Project title console labels. Thisprovides greater flexibility as it allows you add any banner text you like, regardless of the actuallabel name. For example, specifying Project owner: Mike Smith in the Project title field,will display that text in the banner.Support for multiple Replicate ServersThis version introduces support for configuring connectivity to Multiple Replicate Servers. This can beespecially useful if you need to set up several projects in Compose, each with a different Replicate Serverand task.LicenseThe parameter for registering a license has been changed from --req to --infile, thereby aligning it withother Qlik data integration products. In addition, the need to append the path with an "@" symbol is nolonger applicable.Setup and User Guide - Qlik Compose, August 202113

1 What's new?Example of previous command:ComposeCLI.exe register license --req @c:\Admin\Temp\lic.txtExample of new command:ComposeCLI.exe register license --infile c:\Admin\Temp\lic.txtPerformanceCreating and dropping tables via Compose is now significantly faster when a large number of tables isinvolved.1.4Support for Microsoft Edge BrowserRequires Compose August 2021 Service Release 01 or later.This version introduces support for accessing the Compose console using Microsoft Edge.1.5Compose Support in Enterprise ManagerFrom Enterprise Manager Service Release SR2, the following functionality will now be supported whenworking with Qlik Compose August 2021 (or above):lMonitoring of Data Lake projectslMonitoring of workflows - Note that monitoring information is provided at the workflow level only. Forinformation on individual tasks in the workflow, you need to drill-down to the Compose project fromEnterprise Manager.lMonitoring tasks that have the same name, but are defined in different Compose projects. Tofacilitate such support:lIn the API, Compose task names are now appended with a double underscore and theproject name. For example, MyTask MyProject.Task names without an appended project name are no longer supported.Therefore, any scripts or applications using the old name format (i.e. without theprojectName suffix) will need to be updated to use the new format.lIn the console user interface, a new Project column has been added, allowing you to easilydifferentiate between two tasks with the same name.The newly supported functionality has also been incorporated into the following API methods (REST,.NET, and Python): GetTaskList, GetTaskDetails, RunTask, and StopTask.Setup and User Guide - Qlik Compose, August 202114

2 Introduction2IntroductionQlik Compose provides an all-in-one purpose built automation solution for creating an agile datawarehouse and/or ingesting data from multiple sources to your data lake for further downstreamprocessing. To this end, Qlik Compose offers two project types: Data Warehouse and Data Lake. Thisintroduction will take a closer look at how these projects can help your organization overcome the hurdlestypically faced when confronted with the challenge of setting up and maintaining an agile data warehouse,or when faced with challenge of ingesting data from multiple source to a single analytics-ready storagesystem.2.1Data warehouse projectsTraditional methods of designing, developing, and implementing data warehouses require large time andresource investments. The ETL stand-up development effort alone – multi-month and error-prone with preptimes of up to 80 percent and expertise from specialized developers – often means your data model is out ofdate before your BI project even starts. Plus, the result of a traditional data warehouse design,development, and implementation process is often a system that can’t adapt to continually changingbusiness requirements. Yet modifying your data warehouse diverts skilled resources from your moreinnovation-related projects. Consequently, your business ends up with your data warehouse becoming abottleneck as much as an enabler of analytics.Qlik Compose data warehouse projects allows you to automate these traditionally manual, repetitive datawarehouse tasks: design, development, testing, deployment, operations, impact analysis, and changemanagement. Qlik Compose automatically generates the task statements, data warehouse structures, anddocumentation your team needs to efficiently execute projects while tracking data lineage and ensuringintegrity. Using Qlik Compose, your IT teams can respond fast – in days – to new business requests,providing accurate time, cost, and resource estimates. Then once projects are approved, your IT staff canfinally deliver completed data warehouses, data marts, and BI environments in far less time.Data warehouse projects architectureThe process is illustrated in the following diagram and described below:Setup and User Guide - Qlik Compose, August 202115

2 IntroductionKey featuresThe comprehensive set of automation features in our Qlik Compose solution simplifies data warehousingprojects. It eliminates the cumbersome and error-prone manual coding required by legacy data warehousedesign and implementations’ many repetitive steps. In addition, our solution includes the operationalfeatures your business needs for ongoing data warehouse and data mart maintenance.Automation Featuresl2.2Operational FeaturesOptimized for either model-driven ordata-driven data warehousingapproacheslMonitoringlWorkflow designer and schedulerlNotificationslReal-time source data integrationlData profiling and quality enforcementlAutomated ETL generationlLineage and impact analysislPhysical data warehousemanagementlProject documentation generationlMigration between environmentslData mart generationData lake projectsLeverage Qlik Compose data lake projects to automate your data pipelines and create analytics-readydata sets. By automating data ingestion, schema creation, and continual updates, organizations realizefaster time-to-value from their existing data lake investments.Easy data structuring and transformationAn intuitive and guided user interface helps you build, model and execute data lake pipelines.Automatically generate schemas and Hive Catalog structures for operational data stores (ODS) andhistorical data stores (HDS) without manual coding.Continuous updatesBe confident that your ODS and HDS accurately represent your source systems.lUse change data capture (CDC) to enable real-time analytics with less administrative andprocessing overhead.lEfficiently process initial loading with parallel threading.lLeverage time-based partitioning with transactional consistency to ensure that only transactionscompleted within a specified time are processed.Historical data storeDerive analytics-specific data sets from a full historical data store (HDS).lNew rows are automatically appended to HDS as data updates arrive from source systems.lNew HDS records are automatically time-stamped, enabling the creation of trend analysis and othertime-oriented analytic data marts.lSupports data models that include Type-2, slowing changing dimensions.Setup and User Guide - Qlik Compose, August 202116

2 IntroductionData lake project architectureThe flow is as follows:1. Land: The source tables are loaded into the Landing Zone using Qlik Replicate or other third-partyreplication tools.When using Qlik Replicate to move the source table to the Landing Zone, you can define either aFull Load replication task or a Full Load and Store Changes task to constantly propagate thesource table changes to the Landing Zone in write-optimized format.2. Store: After the source tables are present in the Landing Zone, Compose auto-generates metadatabased on the data source(s). Once the metadata and the mappings between the tables in theLanding Zone and the Storage Zone have been finalized, Compose creates and populates theStorage Zone tables in read-optimized format, ready for consumption by downstream applicaitons.It should be noted that even though setting up the initial project involves both manual and automaticoperations, once the project is set up, you can automate the tasks by designing a Workflow in Composeand/or utilizing the Compose scheduler.Setup and User Guide - Qlik Compose, August 202117

3 Qlik Compose installation and setup3Qlik Compose installation and setupThis section describes how to install and set up Qlik Compose.Note that as Qlik Replicate serves as a data (and metadata) provider for Qlik Compose, you also need toinstall Replicate in your organization. For a description of the Replicate installation procedure, refer to theQlik Replicate Setup and User Guide.In this section:lPreparing your system for Compose (page 18)lInstalling or upgrading Compose (page 19)lInstalling and upgrading Compose silently (page 20)lDetermining the required number of database connections (page 22)lAccessing Qlik Compose (page 23)3.1Preparing your system for ComposeCompose should be installed on a Windows Server machine that is able to access the data warehouse andoptionally the source database(s) defined in your Compose project. Note that Compose only needs toaccess the source database if you plan to discover the source database when generating your model. Formore information on discovery, see Discovering the Source Database or Landing Zone (page 142).Before installing Compose, make sure that the following prerequisites have been met:Hardware prerequisitesThe following table lists the required hardware for varied deployment scales:Hardware component cSystemLargeSystemQuadcoreQuadcore base8-core base8 GB16 GB32 GBAdditional cores may improve performance whenseveral ETL processes are running concurrently.MemoryAdditional memory may improve performancewhen several ETL processes are runningconcurrently.Setup and User Guide - Qlik Compose, August 202118

3 Qlik Compose installation and emDisk requirements100 GB500 GB500 GBSSD10,000RPM15,000RPMRAIDRAID10 GbTwo 10 GbFor all configurations, RAID is recommended forhigher system availability in case of disk failure.Network1 GbSoftware and network prerequisiteslFirewall ports 80/443 should be open on the Compose machine.lMicrosoft Visual Studio C 2010 X64 Redistributable installed on the Compose machine.l.NET Framework 4.8 or later installed on the Compose machine.lTLS 1.2 or later must be supported in the underlying OS.On Windows Server 2012 R2, TLS 1.2 should be turned on by default. If it is not, refer tothe Microsoft online help for instructions on how to turn it on.For information on supported databases and browsers, see Support matrix (page 363).Required permissions for the Compose serviceQlik Compose needs to be installed and run as Administrator.Reserved system namesAll database object names (queries, tables, columns, schemas, and indexes) starting with the prefix qlk, and regardless of case, are reserved for internal Compose use.Thus, a table named qlK MyTable or a column named QLK MyColumn would not be permitted.3.2Installing or upgrading ComposeThe following topic describes how to install and upgrade Qlik Compose.Installation InstructionsFor best performance when using cloud-based databases (such as, Snowflake) as your datasource or data warehouse, it is strongly recommended to install Qlik Compose on a machine(such as Amazon EC2) located in the same region as your database instance.Setup and User Guide - Qlik Compose, August 202119

3 Qlik Compose installation and setupTo install Compose:1. Run the Compose setup file (Qlik Compose version.number .exe).The Qlik Compose setup wizard opens.2. Click Next. Select I accept the terms of the license agreement and then click Next again.3. Optionally change the installation directory and then click Next.4. Click Next and then click Next again to start the installation.5. When the installation completes, click Finish to exit the Wizard.As part of the installation, a new Windows Service called Qlik Compose is created.6. Open the Qlik Compose console as described in Accessing Qlik Compose (page 23).When you first open the Qlik Compose Console, you will be prompted to register anappropriate license. Register the license that you received from Qlik.Upgrade InstructionsDepending on your existing Compose version, you may also need to perform additionalversion-specific upgrade tasks. It is therefore strongly recommended to review the releasenotes for the new version before upgrading.1. Stop all Compose tasks and services.2. After the Qlik Compose service has been stopped by the Installer, make sure that all childprocesses are also stopped.Compose runs a check to verify the termination of tasks and processes before runningan upgrade. If any processes are found to be still running, the installation will beaborted.3. Run the Qlik Compose setup wizard.4. Start all Compose tasks and services.3.3Installing and upgrading Compose silentlyCompose can be installed silently (i.e. without requiring user interaction). This option is useful, for example,if you need to install Compose on several machines throughout your organization.Before commencing the installation, make sure that the prerequisites have been met. SeePreparing your system for Compose (page 18).The following topics describe how silently install, upgrade, and uninstall Compose:Setup and User Guide - Qlik Compose, August 202120

3 Qlik Compose installation and setuplSilently installing Compose (page 21)lSilently upgrading Compose (page 22)lSilently uninstalling Compose (page 22)Silently installing ComposeThe installation process consists of two stages: creating a response file, and running the silent install.Creating a response

5.3 Getting started with Data Warehouse projects. 97. High-level flow. 97. Console elements. 97. Data warehouse project tutorial. 100. 5.4 Setting up a data warehouse connection. 104. Using Microsoft SQL Server as a data warehouse. 105. Using Oracle as a data warehouse. 108. Using Snowflake as a data warehouse. 111. Using Amazon Redshift as a .