Introduction To Talend Open Studio For Data Integration

Introduction to Talend Open Studio for Data Integration
Dimitar Zahariev
BI / DI Consultant
dimitar@zahariev.pro
@shekeriev

Disclaimer
Please keep in mind that:
- I'm not related in any way to Talend
- Everything stated from now on is my personal opinion and it doesn't reflect in any way the position of my employer or other related parties

Agenda
- General definitions
- Business case
- Demo

General definitions
Just to be sure that we are on the same page

Main definitions
- Workspace: local directory that stores one or more projects
- Project: logical grouping of one or more jobs
- Job: the smallest executable unit. It is a group of one or more components. Typically implements a data flow or integration process

General look and feel

General look and feel: Repository
Gives us access to the Repository, where we can create Jobs and manage metadata

General look and feel: Design Workspace
Provides us with a playground to design our Jobs

General look and feel: Configuration Tabs
Allow us to control the components' behavior and execute Jobs

General look and feel: Outline and Code tabs
The Outline tab lists the components that have been added to the design workspace. The Code tab displays the code associated with each component

General look and feel: Palette
Contains the different components we use to build our Jobs

Main sections of the Repository
- Job Designs: stores Jobs we work on. Furthermore, Jobs can be organized into folders
- Contexts: contains sets of global or job-specific variables
- Metadata: holds descriptive information about our data sources and targets, grouped by type

Building blocks of a data warehouse
- Dimensions: a dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are people, products, place and time. Historical changes in dimensions are usually tracked by SCD management methodologies referred to as Type 0 through 6.
- Facts: a fact is a value or measurement, which represents a fact about the managed entity or system.
(Wikipedia)
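To make the idea of tracking historical changes in a dimension more concrete, here is a minimal sketch of one of those methodologies, SCD (slowly changing dimension) Type 2, written in plain Python rather than in Talend. Everything in it is an assumption made for illustration: the `CustomerRow` structure, the `apply_scd2` helper, the column names, and the sample customer are hypothetical and are not taken from the demo.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One version of a customer in a hypothetical customer dimension."""
    surrogate_key: int
    customer_id: str           # business key coming from the source system
    city: str                  # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date]   # None marks the current version


def apply_scd2(dimension: List[CustomerRow], customer_id: str,
               new_city: str, change_date: date) -> None:
    """Type 2 change: close the current row and append a new version."""
    current = next((row for row in dimension
                    if row.customer_id == customer_id and row.valid_to is None),
                   None)
    if current is not None and current.city == new_city:
        return  # nothing changed, keep the current version as it is
    if current is not None:
        current.valid_to = change_date  # close the old version
    next_key = max((row.surrogate_key for row in dimension), default=0) + 1
    dimension.append(
        CustomerRow(next_key, customer_id, new_city, change_date, None))


# Hypothetical sample data: the same customer moves to another city.
dim: List[CustomerRow] = []
apply_scd2(dim, "C-001", "Belgrade", date(2016, 1, 1))
apply_scd2(dim, "C-001", "Novi Sad", date(2016, 9, 1))
```

After the second call the dimension holds two rows for the same customer: the old one closed on the change date and a new current one. That is what lets a fact recorded in the past still point to the customer as they were at that time.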

Business case
What is the problem and how to deal with it?

The customer
LinuxGoods.rs is a local Serbian online shop for Linux- and Unix-related merchandise such as:
- Badges
- Stickers
- T-shirts
- Hats
- etc.

The case
As their business was growing, they began to realize that there had to be a way to analyze what was going on. That would allow them to keep the trend going.
So they decided to build a small data warehouse to meet their growing need for an analytical overview of the business.

The landscape
Three source systems and one target, a database. Input data comes in three forms: plain text files, Excel files, and XML files. Some of the processed files should be moved to another folder for archiving purposes.
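As a rough illustration of this landscape outside Talend, the sketch below reads a delimited text file and an XML file, loads both into one database table, and then moves each processed file to an archive folder. It is only a sketch under stated assumptions: the file names, the `product`/`quantity` layout, the `sales` table, and the in-memory SQLite target are all hypothetical. Excel input is skipped here because reading it needs a third-party library.

```python
import csv
import shutil
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_text(path: Path) -> list:
    """Read (product, quantity) rows from a semicolon-delimited text file."""
    with path.open(newline="", encoding="utf-8") as handle:
        return [(row["product"], int(row["quantity"]))
                for row in csv.DictReader(handle, delimiter=";")]


def load_xml(path: Path) -> list:
    """Read (product, quantity) rows from <sale> elements of an XML file."""
    root = ET.parse(str(path)).getroot()
    return [(sale.findtext("product"), int(sale.findtext("quantity")))
            for sale in root.iter("sale")]


def run(inbox: Path, archive: Path, db: sqlite3.Connection) -> None:
    """Load every supported file from inbox, then move it to archive."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS sales (product TEXT, quantity INTEGER)")
    archive.mkdir(exist_ok=True)
    loaders = {".txt": load_text, ".xml": load_xml}
    for path in sorted(inbox.iterdir()):
        loader = loaders.get(path.suffix)
        if loader is None:
            continue  # unsupported type (for example Excel) stays in place
        db.executemany("INSERT INTO sales VALUES (?, ?)", loader(path))
        db.commit()
        shutil.move(str(path), str(archive / path.name))  # archive step


# Hypothetical sample input, created in a temporary folder.
work = Path(tempfile.mkdtemp())
inbox, archive = work / "inbox", work / "archive"
inbox.mkdir()
(inbox / "orders.txt").write_text(
    "product;quantity\nBadge;3\nSticker;10\n", encoding="utf-8")
(inbox / "orders.xml").write_text(
    "<sales><sale><product>Hat</product><quantity>2</quantity></sale></sales>",
    encoding="utf-8")

connection = sqlite3.connect(":memory:")
run(inbox, archive, connection)
total = connection.execute("SELECT SUM(quantity) FROM sales").fetchone()[0]
```

A file is moved to the archive only after its rows are committed, so a file that fails to parse stays in the inbox and can be picked up again later.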

The solution

Demo
Talend Open Studio for Data Integration in action

Resources
Useful stuff to help us on our journey with Talend

Official resources
A short list of helpful resources:
- Software and -open-studio#t4
- Talend knowledge base: https://help.talend.com/display/HOME/Knowledge Base
- Talend community site: https://www.talendforge.org/
- Talend demo project (available within the studio)

Additional resources
A very good book on the subject:
- Getting Started with Talend Open Studio for Data Integration, by Jonathan Bowen
Resources prepared by me:
- Pre-Built Linux VMs with Talend installed for VirtualBox: https://zahariev.pro/balccon2k16
- Articles on the subject (they will increase with time): https://zahariev.pro/category/talend

Thank you!
Dimitar Zahariev
BI / DI Consultant
dimitar@zahariev.pro
@shekeriev
