REST-based Data Integration Services For Software Engineering Domain - TUM

REST-based Data Integration Services for Software Engineering Domain
Fridolin Koch, Bachelor’s Thesis – Kickoff Presentation
Software Engineering for Business Information Systems (sebis)
Department of Informatics
Technische Universität München, Germany
wwwmatthes.in.tum.de

Outline
1. Motivation
   - Problem statement
   - Existing ETL solutions
2. Research Questions
3. Solution Approach
   - UI Prototype
   - Framework Workflow
   - Technology Stack
   - Current Architecture
4. Next Steps
5. Timeline

Problem Statement
- There is an existing barrier to the adoption of knowledge management systems in the software engineering domain.
- Many different software architecture life cycle tools produce data in different formats (Enterprise Architect, Excel, Jira, etc.).
- Repeatedly integrating this data into such a system can be a challenging and tedious task.
- In general, the task of data integration is addressed by Extract-Transform-Load tools (ETL-Tools).
- A wide range of commercial and open source ETL-Tools is available.
- But: they are mostly tailored to generic use cases and difficult to embed in existing domain-specific tools.
- Potential solution: analyze popular ETL-Tools and create an easily extendable framework.

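Existing ETL-Tools
Tool | Open-Source | Language | Type | Usage domain
Apatar | Yes | Java | Standalone | Generic
CloverETL | Core only | Java | Standalone, Embedded | Generic
Talend Open Studio for Data Integration | Yes | Java | Designer / Script-Generator | Generic
Pentaho | Yes, but less functionality | Java | Standalone | Generic
RhinoETL | Yes | C# .NET | Framework | Generic
UnifiedViews | Yes | Java | Standalone | Linked-Data (RDF)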

Existing ETL-Tools: Apatar
- Java-based
- Open source
- Visual job designer
- Generic usage domain
Source: www.apatar.com

Existing ETL-Tools: Clover-ETL
- Java-based ETL-Tool
- Open source (core only)
- Visual job designer (Community and Commercial Edition)
- Standalone and embedded
- Generic usage domain
- Custom domain-specific language to define business logic (“CTL”)
- Clusterable
- Many data connectors
Source: http://www.cloveretl.com/

Existing ETL-Tools: Talend Open Studio for Data Integration
- Java-based ETL-Tool
- Open source
- Code generator for data transformation scripts (Java)
- Based on Eclipse
- 900 connectors and components
- Generic usage domain

Existing ETL-Tools: Pentaho
- Java-based
- Community and Enterprise edition
- Visual designer
- Rich library of pre-built ETL components
- Generic usage domain

Existing ETL-Tools: Rhino ETL
- C# .NET Framework
- Open source
- Hello-World application available on GitHub
- Pure framework, no additional connectors

Existing ETL-Tools: UnifiedViews
- Java-based
- Open source
- Specialized in RDF data (Linked Data)
- Web-based visual designer to build jobs
- Extendable through plugins
- Developed at Charles University, Prague
Source: http://unifiedviews.eu/

Existing ETL-Tools: Conclusion
- Almost all tools have a generic use case domain, but are mainly advertised for Business Intelligence and Big Data integration / analysis.
- Tools have thousands of adapters, transformers and settings, which results in a high entry barrier.
- Heavy-duty tools for “Big Data” require higher configuration and maintenance effort.
- SyncPipes is lightweight and quick to integrate into your infrastructure.
- TypeScript / JavaScript provides an ecosystem that is easily extensible.
- 260,000 packages are available through npm to speed up development.
- The RESTful API ensures easy integration into existing system architectures.
- Rule of thumb: new adapters and the corresponding workflow can be created within a day.
- Docker support out of the box (“zero configuration”).

Research Questions
Research Objective: Create a REST-based data integration framework that enables developers to implement adapters for ETL workflows easily. Enable the end user to visualize the domain models of the source and target systems while creating new data integration workflows.
Q1: “What are the key features that must be supported by a data integration framework?”
Q2: “How does the framework's architecture support its extensibility with new adapters?”
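One way to picture the extensibility asked about in Q2 is a small registry that the framework core talks to instead of concrete adapters. The following is only a sketch under that assumption: the names DataRecord, ExtractorExtension, LoaderExtension and ExtensionRegistry are hypothetical and do not come from the SyncPipes code base.

```typescript
// All names below are hypothetical; they illustrate one possible extension
// contract and are not taken from the SyncPipes code base.

/** A generic record passed between extensions. */
type DataRecord = { [field: string]: string | number | boolean | null };

/** Reads records from a source system. */
interface ExtractorExtension {
  readonly name: string;
  extract(): DataRecord[];
}

/** Writes records to a target system and returns how many were written. */
interface LoaderExtension {
  readonly name: string;
  load(records: DataRecord[]): number;
}

/** Holds registered extensions so the core never references a concrete adapter. */
class ExtensionRegistry {
  private extractors: ExtractorExtension[] = [];
  private loaders: LoaderExtension[] = [];

  registerExtractor(extension: ExtractorExtension): void {
    if (this.findExtractor(extension.name) !== undefined) {
      throw new Error("Extractor already registered: " + extension.name);
    }
    this.extractors.push(extension);
  }

  registerLoader(extension: LoaderExtension): void {
    if (this.findLoader(extension.name) !== undefined) {
      throw new Error("Loader already registered: " + extension.name);
    }
    this.loaders.push(extension);
  }

  findExtractor(name: string): ExtractorExtension | undefined {
    return this.extractors.filter((e) => e.name === name)[0];
  }

  findLoader(name: string): LoaderExtension | undefined {
    return this.loaders.filter((l) => l.name === name)[0];
  }

  /** Names of all extractors followed by all loaders, in registration order. */
  extensionNames(): string[] {
    return this.extractors
      .map((e) => e.name)
      .concat(this.loaders.map((l) => l.name));
  }
}

// Usage: a new adapter is added by registering it; the registry itself is unchanged.
const registry = new ExtensionRegistry();
registry.registerExtractor({
  name: "in-memory-issues",
  extract: () => [{ key: "DEMO-1", summary: "Example issue" }],
});
const loadedTitles: string[] = [];
registry.registerLoader({
  name: "in-memory-target",
  load: (records) => {
    records.forEach((r) => loadedTitles.push(String(r["summary"])));
    return records.length;
  },
});
```

In this sketch, adding support for another system means writing one object that satisfies one of the two interfaces, which is the kind of property Q2 is asking about.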

UI Prototype (I)

UI Prototype (II)

UI Prototype (III)

UI Prototype (IV)

UI Prototype (V)

Technology Stack
- MongoDB
- Express.js
- Angular.js
- Node.js

Current Application Architecture
[Architecture diagram]
- Components: Angular Client, RESTful Server, Kernel, Worker 1, Worker 2, ..., Worker n, Framework, Extensions (Extractor, Loader)
- Labeled interactions: Store configuration, Enqueue jobs, Distribute jobs to workers, Load extensions
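The "enqueue jobs" and "distribute jobs to workers" interactions can be illustrated with a minimal in-memory sketch. It assumes a FIFO queue and round-robin distribution, neither of which is stated on the slide, and the names Job, JobWorker and Kernel (as a class) are hypothetical.

```typescript
// Hypothetical names throughout. Round-robin distribution is an assumption;
// the slide only states that jobs are enqueued and distributed to workers.

interface Job {
  id: number;
  pipelineName: string;
}

class JobWorker {
  readonly label: string;
  readonly processed: Job[] = [];

  constructor(label: string) {
    this.label = label;
  }

  run(job: Job): void {
    // A real worker would execute the extract, transform and load steps here.
    this.processed.push(job);
  }
}

class Kernel {
  private queue: Job[] = [];
  private workers: JobWorker[];
  private nextId = 1;
  private nextWorker = 0;

  constructor(workers: JobWorker[]) {
    if (workers.length === 0) {
      throw new Error("At least one worker is required");
    }
    this.workers = workers;
  }

  /** Adds a job to the end of the queue and returns it. */
  enqueue(pipelineName: string): Job {
    const job: Job = { id: this.nextId, pipelineName: pipelineName };
    this.nextId += 1;
    this.queue.push(job);
    return job;
  }

  /** Drains the queue in FIFO order, handing jobs to workers in turn. */
  distribute(): number {
    let distributed = 0;
    let job = this.queue.shift();
    while (job !== undefined) {
      const worker = this.workers[this.nextWorker];
      if (worker !== undefined) {
        worker.run(job);
      }
      this.nextWorker = (this.nextWorker + 1) % this.workers.length;
      distributed += 1;
      job = this.queue.shift();
    }
    return distributed;
  }

  pendingCount(): number {
    return this.queue.length;
  }
}

// Usage: three jobs spread over two workers.
const workerA = new JobWorker("worker-1");
const workerB = new JobWorker("worker-2");
const kernel = new Kernel([workerA, workerB]);
kernel.enqueue("jira-to-sociocortex");
kernel.enqueue("excel-to-sociocortex");
kernel.enqueue("ea-to-sociocortex");
const distributedCount = kernel.distribute();
```

A production setup would more likely use separate processes and a persistent queue; the sketch keeps everything in one process to show only the flow of jobs.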

Workflow
[Workflow diagram]
- Developer: uses the framework to provide extensions
- (End) User: creates and monitors jobs
- SyncPipes: Extract, Transform, Load
- Source systems: Jira, EA, Excel
- Target system(s): SocioCortex
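The extract, transform and load steps of the workflow can be sketched as three functions chained by a small generic helper. This is an illustration only: runPipeline, SourceIssue, TargetEntity and all field names are hypothetical, and in-memory arrays stand in for a source system and a target system.

```typescript
// Hypothetical types and data; the field names are illustrative only.

interface SourceIssue {
  key: string;
  summary: string;
  status: string;
}

interface TargetEntity {
  id: string;
  title: string;
  done: boolean;
}

/** Runs extract, transform and load in sequence and returns the record count. */
function runPipeline<S, T>(
  extract: () => S[],
  transform: (record: S) => T,
  load: (records: T[]) => void
): number {
  const extracted = extract();
  const transformed = extracted.map((record) => transform(record));
  load(transformed);
  return transformed.length;
}

// In-memory stand-ins for a source system and a target system.
const sourceIssues: SourceIssue[] = [
  { key: "DEMO-1", summary: "Set up repository", status: "Done" },
  { key: "DEMO-2", summary: "Write adapter", status: "Open" },
];
const targetStore: TargetEntity[] = [];

const loadedCount = runPipeline<SourceIssue, TargetEntity>(
  () => sourceIssues,
  (issue) => ({
    id: issue.key,
    title: issue.summary,
    done: issue.status === "Done",
  }),
  (entities) => {
    entities.forEach((entity) => targetStore.push(entity));
  }
);
```

Keeping the three steps as separate functions means a developer can swap the source or the target without touching the transformation, which matches the split between extractor and loader extensions.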

Next Steps
- Prototype evaluation
  1. Present the prototypical implementation to 2-3 developers who are familiar with the target domain (e.g. researchers at the SEBIS chair)
  2. Ask developers to implement extractor and loader extensions
  3. Gather feedback through interviews
  4. Improve the prototype based on the provided feedback
  5. Ask developers to implement similar adapters again
  6. Gather feedback
- Improve UI / frontend to work with the RESTful backend
- Write thesis

Timeline
[Gantt chart]
- Months: February, March, April, May, June, July
- Tasks: Literature research, Analyze ETL-Tools, UI Prototype, Framework Implementation, UI REST, Evaluation, Improve Framework, Evaluation, Writing, Buffer

Thank you for your attention.
Fridolin Koch
Technische Universität München
Department of Informatics
Chair of Software Engineering for Business Information Systems
Boltzmannstraße 3
85748 Garching bei München
TelFax 49.89.289. de

Talend Open Studio for Data Integration | Yes | Java | Designer / Script-Generator | Generic
Pentaho | Yes, but less functionality | Java | Standalone | Generic
RhinoETL | Yes | C# .NET | Framework | Generic
UnifiedViews | Yes | Java | Standalone | Linked-Data (RDF)