KNIME Analytics Platform - Datascienceguide

Transcription

KNIME Analytics Platform

Summary of this lesson
"If the only tool you have is a hammer, you tend to see every problem as a nail" - Abraham Maslow
First the tool or the knowledge?
*This lesson refers to Appendix B of the GIDS book
Guide to Intelligent Data Science Second Edition, 2020

Content of this lesson
- Download and Install
- The Workbench
- More on Nodes
- Metanodes and Components
- KNIME Hub
- Build your first "Hello" Workflow

Datasets
- Dataset used: adult dataset
- Example workflow: "My First Workflow" https://kni.me/w/kYeZOLeAJXo9Mvol
  - Read from CSV file, Excel file and SQLite
  - Filter rows and columns
  - Write to CSV file

Download and Install

KNIME Analytics Platform
- Open and open-source modular data science platform
- Covers all data science needs: data access, data preparation, data visualization, machine learning, testing, deployment
- Based on the visual programming paradigm
- Provides a diverse array of extensions: text mining, network mining, cheminformatics, deep learning
- Many integrations, such as Java, R, Python, Weka, Keras, Plotly, H2O, etc.
- And more

KNIME Analytics Platform and KNIME Server
KNIME Analytics Platform
- To develop data science solutions
- Structured data, unstructured data, machine learning, statistics
- Open source
- Free
KNIME Server
- To integrate the solutions into the IT environment
- Scheduling, MLOps, easy deployment, REST architecture, auditing tools
- Closed source
- Yearly license

Installation
https://www.knime.com/downloads
- Select the KNIME Analytics Platform version for your computer:
  - Mac
  - Windows (32 or 64 bit)
  - Linux
- Download the archive and extract the file, or download the installer package and run it

The Workbench

The KNIME Workspace
- The workspace is the folder/directory in which workflows (and potentially data files) are stored for the current session
- Workspaces are portable (just like KNIME Analytics Platform)

The KNIME Workbench

Workflow
A workflow is a pipeline of nodes, each configurable to perform a specific task. The data flows through the nodes from left to right.
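The pipeline idea can be mimicked in a few lines of Python. This is only a rough analogy, not how KNIME is implemented: the two "nodes" below (`keep_over_30`, `drop_name`) and the sample rows are hypothetical. Each node is a function that receives a table and hands a new table to its successor.

```python
from functools import reduce
from typing import Callable, Dict, List

Table = List[Dict[str, object]]
Node = Callable[[Table], Table]


def run_workflow(nodes: List[Node], table: Table) -> Table:
    """Feed the table through each node in order, left to right."""
    return reduce(lambda data, node: node(data), nodes, table)


# Two hypothetical nodes, each performing one specific task.
def keep_over_30(table: Table) -> Table:
    return [row for row in table if row["age"] > 30]


def drop_name(table: Table) -> Table:
    return [{k: v for k, v in row.items() if k != "name"} for row in table]


rows = [{"name": "A", "age": 25}, {"name": "B", "age": 42}]
result = run_workflow([keep_over_30, drop_name], rows)
print(result)  # [{'age': 42}]
```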

KNIME Explorer
- This panel displays all the workflows in the selected workspace:
  - LOCAL: projects saved on your own machine
  - EXAMPLES: hundreds of read-only example workflows
  - My-KNIME-Hub: additional space where you can share your workflows with the community or just park your work for yourself
- Provides a search box and buttons to:
  - Refresh the view
  - Select the currently displayed workflow
- Can display 4 types of content: workflows, workflow groups, data files, shared components

Creating a new workflow
Right-click anywhere in the KNIME Explorer to create a new workflow or workflow group

Importing and Exporting Workflows
- Right-click anywhere in the KNIME Explorer to import a workflow
- Right-click a workflow or workflow group to export it

Node Repository
- The Node Repository contains all KNIME nodes, ordered by category with further subcategories
- Installing extensions can considerably increase the number of nodes
- Two search methods: crisp search and fuzzy search
- Nodes can be added by drag and drop from the Node Repository to the Workflow Editor
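The difference between the two search modes can be illustrated with a small Python sketch. It assumes that "crisp" means an exact substring match and "fuzzy" means tolerance for near matches such as typos; the slide does not say how KNIME implements either mode, so the functions below are illustrative only. The node list is a hypothetical excerpt.

```python
import difflib

# Hypothetical excerpt of the Node Repository.
NODES = ["File Reader", "CSV Writer", "Row Filter", "Column Filter"]


def crisp_search(query, names):
    """Return only names that contain the query exactly (case-insensitive)."""
    q = query.lower()
    return [name for name in names if q in name.lower()]


def fuzzy_search(query, names, cutoff=0.6):
    """Return names similar to the query, best match first."""
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


print(crisp_search("filter", NODES))       # ['Row Filter', 'Column Filter']
print(crisp_search("Colum Filtr", NODES))  # [] because the typo breaks the exact match
print(fuzzy_search("Colum Filtr", NODES))  # best match comes first
```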

Node Description
The Description window gives information about:
- Node functionality
- Input & output
- Node settings
- Ports
- References to literature

Workflow Description
When a workflow is selected, the Description window gives information about the workflow's:
- Title
- Description
- Associated tags and links
- Creation date
- Author

Workflow Coach
- Node recommendation engine
- It gives hints about which node to use next in the workflow
- It is based on worldwide KNIME community usage statistics
- It can also be set to use personal and local group usage statistics
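A recommendation based on usage statistics can be pictured as counting which node most often follows the current one. The Python sketch below is a toy model under that assumption. The real Workflow Coach algorithm is not described here, and the observed workflows are made up.

```python
from collections import Counter, defaultdict


def build_stats(workflows):
    """Count, for every node, how often each other node directly follows it."""
    stats = defaultdict(Counter)
    for nodes in workflows:
        for current, following in zip(nodes, nodes[1:]):
            stats[current][following] += 1
    return stats


def recommend(stats, node, k=3):
    """Return up to k successors of `node`, most frequently used first."""
    return [name for name, _ in stats[node].most_common(k)]


# Hypothetical usage data: each list is one workflow, read left to right.
observed = [
    ["File Reader", "Column Filter", "Row Filter", "CSV Writer"],
    ["File Reader", "Row Filter", "CSV Writer"],
    ["File Reader", "Column Filter", "CSV Writer"],
]
stats = build_stats(observed)
print(recommend(stats, "File Reader"))  # ['Column Filter', 'Row Filter']
```

Swapping `observed` for a personal or group-level history would correspond to the last bullet above.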

Console and Other views
- The Console view prints error and warning messages about what is going on under the hood
- Click on View and select Other to add additional views

Error Log View
Tip: enabling and checking the Error Log view can help while debugging your project

More on Nodes

More on Nodes
- Nodes are the basic processing units of a workflow
- Each node has a number of input and/or output ports
- Data is transferred over a connection from an out-port to the in-port(s) of other nodes
- Under each node, a light shows its status

Port Types
- Port types: Data, Model, Flow Variable, Image, DB Connection, DB Data
- A pipeline of such nodes makes a workflow
- The result of the node's operation on the data is provided at the out-port to successor nodes
- Only ports of the same type can be connected
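The rule that only ports of the same type can be connected can be written down as a simple check. The sketch below is a conceptual illustration in Python, not KNIME code. The `PortType` enum just lists the port types named on the slide, and `can_connect` is a hypothetical helper.

```python
from enum import Enum


class PortType(Enum):
    DATA = "Data"
    MODEL = "Model"
    FLOW_VARIABLE = "Flow Variable"
    IMAGE = "Image"
    DB_CONNECTION = "DB Connection"
    DB_DATA = "DB Data"


def can_connect(out_port: PortType, in_port: PortType) -> bool:
    """A connection is valid only when both ports have the same type."""
    return out_port is in_port


print(can_connect(PortType.DATA, PortType.DATA))   # True
print(can_connect(PortType.DATA, PortType.MODEL))  # False
```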

Node Configuration
- Most nodes require configuration
- To access a node's configuration window: double-click the node, or right-click it and select Configure

Node Execution
- Right-click the node
- Select Execute in the context menu
- If execution is successful, the status shows a green light
- If execution produces warnings, the status shows a yellow triangle
- If execution encounters errors, the status shows a red X
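The three outcomes can be summarized as a small status function. The Python sketch below is an analogy only: `execute` is a hypothetical stand-in for running a node, with Python warnings and exceptions playing the role of KNIME warnings and errors.

```python
import warnings
from enum import Enum


class Status(Enum):
    GREEN = "green light"
    YELLOW = "yellow triangle"
    RED = "red X"


def execute(task):
    """Run a task and map its outcome to the status shown under a node."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            task()
        except Exception:
            return Status.RED  # execution encountered an error
    # No error: yellow if any warning was recorded, green otherwise.
    return Status.YELLOW if caught else Status.GREEN


print(execute(lambda: None))                      # Status.GREEN
print(execute(lambda: warnings.warn("careful")))  # Status.YELLOW
print(execute(lambda: 1 / 0))                     # Status.RED
```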

Node Views
- Interactive View
- Data View

Frequently Used ers/Predictors

Tidy up workflows
Workflows can easily become complex and difficult to understand

Metanodes and Components

Tidy up workflows
Metanodes and components can help tidy up a workflow by encapsulating nodes that perform common operations

Components
Steps to build a component or a metanode:
- Select the related nodes that you want to group
- Right-click
- Select Create Component or Create Metanode
- Give it a name
Components have more sophisticated features:
- Encapsulate flow variables, i.e. the parameters only live inside the component
- Provide a configuration window: variables and parameters within the component can be edited via right-click, Configure
- Build a composite view: visualizations inside the component can be grouped in a dashboard

Submenu Component
Right-click the component and select Setup from the Component submenu to access further customization settings, such as the component name and the ports

Inside a component
Shortcut: Ctrl + double-click on a component to open its content

Components Configuration Window
- Components can be configurable
- From the configuration window (right-click, Configure) the user can enter some parameters
- The entered parameters change the behaviour of the nodes inside the component

Components Composite View
- The visualization nodes within the component can be organized to build an interactive composite view
- You can organize and reshape the node views from the Visual Layout window (from inside the component, last icon on the toolbar)

Composite views interactivity
Enable publication of and subscription to selection events to make the composite view interactive: data selected in one view are highlighted in the others
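The publish/subscribe idea behind this interactivity can be sketched in plain Python. This is a conceptual model, not KNIME's implementation. `SelectionBus` and `View` are hypothetical classes, and the `publishes` / `subscribes` flags stand in for the checkboxes that enable publication and subscription in each view.

```python
class SelectionBus:
    """Hypothetical event channel shared by the views of one composite view."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, source, row_ids):
        for callback in self._subscribers:
            callback(source, row_ids)


class View:
    def __init__(self, name, bus, publishes=True, subscribes=True):
        self.name = name
        self.highlighted = set()
        self._bus = bus
        self._publishes = publishes
        if subscribes:
            bus.subscribe(self._on_selection)

    def select(self, row_ids):
        """Select rows in this view and, if enabled, tell the other views."""
        self.highlighted = set(row_ids)
        if self._publishes:
            self._bus.publish(self, self.highlighted)

    def _on_selection(self, source, row_ids):
        # Ignore our own events; mirror selections made in other views.
        if source is not self:
            self.highlighted = set(row_ids)


bus = SelectionBus()
scatter = View("scatter plot", bus)
table = View("table", bus)
isolated = View("bar chart", bus, subscribes=False)

scatter.select({1, 3})
print(table.highlighted)     # {1, 3}
print(isolated.highlighted)  # set()
```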

KNIME Hub

KNIME Hub
A place to share knowledge about workflows and nodes
https://hub.knime.com

KNIME Hub
- Workflows
- Nodes, Shared Components and Extensions

KNIME Hub Spaces
- Private Space: your personal space. Upload your workflows and components here (max 1GB) to have them always available in a central place
- Public Space: shared with the KNIME community. Everyone can find and download its content from the KNIME Hub

Downloading and importing a workflow from the KNIME Hub
Searching for the tag "theguidebook" will show you all the workflows related to this book

Downloading and importing a workflow from the KNIME Hub
Method 1: download the workflow, locate it on your machine and import it as seen before

Downloading and importing a workflow from the KNIME Hub
Method 2: drag and drop the icon directly into the KNIME Explorer at the desired location

KNIME Cheat Sheets
https://www.knime.com/cheat-sheets

KNIME Books
e-book downloads from KNIME Press
https://www.knime.com/knimepress
with code: Promotion-Code

Free e-Learning Courses about KNIME Analytics rses

Build your first "Hello" Workflow

Create your first workflow
- Right-click the LOCAL folder in the KNIME Explorer and select New KNIME Workflow
- In the pop-up window, enter the name of your first workflow

Read the dataset
- Drag and drop the File Reader node from the Node Repository to add it to the workflow
- Open the configuration window (double-click) and select the file on your machine containing the adult dataset

Remove columns
Some columns have unnecessary information. Remove them with a Column Filter node

Remove Rows
Add a Row Filter node and configure it to only keep entries whose "native-country" value is not "United-States"

Write to new file
Finally, add a CSV Writer node to the pipeline. Configure and execute it to write the transformed dataset to a new file
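For readers who think in code, the four steps of this first workflow (read, filter columns, filter rows, write) can be mirrored in plain Python. This is a rough sketch, not a KNIME API. The sample rows are made up in the shape of the adult dataset, dropping `fnlwgt` is only an assumed example of an "unnecessary" column, and the output goes to an in-memory string instead of a file so the example runs anywhere.

```python
import csv
import io

# Made-up sample rows; the real adult dataset has more columns and rows.
SAMPLE = """age,workclass,fnlwgt,education,native-country
39,State-gov,77516,Bachelors,United-States
28,Private,338409,Bachelors,Cuba
49,Private,160187,9th,Jamaica
"""


def read_csv(text):
    """File Reader step: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def column_filter(rows, drop):
    """Column Filter step: remove the named columns from every row."""
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def row_filter(rows, column, excluded):
    """Row Filter step: keep rows whose value in `column` is not `excluded`."""
    return [row for row in rows if row[column] != excluded]


def write_csv(rows, fieldnames):
    """CSV Writer step: serialize the rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


table = read_csv(SAMPLE)
table = column_filter(table, drop={"fnlwgt"})
table = row_filter(table, "native-country", "United-States")
output = write_csv(table, ["age", "workclass", "education", "native-country"])
print(output)
```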

Annotations
- Annotations are coloured editable boxes that you can add to your workflow
- They help make it more readable and visually pleasant
- Click the icon in the upper left corner to customize the text and appearance of an annotation
- Right-click anywhere on your workflow and select New Workflow Annotation from the context menu

Thank you
