Benchmark: Talend Open Studio vs Pentaho Data Integrator


Benchmark: Talend Open Studio vs Pentaho Data Integrator (aka Kettle)
V0.23
MarcRussel@gmail.com
Last modified: 2007-07-31

Table of contents
Environment
Test 1: Text Input file → Text Output file
Test 2: Text Input file → XML Output file
Test 3: Text Input file → MySQL Output table
Test 4: Text Input file → Transformation → Text Output file
Test 5: Test 4 + Lookup
Test 6: Test 5 + Output filter
Test 7: Test 6 + Aggregation
Appendix 1: Transformation step/component

Environment
Comparison benchmarks were performed on TOS 2.1.0RC1 and TOS 2.1.r4725 versus PDI/Kettle 2.5.0 and PDI 3.0.0M1. TOS 2.1.r4725 and PDI 3.0.0M1 show overall performance improvements.
Some test results are missing for PDI 3.0.0M1 because a functional bug in one component prevented some of the tests from running properly.
Tests were carried out using files of 10,000 lines, 100,000 lines and 5 million lines. Tests with 10,000 and 100,000 records were executed four times and the best of the four results was retained, whereas tests with 5 million records were executed only once.
Execution time was measured with a precision of 0.1 s for PDI and 1 ms for TOS.

Hardware configuration:
– JVM: 1.5.0_12
– OS: Windows XP SP2
– CPU: Intel Core2 Duo T5200 @ 1.60 GHz
– RAM: 1 GB

Figure 1: schema in TOS Job Designer
Figure 2: schema in PDI Spoon
Figure 3: Chart callout list (TOS 2.1.0RC1, TOS 2.1.r4725, PDI 2.5.0, PDI 3.0.0M1)
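The measurement protocol above (best of four runs for the smaller files, a single run for 5 million lines) can be sketched as a small timing harness. This is a hypothetical illustration, not the tooling used for this benchmark, which the report does not describe; `job` stands for any callable that runs one job and returns the number of rows it processed.

```python
import time
from typing import Callable, Tuple

def best_of(job: Callable[[], int], runs: int = 4) -> Tuple[float, int]:
    """Run `job` `runs` times; return (fastest wall-clock time in ms, rows processed).

    `job` is a hypothetical callable that executes one benchmark job and
    returns how many rows it handled. Use runs=1 for the 5 million line files.
    """
    best_ms = float("inf")
    rows = 0
    for _ in range(runs):
        start = time.perf_counter()
        rows = job()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best_ms = min(best_ms, elapsed_ms)  # keep only the best result
    return best_ms, rows
```

Taking the minimum rather than the mean discards runs slowed by one-off effects such as a cold disk cache; the report does not state its own rationale for the rule.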

Test 1: Text Input file → Text Output file
Job description
Reading X lines from a source file and writing them into a target file.
Figure 4: Test 1 with TOS
Figure 5: Test 1 with PDI
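Stripped of either tool, the Test 1 job amounts to a streaming line copy. A minimal Python sketch, assuming UTF-8 input; `copy_lines` and `copy_text_file` are hypothetical names, and this is not how TOS or PDI implement the job internally.

```python
from pathlib import Path
from typing import Callable, Iterable

def copy_lines(lines: Iterable[str], write: Callable[[str], object]) -> int:
    """Pass every input line through unchanged; return the number of lines copied."""
    count = 0
    for line in lines:
        write(line)
        count += 1
    return count

def copy_text_file(source: Path, target: Path) -> int:
    """Stream `source` into `target` one line at a time (UTF-8 assumed).

    newline="" keeps the original line endings untouched.
    """
    with source.open("r", encoding="utf-8", newline="") as src, \
         target.open("w", encoding="utf-8", newline="") as dst:
        return copy_lines(src, dst.write)
```

Reading and writing one line at a time keeps memory use constant whether the input holds 10,000 or 5 million lines.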

Test Results (5,000,000 lines)
                        TOS 2.1.0RC1   TOS r4725   PDI 2.5     PDI 3.0M1
Exec time (ms)          233,141        187,594     1,025,300   743,400
Rows / ms               21.45          26.65       4.88        6.73
Ratio vs TOS 2.1.0RC1   0%             +24%        -77%        -69%

Performance chart: rows/ms for 10,000, 100,000 and 5,000,000 lines (the longer the stack, the better the result).
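The two derived rows of these result tables (rows per millisecond, and the ratio against TOS 2.1.0RC1) follow directly from the raw execution times. A minimal sketch using the Test 1 execution times reported for 5,000,000 lines; the function names are hypothetical.

```python
def rows_per_ms(rows: int, elapsed_ms: float) -> float:
    """Throughput as reported in the "Rows / ms" row."""
    return rows / elapsed_ms

def ratio_vs_baseline(value: float, baseline: float) -> float:
    """Relative change in percent against the TOS 2.1.0RC1 throughput."""
    return (value / baseline - 1.0) * 100.0

# Test 1 execution times (ms) for 5,000,000 lines, per tool version
ROWS = 5_000_000
EXEC_MS = {
    "TOS 2.1.0RC1": 233_141,
    "TOS r4725": 187_594,
    "PDI 2.5": 1_025_300,
    "PDI 3.0M1": 743_400,
}

baseline = rows_per_ms(ROWS, EXEC_MS["TOS 2.1.0RC1"])
for version, ms in EXEC_MS.items():
    throughput = rows_per_ms(ROWS, ms)
    ratio = ratio_vs_baseline(throughput, baseline)
    print(f"{version:<13} {throughput:6.2f} rows/ms {ratio:+4.0f}%")
```

Running it reproduces the published figures: 21.45, 26.65, 4.88 and 6.73 rows/ms, and +24%, -77% and -69% against the baseline.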

Test 2: Text Input file → XML Output file
Job description
Reading X lines from a source file and writing them into a target file, following an XML syntax.
Figure 6: Test 2 with TOS
Figure 7: Test 2 with PDI
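Writing XML for millions of rows is usually done as a stream of fragments rather than by building a document tree in memory. A minimal sketch of that approach; the ';' separator, the `<rows>`/`<row>` element names and the column names are assumptions, since the report does not specify the XML layout.

```python
from typing import Iterable, Iterator, List
from xml.sax.saxutils import escape

def rows_to_xml(lines: Iterable[str], columns: List[str], sep: str = ";") -> Iterator[str]:
    """Yield an XML document piece by piece: one <row> element per input line.

    Element names, column names and the separator are hypothetical.
    """
    yield '<?xml version="1.0" encoding="UTF-8"?>\n<rows>\n'
    for line in lines:
        values = line.rstrip("\r\n").split(sep)
        # escape() turns &, < and > into entities so field values cannot break the markup
        fields = "".join(f"<{col}>{escape(val)}</{col}>" for col, val in zip(columns, values))
        yield f"  <row>{fields}</row>\n"
    yield "</rows>\n"
```

The generator can be passed straight to a file object's `writelines`, so only one row's markup exists in memory at a time.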

Test Results (…000 lines)
                        TOS 2.1.0RC1   TOS r4725   PDI 2.5   PDI 3.0M1
Exec time (ms)          …              …           …         …
Rows / ms               9.23           10.15       4.57      3.02
Ratio vs TOS 2.1.0RC1   0%             +10%        -51%      -67%

Performance chart: rows/ms for 10,000, 100,000 and 5,000,000 lines.

Test 3: Text Input file → MySQL Output table
Job description
Reading X lines from a source file and writing them into a MySQL table, committing every 100 lines.
Figure 8: Test 3 with TOS
Figure 9: Test 3 with PDI
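The commit interval is the distinctive setting in Test 3. A minimal sketch of the commit-every-100-rows pattern, using Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the `customers(id, name)` table and the ';' separator are assumptions. With a MySQL DB-API driver the structure is the same, but the placeholder style is `%s` instead of `?`.

```python
import sqlite3
from typing import Iterable, Tuple

COMMIT_EVERY = 100  # commit interval used in Test 3

def load_rows(conn: sqlite3.Connection, lines: Iterable[str], sep: str = ";") -> Tuple[int, int]:
    """Insert delimited lines into a hypothetical `customers(id, name)` table.

    Commits after every COMMIT_EVERY rows and once more for the final partial
    batch. Returns (rows inserted, commits issued).
    """
    cur = conn.cursor()
    total = pending = commits = 0
    for line in lines:
        id_, name = line.rstrip("\r\n").split(sep)[:2]
        cur.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (int(id_), name))
        total += 1
        pending += 1
        if pending == COMMIT_EVERY:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the last, incomplete batch
        commits += 1
    return total, commits
```

A larger interval means fewer transaction round trips but more work lost if a batch fails.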

Test Results (…00,000 lines)
                        TOS 2.1.0RC1   TOS r4725   PDI 2.5   PDI 3.0M1
Exec time (ms)          …              …           …         …46,500
Rows / ms               1.18           1.3         1.12      1.33
Ratio vs TOS 2.1.0RC1   0%             …           -5%       +13%

Performance chart: rows/ms for 10,000, 100,000 and 5,000,000 lines.

Test 4: Text Input file → Transformation → Text Output file
Job description
Reading X lines from a source file and carrying out the following transformations:
– adding a surrogate key column (sequence)
– id → id * 7
– name → firstname + ' ' + lastname
– addr → uppercase(addr)
Writing the transformation output into a target text file.
Figure 10: Test 4 with TOS
Figure 11: Test 4 with PDI
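The four mappings listed above translate directly into per-row code. A sketch in Python, not the tMap or JavaScript step configuration shown in Appendix 1; it assumes each input row has already been parsed into a dictionary keyed by column name, and the output key `sk` and the sequence start value of 1 are hypothetical.

```python
from itertools import count
from typing import Dict, Iterable, Iterator

def transform(rows: Iterable[Dict[str, str]], start: int = 1) -> Iterator[Dict[str, object]]:
    """Apply the Test 4 mappings to each input row.

    Input keys (id, firstname, lastname, addr) follow the job description;
    the surrogate key name `sk` and its start value are assumptions.
    """
    seq = count(start)  # sequence that feeds the surrogate key
    for r in rows:
        yield {
            "sk": next(seq),
            "id": int(r["id"]) * 7,
            "name": f"{r['firstname']} {r['lastname']}",
            "addr": r["addr"].upper(),
        }
```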

Test Results (5,000,000 lines)
                        TOS 2.1.0RC1   TOS r4725   PDI 2.5   PDI 3.0M1
Exec time (ms)          …              …           …         error
Rows / ms               21.8           33.98       3.52      error
Ratio vs TOS 2.1.0RC1   0%             +56%        -84%      error

Performance chart: rows/ms for 10,000, 100,000 and 5,000,000 lines.

Test 5: Test 4 + Lookup
Job description
Reading X lines from a source file, carrying out the transformations specified in Test 4, and looking up the state name in a MySQL table using the code state column. The transformation output is then written into a target file.
Notes
The lookup table is cached in PDI.
Lookup table size: 1,296 rows.
Figure 12: Test 5 with TOS
Figure 13: Test 5 with PDI
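Caching the lookup table, as the note says PDI does, means its 1,296 rows are read once and each input row is then matched in memory rather than with one query per row. A minimal sketch of that approach, with `sqlite3` standing in for MySQL; the `states(code, name)` table and the `code_state`/`state_name` keys are hypothetical. The report does not say whether TOS caches the lookup as well.

```python
import sqlite3
from typing import Dict, Iterable, Iterator

def load_state_cache(conn: sqlite3.Connection) -> Dict[str, str]:
    """Read the whole lookup table once into a dict (code -> state name)."""
    return dict(conn.execute("SELECT code, name FROM states"))

def enrich(rows: Iterable[Dict[str, object]], states: Dict[str, str]) -> Iterator[Dict[str, object]]:
    """Attach the state name to each row with an in-memory lookup."""
    for r in rows:
        # dict.get yields None for codes missing from the table (left outer join semantics)
        yield {**r, "state_name": states.get(r["code_state"])}
```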

Test Results (5,000,000 lines)
                        TOS 2.1.0RC1   TOS r4725   PDI 2.5   PDI 3.0M1
Exec time (ms)          …              …           …         error
Rows / ms               23.45          28.58       3.73      error
Ratio vs TOS 2.1.0RC1   0%             +22%        -84%      error

Performance chart: rows/ms for 10,000, 100,000 and 5,000,000 lines.

Test 6: Test 5 + Output filter
Job description
Reading X lines from a source file, carrying out the transformations specified in Test 5, then filtering the output (code state matching 'FR') and writing the filtered rows into a first target text file.
The main output flow is written into a second target text file.
Figure 14: Test 6 with TOS
Figure 15: Test 6 with PDI
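Test 6 adds a second output to the same flow. A sketch of the routing, assuming rows are dictionaries with a hypothetical `code_state` key and that the main flow carries every row; the job description does not say whether rows matching 'FR' are also kept in the main output. The two writers are passed in as callables, so the same function works with in-memory lists or with any per-row output.

```python
from typing import Callable, Dict, Iterable, Tuple

def split_flow(
    rows: Iterable[Dict[str, str]],
    write_filtered: Callable[[Dict[str, str]], object],
    write_main: Callable[[Dict[str, str]], object],
    code: str = "FR",
) -> Tuple[int, int]:
    """Route rows whose code_state equals `code` to the filtered output.

    Assumption: the main flow receives every row, filtered or not.
    Returns (rows in filtered output, rows in main output).
    """
    filtered = main = 0
    for r in rows:
        if r["code_state"] == code:
            write_filtered(r)
            filtered += 1
        write_main(r)
        main += 1
    return filtered, main
```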

Test Results (5,000,000 lines)
                        TOS 2.1.0RC1   TOS r4725   PDI 2.5   PDI 3.0M1
Exec time (ms)          …              …           …         error
Rows / ms               25.25          40.85       3.6       error
Ratio vs TOS 2.1.0RC1   0%             +62%        -86%      error

Performance chart: rows/ms for 10,000, 100,000 and 5,000,000 lines.

Test 7: Test 6 + Aggregation
Job description
Reading X lines from a source file and carrying out the transformations specified in Test 6. The main output flow is aggregated on the code state column, and the SUM, MAX, MIN and AVG functions are applied to the id column.
Notes
To reduce memory use, PDI sorts the rows in batches of 400,000 lines and writes each sorted batch to a file.
In PDI, rows must be sorted before aggregation.
Figure 16: Test 7 with TOS
Figure 17: Test 7 with PDI
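The notes above reflect two common ways of computing a GROUP BY. Hash aggregation makes one pass and keeps one accumulator per distinct code state; sort-based aggregation, which the notes say PDI requires, first sorts on the grouping key and then folds runs of adjacent rows. The sketch below shows both, with the sort done in chunks of 400,000 rows and merged with `heapq.merge`. Rows are assumed to be `(code_state, id)` tuples, and unlike PDI, which writes each sorted chunk to a file, this simplified version keeps the chunks in memory. The report does not say which strategy TOS uses.

```python
import heapq
from itertools import groupby, islice
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]                # (code_state, id), assumed row shape
Stats = Tuple[int, int, int, float]  # SUM, MAX, MIN, AVG of id

def hash_aggregate(rows: Iterable[Row]) -> Dict[str, Stats]:
    """Single pass, no sorting: one [sum, max, min, count] accumulator per code."""
    acc: Dict[str, List[int]] = {}
    for code, v in rows:
        a = acc.setdefault(code, [0, v, v, 0])
        a[0] += v
        a[1] = max(a[1], v)
        a[2] = min(a[2], v)
        a[3] += 1
    return {code: (s, hi, lo, s / n) for code, (s, hi, lo, n) in acc.items()}

def chunked_sort(rows: Iterable[Row], chunk_size: int = 400_000) -> Iterator[Row]:
    """Sort fixed-size chunks, then k-way merge them into one sorted stream.

    Simplification: chunks stay in memory; PDI writes each sorted chunk to a file.
    """
    it = iter(rows)
    chunks: List[List[Row]] = []
    while chunk := sorted(islice(it, chunk_size)):
        chunks.append(chunk)
    return heapq.merge(*chunks)

def sorted_aggregate(sorted_rows: Iterable[Row]) -> Dict[str, Stats]:
    """Fold runs of equal code_state. groupby only merges adjacent rows,
    so unsorted input would split one code into several groups."""
    result: Dict[str, Stats] = {}
    for code, group in groupby(sorted_rows, key=itemgetter(0)):
        total = n = 0
        hi = lo = None
        for _, v in group:
            total += v
            n += 1
            hi = v if hi is None else max(hi, v)
            lo = v if lo is None else min(lo, v)
        result[code] = (total, hi, lo, total / n)
    return result
```

Both paths return the same SUM, MAX, MIN and AVG per code state; the sort-based path adds an O(n log n) sort before its single aggregation pass.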

Test Results (5,000,000 lines)
                        TOS 2.1.0RC1   TOS r4725   PDI 2.5   PDI 3.0M1
Exec time (ms)          …              …           …         error
Rows / ms               35.54          79.78       2.48      error
Ratio vs TOS 2.1.0RC1   0%             +124%       -93%      error

Performance chart: rows/ms for 10,000, 100,000 and 5,000,000 lines.

Appendix 1: Transformation step/component
Figure 18: Test 4 tMap component details (TOS)
Figure 19: Test 4 JavaScript transformation step details (PDI)
