Performance test scenario 4

Result table

Exec times | Java | OS | Hardware | User | Note
TOS 2.4.0
See graph below | 1.6.0_03-b05 | Ubuntu 7.10 | Sony Vaio laptop 2GB, Intel Core 2 Duo T7100@1.8GHz, HD 5400 rpm | amaumont |

:performances:scenario_4-05.png

We can notice several things about this graph:
- 'Lookup Memory' times are 35% to 50% faster than the worst 'Lookup Store on disk' times, which occur when the 'Max rows buffer' property is set to 10 million rows.
- 'Lookup Memory' times are comparable to the best 'Lookup Store on disk' times, which occur when the 'Max rows buffer' property is set to around 200,000 rows.
- The 'Lookup Store on disk' curves seem to converge at a given number of rows; beyond that number of rows, “Max rows buffer” may no longer have any effect. This could be explained by too many files being generated, which would slow down the process.

Overview

In this test scenario, we read a main source and a lookup data source, each containing from 1,000,000 to 20,000,000 lines. For each test, both data sources have the same line count, and their labels differ slightly.

:performances:scenario_4-01.png

Configuration

:performances:scenario_4-02.png

Configuration of the advanced property “Max buffer size”:

:performances:scenario_4-03.png

This value corresponds approximately to the best value for this test.

The best value depends on several factors:
- Hard disk speed
- Processor speed
- Data size
- Number of rows to sort/write on disk
- Number of columns in each row
- Capacity of the OS to support a given number of open files
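
The last factor can be checked before running a test. The following is a minimal sketch, assuming a Unix-like OS such as the Ubuntu machine used here; the function name fits_open_file_limit and the reserve headroom of 64 descriptors are hypothetical and are not part of TOS.

```python
import resource  # standard library, available on Unix-like systems only

# Current per-process limits on open file descriptors (soft, hard).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_open_file_limit(temp_files: int, soft: int, reserve: int = 64) -> bool:
    """Return True if temp_files can be open at once under the soft limit.

    reserve is a hypothetical headroom for descriptors the process
    already uses (standard streams, libraries, logs, ...).
    """
    if soft == resource.RLIM_INFINITY:
        return True
    return temp_files + reserve <= soft


print("soft limit:", soft_limit, "hard limit:", hard_limit)
print("300 files fit:", fits_open_file_limit(300, soft_limit))
```

If the check fails, either the OS limit has to be raised or the buffer has to be enlarged so that fewer files are generated.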

Setting this value to 200,000 rows means that a new main data file, or two new lookup files, will be written for every 200,000 rows written into the buffer.
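
The buffering behaviour described above can be illustrated with a small sketch. It is not the TOS implementation: the function spill_in_chunks and the ".chunk" suffix are hypothetical, the rows are plain integers, and only one file per flush is written (as for the main source). Flushing the last, partially filled buffer is also an assumption.

```python
import os
import tempfile


def spill_in_chunks(rows, max_buffer_rows, directory):
    """Buffer rows and write one sorted file each time the buffer is full."""
    paths = []
    buffer = []

    def flush():
        if not buffer:
            return
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".chunk", dir=directory)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(f"{row}\n" for row in buffer)
        paths.append(path)
        buffer.clear()

    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_buffer_rows:
            flush()
    flush()  # assumption: the final, partially filled buffer is written too
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # 1,000 rows with a buffer of 200 rows: one file per 200 rows.
    chunk_count = len(spill_in_chunks(range(1000, 0, -1), 200, tmp))
    print(chunk_count)  # 5
```

A smaller buffer keeps each sort cheap but multiplies the number of files; a larger buffer does the opposite, which is the trade-off measured in this scenario.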

For a test with 20,000,000 rows in each source, the following numbers of files will therefore be generated:
⇒ Main files = 20,000,000 / 200,000 = 100 files
⇒ Lookup files = 2 files (key file + data file) * 20,000,000 / 200,000 = 200 files
So, in this case, 300 temporary files will be generated and then opened by the OS at the same time.
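
The same arithmetic can be reproduced for other row counts and buffer sizes. This is a minimal sketch: the function temp_file_counts is hypothetical, and rounding up for a last, partially filled buffer is an assumption that does not matter for the figures above, which divide evenly.

```python
def temp_file_counts(rows_per_source: int, max_buffer_rows: int) -> dict:
    """Estimate the temporary files generated for one main and one lookup source.

    Assumes one main file per buffer flush and two lookup files
    (key file + data file) per buffer flush.
    """
    if rows_per_source < 0 or max_buffer_rows <= 0:
        raise ValueError("rows must be >= 0 and buffer size must be > 0")
    # Integer ceiling division: a partially filled buffer still needs a file.
    flushes = (rows_per_source + max_buffer_rows - 1) // max_buffer_rows
    main_files = flushes
    lookup_files = 2 * flushes
    return {
        "main": main_files,
        "lookup": lookup_files,
        "total": main_files + lookup_files,
    }


print(temp_file_counts(20_000_000, 200_000))
# {'main': 100, 'lookup': 200, 'total': 300}
```

With the 10 million rows setting mentioned in the results, the same test would generate only 2 main files and 4 lookup files.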

For now, we can't set a different “Max buffer size” for each source, but in the near future we will add a feature to adjust this value automatically. This auto-adjustment could have a limit, however: the result graph shows that the best and worst curves seem to converge at a given number of rows. I will check this later.

The graph below shows that all the curves drop at around 175,000 to 200,000 rows for “Max rows buffer”:
:performances:scenario_4-04.png

 
performances/scenario_4.txt · Last modified: 2011/12/17 12:52 (external edit)
 
 