You are not logged in.
Announcement
Unanswered posts
|
Pages: 1
Hi,
I've used tPostgresql and tFileOutputDelimited but when I start job I've got error that Java heap memory is too small. I have a feeling that job try to first select all rows to memory and after write it to a file. My table has several milions of rows and that is why it si imposible to select all data to memory. What is typical solution in 'telend' to export such number of rows?
Offline
The usual way to do this with TOS is tPostgresqlInput --row--> tFileOutputDelimited. I'm not sure for Java generated script, but theoretically, there should be no in memory buffer before writing to file (with Perl generated script, I'm sure there should be no memory problem due to the script).
Your question makes me think you should make a feature request on a new component dumping a table content to a file using the COPY PostgreSQL specific statement.
Offline
I've analyzed generated code and I've found:
java.sql.Statement stmt_tPostgresqlInput_1 = conn_tPostgresqlInput_1.createStatement();
java.sql.ResultSet rs_tPostgresqlInput_1 = stmt_tPostgresqlInput_1.executeQuery("SELECT cdr.\"cdr_id\", cdr.\"id_group_from\", cdr.\"id_group_to\", cdr.\"call_history\" FROM accounting.cdr");
because you do not use cursor, JDBC tries read all rows to memory.
I've chenged it to:
conn_tPostgresqlInput_1.setAutoCommit(false);
java.sql.Statement stmt_tPostgresqlInput_1 = conn_tPostgresqlInput_1.createStatement(ResultSet.TYPE_FORWARD_ONLY,ResultSet.CONCUR_READ_ONLY);
stmt_tPostgresqlInput_1.setFetchSize(100);
java.sql.ResultSet rs_tPostgresqlInput_1 = stmt_tPostgresqlInput_1.executeQuery("SELECT cdr.\"cdr_id\", cdr.\"id_group_from\", cdr.\"id_group_to\", cdr.\"call_history\" FROM accounting.cdr");
and now it export data (several milions o rows) without problem. But question is how to achive the same result without changing generated code?
Offline
szymic1 wrote:
and now it export data (several milions o rows) without problem. But question is how to achive the same result without changing generated code?
You can't.
Immediate solution is to duplicate current tPostgresqlInput into tPostgresqlInputSzymic1 with the enhancement you suggest (and publish it on our ecosystem).
Then you add a bug notification in the bugtracker with a link to this topic. You've made a good job in code generation analyze that will help a lot Java component developers. Your modification should be added for TOS 2.3.0 (or even next 2.2.x, depending on the code modification impact).
Offline
I have to say that I little disapointed
I thought that Talend is solution for massive data transformation. I thought I can use it to prepare data to our database warehouse. By the way does this problem exist also in Perl ?
Offline
szymic1 wrote:
I have to say that I little disapointed
I thought that Talend is solution for massive data transformation.
Yes, TOS is a solution for massive data transformation. You've just encountered a problem, given a solution, that will be added in a future release I think. As an opensource software, we're also expecting a "real life" feedback from our users to make the solution better and better.
szymic1 wrote:
By the way does this problem exist also in Perl ?
I've made a test with a 1M rows table with 5 columns:
talend=# select * from feature2567 limit 10; id | firstname | lastname | age | weight ----+------------+------------+-----+-------- 2 | YBZ9ZCXLH8 | be6m4y3sns | 94 | 73 4 | E6YKMSICEO | 25oxzpmjyq | 3 | 46 5 | 0GNYXMIOTI | hyzk6cdx6s | 49 | 57 6 | IWHMW2BELR | jf87wsb2lp | 58 | 19 7 | FORRMJSHYT | oquwqvfy31 | 83 | 13 8 | GGEULHEARC | o1my61ktdc | 78 | 39 9 | KXKLJMDE5G | ts0ndyhexd | 77 | 77 10 | RZDM1IZVBG | zu5gyr4mk4 | 23 | 75 11 | GOENZCJNV1 | u6hffiuiss | 92 | 71 13 | DA9BYY53PX | ftv7hcr7fg | 4 | 84 (10 rows)
The Perl script takes 86MB in memory when selecting 1M rows. If I had a limit on 500k first rows, the memory usage is divided by 2 (and with other limits, memory usage seems linear). The output file weights 34MB. I don't know if DBI (JDBC equivalent in the Perl world) stores all rows in memory, but the more rows the bigger memory usage is.
Offline
plegall wrote:
Yes, TOS is a solution for massive data transformation. You've just encountered a problem, given a solution, that will be added in a future release I think. As an opensource software, we're also expecting a "real life" feedback from our users to make the solution better and better.
I see that you are open to new ideas and I know that you can't develop everything what is needed for all cases and for all users ,but I still think you have made very interesting software, especially for this price ![]()
If I find a time I will add notification to your bugtracker. I've worked almost one year for Java project (and we use mainly Hibernate) where we process/generate milions of rows and I've got many interesting observation regarding DB performance in Java/Hibernate/PLSQL but is another story ![]()
Offline
bboutros has added a feature request for Java components [Bugtracker, feature 3251, fixed] Enable the use of cursor for all Database tXXXXXinput componant
plegall (me) has added a feature request for Perl components [Bugtracker, feature 3252, open] Reduce memory usage on tdb*Input components
szymic1, your feedback is welcomed about finding ways to improve performances on database connections.
Offline
Pages: 1