I'm working on getting data from very large fixed-length text files and I would like to load them into MySQL. I have several of these files, each about 28GB. I found "Talend Open Studio for Big Data" and I would like to integrate this tool into my solution. At this point I'm trying to determine whether it is possible to load such a big file using Talend.
Any thoughts, recommendations and/or advice are welcome.
Welcome to Talend Community!
Talend Open Studio for Big Data mainly targets HDFS, Hive and HBase, and extends capabilities for Pig.
Based on your description, Talend Open Studio for Data Integration would be a better fit for this ETL job.
Talend's MySQL components support bulk loading and handle large data volumes well.
There are several features you can enable for better performance.
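Outside of Talend, a rough sketch of the same idea the bulk-load components implement: stream the fixed-width file line by line (so a 28GB file never has to fit in memory), convert it to tab-delimited, and let MySQL's LOAD DATA do the fast insert. The file names, field lengths and table name below are assumptions for illustration, not values from this thread.

```python
# Stream-convert a fixed-width file to tab-delimited for MySQL bulk load.
# Processing one line at a time keeps memory use constant regardless of
# file size. Field lengths here are illustrative only.

def to_delimited(src, dst, lengths, sep="\t"):
    """Convert fixed-width records in src into delimited records in dst."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            pos, fields = 0, []
            for n in lengths:
                fields.append(line[pos:pos + n].strip())
                pos += n
            fout.write(sep.join(fields) + "\n")

# Then, in MySQL (assuming a table already exists for the data):
#   LOAD DATA LOCAL INFILE 'data.tsv' INTO TABLE my_table
#   FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
```

This is essentially what Talend's MySQL bulk components automate: write an intermediate file, then hand it to the server's bulk loader instead of inserting row by row.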
I installed TOSDI and I'm working on the job to load my data. I have this huge file with 184 fields. I started experimenting with the tFileInputPositional component. I checked the basic settings and entered the field lengths in the pattern. When I ran the job it worked flawlessly, but on closer inspection the data was truncated according to the schema definition, which makes sense.
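A minimal sketch of how positional parsing behaves: each field is cut out of the record by its declared length, so if the schema declares fewer (or shorter) fields than the record actually has, the remainder is silently dropped — the same truncation described above. The record and field lengths here are made up for illustration.

```python
def parse_fixed_width(line, lengths):
    """Slice one record into fields using a list of field lengths."""
    fields, pos = [], 0
    for n in lengths:
        fields.append(line[pos:pos + n].rstrip())
        pos += n
    return fields

record = "JOHN      DOE       19800101"
lengths = [10, 10, 8]

print(parse_fixed_width(record, lengths))      # ['JOHN', 'DOE', '19800101']
# If the schema declares only the first two fields,
# the tail of the record is silently lost:
print(parse_fixed_width(record, lengths[:2]))  # ['JOHN', 'DOE']
```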
I need to change this schema definition and add my 184 fields, but doing that manually would take forever, so I noticed there is an import-from-XML feature (can you guess this is my first time working with Talend?). I created a formula in Excel that replicates the XML format in which the schema is exported, then copied the column, pasted it into a text editor, and saved it as XML. When I load my brand-new file, the schema window goes blank, and I don't see any error message. Do I need to load the schema manually?
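A blank schema window after import usually suggests the XML doesn't match the structure Talend expects. Rather than hand-building it in Excel, one approach is to export a small two-column schema from Talend first, inspect its exact tags and attributes, and then generate the full 184-field file programmatically. The sketch below uses placeholder element and attribute names ("schema", "column", "label", "length") — these are assumptions, not Talend's actual format; mirror the tags, attributes and encoding from your own exported sample.

```python
# Generate a column-definition XML from a list of (name, length) pairs.
# NOTE: the element/attribute names here are placeholders -- export a
# small schema from Talend first and copy its exact structure, since a
# mismatch typically makes the import silently show nothing.

import xml.etree.ElementTree as ET

def build_schema_xml(fields):
    """fields: list of (name, length) tuples -> XML string."""
    root = ET.Element("schema")
    for name, length in fields:
        ET.SubElement(root, "column", {"label": name, "length": str(length)})
    return ET.tostring(root, encoding="unicode")

print(build_schema_xml([("CUSTOMER_ID", 10), ("NAME", 40)]))
```

Generating the file this way also avoids Excel/text-editor pitfalls such as smart quotes, a stray BOM, or a wrong encoding declaration, any of which could make the import fail silently.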