Hi! Is there any way to test new data sets (other than those already provided as examples) with the Talend Big Data sandbox? Where can I get them? Can I only upload data sets with the .TSV extension to the sandbox, or can files with other extensions also be used?
Last edited by sk_19 (2014-08-28 06:29:29)
You can FTP any dataset you would like to the Sandbox. There are no limits. To FTP a file, use the usernames and passwords listed in the cookbook. Please keep in mind that the Sandbox is a single-node Hadoop virtual machine, so it will not be representative of the performance you can get from Hadoop. For that you would need a much larger cluster of 3 or more servers. Hortonworks, Cloudera and MapR would all recommend the same, if not even more nodes.
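If you would rather script the upload than use an FTP client, here is a minimal sketch using Python's standard `ftplib`. The host name, user, password, and file name in the commented usage are placeholders I made up; take the real values from the cookbook. The helper name `upload_dataset` is also just illustrative.

```python
from ftplib import FTP
from pathlib import Path


def upload_dataset(ftp, local_path, remote_name=None):
    """Upload one local file over an already logged-in FTP session.

    The file is sent in binary mode so compressed or non-text data
    arrives unchanged. It lands in the session's current remote directory.
    """
    local_path = Path(local_path)
    # Default to the local file name if no remote name is given.
    remote_name = remote_name or local_path.name
    with local_path.open("rb") as fh:
        # storbinary returns the server's final response string.
        return ftp.storbinary(f"STOR {remote_name}", fh)


# Usage (placeholders, NOT real Sandbox values; see the cookbook):
#
#   with FTP("sandbox.example.local") as ftp:
#       ftp.login(user="SANDBOX_USER", passwd="SANDBOX_PASSWORD")
#       print(upload_dataset(ftp, "my_dataset.csv"))
```

The function takes an open session rather than credentials so you can upload several files over one connection.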
Also, you can use any file format you would like. For example, the Twitter example uses JSON, and the Data Warehouse example works on compressed files. There is also an example that processes the Apache web log format.
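If you want to sanity-check your own file locally before uploading it, a rough Python sketch along these lines may help. It is not how the Sandbox examples themselves work (those are Talend jobs). It assumes gzip compression, one JSON object per line, and the Apache Common Log Format; your files may differ, and all function names here are made up for illustration.

```python
import gzip
import json
import re

# Assumed layout (Apache Common Log Format):
# host ident user [time] "request" status size
CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def read_lines(path):
    """Yield text lines from a file, decompressing it if it ends in .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield line.rstrip("\n")


def parse_json_lines(lines):
    """Parse one JSON object per non-empty line (an assumed layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def parse_log_line(line):
    """Return the fields of one Common Log Format line, or None if no match."""
    match = CLF.match(line)
    if match is None:
        return None
    host, _ident, _user, time, request, status, size = match.groups()
    return {
        "host": host,
        "time": time,
        "request": request,
        "status": int(status),
        # Apache writes "-" when no body bytes were sent.
        "size": 0 if size == "-" else int(size),
    }
```

If a line fails to parse here, it is worth checking the file before loading it into the Sandbox.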