



Important! This site has been replaced. All content here is read-only. Please visit our brand-new community at https://community.talend.com/. We look forward to hearing from you there!



#1 2014-03-24 01:36:31

Glasswalker
Member
4 posts

Glasswalker said:

tExtractJSONFields not getting any rows. Help?

Hey, I'm new to Talend. I've used other ETL tools in the past, and I'm getting my head around this one with a personal "fun" project (using it to familiarize myself with the tool).

So I'm processing public Bitcoin blockchain data to analyze the financial transactions...

Here is some example JSON:

{"blocks":[{"hash":"000000000003ba27aa200b1cecaad478d2b00432346c3f1f3986da1afd33e506","ver":1,"prev_block":"000000000002d01c1fccc21636b607dfd930d31d01c3a62104612a1719011250","mrkl_root":"f3e94742aca4b5ef85488dc37c06c3282295ffec960994b2c0d5ac2a25a95766","time":1293623863,"bits":453281356,"fee":0,"nonce":274148111,"n_tx":4,"size":957,"block_index":107177,"main_chain":true,"height":100000,"tx":[{"time":1293623863,"inputs":[[]],"vout_sz":1,"relayed_by":"0.0.0.0","hash":"8c14f0db3df150123e6f3dbbf30f8b955a8249b62ac1d1ff16284aefa3d06d87","vin_sz":1,"tx_index":238294,"ver":1,"out":[{"n":0,"value":5000000000,"addr":"1HWqMzw1jfpXb3xyuUZ4uWXY4tqL2cW47J","tx_index":238294,"type":0}],"size":135},{"time":1293623863,"inputs":[{"prev_out":{"n":0,"value":1000000,"addr":"1JxDJCyWNakZ5kECKdCU9Zka6mh34mZ7B2","tx_index":237619,"type":0}}],"vout_sz":1,"relayed_by":"0.0.0.0","hash":"e9a66845e05d5abc0ad04ec80f774a7e585c6e8db975962d069a522137b80c1d","vin_sz":1,"tx_index":238297,"ver":1,"out":[{"n":0,"value":1000000,"addr":"16FuTPaeRSPVxxCnwQmdyx2PQWxX6HWzhQ","tx_index":238297,"type":0}],"size":225},{"time":1293623863,"inputs":[{"prev_out":{"n":1,"value":300000000,"addr":"15vScfMHNrXN4QvWe54q5hwfVoYwG79CS1","tx_index":219170,"type":0}}],"vout_sz":2,"relayed_by":"0.0.0.0","hash":"6359f0868171b1d194cbee1af2f16ea598ae8fad666d9b012c8ed2b79a236ec4","vin_sz":1,"tx_index":238296,"ver":1,"out":[{"n":0,"value":1000000,"addr":"1H8ANdafjpqYntniT3Ddxh4xPBMCSz33pj","tx_index":238296,"type":0},{"n":1,"value":299000000,"addr":"1Am9UTGfdnxabvcywYG2hvzr6qK8T3oUZT","tx_index":238296,"type":0}],"size":257},{"time":1293623863,"inputs":[{"prev_out":{"n":0,"value":5000000000,"addr":"1BNwxHGaFbeUBitpjy2AsKpJ29Ybxntqvb","tx_index":234892,"type":0}}],"vout_sz":2,"relayed_by":"0.0.0.0","hash":"fff2525b8931402dd09222c50775608f75787bd2b87e56995a7bdd30f79702c4","vin_sz":1,"tx_index":238295,"ver":1,"out":[{"n":0,"value":556000000,"addr":"1JqDybm2nWTENrHvMyafbSXXtTk5Uv5QAn","tx_index":238295,"type":0},{"n":1,"value":4444000000,"addr":"1EYTGtG4LnFfiMvjJdsU7GMGCQvsRSjYhx","tx_index":238295,"type":0}],"size":259}]}]}

I have a tFileInputJSON component to read the data.
"Read by XPath" is unchecked, and the schema contains only the blocks field, with "$.blocks[*]" as its JSONPath query.

I have tested this with tLogRow, and it passes on the correct row for the block data (when the array contains more than one block, each block comes through as a separate row).

I then send this output to tExtractJSONFields, with JSON Field set to blocks.
The Loop XPath query is "/" because I want to extract the fields of the block directly.
For a test, I use block_index as the only field, with the XPath query "/block_index" and Get Nodes unchecked.

If I send this to tLogRow, no rows come out...
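For comparison, the result this two-step setup is expected to produce can be sketched in plain Python. This does not reproduce Talend's internals; it only shows what "one row per element of $.blocks[*]" followed by "read block_index from the root of each row" should yield. The helper names block_rows and block_index are hypothetical, the first block is trimmed from the example above, and the second block is a placeholder added only to show two rows.

```python
import json

# Trimmed input file: the first block keeps a few fields from the example;
# the second block is a hypothetical placeholder to show one row per element.
DOCUMENT = """
{"blocks": [
  {"block_index": 107177, "height": 100000, "n_tx": 4},
  {"block_index": 1, "height": 1, "n_tx": 1}
]}
"""


def block_rows(document: str) -> list:
    """Rough analogue of the JSONPath query $.blocks[*]: one row per element
    of the top-level "blocks" array, re-serialized as a JSON string."""
    data = json.loads(document)
    return [json.dumps(block) for block in data["blocks"]]


def block_index(row: str) -> int:
    """Rough analogue of reading block_index from the root of one block row."""
    return json.loads(row)["block_index"]


rows = block_rows(DOCUMENT)
print(len(rows))                       # 2
print([block_index(r) for r in rows])  # [107177, 1]
```

In other words, with 2 blocks in the file the tLogRow after tExtractJSONFields would be expected to show 2 rows of block_index, not an empty table.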

Ultimately I want to chain this: the contents of tx would go to another tExtractJSONFields that breaks out the individual transactions, whose inputs and outputs would in turn be broken apart by further tExtractJSONFields components, and so on, to tease apart the nested data in this JSON. (This is a simple example; later blocks get VERY large, with much more nested data.)
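The chained breakdown described above can be sketched outside Talend in plain Python, as a point of comparison rather than a Talend fix: each nesting level (block, tx, inputs, out) becomes its own flat list of rows, and every row carries its parent key (block_index or tx_index) so the levels can be joined back together. The function name flatten, the column names, and the choice to skip input entries without a prev_out object (the first transaction in the example has "inputs":[[]]) are assumptions for illustration; the sample is a trimmed subset of the example JSON.

```python
import json

# Trimmed subset of the example JSON: one block, two of its transactions.
SAMPLE = """
{"blocks": [{"block_index": 107177, "height": 100000, "tx": [
  {"tx_index": 238294, "inputs": [[]],
   "out": [{"n": 0, "value": 5000000000, "addr": "1HWqMzw1jfpXb3xyuUZ4uWXY4tqL2cW47J"}]},
  {"tx_index": 238297,
   "inputs": [{"prev_out": {"n": 0, "value": 1000000,
                            "addr": "1JxDJCyWNakZ5kECKdCU9Zka6mh34mZ7B2", "tx_index": 237619}}],
   "out": [{"n": 0, "value": 1000000, "addr": "16FuTPaeRSPVxxCnwQmdyx2PQWxX6HWzhQ"}]}
]}]}
"""


def flatten(document: str):
    """Split nested block data into three flat tables (lists of row dicts)."""
    data = json.loads(document)
    tx_rows, input_rows, output_rows = [], [], []
    for block in data["blocks"]:
        for tx in block.get("tx", []):
            tx_rows.append({"block_index": block["block_index"],
                            "tx_index": tx["tx_index"]})
            for inp in tx.get("inputs", []):
                # Skip entries with no previous output (e.g. the empty
                # array in the first transaction of the example).
                if isinstance(inp, dict) and "prev_out" in inp:
                    prev = inp["prev_out"]
                    input_rows.append({"tx_index": tx["tx_index"],
                                       "prev_tx_index": prev["tx_index"],
                                       "addr": prev["addr"],
                                       "value": prev["value"]})
            for out in tx.get("out", []):
                output_rows.append({"tx_index": tx["tx_index"],
                                    "n": out["n"],
                                    "addr": out["addr"],
                                    "value": out["value"]})
    return tx_rows, input_rows, output_rows


txs, inputs, outputs = flatten(SAMPLE)
print(len(txs), len(inputs), len(outputs))  # 2 1 2
```

Keeping the parent key on every row is what lets each level be loaded or joined independently, which is the same role the chained extraction components would play in the job.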

Any suggestions/help would be greatly appreciated!


#2 2014-03-24 05:11:31

Glasswalker
Member
4 posts

Glasswalker said:

Re: tExtractJSONFields not getting any rows. Help?

Update: I notice in the output log of the job I get the following:

[statistics] connecting to socket on port 3982
[statistics] connected
Cannot determine next state
Cannot determine next state
.-----------.
| tLogRow_1 |
|=---------=|
|block_index|
|=---------=|
'-----------'

[statistics] disconnected

The "Cannot determine next state" messages are shown in red, but they do not seem to register as errors...

If I disconnect the tExtractJSONFields component, these messages no longer appear.

I suspect they correspond to the 2 input rows (I've tried several JSON examples in my testing; this one is a file with 2 blocks, so there are 2 rows).

I'm completely at a loss as to what's causing this...


#3 2014-03-25 16:54:10

Glasswalker
Member
4 posts

Glasswalker said:

Re: tExtractJSONFields not getting any rows. Help?

Does anyone else have any input on this? I'm now working on a simplified use case to troubleshoot it better. This is driving me insane, though: nothing I do seems to get the nested levels of the JSON data passed on. I'm probably just doing something stupid because I'm unfamiliar with Talend, but it's not that different from other ETL tools I've used.

Sure, I could do it directly in code, but I assumed that with a suite of JSON-handling components, Talend would be able to do it directly from the designer view.


#4 2014-07-18 18:48:57

PradeepBR
Guest

PradeepBR said:

Re: tExtractJSONFields not getting any rows. Help?

Did you find a resolution for this? I am having a similar issue when I try to process DataSift files, specifically the Twitter sections, with the same kind of error. I would greatly appreciate it if you could share how you fixed this. Thanks!

