• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » [resolved] Managing incremental keys in SQL Server importing Data

#1 2010-11-08 18:49:12

ienax_ridens
Member
Registered: 2010-10-27
Posts: 50

[resolved] Managing incremental keys in SQL Server importing Data

Good morning, this is my first post on Talend forum; I have a specific question about importing data from a source to anotherone (SQL data base).

To be simple, in this case we can consider 3 .xls files (then the procedure can be used also with diferent input/output objects):

INPUT.xlsx  --> Contains the input data that can be new data, modified data (to be updated in the OUTPUT) or missing data (in this case we want to change a column value in the OUTPUT)

OUTPUT.xlsx --> Contains the output data with the Primary Key

CONTATORE.xlsx --> Contains the LAST_ID(int; not null) field: this number is the last number used in OUTPUT as incremental Primary Key.


I WANT MANAGE 4 CASES:

1. INSERT new records assigning the correct ID as PK using CONTATORE.xlsx

Why I need of the CONTATORE file: the PK (primary key) of the OUTPUT  is not auto-incremental, thus I have to handle manually the increment of that field that can't be NULL or duplicated; in this way on Insert data:
a) I check the LAST_ID in CONTATORE
b) I insert in the OUTPUT the record with ID = LAST_ID+1
c) I set the new LAST_ID

NOTE: I want to increment always the ID relatively to the LAST_ID in CONTATORE.xlsx even if in OUTPUT file I have some missing ID values (e.g. if I delete some records)

2. MODIFY some existing records if in the INPUT I have different data using a fixed field --> check if record exist and then changing the values

3. SET STATE = 'Deactivated' (in OUTPUT file) if one record was removed from INPUT file

4. DELETE record in the OUTPUT.xlsx if it does not exist in the INPUT.xlsx

------------------------------------------------
Below you can see, in order LEFT --> RIGHT: CONTATORE, INPUT and OUTPUT sample files.


Uploaded Images

Last edited by ienax_ridens (2010-11-08 19:02:17)

Offline

#2 2010-11-09 15:06:01

tchd
Member
Company: edexe
Registered: 2010-07-05
Posts: 79
Website

Re: [resolved] Managing incremental keys in SQL Server importing Data

Hi,

I have a couple of questions:

- Is the output going to be a copy of input but with keys added?  i.e. is the input a full refresh?
- What is the difference between modified data and missing data.  In both cases you seem to be simply changing column values.
- Are you not including missing rows in the output, or marking them as Deactivated?
- Is there a difference between input and output file and input and output xlsx?

Answers to the above may change what I describe below.  I'm assuming that you will be doing a refresh:

- Subjob 1 - read CONTATORE.xlsx and write to a context variable (maxkey)
- Subjob 2 - read the input and join using tmap to the old copy of the ouput i.e. the one from the previous run.  Join on the name.  If match then use the key found from the lookup and copy the input data.  If they don't, use Numeric.sequence to assign a new key.  Note: set the start value of the sequence to the context variable maxkey.  Write to the new output file.
- Subjob 3 - Read the new output and use taggregate to find the highest value in the primary key field and write this back to CONTATORE


If you wan't to make it more efficient you could have a second output from tmap,  that tests the key value for each record against the previous highest and writes out the highest each time.  If you connect this output then the value of maxval will be set automatically without having to make a second pass through the output file.  Subjob 3 would then simply be a tfixedflow input --> CONTATORE.

Hope this makes sense.

Regards,
Rick

Offline

#3 2010-11-09 15:32:48

ienax_ridens
Member
Registered: 2010-10-27
Posts: 50

Re: [resolved] Managing incremental keys in SQL Server importing Data

Thanks about the answer, you're right, I'll try to be more precise:

tchd wrote:

- Is the output going to be a copy of input but with keys added?  i.e. is the input a full refresh?

No, the output is not a copy of input: output file contains ALSO the data of input file.
i.e. I could have IN THE INPUT file some new records to insert in the output, some records equals and/or some removed records that I want to keep in the output if they already exist

tchd wrote:

- What is the difference between modified data and missing data.  In both cases you seem to be simply changing column values.

Yes, I want only to change some valuse, the only difference is that in one case I have to do a check on data present in output and not in input; and in the other case I have to make a check on the 'Name' field and modify 'Number' value on data existing in input AND output file

tchd wrote:

- Are you not including missing rows in the output, or marking them as Deactivated?

I want to mark as 'Deactivated' in the output, all the rows that are missing in the input. (This is related to previous point)

tchd wrote:

- Is there a difference between input and output file and input and output xlsx?

No, there is not, they are the same. Sorry for names that make confusion.

Thanks a lot for the answer, I'll try and if I'll have doubts I'll post again!
Erik

Offline

#4 2011-09-14 16:51:35

ienax_ridens
Member
Registered: 2010-10-27
Posts: 50

Re: [resolved] Managing incremental keys in SQL Server importing Data

I solved in a very simple way: I used a calculated variable in tMap.

Thanks to all

put [resolved] in title

Offline

#5 2011-09-15 01:23:34

alevy
Member
Registered: 2009-11-20
Posts: 1477

Re: [resolved] Managing incremental keys in SQL Server importing Data

ienax_ridens wrote:

put [resolved] in title

There is a link at the bottom right of your initial post in this topic to resolve it.

Last edited by alevy (2011-09-15 01:24:14)

Online

#6 2011-09-15 09:10:24

ienax_ridens
Member
Registered: 2010-10-27
Posts: 50

Re: [resolved] Managing incremental keys in SQL Server importing Data

Thanks alevy! You save me every time...

Offline

  • Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » [resolved] Managing incremental keys in SQL Server importing Data

Board footer

Powered by FluxBB