I am new to Talend, so please bear with me. I am using Talend to do incremental loading into a SQL Server database, and I would like to know what steps I need to take to track duplicates. Let's say I have a csv file f1 today:
I load these contents into my SQL Server table. A month from now, my client sends me a second file, f2, with the following contents:
I want Talend to detect that the new file f2 contains records (102 and 103 in the example above) that are already present in my SQL Server table. I want to store these duplicate records separately so the client can decide what to do with them. Beyond this simple exact-match case, I would also like to know whether Talend can detect fuzzy matches (for example, matching on last names that are sometimes misspelled).
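To make the two cases concrete, here is a minimal sketch of the logic outside Talend, written in Python. It assumes that a record id uniquely identifies a record, that the last name is the field to compare fuzzily, and that the rows already in SQL Server can be read into a simple id-to-last-name mapping. The function names, the 0.8 threshold and the sample rows are all hypothetical (the actual f1/f2 contents are not shown), and `difflib.SequenceMatcher` is a swapped-in similarity measure, not necessarily what Talend uses internally. In a Talend job, the exact-match split could be built as a lookup join against the existing table; that mapping is not shown here.

```python
from difflib import SequenceMatcher


def split_incoming(existing, incoming):
    """Split incoming rows into exact-id duplicates and genuinely new rows.

    existing: dict mapping record id -> last name, i.e. what is already in
              the target table (hypothetical shape for illustration).
    incoming: list of (id, last_name) tuples read from the new file.
    """
    duplicates, new_rows = [], []
    for row_id, last_name in incoming:
        # An id already present in the table marks the row as a duplicate.
        if row_id in existing:
            duplicates.append((row_id, last_name))
        else:
            new_rows.append((row_id, last_name))
    return duplicates, new_rows


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_candidates(existing, new_rows, threshold=0.8):
    """Pair each new row with existing rows whose last name looks similar.

    threshold is a hypothetical cut-off; it would need tuning on real data.
    Returns (new_id, new_name, old_id, old_name, score) tuples.
    """
    candidates = []
    for row_id, last_name in new_rows:
        # Compare against every existing row: fine for small tables only.
        for old_id, old_name in existing.items():
            score = similarity(last_name, old_name)
            if score >= threshold:
                candidates.append((row_id, last_name, old_id, old_name, score))
    return candidates


if __name__ == "__main__":
    # Hypothetical sample data; the real f1/f2 contents are not shown.
    existing = {101: "Smith", 102: "Johnson", 103: "Williams"}
    incoming = [(102, "Johnson"), (103, "Williams"), (104, "Jonson"), (105, "Brown")]

    duplicates, new_rows = split_incoming(existing, incoming)
    print("Exact duplicates:", duplicates)
    print("New rows:", new_rows)
    print("Possible fuzzy matches:", fuzzy_candidates(existing, new_rows))
```

With this sample data, rows 102 and 103 land in the exact-duplicate list, and row 104 ("Jonson") is flagged as a possible fuzzy match for 102 ("Johnson"). The fuzzy pass compares every new row with every existing row, so for large tables it would need some form of blocking (for example, only comparing names that share a first letter) to stay fast.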
Could somebody suggest how I could achieve this in Talend?