I would like to see an API that makes it easy for an external tool to generate Talend jobs, manipulate metadata and such.
Here is a typical use case (one that I have to deal with every day):
Imagine a regular Data Warehouse with well over 1,000 file interfaces and over 10,000 tables.
Let's say you use Talend as your ETL tool, so you build every single interface individually. Maybe you make use of joblets, templates and other things, but there is still a lot of manual work involved if an existing interface needs to be altered or a new one is added.
What would make the life of the DWH team a lot easier would be this:
The business analyst who is in charge of defining a new interface has an (intranet-based) form which offers all the options a typical file interface of this particular DWH can have: file name, record descriptions, field descriptions, delivery paths, quantity information... but also fields that just document the interface (name, business scope, content, certain provisions ...)
The content of that form is then stored somewhere (e.g. as an XML document).
Based on that formal description, the tool automatically generates an interface agreement from the company's templates, so it really looks like the analyst has hand-written the document, which can then be sent around, filed, shipped out for review...
But the tool also generates a complete Talend job for this new interface, including the description of the file in the metadata section, the description of the target table, and the scripts to actually create that table. Depending on the architecture of the DWH, that job will have certain logging capabilities, data verification capabilities, error handling procedures, rejection procedures and so forth. So the job can be quite complex and therefore quite powerful. But since it's automatically generated, that's not a problem.
In addition, the tool also generates a set of automated test cases for a functional test suite (test data, expectations...)
So when a new interface is needed, the analyst just fills out the form and the rest is automatically generated.
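To make the idea a bit more concrete, here is a minimal Python sketch of the "formal description to generated artifact" step. Everything in it is an assumption for illustration: the XML element and attribute names are invented (they are not a Talend format), and only the CREATE TABLE script is generated, not a Talend job.

```python
import xml.etree.ElementTree as ET

# Hypothetical interface description as it might be saved from the form.
# Element and attribute names are invented for this sketch.
INTERFACE_XML = """
<interface name="customer_feed" target_table="stg_customer">
  <file name="customer_*.csv" delimiter=";"/>
  <field name="customer_id" type="INTEGER" nullable="false"/>
  <field name="full_name" type="VARCHAR(200)" nullable="true"/>
  <field name="signup_date" type="DATE" nullable="true"/>
</interface>
"""


def parse_fields(xml_text):
    """Return (table_name, [(column, sql_type, nullable), ...])."""
    root = ET.fromstring(xml_text)
    fields = [
        (f.get("name"), f.get("type"), f.get("nullable") == "true")
        for f in root.findall("field")
    ]
    return root.get("target_table"), fields


def create_table_ddl(table, fields):
    """Build a CREATE TABLE statement from the parsed field list."""
    cols = [
        f"  {name} {sql_type}{'' if nullable else ' NOT NULL'}"
        for name, sql_type, nullable in fields
    ]
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"


table, fields = parse_fields(INTERFACE_XML)
ddl = create_table_ddl(table, fields)
print(ddl)
```

The same parsed description could feed the document generator and the test case generator, which is what keeps documentation, implementation and tests in sync.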
I assume that Talend will not provide such a tool (even though that would be awesome), but instead it should provide an API that makes it possible for any company, or even an IT product company, to write such a tool.
And if the automatically generated job needs a certain amount of tweaking (e.g. for performance reasons or because the properties of a given file are somewhat special), a developer can easily take the generated job as a foundation for any kind of alteration; the Talend Studio easily allows for that.
In the same way, altering an existing interface should only require altering the above-mentioned form.
If the tool is smart enough, it will automatically ask Talend to create migration scripts or DDL scripts as needed.
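As a rough illustration of what deriving a migration script from a changed form could amount to, here is a small Python sketch that compares two versions of a column list and emits ALTER TABLE statements. It is independent of Talend; the function name, the sample columns and the choice to only flag type changes for review (because the syntax differs between databases) are all assumptions.

```python
def migration_ddl(table, old_fields, new_fields):
    """Compare two {column: sql_type} mappings and return DDL statements."""
    statements = []
    for name, sql_type in new_fields.items():
        if name not in old_fields:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
        elif old_fields[name] != sql_type:
            # Type-change syntax is database specific, so only flag it here.
            statements.append(
                f"-- REVIEW: {table}.{name} changed from "
                f"{old_fields[name]} to {sql_type}"
            )
    for name in old_fields:
        if name not in new_fields:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements


# Hypothetical old and new versions of the same interface description.
old = {"customer_id": "INTEGER", "full_name": "VARCHAR(100)", "fax": "VARCHAR(30)"}
new = {"customer_id": "INTEGER", "full_name": "VARCHAR(200)", "email": "VARCHAR(254)"}

stmts = migration_ddl("stg_customer", old, new)
for line in stmts:
    print(line)
```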
This kind of approach allows for the handling of the complete data model, data model changes, and all input/output interfaces of any given application in a declarative way and makes sure that documentation, implementation, and test cases always match.
Since a lot of money is spent on data model handling and interface development, the combo "Talend + the 'Tool'" would significantly lower costs and reduce the time to market.
But it all should start with a Talend API that allows for any external tool to generate jobs and manipulate metadata.
What do you think?
Creating a new job from scratch is hard, but you could probably define some typical job patterns as templates, take one of these templates, change the schemas, table names etc. in it, and save it as a job under a new name. This is what I would currently do.
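A minimal Python sketch of that template idea, assuming the job pattern is available as a text file with placeholders. The template layout below is invented for illustration; real Talend job files have their own structure, which this sketch does not try to reproduce.

```python
from string import Template

# Hypothetical job template with $placeholders (not a real Talend job file).
JOB_TEMPLATE = Template(
    "job: $job_name\n"
    "input_file: $file_pattern\n"
    "target_table: $target_table\n"
    "schema: $schema\n"
)


def render_job(job_name, file_pattern, target_table, columns):
    """Fill the template with the values of one concrete interface."""
    schema = ", ".join(f"{name}:{sql_type}" for name, sql_type in columns)
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so an incomplete description fails early instead of producing a bad job.
    return JOB_TEMPLATE.substitute(
        job_name=job_name,
        file_pattern=file_pattern,
        target_table=target_table,
        schema=schema,
    )


job_text = render_job(
    "load_customer_feed",
    "customer_*.csv",
    "stg_customer",
    [("customer_id", "INTEGER"), ("full_name", "VARCHAR(200)")],
)
print(job_text)
```

The rendered text would then be saved under the new job name; how it gets into the Talend repository is left open here.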
Talend Certified working for cimt objects AG in Berlin
Fayaz Khan said:
Glad to connect with you virtually. I would like to introduce you to AnalytiX Mapping Manager and the Code Automation Framework (CATfx) APIs/automation tools, which help govern the pre-ETL data mapping process and accelerate the ETL development process. The product is ETL agnostic and supports all major ETL tool platforms. It has been designed by industry experts in the USA and is currently at release version 5. We will be introducing version 6 in January 2015. The primary user groups of our product are Data Analysts/Mapping Analysts, ETL Developers, Data Governance Teams and just about anyone working in Data Warehousing and Data migration/conversion projects.
Summary of Key Features & Capabilities
◦ Web Based User Interface
◦ Manage, Govern & Reuse System Metadata (Data Dictionaries & Data Glossaries)
◦ Import Existing Excel based Mapping Specifications
◦ Import/Export Mapping Specifications as documentation for developers
◦ Web based user interface for business users
◦ Automates import of ETL jobs (for any ETL Tool) as mapping specifications (Reverse engineering)
◦ Automates Creation of ETL jobs for ANY ETL TOOL
◦ CATfX Automation Framework enables users to create custom adaptors or Customized Code-Generation templates for Generating ETL Jobs (Any ETL Platform)
◦ Print Reports: Mapping Spec, Project Report & System Analysis Report
◦ Perform impact analysis of Tables, Columns and Transformations
◦ View Impact Analysis and data lineage across the enterprise (Lineage Analyzer)
For more information, or if you would like a webinar presentation of the capabilities or access to an evaluation license, please visit our website: www.anaytixds.com or contact me directly at +44-0-207 193 6563 | email@example.com and I would be happy to facilitate.