Data lineage is the process of tracking data as it moves from its source to the point of consumption, including every transformation applied to the dataset along the way. It gives the organization a clearer understanding of what happened to the data throughout its life cycle. It also helps in tracing errors and implementing changes in the process.
Another approach to data lineage combines data discovery with the use of a data catalog. It involves creating a business glossary and, most importantly, connecting technical metadata with business metadata. It allows the user to trace the data in both directions (forward and backward) between its origin and its destination. Data lineage answers the following W’s:
- Who created the data, and when?
- What information does the data store?
- Where is the data located?
- Why does the data exist?
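Forward and backward tracing can be pictured as walking a directed graph in either direction. The following is a minimal sketch, not the method of any particular lineage tool: the dataset names and the `EDGES` list are invented for illustration, and the helpers `build_graph` and `trace` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means "target is derived from source".
EDGES = [
    ("crm.customers", "staging.customers_clean"),
    ("erp.orders", "staging.orders_clean"),
    ("staging.customers_clean", "warehouse.customer_orders"),
    ("staging.orders_clean", "warehouse.customer_orders"),
    ("warehouse.customer_orders", "report.monthly_revenue"),
]


def build_graph(edges):
    """Return two adjacency maps: one pointing downstream, one pointing upstream."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return downstream, upstream


def trace(start, neighbours):
    """Breadth-first walk returning every dataset reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream, upstream = build_graph(EDGES)

# Forward: everything that is derived, directly or indirectly, from the orders source.
forward = trace("erp.orders", downstream)

# Backward: everything the revenue report ultimately depends on.
backward = trace("report.monthly_revenue", upstream)
```

Keeping both adjacency maps means either direction is a plain graph walk. A real catalog would normally attach the who, when, where, and why metadata to each node as well.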
Why do Organizations need Data Lineage?
For a Data Steward
Data managers, or data stewards, are generally considered the owners of data. Traditionally, ownership sat at the application, business, or product level, but data is an entity that combines all of these factors. It is therefore important to establish data ownership and work toward a data management system. Data lineage gives data stewards transparency into the data and the modifications that happen during its flow.
For a Business User
Data lineage helps a business user find the reports that are based on a particular data entity or column. It also gives users an idea of how the data is modified as it passes through other applications. When incorrect data arrives from an upstream application, they can inform the owning team and get it corrected, since even a single bad data value can lead to a significant loss for the organization.
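To illustrate how reports might be looked up by column, here is a small sketch that assumes a catalog in which each report lists the columns it reads. The `REPORT_COLUMNS` mapping, the column names, and both helper functions are hypothetical, not taken from a specific product.

```python
from collections import defaultdict

# Hypothetical catalog entries: each report lists the columns it reads.
REPORT_COLUMNS = {
    "monthly_revenue": ["orders.amount", "orders.order_date"],
    "customer_churn": ["customers.signup_date", "orders.order_date"],
    "top_customers": ["customers.name", "orders.amount"],
}


def index_reports_by_column(report_columns):
    """Invert the catalog so each column maps to the reports that use it."""
    index = defaultdict(set)
    for report, columns in report_columns.items():
        for column in columns:
            index[column].add(report)
    return index


def reports_using(column, index):
    """Return the reports that read the given column, sorted by name."""
    return sorted(index.get(column, ()))


index = index_reports_by_column(REPORT_COLUMNS)

# If orders.amount turns out to be wrong upstream, these reports are affected.
affected = reports_using("orders.amount", index)
```

With this index, a business user who learns that one column is wrong can see at once which reports need to be flagged.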
How are Organizations Implementing Data Lineage?
Use of In-House Developed Tools
Many organizations use in-house tools to scan and extract data lineage from their systems. Using in-house tools gives them the advantage of customizing the tool to the needs of the organization.
Use of Third-Party Developed Tools
Various vendors, such as Collibra, Informatica, ASG, and IBM, offer a wide range of data governance and data lineage products. They are actively expanding the capabilities of these products using the latest technologies.
Conclusion
Data lineage helps the organization ensure that data comes from a reliable source, that transformations are performed correctly, and that the data is loaded correctly into the destination. It plays an important role when important decisions depend on accurate information. Without proper data governance technology and data quality checks, validating this manually is likely to be costly and time-consuming.