Continuing from my smart cities post, I thought I would move on to Data Migration, the other topic I intend to post on. The intention is to alternate between these two topics, but please bear with me if I do not keep my word on that.
I happened to read the other day: “Migration is not just about moving the data… It’s about making the data work.” Well, it got me thinking about my experiences with Data Migration. True enough, it seemed as if I had said it myself.
What are the challenges around making the data work? My experiences in moving large quantities of data from a source system to a new target system are what I intend to write about. I won’t go into the challenges we faced; they are there. My take in this thread is on what the important issues in a migration are and how we managed them. This post is about my experiences in data profiling and the gains from it.
Data Quality: The most abused term in a migration process, but having said that, it is indeed the most important issue. Whenever an issue about data quality was raised, the objection we faced was: “The data you say is of poor quality is what we have been running our system on for the past 15 years.” The answer to this objection is simply data profiling. Data profiling gives an objective measure of the “quality” of the source data and of its suitability for the target system.
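To make "objective measure" a little more concrete, here is a minimal sketch of what a profiling check on a single column might look like. It is not the tool we used; the function name `profile_column`, the sample `customers` extract, and the six-digit postcode rule are all assumptions made up for illustration.

```python
def is_blank(value):
    """Treat None and empty/whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def profile_column(rows, column, is_valid=None):
    """Return simple quality counts for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if not is_blank(v)]
    invalid = 0
    if is_valid is not None:
        # Count populated values that break the target system's rule.
        invalid = sum(1 for v in present if not is_valid(v))
    return {
        "column": column,
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "invalid": invalid,
    }


# Hypothetical source extract; the rule below stands in for a target-system constraint.
customers = [
    {"id": 1, "postcode": "560001"},
    {"id": 2, "postcode": ""},
    {"id": 3, "postcode": "56O001"},  # letter O typed instead of zero
    {"id": 4, "postcode": "560001"},
]


def six_digit_postcode(value):
    text = str(value)
    return text.isdigit() and len(text) == 6


report = profile_column(customers, "postcode", is_valid=six_digit_postcode)
print(report)
```

Counts like these, produced per column before any transformation is written, turn the "our data has worked for 15 years" discussion into one about specific records and specific target rules.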
Data Migration is always dealt with as a three-stage process: Extract, Transform, and Load. The extract and the load have been the easiest tasks in the migration process and are a function of the data size. The transformation is the stage that consumes the most time, both in tool development and in the actual data migration. Quality hits the transform, and thus hits the data migration where it hurts the most. Trust me, it is painful when hit.
Profiling brings out a comprehensive understanding of the “data quality”. My experience is that the profiling output defines both the transformation tools to be developed and the approach. Without it, bad records get dropped rather than cleansed; if your transformation output on load points to this, the diagnosis is that a profiling exercise has not been carried out. In relational systems, a small set of bad records can actually make the migrated data set disappear once the relationships are factored in, because every record that depends on a rejected record is lost with it. I call this the “nasty multiplication factor”. The way to mitigate the “nasty multiplication factor” is to profile and transform as many bad records as possible, or to build strategies around cleansing them. The inverse of the “nasty multiplication factor” is that a small set of records cleansed or transformed properly could mean a complete or near-complete migration. Fortunately, it works both ways.
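The “nasty multiplication factor” is easy to see with a toy example. The sketch below is purely illustrative: the three-level customer / order / order-line model, the `tax_code` rule, and the `migrate` function are invented for this post and do not describe any real system. One customer out of ten fails validation, but because that customer owns most of the orders, almost everything below it disappears on load.

```python
def migrate(customers, orders, order_lines, is_valid_customer):
    """Load parents first; children load only if their parent was loaded."""
    loaded_customers = [c for c in customers if is_valid_customer(c)]
    customer_ids = {c["id"] for c in loaded_customers}
    loaded_orders = [o for o in orders if o["customer_id"] in customer_ids]
    order_ids = {o["id"] for o in loaded_orders}
    loaded_lines = [line for line in order_lines if line["order_id"] in order_ids]
    return loaded_customers, loaded_orders, loaded_lines


def has_tax_code(customer):
    # Hypothetical target rule: tax_code is mandatory.
    return customer["tax_code"] is not None


# Ten customers; customer 1 is a large account with a missing tax code.
customers = [{"id": i, "tax_code": "STD"} for i in range(1, 11)]
customers[0]["tax_code"] = None

# Customer 1 has 50 orders; customers 2..10 have one order each.
orders = [{"id": n, "customer_id": 1} for n in range(1, 51)]
orders += [{"id": 50 + k, "customer_id": k + 1} for k in range(1, 10)]

# Four lines per order for customer 1, one line per order for the rest.
order_lines = [
    {"order_id": o["id"]}
    for o in orders
    for _ in range(4 if o["customer_id"] == 1 else 1)
]

# Bad record dropped: the loss multiplies down the relationships.
dropped = migrate(customers, orders, order_lines, has_tax_code)

# Bad record cleansed with an agreed default: everything loads.
cleansed_customers = [dict(c, tax_code=c["tax_code"] or "UNKNOWN") for c in customers]
cleansed = migrate(cleansed_customers, orders, order_lines, has_tax_code)

for label, result in (("dropped", dropped), ("cleansed", cleansed)):
    print(label, [len(part) for part in result])
```

In this made-up data set, rejecting one customer record out of ten leaves only 9 of 59 orders and 9 of 209 order lines, while cleansing that single record loads all of them. That is the multiplication working in both directions.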
This being my experience, I look forward to comments regarding yours.
- Contributed by Vancheeswar B