Data governance is an umbrella term that organizations of all sizes and maturity levels often tell us is a priority. Your data is locked away in line-of-business applications and data warehouses. Most executives know that data governance is no small task: even the preliminary step of identifying the data can be expensive and daunting. Organizations often postpone data governance initiatives simply because they don’t know where to begin.
Hidden among Azure’s many data products, however, is a service that can change the way you approach governance. Azure Data Catalog gets you past step one by cataloging your data sources, whether on-premises or in the cloud, and by tracing the lineage of how data moves around your organization. The result is insight into both the risks and the opportunities that lie ahead.
Azure Data Catalog is not new, but customers often overlook it. With an upgraded second-generation version on its way, now is the time for organizations to start using it. Whether you’re building a new analytics platform and data warehouse or trying to get a handle on compliance and the risks of sensitive data, this relatively low-cost service can help you put governance front and center and get people across your organization talking about all of your data assets instead of their individual silos.
Here are some use cases where we expect the second generation to deliver benefits:
Democratizing Data for Analytics & Reporting
We often see business and BI teams caught in a ‘chicken or the egg’ dilemma: technical teams know what data is out there, while business teams who could analyze that data to solve business problems can’t access it. Self-service BI was meant to resolve this, but it often depends on technical users exposing the data properly. With Azure Data Catalog, business users can see what data is available for analysis and where it comes from, then work jointly with BI teams to get the data they need to answer today’s questions.
Identifying Personally Identifiable Information (PII) and Other Sensitive Data
With data moving across the organization and living in various silos, Azure Data Catalog can help you identify PII and other sensitive data that should be secured, cleansed, or removed, and so reduce the risk it carries. Because it can search for common patterns of sensitive data (such as Social Security numbers), much of this grueling exercise can be automated across the enterprise.
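To make the idea of pattern-based discovery concrete, here is a minimal sketch of how a scanner might flag columns that look like they hold Social Security numbers. It does not use Azure Data Catalog or any of its APIs. The table and column names, the `find_ssn_columns` helper, and the 50% match threshold are all hypothetical. The regular expression matches the formatted AAA-GG-SSSS layout and excludes number ranges that are never assigned (area 000, 666, or 900–999; group 00; serial 0000).

```python
import re
from typing import Dict, List

# Formatted SSN layout AAA-GG-SSSS, excluding area 000, 666, and 900-999,
# group 00, and serial 0000, which are never assigned.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_ssn_columns(
    tables: Dict[str, Dict[str, List[str]]], threshold: float = 0.5
) -> List[str]:
    """Return 'table.column' names where at least `threshold` of sampled values look like SSNs.

    `tables` maps a (hypothetical) table name to {column name: sampled string values}.
    """
    flagged = []
    for table_name, columns in tables.items():
        for column_name, values in columns.items():
            if not values:
                continue  # nothing sampled, nothing to judge
            hits = sum(1 for value in values if SSN_PATTERN.search(value))
            if hits / len(values) >= threshold:
                flagged.append(f"{table_name}.{column_name}")
    return flagged


# Hypothetical sample data pulled from two source systems.
sample_tables = {
    "crm.customers": {
        "customer_id": ["1001", "1002", "1003"],
        "tax_id": ["123-45-6789", "234-56-7890", "n/a"],
    },
    "hr.employees": {
        "badge": ["000-12-3456", "E-77", "E-78"],
    },
}

print(find_ssn_columns(sample_tables))  # ['crm.customers.tax_id']
```

Scoring sampled values against a threshold, rather than flagging on a single match, is one way to keep a stray SSN-shaped string from marking a whole column as sensitive. A real classifier would likely combine several such signals.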
Understanding Data Lineage Through to Reporting
Data integration pipelines are complex, and as analytics and data needs grow across every part of an organization, they will only become more complex. Azure Data Catalog will be able to capture not only the lineage of a dashboard metric but also the business logic applied from start to finish. For data teams, this removes the burden of manually documenting the multi-step path a CSV or Parquet file in the data lake takes through the ETL pipeline, into your data warehouse and Analysis Services, and out to a Power BI dashboard, without the fear of that documentation becoming stale.
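Conceptually, lineage is a graph in which each asset points back to the inputs it was built from, and each edge can carry the business logic applied at that step. The sketch below walks such a graph from a dashboard back to its raw files. It is not how Azure Data Catalog stores or exposes lineage, and none of its APIs are used. The asset names, transformation notes, and the `upstream_lineage` helper are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical model: each downstream asset maps to a list of
# (upstream asset, transformation/business logic applied) pairs.
Lineage = Dict[str, List[Tuple[str, str]]]


def upstream_lineage(lineage: Lineage, asset: str) -> List[Tuple[str, str, str]]:
    """Breadth-first walk from `asset` back to its sources.

    Returns (upstream, downstream, logic) edges in the order they are discovered.
    """
    edges = []
    seen = {asset}
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for upstream, logic in lineage.get(current, []):
            edges.append((upstream, current, logic))
            if upstream not in seen:  # avoid revisiting shared inputs
                seen.add(upstream)
                queue.append(upstream)
    return edges


# Hypothetical pipeline: data lake files -> warehouse -> Analysis Services -> Power BI.
lineage: Lineage = {
    "powerbi/RevenueDashboard": [("ssas/SalesModel", "Revenue = SUM(net_amount)")],
    "ssas/SalesModel": [("dw/fact_sales", "Model refreshed from warehouse")],
    "dw/fact_sales": [
        ("lake/raw/sales.parquet", "Deduplicate orders"),
        ("lake/raw/fx_rates.csv", "Join to convert amounts to USD"),
    ],
}

for upstream, downstream, logic in upstream_lineage(lineage, "powerbi/RevenueDashboard"):
    print(f"{upstream} -> {downstream}: {logic}")
```

Because the documentation is derived from the graph each time it is queried, it stays current as long as the graph itself is captured automatically. That is the property the paragraph above attributes to Azure Data Catalog.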
Chances are your organization already has a data governance plan in place. Are you executing it successfully? In our experience, governance projects are often one-time exercises, kept alive by committee meetings that try to sustain momentum. Unless governance becomes a continuous part of your data processes, the end result will be an ungoverned data organization, and that creates risk. Rather than treating governance as an afterthought, incorporate it into your next analytics project, large or small. Azure Data Catalog will get the process started and help ensure it isn’t forgotten.
If you’re interested in learning more about Azure Data Catalog or want to discuss data governance as a whole, contact us for more information.
Fred Bliss is the CTO at Aptitive. He brings over 15 years of experience solving complex business problems through data solutions including cloud integration, data warehouse modeling, ETL, and front-end reporting implementations.