Data governance is a broad-ranging discipline that affects everyone in an organization, whether directly or indirectly. It is most often employed to improve and consistently manage data through deduplication and standardization, among other activities, and can have a significant and sustained effect on reducing operational costs, increasing sales, or both.
Data governance can also be part of a more extensive master data management (MDM) program. The MDM program an organization chooses and how they implement it depends on the issues they face and both their short- and long-term visions.
For example, in the insurance industry, many companies sell various types of insurance policies renewing annually over a number of years, such as industrial property coverages and workers’ compensation casualty coverages. Two sets of underwriters will more than likely underwrite the business. Having two sets of underwriters using data systems specific to their lines of business is an advantage when meeting the coverage needs of their customers but often becomes a disadvantage when considering all of the data — but it doesn’t have to be.
The disadvantage arises when an agent or account executive needs to know the overall status of a client, including long-term profitability during all the years of coverage. This involves pulling data from policy systems, claims systems, and customer support systems. An analyst may be tasked with producing a client report for the agent or account executive to truly understand their client and make better decisions on both the client and company’s behalf. But the analyst may not know where the data is stored, who owns the data, or how to link clients across disparate systems.
Fifteen years ago, this task was very time-consuming and even five years ago was still quite cumbersome. Today, however, this issue can be mitigated with the correct data governance plan. We will go deeper into data governance and MDM in upcoming posts; but for this one, we want to show you how innovators like Snowflake are helping the cause.
What is data governance?
Data governance ensures that data is consistent, accurate, and reliable, which allows for informed and effective decision-making. This can be achieved by centralizing the data into one location from few or many siloed locations. Ensuring that data is accessible in one location enables data users to understand and analyze the data to make effective decisions. One way to accomplish this centralization of data is to implement the Snowflake Data Cloud.
Snowflake not only enables a company to store their data inexpensively and query the data for analytics, but it can foster data governance. Dynamic data masking and object tagging are two new features from Snowflake that can supplement a company’s data governance initiative.
What is dynamic data masking?
Dynamic data masking is a Snowflake security feature that selectively omits plain-text data in table or view columns based on predefined policies for masking. The purpose of data masking or hiding data in specific columns is to ensure that data is accessed on a need-to-know basis. This kind of data is most likely sensitive and doesn’t need to be accessed by every user.
When is dynamic data masking used?
Data masking is usually implemented to protect personally identifiable information (PII), such as a person’s social security number, phone number, home address, or date of birth. An insurance company would likely want to reduce risk by hiding data pertaining to sensitive information if they don’t believe access to the data is necessary for conducting analysis.
However, data masking can also be used for non-production environments where testing needs to be conducted on an application. The users testing the environment wouldn’t need to know specific data if their role is just to test the environment and application. Additionally, data masking may be used to adhere to compliance requirements like HIPAA.
What is object tagging?
Another resource for data governance within Snowflake is object tagging. Object tagging enables data stewards to track sensitive data for compliance and discovery, as well as grouping desired objects such as warehouses, databases, tables or views, and columns.
When a tag is created for a table, view, or column, data stewards can determine if the data should be fully masked, partially masked, or unmasked. When tags are associated with a warehouse, a user with the tag role can view the resource usage of the warehouse to determine what, when, and how this object is being utilized.
When is object tagging used?
There are several instances where object tagging can be useful; one use would be tagging “PII” to a column and adding extra text to describe the type of PII data located there. For example, a tag can be created for a warehouse dedicated to the sales department, enabling you to track usage and deduce why a specific warehouse is being used.
Where can data governance be applied?
Data governance applies to many industries that maintain a vast amount of data from their systems, including healthcare, supply chain and logistics, and insurance; and an effective data governance strategy may use data masking and object tagging in conjunction with each other.
As previously mentioned, one common use case for data masking is for insurance customers’ PII. Normally, analysts wouldn’t need to analyze the personal information of a customer to uncover useful information leading to key business decisions. Therefore, the administrator would be able to mask columns for the customer’s name, phone number, address, social security number, and account number without interfering with analysis.
Object tagging is also valuable within the insurance industry as there is such a vast amount of data collected and consumed. A strong percentage of that data is sensitive information. Because there is so much data and it can be difficult to track those individual pieces of information, Snowflake’s object tagging feature can help with identifying and tracking the usage of those sensitive values for the business user.
Using dynamic data masking and object tagging together, you will be able to gain insights into the locations of your sensitive data and the amount specific warehouses, tables, or columns are being used.
Think back to the situation we mentioned earlier where the property coverage sales department is on legacy system X. During that same time period, the workers’ compensation sales department is on another legacy system Y. How are you supposed to create a report to understand the profitability of these two departments?
One option is to use Snowflake to store all of the data from both legacy systems. Once the information is in the Snowflake environment, object tagging would allow you to tag the databases or tables that involve data about their respective departments. One tag can be specified for property coverage and another tag can be set for workers’ compensation data. When you’re tasked with creating a report of profitability involving these two departments, you can easily identify which information can be used. Because the tag was applied to the database, it will also be applied to all of the tables and their respective columns. You would be able to understand what columns are being used. After the data from both departments is accessible within Snowflake, data masking can then be used to ensure that the new data is only truly accessible to those who need it.
This was just a small introduction to data governance and the new features that Snowflake has available to enable this effort. Don’t forget that this data governance effort can be a part of a larger, more intricate MDM initiative. In upcoming blogs, we will touch more on MDM and other data governance capabilities to maintain and standardize your data, helping you make the most accurate and beneficial business decisions. If you have any questions in the meantime, feel free to get in touch.
Daniel Miziniak is a data management and analytics consultant at Aptitive. In his role, Daniel enables companies to access and use the insights held by their data.