
Data Preparation Tools — The new silver bullet?

May 8, 2018

Bad data has plagued IT and Analytics for decades. Even with new visualization tools, predictive analytics, ETL, and cloud computing, every organization still needs to clean up and transform its data so that its reports, analyses, and dashboards work correctly. This need has led to a new breed of Data Preparation tools.

Many vendors have recently released products and enhancements in the Data Preparation space, including:

  • Datameer
  • Google
  • Microsoft
  • Paxata
  • Tableau
  • Trifacta

 

Pros and Cons of Data Preparation Tools

Data Preparation tools give analysts faster turnaround and speed to market because development does not depend on other teams. They also let users perform quick data discovery and fix bad data by applying simple rules. For up-front analysis, departmental analytics in environments with limited IT resources, or a quick Proof-of-Concept (POC), Data Preparation tools can achieve strong business results and greatly reduce time-to-market.

 

Use Cases for Data Preparation Tools

One of the best use cases for a data preparation tool is data discovery. These tools can analyze large volumes of data, examining patterns and the distribution of values to identify outliers and missing values. Beyond discovery, they can apply rules to fix the bad data so that subsequent analyses and reports yield correct results.
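
The discover-then-fix workflow described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not how any of the tools listed earlier work internally: the function names, the sample `order_amounts` column, and the two cleanup rules (fill missing values with the median, clip outliers using the common 1.5 * IQR rule) are assumptions chosen for the example.

```python
# Hypothetical data-discovery sketch (not any vendor's API): profile a numeric
# column for missing values and outliers, then apply two simple cleanup rules.
from statistics import median, quantiles


def profile(values):
    """Count missing values and flag outliers using the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartiles of the non-missing values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "bounds": (low, high),
        "outliers": [v for v in present if v < low or v > high],
    }


def clean(values, report):
    """Rule 1: replace missing values with the median of the present values.
    Rule 2: clip every value into the IQR bounds found during profiling."""
    fill = median(v for v in values if v is not None)
    low, high = report["bounds"]
    return [min(max(fill if v is None else v, low), high) for v in values]


# Assumed sample data: two missing amounts and one obvious data-entry error.
order_amounts = [120, 95, None, 110, 105, 9999, 100, None, 115]
report = profile(order_amounts)
cleaned = clean(order_amounts, report)
print(report)
print(cleaned)
```

Profiling runs before cleaning so that the rules are driven by what discovery found (here, the outlier bounds) rather than by hard-coded thresholds. Whether clipping or median imputation is appropriate depends on the data and should be a documented business decision.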

Data Preparation tools are not a panacea; like many new methods, tools, and procedures, they also have use cases where they fit poorly. The biggest challenge for a self-service Data Preparation tool is governance, standardization, and enterprise use of the data. To use these tools effectively beyond departmental use or a POC, strong governance is needed to keep the data under control. Otherwise, the tools can hinder the organization's overall data efforts by allowing the same data to take on different values and interpretations depending on who prepared it.

Another challenge is setting expectations when a data preparation tool is used on a project. After a quick POC, project stakeholders need to understand the additional work required for a full, robust implementation.

A full data governance approach for data preparation tools (to be covered in more detail in a future blog post) includes:

  • Multiple environments with stronger governance as things move from development to production
  • Change control (with reviews before code is promoted to the next environment)
  • Governance committee with representation from business, technical, and data teams
  • Processes and standards to support the strategy
  • Clear articulation of how the data preparation tools will be used and controlled
  • Roles and responsibilities of stakeholders clearly defined
  • Data architecture, design, and dictionary well defined and documented (including data preparation transformation rules) to help achieve consistent results
  • Leveraging the audit and governance capabilities of the data preparation tool
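
Several items on this list (a documented dictionary of transformation rules, change control with reviews before promotion, and use of audit capabilities) can be made concrete with a small sketch. The following Python is hypothetical and does not reflect the API of any product named above; `Rule`, `RuleRegistry`, the environment names, and the sample rules are invented for illustration. It assumes rules are registered centrally, approved by a reviewer before they run in production, and logged every time they are approved or applied.

```python
# Hypothetical governance sketch: a shared rule registry with simple change
# control (only approved rules run in production) and an in-memory audit trail.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Rule:
    name: str
    description: str                  # documented in the data dictionary
    owner: str                        # accountable role
    version: int
    transform: Callable[[dict], dict]
    approved: bool = False            # set only after a change-control review


@dataclass
class RuleRegistry:
    rules: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, rule: Rule) -> None:
        self.rules[rule.name] = rule

    def approve(self, name: str, reviewer: str) -> None:
        self.rules[name].approved = True
        self._log("approve", name, reviewer)

    def apply(self, record: dict, environment: str, user: str) -> dict:
        """Apply rules in registration order; production skips unapproved rules."""
        for rule in self.rules.values():
            if environment == "production" and not rule.approved:
                continue
            record = rule.transform(record)
            self._log(f"apply v{rule.version} in {environment}", rule.name, user)
        return record

    def _log(self, action: str, rule_name: str, who: str) -> None:
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "rule": rule_name,
            "who": who,
        })


# Assumed sample rules; each returns a new dict instead of mutating the input.
registry = RuleRegistry()
registry.register(Rule("standardize_state", "Map state names to two-letter codes",
                       "Data Steward", 1,
                       lambda r: {**r, "state": {"Illinois": "IL"}.get(r["state"], r["state"])}))
registry.register(Rule("trim_name", "Strip whitespace from customer names",
                       "Data Steward", 1,
                       lambda r: {**r, "name": r["name"].strip()}))
registry.approve("standardize_state", reviewer="governance_committee")

row = {"name": "  Acme Corp ", "state": "Illinois"}
prod = registry.apply(row, environment="production", user="analyst1")   # trim_name not yet approved
dev = registry.apply(row, environment="development", user="analyst1")   # all rules run
print(prod, dev, len(registry.audit_log))
```

Because every analyst applies the same registered rules, a value is standardized the same way no matter who runs the preparation, which addresses the risk of multiple interpretations of the data. A real deployment would persist the registry and audit log rather than keep them in memory.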

With this power and reduced dependence on others comes the responsibility to use this type of tool wisely. In the wrong hands and without governance, a data preparation tool can create more trouble with the data than the business benefits it provides.

Aptitive helps organizations transform data into actionable, valuable, and accessible intelligence. Need help making your data actionable? Contact us for a data preparation audit. 

This post was originally published on Medium.


Chuck Diewald is the Analytics Practice Lead and Director of Analytics at Aptitive. He brings over 35 years of information technology and services industry consulting experience. His skill in analytics, data and solution architecture, strategy, and project management helps his clients unleash the power of their data.