Data Preparation Tools Guide

Bad data has plagued IT and analytics for decades. Even with modern visualization tools, predictive analytics, ETL, and cloud computing, every organization still has to clean and transform its data before its reports, analyses, and dashboards can be trusted. That need has produced a new breed of data preparation tools.

Many vendors are releasing products and enhancements into the data preparation space, including:

  • Datameer
  • Google
  • Microsoft
  • Paxata
  • Tableau
  • Trifacta

Pros and Cons of Data Prep
Data preparation tools give analysts faster turnaround because development does not depend on other teams. They also let users run quick data discovery and correct bad data by applying simple rules. For upfront analysis, departmental analytics in environments with limited IT resources, or a quick proof of concept (POC), data prep tools can deliver strong business results and greatly reduce time-to-market.

Use Case for Data Prep Tools
One of the best use cases for a data preparation tool is data discovery. The tools can analyze large volumes of data, examining patterns and the distribution of values to identify outliers and missing values. Beyond that, they can apply rules to fix the bad data so that subsequent analysis and reports yield correct results.
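
As a rough illustration of that discovery-then-fix flow, here is a minimal Python sketch using only the standard library. The sample records, the column names (region, order_total), and the helper functions are all hypothetical, and the 1.5 x IQR outlier test is one common convention, not something a particular data prep product is known to use.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical sample records; None and "" stand in for missing values.
records = [
    {"region": "East", "order_total": 120.0},
    {"region": "east", "order_total": 95.5},
    {"region": "West", "order_total": None},
    {"region": "", "order_total": 101.0},
    {"region": "West", "order_total": 110.0},
    {"region": "East", "order_total": 9800.0},  # suspiciously large value
    {"region": "West", "order_total": 99.0},
]


def is_missing(value):
    """Treat None and the empty string as missing."""
    return value is None or value == ""


def profile(rows, column):
    """Count missing values and tally the distribution of the rest."""
    values = [row[column] for row in rows]
    present = [v for v in values if not is_missing(v)]
    return {
        "missing": len(values) - len(present),
        "distribution": Counter(present),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def clean(rows):
    """Apply simple rules: standardize region names, flag suspect totals."""
    totals = [r["order_total"] for r in rows if not is_missing(r["order_total"])]
    outliers = set(iqr_outliers(totals))
    cleaned = []
    for row in rows:
        region = row["region"]
        # Rule 1: fill missing regions, otherwise normalize spacing and case.
        region = "Unknown" if is_missing(region) else region.strip().title()
        total = row["order_total"]
        # Rule 2: flag missing or outlying totals for review instead of
        # silently changing them.
        needs_review = is_missing(total) or total in outliers
        cleaned.append(
            {"region": region, "order_total": total, "needs_review": needs_review}
        )
    return cleaned


if __name__ == "__main__":
    print(profile(records, "region"))
    for row in clean(records):
        print(row)
```

Flagging suspect totals rather than overwriting them is a deliberate choice in this sketch: it keeps the original value visible so a reviewer can decide whether it is an error or a genuine large order.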

Data prep tools are not a panacea. Like many new methods, tools, and procedures, they fit some use cases poorly. The biggest challenge for a self-service data prep tool is governance, standardization, and enterprise-wide use of the data. To use these tools effectively beyond a single department or a POC, strong governance is needed to keep the data under control. Otherwise, the tools can hinder the overall data effort by letting each user produce different values and interpretations of the same data.

Another challenge is setting expectations when a data preparation tool is used on a project. After a quick POC, project stakeholders need to understand how much additional work a full, robust implementation requires.

Full data governance for data preparation tools includes:

  • Multiple environments with stronger governance as things move from development to production
  • Change control (with reviews before code is promoted to the next environment)
  • Governance committee with representation from business, technical, and data teams
  • Processes and standards to support the strategy
  • Clear articulation of how the tools will be used and controlled
  • Roles and responsibilities of stakeholders clearly defined
  • Well-defined and documented data architecture, design, and dictionary (including data preparation transformation rules) to help achieve consistent results
  • Leveraging the audit and governance capabilities of the data preparation tool
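
To make the last two items more concrete, the sketch below shows one way documented transformation rules and an audit trail could fit together. It is an assumption-laden illustration in plain Python: the Rule class, the RULES list, the rule names and versions, and apply_rules are all hypothetical and do not represent the API of any data preparation product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One documented, versioned transformation rule (hypothetical shape)."""
    name: str
    column: str
    version: str
    transform: Callable[[object], object]


# Hypothetical shared rule dictionary: every analyst applies the same rules,
# so the same input yields the same cleaned values regardless of the user.
RULES = [
    Rule(
        name="trim_and_title_case_region",
        column="region",
        version="1.0",
        transform=lambda v: v.strip().title() if isinstance(v, str) else v,
    ),
    Rule(
        name="default_missing_region",
        column="region",
        version="1.0",
        transform=lambda v: "Unknown" if v in (None, "") else v,
    ),
]


def apply_rules(rows, rules):
    """Apply each rule in order; return cleaned rows plus an audit log."""
    cleaned = [dict(row) for row in rows]  # copy so the input stays untouched
    audit = []
    for rule in rules:
        changed = 0
        for row in cleaned:
            before = row.get(rule.column)
            after = rule.transform(before)
            if after != before:
                row[rule.column] = after
                changed += 1
        # Record which rule ran, its version, and how many rows it changed.
        audit.append(
            {"rule": rule.name, "version": rule.version, "rows_changed": changed}
        )
    return cleaned, audit


if __name__ == "__main__":
    sample = [{"region": " east "}, {"region": "West"}, {"region": ""}, {"region": None}]
    rows, log = apply_rules(sample, RULES)
    print(rows)
    print(log)
```

Keeping the rules in one reviewed, versioned list is what lets change control apply to them: a rule change becomes a visible edit that can be reviewed before promotion, and the audit log shows its effect on the data.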

With this power and reduced dependence on others comes the responsibility to use the tool wisely. In the wrong hands and without governance, a data preparation tool could create more data problems than the business benefits it delivers.

This post was originally published on Medium.