As a business leader, you want to get to impactful analytics ASAP. You want faster reporting, easy-to-use dashboards, and quick and accurate answers to your questions. Whether you’re the CFO, head of operations, or marketing lead, you want to use your data to make better, more insightful business decisions.
However, you can’t implement analytics without a solid data strategy. A common component of implementing a data management architecture is ETL, or “extract, transform, load” logic. We understand that your CEO wants to hear, “We’ve implemented robust analytics,” not “We’re working on ETL.” ETL doesn’t sound exciting. But to get you where you want to go, ETL should be a part of your data story.
At Aptitive, we aim to help you feel confident through every phase of your data and analytics journey. So today I’m answering “What is ETL?”, including where ETL fits into data management and why it may be an essential precursor to your organization’s analytics goals.
What is ETL?
ETL involves collecting the data most relevant to the key performance indicators you want to analyze – which could be all of the data across your organization, or just pieces stored in various source systems – then centralizing, transforming, and modeling it, typically in a data warehouse.
As previously mentioned, ETL stands for “extract, transform, load.” Let’s break that down:
Extract
Data is gathered from the relevant data sources (e.g., ERP, CRM, application databases, SaaS applications, flat files, and/or Excel spreadsheets). This could be financials, customer details, and internal HR information. Or maybe your marketing department in particular needs an ETL solution: they may store customer data across multiple systems, such as your enterprise CRM, email marketing software, and a digital advertising provider.
Extraction can be done manually, or it can be automated with orchestration workflows and ETL tooling that eliminate the manual process, making it much faster and more reliable to analyze your business data.
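To make the extract step concrete, here is a minimal sketch in Python. The file contents, field names, and sources (a CRM flat-file export and a SaaS tool’s JSON payload) are all hypothetical; a real pipeline would pull from live systems via connectors or APIs.

```python
import csv
import io
import json

# Hypothetical flat-file export from a CRM (stands in for a real file on disk).
crm_csv = io.StringIO("customer_id,state\n101,IL\n102,Ill.\n")
crm_rows = list(csv.DictReader(crm_csv))

# Hypothetical JSON payload, as might come from a SaaS marketing tool's API.
marketing_json = '[{"customer_id": "103", "state": "Illinois"}]'
marketing_rows = json.loads(marketing_json)

# Combine both sources into one staging list, ready for the transform step.
staged = crm_rows + marketing_rows
```

An orchestration tool would run this kind of extraction on a schedule instead of by hand, which is where the speed and reliability gains come from.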
Transform
The data transformation stage is really “where the magic happens,” as this is where you make your data usable. Data is cleaned, deduplicated, and transformed to align with how the business wants to analyze it in your data warehouse, ensuring data quality and ease of use. This includes reformatting for uniformity, such as restructuring IL, Il, Ill., and Illinois to all read “Illinois.” Inconsistencies and missing values are cleansed, duplicate records are removed, and the data is prepared so the business can interact with it easily.
Most of the time spent on the ETL process will be used to come up with the right calculations, determine data quality rules, and model the data for analytics, all key pieces of data transformation. This stage is only becoming more crucial as organizations are seeing growing amounts of disparate data from many different sources.
Load
In the last stage of the ETL process, the transformed data is loaded into your data warehouse so that all of it is easily accessible for analytical dashboards and reporting. Depending on your business needs, this process can run in batch or in near-real-time, which lets you access and analyze your data almost as soon as it lands in the warehouse.
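A batch load can be as simple as a bulk insert into a warehouse table. In this sketch, SQLite stands in for the warehouse and the table and records are hypothetical; a real pipeline would target a platform such as Snowflake, Redshift, or BigQuery through its own loader.

```python
import sqlite3

# SQLite stands in for the warehouse here purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER, state TEXT, revenue REAL)"
)

# Hypothetical transformed records, ready to load in one batch.
clean = [(101, "Illinois", 1200.0), (102, "Illinois", 850.0), (103, "Illinois", 0.0)]
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", clean)
conn.commit()

# Once loaded, the data is available to dashboards and reports via plain SQL.
loaded = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
```

A near-real-time pipeline would run the same insert logic continuously on small increments of data instead of on a nightly batch.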
What is ELT?
You may be wondering why this section has the same header as the previous section. If you look closely, you’ll notice two different acronyms: ETL and ELT. If you guessed that ELT stands for “extract, load, transform,” you would be correct. Why might we switch the order of the process?
ELT is most useful when speed is the primary concern. Data is loaded more quickly, and you can decide which data to transform, and when. This lets you access the exact data you need sooner, though you will have to wait for additional transformation if more of your data becomes necessary.
Of course, this approach can come with security concerns as large amounts of data are loaded into a system prior to being transformed. Additionally, ELT can impede analytics based on large data sets if your system isn’t capable of transforming the required data quickly enough.
An ETL/ELT tool, or a custom ETL solution built with multiple technologies, can alleviate many of the concerns you may have about either method, as these solutions are built precisely for this purpose.
What ETL tools (and ELT tools) should you know about?
There are two ways to approach ETL: building a custom ETL solution (which Aptitive can help with) or using a prepackaged ETL tool. The decision to buy or build your ETL tool depends on multiple factors, but a few tools stand out if you decide to go the “buy” route.
Fivetran is cloud-based and offers more than 130 ready-to-use data source connectors, plus the ability to add custom integrations. In our experience, setting up a Fivetran pipeline can take less than five minutes (but keep in mind, it’s streamlined). Fivetran focuses primarily on data integration rather than transformation. Its pricing is consumption-based.
Like Fivetran, HVR is a tool to know in the ETL space but doesn’t provide the full ETL experience. It’s intended to assist more with data replication and integration than transformation. HVR offers more than 20 data source integrations, as well as the ability to add custom integrations. Pricing is based on an initial upfront cost for the platform and an additional cost for each connection to your source system data.
Matillion is a cloud ETL platform with more than 100 integrated data sources. Pricing is based on the time the instance is running, not on usage.
Formerly known as Attunity Replicate, Qlik Replicate works on-prem and in the cloud, supporting all major data sources. Pricing is based on the number of cores on your database source servers.
Talend is compatible with both on-prem and cloud data sources, and it includes hundreds of pre-built source integrations. There’s an open-source version, as well as a paid platform more suited to larger organizations that would benefit from additional tools and features.
Not sure where to start after all of this information? Set up a complimentary data strategy whiteboarding session with Aptitive. We’ll walk through the current state of your data and your data and analytics goals to determine the best path to the insights you need.
- Blog Post: Should You Build or Buy Your ETL Tool?
- Blog Post: What Is dbt (data build tool) and How Is It Different?
- Case Study: Serverless ETL in a Cloud Data Warehouse
- Case Study: Single Source of Truth for CPG
Jason Maas is the COO of Aptitive and head of the Data Practice. He is a seasoned technology leader with experience spanning several industries such as transportation, accounting, insurance, retail, software, healthcare, and financial services.