The Data Supply Chain for Marketers is the second in the series of our Marketing Data & Analytics Guide for data-driven marketing leaders. In this series we cover major terms, acronyms, and technologies you might encounter as you seek to take control of your data, improve your analytics, and get more value from your MarTech investments.
In this part of the series, we’ll explore the basics of The Data Supply Chain, including:
- Data ingestion
- The differences between ETL and ELT
- Data pipelines
- Data storage options
Data Ingestion is the first step in any analytical undertaking. It’s a process where data from one or many sources are gathered and imported into one place. Data can be imported in real-time (like POS data) or in batches (like billing systems).
Why it matters for marketers:
The process of data ingestion consolidates all of the relevant information from across your data sources into a single, centralized storage system. Through this process, you can begin to convert disparate data created in your CRM, POS, and other source systems into a unified format that is ready for real-time or batch analysis.
Marketing teams pull data from a wide variety of resources including Salesforce, Marketo, Facebook, Twitter, Google, Stripe, Zendesk, Shopify, Mailchimp, mobile devices, and more. It’s incredibly time consuming to manually combine these data sources but by using tools to automate some of these processes you can get data into the hands of your team faster.
This empowers marketers to answer more sophisticated questions about customer behavior, such as:
- Why are customers leaving a specific product in their online shopping carts?
- What is the probability that we’ll lose a customer early in the customer journey?
- Which messaging pillar is resonating most with customers in the middle of the sales funnel who live in Germany?
Image 1: In this image, three source systems with varying formats and content are ingested into a central location in the data warehouse.
ETL vs. ELT
ETL and ELT are both data integration methods that make it possible to take data from various sources and move it into a singular storage space like a data warehouse. The difference is in when the transformation of data takes place.
As your business scales, ELT tools are better equipped to handle the volume and variety of marketing data onhand. However, a robust data plan will make use of both ELT and ETL tools.
For example, a member of your team wants to know which marketing channels are the most effective at converting customers with the highest average order value. The data you need to answer that question is likely spread across multiple structured data sources (referral traffic from Google Analytics, transaction history from your POS or e-commerce system, and customer data from your CRM).
Through your ETL process, you can extract relevant data from the above sources, transform it (e.g. updating customer contact info across files for uniformity and accuracy), and load the clean data into one final location. This enables your team to run your query in a streamlined way with limited upfront effort.
In comparison, your social media marketing team wants to see if email click-through rates or social media interactions lead to more purchases. The ELT process allows them to extract and load all of the raw data in real-time from the relevant source systems and run ad-hoc analytics reports, making adjustments to campaigns on the fly.
Extract, Transform, Load (ETL)
This method of data movement first copies data from the original database into the target system and then converts the data into a singular format. Lastly, the transformed data is uploaded into a data warehouse for analytics.
When you should use ETL:
ETL processes are preferable for moving small amounts of structured data with no rush on when that data is available for use. A robust ETL process would clean and integrate carefully selected data sources to provide a single source of truth that delivers faster analytics and makes understanding and using the data extremely simple.
Image 2: This image shows four different data sources with varying data formats being extracted from their sources, transformed to all be formatted the same, and then loaded into a data warehouse. Having all the data sources formatted the same way allows to have consistent and accurate data in the chart that is built from the data in the data warehouse.
Extract, Load, Transform (ELT)
Raw data is read from source databases, then loaded into the database in its raw form. Raw data is usually stored in a cloud-based data lake or data warehouse allowing you to transform only the data you need.
When you should use ELT:
ELT processes shine when there’s large amounts of complex structured and unstructured data that need to be made available more immediately. ELT processes also upload and store all of your data in its raw format, making data ingestion faster. However, performing analytics on that raw data is a more complex process since cleaning and transformation happen post-upload.
Image 3: This image is showing four different data sources with the data formatted different ways. The data is being extracted from the various sources, loaded into the data warehouse, and then transformed within the data warehouse to all be formatted the same. This allows for accurate reporting of the data in the chart seen above.
A data pipeline is a series of steps in an automated process that moves data from one system to another, typically using ETL or ELT practices.
Why it matters for marketers:
The automatic nature of a data pipeline removes the burden of data manipulation from your marketing team. There’s no need to chase down the IT team or manually download files from your marketing automation tool, CRM, or other data sources to answer a single question. Instead, you can focus on asking the questions and honing in on strategy while the technology takes away the burden of tracking down, manipulating, and refreshing the information.
Say under the current infrastructure, your sales data is split between your ecommerce platform and your in-store POS systems. The different data formats are an obstacle to proper analysis, so you decide to move them to a new target system (such as a data warehouse).
A data pipeline would automate the process of selecting data sources, prioritizing the datasets that are most important, and transforming the data without any micromanagement of the tool. When you’re ready for analysis, the data will already be available in one destination and validated for accuracy and uniformity, enabling you to start your analysis without delay.
Data Storage Options
Databases, data warehouses and data lakes are all systems for storing and using data but there are differences to consider when choosing a solution for your marketing data.
- A database is a central place where a structured and organized collection of data can be stored in a computer that is accessed via various applications such as MailChimp, Rollworks, Marketo or even more traditional campaigns like direct mail. It is not meant for large-scale analytics.
- A data warehouse is a specific way of structuring your data in database tables so that it is optimized for analytics. A data warehouse brings together all your various data sources under one roof and structures it for analytics.
- A data lake is a vast repository of structured and unstructured data. It handles all types of data and there is no hierarchy or organization to the storage.
Why it matters for marketers:
There are benefits and drawbacks to each type of data structure and marketers should have a say in how data gets managed throughout the organization. For example, with a data lake, you will need to have a data scientist or other technical resource on staff to help make sense of all the data but your marketing team can be more self-sufficient with a database or data warehouse.
Without organization and structure, the insights your data holds can be unreliable and hard to find. Pulling data from various source systems is often time-consuming and requires tedious and error-prone reformatting of the data in order to tell a story or answer a question. A database can help to store data from multiple sources in an organized central location.
Without databases, your team would have to use multiple Excel sheets and manual manipulation to store the data needed for analysis. This means your team would have to manually match up or copy/paste each Excel sheet’s data in order to create one place to analyze all of your data.
A data warehouse delivers an extra layer of organization across all databases throughout your business. Your CRM, sales platform, and social media data differ in format and complexity, but often contain data about similar subjects. A data warehouse brings together all of those varying formats into a standardized and holistic view structured to optimize reporting. When that data is consolidated from across your organization, you can obtain a complete view of your customers, their spending habits, and their motivations.
You might people say “enterprise data warehouse” or “EDW” when they talk about data. This is a way to structure data that makes answering questions via reports quick and easy. More importantly, EDWs often contain information from the entire company, not just your function or department. Not only can you answer questions about your customer or marketing-specific topics, but you can understand other concepts such as the inventory flow of your products. With that knowledge, you can determine, for example, how inventory delays are correlated to longer shipping times, which often result in customer churn.
A data lake is a great option for organizations that need more flexibility with their data. The ability for a data lake to hold all data—structured, semi-structured, or unstructured—makes it a good choice when you want the agility to configure and refigure models and queries as needed. Access to all the raw data also makes it easier for data scientists to manipulate the data.
You want to get real-time reports from each step of your SMS marketing campaign. Using a data lake enables you to perform real-time analytics on the number of messages sent, the number of messages opened, how many people replied, and more. Additionally, you can save the content of the messages for later analysis, delivering a more robust view of your customer and enabling you to increase personalization of future campaigns.
So, how do you choose?
You might not have to pick just one solution. In fact, it might make sense to use a combination of these systems. Remember, the most important thing is that you’re thinking about your marketing data, how you want to use it, what makes sense for your business, and the best way to achieve your results.
Hopefully this information has helped you better understand your options for data ingestion and storage. Our next post about analytics and insights will help you leverage your data for deeper insight. In the meantime, feel free to contact us with any questions or to learn more about data ingestion and storage options for your marketing data.
Want better data insights and customer analytics?
Aptitive’s Marketing Analytics Starter Pack provides an easy way to get started or expand your current marketing reporting and analytics capabilities.
Sarah Shea is a Data Management and Analytics Consultant at Aptitive. In her role, Sarah enables companies to discover insights and access the information held by their data.