Analyzing raw data without a single, standardized format is about as fruitful as trying to understand all 193 UN delegates shouting in their native tongues. Something important is being said, but good luck figuring out what it is. Reformat that raw data and move it from its disparate sources into a single data warehouse, though, and the message rings as clear as a bell.
That is the benefit that Extract, Transform, Load (ETL) processes provide to organizations. Yet before you can access the hidden patterns and meanings in your data, you need to decide how you want to acquire your ETL tool: build one from scratch or buy an automated solution. Here’s what to consider as you make your decision.
What’s the Project Size?
Often, a small project scope with simple data flows benefits from a custom build, allowing your organization to calibrate your ETL tool to your precise needs and spend less in the process. Small shops may have fewer technical resources, but they will often spend as much time integrating a pre-built ETL tool as they would building simple data flows from the ground up.
When the scope is a massive enterprise-level ETL framework, it makes more sense to engage with a preexisting ETL tool and accelerate your analytics timeline. Even then, we recommend a data management partner experienced in ETL processes, one that’s done the technical work of hooking the sources together and transforming the data numerous times. They know the accelerators to get your program up and running enterprise-wide.
What Technology Are You Using?
Your current tech stack is always a consideration. For example, if you prefer open source technology or depend on a web of legacy systems for daily operations, building your own system eliminates the worry that your integration won’t work. Building your ETL program is also the preferred option for organizations with a custom or niche development environment that want to use fewer computing resources or improve performance.
On the other hand, teams that work in GUI environments and value ease of use are better served by buying their ETL program. For example, we had an ecommerce client with a few internal technical resources. They understood their current state and their source systems but did not want to deal with setting up the actual workflows. In that scenario, we determined that integrating a preexisting ETL solution into their ecosystem would help their team load data and run reports more effectively.
What’s the Shelf-Life of Your Proposed Solution?
How long you’ll use a specific ETL solution has a significant influence on the decision to build or buy. If you need a quick-and-dirty, one-off load, it doesn’t make sense to invest $15,000+ a year in an ETL solution. If you have resources capable of scripting an ad hoc solution, using their talents will achieve faster results.
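To give a sense of scale, a one-off load can be as small as the sketch below. It is only an illustration, not a recommended production design: the `orders` table, the column names, and the sample CSV are all hypothetical, and it uses nothing beyond Python's standard library (`csv` and `sqlite3`).

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount,order_date
1001, Acme Corp ,250.00,2023-01-15
1002,globex,  99.50,2023-01-16
1003,Initech,not_a_number,2023-01-17
"""


def extract(text):
    """Extract: read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, title-case names, skip rows with bad amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # A real script would log or quarantine rejected rows.
        clean.append(
            (
                int(row["order_id"]),
                row["customer"].strip().title(),
                amount,
                row["order_date"].strip(),
            )
        )
    return clean


def load(rows, conn):
    """Load: create the (hypothetical) target table and insert the clean rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_one_off_load(text=RAW_CSV):
    """Run the whole pipeline against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    for row in run_one_off_load().execute("SELECT * FROM orders ORDER BY order_id"):
        print(row)
```

A script like this is quick to write and easy to throw away, which is the point for a one-time load. It has no scheduling, monitoring, or error reporting, so it is a poor fit for anything you expect to run for years.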
On the other hand, companies that need a scalable or long-term strategic solution tend to lean toward a prepackaged ETL tool. Because the data sources and streams in these organizations keep evolving, a preexisting ETL tool in which ongoing development and integration are handled by the vendor is ideal. The only major challenge is ensuring that your team maximizes your investment by using the full capacity of your vendor’s ETL solution. Fortunately, that feat is more manageable if you work with a technical consulting partner like Aptitive.
What’s Your Budget?
This one is a little deceptive. Though building your own solution requires an initial investment, you avoid the ongoing subscription and the initial integration costs that are often overlooked in preliminary estimates. Additionally, buying an ETL solution often means that you’ll be charged per source system being transformed and loaded into your data warehouse. So depending on the number of disparate sources and the volume of data, the build option is a good way to avoid overspending on data ingestion.
Though large enterprises will still end up paying these costs, they can justify them as a trade-off for greater traceability and cataloging for the sake of compliance. The ability to track business data and smoothly conduct audits is more than enough for some organizations to defend the elevated price tag, especially if those organizations are in the healthcare or financial sector.
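One way to make the budget comparison concrete is to add up both options year by year. The sketch below is a rough model under stated assumptions: only the $15,000 annual subscription echoes the figure mentioned earlier, while the integration cost, per-source fee, build cost, and maintenance cost are invented for illustration. Real vendor pricing varies widely, so substitute your own quotes.

```python
def cumulative_buy_cost(years, annual_subscription, integration_cost, sources, per_source_fee):
    """Total spend on a purchased tool after `years`:
    one-time integration plus yearly subscription and yearly per-source fees."""
    return integration_cost + years * (annual_subscription + sources * per_source_fee)


def cumulative_build_cost(years, initial_build, annual_maintenance):
    """Total spend on a custom build after `years`:
    one-time build cost plus yearly maintenance."""
    return initial_build + years * annual_maintenance


def break_even_year(max_years, buy, build):
    """Return the first year in which building is cheaper than buying, or None."""
    for year in range(1, max_years + 1):
        if cumulative_build_cost(year, **build) < cumulative_buy_cost(year, **buy):
            return year
    return None


# Hypothetical inputs (assumptions, not real pricing).
BUY = {
    "annual_subscription": 15_000,
    "integration_cost": 10_000,
    "sources": 5,
    "per_source_fee": 1_000,
}
BUILD = {"initial_build": 60_000, "annual_maintenance": 8_000}

if __name__ == "__main__":
    for year in range(1, 7):
        print(year, cumulative_buy_cost(year, **BUY), cumulative_build_cost(year, **BUILD))
    print("Build becomes cheaper in year:", break_even_year(10, BUY, BUILD))
```

With these made-up inputs, buying is cheaper through year 4 and building pulls ahead in year 5. Change the number of sources or the per-source fee and the crossover moves, which is why the source count matters so much in the estimate.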
Who Will Manage the ETL Process?
Control is a significant consideration for plenty of organizations. For those who want to own the ETL system, building is the right choice. Often, this makes the most sense when you already have a custom infrastructure, legacy storage system, or niche analytics needs.
Yet not every organization wants to divert attention from their primary business. Let’s say you’re a healthcare organization that wants to build a comprehensive data warehouse from a myriad of data sources while still maintaining compliance. Trusting an experienced vendor removes a considerable amount of risk.
Do You Need Flexibility in Your Analytics?
What types of reports will you be running? Standard ones for your industry or business environment? Or reports that are particular to your own unique needs? Your answer heavily influences the choices you make about your ETL tool.
If you expect your demands on a data warehouse to be uncommon, then building is the ideal choice. That way, your reporting isn’t curtailed to fit a preconceived notion of your needs. Hand-coding your own ETL program enables you to write scripts for whatever schemas or parameters you have in mind. The only limitation is your own technical capability or that of your data management consulting partner.
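As a small illustration of what hand-coded flexibility can look like, the sketch below maps records from two differently shaped sources into one target schema. Everything named here is hypothetical: the `crm` and `billing` sources, their field names, and the target fields were chosen only for the example.

```python
from datetime import datetime


def parse_us_date(value):
    """Convert a MM/DD/YYYY string to an ISO 8601 date string."""
    return datetime.strptime(value, "%m/%d/%Y").date().isoformat()


def cents_to_dollars(value):
    """Convert an integer amount in cents to dollars."""
    return int(value) / 100


# Hypothetical mappings: target field -> (source field, converter function).
SOURCE_MAPPINGS = {
    "crm": {
        "customer_id": ("CustID", str),
        "signup_date": ("Signup", parse_us_date),
        "lifetime_value": ("LTV", float),
    },
    "billing": {
        "customer_id": ("account_number", str),
        "signup_date": ("created_at", lambda v: v[:10]),  # keep the date part of a timestamp
        "lifetime_value": ("total_cents", cents_to_dollars),
    },
}


def to_target_schema(source, record):
    """Reshape one source record into the shared target schema."""
    mapping = SOURCE_MAPPINGS[source]
    return {
        target: convert(record[field])
        for target, (field, convert) in mapping.items()
    }


if __name__ == "__main__":
    print(to_target_schema("crm", {"CustID": "42", "Signup": "03/09/2021", "LTV": "1200.50"}))
    print(
        to_target_schema(
            "billing",
            {"account_number": 42, "created_at": "2021-03-09T14:00:00Z", "total_cents": "120050"},
        )
    )
```

Supporting a new source or an unusual report field means adding one more mapping entry, with no waiting on a vendor's roadmap. The flip side is that your team owns every one of those mappings.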
If performance outranks customization, buying an ETL tool like Attunity, Talend, or others is the superior option. As we’ve said before, you’ll lose some flexibility and back-end control, but these enterprise-level ETL solutions allow you to gather, cleanse, and refine data with minimal effort. Who said data transformation needed to be difficult?
Do You Have Access to Technical Experts?
Effective ETL processes require a skilled workforce to deliver maximum results. What’s more, that workforce needs to know how to build a data warehouse. You need internal resources, a data management partner, or a proficient solutions provider involved in the development, auditing, and testing processes.
Internal resources allow you to build and launch your own ETL program with their ability to hand-code scripts and manage data workflows. Additionally, you don’t need to hire outside resources to monitor ongoing performance or troubleshoot issues. The trade-off is that their work on your ETL solution and data integration can divert their attention from long-term strategic projects or operations. An effective compromise is having an internal resource take ownership of the project and outsource the scripting, loading, and data migration to a technical partner.
For organizations without spare technical talent, buying a prepackaged ETL tool simplifies a portion of the initial technical investment. However, most organizations still need assistance with current state audits to verify all the source systems, hands-on integration support to get reporting up and running, and training on the new reporting processes. Choosing the right technical consulting partner enables you to deliver results in reasonable timetables without hiring new IT talent to handle the ETL process.
The advantage of a data management partner like Aptitive is that we’re proficient in both build and buy situations. If you decide to build, we can help with the scripting and create a support team. If you decide to buy, we can help integrate the tool and teach your internal team how to make the most of the ETL tool’s features. That way, you can prioritize other, more strategic or less flexible considerations while still consolidating your disparate data sources into a single data warehouse.
Can You Provide Internal Training?
What happens after the implementation? Will your team be able to grab the baton and confidently sprint forward without any impediments? Or are you at risk from the “bus factor,” where one person getting hit by a bus wipes out your organization’s knowledge of the ETL solution? Whether you build an ETL platform or buy a cloud-based subscription, success depends on the effectiveness of the associated training process.
Going with a custom build means you’re dependent on your own knowledge sharing. You may encounter a bottleneck scenario where only a few resourceful employees understand how to run reports after ETL processes are conducted. And if a tool is time-consuming or frustrating, you’ll struggle to encourage buy-in.
However, with a purchased ETL tool, resources outside of your team should easily understand the logistics of the workflows and be able to support your system. Your organization can then recruit or contract staff who are already familiar with the technical function of your tool without painfully reverse-engineering your scripting. Beware, though! You will encounter the same problems as a built system if you integrate the tool poorly (don’t just write custom scripting within your workflows if you want to get the benefits from a purchased option).
The right data management partner can help you avoid this situation entirely. For example, the Aptitive team is skilled at sharing organizational process changes and communicating best practices to users and stakeholders. That way, there’s no barrier to using any ETL tool across your organization.
Whether you build or buy your ETL tool, Aptitive can help you implement the right solution. Schedule a whiteboard session to review your options and start on the path to better data analytics.
Greg Marsh is a Data Engineer Manager at Aptitive. In his role, Greg facilitates the discovery of business insights from data. From “Big” data like IoT streams to classic relational ERP information, Greg helps companies unlock the power of their data.