In part 1 of our modern data warehouse series, we laid out the benefits of a data warehouse and highlighted how it can help to centralize and manage your data. In part 2, we’ll be comparing modern data warehouse options.
To remain competitive, organizations are increasingly moving away from traditional on-premises systems and toward modern data warehouses, also known as cloud-based data warehouses or modern data platforms. Modern data warehouses differ from traditional warehouses in the following ways:
- There is no need to purchase physical hardware
- They are less complex to set up
- It is much easier to prototype and provide business value without having to build out the ETL processes right away
- There is no upfront capital expenditure, and operational expenditure is low
- It is quicker and less expensive to scale a modern data warehouse
- Modern cloud-based data warehouse architectures can typically perform complex analytical queries much faster because of how the data is stored and their use of massively parallel processing (MPP)
Modern data warehousing is a cost-effective way for companies to take advantage of the latest technology and architectures without the upfront cost to purchase, install, and configure the required hardware, software, and infrastructure.
Comparison of popular modern data warehouses
Comparing Modern Data Warehousing Options
- Traditional data warehouse deployed on infrastructure as a service (IaaS): The customer installs traditional data warehouse software on computers provided by a cloud provider (Azure, AWS, Google, etc.).
- Platform as a service (PaaS): The cloud provider manages the hardware deployment, software installation, and software configuration. However, the customer is responsible for managing the environment, tuning queries, and optimizing the data warehouse software.
- True software as a service (SaaS) data warehouse: The cloud provider supplies all hardware and software as part of its service and also manages them. Software and hardware upgrades, security, availability, data protection, and optimization are all handled for you.
In all of the above scenarios, the tasks of purchasing, deploying, and configuring the hardware that supports the data warehouse environment fall on the cloud provider instead of the customer.
IaaS, PaaS, and SaaS – What is the best option for my organization?
Infrastructure as a service (IaaS) is an instant computing infrastructure, provisioned and managed over the internet. It helps you avoid the expense and complexity of buying and managing your own physical servers and other datacenter infrastructure. In other words, if you’re prepared to buy the engine and build the car around it, the IaaS model may be for you.
In the platform as a service (PaaS) scenario, a cloud provider merely supplies the hardware and its traditional software via the cloud, so the solution is likely to resemble its original on-premises architecture and functionality. Many vendors offer a modern data warehouse that was originally designed and deployed for on-premises environments. One such technology is Amazon Redshift: Amazon acquired rights to ParAccel, renamed it Redshift, and hosted it in the AWS cloud environment. Redshift is a highly successful data warehouse service, and it is easy to instantiate a Redshift cluster in AWS, but that is where the convenience ends. You still have to complete all of the administrative tasks: reclaiming space after rows are deleted or updated (the vacuuming process in Redshift), capacity planning, provisioning compute and storage nodes, and choosing your distribution keys. Everything you had to do with ParAccel, or with any traditional architecture, you still have to do with Redshift.
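Choosing distribution keys is a good example of that hands-on work. As a rough illustration only (a simplified model, not Redshift's actual internals), the Python sketch below hashes each row's key value to one of several hypothetical slices and measures how unevenly the rows land. The orders table, its `order_id` and `region` columns, the four-slice cluster, and the SHA-256-based hash are all assumptions made for this example.

```python
import hashlib
from collections import Counter


def assign_slice(key_value: str, num_slices: int) -> int:
    """Map a distribution-key value to a slice with a stable hash.

    Illustrative only: this is NOT the hash function Redshift uses internally.
    """
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def slice_row_counts(rows: list[dict], key: str, num_slices: int) -> list[int]:
    """Count how many rows land on each slice when distributed on `key`."""
    counts = Counter(assign_slice(str(row[key]), num_slices) for row in rows)
    return [counts.get(i, 0) for i in range(num_slices)]


def skew_ratio(counts: list[int]) -> float:
    """Rows on the busiest slice divided by the average; 1.0 is perfectly even."""
    average = sum(counts) / len(counts)
    return max(counts) / average if average else 0.0


# Hypothetical orders table: 90% of rows are "US", 10% are "EU".
orders = [{"order_id": i, "region": "US" if i % 10 else "EU"} for i in range(1000)]

NUM_SLICES = 4  # hypothetical cluster size
by_order_id = slice_row_counts(orders, "order_id", NUM_SLICES)
by_region = slice_row_counts(orders, "region", NUM_SLICES)

print("order_id:", by_order_id, round(skew_ratio(by_order_id), 2))
print("region:  ", by_region, round(skew_ratio(by_region), 2))
```

Because `region` has only two distinct values, at most two of the four slices receive any rows, so most of the cluster sits idle during a parallel scan. The high-cardinality `order_id` key spreads rows far more evenly. In a PaaS warehouse, weighing trade-offs like this is left to you.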
Alternatively, a data warehouse solution built for the cloud on a true software as a service (SaaS) architecture lets the cloud provider include all hardware and software as part of its service, along with the management of that hardware and software. One such technology is Snowflake, which requires no management and features separate compute, storage, and cloud services layers that can scale and change independently. Snowflake differentiates itself from IaaS and PaaS cloud data warehouses because it was built from the ground up on cloud architecture. All administrative tasks, including tuning, patching, and management of the environment, fall on the vendor. Instead of the architectures we see in IaaS and PaaS solutions on the market today, Snowflake uses a newer architecture called multi-cluster, shared data, which essentially makes the administrative headache of maintaining solutions like Redshift go away.
If you depend on your data to better serve your customers, streamline your operations, and lead (or disrupt) your industry, a modern data platform built on the cloud is a must-have for your organization.
Follow us on LinkedIn to be the first to see the rest of the blogs in this series or contact us for a complimentary whiteboarding session to learn what a modern data warehouse would look like for your organization.
Jason Maas is the COO of Aptitive and head of the Data Practice. He is a seasoned technology leader with experience spanning several industries such as transportation, accounting, insurance, retail, software, healthcare, and financial services.