Building A Cloud-Based Data Warehouse
At Aptitive, we recently had the opportunity to help one of our Chicago clients, a large property management conglomerate, implement the foundations of a cloud-based data warehouse. The solution involved integrating multiple source systems into a Snowflake environment. In the form of a short, four-week proof of concept, we accomplished this by using Alooma as a bridge between the source systems and Snowflake.
The Sources and Alooma
For the purpose of this POC, we focused on the integration of two particular sources: a PostgreSQL database and a non-standard application API. The PostgreSQL database had a built-in connector in Alooma, which facilitated a pretty quick and easy set-up. We were able to get all of the PostgreSQL data flowing in within a day.
We really put Alooma to the test, though, as efforts heightened during the API integration. Though Alooma has a built-in API connector, it is limited to REST API sources. Our source provided a more complex API, which demanded a work-around. Put simply, we created Lambda functions in AWS to contact the API, access the necessary data, and pass it to Alooma in batches using the Python SDK. Though this process was technically involved, it was relatively painless and there were substitutes (such as a webhook) available to us.
The “Code Engine” and the “Mapper” in Alooma facilitated the transition of the data into Snowflake. They allowed us to do some minimal, yet powerful, transformations and enable the flow of the data into Snowflake more effectively. For example, JSON from the API came in at varying granularities. There were two levels of granularity, both providing related but different data. Based on certain metadata provided by Alooma, we were able to add custom Python to the Code Engine and break the events up into the structure we needed for Snowflake to ingest. After making a rookie mistake and attempting to manually map all of our sources in the Mapper, we elected to “auto-map” all of our sources to their appropriate destinations in Snowflake. This process was much smoother and more efficient than the manual mapping.
Once the data made its way through Alooma, there was little management that had to take place to see it on the Snowflake side. All of our “raw” tables came over alongside the metadata and log tables provided by Alooma. Though the seamless transition was much appreciated, an even more powerful component of the process was understanding all of the metadata that came with it. We took advantage of these fields to enable some auditing and assurance that our data was coming over properly. It also laid the foundation for a true change tracking and audit framework, which we are currently building out to full form. A key advantage of using these components over most ETL tools is the amount of control and customization it enabled.
After bringing the data into Snowflake, we separated the data by schema and source system. Next, we used views to create a transformation layer where we manipulated all of the raw layer tables into a more appropriate structure. We then created a pseudo dimensional layer (also in views) on top of these views. Disclaimer: This approach was quick and easy but was solely the product of a very short timeline. The computation Snowflake provided allowed for the views to be performant to an extent, but the most evident limitations lied in the lack of stored procedures, scheduled queries, and materialized views. Provided more time, we would recommend a more performant solution for the transformations, such as a separate ETL process.
Overall, the solution was effective and only viable in the timeframe we were given because of Alooma and Snowflake’s simplicity. Though there is still some area for technical advancement, the two allow for a very quick set-up and movement of data, with more control than most tools on the market allow. Furthermore, with the latest announcement of Snowflake’s availability on Azure, the solution is viable for even more ecosystems. Through our experience on this project and related projects, it has become evident that the combination of Alooma and Snowflake is a powerful pairing. We encourage you to consider these SaaS data platforms and find out how the implementation of a quick yet robust cloud data warehouse can help streamline some of your business processes and provide more informative business insights.
If you’re looking for help building a cloud-based data warehouse, Aptitive’s team has the experience and certifications to get the job done right. Read our eBook to learn more about Snowflake deployment best practices or contact us today to schedule a complimentary consultation.
Ashley Pradhan is a Data & Technology Consulting Manager at Aptitive. In her role, Ashley helps clients to find valuable insights through the entire data management lifecycle including, strategy, design, architecture, ETL, custom development, business intelligence, reporting, analytics, and visualizations.