Nick Jetten | 5 June 2024

SnappCar: Redesigning the MLOps of a carsharing platform

Hitting two targets with one arrow at SnappCar

New MLOps setup leads to significant cost reduction and 50% efficiency gain

About SnappCar

SnappCar is a Dutch company that operates a peer-to-peer car-sharing platform, allowing individuals to rent out their vehicles to others in their community. Founded in 2011, SnappCar aims to promote sustainable mobility by reducing the number of cars on the road through efficient sharing practices.

The challenge of scalability

In the summer of 2023, SnappCar's data team was facing two scalability challenges: on the one hand, the company's growth meant an ever-growing data volume; on the other, there was a need for an improved version of the recommender, SnappRank. Given its experience in bringing a variety of scale-ups from an initial pragmatic solution to a scalable, future-proof infrastructure, Enjins got involved to help tackle these challenges.

Identifying key bottlenecks in the ML Audit

To grasp the strengths, weaknesses, and opportunities of the initial situation, Enjins always assesses the current state of the data and ML landscape. Subsequently, a tailor-made solution that considers the needs of the client is delivered. In the case of SnappCar, the need was two-fold:

  1. The existing data warehouse could not keep up with rising data volume.
  2. A different MLOps setup was required to improve ownership and the time-to-production of models.

Given the data team’s limited size, the priority during the ML Audit was to maximize the time available for business value creation and hence to limit the effort spent on infrastructure maintenance. We therefore recommended a best-of-breed setup consisting mostly of managed solutions.

Azure SQL to Snowflake: Elevating scalability and optimizing costs

In the initial stages, SnappCar chose the flexible, all-round Azure SQL for its data warehousing needs. However, as data volumes surged, performance bottlenecks emerged. The strategic shift to Snowflake’s optimized architecture promised efficiency gains, and its usage-based pricing model offered potential cost savings, unlike Azure SQL Database’s pricing, which scales compute and storage together.
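The pricing difference can be made concrete with a back-of-the-envelope model. The sketch below is purely illustrative: all rates and workload figures are hypothetical and do not reflect actual Snowflake or Azure SQL pricing, but they show why usage-based billing tends to win for bursty analytics workloads where a warehouse is only active a few hours per day.

```python
# Illustrative cost model: coupled compute+storage pricing vs. usage-based
# pricing. All rates and workload figures are hypothetical, for illustration
# only; they are NOT real Snowflake or Azure SQL prices.

def coupled_monthly_cost(storage_gb: float, peak_compute_units: float,
                         rate_per_unit_hour: float = 1.5,
                         rate_per_gb: float = 0.10) -> float:
    """Coupled model: peak compute capacity is billed for the whole month
    (~730 hours), because compute cannot scale down independently."""
    return peak_compute_units * rate_per_unit_hour * 730 + storage_gb * rate_per_gb

def usage_based_monthly_cost(storage_gb: float, compute_hours_used: float,
                             rate_per_unit_hour: float = 2.0,
                             rate_per_gb: float = 0.023) -> float:
    """Usage-based model: compute is billed only for the hours it runs,
    even at a higher hourly rate."""
    return compute_hours_used * rate_per_unit_hour + storage_gb * rate_per_gb

# A bursty analytics workload: warehouse active ~4 hours/day, 30 days.
coupled = coupled_monthly_cost(storage_gb=500, peak_compute_units=1)
usage = usage_based_monthly_cost(storage_gb=500, compute_hours_used=4 * 30)
print(f"coupled: {coupled:.2f}, usage-based: {usage:.2f}")
```

With these assumed numbers the usage-based bill comes out well below the coupled one, even though its hourly compute rate is higher; the gap closes as utilization approaches 24/7.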

"Embracing Snowflake restores the timely accessibility of our data. Our time spent waiting for queries to run has reduced by at least 50%."

Introducing DBT and Airbyte to Streamline Ingestion and Transformations

In SnappCar’s previous setup, the ingestion and transformations were handled by Airflow (scheduling custom Python tasks) hosted on a Kubernetes cluster. However, there were two key downsides of this approach:

  1. While Airflow gives a good overview of the task schedule, it offers little insight into the actual content of each task. As a result, changing a task proved time-consuming and error-prone.
  2. The maintenance of Kubernetes and Airflow required specific competencies which resulted in narrow ownership.

To overcome these issues, we designed an architecture that replaced Airflow with Airbyte. As a managed ingestion service, Airbyte inherently reduces maintenance, and it remains cost-effective through a pricing model that scales differently for APIs and databases. dbt was included to offer a structured framework for capturing transformations, which reduces implementation time and later facilitates easier modification of those transformations.
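The value of dbt here is the layering it enforces: raw data is first normalized in a staging layer, and business logic lives in mart models on top. In real dbt these layers are SQL models wired together with `ref()`; the snippet below is only a pure-Python analogue with made-up booking data, illustrating why small, named transformation steps are easy to modify and test in isolation.

```python
# Pure-Python analogue of the layered-transformation idea dbt formalizes:
# raw data -> staging (clean, rename, cast) -> mart (business logic).
# The data and field names are invented for illustration.

raw_bookings = [
    {"ID": "1", "price_eur": "45.0", "status": "CONFIRMED"},
    {"ID": "2", "price_eur": "30.0", "status": "cancelled"},
]

def stg_bookings(rows):
    """Staging layer: only types and naming are normalized."""
    return [
        {"booking_id": int(r["ID"]),
         "price": float(r["price_eur"]),
         "status": r["status"].lower()}
        for r in rows
    ]

def mart_confirmed_revenue(stg_rows):
    """Mart layer: business logic on top of the clean staging layer."""
    return sum(r["price"] for r in stg_rows if r["status"] == "confirmed")

print(mart_confirmed_revenue(stg_bookings(raw_bookings)))  # 45.0
```

Because each layer has a single responsibility, a change such as a renamed source column touches only the staging step, which is exactly the property that made modifying transformations faster than editing monolithic Airflow tasks.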

"Dbt was a game-changer in taking control of our data processes. Now we have much more control over our data flows, thanks to higher data quality and improved governance."

Re-designing the MLOps architecture: Enhancing Ownership and Impact

The first version of the SnappRank model was successfully deployed on a Kubernetes cluster, which performed excellently with regard to response time. The problem, however, was that the data team depended heavily on the back-end team, which managed the Kubernetes cluster on which SnappRank was deployed. As a result, data scientists were unable to autonomously deploy new versions of the model, leading to a time-to-production of months rather than weeks.

Recognizing the limitations of the existing model deployment and monitoring structure, the data team sought a solution that would not only enhance efficiency but also grant them greater ownership of the process. Deeploy was suggested to serve and monitor SnappRank in production instead of the Kubernetes setup, cutting the time-to-production of new model versions from months to weeks. Deeploy also improved ownership of productionized models by giving the team a UI to deploy and monitor them.

"We have been able to analyse the performance of the model from a data science perspective and relative to our business goals in a way that we couldn't previously. And when we found a few immediate faults of the model we were able to iterate quickly and independently of the product team."

With dbt in place, it became much more feasible to provide models with offline features. Since Snowflake is not optimized for low-latency retrieval, Redis was recommended as a feature store for the offline model features.

Development Phase: building the foundation, navigating towards results

To implement the plan outlined in the Audit, we agreed on intense collaboration at first, scaling down over a period of three months to be as effective and swift as possible. Within three weeks, all the tools described in the set-up above were established and the majority were actively in use: most of the infrastructure was built together in this initial collaboration phase, connecting the tools and getting data to flow through the whole platform. Afterwards, Enjins’ role shifted to actively sharing knowledge and best practices with SnappCar, enabling the team to independently extend the platform’s functionality in the future and continue moving swiftly towards the best solution. In practice, this was done through on-the-job advice, Q&A sessions, online (training) content, and Enjins’ in-house training modules.

Implementation brought unforeseen challenges, such as an unexpected surge in data volume that affected the reliability and cost-effectiveness of the Airbyte setup. In response, the collaboration between the two companies demonstrated agility and problem-solving capability: we adopted a hybrid approach, leveraging Managed Airflow on AWS for the bulk of the data ingestion, while only the sources with complex integrations were handled by Airbyte. With this strategy, SnappCar effectively balanced cost-efficiency with accessibility. After working with this hybrid set-up for a while, the choice was made to simplify the architecture by using Managed Airflow for all data ingestion.

"Adaptability is key. We still managed to have an easy-to-use solution, but additionally, we have seen a 22% reduction in architecture costs."

Strategically and pragmatically accelerating business value

The successful conclusion of the project marked more than a technological transformation. SnappCar’s new data and MLOps infrastructure not only strengthened ownership within the data team, but also positioned the platform for seamless scalability in the future. We saw immediate returns on our efforts, reducing costs while halving the time spent waiting on queries. Specifically, the platform enabled analyses of search behavior that were previously impossible due to endless query times.

Using the MLOps infrastructure that was set up in collaboration, the SnappRank model was successfully deployed and is now monitored by the data team. In practice, the new set-up reduces time-to-production and improves the quality of the model, because issues are identified and iterated on much faster. At the time of writing, the learnings and designs applied to the SnappRank model are being reused to deploy a new Dynamic Pricing use case, showing that SnappCar is able to keep building on the internal MLOps knowledge and capability it has gained.