Lifesight Migrates To GCP To Modernize Its Infrastructure Leveraging Google BigQuery And Google Kubernetes Engine
Lifesight partnered with Searce to move to Google Cloud Platform from a different cloud provider, modernizing infrastructure along the way and adopting serverless and managed Big Data services.
- Lifesight helps brands, agencies, and enterprises understand and measure real-world consumer behaviour
- Lifesight processes 4.6 billion data points on an average day, covering 14 million places of interest
Lifesight’s Agenda: Using Google Cloud Platform (GCP) to Reduce the Average Query Processing Time
Lifesight wanted a data platform that would support building a data warehouse for external and internal use cases, running ad-hoc queries on data in storage with a powerful querying engine, and transforming and cleaning data faster and more effectively using a Spark or traditional MapReduce-based pipeline. In a Data Strategy workshop that assessed Lifesight’s current data pipelines and other workloads, Searce proposed a set of optimizations that could be achieved on Google Cloud Platform (GCP). Subsequently, a POC replicated two of Lifesight’s pipelines on GCP using three months of data (40 TB) and validated the performance: the average query processing time was reduced by 75% with BigQuery. Following this successful workshop and POC, the Lifesight team recognized the opportunity that lay in migrating and optimizing their data engineering workloads and decided to move ahead with the help of the Searce team.
Lifesight’s Intention to Adopt a Data Platform
- Support in building a Data Warehouse for the company’s external and internal use cases
- Support data visualization via both traditional tools like Tableau / Power BI / Metabase / Superset and a Chart.js-based implementation
- Support ad-hoc queries on current data sitting in storage using a powerful querying engine
- Support faster and more effective data transformation and cleaning using a Spark or traditional MapReduce-based pipeline
- Support fast and effective querying of geospatial data, as most of the data in the current system is latitude/longitude based
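To make the geospatial requirement concrete, the sketch below builds a BigQuery SQL string that finds rows within a given radius of a point, using BigQuery's ST_GEOGPOINT and ST_DWITHIN geography functions. It is only an illustration under stated assumptions: the table name, the column names, and the function build_nearby_places_query are hypothetical and do not come from Lifesight's actual schema or pipelines.

```python
import re

# Hypothetical validation patterns; table and column names cannot be passed
# as BigQuery query parameters, so they are checked before interpolation.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_]+){1,2}$")
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_nearby_places_query(table, lat_col="latitude", lng_col="longitude"):
    """Return SQL selecting rows within @radius_m meters of a center point.

    The center point and radius are left as named query parameters
    (@center_lng, @center_lat, @radius_m) to be bound at execution time.
    """
    if not _TABLE_RE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    for col in (lat_col, lng_col):
        if not _IDENT_RE.match(col):
            raise ValueError(f"unexpected column name: {col!r}")
    # ST_GEOGPOINT takes longitude first, then latitude.
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        "WHERE ST_DWITHIN(\n"
        f"  ST_GEOGPOINT({lng_col}, {lat_col}),\n"
        "  ST_GEOGPOINT(@center_lng, @center_lat),\n"
        "  @radius_m\n"
        ")"
    )


if __name__ == "__main__":
    # Hypothetical table name, for illustration only.
    print(build_nearby_places_query("my_project.places.visits"))
```

The resulting string could then be run through a BigQuery client with the three named parameters bound to values. Storing a precomputed GEOGRAPHY column instead of raw latitude/longitude pairs is another common option, which this sketch does not cover.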
Lifesight’s Data Engineering Platform Requirements for Scaling Analytical Data
Lifesight wanted a data engineering platform that was secure, durable, and highly scalable for its analytical data. The Searce team used BigQuery to build a data warehouse and migrated the EMR data processing frameworks to it to handle analytical processing. Additionally, the application stack was moved to GKE for seamless scalability and better utilization of the underlying infrastructure. At the same time, supporting services such as S3, Route53, SES, and Elasticsearch were migrated to their corresponding GCP alternatives.
Broadly, as a part of the migration process, the following tasks were performed:
- Built an autoscaling GKE cluster environment to host the containerized applications identified during the workshop
- Defined and implemented CI/CD on the newly recommended platform using GCP services
- Migrated the EMR data processing frameworks to BigQuery for the sources identified during the workshop
- Defined native schemas using GCP services to integrate with the existing application environment
- Migrated active data from MySQL, Aerospike, Elasticsearch, and Kafka, and migrated stored data from S3 to GCS
- Set up domain name management equivalent to Route53, and implemented SendGrid/Mailchimp as the SES alternative
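The source-to-target mapping described above can be summarized as a small inventory. The sketch below is purely illustrative: it records only the pairings named in this article, leaves the Route53 target unset because the text does not name it, and uses a hypothetical helper, unresolved_services, to flag such gaps.

```python
from typing import Dict, List, Optional

# Pairings taken from the article; None marks a target the text does not name.
MIGRATION_MAP: Dict[str, Optional[str]] = {
    "EMR data processing": "BigQuery",
    "Application stack": "GKE",
    "S3": "GCS",
    "SES": "SendGrid/Mailchimp",
    "Route53": None,  # replacement service is not specified in the article
}


def unresolved_services(mapping: Dict[str, Optional[str]]) -> List[str]:
    """Return the source services that have no target recorded yet."""
    return sorted(name for name, target in mapping.items() if target is None)


if __name__ == "__main__":
    for source, target in MIGRATION_MAP.items():
        print(f"{source} -> {target or 'not specified'}")
    print("Unresolved:", unresolved_services(MIGRATION_MAP))
```

An inventory like this is mainly useful as a checklist during planning, since any entry without a target stands out before the cutover begins.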
The Business Impact of Using BigQuery, GKE, and Serverless GCP Services
- Robust Data Platform – With BigQuery as the data warehouse, Lifesight could now analyze petabytes of data within minutes
- Easy Insights – Reduced complexity and the overall time to gain deeper insights
- Reduced Management Overhead – With GKE, BigQuery, and other serverless GCP services, the operational overhead of managing the data platform was reduced
- Better and Faster Query Time – The average query processing time was reduced by 75% with BigQuery, and ad-hoc queries could now be run easily