THE CUSTOMER
At the front end, the client runs a multi-tenant eCommerce platform. Its distinguishing feature is that any store can also automatically list products from other vendors and earn revenue through affiliate marketing. The products from other vendors are collected by a web crawler.
The Project
Overview
Problems
- The infrastructure was chaotic: hundreds of EC2 instances were being used for just the basic operations of the system.
- The infrastructure was not scalable: capacity did not grow or shrink with actual usage.
- Containerization was not done properly. Containers were bloated and insecure, version control was neglected, and images were not optimized for specific environments, all of which hurt efficiency and security.
- There was no orchestration. Containers were deployed without a proper management system, so scaling, service discovery, and load balancing were handled manually, which reduced efficiency considerably.
- CI/CD and DevOps best practices were not properly applied to the infrastructure implementation or to deployments.
- The system was not designed for cost efficiency, so cost optimization was a major factor.
- The code was scattered haphazardly across separate repositories.
- Deployment was a complicated process that required multiple manual steps.
Issues in Basic Infrastructure
- The infrastructure was chaotic: hundreds of EC2 instances were being used for just the basic operations of the system.
- The infrastructure was not scalable: capacity did not grow or shrink with actual usage.
- Containerization was not done properly.
- There was no orchestration.
- There was no CI/CD, monitoring, or similar tooling.
The Cost Factor
Cost was a major concern. Because the company had previously received a large amount of funding, the former team had not designed the system to be cost-effective.
Code organization and CI
- The code was organized haphazardly in separate repositories.
- Deployment was a complicated process that required multiple manual steps.
FOLIO3 SOLUTION
Moved to Kubernetes
Folio3 helped move the infrastructure to Kubernetes using the community tool kops.
Helm Charts
Using Helm charts with Kubernetes, we made sure that the complete solution rolls out in an integrated fashion. Helm also let us maintain and organize the YAML files properly, and it made passing variables and settings down to the charts easy, which was very helpful when deploying to different environments.
Autoscaling
With Kubernetes, auto-scaling was easier to implement than it would have been directly on AWS. We implemented the Horizontal Pod Autoscaler (HPA), which adds pods to the existing cluster as the need arises and removes them when load drops. This ensures that only the required amount of CPU resources is in use.
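The scaling rule behind HPA can be sketched in a few lines. The Kubernetes documentation describes the desired replica count as the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function below is only an illustration of that rule; the replica limits and the example metric values are assumptions, not settings from this project.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,    # assumed limit, for illustration only
    max_replicas: int = 10,   # assumed limit, for illustration only
    tolerance: float = 0.1,
) -> int:
    """Replica count suggested by the documented HPA scaling rule."""
    ratio = current_metric / target_metric
    # When the metric is close enough to the target, no scaling happens.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured replica range.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, current_metric=90, target_metric=60))
```

With four pods averaging 90% CPU against a 60% target, the rule asks for six pods. With the same pods at 30%, it asks for two.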
We also implemented Cluster Autoscaling, which checks whether CPU resources are going unused and, if so, reduces the size of the cluster (the number of EC2 instances in the cluster goes down). Conversely, if any pod fails to launch due to resource constraints, the cluster scales up automatically.
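The two directions of cluster autoscaling described above can be illustrated with a deliberately simplified decision function. This is not the Cluster Autoscaler's actual implementation, which also verifies that pods on a candidate node can be rescheduled elsewhere. The `Node` type, the field names, and the 0.5 threshold are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical, minimal view of a worker node (one EC2 instance)."""
    name: str
    cpu_capacity_millicores: int
    cpu_requested_millicores: int


def plan_cluster_size(
    nodes: List[Node],
    pending_pods: int,
    scale_down_threshold: float = 0.5,  # illustrative threshold
) -> Tuple[str, List[str]]:
    """Return ("scale_up" | "scale_down" | "steady", candidate node names)."""
    # Pods that cannot be scheduled for lack of resources trigger a scale-up.
    if pending_pods > 0:
        return ("scale_up", [])
    # Nodes whose requested CPU is well below capacity are removal candidates.
    underused = [
        n.name
        for n in nodes
        if n.cpu_requested_millicores / n.cpu_capacity_millicores
        < scale_down_threshold
    ]
    # Keep at least one node in this simplified model.
    if underused and len(nodes) > 1:
        return ("scale_down", underused)
    return ("steady", [])


if __name__ == "__main__":
    cluster = [Node("node-a", 4000, 3500), Node("node-b", 4000, 800)]
    print(plan_cluster_size(cluster, pending_pods=0))
    print(plan_cluster_size(cluster, pending_pods=3))
```

In this model a node is judged by the CPU its pods have requested, not by momentary load, so accurate resource requests on each pod matter for both scaling directions.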
CI/CD
We implemented CI using Jenkins. Dynamic parameters from Jenkins were passed on to the Helm charts, after which the upgraded version of the complete project was rolled out.
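As a rough sketch of how build parameters can reach a chart, the helper below assembles a `helm upgrade --install` command from values a CI job might supply. The release name, chart path, values file, and override keys are hypothetical. The case study does not say which parameters were actually passed.

```python
import shlex
from typing import Dict, List


def helm_upgrade_command(
    release: str,
    chart: str,
    namespace: str,
    values_file: str,
    overrides: Dict[str, object],
) -> List[str]:
    """Build a `helm upgrade --install` invocation as an argument list."""
    cmd = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "-f", values_file,  # per-environment settings
    ]
    # Each dynamic parameter becomes a --set override on the chart values.
    for key, value in sorted(overrides.items()):
        cmd += ["--set", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # All names below are made up for illustration.
    cmd = helm_upgrade_command(
        release="storefront",
        chart="./charts/storefront",
        namespace="staging",
        values_file="values-staging.yaml",
        overrides={"image.tag": "build-42", "replicaCount": 2},
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when a parameter contains spaces or special characters.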
Monitoring
We used Prometheus with Grafana for basic monitoring of the cluster. We also used Alertmanager to send out alerts for any unexpected occurrences. Pods and the cluster were scaled based on metrics from Prometheus.
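To give a feel for how metrics can be read back out of Prometheus, the sketch below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and extracts per-pod values from a response of result type `vector`. The server address, the `pod` label, and the sample response are assumptions for illustration, not details from this project.

```python
from typing import Dict
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def vector_values(response: dict, label: str = "pod") -> Dict[str, float]:
    """Map one label of each returned series to its sample value."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    # Each sample is a [timestamp, "value"] pair; the value is a string.
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in data["result"]
    }


if __name__ == "__main__":
    # Hypothetical server address and hand-written sample response.
    print(instant_query_url("http://prometheus:9090", "up"))
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"pod": "web-1"}, "value": [1700000000, "0.82"]},
                {"metric": {"pod": "web-2"}, "value": [1700000000, "0.35"]},
            ],
        },
    }
    print(vector_values(sample))
```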
Huge Reduction in Cost
With these changes to the infrastructure, we achieved savings of more than 60% on the overall AWS bill. Furthermore, the plan was to use Spot Instances once the microservices were optimized, which would reduce the bill to 10-15k. That would be a saving of nearly 80% compared to the old architecture.
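The percentages above follow from simple arithmetic: savings are the reduction in the bill divided by the original bill. The case study does not state the original bill, so the baseline of 50,000 below is purely hypothetical, chosen only to show how a 10-15k bill could correspond to savings approaching 80%.

```python
def savings_percent(old_bill: float, new_bill: float) -> float:
    """Percentage saved when a bill drops from old_bill to new_bill."""
    return 100.0 * (old_bill - new_bill) / old_bill


if __name__ == "__main__":
    baseline = 50_000  # hypothetical figure, NOT from the case study
    for projected in (15_000, 10_000):
        pct = savings_percent(baseline, projected)
        print(f"{projected} against {baseline}: {pct:.0f}% saved")
```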