Chabane R. for Stack Labs

Posted on Jan 16, 2021 • Edited on Nov 12, 2021

Mayday, mayday! I need a scalable infrastructure to hybrid on Google Cloud! Part 2 - DevOps & Container migration

#googlecloud #architecture #devops #kubernetes

Hello again!

In part 1, we saw how to build a scalable Google Cloud organization by implementing a hub-and-spoke topology and centralizing the Google Service Accounts.

In this part 2, we will talk about DevOps and, as an example, the migration of business applications to Kubernetes.

DevOps

The Cloud is ruled by DevOps – this fact has been proven time and again. [1]

I strongly recommend that all my customers adopt a DevOps culture and DevOps practices from the beginning.

Using a CI/CD tool becomes vital when you have multiple environments and a complex infrastructure configuration that depends on infrastructure as code.

Many DevOps tools are available on the market. Let's take GitLab as an example.

Gitlab

With GitLab, you can manage your own instance in your infrastructure or use the SaaS solution, gitlab.com.

In the SaaS solution, you can choose between:

shared runners, which are managed by the GitLab infrastructure and limited in CI minutes (depending on your licence subscription),
specific or group runners, which are managed in your own infrastructure with no limit on minutes.

If you choose the shared runners to deploy your resources in Google Cloud, you will need to create service account keys and save them as variables in the GitLab CI configuration. You will then face all the security issues related to storing credentials outside of your cloud infrastructure: key rotation, age, destruction, location, etc. Of course, you can use tools like Vault and Forseti, but that means more tools to manage.

For production use, I recommend that customers use specific runners with jobs running in a GKE cluster.

The runner configuration that I have used for customers:

Specific runners only,
Deployed using the Helm chart, which makes upgrades easier,
Accessible by tags,
Locked to current project(s),
Run only on protected branches (Production),
Assigned a Kubernetes Service Account,
Jobs run in separate nodepools per environment.
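
The checklist above can be verified automatically. The snippet below is a minimal sketch, assuming the runner is described by a dictionary shaped like the runner objects returned by the GitLab Runners API (fields `is_shared`, `tag_list`, `run_untagged`, `locked`, `access_level`). The function name and the sample runner are hypothetical, and the protected-branch rule is applied to every runner here although the list only requires it for production.

```python
from typing import Dict, List


def check_runner_policy(runner: Dict) -> List[str]:
    """Return the policy violations found in one runner description."""
    problems = []
    if runner.get("is_shared", True):
        problems.append("runner is shared, expected a specific runner")
    if not runner.get("tag_list"):
        problems.append("runner has no tags")
    if runner.get("run_untagged", True):
        problems.append("runner accepts untagged jobs")
    if not runner.get("locked", False):
        problems.append("runner is not locked to its project(s)")
    if runner.get("access_level") != "ref_protected":
        problems.append("runner also runs on unprotected branches")
    return problems


# Hypothetical production runner, for illustration only.
production_runner = {
    "description": "gke-prod-runner",
    "is_shared": False,
    "tag_list": ["gke", "production"],
    "run_untagged": False,
    "locked": True,
    "access_level": "ref_protected",
}

print(check_runner_policy(production_runner))  # prints []
```

A check like this could run as a scheduled job so that a runner drifting away from the expected configuration is reported early.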

The GKE cluster that hosts those runners has its own VPC which is isolated from the internet with an explicit allow rule for gitlab ingress traffic on port 443.

If your organization has multiple teams, you can grant specific teams access to only the runners that they use:

GitOps

GitOps builds on DevOps with Git as a single source of truth for declarative infrastructure like Kubernetes. [2]

If you are able to migrate your existing workloads to Kubernetes, for example, you can take advantage of GitOps practices.

GitOps Tools

Many tools exist on the market, like ArgoCD and FluxCD. Personally, I consider ArgoCD the most complete tool for GitOps, but you will need to manage it yourself in your infrastructure. One of my customers decided to use the GKE add-on Application Delivery (beta) instead of ArgoCD.

Application Delivery allows you to:

Manage configurations for multiple environments using a Git repository as a single source of truth.
Review changes before deployment through pull requests on GitHub or GitLab.
Test and promote changes across different environments.
Roll back changes quickly.
See your applications' version and status on the Google Cloud Console. [3]

GitOps Integration

A common pattern for integrating GitOps practices is to create multiple repositories instead of a single one:

Image repository dedicated to the business application. The Dockerfile(s) reside in this repository.
Env repository dedicated to the deployment in Kubernetes. The Kubernetes manifests reside in this repository.
Infra repository dedicated to the deployment in Google Cloud. The Terraform plans reside in this repository.

Let's illustrate that with a diagram:

1 - We start by deploying the GKE cluster using Terraform and the infra repo:

The GitLab runner job has a KSA (Kubernetes Service Account) that is bound to a GSA (Google Service Account) with the appropriate permissions in the Business project.
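
The KSA to GSA binding relies on Workload Identity. As a minimal sketch, the helper below builds the two pieces that make up such a binding: the IAM member granted `roles/iam.workloadIdentityUser` on the GSA, and the annotation set on the KSA. The project, namespace and account names are hypothetical placeholders.

```python
def workload_identity_member(cluster_project: str, namespace: str, ksa: str) -> str:
    """IAM member that represents a Kubernetes Service Account."""
    return f"serviceAccount:{cluster_project}.svc.id.goog[{namespace}/{ksa}]"


def gsa_email(gsa: str, gsa_project: str) -> str:
    """Email address of a Google Service Account."""
    return f"{gsa}@{gsa_project}.iam.gserviceaccount.com"


def workload_identity_binding(cluster_project, namespace, ksa, gsa, gsa_project):
    """Describe both sides of a KSA <-> GSA binding."""
    return {
        # IAM policy binding to add on the Google Service Account.
        "iam_binding": {
            "role": "roles/iam.workloadIdentityUser",
            "members": [workload_identity_member(cluster_project, namespace, ksa)],
        },
        # Annotation to set on the Kubernetes Service Account.
        "ksa_annotations": {
            "iam.gke.io/gcp-service-account": gsa_email(gsa, gsa_project),
        },
    }


# Hypothetical names, for illustration only.
binding = workload_identity_binding(
    cluster_project="devops-project",
    namespace="gitlab",
    ksa="gitlab-runner",
    gsa="gitlab-runner-infra",
    gsa_project="devops-project",
)
print(binding["iam_binding"]["members"][0])
print(binding["ksa_annotations"]["iam.gke.io/gcp-service-account"])
```

In practice these values would typically be declared in the Terraform plans of the infra repo; the Python form only shows how the identifiers are composed.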

2 - Build a new Docker image after each Git tag:

The new Docker image is built and published to a centralized Docker registry.

Share the specific runner registered in the infra repo with the image repo.

3 - Edit the Docker image version in the Kubernetes manifests using Kustomize and trigger the env repo pipeline from the image repo pipeline:

The Kubernetes workloads are updated with the new Docker image version using a GitOps tool like ArgoCD.

Share the specific runner registered in the infra repo with the env repo and lock the runner to the current GitLab projects.
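
To make step 3 concrete, here is a minimal sketch of the image version bump, assuming the `kustomization.yaml` of the env repo has already been loaded into a Python dictionary (loading and writing the YAML file is left out). It updates the `newTag` entry of the Kustomize `images` field, which is similar in spirit to what `kustomize edit set image` does. The image name and tag are hypothetical.

```python
import copy


def set_image_tag(kustomization: dict, image_name: str, new_tag: str) -> dict:
    """Return a copy of the kustomization with the image pinned to new_tag."""
    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image_name:
            entry["newTag"] = new_tag
            break
    else:
        # The image was not listed yet, so add a new override entry.
        images.append({"name": image_name, "newTag": new_tag})
    return updated


# Hypothetical content of the env repo kustomization.
kustomization = {
    "resources": ["deployment.yaml", "service.yaml"],
    "images": [{"name": "eu.gcr.io/devops-project/billing-api", "newTag": "1.4.1"}],
}

bumped = set_image_tag(kustomization, "eu.gcr.io/devops-project/billing-api", "1.4.2")
print(bumped["images"])
```

Committing the resulting file to the env repo is what lets the GitOps tool detect the change and roll out the new version.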

4 - Authorize the Kubernetes cluster of the business project to access Docker images from the DevOps project:
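
One possible way to grant this access, assuming the images are stored in Container Registry, is to give the service account used by the GKE nodes read access to the Cloud Storage bucket that backs the registry. The sketch below only computes the bucket name and the binding to apply. The project and service account names are hypothetical.

```python
def registry_bucket(project_id: str, registry_host: str = "gcr.io") -> str:
    """Name of the Cloud Storage bucket backing a Container Registry host."""
    if not registry_host.endswith("gcr.io"):
        raise ValueError(f"not a Container Registry host: {registry_host}")
    # "gcr.io" -> "", "eu.gcr.io" -> "eu.", "asia.gcr.io" -> "asia."
    prefix = registry_host[: -len("gcr.io")]
    return f"{prefix}artifacts.{project_id}.appspot.com"


def image_pull_binding(devops_project: str, node_sa_email: str, registry_host: str = "gcr.io") -> dict:
    """Read-only binding that lets a node service account pull images."""
    return {
        "bucket": registry_bucket(devops_project, registry_host),
        "role": "roles/storage.objectViewer",
        "member": f"serviceAccount:{node_sa_email}",
    }


# Hypothetical names, for illustration only.
pull_binding = image_pull_binding(
    devops_project="devops-project",
    node_sa_email="gke-nodes@business-project.iam.gserviceaccount.com",
    registry_host="eu.gcr.io",
)
print(pull_binding)
```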

We can easily develop a generator that creates a blueprint of GitLab repositories and pipelines for each new business project. I had the chance to build a generator like this for a customer using Yeoman. It was a really cool challenge.

Business applications

Let's say we have Docker images hosted on premises and we want to deploy them on Google Cloud services.

There are two ways to achieve that:

Lift & Shift to Compute Engine, if we had a self-managed Docker platform.
Improve & Move to App Engine, Cloud Run or Google Kubernetes Engine (GKE).

Unless there is a strong dependency between the Docker runtime and the business applications, there is little value in going to the Cloud if you don't use a managed service for Docker.

App Engine Flexible can be a good choice if you need a platform as a service and cost is not your main concern.

Cloud Run is currently ideal if your application is internet facing and is not subject to network or security restrictions (unless you have an Anthos license to use Cloud Run for Anthos).

GKE is still the most widely used option for deploying Docker images, as it is fully managed, maintained and secured by Google.

Configuring a GKE cluster

In a hybrid cloud architecture where the GKE cluster frequently accesses on-premises workloads, I apply the following configuration to secure the connectivity:

A version that is as stable as possible,
Private master and private nodes to avoid any public access,
Regional cluster for production workloads,
Nodepools with auto scaling enabled,
Network Policy enabled to manage network traffic,
Workload Identity enabled to avoid using Google Service Account Keys,
Config Connector if the project depends mostly on Kubernetes,
External Load balancer with Cloud CDN enabled for internet facing applications + Cloud Armor when necessary,
Identity Aware Proxy for internal applications.

This configuration ensures high availability and secure communication between Google Cloud and on-premises systems.

Kubernetes architecture

To migrate a 3-tier architecture to GKE, I commonly deploy this architecture in Kubernetes:

For Non production

For production

Depending on the use case, you can deploy additional features like Istio.

Docker images

When you migrate existing Docker images to the cloud, it's not easy to optimize the existing layers for fear of breaking something.

What I recommend for all existing images is to:

Properly tag images (semantic versioning and/or Git commit hash),
Use Cloud Build to build and push images,
Use Container Registry to store the images.
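
As an illustration of the tagging recommendation, the sketch below builds two image references for the same build: one tagged with the semantic version and one tagged with a short Git commit hash. The registry host, project and image names are hypothetical, and the strict `MAJOR.MINOR.PATCH` check is a simplification of semantic versioning.

```python
import re

SEMVER = re.compile(r"\d+\.\d+\.\d+")
COMMIT_SHA = re.compile(r"[0-9a-f]{7,40}")


def image_references(registry_host, project_id, image, version, commit_sha):
    """Return the image references to push for one build."""
    if not SEMVER.fullmatch(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version}")
    if not COMMIT_SHA.fullmatch(commit_sha):
        raise ValueError(f"not a Git commit hash: {commit_sha}")
    base = f"{registry_host}/{project_id}/{image}"
    # The short hash keeps the tag readable while staying traceable to a commit.
    return [f"{base}:{version}", f"{base}:{commit_sha[:8]}"]


# Hypothetical values, for illustration only.
refs = image_references(
    "eu.gcr.io", "devops-project", "billing-api", "1.4.2", "9f2c1e7ab03d4c55"
)
for ref in refs:
    print(ref)
# eu.gcr.io/devops-project/billing-api:1.4.2
# eu.gcr.io/devops-project/billing-api:9f2c1e7a
```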

For new images:

Package a single app per container,
Build the smallest image possible,
Optimize for the Docker build cache,
Remove unnecessary tools,
Carefully consider whether to use a public image,
Use vulnerability scanning in Container Registry. [4]

Cost Savings tips

When you move existing workloads to a Cloud Provider, you can save money in two different ways:

Cost-optimization: Running the workloads in the right service with the right configuration.
Cost-cutting: Removing deprecated and unused resources.

To apply cost-cutting for customers, I have developed Cloud Functions that:

Stop all non-serverless resources, such as VMs and SQL databases, in non-production environments,
Resize GKE nodepools to zero in non-production environments,
Remove old Docker images.
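
The image cleanup, for instance, comes down to a retention rule. The sketch below shows one possible rule, not the exact logic of those functions: always keep the most recent images, and delete the others once they are older than a given age. The listing and deletion calls against the registry are left out, and the image records are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def images_to_delete(images, now, keep_latest=5, max_age_days=30):
    """Return the tags that are both outside the newest N and too old."""
    newest_first = sorted(images, key=lambda img: img["pushed_at"], reverse=True)
    cutoff = now - timedelta(days=max_age_days)
    return [
        img["tag"]
        for img in newest_first[keep_latest:]
        if img["pushed_at"] < cutoff
    ]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical image records, for illustration only.
images = [
    {"tag": "1.0.0", "pushed_at": utc(2020, 9, 1)},
    {"tag": "1.1.0", "pushed_at": utc(2020, 11, 1)},
    {"tag": "1.2.0", "pushed_at": utc(2020, 12, 30)},
    {"tag": "1.3.0", "pushed_at": utc(2021, 1, 10)},
]

print(images_to_delete(images, now=utc(2021, 1, 16), keep_latest=2))
# ['1.1.0', '1.0.0']
```

Keeping a minimum number of recent images, regardless of age, avoids deleting the only images of an application that is rarely released.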

For new workloads, the best way to save money is to use serverless services, such as Firebase, wherever possible.

Conclusion

It took me around 20 days the first time I deployed such architectures. However, after applying these principles, the subsequent implementations took just a few days.

There are other subjects I would have liked to give feedback on: Monitoring, Data Analytics and AI implementation. We can keep them for a part 3.

In the meantime, if you have any questions or feedback, please feel free to leave a comment.

Otherwise, I hope this helped you see how automating everything via CI/CD and using infrastructure as code everywhere gives you a scalable strategy for going hybrid on GCP, whatever your business is.

By the way, do not hesitate to share with peers 😊

Documentation:

[1] https://www.idexcel.com/blog/true-business-efficiency-combines-the-power-of-cloud-computing-and-devops-practices/
[2] https://www.weave.works/blog/gitops-modern-best-practices-for-high-velocity-application-development
[3] https://cloud.google.com/kubernetes-engine/docs/concepts/add-on/application-delivery
[4] https://cloud.google.com/solutions/best-practices-for-building-containers#carefully_consider_whether_to_use_a_public_image

Reviewers

Thanks Ezzedine for your review 👊🏻
