Ruben Hakopian
Aug 8, 2019 7 min read

High-Power CI/CD Pipeline For A Fraction Of The Cost


Are you using a managed CI/CD pipeline? Are builds taking too long while CI/CD still costs a fortune? Then read on to learn how we overcame those challenges at Berlioz.

Background

We are a SaaS company running our services on AWS, and we used a 12-core Jenkins server for production deployments. Our build process consists of three significant steps: build & packaging, artifact upload, and orchestration.

Since we depend heavily on AWS, it is almost impossible to run the services locally. During active development, we had to build, upload, and run orchestration from 8-core laptops.

We had the following challenges:

  1. Build & packaging stages are very CPU intensive and took anywhere from 8 to 12 minutes, depending heavily on the machine
  2. Artifact upload & orchestration took 20 minutes
  3. Managing the Jenkins server on-premises was another headache

This led to the following consequences:

  • The PSU on the Jenkins server burned out recently, and we could not release for three days
  • Active development became too slow. We had to take too many coffee breaks,
    because builds were taking more than 30 minutes
  • It was impossible to work on the go. A deployment from a hotel took 2 hours

It was apparent that we had to move to a SaaS CI/CD pipeline.

Migration to CI/CD

The choice of SaaS CI/CD providers is very diverse, and after evaluating three, we decided to go with Codefresh. We liked their feature set, UI, and responsive support.

One thing we realized with any CI/CD SaaS is that high-power machines (16 cores or more) are not always available, and when they are, the price is astronomical. So, we decided to improvise.

Instead of running the heavy-lifting steps within the pipeline itself, we trigger the build, packaging, upload, and orchestration steps on short-lived AWS EC2 instances. This gives us full freedom of choice over CPU, memory, network bandwidth, storage and, of course, cloud region. For us, this approach also cut the overall CI/CD bill more than tenfold.

Below is a screenshot of our CI/CD pipeline or, to be more precise, of its deployment stage (unit tests, pre-deployment and post-deployment tests, branch release, etc. are omitted). We are going to cover our entire dev flow in depth in a separate article. If you are building a SaaS, I’d highly recommend subscribing to our newsletter.

[Screenshot: Codefresh Pipeline]

As you can see, we squeezed the actual deployment to under 3 minutes, a significant improvement over our previous 30-minute deployment pipeline. We are using 16-core AWS instances there. It could be reduced further to 2 minutes flat by optimizing the initial step and moving to a 32-core machine.

How Does It Work

Here comes the fun part. Managing EC2 instances was not trivial, and making it work end-to-end required a significant amount of automation.

We use the AWS Systems Manager Agent (SSM Agent), which lets us communicate with EC2 instances and run shell commands on them. SSM has its limitations: it cannot return command output longer than a few lines, so an S3 bucket is needed to store the full command output.

To make it usable, we developed tools to automate and hide all that complexity. These tools are open source and free for anyone to use: https://github.com/berlioz-the/aws-cool-cli

Finally, let’s get technical! We attached the following IAM policies to the role used by the EC2 instances:

AmazonEC2RoleforSSM

and

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::<the-bucket-name-goes-here>/*"
        }
    ]
}

This is enough to allow SSM to run commands on the EC2 instances and upload logs to the S3 bucket.
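Since the bucket name is the only variable part of the policy, it can be generated rather than edited by hand. The following is a minimal sketch, not part of the original tooling: render_ssm_log_policy is a hypothetical helper name, and the role, policy, and bucket names in the usage comment are placeholders.

```shell
# Hypothetical helper: print the S3 log-upload policy for a given bucket name.
render_ssm_log_policy() {
    local bucket="$1"
    cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::${bucket}/*"
        }
    ]
}
EOF
}

# Example usage (all names are placeholders):
#   render_ssm_log_policy my-ssm-logs > policy.json
#   aws iam put-role-policy --role-name my-build-role \
#       --policy-name ssm-log-upload --policy-document file://policy.json
```

Generating the document keeps the bucket name in one place, which helps if the same policy is attached to instances in several regions or accounts.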

Now, let’s go over every single pipeline step.

Step 1. Find EC2

The first step is to identify the EC2 instance to run the commands on. Instead of InstanceIDs, we use instance names. The aws-cool-cli-ec2-find-by-name program returns the InstanceID for the specified name. Command Usage:

aws-cool-cli-ec2-find-by-name 
    --name <instance-name>
    --region <region>
    --profile <aws-profile-name>

You can also provide --key and --secret as input instead of --profile.

Inside the pipeline:

# Resolve the instance name to an InstanceID.
INSTANCE_ID=$(aws-cool-cli-ec2-find-by-name --name "$INSTANCE_NAME" --region us-west-2 --profile deployer)
echo "INSTANCE_ID=$INSTANCE_ID"
# Fail if nothing (or only whitespace) came back.
if [[ -z "${INSTANCE_ID// }" ]]; then
    echo "ERROR. Instance with name $INSTANCE_NAME not found."
    exit 1
fi
# Fail if the name matched more than one instance.
instance_count=$(echo "$INSTANCE_ID" | wc -l)
if [[ "$instance_count" -ne 1 ]]; then
    echo "ERROR. Expecting to have 1 instance, found $instance_count"
    exit 1
fi
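
For readers who would rather not depend on the helper tool, a similar lookup can be sketched with the stock AWS CLI. This is only an illustration and not necessarily how aws-cool-cli-ec2-find-by-name works internally. It assumes the instance name is stored in the Name tag, and find_single_instance_id is a hypothetical function name.

```shell
# Sketch: resolve exactly one InstanceID from a Name tag using the stock AWS CLI.
# Assumption: instances are identified by their "Name" tag.
find_single_instance_id() {
    local name="$1" region="$2" ids count
    ids=$(aws ec2 describe-instances \
        --region "$region" \
        --filters "Name=tag:Name,Values=$name" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text) || return 1
    # With --output text the matching IDs come back whitespace-separated.
    count=$(echo "$ids" | wc -w | tr -d ' ')
    if [ "$count" -ne 1 ]; then
        echo "ERROR. Expecting 1 instance named $name, found $count" >&2
        return 1
    fi
    echo "$ids"
}

# Example usage:
#   INSTANCE_ID=$(find_single_instance_id "$INSTANCE_NAME" us-west-2) || exit 1
```

Counting words instead of lines covers both the "not found" and the "more than one" case with a single check.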

Step 2. Check EC2

We verify that the instance found in the previous step is stopped. Command Usage:

aws-cool-cli-ec2-state 
    --instanceId <instance-id>
    --region <region>
    --profile <aws-profile-name>

Inside the pipeline:

state=$(aws-cool-cli-ec2-state --instanceId $INSTANCE_ID --region us-west-2 --profile deployer)
echo "Instance State=$state"
if [[ "$state" != "stopped" ]]; then
    echo "Something wrong with Instance $INSTANCE_ID. Expecting instance to be stopped, but it is $state";
    exit 1;
fi
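
The same kind of check is repeated after the instance is started, so it could be factored into a small helper. The sketch below uses only the aws-cool-cli-ec2-state flags shown above; require_instance_state is a hypothetical function name, and the region and profile are the ones used throughout this article.

```shell
# Hypothetical helper: fail unless the instance is in the expected state.
require_instance_state() {
    local instance_id="$1" expected="$2" state
    state=$(aws-cool-cli-ec2-state --instanceId "$instance_id" \
        --region us-west-2 --profile deployer)
    echo "Instance State=$state"
    if [ "$state" != "$expected" ]; then
        echo "Something wrong with Instance $instance_id." \
             "Expecting instance to be $expected, but it is $state" >&2
        return 1
    fi
}

# Example usage:
#   require_instance_state "$INSTANCE_ID" stopped || exit 1
```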

Step 3. Launch EC2

We start the instance. Command Usage:

aws-cool-cli-ec2-start 
    --instanceId <instance-id>
    [--wait]
    --region <region>
    --profile <aws-profile-name>

Inside the pipeline:

aws-cool-cli-ec2-start --instanceId $INSTANCE_ID --wait --region us-west-2 --profile deployer
export state=$(aws-cool-cli-ec2-state --instanceId $INSTANCE_ID --region us-west-2 --profile deployer)
echo "Instance State= $state"
if [ "$state" != "running" ]; then
    echo "Something wrong with Instance $INSTANCE_ID. Expecting instance to be running, but it is $state";
    exit 1;
fi
aws-cool-cli-ssm-wait-ready --instanceId $INSTANCE_ID --region us-west-2 --profile deployer

In this step we also use the aws-cool-cli-ssm-wait-ready tool to wait for the SSM agent to be up and running. Command Usage:

aws-cool-cli-ssm-wait-ready
    --instanceId <instance-id>
    --region <region>
    --profile <aws-profile-name>
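
To illustrate what "waiting for the agent" can look like, here is a rough polling loop built on the stock AWS CLI. It is a sketch under assumptions, not the implementation of aws-cool-cli-ssm-wait-ready: wait_for_ssm_online is a hypothetical name, and the attempt count and delay are arbitrary defaults.

```shell
# Sketch: poll SSM until the agent on the instance reports "Online".
# max_attempts and delay_seconds are hypothetical tuning knobs.
wait_for_ssm_online() {
    local instance_id="$1" region="$2"
    local max_attempts="${3:-30}" delay_seconds="${4:-5}"
    local attempt=1 status
    while [ "$attempt" -le "$max_attempts" ]; do
        # Prints "None" until the instance has registered with SSM.
        status=$(aws ssm describe-instance-information \
            --region "$region" \
            --filters "Key=InstanceIds,Values=$instance_id" \
            --query 'InstanceInformationList[0].PingStatus' \
            --output text 2>/dev/null)
        if [ "$status" = "Online" ]; then
            return 0
        fi
        sleep "$delay_seconds"
        attempt=$((attempt + 1))
    done
    echo "SSM agent on $instance_id not online after $max_attempts attempts" >&2
    return 1
}

# Example usage:
#   wait_for_ssm_online "$INSTANCE_ID" us-west-2 || exit 1
```

An instance in the "running" state does not yet guarantee that the agent has registered, which is why this wait is separate from the state check.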

Step 4. Clone Repo

To clone the git repository, perform the build, or run any other command, we use the tool below. It blocks the terminal until completion and streams the standard output into the pipeline. Command Usage:

aws-cool-cli-ec2-run-command
    --command <shell-command>
    --instanceId <instance-id>
    --task <task-name>
    [--timeout <timeout-in-seconds>]
    [--s3Bucket <s3-bucket-name>]
    --region <region>
    --profile <aws-profile-name>

Inside the pipeline:

aws-cool-cli-ec2-run-command --command "cd /var/stuff; git clone https://github.com/stuff/name" --instanceId $INSTANCE_ID --s3Bucket $SSM_S3_BUCKET --task "CloneRepo" --timeout 30 --region us-west-2 --profile deployer

Step 5. Prepare, Build, Push, Orchestrate

Just as in the clone step above, run every step needed to build and deploy the product. All you need is the aws-cool-cli-ec2-run-command tool.
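
As a sketch of how these stages could be chained, the wrapper below reuses the aws-cool-cli-ec2-run-command flags from the clone step. The function names, script paths, and timeouts are placeholders, not values from our pipeline, and the sketch assumes the tool exits with a non-zero status when the remote command fails.

```shell
# Hypothetical wrapper so that each remote stage becomes a one-liner.
run_remote_task() {
    local task="$1" timeout="$2" command="$3"
    aws-cool-cli-ec2-run-command \
        --command "$command" \
        --instanceId "$INSTANCE_ID" \
        --s3Bucket "$SSM_S3_BUCKET" \
        --task "$task" \
        --timeout "$timeout" \
        --region us-west-2 \
        --profile deployer
}

# Run the stages in order and stop at the first failure.
# The script paths below are placeholders for your own build commands.
run_deploy_stages() {
    run_remote_task "Prepare"     120 "cd /var/stuff/name && ./prepare.sh" &&
    run_remote_task "Build"       600 "cd /var/stuff/name && ./build.sh" &&
    run_remote_task "Push"        300 "cd /var/stuff/name && ./push.sh" &&
    run_remote_task "Orchestrate" 300 "cd /var/stuff/name && ./deploy.sh"
}
```

Giving each stage its own task name keeps the stages distinguishable in the pipeline output and in the logs stored in S3.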

Step 6. EC2 Stop

At this point, the deployment is complete. All that is left is to stop the instance. Command Usage:

aws-cool-cli-ec2-stop
    --instanceId <instance-id>
    [--wait]
    --region <region>
    --profile <aws-profile-name>

Inside the pipeline:

aws-cool-cli-ec2-stop --instanceId $INSTANCE_ID --wait --region us-west-2 --profile deployer

Compromises and Mitigation

Every approach has its shortcomings, and we will discuss them here.

Dangling Instances

You might have already guessed that if the pipeline fails during the build, the instance that was started in Step 3 will keep running. There are a few ways to mitigate the problem.

  1. Check whether your CI/CD provider supports “Cleanup Steps”. Those steps are guaranteed to run even if the pipeline fails.
  2. If the CI/CD provider doesn’t support “Cleanup Steps”, you can suppress the failure of the steps that are prone to errors: take note of the failure and let the pipeline proceed, skipping every step except the final one, which stops the instances. In this case, you would have to mark pipelines as failed manually.
  3. Use a special Lambda in AWS which stops instances that are running longer than X minutes.
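
When the start, build, and stop commands all run inside a single shell script, a shell trap can play the role of a cleanup step. The sketch below assumes exactly that setup, which differs from a pipeline where each step runs separately; deploy_with_cleanup is a hypothetical function name.

```shell
# Sketch: always stop the instance, even when the build command fails.
# The function body is a subshell, so the EXIT trap fires when it finishes.
deploy_with_cleanup() (
    trap 'aws-cool-cli-ec2-stop --instanceId "$INSTANCE_ID" --wait --region us-west-2 --profile deployer' EXIT
    aws-cool-cli-ec2-start --instanceId "$INSTANCE_ID" --wait \
        --region us-west-2 --profile deployer || exit 1
    # Run the build/deploy command passed by the caller.
    "$@"
)

# Example usage (run_deploy_stages stands for your own build function):
#   deploy_with_cleanup run_deploy_stages
```

The trap does not help if the CI/CD container itself is killed, so it complements rather than replaces an external safety net.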

As of today, Codefresh does not natively support “Cleanup Steps”, and we did not want to pollute the pipeline with lots of optional conditionals, so we went with the third approach: the Lambda. The Lambda function wakes up every minute, queries the list of running instances tagged auto-stop=true, checks whether each instance has been running longer than its auto-stop-duration, and stops it if so. We know that our build runs for 3 minutes, so we set auto-stop-duration to 5 minutes. Not ideal, but it just works! If you need the code for that Lambda, email me at ruben@berlioz.cloud
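
To make the auto-stop logic concrete, here is a rough shell equivalent built on the stock AWS CLI. It is not the Lambda code, only a sketch of the same idea under several assumptions: GNU date is available (for the -d option), auto-stop-duration is a tag holding a number of minutes, and the instance's LaunchTime reflects its most recent start. The function names are hypothetical.

```shell
# Succeeds (exit 0) when an instance launched at $1 (ISO 8601 timestamp)
# has been up for more than $2 minutes as of $3 (epoch seconds).
is_overdue() {
    local launch_time="$1" max_minutes="$2" now_epoch="$3" launch_epoch
    # Ignore instances whose duration tag is missing or not a number.
    case "$max_minutes" in
        ''|*[!0-9]*) return 2 ;;
    esac
    launch_epoch=$(date -u -d "$launch_time" +%s) || return 2   # GNU date only
    [ $(( now_epoch - launch_epoch )) -gt $(( max_minutes * 60 )) ]
}

# Stop every running instance tagged auto-stop=true that is overdue.
stop_overdue_instances() {
    local region="$1" now_epoch
    now_epoch=$(date -u +%s)
    aws ec2 describe-instances --region "$region" \
        --filters "Name=tag:auto-stop,Values=true" \
                  "Name=instance-state-name,Values=running" \
        --query 'Reservations[].Instances[].[InstanceId, LaunchTime, Tags[?Key==`auto-stop-duration`]|[0].Value]' \
        --output text |
    while read -r instance_id launch_time max_minutes; do
        if is_overdue "$launch_time" "$max_minutes" "$now_epoch"; then
            aws ec2 stop-instances --region "$region" \
                --instance-ids "$instance_id" </dev/null
        fi
    done
}
```

Run from a scheduler once a minute, stop_overdue_instances us-west-2 would approximate the behavior described above. Keeping the time comparison in its own function makes it easy to test without touching AWS.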

Preprovisioning Instances

EC2 instances have to be created beforehand. This is essential if you need to run the pipeline on multiple branches.

We overcome this by extending our git automation. We never use git commands directly; we have special commands for every operation, such as new branch, sync, merge, commit, etc. The details will be covered in a different article; for the purposes of this one, consider that they do magic. When we need a new feature branch, we use a git-new-branch command, which not only creates the branch but also creates a new EC2 instance to be used in the CI/CD pipeline. For such feature branches, it configures the pipeline to deploy only, skipping the test phases (they are not crucial during active development). The git-delete-branch command terminates the EC2 instance.

Bottom Line

This is all you need to extend a SaaS CI/CD pipeline to use any AWS EC2 instance. Some might want it to reduce the overall bill, others to get access to the entire lineup of AWS EC2 instances. I hope this was helpful. Happy CI/CDing!
