Infrastructure as code
On June 20th we dodged a bullet. After 810 days of uptime, a network card in one of our EC2 instances started to fail.
This particular instance hosted an application server. Due to a lack of automation, we couldn’t take any immediate action. Luckily, after minimal downtime, the network card started working again. But that was just luck. When it comes to systems engineering, you need to be prepared to handle downtime.
It was time for a big change. We needed to be able to quickly recreate our environment(s). We needed to get our backend ready for additional scale. We needed to better secure our platform. And we really needed to adopt a DevOps culture. Essentially, the goal was to construct an infrastructure for this particular system while keeping things simple and engineering-friendly.
Getting started with Docker containers
We started by “dockerizing” our services. With Docker running locally in development, a basic configuration was established. Dependencies were connected via a Docker network, and backend service(s) were run using docker-compose.
docker network create <network_name>
docker-compose up
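For illustration, a minimal docker-compose.yml along these lines wires an app server to its dependencies over a shared, pre-created network. The service, image, and network names here are hypothetical:

```yaml
# Minimal sketch; service, image, and network names are hypothetical.
version: "3"
services:
  app:
    image: app-server:latest
    ports:
      - "8080:8080"
    networks:
      - backend
  db:
    image: postgres:11
    networks:
      - backend
networks:
  backend:
    external: true  # created beforehand with `docker network create backend`
```

Marking the network as external lets multiple compose projects share it, which is what allows dependencies to find each other locally.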
With the ability to run locally, a Jenkinsfile was added to each Git repository. Jenkins pipelines began creating Docker images and storing them in Artifactory.
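A declarative Jenkinsfile for such a pipeline might look roughly like this; the stage names, registry host, and image name are hypothetical:

```groovy
// Sketch of a declarative Jenkins pipeline; the registry host and
// image name are hypothetical.
pipeline {
    agent any
    stages {
        stage('Build image') {
            steps {
                sh 'docker build -t artifactory.example.com/app-server:${BUILD_NUMBER} .'
            }
        }
        stage('Push to Artifactory') {
            steps {
                sh 'docker push artifactory.example.com/app-server:${BUILD_NUMBER}'
            }
        }
    }
}
```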
Because images were stored online, engineers began sharing services without having to check out code and build images locally. This saved time, as building a complex Docker image can be slow. The pull command also promoted reuse:
docker pull [OPTIONS] NAME[:TAG|@DIGEST]
CloudFormation
In order to deploy Docker images, CloudFormation configuration files were created. Common shell scripts handled basic tasks. Service templates were established with environment parameters. It became possible to create ECS services and tasks declaratively.
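As a rough sketch, a service template can declare an ECS task definition and service, parameterized by environment. The resource names, registry host, and values below are hypothetical:

```yaml
# Sketch of a CloudFormation service template; names and values are hypothetical.
Parameters:
  Environment:
    Type: String
    AllowedValues: [development, qa, production]
  ImageTag:
    Type: String
Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub app-server-${Environment}
      ContainerDefinitions:
        - Name: app-server
          Image: !Sub artifactory.example.com/app-server:${ImageTag}
          Memory: 512
  Service:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Sub ${Environment}-cluster
      DesiredCount: 2
      TaskDefinition: !Ref TaskDefinition
```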
As per https://12factor.net, using declarative formats for infrastructure automation:
- Minimizes time and cost for new developers joining the project
- Establishes a clean contract with the underlying operating system, offering maximum portability between execution environments
- Obviates the need for servers and systems administration
- Minimizes divergence between development and production, enabling continuous deployment for maximum agility
- And can scale up without significant changes to tooling, architecture, or development practices
Lastly, environment templates for ECS clusters (and other resources) were created. Services could then “live” within environments (ECS clusters) such as development, qa, and production.
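An environment template along these lines (again, the names are hypothetical) declares one cluster per environment, which the service templates can then reference:

```yaml
# Sketch of a CloudFormation environment template; names are hypothetical.
Parameters:
  Environment:
    Type: String
    AllowedValues: [development, qa, production]
Resources:
  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Sub ${Environment}-cluster
Outputs:
  ClusterName:
    Value: !Ref Cluster
```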
Full deployment in action
Let’s put this all together. Docker images, combined with service and environment templates, established an end-to-end continuous integration and deployment pipeline using Jenkins.
After lots of testing, we deployed services and tasks to ECS in a controlled manner. Docker images were promoted from development → qa → production environments. A useful side effect emerged: it was now possible to see every service for a given environment (dev, qa, and production) in one AWS console.
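The promotion step can be sketched as a small shell helper that retags an image from one environment tag to the next. The registry host and image name are hypothetical, and the script prints the docker commands (a dry run) rather than executing them:

```shell
#!/bin/sh
# Sketch of image promotion between environments; the registry host and
# image name are hypothetical. Prints the docker commands (dry run)
# instead of executing them.
REGISTRY="artifactory.example.com"

promote() {
  image="$1"; from="$2"; to="$3"
  echo "docker pull $REGISTRY/$image:$from"
  echo "docker tag $REGISTRY/$image:$from $REGISTRY/$image:$to"
  echo "docker push $REGISTRY/$image:$to"
}

promote app-server development qa
```

Because promotion retags the same image rather than rebuilding it, the exact artifact that passed testing in one environment is what runs in the next.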
Problem solved
We went from feeling helpless to feeling empowered. With environment, service, and task configuration declared in files, we can now recreate our environment(s) from scratch. We can scale ECS cluster and task resources up or down. And we’ve made much better use of IAM through declarative roles and policies.
Your turn
You too can prevent downtime and benefit from taking an infrastructure-as-code approach. Start small with ancillary services. Put containers and schedulers in place. Empower backend engineers to build end-to-end systems. Happy coding :)