In our previous Tech Note, we described how binaries for multi-module applications can be created in a Docker environment, ready to be deployed. In this post we are going to discuss the underlying deployment infrastructure that can be used for staging and productive environments. To ensure app uptime and scalability, we choose a stack that adjusts resources dynamically according to the required throughput, but also would adapt to new requirements when our app grows. To automatically provision and configure the resources of our Stack, we used CloudFormation.
A CloudFormation Stack is a bundle of AWS elements that are created based on a template - written in general as a JSON or YAML file. Following the same logic that a generic container image could work for building every module in our deployments, we designed our Stacks for staging and production environments following the same principle: the main structure (the image) is shared between every Stack and the content (the concrete binaries to run and their configuration), differ. This way, we can reuse the underlying image across all our projects.
For provisioning containers, we use ECS and, in particular, we use the Fargate compute engine. With Fargate, AWS provides a way of provision container execution without having to define underlying infrastructure like EC2 machines. The engineering team just have to choose which container image to run, the parameters that will be passed to it, and that's it
ECS works mainly at three levels, Cluster, Service and Task Definitions. We can summarize the ECS Cluster as nothing more than a collection of Services and Tasks (running containers) grouping. As a design decision (not a limitation) we decided to create one Cluster by Stack, one Service by Cluster and one Task Definition by Service.
A Task Definition is the name given to the element that provides the required configuration to provision containers in ECS. A Task Definition also describes underlying resources like the amount of vCPU cores, and how much memory will be provided to the container. The Task Definition has all the end-state configuration for deploying and running it and, if Docker has the internal configuration for our app (for instance, a custom command or/and entry-point), the Task Definition will define those as a part of its configuration.
An ECS Service is a manager for instances of Task Definitions. The Service is in charge of spawning the amount of desired containers and, if any instance fails, those that requires to be re-provisioned will be launched again. The health of the containers are checked all the time through health checks - for instance, pinging them in HTTP port every 15 seconds. A Service takes care also of creating and destroying containers according to auto-scaling policies (if provided) and also to attaching them to a load balancer. A Service has a strategy for its deployment, that will be used to launch each task, as many times as necessary, until one manages to be healthy. There are two strategies for the services: DAEMON, that only let you have one task for each service, and REPLICA that was designed to distribute the traffic through tasks.
The Elastic Load Balancer, as expected, distributes loads between two or more nodes (in this case, Tasks) according to one or more policies. But at the same time, the ELB will check if every Task is healthy and will inform the Service if one or more are not passing the basic health checks. If that is the case, the Service will start the draining and killing process and will take care of relaunching a new Task if required.
All the previous systems are secured through a Security Group. In a nutshell, a Security Group can be seen as a firewall that controls access from/to IPs or others Security Groups. By default, a Security Group let any IP/TCP or UDP packet out from its system, and denies every access to it. If needed, we can edit the inbound rules to allow traffic, always remembering to protect the private parts of our apps with them - for example, if our backend is private, we can restrict the access to only other systems in a valid Security Group, making public the inbound only in the frontend.
For logging purposes, it is recommended to use integrated CloudWatch Logs support, which collects all the information sent to the standard output from the Tasks. Logs can be easily searched using CloudWatch Logs Insights - see our post devoted to that.
To anyone interested in a real example of a CloudFormation Template, a simplified version can be found here. With this structure, all our stacks are actually implemented in the same way, where we only need to edit the parameters when we create a stack, not the resources themselves.
To implement an automated deployment as a part of our CI/CD processes, theoretically we must need to update the original Stack every time that we want to deploy a new version of our app. However, we chose a simpler approach.
At this point it is important to mention that an ECS Service is configured with a specific Task Definition that will be used to instantiate their containers. When the Task Definition is updated or changed, the Service takes care of deploying new containers and draining old ones, ensuring zero downtime.
Since the structure of a Task Definition can be imported/exported using a JSON schema, the approach that we decided was to save a template for it in S3, replace the required placeholders that change on every deploy and then update the Task Definition via the AWS CLI. A simplified example of our JSON used for Task Definition rewriting is listed below.
In this second part of this Tech Note series, we saw how to define the infrastructure that will be running our app and the underlying foundation that it implements. With the concept of Stacks, we can provision all the infrastructure that we need for new modules only providing the required variables in the template, without creating it from scratch each time that a new module is integrated in the project.