How to deploy a service from scratch

This aspires to be the go-to guide for starting up an infra stack from scratch. If you feel any important piece is missing, feel free to message me on either Telegram or mail.

Let’s say you have your own shiny new web server ready, and you now want to deploy it to a production-grade, highly reliable & durable infrastructure setup. So, like all the big players, you choose AWS. But before we can start thinking about the server, we’ll need to start from the foundation: the network stack.

The Network Stack

The network is the foundation that all infra is built on. Let’s start with the most fundamental block, the VPC.

Virtual Private Cloud

The service needs a network to run on. AWS calls your own bubble of network a Virtual Private Cloud, VPC in short. The idea of the VPC is very simple: you group a certain set of IPs and set up all your services inside that bubble. This is great because you can truly isolate services at a network level, which means that if a service tries to access a private instance in another bubble, it will never be able to reach it. So you can have multiple network bubbles for different services or environments.

AWS launched EC2 instances back in 2006, and companies started raising concerns about security, isolation & privacy. This led to AWS launching logically separated, isolated network units in 2009, with the guarantee that there will be no network overlap between VPCs unless explicitly configured.

But how do we define this bubble? We start by selecting a large range of IPs, and we do this using CIDR (Classless Inter-Domain Routing) notation. Let’s zoom out a bit. Every server runs with an IP; an example IP would be 10.0.0.1. What if you want to select a whole range of IPs? It would be cool to say 10.0.x.x, where x can range between 0 and 255. That’s exactly what CIDR does. All IPv4 addresses are 32 bits. 255.255.255.0 in binary reads 11111111.11111111.11111111.00000000, so 255.255.255.0/24 fixes the first 24 bits, which makes everything from 255.255.255.0 to 255.255.255.255 part of the block. But not 255.255.225.0.
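
To make the mask arithmetic concrete, here is a small TypeScript sketch (the helper names are mine, not anything AWS provides) that checks whether an IP falls inside a CIDR block:

// Convert a dotted-quad IPv4 address into its unsigned 32-bit integer form.
function ipToInt(ip: string): number {
    return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

// An IP is inside a CIDR block if its first `prefix` bits match the block's base address.
function inCidr(ip: string, cidr: string): boolean {
    const [base, prefixStr] = cidr.split('/');
    const prefix = parseInt(prefixStr, 10);
    const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
    return (ipToInt(ip) & mask) === (ipToInt(base) & mask);
}

console.log(inCidr('10.0.42.7', '10.0.0.0/16'));  // true: the first 16 bits match
console.log(inCidr('10.1.0.1', '10.0.0.0/16'));   // false: the second octet differs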

With this tool, we can now define a range of IPs which fall into the VPC.

Subnets

Now that the VPC is defined, we’ll create smaller bubbles with very strong properties. These properties let us draw strong boundaries and make sure wrong configurations don’t backfire. Inside the VPC, we can create further divisions called subnets. Subnets, as the name implies, are sub-networks inside the VPC. We’ll again use CIDR notation to carve some of the VPC’s IPs out for each subnet.

For example, if the VPC is 10.0.0.0/16 (that is 65,536 IPs in the block!), the subnets could be 10.0.0.0/20 (4,096 IPs), 10.0.16.0/20 (4,096 IPs), 10.0.160.0/20 (4,096 IPs) and so on; a /20 fixes 20 bits and leaves 12 free, and 2^12 = 4,096. The suggestion is to create one subnet per availability zone (AZ). The reason for this is that if one AZ in, say, Hyderabad goes down because of a natural calamity or unforeseen circumstances, the other AZs in Hyderabad will still continue to function.

Why do all this? Well, we can make our subnets respect some boundaries. We can create 3 subnets (one per AZ) that are public facing and 3 subnets that are private facing. A subnet is inherently private; we instill these properties into the subnets by setting up the route tables.

Route tables

Probably the most straightforward idea here. A route table decides how network traffic should be routed in a particular bubble of the network. We’ll use the route tables to enforce the properties of the subnets.

NAT gateway

NAT stands for Network Address Translation. We’ll set up our private subnets’ traffic to go through a NAT gateway. The property of a NAT gateway is that it only allows outgoing network requests and the corresponding responses, but no incoming connections.

This is a great idea for a few reasons. The first is that all the traffic in that subnet is routed through a single point. So if a client / service provider wants to whitelist an IP so that you can hit their servers peacefully, just share the NAT gateway’s IP and you are sorted. The second is that all the external traffic can now be measured and analyzed thoroughly: source IP, destination IP, packet size and so on. The third, obviously, is that the private subnet stays private.

So, in the route tables, we’ll point the private subnets’ outbound traffic at a NAT gateway, so that even if a server is started in a private subnet, it can under no circumstance be accessed directly from outside.
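
In AWS CDK terms (the tool we’ll use for all the infrastructure code below), that route looks roughly like the following sketch. The Vpc construct we’ll use later creates this route automatically for private-with-egress subnets, so this is illustrative only; the two IDs are placeholders, not values from this post.

// Inside a stack's constructor: send all traffic that no more-specific route
// claims from the private subnet's route table to the NAT gateway.
new cdk.aws_ec2.CfnRoute(this, 'PrivateDefaultRoute', {
    routeTableId: privateRouteTableId,   // placeholder: the private subnet's route table
    destinationCidrBlock: '0.0.0.0/0',   // the default route
    natGatewayId: natGatewayId,          // placeholder: the NAT gateway's ID
});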

Side note: NAT is billed per GB processed plus a running cost per hour. On top of this, if packets are sent across AZs, the bill is higher. NAT gateways get pretty expensive pretty fast. Use with care.

Internet gateway

Some instances require two-way traffic. For example, maybe you want to deploy an instance and directly expose it to the public. That might not be a good idea if you are exposing a service to a large set of customers, but it’s not really a bad idea for an internal service of an organization.

An internet gateway allows traffic both ways into the subnet. So if an instance is in the public subnet with a static IP, users can directly hit that static IP. And if the instance wants to reach out to the public internet, the internet gateway allows that as well. With this, the public subnet is actually public. Without an internet gateway, no IPs would ever be exposed from the VPC.
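
The public subnet’s counterpart is the same kind of route, but pointed at the internet gateway instead (again a sketch with placeholder IDs; the Vpc construct wires this up for you):

// A public subnet is "public" purely because its route table has this entry.
new cdk.aws_ec2.CfnRoute(this, 'PublicDefaultRoute', {
    routeTableId: publicRouteTableId,    // placeholder: the public subnet's route table
    destinationCidrBlock: '0.0.0.0/0',
    gatewayId: internetGatewayId,        // placeholder: the internet gateway's ID
});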

With this, we finally wrap up the foundation of the network infrastructure. On top of this, we’ll now deploy applications.


Network Stack with Infrastructure as Code

The same network stack in AWS CDK code would be:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { IpProtocol, SubnetType } from 'aws-cdk-lib/aws-ec2';

const envDetails = {account: "<accountId>", region: "ap-south-1"};
const vpcName = 'uat-vpc';
  
export class NetworkStack extends cdk.Stack {  
    constructor(scope: Construct, id: string, props?: cdk.StackProps) {  
        super(scope, id, props);  
  
        new cdk.aws_ec2.Vpc(this, "vpc", {  
            vpcName: vpcName,  
            availabilityZones: ['ap-south-1a', 'ap-south-1b', 'ap-south-1c'],  
            ipAddresses: cdk.aws_ec2.IpAddresses.cidr('10.0.0.0/16'),  
            createInternetGateway: true,  
            enableDnsHostnames: true,  
            enableDnsSupport: true,  
            // one NAT gateway shared by all private subnets (cheaper, but a single point of failure)
            natGateways: 1,
            ipProtocol: IpProtocol.IPV4_ONLY,  
            // one public + one private subnet in each of the three AZs above
            subnetConfiguration: [{
                subnetType: SubnetType.PUBLIC,  
                name: 'public',  
                mapPublicIpOnLaunch: false,  
                cidrMask: 24,  
            }, {  
                subnetType: SubnetType.PRIVATE_WITH_EGRESS,  
                name: 'private',  
                cidrMask: 24,  
            }],  
        });  
    }
}
const app = new cdk.App();
new NetworkStack(app, "NetworkStack", {env: envDetails});

This construct sets up three public subnets and three private subnets (one of each per AZ), an internet gateway, and a single NAT gateway shared by the private subnets across the three AZs.


The Application Stack

This is for constant / persistent load. For burst loads, the recommended path is to deploy via Lambdas.

Now that the base is built, let’s build the application stack on top.

There are some basic requirements for our application stack:

  • Versioned application artifacts
  • Deploy application code
  • Auto scaling based on metrics
  • Logging and monitoring
  • Blue-green deployment with zero downtime. We don’t want the instances to restart all at once and have the application become unresponsive for a brief period of time.

We’ll be using docker. Using docker as a packaging format has some great advantages:

  • Runs everywhere, including macOS, Linux, Windows etc.
  • Easy to store; a lot of infra is already built around storing and pulling docker images
  • Platforms built to support docker also pipe logs, metrics etc. nicely out of the box
  • Auto scaling up & down is easier since we are operating in container territory and not on bare metal. AWS deploys images to a service called Elastic Container Service (ECS), which has a whole host of these features pre-built, so we’ll just be using that.

Application deployment architecture

We’ll be deploying our application using docker images. In AWS, docker images are stored in the Elastic Container Registry (ECR) and run on the Elastic Container Service (ECS). The containers will be deployed in the private subnet.

We’ll need a load balancer so that traffic is distributed effectively across containers. So we’ll have an application load balancer in the public subnet with a public IP.

Docker image for the application

For this example, we’ll just use crccheck/hello-world, which spits out a hello world page on hitting /.

We’ll not cover how to build docker images for your applications in too much depth, but the idea is very straightforward. Have a simple Dockerfile in your repository, run docker build, run docker tag to tag the image with the commit ID and latest, & then finally push it to the Elastic Container Registry (ECR). We’ll be using this image to deploy to the servers.
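
As a side note, if you’d rather let CDK handle the build-tag-push dance itself, the aws-ecr-assets module can build a local Dockerfile and push it to a CDK-managed ECR repository on deploy. A sketch, assuming your Dockerfile lives in an app/ folder next to the stack code:

import * as path from 'path';
import { DockerImageAsset } from 'aws-cdk-lib/aws-ecr-assets';

// Inside a stack's constructor: builds ./app/Dockerfile and pushes the image to ECR on deploy.
const imageAsset = new DockerImageAsset(this, 'HelloWorldImage', {
    directory: path.join(__dirname, 'app'),  // hypothetical path to the folder holding your Dockerfile
});
// The built image can then be handed to ECS via:
//   cdk.aws_ecs.ContainerImage.fromDockerImageAsset(imageAsset)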

Deploying the image

Elastic Container Service (ECS) is the service offered by AWS to deploy docker images on its platform. Traditionally, deployments happened on EC2 instances / dedicated servers. Nowadays it’s quicker to deploy and spin containers up & down rather than whole instances.

To work with ECS, we’ll need to understand three important ideas:

  • Task definitions: These are the blueprints / specification files that define which docker image to pull, which command to run in the container, how much CPU to allocate, how much memory to allocate etc.
  • Clusters: A cluster is a logical grouping of compute capacity, with some maximum specifications, inside which containers are spun up
  • Tasks: A task is a running copy of a task definition inside the cluster. So if 15 containers are running, that’s 15 tasks running inside the cluster.

So to start off, the task definition we’ll opt for uses the crccheck/hello-world image, sets the container port to 8000 (the port the image listens on), sets up some environment variables, a role with some accesses, & CPU and memory constraints.
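
Expressed directly in CDK, a task definition along those lines could look like the sketch below (taskRole is the role defined in the full stack further down). The application stack later uses the higher-level ApplicationLoadBalancedFargateService pattern, which builds an equivalent task definition for us, so this block is illustrative only.

const taskDefinition = new cdk.aws_ecs.FargateTaskDefinition(this, 'HelloWorldTaskDef', {
    cpu: 2048,             // 2 vCPUs
    memoryLimitMiB: 4096,  // 4 GB of memory
    taskRole: taskRole,    // the role granting the container its AWS accesses
});
taskDefinition.addContainer('HelloWorldContainer', {
    image: cdk.aws_ecs.ContainerImage.fromRegistry('crccheck/hello-world:latest'),
    portMappings: [{containerPort: 8000}],  // the port the image listens on
    environment: {'NODE_ENV': 'uat'},
    logging: cdk.aws_ecs.LogDrivers.awsLogs({streamPrefix: 'hello-world'}),
});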

Auto scaling

Any production-grade infra stack needs to scale up & down based on some metric like network traffic, CPU etc.

For example, when the CPU goes above 80% we might want to start one more instance to manage the workload, and when the CPU goes below 70% we might want to scale the container count down since the traffic is reducing. Such a rule is what is called a scaling policy. The code below uses a slight variation, a target-tracking policy, which adds and removes containers to keep the average CPU around a single target value (70%).

With this policy, we’ll set a minimum of 1 container and a maximum of 3. As the CPU goes up we’ll spawn one more container, but never more than three at any point. You’ll see this policy neatly laid out in code.

With this, the service will be up and running on deploy, and / will respond with hello world.

// reuses cdk, Construct, SubnetType, envDetails, vpcName and app from the network stack above
export class HelloWorldApplicationStack extends cdk.Stack {
    constructor(scope: Construct, id: string, props?: cdk.StackProps) {  
        super(scope, id, props);  
  
        const applicationId = 'HelloWorldApplication'
  
        const vpc = cdk.aws_ec2.Vpc.fromLookup(this, 'vpc', {vpcName: vpcName})  
        const applicationLoadBalancer = new cdk.aws_elasticloadbalancingv2.ApplicationLoadBalancer(this, `${applicationId}LoadBalancer`, {  
            vpc: vpc,  
            internetFacing: true,  
            vpcSubnets: {subnetType: SubnetType.PUBLIC}  
        });  
        const taskRole = new cdk.aws_iam.Role(this, `${applicationId}TaskRole`, {
            assumedBy: new cdk.aws_iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
            // the accesses the application needs (scope these down for production)
            managedPolicies: [
                cdk.aws_iam.ManagedPolicy.fromAwsManagedPolicyName('SecretsManagerReadWrite'),
                cdk.aws_iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonS3FullAccess'),
            ],
        })
        const helloWorldApplication = new cdk.aws_ecs_patterns.ApplicationLoadBalancedFargateService(this, applicationId, {  
            vpc: vpc,  
            taskSubnets: {subnetType: SubnetType.PRIVATE_WITH_EGRESS},  
            loadBalancer: applicationLoadBalancer,  
            assignPublicIp: false,  
            cpu: 2048,  
            memoryLimitMiB: 4096,  
            taskImageOptions: {  
                image: cdk.aws_ecs.ContainerImage.fromRegistry('crccheck/hello-world:latest'),
                containerPort: 8000,  
                environment: {  
                    'NODE_ENV': 'uat'  
                },  
                taskRole: taskRole,  
            },  
        });  
  
        // register the ECS service's DesiredCount as a scalable target: between 1 and 3 tasks
        const scalableTarget = new cdk.aws_applicationautoscaling.ScalableTarget(this, `${applicationId}ScalingTarget`, {
            serviceNamespace: cdk.aws_applicationautoscaling.ServiceNamespace.ECS,  
            resourceId: `service/${helloWorldApplication.cluster.clusterName}/${helloWorldApplication.service.serviceName}`,  
            scalableDimension: 'ecs:service:DesiredCount',  
            minCapacity: 1,  
            maxCapacity: 3,  
        });  
        // target tracking: add/remove tasks to keep average CPU utilization around 70%
        new cdk.aws_applicationautoscaling.TargetTrackingScalingPolicy(this, `${applicationId}ScalingPolicy`, {
            scalingTarget: scalableTarget,  
            targetValue: 70,  
            scaleOutCooldown: cdk.Duration.seconds(60),  
            scaleInCooldown: cdk.Duration.seconds(60),  
            predefinedMetric: cdk.aws_applicationautoscaling.PredefinedMetric.ECS_SERVICE_AVERAGE_CPU_UTILIZATION  
        });  
    }  
}
 
new HelloWorldApplicationStack(app, "HelloWorldApplicationStack", {env: envDetails});
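
As a side note, the same scaling behaviour can be expressed more tersely with the ECS service’s autoScaleTaskCount helper; this sketch would replace the ScalableTarget / TargetTrackingScalingPolicy block inside the constructor:

// Equivalent target-tracking policy via the higher-level helper.
const scaling = helloWorldApplication.service.autoScaleTaskCount({minCapacity: 1, maxCapacity: 3});
scaling.scaleOnCpuUtilization('CpuScaling', {
    targetUtilizationPercent: 70,
    scaleInCooldown: cdk.Duration.seconds(60),
    scaleOutCooldown: cdk.Duration.seconds(60),
});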

Figure out the load balancer’s URL, hit it, and the response should be hello world!
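
To avoid hunting for the URL in the console, you could add a CfnOutput inside the stack’s constructor so that cdk deploy prints the DNS name at the end (the ApplicationLoadBalancedFargateService pattern typically emits a similar output by default):

// Inside HelloWorldApplicationStack's constructor, after creating the service:
new cdk.CfnOutput(this, 'LoadBalancerUrl', {
    value: `http://${helloWorldApplication.loadBalancer.loadBalancerDnsName}`,
});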