AWS High Availability and Scalability: ELB and ASG

Introduction

In the cloud, building resilient and performant applications requires two key strategies: Scalability and High Availability. Scalability ensures systems can handle increasing loads, while high availability ensures systems remain functional even when parts of the architecture fail.

AWS provides two essential services to implement these strategies:

Elastic Load Balancing (ELB) – for distributing incoming traffic.
Auto Scaling Groups (ASG) – for automatically adjusting the number of instances.

In this post, we’ll dive deep into these services, explain concepts like vertical and horizontal scaling, and explore features like Cross-Zone Load Balancing, Sticky Sessions, and Auto Scaling Policies in detail.

1. High Availability and Scalability

What is Scalability?

Scalability allows your system to handle increased traffic or load by adapting resources. It is categorized into:

Vertical Scaling (Scaling Up):
- Upgrading a single server's capacity (e.g., more CPU, RAM).
- Example: Upgrading from a t2.micro to a t2.large.
- Use Case: Databases like RDS or ElastiCache where distribution isn’t feasible.

Horizontal Scaling (Scaling Out):
- Adding more servers to distribute the load.
- Example: Launching more EC2 instances to handle traffic.
- Use Case: Distributed systems like web applications.

What is High Availability?

High availability ensures systems remain operational during failures by distributing workloads across multiple Availability Zones (AZs).

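Example: Think of a call center with operators in New York (AZ1) and San Francisco (AZ2). If the New York office goes down, San Francisco keeps handling requests. (In AWS, the AZs are separate data centers within one Region rather than distant cities, but the principle is the same.)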

2. Elastic Load Balancing (ELB)

What is Load Balancing?

Load balancing distributes incoming traffic across multiple servers to ensure no single server is overwhelmed.

How Does ELB Help?

Distributes load evenly to backend servers.
Performs health checks to avoid routing traffic to unhealthy instances.
Ensures High Availability by routing traffic across multiple AZs.
Simplifies certificate management by terminating SSL/TLS at the load balancer.
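
To make the first two points concrete, here is a minimal sketch of round-robin distribution that skips unhealthy targets. It is a toy model, not how ELB is implemented: the class name `RoundRobinBalancer` and the instance IDs are made up for illustration, and a real load balancer marks targets unhealthy through periodic health-check probes rather than a manual call.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates over targets and skips unhealthy ones."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._next = 0  # index of the next target to try

    def mark_unhealthy(self, target):
        # Stand-in for a failed health check.
        self.healthy.discard(target)

    def mark_healthy(self, target):
        if target in self.targets:
            self.healthy.add(target)

    def route(self):
        """Return the next healthy target, or raise if none is available."""
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")


lb = RoundRobinBalancer(["i-a", "i-b", "i-c"])  # hypothetical instance IDs
lb.mark_unhealthy("i-b")
picked = [lb.route() for _ in range(4)]
print(picked)  # i-b never appears while it is unhealthy
```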

Why Use ELB?

Managed by AWS – AWS handles provisioning, upgrades, maintenance, and availability of the load balancer itself.
Cost-effective compared to managing your own.
Seamlessly integrates with Auto Scaling Groups, CloudWatch, and other AWS services.

3. Application Load Balancer (ALB)

What is an ALB?

An ALB operates at Layer 7 (HTTP/HTTPS) and is designed for modern applications that require advanced routing.

Key Features:

Path-Based Routing: Route requests based on the URL path (e.g., /user to one target group, /posts to another).
Host-Based Routing: Route based on domain name (api.example.com → API servers).
Microservices Support: Integrates seamlessly with ECS and containers.
WebSocket Support: ALB can handle real-time applications using WebSockets.
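
The routing features above can be sketched as a list of rules checked in order. This is a simplified model under stated assumptions: the `Rule` class, `pick_target_group`, and the target group names are hypothetical, and path matching here is a plain prefix check, whereas real ALB rules use priorities and wildcard patterns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    target_group: str
    host: Optional[str] = None         # exact host match, e.g. "api.example.com"
    path_prefix: Optional[str] = None  # prefix match, e.g. "/user"


def pick_target_group(rules, host, path, default="default-tg"):
    """Return the target group of the first rule whose conditions all match."""
    for rule in rules:
        if rule.host is not None and rule.host != host:
            continue
        if rule.path_prefix is not None and not path.startswith(rule.path_prefix):
            continue
        return rule.target_group
    return default  # nothing matched: fall back to the default action


rules = [
    Rule("api-tg", host="api.example.com"),   # host-based routing
    Rule("user-tg", path_prefix="/user"),     # path-based routing
    Rule("posts-tg", path_prefix="/posts"),
]

print(pick_target_group(rules, "www.example.com", "/user/42"))
print(pick_target_group(rules, "api.example.com", "/posts"))
```

Because the first matching rule wins, the order of the rules matters: in this sketch a request to api.example.com goes to the API target group even when its path starts with /posts.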

Benefits:

Consolidates multiple microservices behind a single ALB.
Intelligent routing reduces latency and optimizes traffic flow.

4. Network Load Balancer (NLB)

What is an NLB?

An NLB operates at Layer 4 and is optimized for TCP/UDP traffic.

Key Features:

Handles millions of requests per second while maintaining ultra-low latency.
Provides one static IP address per AZ and lets you assign an Elastic IP to each.
Ideal for applications requiring high-performance and fixed IPs.

Use Case:

Financial applications, real-time gaming, or IoT systems requiring ultra-low latency and extreme performance.

5. Gateway Load Balancer (GWLB)

What is a GWLB?

GWLB operates at Layer 3 (network layer) and is designed to deploy, scale, and manage third-party network appliances like firewalls or intrusion prevention systems.

How it Works:

Traffic first flows through the GWLB.
GWLB routes the traffic to network appliances for inspection.
Approved traffic continues to the application.
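
The three steps above can be sketched as a small inspection pipeline. Everything here is illustrative: `inspect` stands in for a third-party appliance, the blocked port is an arbitrary choice, and packets are plain dictionaries rather than real network traffic.

```python
def inspect(packet, blocked_ports=frozenset({23})):
    """Hypothetical firewall appliance: approve unless the port is blocked."""
    return packet["dst_port"] not in blocked_ports


def gateway_flow(packets, appliance=inspect):
    """Send every packet through the appliance; forward only approved ones."""
    forwarded, dropped = [], []
    for packet in packets:
        (forwarded if appliance(packet) else dropped).append(packet)
    return forwarded, dropped


packets = [
    {"src": "10.0.0.5", "dst_port": 443},
    {"src": "10.0.0.9", "dst_port": 23},
]
forwarded, dropped = gateway_flow(packets)
print(len(forwarded), "forwarded,", len(dropped), "dropped")
```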

Use Case:

Deploying virtual appliances for security (firewalls, IDS/IPS).
Analyzing network traffic for anomalies.

6. Sticky Sessions

What are Sticky Sessions?

Sticky Sessions (session affinity) ensure that a client consistently connects to the same backend instance for all its requests.

Why Use Sticky Sessions?

Maintain user session data (e.g., login states, shopping cart).
Prevent session loss due to load balancing.

How it Works:

A cookie is used to identify the session:
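1. Application-Based Cookie: Stickiness follows a cookie set by your application, which controls its name and lifetime.
2. Duration-Based Cookie: Generated by the load balancer (named AWSALB on an ALB, AWSELB on a CLB) and valid for a configured duration (e.g., 1 day).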
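
Here is a rough sketch of duration-based stickiness. It assumes a server-side table that maps cookie values to targets, which is an illustration only: the real load balancer cookie is opaque, and the class name `StickyBalancer` and the `sticky-N` cookie values are invented for this example.

```python
import time
from itertools import cycle


class StickyBalancer:
    """Toy duration-based stickiness: a cookie pins a client to one target."""

    def __init__(self, targets, duration_seconds=86400):
        self._rotation = cycle(targets)
        self.duration = duration_seconds
        self._sessions = {}  # cookie value -> (target, expiry timestamp)
        self._counter = 0

    def route(self, cookie=None, now=None):
        """Return (target, cookie), reusing the pinned target while the cookie is valid."""
        now = time.time() if now is None else now
        entry = self._sessions.get(cookie)
        if entry and entry[1] > now:
            return entry[0], cookie
        # No valid cookie: pick the next target and issue a fresh cookie.
        self._counter += 1
        cookie = f"sticky-{self._counter}"
        target = next(self._rotation)
        self._sessions[cookie] = (target, now + self.duration)
        return target, cookie


lb = StickyBalancer(["i-a", "i-b"], duration_seconds=60)
first, cookie = lb.route(now=0)
again, _ = lb.route(cookie, now=30)     # same target while the cookie is valid
expired, _ = lb.route(cookie, now=120)  # cookie expired: client is re-balanced
print(first, again, expired)
```

Passing `now` explicitly keeps the example deterministic; in normal use the method falls back to the current time.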

Tradeoff:

While sticky sessions help with session data, they may cause uneven load distribution across backend servers.

7. Cross-Zone Load Balancing

What is Cross-Zone Load Balancing?

Cross-Zone Load Balancing ensures traffic is evenly distributed across all registered instances, regardless of their Availability Zone (AZ). With it disabled, each load balancer node sends traffic only to the instances in its own AZ.

Behavior:

ALB: Enabled by default (it can be turned off per target group); no charges for inter-AZ data.
NLB/GWLB: Disabled by default; enabling it incurs inter-AZ data transfer charges.
CLB: Disabled by default; no inter-AZ charges when enabled.

Why Use It?

Ensures balanced traffic distribution in multi-AZ setups.
Avoids imbalances caused by unequal instances per AZ.
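
A quick calculation shows the imbalance. The sketch below assumes client traffic is split equally between the AZs before it reaches the instances, and that every AZ has at least one instance; the function name `traffic_share` and the AZ labels are made up for the example.

```python
def traffic_share(instances_per_az, cross_zone):
    """Return the fraction of total traffic each instance receives, keyed by AZ."""
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every instance gets an equal slice of all traffic.
            shares[az] = 1 / total_instances
        else:
            # Each AZ gets an equal slice, split among its own instances only.
            shares[az] = 1 / az_count / count
    return shares


fleet = {"az-a": 2, "az-b": 8}
print(traffic_share(fleet, cross_zone=False))  # az-a instances carry 25% each
print(traffic_share(fleet, cross_zone=True))   # every instance carries 10%
```

With 2 instances in one AZ and 8 in the other, disabling cross-zone balancing leaves each instance in the smaller AZ handling four times the traffic of an instance in the larger one.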

8. SSL Certificates

What is an SSL/TLS Certificate?

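An SSL/TLS certificate lets the load balancer prove its identity and encrypt data in transit between clients and the load balancer (HTTPS). TLS is the newer protocol, but the certificates are still commonly called SSL certificates.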

AWS Certificate Manager (ACM):

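Provisions and manages the SSL/TLS certificates that you attach to a load balancer listener.
Server Name Indication (SNI): The client states the hostname it wants during the TLS handshake, so a single ALB or NLB listener can serve multiple certificates. CLB does not support SNI.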
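
As a rough illustration of SNI, the sketch below picks a certificate from the hostname the client sends. The function `select_certificate`, the certificate labels, and the one-level wildcard fallback are assumptions for this example, not the selection logic the load balancer actually uses.

```python
def select_certificate(server_name, certificates, default):
    """Pick the certificate whose domain matches the SNI hostname."""
    if server_name in certificates:
        return certificates[server_name]
    # Try a one-level wildcard such as "*.example.org".
    _, _, parent = server_name.partition(".")
    return certificates.get(f"*.{parent}", default)


certificates = {
    "www.example.com": "cert-www",           # hypothetical certificate labels
    "*.example.org": "cert-org-wildcard",
}

print(select_certificate("www.example.com", certificates, "cert-default"))
print(select_certificate("shop.example.org", certificates, "cert-default"))
print(select_certificate("other.test", certificates, "cert-default"))
```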

9. Connection Draining (Deregistration Delay)

What is Connection Draining?

Connection Draining ensures in-flight requests to an instance are completed before the instance is deregistered or replaced.

How it Works:

Once an instance is marked for removal, it stops receiving new traffic.
Existing requests are allowed to finish within a configurable grace period (default: 300 seconds; set it lower for short requests, or to 0 to disable draining).
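
The two steps above can be modeled as a small state check. This sketch assumes a target can be removed as soon as it has no in-flight requests, or once the grace period has passed, whichever comes first; the class `DrainingTarget` and its method names are hypothetical.

```python
class DrainingTarget:
    """Toy model of a target going through connection draining."""

    def __init__(self, name, in_flight, delay_seconds=300):
        self.name = name
        self.in_flight = in_flight    # requests still being served
        self.delay = delay_seconds    # grace period before forced removal
        self.draining_since = None    # None means the target is in service

    def start_draining(self, now):
        self.draining_since = now

    def accepts_new_requests(self):
        return self.draining_since is None

    def finish_request(self):
        self.in_flight = max(0, self.in_flight - 1)

    def can_be_removed(self, now):
        """Removable once draining and either idle or past the grace period."""
        if self.draining_since is None:
            return False
        return self.in_flight == 0 or now - self.draining_since >= self.delay


target = DrainingTarget("i-a", in_flight=2)
target.start_draining(now=0)
print(target.accepts_new_requests())   # False: no new traffic while draining
print(target.can_be_removed(now=10))   # False: requests still in flight
target.finish_request()
target.finish_request()
print(target.can_be_removed(now=20))   # True: everything finished early
```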

Why Use It?

Ensures a smooth transition during instance replacement or scaling events.
Prevents incomplete user requests.

10. Auto Scaling Groups (ASG)

What is an ASG?

An Auto Scaling Group automatically adjusts the number of EC2 instances based on demand.

How ASG Works:

Scale Out: Add instances when load increases.
Scale In: Remove instances when load decreases.
Health Checks: Replace unhealthy instances automatically.

Key Components:

Launch Template: Defines instance type, AMI, security groups, etc.
Scaling Policies: Rules for when to scale.
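
One way to picture what an ASG does is as a reconciliation loop: compare the healthy instances with the desired capacity and fix the difference. The sketch below assumes the group has minimum, maximum, and desired capacity settings; the function `reconcile` and the instance IDs are invented, and the choice of which instances to drop on scale-in is arbitrary here.

```python
def reconcile(instances, desired, min_size, max_size):
    """Return (instances to keep, number to launch) for one reconciliation pass.

    `instances` maps instance ID -> True (healthy) or False (unhealthy).
    """
    # Desired capacity always stays within the group's bounds.
    desired = max(min_size, min(desired, max_size))
    healthy = [iid for iid, ok in instances.items() if ok]
    keep = healthy[:desired]          # scale in if there are too many
    to_launch = desired - len(keep)   # replace unhealthy ones / scale out
    return keep, to_launch


fleet = {"i-1": True, "i-2": False, "i-3": True}
print(reconcile(fleet, desired=3, min_size=1, max_size=4))   # replace i-2
print(reconcile(fleet, desired=10, min_size=1, max_size=4))  # capped at max
```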

11. Auto Scaling Policies

What are Auto Scaling Policies?

Scaling policies define how an ASG adjusts its size based on metrics like CPU usage or requests.

Types of Policies:

Target Tracking Scaling: Keep a metric at a target value (e.g., average CPU utilization at 50%).
Simple/Step Scaling: Add or remove capacity when a CloudWatch alarm fires (e.g., add instances when CPU > 70%).
Scheduled Scaling: Change capacity at known times based on predictable usage patterns (e.g., more instances on weekends).
Predictive Scaling: Forecasts demand and preemptively scales.
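
To get a feel for the first two policy types, here is a back-of-the-envelope sketch. The proportional formula for target tracking is an approximation chosen for illustration, not AWS's published algorithm, and the function names, thresholds, and step sizes are all assumptions.

```python
import math


def target_tracking_capacity(current_capacity, metric_value, target_value,
                             min_size, max_size):
    """Capacity that would bring the average metric back to roughly the target."""
    raw = math.ceil(current_capacity * metric_value / target_value)
    return max(min_size, min(raw, max_size))


def step_scaling_adjustment(metric_value, steps):
    """Return the adjustment for the highest threshold the metric has reached."""
    adjustment = 0
    for threshold, change in sorted(steps):
        if metric_value >= threshold:
            adjustment = change
    return adjustment


# 4 instances at 80% CPU with a 50% target: scale out.
print(target_tracking_capacity(4, 80, 50, min_size=1, max_size=20))

# Hypothetical steps: +1 instance from 70% CPU, +3 instances from 85% CPU.
steps = [(70, 1), (85, 3)]
print(step_scaling_adjustment(90, steps))
```

The difference in spirit: target tracking works out how much capacity is needed to hit the target, while step scaling applies fixed adjustments that you define per alarm threshold.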

Scaling Cooldown:

After a scaling activity, the ASG waits for the cooldown period before starting another one, so new instances have time to start handling traffic and metrics can stabilize.
Default cooldown: 300 seconds.
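
The cooldown check itself is simple to sketch. The class `CooldownGate` is hypothetical and uses plain numbers for timestamps to keep the example deterministic.

```python
class CooldownGate:
    """Blocks new scaling actions until the cooldown after the last one ends."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown = cooldown_seconds
        self.last_action_at = None

    def try_scale(self, now):
        """Return True and record the action if no cooldown is in progress."""
        in_cooldown = (
            self.last_action_at is not None
            and now - self.last_action_at < self.cooldown
        )
        if in_cooldown:
            return False
        self.last_action_at = now
        return True


gate = CooldownGate()
results = [gate.try_scale(now=t) for t in (0, 120, 300)]
print(results)  # the attempt at t=120 falls inside the cooldown
```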

Conclusion

By combining Elastic Load Balancing (ELB) and Auto Scaling Groups (ASG), AWS makes it easy to build scalable and highly available systems.

ELB distributes traffic intelligently across instances.
ASG adapts dynamically to handle changing loads and maintains system health.

Understanding these services and their key features (e.g., sticky sessions, cross-zone load balancing, scaling policies) is essential for designing resilient cloud architectures.