How to Design Cloud Services and Best Practices:
- Design for failure, and create self-healing application environments.
- Always design applications with instances in at least two Availability Zones.
- Guarantee that you have "reserve" capacity in the event of an emergency by purchasing reserved instances in a designated recovery availability zone (AWS does not guarantee on-demand instance capacity).
- Rigorously test to find single points of failure and apply high availability.
- Always enable RDS Multi-AZ and automated backups (for MySQL, automated backups support InnoDB tables only).
- Utilize Elastic IP addresses for failover to "standby" instances when Auto Scaling and load balancing are not available.
- Use Route 53 to implement failover DNS techniques that include:
- Latency based routing.
- Failover DNS routing.
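For example, a minimal boto3 sketch of a failover record set (the zone ID, record name, IP, and health check ID are placeholders; a matching SECONDARY record would point at the standby):

```python
import boto3

route53 = boto3.client("route53")

# Upsert the PRIMARY half of a failover pair; Route 53 serves the
# SECONDARY record when the attached health check fails.
route53.change_resource_record_sets(
    HostedZoneId="Z1EXAMPLE",  # placeholder hosted zone
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",
            "Type": "A",
            "SetIdentifier": "primary",
            "Failover": "PRIMARY",
            "TTL": 60,
            "ResourceRecords": [{"Value": "203.0.113.10"}],
            "HealthCheckId": "11111111-2222-3333-4444-555555555555",  # placeholder
        },
    }]},
)
```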
- Have a disaster recovery and backup strategy that utilizes:
- Multiple regions.
- Maintain up-to-date AMIs (and copy AMIs from one region to another).
- Copy EBS snapshots to other regions (use cron jobs to take regular snapshots of EBS volumes).
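A minimal sketch of the cron-driven snapshot-and-copy idea, assuming a volume in us-east-1 with us-west-2 as the recovery region (IDs are placeholders):

```python
import boto3

source = boto3.client("ec2", region_name="us-east-1")
target = boto3.client("ec2", region_name="us-west-2")

# Run from cron: snapshot the volume, wait for it to complete...
snap = source.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # placeholder volume
    Description="nightly backup",
)
source.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# ...then copy the snapshot into the recovery region.
target.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Description="cross-region DR copy",
)
```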
- Automate everything in order to easily re-deploy resources in the event of a disaster.
- Utilize bootstrapping to quickly bring up new instances with minimal configuration, which allows for "generic" AMIs.
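As a sketch, bootstrapping via EC2 user data lets one generic AMI serve many roles (the AMI ID and packages here are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# First-boot script: turns a generic AMI into a configured web server,
# so the AMI itself never needs role-specific baking.
user_data = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder generic AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
)
```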
- Decouple application components using services such as SQS (when available).
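A minimal SQS decoupling sketch (queue name and message body are hypothetical): the web tier enqueues work and worker instances poll independently, so either tier can fail or scale without affecting the other:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="resize-jobs")["QueueUrl"]  # placeholder queue

# Producer (web tier): enqueue work instead of calling workers directly.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"image": "uploads/photo.jpg"}')

# Consumer (worker tier): long-poll, process, then delete the message.
resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20,
                           MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    # ... process msg["Body"] ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```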
- "Throw away" old or broken instances.
- Utilize CloudWatch to monitor infrastructure changes and health.
- Utilize MultiPartUpload for S3 uploads (for objects over 100 MB).
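A sketch using boto3's transfer layer, which performs the multipart upload automatically once an object crosses the configured threshold (file and bucket names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Objects above the threshold upload in parallel parts; a failed part
# is retried on its own instead of restarting the whole transfer.
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=16 * 1024 * 1024)
s3.upload_file("backup.tar.gz", "my-backup-bucket", "backup.tar.gz",
               Config=config)  # placeholder file and bucket
```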
- Cache static content on Amazon CloudFront using EC2 or S3 origins.
- Protect your data in transit by using HTTPS/SSL endpoints.
- Protect data at rest using encrypted file systems or EBS/S3 encryption options.
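Two quick sketches of the at-rest options (bucket, key, and AZ are placeholders):

```python
import boto3

# Server-side encryption for an S3 object (SSE-S3; "aws:kms" uses KMS keys).
s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-secure-bucket",  # placeholder bucket
    Key="report.csv",
    Body=b"...",
    ServerSideEncryption="AES256",
)

# An EBS volume encrypted at creation time.
ec2 = boto3.client("ec2")
ec2.create_volume(AvailabilityZone="us-east-1a", Size=100, Encrypted=True)
```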
- Connect to instances inside of the VPC using a bastion host or VPN connection.
- Use IAM roles on EC2 instances instead of using API keys. Never store API keys on an AMI.
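With a role attached to the instance, a sketch like this needs no stored keys at all; boto3 fetches temporary credentials from the instance metadata service:

```python
import boto3

# No aws_access_key_id / aws_secret_access_key anywhere: the attached IAM
# role supplies short-lived credentials that rotate automatically.
s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])
```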
Monitoring your AWS Environment:
Use CloudWatch for:
- Shutting down inactive instances.
- Monitoring changes in your AWS environment with CloudTrail integration.
- Monitor instance resources and create alarms based on usage and availability:
- EC2 instances have "basic" monitoring, which CloudWatch supports out of the box and which includes all metrics that can be monitored at the hypervisor level.
- Status checks, which can automate recovery from a failed status check by stopping and starting the instance again.
- EC2 metrics that require custom scripts to report to CloudWatch:
- Disk Usage; Available Disk Space
- Swap Usage; Available Swap
- Memory Usage; Available Memory
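A sketch of both ideas: pushing a custom memory metric that the hypervisor cannot see, and an alarm that stops an idle instance (instance ID, namespace, and thresholds are placeholders):

```python
import boto3

cw = boto3.client("cloudwatch")
instance_id = "i-0123456789abcdef0"  # placeholder instance

# A cron script on the instance would push this custom metric
# periodically, since memory is invisible at the hypervisor level.
cw.put_metric_data(
    Namespace="Custom/System",
    MetricData=[{
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": 72.5,
        "Unit": "Percent",
    }],
)

# Stop the instance after 24 hourly periods below 2% average CPU.
cw.put_metric_alarm(
    AlarmName="stop-idle-instance",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
    Statistic="Average",
    Period=3600,
    EvaluationPeriods=24,
    Threshold=2.0,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:automate:us-east-1:ec2:stop"],  # built-in stop action
)
```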
Use CloudTrail for:
- Security and compliance.
- Monitoring all API actions taken against an AWS account.
- Monitoring (and being notified) of changes to IAM accounts (with CloudWatch/SNS Integration).
- Viewing what API Keys/Users performed any given API action against an environment (i.e. view what user terminated a set of instances or an individual instance).
- Fulfilling auditing requirements inside of organizations.
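For instance, a sketch of finding who terminated instances via the CloudTrail event history API:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Event history query: who called TerminateInstances, and when.
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName",
                       "AttributeValue": "TerminateInstances"}],
    MaxResults=10,
)
for event in events["Events"]:
    print(event["EventTime"], event.get("Username"), event["EventName"])
```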
Use AWS Config for:
- Receiving detailed configuration information about an AWS environment.
- Taking a point-in-time "snapshot" of all supported AWS resources to determine the state of your environment.
- Viewing historical configurations within your environment by viewing the "snapshots".
- Receiving notifications whenever resources are created, modified, or deleted.
- Viewing relationships between resources, e.g. which EC2 instance an EBS volume is attached to.
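A sketch of pulling the historical configuration "snapshots" for one instance (the resource ID is a placeholder):

```python
import boto3

config = boto3.client("config")

# Walk the recorded configuration states for a single EC2 instance.
history = config.get_resource_config_history(
    resourceType="AWS::EC2::Instance",
    resourceId="i-0123456789abcdef0",  # placeholder instance
)
for item in history["configurationItems"]:
    print(item["configurationItemCaptureTime"], item["configurationItemStatus"])
```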
Architectural Trade-off Decisions:
Storage Trade-off Options
- S3 Standard Storage
- 99.999999999% durability and 99.99% availability, but is the most expensive.
- S3 RRS
- Reduced Redundancy Storage (RRS) durability is 99.99%, but the storage cost is cheaper.
- Should be used for easily reproducible data, and you should take advantage of lost-object notifications using S3 events (see the sketch after this list).
- Glacier
- Requires an extended timeframe to check data in and out of the archive.
- Costs are significantly reduced compared to S3 storage options.
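A sketch of the RRS pattern above: store reproducible objects under Reduced Redundancy and subscribe an SNS topic to lost-object events so they can be regenerated (bucket and topic ARN are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-thumbnails"  # placeholder bucket of reproducible objects

# Store the derived object under Reduced Redundancy to cut storage cost.
s3.put_object(Bucket=bucket, Key="thumbs/photo.jpg", Body=b"...",
              StorageClass="REDUCED_REDUNDANCY")

# If RRS loses the object, notify an SNS topic so it can be regenerated.
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={"TopicConfigurations": [{
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:lost-objects",  # placeholder
        "Events": ["s3:ReducedRedundancyLostObject"],
    }]},
)
```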
Database Trade-off Options
- Running databases on EC2 instances:
- Have to manage the underlying operating system.
- Have to build for high availability.
- Have to perform your own backups.
- Can use additional software to cluster MySQL.
- Requires more time to manage than RDS.
- A managed RDS database provides:
- Fully managed database updates, with no need to manage the underlying OS.
- Provides automatic point in time backups.
- Easily enable Multi-AZ failover; when a failover occurs, DNS is switched from the primary instance to the standby instance.
- If Multi-AZ is enabled, backups are taken against the standby to reduce I/O freezes, and updates are applied to the standby first, which is then switched to primary.
- Easily create read replicas.
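A sketch of standing up a Multi-AZ primary with automated backups plus a read replica (identifiers and sizes are placeholders):

```python
import boto3

rds = boto3.client("rds")

# Multi-AZ primary with 7 days of automated point-in-time backups.
rds.create_db_instance(
    DBInstanceIdentifier="app-db",  # placeholder identifiers throughout
    Engine="mysql",
    DBInstanceClass="db.t3.medium",
    MasterUsername="admin",
    MasterUserPassword="change-me",
    AllocatedStorage=100,
    MultiAZ=True,
    BackupRetentionPeriod=7,
)

# Read replica for offloading read-heavy traffic.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica",
    SourceDBInstanceIdentifier="app-db",
)
```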
Elasticity and Scalability:
- Proactive Cycle Scaling: Scaling that occurs at a fixed interval.
- Proactive Event-based Scaling: Scaling that occurs in anticipation of an event.
- Auto-scaling Based on Demand: Scaling that occurs based on increases in demand for the application.
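For proactive cycle scaling, a sketch using a recurring scheduled action on an Auto Scaling group (group name and schedule are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Grow the group every weekday morning; a matching evening action
# would shrink it again (cron expression is in UTC).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",  # placeholder group
    ScheduledActionName="business-hours-scale-up",
    Recurrence="0 8 * * MON-FRI",
    MinSize=4,
    MaxSize=12,
    DesiredCapacity=6,
)
```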
Plan to scale out rather than up (Horizontal scaling):
- Add more EC2 instances to handle increases in capacity rather than increasing instance size.
- Be sure to design for the proper instance size to start.
- Use tools like Auto Scaling and ELB.
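A sketch of demand-based horizontal scaling: a target tracking policy that adds or removes instances to hold average CPU near a target (group name and target are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking adds instances when average CPU rises above the target
# and removes them when it falls, scaling out instead of up.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",  # placeholder group
    PolicyName="keep-cpu-near-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```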
- A scaled service should be fault tolerant and operationally efficient.
- A scalable service should become more cost-effective as it grows.
DynamoDB is a fully managed NoSQL service from AWS:
- With high availability and scaling already built in.
- All the developer has to do is specify the required throughput for the tables.
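For example, a sketch where the only capacity decision is the table's provisioned throughput (table and key names are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# AWS handles availability and scaling; the table definition only
# declares the key schema and the throughput to provision.
dynamodb.create_table(
    TableName="sessions",  # placeholder table
    AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```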
RDS must be scaled in a few different ways:
- RDS does not support a cluster of instances to load balance traffic across.
- Because of this, there are a few different methods to scale traffic with RDS:
- Utilize read replicas to offload heavy read-only traffic.
- Increase the instance size to handle increases in load.
- Utilize ElastiCache clusters for caching database session information.
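A sketch of the last two options: scaling the primary up and adding an ElastiCache cluster for caching (identifiers and node types are placeholders):

```python
import boto3

# Scale up: move the primary to a larger instance class (applies at the
# next maintenance window unless ApplyImmediately=True).
rds = boto3.client("rds")
rds.modify_db_instance(
    DBInstanceIdentifier="app-db",  # placeholder identifier
    DBInstanceClass="db.r5.xlarge",
    ApplyImmediately=False,
)

# Offload repeated reads and session lookups to an in-memory cache.
elasticache = boto3.client("elasticache")
elasticache.create_cache_cluster(
    CacheClusterId="session-cache",  # placeholder cluster
    Engine="redis",
    CacheNodeType="cache.t3.micro",
    NumCacheNodes=1,
)
```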