How to Design Cloud Services and Best Practices:

  • Design for failure, and create self-healing application environments.
  • Always design applications with instances in at least two availability zones.
  • Guarantee that you have "reserve" capacity in the event of an emergency by purchasing reserved instances in a designated recovery availability zone (AWS does not guarantee on-demand instance capacity).
  • Rigorously test to find single points of failure and apply high availability.
  • Always enable RDS Multi-AZ and automated backups (InnoDB table support only for MySQL).
  • Utilize Elastic IP addresses to fail over to "stand-by" instances when auto scaling and load balancing are not available.
  • Use Route 53 to implement failover DNS techniques that include:
    • Latency based routing.
    • Failover DNS routing.
  • Have a disaster recovery and backup strategy that utilizes:
    • Multiple regions.
    • Up-to-date AMIs (copy AMIs from one region to another).
    • EBS snapshots copied to other regions (use cron jobs to take the EBS snapshots).
    • Automation of everything, so that resources can be easily re-deployed in the event of a disaster.
    • Bootstrapping, which quickly brings up new instances with minimal configuration and allows for "generic" AMIs.
  • Decouple application components using services such as SQS (when available).
  • "Throw away" old or broken instances.
  • Utilize CloudWatch to monitor infrastructure changes and health.
  • Utilize Multipart Upload for S3 uploads (for objects over 100 MB).
  • Cache static content on Amazon CloudFront using EC2 or S3 origins.
  • Protect your data in transit by using HTTPS/SSL endpoints.
  • Protect data at rest using encrypted file systems or EBS/S3 encryption options.
  • Connect to instances inside of the VPC using a bastion host or VPN connection.

  • Use IAM roles on EC2 instances instead of using API keys. Never store API keys on an AMI.
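
To make the Multipart Upload guideline above concrete, here is a minimal sketch of how an object could be split into parts before uploading. The function names and the 16 MiB default part size are assumptions chosen for illustration, and the 100 MB threshold is treated as 100 MiB. The 5 MiB minimum part size and the 10,000-part ceiling are S3's documented limits. Nothing here calls AWS; it only plans byte ranges.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB            # S3 minimum for every part except the last
MAX_PARTS = 10_000                 # S3 maximum number of parts per upload
MULTIPART_THRESHOLD = 100 * MIB    # the "over 100 MB" guideline (MB read as MiB: an assumption)


def should_use_multipart(object_size):
    """Return True when the object is larger than the guideline threshold."""
    return object_size > MULTIPART_THRESHOLD


def plan_parts(object_size, part_size=16 * MIB):
    """Return (part_number, first_byte, last_byte) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size if the object would otherwise need too many parts.
    while -(-object_size // part_size) > MAX_PARTS:
        part_size *= 2
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        last = min(offset + part_size, object_size) - 1
        parts.append((number, offset, last))
        offset = last + 1
        number += 1
    return parts
```

Each tuple could then drive one part upload, and a failed part can be retried without resending the whole object.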

Monitoring your AWS Environment:

Use CloudWatch for

  • Shutting down inactive instances.
  • Monitoring changes in your AWS environment with CloudTrail integration.
  • Monitoring instance resources and creating alarms based on usage and availability:
    • EC2 instances have "basic" monitoring, which CloudWatch supports out of the box and which includes all metrics that can be monitored at the hypervisor level.
    • Status checks, where recovery from a failed status check can be automated by stopping and starting the instance again.
    • EC2 metrics that require custom scripts to work with CloudWatch:
      • Disk Usage; Available Disk Space
      • Swap Usage; Available Swap
      • Memory Usage; Available Memory
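
As an illustration of the custom-script metrics listed above, the following sketch builds the kind of payload a script might hand to CloudWatch. The metric names, the `Custom/System` namespace, and the memory figures in the usage section are hypothetical. The dictionary shape (`MetricName`, `Dimensions`, `Unit`, `Value`) follows the `MetricData` entries accepted by the CloudWatch `PutMetricData` API. The sketch does not call AWS itself.

```python
import shutil

NAMESPACE = "Custom/System"  # hypothetical namespace for illustration


def percent_used(used, total):
    """Return usage as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * used / total, 2)


def build_metric_data(instance_id, disk_used, disk_total, mem_used, mem_total):
    """Build MetricData entries for disk and memory utilization."""
    dimensions = [{"Name": "InstanceId", "Value": instance_id}]
    return [
        {
            "MetricName": "DiskSpaceUtilization",  # illustrative name
            "Dimensions": dimensions,
            "Unit": "Percent",
            "Value": percent_used(disk_used, disk_total),
        },
        {
            "MetricName": "MemoryUtilization",  # illustrative name
            "Dimensions": dimensions,
            "Unit": "Percent",
            "Value": percent_used(mem_used, mem_total),
        },
    ]


if __name__ == "__main__":
    disk = shutil.disk_usage("/")
    # Memory numbers are placeholders; a real script would read them from the OS.
    print(build_metric_data("i-0123456789abcdef0", disk.used, disk.total, 3, 8))
```

A script like this would typically run from cron on the instance, since these values are not visible at the hypervisor level.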

Use CloudTrail for

  • Security and compliance.
  • Monitoring all actions taken against an AWS account.
  • Monitoring, and being notified of, changes to IAM accounts (with CloudWatch/SNS integration).
  • Viewing which API keys/users performed any given API action against an environment (e.g. view which user terminated a set of instances or an individual instance).
  • Fulfilling auditing requirements inside of organizations.
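
The "who terminated these instances" use case above could be sketched as a small filter over a CloudTrail log file. The sample records, user ARN, and instance IDs below are invented for illustration, and the field layout (`Records`, `eventName`, `userIdentity`, `requestParameters.instancesSet.items`) is assumed to match CloudTrail's JSON output for `TerminateInstances` events.

```python
import json

# Hypothetical log content, trimmed to the fields this sketch reads.
SAMPLE_LOG = json.dumps({
    "Records": [
        {
            "eventTime": "2015-01-01T12:00:00Z",
            "eventName": "TerminateInstances",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
            "requestParameters": {
                "instancesSet": {"items": [{"instanceId": "i-11111111"},
                                           {"instanceId": "i-22222222"}]}
            },
        },
        {
            "eventTime": "2015-01-01T12:05:00Z",
            "eventName": "DescribeInstances",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"},
            "requestParameters": None,
        },
    ]
})


def who_terminated(log_text):
    """Return who terminated which instances, one entry per matching event."""
    results = []
    for record in json.loads(log_text).get("Records", []):
        if record.get("eventName") != "TerminateInstances":
            continue
        identity = record.get("userIdentity") or {}
        who = identity.get("arn") or identity.get("userName") or "unknown"
        params = record.get("requestParameters") or {}
        items = (params.get("instancesSet") or {}).get("items", [])
        instance_ids = [item["instanceId"] for item in items if "instanceId" in item]
        results.append({"who": who, "when": record.get("eventTime"),
                        "instances": instance_ids})
    return results
```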

Use AWS Config for

  • Receiving detailed configuration information about an AWS environment.

  • Taking a point in time "snapshot" of all supported AWS resources to determine the state of your environment.

  • Viewing historical configurations within your environment by viewing the "snapshots".

  • Receiving notifications whenever resources are created, modified, or deleted.

  • Viewing relationships between resources, e.g. which EC2 instance an EBS volume is attached to.

Architectural Trade-off Decisions:

Storage Trade-off Options

  • S3 Standard Storage
    • 99.999999999% durability and 99.99% availability, but is the most expensive.
  • S3 RRS
    • Reduced redundancy durability is 99.99%, but the storage cost is lower.
    • Should be used for easily reproducible data, and you should take advantage of lost-object notifications using S3 events.
  • Glacier
    • Requires an extended timeframe to check data in to and out of the archive.
    • Costs are significantly reduced compared to S3 storage options.
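
To put the durability figures above in perspective, here is a small arithmetic sketch. It assumes the percentages are annual per-object figures, which is how AWS states them, and the function name is made up for illustration. Exact fractions are used because floating point cannot represent eleven nines cleanly.

```python
from fractions import Fraction


def expected_annual_loss(object_count, durability_percent):
    """Expected number of objects lost per year at the given durability.

    durability_percent is passed as a string such as "99.99" so that the
    value is parsed exactly rather than as a rounded float.
    """
    loss_rate = (Fraction(100) - Fraction(durability_percent)) / 100
    return object_count * loss_rate


if __name__ == "__main__":
    # Standard storage: 10 million objects at eleven nines.
    print(expected_annual_loss(10_000_000, "99.999999999"))  # 1/10000
    # Reduced redundancy: 10 thousand objects at 99.99%.
    print(expected_annual_loss(10_000, "99.99"))              # 1
```

Under these assumptions, ten million objects in standard storage lose about one object every 10,000 years, while ten thousand objects in RRS lose about one per year, which is why RRS suits only easily reproducible data.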

Database Trade-off Options

  • Running databases on EC2 instances:
    • Have to manage the underlying operating system.
    • Have to build for high availability.
    • Have to apply your own backups.
    • Can use additional software to cluster MySQL.
    • Requires more time to manage than RDS.
  • A managed RDS database:
    • Provides fully managed database updates and does not require managing the underlying OS.
    • Provides automatic point-in-time backups.
    • Makes it easy to enable Multi-AZ failover; when a failover occurs, the DNS is switched from the primary instance to the standby instance.
    • With Multi-AZ enabled, takes backups against the standby to reduce I/O freezes, and applies updates to the standby first, which is then switched to become the primary.
    • Makes it easy to create read replicas.
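
Because a Multi-AZ failover works by switching DNS, an application that reconnects to the same endpoint name should eventually reach the new primary. The sketch below shows one way that reconnect could look, with exponential backoff between attempts. The `connect` callable, the retry counts, and the delays are assumptions for illustration; a real application would pass its database driver's connect function.

```python
import time


def connect_with_retry(connect, endpoint, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect(endpoint), retrying on ConnectionError with backoff.

    The endpoint name is reused on every attempt, so each retry picks up
    whatever instance the DNS name currently points to.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(endpoint)
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x ... the base delay before trying again.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `sleep` parameter is injectable only so the behavior can be exercised without real waiting.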

Elasticity and Scalability:

  • Proactive Cycle Scaling: Scaling that occurs at a fixed interval.
  • Proactive Event-based Scaling: Scaling that occurs in anticipation of an event.
  • Auto-scaling Based on Demand: Scaling that occurs based on an increase in demand for the application.
  • Plan to scale out rather than up (Horizontal scaling):

    • Add more EC2 instances to handle increases in capacity rather than increasing instance size.

    • Be sure to design for the proper instance size to start.

    • Use tools like Auto Scaling and ELB.

    • A scaled service should be fault tolerant and operationally efficient.

    • A scalable service should become more cost effective as it grows.

  • DynamoDB is a fully managed NoSQL service from AWS:

    • With high availability and scaling already built in.

    • All the developer has to do is specify the required throughput for the tables.

  • RDS requires scaling in a few different ways:

    • RDS does not support a cluster of instances to load balance traffic across.

    • Because of this there are a few different methods to scale traffic with RDS:

      • Utilize read replicas to offload heavy read only traffic.

      • Increase the instance size to handle an increase in load.

      • Utilize ElastiCache clusters for caching database session information.
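
Offloading read-only traffic to read replicas, as described above, usually means the application decides per query which endpoint to use. The following is a minimal sketch of that decision. The class name and endpoint names are hypothetical, and the statement check is deliberately naive: anything that is not a plain `SELECT` goes to the primary.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to replicas in turn, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through replicas round-robin; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        # Simplification: locking reads such as SELECT ... FOR UPDATE are not
        # detected here and would need to be sent to the primary explicitly.
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter(
        "primary.example.internal",
        ["replica-1.example.internal", "replica-2.example.internal"],
    )
    print(router.endpoint_for("SELECT * FROM orders"))
    print(router.endpoint_for("UPDATE orders SET status = 'shipped'"))
```

Replicas can lag behind the primary, so reads that must see the latest write may still need to go to the primary.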
