Disaster Recovery:
Business disaster recovery keywords: Very important for the AWS CSA exam
Recovery time objective (RTO): The time it takes after a disruption to restore operations to their regular service level, as defined by the company's operational level agreement (e.g. if the RTO is 4 hours, you have 4 hours to restore service to an acceptable level).
Recovery point objective (RPO): The acceptable amount of data loss, measured in time (e.g. if the system goes down at 10 PM and the RPO is 2 hours, you should be able to recover all data the application had written up to 8 PM; at most the last 2 hours of data may be lost).
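The RTO and RPO arithmetic can be sketched in a few lines of Python. This is only an illustration of the two definitions: the function names, the calendar date, and the backup times are hypothetical and are not part of any AWS API.

```python
from datetime import datetime, timedelta


def recovery_targets(outage_start, rto, rpo):
    """Return (restore_deadline, oldest_acceptable_recovery_point)."""
    # RTO counts forward from the outage: service must be back by this time.
    restore_deadline = outage_start + rto
    # RPO counts backward: data written up to this time must be recoverable.
    recovery_point = outage_start - rpo
    return restore_deadline, recovery_point


def meets_rpo(last_backup, outage_start, rpo):
    """True if the newest backup is recent enough to satisfy the RPO."""
    return outage_start - last_backup <= rpo


# Outage at 10 PM (the date is arbitrary), RTO of 4 hours, RPO of 2 hours.
outage = datetime(2024, 1, 1, 22, 0)
deadline, recovery_point = recovery_targets(
    outage, rto=timedelta(hours=4), rpo=timedelta(hours=2)
)
print(deadline)        # 2024-01-02 02:00:00
print(recovery_point)  # 2024-01-01 20:00:00
```

With these numbers, a backup taken at 9 PM would satisfy the RPO, while a last backup from 7 PM would not.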
Not only should you design disaster recovery for your applications running on AWS, you can also use AWS as a disaster recovery solution for your on-premises applications or data. The AWS services used should be chosen based on the RTO and RPO in the business's operational agreement.
Pilot Light: A minimal version of your production environment that is always running on AWS. Data is replicated from the on-premises servers to AWS, and in the event of a disaster the AWS environment spins up more capacity (elastically/automatically) and DNS is switched from on-premises to AWS. It is important to keep AMIs and instance configurations up to date if you follow the pilot light approach.
Warm Standby: Has a larger footprint than a pilot light setup, and would most likely be running business-critical applications in "standby". This type of configuration could also be used as a test area for applications.
Multi-Site Solution: Essentially clones your "production" environment, which can be either in the cloud or on-premises. It has an active-active configuration, which means instances are running at full size and capacity at both sites and traffic can be shifted at the flip of a switch. A setup like this could also be used to "load balance" using latency-based routing, or to fail over with Route 53 in the event of an issue.
Service Examples:
- Elastic Load Balancer and Auto Scaling
- Amazon EC2 VM Import Connector
- AMIs with up-to-date configurations
- Replication from on-premises database servers to RDS
- Automate the increasing of resources in the event of a disaster
- Use AWS Import/Export to copy large amounts of data to speed up replication times (also used for off-site archiving).
- Route 53 DNS failover/latency-based routing solutions
- Storage Gateway (gateway-cached volumes/gateway-stored volumes)
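As a rough illustration of the Route 53 DNS failover item, the sketch below builds the change batch for a primary/secondary failover record pair. It only constructs the request payload; the record name, IP addresses (taken from documentation ranges), set identifiers, and health check ID are all placeholder assumptions, and the helper function is hypothetical. The resulting dictionary follows the shape accepted by the boto3 Route 53 client's change_resource_record_sets call (as its ChangeBatch argument), which is not invoked here.

```python
def failover_change_batch(record_name, primary_ip, secondary_ip,
                          health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with PRIMARY and SECONDARY A records."""

    def record(set_id, role, ip, check_id=None):
        rrset = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": set_id,   # distinguishes records sharing a name
            "Failover": role,          # "PRIMARY" or "SECONDARY"
            "TTL": ttl,                # short TTL so clients pick up failover
            "ResourceRecords": [{"Value": ip}],
        }
        if check_id is not None:
            # Route 53 uses this health check to decide when to fail over.
            rrset["HealthCheckId"] = check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Fail over from on-premises to AWS",
        "Changes": [
            record("on-premises-primary", "PRIMARY", primary_ip,
                   health_check_id),
            record("aws-secondary", "SECONDARY", secondary_ip),
        ],
    }


# All values below are placeholders for illustration only.
batch = failover_change_batch(
    record_name="app.example.com.",
    primary_ip="203.0.113.10",     # on-premises endpoint (assumed)
    secondary_ip="198.51.100.20",  # AWS standby endpoint (assumed)
    health_check_id="hypothetical-health-check-id",
)
print([c["ResourceRecordSet"]["Failover"] for c in batch["Changes"]])
```

In this arrangement the on-premises endpoint is the primary record, and Route 53 answers with the AWS record only when the primary's health check fails, which matches the pilot light and warm standby patterns described above.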