The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS. By using the Framework you will learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. It provides a way to consistently measure your architectures against best practices and identify areas for improvement. We believe that having well-architected systems greatly increases the likelihood of business success.
Introduction
When architecting solutions, you make trade-offs between pillars based on your business context. Security and operational excellence are generally not traded off against the other pillars.
Definitions
Operational Excellence
The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.
Security
The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
Reliability
The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
Performance Efficiency
The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
Cost Optimization
The ability to run systems to deliver business value at the lowest price point.
General Design Principles
- Stop guessing your capacity needs
- Test systems at production scale
- Automate to make architectural experimentation easier
- Allow for evolutionary architectures
- Drive architectures using data
- Improve through game days
Operational Excellence
Design Principles
- Perform operations as code
- Annotate documentation (after every build)
- Make frequent, small, reversible changes
- Refine operations procedures frequently
- Anticipate failure
- Learn from all operational failures
Prepare
Operational Priorities
Your teams need to have a shared understanding of your entire workload, their role in it, and shared business goals in order to set the priorities that will enable business success. You also need to consider external regulatory and compliance requirements that may influence your priorities. Use your priorities to focus your operations improvement efforts where they will have the greatest impact (for example, developing team skills, improving workload performance, automating runbooks, or enhancing monitoring). Update your priorities as needs change.
Key AWS Services
- AWS Cloud Compliance
- AWS Trusted Advisor
- Business Support
- Enterprise Support
Design for Operations
The design of your workload should include how it will be deployed, updated, and operated. You will want to implement engineering practices that align with defect reduction and quick and safe fixes. To understand what is happening inside your architecture, you will need to enable observation with logging, instrumentation, and insightful business and technical metrics.
In AWS, you can view your entire workload (applications, infrastructure, policy, governance, and operations) as code.
Key AWS Services
- AWS CloudFormation
- AWS Developer Tools
- AWS X-Ray
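To make "workload as code" concrete, here is a minimal sketch that builds a small AWS CloudFormation template in Python and serializes it to JSON so it can be kept in version control and reviewed like any other change. The function name `build_template`, the logical ID `WorkloadLogBucket`, and the choice of an S3 bucket are assumptions made for illustration; they do not come from the Framework itself.

```python
import json


def build_template(bucket_logical_id: str, versioning: bool = True) -> dict:
    """Return a minimal CloudFormation template describing one S3 bucket.

    The bucket is only an illustrative resource; any workload component
    could be described the same way.
    """
    bucket = {"Type": "AWS::S3::Bucket", "Properties": {}}
    if versioning:
        # Keep old object versions so changes to stored data are reversible.
        bucket["Properties"]["VersioningConfiguration"] = {"Status": "Enabled"}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative workload definition kept in version control",
        "Resources": {bucket_logical_id: bucket},
    }


# Hypothetical logical ID, chosen for this example only.
template = build_template("WorkloadLogBucket")
template_json = json.dumps(template, indent=2, sort_keys=True)
print(template_json)
```

Because the template is plain data, it can be diffed, reviewed, and tested before deployment, which fits the principle of making frequent, small, reversible changes.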
Operational Readiness
You should use a consistent process (including checklists) to know when you are ready to go live with your workload.
Key AWS Services
- AWS Config
- AWS Systems Manager
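One way to make a go-live checklist consistent is to evaluate it in code, so that every workload passes through the same gate. The sketch below is an assumption-laden illustration: the `ChecklistItem` type, the `readiness_report` function, and the three checklist entries are hypothetical and would be replaced by your own process.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChecklistItem:
    """One entry in a hypothetical go-live checklist."""
    name: str
    done: bool


def readiness_report(items: List[ChecklistItem]) -> Tuple[bool, List[str]]:
    """Return (ready, missing); ready is True only when every item is done."""
    missing = [item.name for item in items if not item.done]
    return (not missing, missing)


# Example entries are assumptions for illustration only.
checklist = [
    ChecklistItem("Runbooks written for routine activities", True),
    ChecklistItem("Alarms configured and tested", True),
    ChecklistItem("Rollback procedure rehearsed", False),
]

ready, missing = readiness_report(checklist)
print("Ready to go live:", ready)
for name in missing:
    print("Outstanding:", name)
```

Returning the list of outstanding items, and not only a yes/no answer, tells the team what remains to be done before launch.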
Operate
Understanding Operational Health
Key AWS Services
- Amazon CloudWatch Logs
- Amazon ES
- Personal Health Dashboard
- Service Health Dashboard
Responding to Events
You should anticipate operational events, both planned and unplanned.
Key AWS Services
- Amazon CloudWatch
- Amazon CloudWatch Events
- Amazon SNS
- Auto Scaling
- AWS Systems Manager
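A response to events can be sketched as a lookup from event type to a predefined runbook, with a fallback for anything unexpected. Everything in this example is hypothetical: the event type strings, the handler functions, and the `respond` dispatcher are assumptions for illustration and do not call any AWS API. In practice the handlers might trigger services such as those listed above.

```python
from typing import Callable, Dict

Event = Dict[str, str]
Runbook = Callable[[Event], str]


def announce_deployment(event: Event) -> str:
    """Hypothetical runbook for a planned event."""
    return f"announce deployment of {event['resource']}"


def scale_out(event: Event) -> str:
    """Hypothetical runbook for an unplanned load surge."""
    return f"scale out {event['resource']}"


def notify_on_call(event: Event) -> str:
    """Fallback: escalate to a human when no runbook matches."""
    return f"page on-call about {event['resource']}"


# Mapping of illustrative event types to runbooks.
RUNBOOKS: Dict[str, Runbook] = {
    "planned.deployment": announce_deployment,
    "unplanned.high_cpu": scale_out,
}


def respond(event: Event, runbooks: Dict[str, Runbook] = RUNBOOKS,
            fallback: Runbook = notify_on_call) -> str:
    """Pick the runbook for the event type, or fall back to escalation."""
    handler = runbooks.get(event["type"], fallback)
    return handler(event)


print(respond({"type": "unplanned.high_cpu", "resource": "web-asg"}))
print(respond({"type": "unplanned.disk_failure", "resource": "db-1"}))
```

The explicit fallback means an event without a predefined runbook still reaches someone, which is one way to act on the "anticipate failure" principle.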
Evolve
Learning from Experience
Key AWS Services
- Amazon QuickSight
- Amazon Athena
- Amazon S3
Share Learnings
You should share what your teams learn to increase the benefit across your organization. You will want to share information and resources to prevent avoidable errors and ease development efforts. This will allow you to focus on delivering features.
Key AWS Services
- Amazon SNS
- AWS CodeCommit
- AWS Lambda
- AWS CloudFormation
- Amazon Machine Images (AMIs)
Conclusion
Operational excellence is an ongoing effort. Every operational event and failure should be treated as an opportunity to improve the operations of your architecture. By understanding the needs of your workloads, predefining runbooks for routine activities and playbooks to guide issue resolution, using the operations-as-code features in AWS, and maintaining situational awareness, your operations will be ready and responsive when events occur. By focusing on incremental improvement based on operational priorities and on lessons learned from event response and retrospective analysis, you will enable the success of your business by increasing the efficiency and effectiveness of your operations.