Amazon Web Services Well-Architected Framework

The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS. By using the Framework, you will learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. It provides a way to consistently measure your architectures against best practices and identify areas for improvement. We believe that having well-architected systems greatly increases the likelihood of business success.

Introduction

When architecting solutions, you make trade-offs between the pillars based on your business context. Security and operational excellence are generally not traded off against the other pillars.

Definitions

Operational Excellence

The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

Security

The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

Reliability

The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

Performance Efficiency

The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.

Cost Optimization

The ability to run systems to deliver business value at the lowest price point.

General Design Principles

Stop guessing your capacity needs
Test systems at production scale
Automate to make architectural experimentation easier
Allow for evolutionary architectures
Drive architectures using data
Improve through game days

Operational Excellence

Design Principles

Perform operations as code
Annotate documentation (automate its creation after every build)
Make frequent, small, reversible changes
Refine operations procedures frequently
Anticipate failure
Learn from all operational failures

Prepare

Operational Priorities

Your teams need to have a shared understanding of your entire workload, their role in it, and shared business goals in order to set the priorities that will enable business success. You also need to consider external regulatory and compliance requirements that may influence your priorities. Use your priorities to focus your operations improvement efforts where they will have the greatest impact (for example, developing team skills, improving workload performance, automating runbooks, or enhancing monitoring). Update your priorities as needs change.

Key AWS Services

AWS Cloud Compliance
AWS Trusted Advisor
AWS Business Support
AWS Enterprise Support

Design for Operations

The design of your workload should include how it will be deployed, updated, and operated. You will want to implement engineering practices that reduce defects and enable quick, safe fixes. To understand what is happening inside your architecture, you will need to enable observation with logging, instrumentation, and insightful business and technical metrics.
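One way to make a workload observable is to emit structured log lines that carry both business and technical metrics, so they can be searched and aggregated later. The following is a minimal sketch using only the Python standard library, assuming your log pipeline (for example, CloudWatch Logs) can parse one JSON object per line. The field names (event, order_value, latency_ms) and the handle_order function are hypothetical.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge structured fields passed via logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_order(logger: logging.Logger, order_value: float) -> dict:
    """Hypothetical request handler that logs one business and one technical metric."""
    start = time.perf_counter()
    # ... business logic would run here ...
    latency_ms = (time.perf_counter() - start) * 1000
    fields = {
        "event": "order_processed",          # business event
        "order_value": order_value,          # business metric
        "latency_ms": round(latency_ms, 3),  # technical metric
    }
    logger.info("order processed", extra={"fields": fields})
    return fields


log = make_logger("orders")
handle_order(log, 42.5)
```

Keeping each record as a single self-describing object means the same line can feed dashboards, alarms, and later retrospective analysis without custom parsing.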

In AWS, you can view your entire workload (applications, infrastructure, policy, governance, and operations) as code.
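To illustrate treating infrastructure as code, the sketch below builds an AWS CloudFormation template as plain Python data and renders it to JSON. The resource types AWS::S3::Bucket and AWS::Logs::LogGroup and their properties are real CloudFormation types; the logical IDs (AppLogsBucket, AppLogGroup), the log group name, and the build_template helper are hypothetical and only meant to show the idea.

```python
import json


def build_template(environment: str, log_retention_days: int = 30) -> dict:
    """Return a CloudFormation template (as a dict) for a hypothetical workload."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Example workload resources for {environment}",
        "Resources": {
            # Versioned bucket so that overwritten objects can be recovered.
            "AppLogsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            },
            # Log group with an explicit retention period instead of "never expire".
            "AppLogGroup": {
                "Type": "AWS::Logs::LogGroup",
                "Properties": {
                    "LogGroupName": f"/example/{environment}/app",
                    "RetentionInDays": log_retention_days,
                },
            },
        },
    }


template = build_template("staging")
print(json.dumps(template, indent=2))
```

Because the template is ordinary data under version control, a change to it can be reviewed like any other code change before it is applied. The rendered JSON could then be deployed with, for example, `aws cloudformation deploy --template-file template.json --stack-name example-staging` (file and stack names hypothetical).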

Key AWS Services

AWS CloudFormation
AWS Developer Tools
AWS X-Ray

Operational Readiness

You should use a consistent process (including checklists) to know when you are ready to go live with your workload.
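A readiness checklist can itself be expressed as code, so every workload is evaluated the same way before go-live. The following is a minimal sketch; the check names and the shape of the workload dictionary are hypothetical, and in practice individual checks might instead query AWS Config rules or AWS Systems Manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    passed: Callable[[Dict], bool]


# Hypothetical go-live checklist; each check inspects a workload description.
CHECKLIST: List[Check] = [
    Check("runbooks documented", lambda w: bool(w.get("runbooks"))),
    Check("alarms configured", lambda w: len(w.get("alarms", [])) > 0),
    Check("rollback tested", lambda w: w.get("rollback_tested") is True),
    Check("on-call owner assigned", lambda w: bool(w.get("owner"))),
]


def readiness_report(workload: Dict) -> Dict[str, bool]:
    """Evaluate every check so the report shows all gaps, not just the first."""
    return {check.name: check.passed(workload) for check in CHECKLIST}


def ready_to_go_live(workload: Dict) -> bool:
    return all(readiness_report(workload).values())


workload = {
    "runbooks": ["restart-service"],
    "alarms": ["HighErrorRate"],
    "rollback_tested": False,
    "owner": "payments-team",
}
report = readiness_report(workload)
failing = [name for name, ok in report.items() if not ok]
print("ready" if ready_to_go_live(workload) else f"not ready: {failing}")
```

Running every check rather than stopping at the first failure gives the team a complete list of gaps to close before launch.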

Key AWS Services

AWS Config
AWS Systems Manager

Operate

Understanding Operational Health

Key AWS Services

Amazon CloudWatch Logs
Amazon Elasticsearch Service (Amazon ES)
AWS Personal Health Dashboard
AWS Service Health Dashboard

Responding to Events

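Anticipate operational events, both planned (for example, deployments and scheduled maintenance) and unplanned (for example, component failures or surges in utilization), and define how you will respond to each.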
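Responding consistently to planned and unplanned events usually means routing each event to a predefined runbook. The sketch below is a simplified model of the exact-match subset of CloudWatch Events (EventBridge) event patterns: each leaf in a pattern is a list of allowed scalar values, and nested objects are matched recursively. Real patterns support more operators (such as prefix, numeric, and anything-but matching) and array-valued event fields, which are omitted here. The runbook names and the route helper are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified exact-match check: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must be one of the allowed values.
            return False
    return True


# Hypothetical routing table from event patterns to runbook names.
ROUTES: List[Tuple[Dict[str, Any], str]] = [
    # Planned: a scheduled trigger for routine maintenance.
    ({"source": ["aws.events"], "detail-type": ["Scheduled Event"]},
     "run-weekly-patching"),
    # Unplanned: a CloudWatch alarm entering the ALARM state.
    ({"source": ["aws.cloudwatch"],
      "detail-type": ["CloudWatch Alarm State Change"],
      "detail": {"state": {"value": ["ALARM"]}}},
     "investigate-alarm"),
]


def route(event: Dict[str, Any]) -> Optional[str]:
    """Return the first matching runbook, or None to escalate to a human."""
    for pattern, runbook in ROUTES:
        if matches(pattern, event):
            return runbook
    return None


alarm_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"state": {"value": "ALARM"}},
}
print(route(alarm_event))
```

Returning None for unmatched events makes the gap explicit: an event with no runbook is a signal to escalate and, afterwards, to write one.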

Key AWS Services

Amazon CloudWatch
Amazon CloudWatch Events
Amazon SNS
Auto Scaling
AWS Systems Manager

Evolve

Learning from Experience

Key AWS Services

Amazon QuickSight
Amazon Athena
Amazon S3

Share Learnings

You should share what your teams learn to increase the benefit across your organization. You will want to share information and resources to prevent avoidable errors and ease development efforts. This will allow you to focus on delivering features.

Key AWS Services

Amazon SNS
AWS CodeCommit
AWS Lambda
AWS CloudFormation
Amazon Machine Images (AMIs)

Conclusion

Operational excellence is an ongoing effort. Every operational event and failure should be treated as an opportunity to improve the operations of your architecture. By understanding the needs of your workloads, predefining runbooks for routine activities and playbooks to guide issue resolution, using the operations-as-code features in AWS, and maintaining situational awareness, you ensure that your operations are ready and responsive when events occur. By focusing on incremental improvement based on operational priorities and on lessons learned from event response and retrospective analysis, you enable the success of your business by increasing the efficiency and effectiveness of your operations.