Overview
The Site Reliability Engineering Foundation℠ course provides individuals with a solid foundation in the principles, practices, and methodologies of Site Reliability Engineering (SRE). Participants will gain a comprehensive understanding of SRE concepts, including reliability engineering, service level objectives (SLOs), error budgets, monitoring, incident response, and automation. This course serves as an introduction to SRE and equips learners with the necessary knowledge to contribute to SRE initiatives within their organizations.
Objectives
At the end of the Site Reliability Engineering (SRE) Foundation℠ course, participants will be able to:
Prerequisites
- There are no formal prerequisites for this course.
- A basic understanding of software development, system administration, and cloud computing concepts is helpful but not required.
Course Outline
- Understanding the principles and objectives of SRE
- Exploring the role of SRE in modern technology organizations
- Importance of reliability, availability, and performance in system design
- Implementing best practices for building and operating reliable systems
- Defining and establishing SLOs to measure system reliability
- Managing error budgets and balancing risk and innovation
- Developing effective incident response processes
- Incident escalation, communication, and post-incident analysis
- Implementing effective monitoring strategies for system health and performance
- Leveraging observability tools for in-depth system insights
- Automating infrastructure management and deployment processes
- Using configuration management tools and infrastructure-as-code principles
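The SLO and error-budget topics in the outline rest on a simple calculation: the error budget is the complement of the SLO target (1 - SLO), meaning the share of time or requests that is allowed to fail. The Python below is a minimal sketch of that calculation. It assumes an availability SLO expressed as a fraction and a 30-day window, and both are illustrative choices rather than course material. It computes the downtime such an SLO tolerates and how much of a request-based budget remains. The `releases_allowed` function is a hypothetical, deliberately simplified policy for "balancing risk and innovation": feature releases continue while budget remains and pause once it is spent.

```python
def allowed_downtime_minutes(slo_target: float, window_days: float = 30) -> float:
    """Minutes of full unavailability an availability SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    # The error budget is the fraction of the window allowed to be "bad": 1 - SLO.
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def releases_allowed(remaining: float) -> bool:
    """Hypothetical policy: ship feature releases only while some budget remains."""
    return remaining > 0.0


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves about 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999), 1))  # 43.2
    # 1,000,000 requests at 99.9% allow about 1,000 failures; 250 failures spend a quarter.
    remaining = budget_remaining(0.999, 1_000_000, 250)
    print(round(remaining, 2))  # 0.75
    print(releases_allowed(remaining))  # True
```

In practice, teams agree on what happens when the budget runs out (for example, prioritizing reliability work over new features). The single boolean here stands in for that agreement and is not a prescribed rule.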