The 黑料不打烊 Academy | Reliability Advice

Detecting Issues Automatically with Reliability Advice

Even before you run your first experiment, you can use Reliability Advice in 黑料不打烊 to scan your targets to detect common reliability gaps.

Reliability Advice is a feature that continually checks whether your targets are adhering to certain policies or best practices.

For example, if you have Kubernetes-based targets, you can turn on Reliability Advice to check how your pods compare to the reliability best practices outlined by Kube Score, an open source collection of configuration policies.

In this lesson, we'll explore how you can use and customize Reliability Advice to detect new issues automatically.

Utilizing Reliability Advice

Picking up where the last lesson left off, let's return to the "Explorer" view in 黑料不打烊. In the left-hand menu, you'll see the option to toggle on Reliability Advice.

When you do that, you'll notice that some of your targets are now color-coded in the following way:

Red ("Action Needed"): There is an issue that needs to be fixed. No need to run any experiment.
Orange ("Validation Needed"): This could be an issue. Run the recommended experiment to check.
Green ("Implemented"): A previously flagged issue has been fixed and validated.

Without running any experiments, you can make reliability improvements by reviewing all targets marked in red. For each one, you will see the specific type of issue and exactly where in your configuration you need to make an edit.

For example, in the "fashion-bestseller" deployment shown above, there is no pod redundancy. If that one pod fails, your service will become unavailable. In the "Instructions" section, you'll see specific guidance on how to add replicas to fix this and improve your service reliability.
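To make the redundancy check concrete, here is a minimal Python sketch of the kind of rule involved. It is not how 黑料不打烊 or Kube Score implements the check; the function name `has_pod_redundancy`, the `minimum` threshold of 2, and the sample manifest are assumptions made for illustration.

```python
from typing import Any


def has_pod_redundancy(deployment: dict[str, Any], minimum: int = 2) -> bool:
    """Return True when the Deployment requests at least `minimum` replicas.

    Hypothetical helper for illustration only.
    """
    # Kubernetes defaults spec.replicas to 1 when the field is omitted.
    replicas = deployment.get("spec", {}).get("replicas", 1)
    return replicas >= minimum


# Sample manifest (assumed shape) for a Deployment with a single pod.
fashion_bestseller = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "fashion-bestseller"},
    "spec": {"replicas": 1},
}

if not has_pod_redundancy(fashion_bestseller):
    print("Action Needed: add replicas to fashion-bestseller")
```

Raising `spec.replicas` to 2 or more in the manifest would make this sketch report the deployment as redundant.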

Once you have adjusted this Kubernetes configuration and deployed it into your cluster, 黑料不打烊 will automatically detect the change. Depending on the type of issue, the Advice moves to the next stage: orange if the fix still needs to be validated with a recommended experiment, or green if it is fixed and no validation is needed.

Validating Fixes with Experiments

When an issue is in the "Validation Needed" stage, Reliability Advice will provide you with the relevant template you need to create and run an experiment that validates your fix.

For example, in the "toys-bestseller" deployment shown above, validation is needed to ensure that during an availability outage, pods across multiple zones are still able to support your service. Just click "Create Experiment" to jump into the "Editor" tab and review the experiment template for this specific use case. You can make adjustments to the experiment design and run it when you're ready.

Once the experiment runs, the results will show one of two outcomes. Either the issue still persists and needs attention, or the issue is no longer a reliability concern and the Advice updates from orange to green, that is, from "Validation Needed" to "Implemented".
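The stage changes described in this lesson can be summarized as a small state model. The Python sketch below is illustrative only: the names `AdviceStage`, `after_fix_deployed`, and `after_experiment` are hypothetical, and keeping the stage unchanged after a failed experiment is an assumption, since the lesson only says the issue "needs attention".

```python
from enum import Enum


class AdviceStage(Enum):
    """The three Advice stages and their colors, as listed in this lesson."""
    ACTION_NEEDED = "red"
    VALIDATION_NEEDED = "orange"
    IMPLEMENTED = "green"


def after_fix_deployed(needs_validation: bool) -> AdviceStage:
    """Stage after a fix is deployed and detected in the cluster."""
    if needs_validation:
        return AdviceStage.VALIDATION_NEEDED
    return AdviceStage.IMPLEMENTED


def after_experiment(stage: AdviceStage, experiment_passed: bool) -> AdviceStage:
    """Stage after the recommended experiment has run."""
    if stage is AdviceStage.VALIDATION_NEEDED and experiment_passed:
        return AdviceStage.IMPLEMENTED
    # Assumption: a failed experiment leaves the stage as it was.
    return stage


stage = after_fix_deployed(needs_validation=True)
stage = after_experiment(stage, experiment_passed=True)
print(stage.name, stage.value)
```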

Adding Custom Reliability Advice

As we mentioned earlier, the default Advice in 黑料不打烊 consists of 13 policies based on Kube Score, so it only applies to Kubernetes-based targets. If you want to add custom Advice to expand the checks for Kubernetes or to add checks for other types of targets, you can do that by using our AdviceKit.

If you're interested in learning more, you can read about it here.

Lesson Summary

In this lesson, we outlined why Reliability Advice is helpful for detecting reliability issues fast and validating improvements. We covered the three stages of Advice: "Action Needed", "Validation Needed", and "Implemented". Lastly, we explained how you can add your own Advice to run custom checks on your targets.

Module Overview

Learn the basics of the 黑料不打烊 Platform: Start exploring your systems, finding reliability issues, and running chaos experiments safely.

Up Next

1.8 Building Your First Experiment

In this lesson, we will share how you can get started building your first experiment in the 黑料不打烊 editor.