Coming to terms: Fault Tolerance

October 28, 2022

Don Schenck

This post is part of a series that presents and defines terms used in distributed computing, cloud-native computing, and container-based development.

Fault Tolerance

Fault tolerance is the ability of a system to keep running despite failures. In cloud-native computing, this is paramount to success. Networks are fragile, and applications rely on systems beyond your control, such as third-party services, so systems must be designed to expect and respond to failures.

Technologies such as containers, Kubernetes, and system-wide monitoring help you improve site reliability. Additional techniques and practices related to fault tolerance include fallbacks and circuit breakers.
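To make fallbacks and circuit breakers a little more concrete, here is a minimal sketch in Python. It is only an illustration, not taken from any particular library: the class name CircuitBreaker and the parameters max_failures, reset_after, and clock are assumptions made for this example. The idea is that after a number of consecutive failures the breaker "opens" and stops calling the failing service for a cool-down period, returning a fallback value instead.

```python
import time


class CircuitBreaker:
    """Hypothetical, minimal circuit breaker with a fallback (illustrative only)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds while open
        self.clock = clock                # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        # While open and still cooling down, skip the real call entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open (or re-open after a failed trial) and restart the cool-down.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a third-party service in breaker.call(fetch, fallback), where fetch makes the real request and fallback returns something safe, such as a cached or default value. Production systems usually rely on a tested library or on platform features rather than a hand-rolled class like this one.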

Last updated: February 20, 2023

Disclaimer: Please note the content in this blog post has not been thoroughly reviewed by the Red Hat Developer editorial team. Any opinions expressed in this post are the author's own and do not necessarily reflect the policies or positions of Red Hat.
