This post is part of a series that presents and defines terms used in distributed computing, cloud-native computing, and container-based development.
Fault Tolerance
The ability of a system to keep running despite failures. In cloud-native computing, this is paramount to success. Networks are fragile, and applications rely on systems beyond your control, such as third-party services, so systems must be designed to expect and respond to failures.
Technologies such as containers, Kubernetes, and system-wide monitoring help you improve site reliability. Other techniques and practices related to fault tolerance include fallbacks and circuit breakers.
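
To make fallbacks and circuit breakers more concrete, here is a minimal Python sketch, not a production implementation. A fallback returns a substitute result when a call fails; a circuit breaker stops calling a dependency after repeated failures and retries only after a cool-down. The class name CircuitBreaker, its parameters (failure_threshold, reset_timeout, clock), and their defaults are hypothetical choices made for illustration and do not come from any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are assumptions."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        """True while calls should be short-circuited to the fallback."""
        if self._opened_at is None:
            return False
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        # While the circuit is open, skip the failing dependency entirely.
        if self.is_open:
            return fallback()
        try:
            result = operation()
        except Exception:
            # Count the failure and open the circuit once the threshold is hit.
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each request to a third-party service, for example breaker.call(fetch_prices, lambda: cached_prices), where fetch_prices and cached_prices are placeholder names. Once the cool-down has elapsed, the next call is let through as a trial: if it succeeds the circuit closes, and if it fails the circuit opens again for another cool-down period.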