Distributed consensus and fault tolerance - Lecture 2
Main Author: | Bitzes, Georgios |
---|---|
Language: | eng |
Published: | 2017 |
Subjects: | inverted CSC |
Online Access: | http://cds.cern.ch/record/2255145 |
_version_ | 1780953686895230976 |
---|---|
author | Bitzes, Georgios |
author_facet | Bitzes, Georgios |
author_sort | Bitzes, Georgios |
collection | CERN |
description | In a world where clusters with thousands of nodes are becoming commonplace, we are often faced with the task of having them coordinate and share state. As the number of machines goes up, so does the probability that something goes wrong: a node could temporarily lose connectivity, crash because of some race condition, or have its hard drive fail. What are the challenges when designing fault-tolerant distributed systems, where a cluster is able to survive the loss of individual nodes? In this lecture, we will discuss some basics on this topic (consistency models, CAP theorem, failure modes, Byzantine faults), detail the Raft consensus algorithm, and showcase an interesting example of a highly resilient distributed system, Bitcoin. (A short sketch of the failure-probability intuition follows this record.) |
id | cern-2255145 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2017 |
record_format | invenio |
spelling | cern-2255145 2022-11-02T22:32:27Z http://cds.cern.ch/record/2255145 eng Bitzes, Georgios Distributed consensus and fault tolerance - Lecture 2 Inverted CERN School of Computing 2017 inverted CSC oai:cds.cern.ch:2255145 2017 |
spellingShingle | inverted CSC Bitzes, Georgios Distributed consensus and fault tolerance - Lecture 2 |
title | Distributed consensus and fault tolerance - Lecture 2 |
title_full | Distributed consensus and fault tolerance - Lecture 2 |
title_fullStr | Distributed consensus and fault tolerance - Lecture 2 |
title_full_unstemmed | Distributed consensus and fault tolerance - Lecture 2 |
title_short | Distributed consensus and fault tolerance - Lecture 2 |
title_sort | distributed consensus and fault tolerance - lecture 2 |
topic | inverted CSC |
url | http://cds.cern.ch/record/2255145 |
work_keys_str_mv | AT bitzesgeorgios distributedconsensusandfaulttolerancelecture2 AT bitzesgeorgios invertedcernschoolofcomputing2017 |
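
The abstract's opening observation, that the chance of something going wrong grows with the number of machines, can be made concrete with a small back-of-the-envelope calculation. The sketch below is an illustration added to this record, not part of the lecture material: assuming each node fails independently within some time window with probability p (a hypothetical parameter), the probability that at least one of N nodes fails is 1 - (1 - p)^N.

```python
# Illustrative sketch (not from the lecture): if each of N nodes fails
# independently with probability p during some time window, the chance
# that at least one node fails is 1 - (1 - p)**N.

def prob_any_failure(n_nodes: int, p_node: float) -> float:
    """Probability that at least one of n_nodes fails, assuming an
    independent per-node failure probability p_node."""
    return 1.0 - (1.0 - p_node) ** n_nodes

if __name__ == "__main__":
    p = 0.001  # hypothetical per-node failure probability in the window
    for n in (1, 10, 100, 1000, 10000):
        print(f"{n:>6} nodes -> P(at least one failure) = {prob_any_failure(n, p):.3f}")
```

With p = 0.001 a single node almost never fails in the window, but a 10 000-node cluster sees at least one failure with probability close to 1, which is why the fault-tolerance techniques the lecture covers (consistency models, consensus via Raft, handling of Byzantine faults) matter at scale.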