Distributed consensus and fault tolerance - Lecture 1

In a world where clusters with thousands of nodes are becoming commonplace, we are often faced with the task of having them coordinate and share state. As the number of machines goes up, so does the probability that something goes wrong: a node could temporarily lose connectivity, c...

Bibliographic Details
Main author: Bitzes, Georgios
Language: eng
Published: 2017
Subjects:
Online access: http://cds.cern.ch/record/2255004
_version_ 1780953680955047936
author Bitzes, Georgios
author_facet Bitzes, Georgios
author_sort Bitzes, Georgios
collection CERN
description In a world where clusters with thousands of nodes are becoming commonplace, we are often faced with the task of having them coordinate and share state. As the number of machines goes up, so does the probability that something goes wrong: a node could temporarily lose connectivity, crash because of some race condition, or have its hard drive fail. What are the challenges when designing fault-tolerant distributed systems, where a cluster is able to survive the loss of individual nodes? In this lecture, we will discuss some basics on this topic (consistency models, the CAP theorem, failure modes, Byzantine faults), detail the Raft consensus algorithm, and showcase an interesting example of a highly resilient distributed system, Bitcoin.
id cern-2255004
institution European Organization for Nuclear Research
language eng
publishDate 2017
record_format invenio
spelling cern-2255004; 2022-11-02T22:32:27Z; http://cds.cern.ch/record/2255004; eng; Bitzes, Georgios; Distributed consensus and fault tolerance - Lecture 1; Inverted CERN School of Computing 2017; inverted CSC; oai:cds.cern.ch:2255004; 2017
spellingShingle inverted CSC
Bitzes, Georgios
Distributed consensus and fault tolerance - Lecture 1
title Distributed consensus and fault tolerance - Lecture 1
topic inverted CSC
url http://cds.cern.ch/record/2255004
work_keys_str_mv AT bitzesgeorgios distributedconsensusandfaulttolerancelecture1
AT bitzesgeorgios invertedcernschoolofcomputing2017