Addressing multiple bit/symbol errors in DRAM subsystem

As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Such schemes may not detect multiple symbol errors...

Bibliographic Details
Main authors: Yeleswarapu, Ravikiran, Somani, Arun K.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Algorithms and Analysis of Algorithms
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959593/
https://www.ncbi.nlm.nih.gov/pubmed/33817009
http://dx.doi.org/10.7717/peerj-cs.359
_version_ 1783664983025909760
author Yeleswarapu, Ravikiran
Somani, Arun K.
collection PubMed
description As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Such schemes may not detect multiple symbol errors arising from faults in multiple devices, the data bus, or the address bus. In this article, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error handling scheme that corrects single-symbol errors and detects multi-symbol errors. Our scheme uses a hash in combination with an Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). We develop a novel scheme that deploys a 32-bit CRC along with a Reed-Solomon code to implement SSCMSD for a ×4-based DDR4 system. Simulation-based experiments show that our scheme effectively guards against device, data-bus, and address-bus errors, limited only by the aliasing probability of the hash. Our design achieves this without introducing additional READ latency. The scheme requires 19 chips per rank, 76 data bus lines, and additional hash logic at the memory controller.
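The hash-plus-ECC idea in the description above can be illustrated with a small Python sketch. This is not the authors' implementation: the 8-bit symbols, the two-check-symbol Reed-Solomon code over GF(2^8), the 32-byte data block, the use of `zlib.crc32` as the 32-bit hash, and the names `encode`, `decode`, and `syndromes` are all assumptions made for illustration. The point it shows is that a single-symbol-correcting code alone can miscorrect a multi-symbol error, and a stored hash of the data can catch that case.

```python
import zlib

# Figures quoted in the abstract: 19 chips per rank, x4 devices.
CHIPS_PER_RANK, BITS_PER_CHIP = 19, 4
DATA_BUS_LINES = CHIPS_PER_RANK * BITS_PER_CHIP  # 19 * 4 = 76

# GF(2^8) log/antilog tables (assumed polynomial 0x11D, alpha = 2).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two GF(2^8) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Divide a by a nonzero b in GF(2^8)."""
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]

def syndromes(cw):
    """S0 = sum of c_i, S1 = sum of c_i * alpha^i (both zero for a codeword)."""
    s0 = s1 = 0
    for i, c in enumerate(cw):
        s0 ^= c
        s1 ^= gf_mul(c, EXP[i])
    return s0, s1

def encode(data: bytes) -> list:
    """Append CRC-32 of the data, then prepend two Reed-Solomon check symbols."""
    payload = list(data) + list(zlib.crc32(data).to_bytes(4, "big"))
    a, b = syndromes([0, 0] + payload)
    p1 = gf_div(a ^ b, 3)  # 3 = alpha^0 + alpha^1
    p0 = a ^ p1
    return [p0, p1] + payload

def decode(cw):
    """Return (status, data); status is 'clean', 'corrected', or 'detected'."""
    cw = list(cw)
    s0, s1 = syndromes(cw)
    status = "clean"
    if s0 or s1:
        if s0 == 0 or s1 == 0:
            return "detected", None  # not a single-symbol error pattern
        pos = (LOG[s1] - LOG[s0]) % 255
        if pos >= len(cw):
            return "detected", None  # error locator points outside the codeword
        cw[pos] ^= s0  # single-symbol correction (may be a miscorrection)
        status = "corrected"
    data, stored = bytes(cw[2:-4]), bytes(cw[-4:])
    if zlib.crc32(data).to_bytes(4, "big") != stored:
        return "detected", None  # hash mismatch exposes a multi-symbol error
    return status, data

data = bytes(range(32))
codeword = encode(data)
one_bad = list(codeword)
one_bad[10] ^= 0x5A  # one corrupted symbol
two_bad = list(codeword)
two_bad[5] ^= 0x11   # two corrupted symbols
two_bad[20] ^= 0x22
print(decode(codeword)[0], decode(one_bad)[0], decode(two_bad)[0])
```

In this sketch the hash is checked after every decode, so a miscorrected word is returned as data only if the CRC also aliases, which mirrors the abstract's statement that protection is limited by the aliasing probability of the hash.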
format Online
Article
Text
id pubmed-7959593
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7959593 2021-04-02 Addressing multiple bit/symbol errors in DRAM subsystem Yeleswarapu, Ravikiran Somani, Arun K. PeerJ Comput Sci Algorithms and Analysis of Algorithms PeerJ Inc. 2021-02-09 /pmc/articles/PMC7959593/ /pubmed/33817009 http://dx.doi.org/10.7717/peerj-cs.359 Text en © 2021 Yeleswarapu and Somani https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Addressing multiple bit/symbol errors in DRAM subsystem
topic Algorithms and Analysis of Algorithms
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959593/
https://www.ncbi.nlm.nih.gov/pubmed/33817009
http://dx.doi.org/10.7717/peerj-cs.359