
OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems

With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a larg...

Bibliographic Details
Main Authors: Nguyen, Duy-Thanh; Ho, Nhut-Minh; Wong, Weng-Fai; Chang, Ik-Joon
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708231/
https://www.ncbi.nlm.nih.gov/pubmed/34960359
http://dx.doi.org/10.3390/s21248271
author Nguyen, Duy-Thanh
Ho, Nhut-Minh
Wong, Weng-Fai
Chang, Ik-Joon
collection PubMed
description With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. To address this problem, we present a DRAM chip architecture that can track DRAM cell faults at byte level. In our proposed architecture, DRAM faults are classified as temporary or permanent, with no additional pins and only minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability by a factor of up to 5000 to 7000 relative to conventional operation.
format Online
Article
Text
id pubmed-8708231
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8708231 2021-12-25 OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems Nguyen, Duy-Thanh Ho, Nhut-Minh Wong, Weng-Fai Chang, Ik-Joon Sensors (Basel) Article With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. To address this problem, we present a DRAM chip architecture that can track DRAM cell faults at byte level. In our proposed architecture, DRAM faults are classified as temporary or permanent, with no additional pins and only minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability by a factor of up to 5000 to 7000 relative to conventional operation. MDPI 2021-12-10 /pmc/articles/PMC8708231/ /pubmed/34960359 http://dx.doi.org/10.3390/s21248271 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708231/
https://www.ncbi.nlm.nih.gov/pubmed/34960359
http://dx.doi.org/10.3390/s21248271