Power-Intent Systolic Array Using Modified Parallel Multiplier for Machine Learning Acceleration

Bibliographic Details
Main authors: Inayat, Kashif; Muslim, Fahad Bin; Iqbal, Javed; Hassnain Mohsan, Syed Agha; Alkahtani, Hend Khalid; Mostafa, Samih M.
Format: Online, Article, Text
Language: English
Published: MDPI, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181616/
https://www.ncbi.nlm.nih.gov/pubmed/37177500
http://dx.doi.org/10.3390/s23094297
Collection: PubMed
Description: Systolic arrays are an integral part of many modern machine learning (ML) accelerators because they perform matrix multiplication, a key primitive in modern ML models, efficiently. Current state-of-the-art systolic array-based accelerators mainly target area and delay optimizations, treating power optimization as a secondary target. Very few accelerator designs directly target power optimization, and those that do rely on very complex algorithmic modifications that in turn compromise area or delay performance. We present a novel Power-Intent Systolic Array (PI-SA) based on fine-grained power gating of the multiplier in the multiplication and accumulation (MAC) block inside each processing element of the systolic array. This significantly reduces the design's power consumption, but at an additional delay cost. To offset the delay cost, we introduce a modified decomposition multiplier that yields a smaller reduction tree, and to further improve area and delay, we replace the carry propagation adder with a carry save adder inside each sub-multiplier. Compared with the baseline Gemmini naive systolic array design and its variant, i.e., a conventional systolic array design, the proposed design exhibits a delay reduction of up to 6%, an area improvement of up to 32% and a power reduction of up to 57% for varying accumulator bit-widths.
Journal: Sensors (Basel)
Publication date: 2023-04-26
Record ID: pubmed-10181616
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland.
License: This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
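To make the multiplier organization in the description more concrete, the Python sketch below models a generic decomposition (divide-and-conquer) multiplier. An unsigned 8x8 product is split into four 4x4 sub-multipliers. Each sub-multiplier keeps its result in carry-save form: sum and carry vectors produced by 3:2 compressors, with no carry-propagate adder. All sub-products then feed one shared reduction tree, followed by a single final carry-propagate addition. This is an illustrative bit-level model under stated assumptions, not the PI-SA authors' modified design. The half-width split, the unsigned operands, and the helper names `csa`, `csa_reduce`, `sub_multiply` and `decomposed_multiply` are all hypothetical. Power gating and timing are not modeled.

```python
from typing import List, Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save adder (a row of full adders) on non-negative ints.

    Returns (sum, carry) with x + y + z == sum + carry; no carry ripples
    between bit positions.
    """
    s = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return s, carry


def csa_reduce(rows: List[int]) -> Tuple[int, int]:
    """Compress any number of rows to two using layers of 3:2 compressors
    (a Wallace-style reduction tree). The total value is preserved."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt: List[int] = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])  # 0-2 leftover rows pass to the next layer
        rows = nxt
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def sub_multiply(a: int, b: int, width: int) -> Tuple[int, int]:
    """Hypothetical width x width unsigned sub-multiplier.

    Partial products are reduced with CSAs and returned in carry-save form,
    i.e. WITHOUT a final carry-propagate adder (an assumed reading of the text).
    """
    partials = [a << i if (b >> i) & 1 else 0 for i in range(width)]
    return csa_reduce(partials)


def decomposed_multiply(a: int, b: int, width: int = 8) -> int:
    """Unsigned width x width product from four (width/2)-bit sub-multipliers:
    a*b = (aH*bH << width) + ((aH*bL + aL*bH) << width/2) + aL*bL
    """
    assert width % 2 == 0 and 0 <= a < (1 << width) and 0 <= b < (1 << width)
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    rows: List[int] = []
    for x, y, shift in ((a_lo, b_lo, 0), (a_lo, b_hi, half),
                        (a_hi, b_lo, half), (a_hi, b_hi, width)):
        s, c = sub_multiply(x, y, half)
        rows += [s << shift, c << shift]  # still in carry-save form
    s, c = csa_reduce(rows)  # one shared reduction tree for all sub-products
    return s + c             # the single carry-propagate addition


if __name__ == "__main__":
    # Exhaustive check of the 8-bit case against Python's built-in multiply.
    assert all(decomposed_multiply(a, b) == a * b
               for a in range(256) for b in range(256))
    print(decomposed_multiply(200, 123))  # 24600
```

Because each sub-multiplier's result stays as a (sum, carry) pair, carry propagation happens only once, at the very end. In hardware this removes a carry-propagate adder from every sub-multiplier. That appears to be the kind of area and delay saving the description attributes to using carry save adders, though the paper's actual circuit may differ.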