Power-Intent Systolic Array Using Modified Parallel Multiplier for Machine Learning Acceleration
Systolic arrays are an integral part of many modern machine learning (ML) accelerators due to their efficiency in performing matrix multiplication that is a key primitive in modern ML models. Current state-of-the-art in systolic array-based accelerators mainly target area and delay optimizations wit...
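Main authors: | Inayat, Kashif; Muslim, Fahad Bin; Iqbal, Javed; Hassnain Mohsan, Syed Agha; Alkahtani, Hend Khalid; Mostafa, Samih M. |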
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181616/ https://www.ncbi.nlm.nih.gov/pubmed/37177500 http://dx.doi.org/10.3390/s23094297 |
_version_ | 1785041616964157440 |
---|---|
author | Inayat, Kashif Muslim, Fahad Bin Iqbal, Javed Hassnain Mohsan, Syed Agha Alkahtani, Hend Khalid Mostafa, Samih M. |
author_sort | Inayat, Kashif |
collection | PubMed |
description | Systolic arrays are an integral part of many modern machine learning (ML) accelerators due to their efficiency in performing matrix multiplication that is a key primitive in modern ML models. Current state-of-the-art in systolic array-based accelerators mainly target area and delay optimizations with power optimization being considered as a secondary target. Very few accelerator designs directly target power optimizations and that too using very complex algorithmic modifications that in turn result in a compromise in the area or delay performance. We present a novel Power-Intent Systolic Array (PI-SA) that is based on the fine-grained power gating of the multiplication and accumulation (MAC) block multiplier inside the processing element of the systolic array, which reduces the design power consumption quite significantly, but with an additional delay cost. To offset the delay cost, we introduce a modified decomposition multiplier to obtain smaller reduction tree and to further improve area and delay, we also replace the carry propagation adder with a carry save adder inside each sub-multiplier. Comparison of the proposed design with the baseline Gemmini naive systolic array design and its variant, i.e., a conventional systolic array design, exhibits a delay reduction of up to 6%, an area improvement of up to 32% and a power reduction of up to 57% for varying accumulator bit-widths. |
format | Online Article Text |
id | pubmed-10181616 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-101816162023-05-13 Power-Intent Systolic Array Using Modified Parallel Multiplier for Machine Learning Acceleration Inayat, Kashif Muslim, Fahad Bin Iqbal, Javed Hassnain Mohsan, Syed Agha Alkahtani, Hend Khalid Mostafa, Samih M. Sensors (Basel) Article Systolic arrays are an integral part of many modern machine learning (ML) accelerators due to their efficiency in performing matrix multiplication that is a key primitive in modern ML models. Current state-of-the-art in systolic array-based accelerators mainly target area and delay optimizations with power optimization being considered as a secondary target. Very few accelerator designs directly target power optimizations and that too using very complex algorithmic modifications that in turn result in a compromise in the area or delay performance. We present a novel Power-Intent Systolic Array (PI-SA) that is based on the fine-grained power gating of the multiplication and accumulation (MAC) block multiplier inside the processing element of the systolic array, which reduces the design power consumption quite significantly, but with an additional delay cost. To offset the delay cost, we introduce a modified decomposition multiplier to obtain smaller reduction tree and to further improve area and delay, we also replace the carry propagation adder with a carry save adder inside each sub-multiplier. Comparison of the proposed design with the baseline Gemmini naive systolic array design and its variant, i.e., a conventional systolic array design, exhibits a delay reduction of up to 6%, an area improvement of up to 32% and a power reduction of up to 57% for varying accumulator bit-widths. MDPI 2023-04-26 /pmc/articles/PMC10181616/ /pubmed/37177500 http://dx.doi.org/10.3390/s23094297 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Power-Intent Systolic Array Using Modified Parallel Multiplier for Machine Learning Acceleration |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181616/ https://www.ncbi.nlm.nih.gov/pubmed/37177500 http://dx.doi.org/10.3390/s23094297 |