
Power-Intent Systolic Array Using Modified Parallel Multiplier for Machine Learning Acceleration

Systolic arrays are an integral part of many modern machine learning (ML) accelerators due to their efficiency in performing matrix multiplication, a key primitive in modern ML models. Current state-of-the-art systolic array-based accelerators mainly target area and delay optimizations, with power optimization considered a secondary target.

Bibliographic Details
Main authors:	Inayat, Kashif; Muslim, Fahad Bin; Iqbal, Javed; Hassnain Mohsan, Syed Agha; Alkahtani, Hend Khalid; Mostafa, Samih M.
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181616/ https://www.ncbi.nlm.nih.gov/pubmed/37177500 http://dx.doi.org/10.3390/s23094297

author	Inayat, Kashif; Muslim, Fahad Bin; Iqbal, Javed; Hassnain Mohsan, Syed Agha; Alkahtani, Hend Khalid; Mostafa, Samih M.
author_sort	Inayat, Kashif
collection	PubMed
description	Systolic arrays are an integral part of many modern machine learning (ML) accelerators due to their efficiency in performing matrix multiplication, a key primitive in modern ML models. Current state-of-the-art systolic array-based accelerators mainly target area and delay optimizations, with power optimization considered a secondary target. Very few accelerator designs directly target power optimization, and those that do rely on very complex algorithmic modifications that in turn compromise area or delay performance. We present a novel Power-Intent Systolic Array (PI-SA) based on fine-grained power gating of the multiplier within the multiplication and accumulation (MAC) block inside the processing element of the systolic array, which significantly reduces the design's power consumption, but at an additional delay cost. To offset the delay cost, we introduce a modified decomposition multiplier to obtain a smaller reduction tree; to further improve area and delay, we also replace the carry propagation adder with a carry save adder inside each sub-multiplier. Comparison of the proposed design with the baseline Gemmini naive systolic array design and its variant, i.e., a conventional systolic array design, exhibits a delay reduction of up to 6%, an area improvement of up to 32%, and a power reduction of up to 57% for varying accumulator bit-widths.
format	Online Article Text
id	pubmed-10181616
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10181616 2023-05-13 Sensors (Basel) Article MDPI 2023-04-26 /pmc/articles/PMC10181616/ /pubmed/37177500 http://dx.doi.org/10.3390/s23094297 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Power-Intent Systolic Array Using Modified Parallel Multiplier for Machine Learning Acceleration
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181616/ https://www.ncbi.nlm.nih.gov/pubmed/37177500 http://dx.doi.org/10.3390/s23094297
