BLAS3 optimization for the Godson-3B1500
This paper proposes a performance model for general matrix multiplication (GEMM) on decoupled access/execute (DAE) architecture platforms, in order to guide improvements of the GEMM performance in the Godson-3B1500. This model focuses on the features of access processors (APs) and execute processors...
Main authors: | Zhang, Ming; Gu, Naijie; Ren, Kaixin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122567/ https://www.ncbi.nlm.nih.gov/pubmed/27933269 http://dx.doi.org/10.1186/s40064-016-3690-3 |
Field | Value |
---|---|
author | Zhang, Ming; Gu, Naijie; Ren, Kaixin |
collection | PubMed |
description | This paper proposes a performance model for general matrix multiplication (GEMM) on decoupled access/execute (DAE) architectures, intended to guide GEMM performance improvements on the Godson-3B1500. The model focuses on the features of access processors (APs) and execute processors (EPs). To reduce the synchronization overhead between APs and EPs, a synchronization module selection mechanism (SMSM) is presented. Based on the performance model, two optimized GEMM algorithms for DAE platforms are then proposed to approach ideal performance. In these algorithms, the kernel functions are optimized with single instruction, multiple data (SIMD) vector instructions, and the AP overhead is almost entirely overlapped with EP execution by taking full advantage of the architecture's features. Moreover, the synchronization overhead is reduced by applying the SMSM. Finally, the proposed algorithms are tested on the Godson-3B1500. The experimental results show that dGEMM reaches 91.9% of the theoretical peak performance and that zGEMM reaches 93% of the theoretical peak performance. |
format | Online Article Text |
id | pubmed-5122567 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-5122567 2016-12-08 BLAS3 optimization for the Godson-3B1500. Zhang, Ming; Gu, Naijie; Ren, Kaixin. Springerplus. Research. Springer International Publishing 2016-11-25. /pmc/articles/PMC5122567/ /pubmed/27933269 http://dx.doi.org/10.1186/s40064-016-3690-3 Text en. © The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
title | BLAS3 optimization for the Godson-3B1500 |
topic | Research |
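
The description reports GEMM results as a fraction of theoretical peak performance. As a reader's aid, the sketch below shows what that metric measures, using the standard BLAS definition of GEMM (C ← αAB + βC) and the conventional operation counts (2mnk for real matrices, 8mnk for complex ones). It is a plain reference implementation, not the authors' optimized algorithm: it does not model APs, EPs, SIMD kernels, or the SMSM. All function names are hypothetical, and the timing and peak figures in the usage lines are invented for illustration, since the record gives no peak figure for the Godson-3B1500.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a @ b) + beta * c.

    Matrices are lists of rows; entries may be real or complex numbers.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("a must be m x k to match b (k x n)")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def gemm_flops(m, n, k, complex_valued=False):
    """Conventional floating-point operation count for one GEMM call.

    Real case: one multiply and one add per inner-loop step, so 2*m*n*k.
    Complex case: each complex multiply-add costs 8 real operations.
    """
    return (8 if complex_valued else 2) * m * n * k


def fraction_of_peak(flops, seconds, peak_flops_per_second):
    """Achieved rate (flops / seconds) divided by the theoretical peak rate."""
    return flops / seconds / peak_flops_per_second


# Usage with made-up numbers (NOT measurements from the paper):
work = gemm_flops(1000, 1000, 1000)            # real-valued 1000^3 problem
share = fraction_of_peak(work, 0.5, 8.0e9)     # hypothetical 0.5 s run, 8 GFLOP/s peak
print(f"{share:.1%} of peak")
```

A figure such as the 91.9% quoted for dGEMM corresponds to the value returned by `fraction_of_peak`, with the measured run time and the processor's actual peak rate substituted for the placeholder values.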