
BLAS3 optimization for the Godson-3B1500

This paper proposes a performance model for general matrix multiplication (GEMM) on decoupled access/execute (DAE) architecture platforms, in order to guide improvements of the GEMM performance in the Godson-3B1500. This model focuses on the features of access processors (APs) and execute processors...

Bibliographic Details
Main authors: Zhang, Ming; Gu, Naijie; Ren, Kaixin
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122567/
https://www.ncbi.nlm.nih.gov/pubmed/27933269
http://dx.doi.org/10.1186/s40064-016-3690-3
collection PubMed
description This paper proposes a performance model for general matrix multiplication (GEMM) on decoupled access/execute (DAE) architecture platforms, in order to guide improvements of the GEMM performance in the Godson-3B1500. This model focuses on the features of access processors (APs) and execute processors (EPs). To reduce the synchronization overhead between APs and EPs, a synchronization module selection mechanism (SMSM) is presented. Furthermore, two optimized algorithms of GEMM for DAE platforms based on the performance model are proposed for ideal performance. In the proposed algorithms, the kernel functions are optimized with single instruction multiple data (SIMD) vector instructions, and the overhead of AP is almost overlapped with EP by taking full advantage of the features of the architecture. Moreover, the synchronization overhead can be reduced according to the SMSM. In the end, the proposed algorithms are tested on the Godson-3B1500. The experimental results demonstrate that the computing performance of dGEMM reaches 91.9% of the theoretical peak performance and that zGEMM can reach 93% of the theoretical peak performance.
id pubmed-5122567
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5122567 2016-12-08 BLAS3 optimization for the Godson-3B1500. Zhang, Ming; Gu, Naijie; Ren, Kaixin. Springerplus, Research. Springer International Publishing 2016-11-25 /pmc/articles/PMC5122567/ /pubmed/27933269 http://dx.doi.org/10.1186/s40064-016-3690-3 Text en © The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
topic Research