Large-scale distributed linear algebra with tensor processing units
We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs’ fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound.
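The compute-bound behavior described above comes from keeping each core busy with a large local matrix multiplication on its MXU. As a minimal, hedged sketch only (not the authors' actual distributed algorithm), the JAX snippet below row-shards one operand across the available cores with `pmap` and has each core run a single large `jnp.dot`; the matrix size, sharding scheme, and the name `sharded_matmul` are illustrative assumptions.

```python
import jax
import jax.numpy as jnp
from functools import partial

# Hypothetical sketch: A is split into row blocks, one per core, while B is
# replicated; each core then performs one large local matmul on its MXU.
@partial(jax.pmap, in_axes=(0, None))
def sharded_matmul(a_block, b):
    return jnp.dot(a_block, b)  # large float32 block product per core

n_cores = jax.local_device_count()
n = 4096                        # illustrative size, divisible by n_cores
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (n, n), dtype=jnp.float32)
b = jax.random.normal(key_b, (n, n), dtype=jnp.float32)

# One row block of A per core; gather the row blocks of C afterwards.
c = sharded_matmul(a.reshape(n_cores, n // n_cores, n), b).reshape(n, n)
```

Replicating B avoids inter-core communication during the multiply itself, which is the simplest way to make runtime MXU-dominated; the pod-scale algorithms the abstract refers to additionally move blocks over the ICI network.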
Main Authors: | Lewis, Adam G. M.; Beall, Jackson; Ganahl, Martin; Hauru, Markus; Mallick, Shrestha Basu; Vidal, Guifre |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2022 |
Subjects: | Physical Sciences |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9388123/ https://www.ncbi.nlm.nih.gov/pubmed/35939669 http://dx.doi.org/10.1073/pnas.2122762119 |
_version_ | 1784770154237788160 |
author | Lewis, Adam G. M. Beall, Jackson Ganahl, Martin Hauru, Markus Mallick, Shrestha Basu Vidal, Guifre |
author_facet | Lewis, Adam G. M. Beall, Jackson Ganahl, Martin Hauru, Markus Mallick, Shrestha Basu Vidal, Guifre |
author_sort | Lewis, Adam G. M. |
collection | PubMed |
description | We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs’ fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: Operating in float32 precision, a full 2,048-core pod of third-generation TPUs can multiply two matrices with linear size [Formula: see text] in about 2 min. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present 1) QR decomposition; 2) resolution of linear systems; and 3) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization. |
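The description's third example, matrix functions computed by polynomial iteration and demonstrated on the polar factorization, can be illustrated with the classical Newton–Schulz iteration, which needs only matrix multiplications and therefore maps naturally onto MXUs. The JAX sketch below is a minimal single-device illustration under that assumption, not necessarily the specific polynomial iteration used in the paper; the iteration count and the Frobenius-norm rescaling are illustrative choices.

```python
import jax
import jax.numpy as jnp

def polar_orthogonal_factor(a, num_iters=40):
    """Approximate the orthogonal/unitary factor U of the polar decomposition
    A = U P using the Newton-Schulz polynomial iteration
        X <- 1.5 X - 0.5 X (X^T X),
    which involves only matrix multiplications. Rescaling by the Frobenius
    norm places the singular values inside the iteration's convergence region;
    badly conditioned inputs may need more iterations."""
    x = a / jnp.linalg.norm(a)  # Frobenius norm for a 2D array

    def step(x, _):
        return 1.5 * x - 0.5 * x @ (x.T @ x), None

    x, _ = jax.lax.scan(step, x, xs=None, length=num_iters)
    return x

# Usage: U^T U should be close to the identity, and P = U^T A is the
# symmetric positive-semidefinite factor of A = U P.
a = jax.random.normal(jax.random.PRNGKey(1), (512, 512), dtype=jnp.float32)
u = polar_orthogonal_factor(a)
p = u.T @ a
```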
format | Online Article Text |
id | pubmed-9388123 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-9388123 2023-02-08 Large-scale distributed linear algebra with tensor processing units Lewis, Adam G. M. Beall, Jackson Ganahl, Martin Hauru, Markus Mallick, Shrestha Basu Vidal, Guifre Proc Natl Acad Sci U S A Physical Sciences We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs’ fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: Operating in float32 precision, a full 2,048-core pod of third-generation TPUs can multiply two matrices with linear size [Formula: see text] in about 2 min. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present 1) QR decomposition; 2) resolution of linear systems; and 3) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization. National Academy of Sciences 2022-08-08 2022-08-16 /pmc/articles/PMC9388123/ /pubmed/35939669 http://dx.doi.org/10.1073/pnas.2122762119 Text en Copyright © 2022 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle | Physical Sciences Lewis, Adam G. M. Beall, Jackson Ganahl, Martin Hauru, Markus Mallick, Shrestha Basu Vidal, Guifre Large-scale distributed linear algebra with tensor processing units |
title | Large-scale distributed linear algebra with tensor processing units |
title_full | Large-scale distributed linear algebra with tensor processing units |
title_fullStr | Large-scale distributed linear algebra with tensor processing units |
title_full_unstemmed | Large-scale distributed linear algebra with tensor processing units |
title_short | Large-scale distributed linear algebra with tensor processing units |
title_sort | large-scale distributed linear algebra with tensor processing units |
topic | Physical Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9388123/ https://www.ncbi.nlm.nih.gov/pubmed/35939669 http://dx.doi.org/10.1073/pnas.2122762119 |
work_keys_str_mv | AT lewisadamgm largescaledistributedlinearalgebrawithtensorprocessingunits AT bealljackson largescaledistributedlinearalgebrawithtensorprocessingunits AT ganahlmartin largescaledistributedlinearalgebrawithtensorprocessingunits AT haurumarkus largescaledistributedlinearalgebrawithtensorprocessingunits AT mallickshresthabasu largescaledistributedlinearalgebrawithtensorprocessingunits AT vidalguifre largescaledistributedlinearalgebrawithtensorprocessingunits |