Scientific Programming
Volume 20 (2012), Issue 2, Pages 89-114

Manycore Performance-Portability: Kokkos Multidimensional Array Library

H. Carter Edwards,1 Daniel Sunderland,2 Vicki Porter,2 Chris Amsler,3 and Sam Mish4

1Computing Research Center, Sandia National Laboratories, Livermore, CA, USA
2Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA
3Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS, USA
4Department of Mathematics, California State University, Los Angeles, CA, USA

Copyright © 2012 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Large, complex scientific and engineering application codes have a significant investment in computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data-parallel kernels, and (3) multidimensional arrays. Kernel execution performance depends strongly on data access patterns, especially on NVIDIA® devices. The optimal data access pattern can differ from one manycore device to another, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, August 2011].
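The separation of data access patterns from kernel code can be illustrated with a minimal C++ sketch. This is not the Kokkos Array API: `RowMajor`, `ColMajor`, `Array2D`, `scale_row`, and `scaled_sum` are hypothetical names introduced only for illustration, and the serial loop over `i` stands in for a parallel dispatch. The sketch assumes that the index-to-memory mapping is supplied as a compile-time template parameter, so the kernel body is written once and the mapping is chosen when the kernel is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (NOT the Kokkos Array API). Each maps a
// logical 2-D index (i, j) to an offset in a flat buffer.
struct RowMajor {  // j is contiguous: one work item walks a contiguous row
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};
struct ColMajor {  // i is contiguous: adjacent work items touch adjacent memory
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return j * n0 + i;
    }
};

// Hypothetical 2-D array whose memory mapping is fixed at compile time.
template <class Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }
    const std::vector<double>& raw() const { return data_; }

private:
    std::size_t n0_, n1_;
    std::vector<double> data_;
};

// Kernel body for a single work item i, written only against the array API.
template <class Layout>
void scale_row(Array2D<Layout>& a, std::size_t i, double factor) {
    for (std::size_t j = 0; j < a.extent1(); ++j) a(i, j) *= factor;
}

// Fill, scale, and sum an n0 x n1 array using the chosen layout.
template <class Layout>
double scaled_sum(std::size_t n0, std::size_t n1) {
    Array2D<Layout> a(n0, n1);
    for (std::size_t i = 0; i < n0; ++i)
        for (std::size_t j = 0; j < n1; ++j)
            a(i, j) = static_cast<double>(i * n1 + j);

    // Serial stand-in for a data-parallel dispatch over i.
    for (std::size_t i = 0; i < n0; ++i) scale_row(a, i, 2.0);

    double sum = 0.0;
    for (std::size_t i = 0; i < n0; ++i)
        for (std::size_t j = 0; j < n1; ++j) sum += a(i, j);
    return sum;
}
```

Calling `scaled_sum<RowMajor>(2, 3)` and `scaled_sum<ColMajor>(2, 3)` both return 30.0. The kernel `scale_row` is identical in the two cases, and only the placement of elements in memory differs, which is the property a performance-portable kernel relies on.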