Research Article

Inastemp: A Novel Intrinsics-as-Template Library for Portable SIMD-Vectorization

Figure 4

Gigaflops per second (GFlop/s) achieved when computing a general square matrix-matrix product, averaged over three executions. Matrix dimension is in Double and in Float. () These executions use a simpler blocking scheme that gives better performance for the respective configurations (Xl-P8-OP, Figure 4(f)).
(a) Gcc-I3-PC
(b) Clang-I3-PC
(c) Gcc-IX-HPC
(d) Intel-IX-HPC
(e) Gcc-P8-OP
(f) Xl-P8-OP
(g) Gcc-KNL
(h) Intel-KNL
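The panels report throughput in GFlop/s, averaged over three executions. The following is a minimal sketch of how such a number can be obtained, under stated assumptions. It assumes the usual operation count of 2n^3 for a square n × n product (n^3 multiplications plus n^3 additions). It also assumes that the wall-clock time, rather than the rate, is averaged over the runs. The names `gemmFlops`, `toGflops`, `blockedGemm` and `averageGflops` are hypothetical. The kernel is plain scalar C++ with a simple tiled loop nest. It is not Inastemp's vectorized GEMM, and it is not the blocking scheme used for the figure.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Operation count of a square n x n product C += A * B:
// n^3 multiplications plus n^3 additions (assumed convention).
double gemmFlops(std::size_t n) {
    const double dn = static_cast<double>(n);
    return 2.0 * dn * dn * dn;
}

// Convert an operation count and elapsed seconds to gigaflops per second.
double toGflops(double flops, double seconds) {
    return flops / seconds / 1e9;
}

// Hypothetical scalar blocked kernel (row-major storage, C += A * B);
// bs is the tile size applied to the i, k and j loops.
template <class RealType>
void blockedGemm(const std::vector<RealType>& A, const std::vector<RealType>& B,
                 std::vector<RealType>& C, std::size_t n, std::size_t bs) {
    for (std::size_t ib = 0; ib < n; ib += bs) {
        const std::size_t iEnd = std::min(ib + bs, n);
        for (std::size_t kb = 0; kb < n; kb += bs) {
            const std::size_t kEnd = std::min(kb + bs, n);
            for (std::size_t jb = 0; jb < n; jb += bs) {
                const std::size_t jEnd = std::min(jb + bs, n);
                for (std::size_t i = ib; i < iEnd; ++i) {
                    for (std::size_t k = kb; k < kEnd; ++k) {
                        const RealType a = A[i * n + k];
                        for (std::size_t j = jb; j < jEnd; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}

// Run the kernel `runs` times on a freshly zeroed C, average the
// wall-clock time, and report GFlop/s for that averaged time.
template <class RealType>
double averageGflops(std::size_t n, std::size_t bs, int runs = 3) {
    assert(runs > 0);
    std::vector<RealType> A(n * n, RealType(1));
    std::vector<RealType> B(n * n, RealType(2));
    double totalSeconds = 0.0;
    for (int r = 0; r < runs; ++r) {
        std::vector<RealType> C(n * n, RealType(0));  // allocated outside the timed region
        const auto start = std::chrono::steady_clock::now();
        blockedGemm(A, B, C, n, bs);
        const auto stop = std::chrono::steady_clock::now();
        // Read the result so the compiler cannot discard the kernel.
        volatile RealType sink = C.front() + C.back();
        (void)sink;
        totalSeconds += std::chrono::duration<double>(stop - start).count();
    }
    return toGflops(gemmFlops(n), totalSeconds / runs);
}
```

For example, `averageGflops<double>(1024, 64)` and `averageGflops<float>(1024, 64)` would give one Double and one Float data point for a given (hypothetical) dimension and tile size. Averaging the times and then converting is one reasonable reading of "averaged over three executions". Averaging the three GFlop/s values instead would give a slightly different number when the runs vary.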