Figure 3: Submatrix multiplication with 16 threads.