Abstract

Over the last decade processor speed has increased dramatically, whereas the speed of the memory subsystem improved at a modest rate. Due to the increase in the cache miss latency (in terms of the processor cycle), processors stall on cache misses for a significant portion of its execution time. Multithreaded processors has been proposed in the literature to reduce the processor stall time due to cache misses. Although multithreading improves processor utilization, it may also increase cache miss rates, because in a multithreaded processor multiple threads share the same cache, which effectively reduces the cache size available to each individual thread. Increased processor utilization and the increase in the cache miss rate demands higher memory bandwidth. A novel compiler optimization method has been presented in this paper that improves data locality for each of the threads and enhances data sharing among the threads. The method is based on loop transformation theory and optimizes both spatial and temporal data locality. The created threads exhibit high level of intra‐thread and inter‐thread data locality which effectively reduces both the data cache miss rates and the total execution time of numerically intensive computation running on a multithreaded processor.