Scientific Programming
Volume 2015, Article ID 316012, 14 pages
Research Article

A Performance Study of a Dual Xeon-Phi Cluster for the Forward Modelling of Gravitational Fields

1ABACUS-CINVESTAV-IPN, Apartado Postal 14-740, 07000 México City, DF, Mexico
2Centro de Desarrollo Aeroespacial del Instituto Politécnico Nacional, Belisario Domínguez 22, 06010 México City, DF, Mexico
3Escuela Superior de Física y Matemáticas, Av. Instituto Politécnico Nacional Edificio 9, Unidad Profesional Adolfo López Mateos, 07738 México City, DF, Mexico
4Instituto Mexicano del Petróleo, Eje Central Lázaro Cardenas No. 152, 07730 México City, DF, Mexico
5Department of Industrial Engineering, Campus Celaya-Salvatierra, University of Guanajuato, Mutualismo 303 Colonia Suiza, 38060 Celaya, Gto, Mexico

Received 31 December 2014; Revised 27 May 2015; Accepted 8 June 2015

Academic Editor: Enrique S. Quintana-Ortí

Copyright © 2015 Maricela Arroyo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


With at least 60 processing cores, the Xeon-Phi coprocessor is a true many-core architecture: it offers an inter-core interconnection speed of 240 GB/s, two levels of cache memory, a theoretical peak performance of 1.01 Tflops, and programming flexibility, all of which make the Xeon-Phi an excellent coprocessor for parallelizing applications that seek to reduce computational times. The objective of this work is to migrate a geophysical application that directly computes the gravimetric tensor components and their derivatives, and thereby to study the performance of one and two Xeon-Phi coprocessors, both integrated on a single node and distributed across several nodes. This application allows us to analyze the design factors that drive good performance and to compare the results against a conventional multicore CPU. We present an efficient strategy based on nested parallelism with OpenMP, in which the outer level acts as a controller of the interconnected Xeon-Phi coprocessors while the inner level parallelizes the loops. MPI is then used to reduce the information among the nodes of the cluster.