Scientific Programming

Research Article

Multi-GPU Support on Single Node Using Directive-Based Programming Model

The worker algorithm for multi-GPU programming in OpenACC.

function WORKER_ROUTINE
    Create the context for the associated GPU
    pthread_mutex_lock(⋯)
    context_created++;
    while not all threads have created their contexts do
        pthread_cond_wait(⋯)    ⊳wait until all threads have created their contexts
    end while
    pthread_mutex_unlock(⋯)
    if all threads have created their contexts then
        pthread_cond_broadcast(⋯)
    end if
    Enable peer access among all devices
    while (1) do
        pthread_mutex_lock(&cur_thread->queue_lock);
        while cur_thread->queue_size == 0 do
            pthread_cond_wait(⋯)
            if cur_thread has been asked to terminate then
                pthread_mutex_unlock(&cur_thread->queue_lock);
                Synchronize the GPU context    ⊳the context is blocked until the device has completed all preceding requested tasks
                pthread_exit(NULL)
            end if
        end while
        cur_task = cur_thread->queue_head;    ⊳fetch the task from the queue head
        cur_thread->queue_size--;
        if cur_thread->queue_size == 0 then
            cur_thread->queue_head = NULL;
            cur_thread->queue_tail = NULL;
        else
            cur_thread->queue_head = cur_task->next;
        end if
        pthread_mutex_unlock(&cur_thread->queue_lock);
        cur_task->routine((void *)cur_task->args);    ⊳execute the task
    end while
end function
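
The worker algorithm above can be approximated in plain C with POSIX threads. The following is a minimal sketch under stated assumptions, not the implementation used in this work. The GPU-specific steps (context creation, peer access, context synchronization) are left as placeholder comments so that the sketch runs without a device. The names `worker_t`, `task_t`, `queue_ready`, `destroyed`, `num_workers`, `ctx_lock`, `ctx_cond`, `worker_start`, `worker_stop`, `enqueue_task`, and `increment_task` are hypothetical and introduced only for illustration. Unlike the listing, the sketch tests the termination flag before waiting on the condition variable, so that a stop request issued while the queue is already empty is not missed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A queued task; `routine`, `args`, and `next` mirror the fields in the listing. */
typedef struct task {
    void (*routine)(void *);
    void *args;
    struct task *next;
} task_t;

/* Per-GPU worker state. `queue_ready` and `destroyed` are assumed names. */
typedef struct worker {
    pthread_t thread;
    pthread_mutex_t queue_lock;
    pthread_cond_t queue_ready;
    task_t *queue_head, *queue_tail;
    int queue_size;
    int destroyed;
} worker_t;

/* Start-up rendezvous shared by all workers (`num_workers` is an assumed name). */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ctx_cond = PTHREAD_COND_INITIALIZER;
static int context_created = 0;
static int num_workers = 0;   /* must be set before any worker is started */

void *worker_routine(void *arg)
{
    worker_t *cur_thread = arg;
    /* Placeholder: create the GPU context for this worker here. */
    pthread_mutex_lock(&ctx_lock);
    context_created++;
    if (context_created == num_workers)
        pthread_cond_broadcast(&ctx_cond);   /* last arrival wakes the others */
    while (context_created != num_workers)
        pthread_cond_wait(&ctx_cond, &ctx_lock);
    pthread_mutex_unlock(&ctx_lock);
    /* Placeholder: enable peer access among all devices here. */
    for (;;) {
        pthread_mutex_lock(&cur_thread->queue_lock);
        while (cur_thread->queue_size == 0) {
            /* Checked before waiting so an early stop request is not missed. */
            if (cur_thread->destroyed) {
                pthread_mutex_unlock(&cur_thread->queue_lock);
                /* Placeholder: synchronize the GPU context here. */
                return NULL;
            }
            pthread_cond_wait(&cur_thread->queue_ready, &cur_thread->queue_lock);
        }
        task_t *cur_task = cur_thread->queue_head;   /* fetch from the queue head */
        cur_thread->queue_size--;
        if (cur_thread->queue_size == 0) {
            cur_thread->queue_head = NULL;
            cur_thread->queue_tail = NULL;
        } else {
            cur_thread->queue_head = cur_task->next;
        }
        pthread_mutex_unlock(&cur_thread->queue_lock);
        cur_task->routine(cur_task->args);           /* execute outside the lock */
        free(cur_task);
    }
}

/* Append a task to the tail of a worker's queue and wake the worker. */
void enqueue_task(worker_t *w, void (*routine)(void *), void *args)
{
    task_t *t = malloc(sizeof *t);
    assert(t != NULL);
    t->routine = routine;
    t->args = args;
    t->next = NULL;
    pthread_mutex_lock(&w->queue_lock);
    if (w->queue_tail != NULL)
        w->queue_tail->next = t;
    else
        w->queue_head = t;
    w->queue_tail = t;
    w->queue_size++;
    pthread_cond_signal(&w->queue_ready);
    pthread_mutex_unlock(&w->queue_lock);
}

/* Initialize the queue and launch the worker thread. */
void worker_start(worker_t *w)
{
    pthread_mutex_init(&w->queue_lock, NULL);
    pthread_cond_init(&w->queue_ready, NULL);
    w->queue_head = w->queue_tail = NULL;
    w->queue_size = 0;
    w->destroyed = 0;
    int rc = pthread_create(&w->thread, NULL, worker_routine, w);
    assert(rc == 0);
    (void)rc;
}

/* Request termination; the worker drains its queue first, then exits. */
void worker_stop(worker_t *w)
{
    pthread_mutex_lock(&w->queue_lock);
    w->destroyed = 1;
    pthread_cond_signal(&w->queue_ready);
    pthread_mutex_unlock(&w->queue_lock);
    pthread_join(w->thread, NULL);
}

/* Example task: increment the int that `args` points to. */
void increment_task(void *args)
{
    ++*(int *)args;
}
```

In this sketch a host program would set `num_workers`, call `worker_start` once per device, submit work with `enqueue_task`, and finally call `worker_stop` on each worker. Because the termination flag is examined only when the queue is empty, all tasks enqueued before `worker_stop` are executed before the thread exits. Each worker consumes its own queue in FIFO order, so tasks sent to the same worker never run concurrently with one another.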