Research Article

Multi-GPU Support on Single Node Using Directive-Based Programming Model

Algorithm 6

The worker algorithm for multi-GPU programming in OpenACC.
(1)     function WORKER_ROUTINE
(2)        Create the context for the associated GPU
(3)        pthread_mutex_lock(⋯)
(4)        context_created++;
(5)        while    do
(6)            pthread_cond_wait(⋯)  ⊳wait until all threads created their contexts
(7)        end while
(8)        pthread_mutex_unlock(⋯)
(9)        if     then
(10)          pthread_cond_broadcast(⋯)
(11)       end if
(12)      Enable peer access among all devices
(13)      while  (1)  do
(14)          pthread_mutex_lock(&cur_thread→queue_lock);
(15)          while  cur_thread→queue_size == 0  do
(16)              
(17)              if    then
(18)                  
(19)                  Synchronize the GPU context  ⊳the context is blocked until the device has completed all preceding requested tasks
(20)                 pthread_exit(NULL)
(21)              end if
(22)         end while
(23)         cur_task = cur_thread→queue_head;  ⊳fetch the task from the queue head
(24)         cur_thread→queue_size−−;
(25)         if  cur_thread→queue_size == 0  then
(26)             cur_thread→queue_head = NULL;
(27)             cur_thread→queue_tail = NULL;
(28)         else
(29)             cur_thread→queue_head = cur_task→next;
(30)         end if
(31)         pthread_mutex_unlock(&cur_thread→queue_lock);
(32)         cur_task→routine((void *)cur_task→args);    ⊳execute the task
(33)     end while
(34) end function
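
The worker loop of Algorithm 6 can be sketched in plain C with POSIX threads. This is a minimal illustration under stated assumptions, not the authors' implementation: the GPU-specific steps (context creation, peer access, context synchronization) are reduced to comments, and the names NUM_WORKERS, ctx_lock, ctx_ready, queue_ready, destroyed, enqueue_task, and run_demo are hypothetical. The sketch also checks the shutdown flag before waiting rather than after, so that a shutdown request issued while the worker is busy is not missed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 2 /* hypothetical; stands in for the number of GPUs */

typedef struct task {
    void (*routine)(void *);
    void *args;
    struct task *next;
} task_t;

typedef struct {
    pthread_t tid;
    pthread_mutex_t queue_lock;
    pthread_cond_t queue_ready; /* hypothetical name */
    task_t *queue_head, *queue_tail;
    int queue_size;
    int destroyed; /* hypothetical shutdown flag */
} worker_t;

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ctx_ready = PTHREAD_COND_INITIALIZER;
static int context_created = 0;

static void *worker_routine(void *arg) {
    worker_t *cur_thread = arg;
    /* A real worker would create its GPU context here. */
    pthread_mutex_lock(&ctx_lock);
    context_created++;
    while (context_created < NUM_WORKERS) /* wait for all workers */
        pthread_cond_wait(&ctx_ready, &ctx_lock);
    pthread_mutex_unlock(&ctx_lock);
    pthread_cond_broadcast(&ctx_ready); /* the counter is now full: wake waiters */
    /* A real worker would enable peer access among devices here. */
    for (;;) {
        pthread_mutex_lock(&cur_thread->queue_lock);
        while (cur_thread->queue_size == 0) {
            if (cur_thread->destroyed) {
                pthread_mutex_unlock(&cur_thread->queue_lock);
                /* A real worker would synchronize its GPU context here. */
                return NULL;
            }
            pthread_cond_wait(&cur_thread->queue_ready, &cur_thread->queue_lock);
        }
        task_t *cur_task = cur_thread->queue_head; /* fetch from the head */
        cur_thread->queue_size--;
        if (cur_thread->queue_size == 0) {
            cur_thread->queue_head = NULL;
            cur_thread->queue_tail = NULL;
        } else {
            cur_thread->queue_head = cur_task->next;
        }
        pthread_mutex_unlock(&cur_thread->queue_lock);
        cur_task->routine(cur_task->args); /* execute outside the lock */
        free(cur_task);
    }
}

/* Hypothetical producer side: append a task and wake the worker. */
static void enqueue_task(worker_t *w, void (*routine)(void *), void *args) {
    task_t *t = malloc(sizeof *t);
    if (t == NULL) abort();
    t->routine = routine; t->args = args; t->next = NULL;
    pthread_mutex_lock(&w->queue_lock);
    if (w->queue_tail) w->queue_tail->next = t; else w->queue_head = t;
    w->queue_tail = t;
    w->queue_size++;
    pthread_cond_signal(&w->queue_ready);
    pthread_mutex_unlock(&w->queue_lock);
}

static void add_one(void *p) { (*(int *)p)++; }

/* Runs three tasks on each worker and returns the sum of the counters. */
int run_demo(void) {
    worker_t w[NUM_WORKERS];
    int counters[NUM_WORKERS] = {0}, total = 0;
    context_created = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_mutex_init(&w[i].queue_lock, NULL);
        pthread_cond_init(&w[i].queue_ready, NULL);
        w[i].queue_head = w[i].queue_tail = NULL;
        w[i].queue_size = 0; w[i].destroyed = 0;
        int rc = pthread_create(&w[i].tid, NULL, worker_routine, &w[i]);
        assert(rc == 0); (void)rc;
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        for (int k = 0; k < 3; k++)
            enqueue_task(&w[i], add_one, &counters[i]);
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_mutex_lock(&w[i].queue_lock);
        w[i].destroyed = 1; /* request shutdown once the queue drains */
        pthread_cond_signal(&w[i].queue_ready);
        pthread_mutex_unlock(&w[i].queue_lock);
        pthread_join(w[i].tid, NULL);
        pthread_cond_destroy(&w[i].queue_ready);
        pthread_mutex_destroy(&w[i].queue_lock);
        total += counters[i];
    }
    return total;
}
```

Each worker owns its queue and its lock, so a task submitted to one device never contends with another device's queue. Running the task after releasing queue_lock lets the producer keep enqueuing work while the worker is busy. The sketch needs to be compiled with -pthread.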