Research Article
Multi-GPU Support on Single Node Using Directive-Based Programming Model
Algorithm 6
The worker algorithm for multi-GPU programming in OpenACC.
(1)  function WORKER_ROUTINE
(2)    Create the context for the associated GPU
(3)    pthread_mutex_lock(⋯)
(4)    context_created++;
(5)    while ⋯ do
(6)      pthread_cond_wait(⋯)  ⊳ wait until all threads have created their contexts
(7)    end while
(8)    pthread_mutex_unlock(⋯)
(9)    if ⋯ then
(10)     pthread_cond_broadcast(⋯)
(11)   end if
(12)   Enable peer access among all devices
(13)   while (1) do
(14)     pthread_mutex_lock(&cur_thread->queue_lock);
(15)     while ⋯ do
(16)       ⋯
(17)       if ⋯ then
(18)         ⋯
(19)         Synchronize the GPU context  ⊳ the context is blocked until the device has completed all preceding requested tasks
(20)         pthread_exit(NULL)
(21)       end if
(22)     end while
(23)     cur_task = cur_thread->queue_head;  ⊳ fetch the task from the queue head
(24)     cur_thread->queue_size−−;
(25)     if cur_thread->queue_size == 0 then
(26)       cur_thread->queue_head = NULL;
(27)       cur_thread->queue_tail = NULL;
(28)     else
(29)       cur_thread->queue_head = cur_task->next;
(30)     end if
(31)     pthread_mutex_unlock(&cur_thread->queue_lock);
(32)     cur_task->routine((void *)cur_task->args);  ⊳ execute the task
(33)   end while
(34) end function
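The worker loop of Algorithm 6 can be illustrated with a minimal C sketch that uses POSIX threads only. It is not the authors' implementation. The GPU-specific steps (context creation, peer access, context synchronization) are replaced by comments, and the names queue_cond, destroy, ctx_lock, ctx_cond, num_workers, enqueue, count_task, and run_demo are assumptions introduced for illustration. The steps that the listing leaves unspecified, namely the wait conditions and the shutdown test, are filled in with one plausible choice. The sketch also issues the start-up broadcast while the mutex is still held, which differs from the order shown in the listing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct task {
    void (*routine)(void *);          /* work to run on this worker */
    void *args;
    struct task *next;
} task_t;

typedef struct {
    pthread_t tid;
    pthread_mutex_t queue_lock;
    pthread_cond_t queue_cond;        /* assumed name: "queue state changed" */
    task_t *queue_head, *queue_tail;
    int queue_size;
    int destroy;                      /* assumed shutdown flag */
} worker_t;

/* Start-up state shared by all workers (names are assumptions). */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ctx_cond = PTHREAD_COND_INITIALIZER;
static int context_created = 0;
static int num_workers = 0;

static void *worker_routine(void *arg) {
    worker_t *cur_thread = arg;
    /* A real runtime would create the GPU context here. */
    pthread_mutex_lock(&ctx_lock);
    context_created++;
    if (context_created == num_workers)
        pthread_cond_broadcast(&ctx_cond);   /* last worker wakes the rest */
    while (context_created < num_workers)
        pthread_cond_wait(&ctx_cond, &ctx_lock);
    pthread_mutex_unlock(&ctx_lock);
    /* A real runtime would enable peer access among devices here. */
    for (;;) {
        pthread_mutex_lock(&cur_thread->queue_lock);
        while (cur_thread->queue_size == 0) {
            if (cur_thread->destroy) {       /* queue drained, asked to stop */
                pthread_mutex_unlock(&cur_thread->queue_lock);
                /* A real runtime would synchronize the GPU context here. */
                return NULL;
            }
            pthread_cond_wait(&cur_thread->queue_cond, &cur_thread->queue_lock);
        }
        task_t *cur_task = cur_thread->queue_head;  /* fetch from the head */
        cur_thread->queue_size--;
        if (cur_thread->queue_size == 0) {
            cur_thread->queue_head = NULL;
            cur_thread->queue_tail = NULL;
        } else {
            cur_thread->queue_head = cur_task->next;
        }
        pthread_mutex_unlock(&cur_thread->queue_lock);
        cur_task->routine(cur_task->args);   /* execute outside the lock */
        free(cur_task);
    }
}

/* Append a task to one worker's queue and wake that worker. */
static void enqueue(worker_t *w, void (*routine)(void *), void *args) {
    task_t *t = malloc(sizeof *t);
    assert(t != NULL);
    t->routine = routine;
    t->args = args;
    t->next = NULL;
    pthread_mutex_lock(&w->queue_lock);
    if (w->queue_tail) w->queue_tail->next = t; else w->queue_head = t;
    w->queue_tail = t;
    w->queue_size++;
    pthread_cond_signal(&w->queue_cond);
    pthread_mutex_unlock(&w->queue_lock);
}

static void count_task(void *args) { ++*(int *)args; }

/* Start the workers, give each some tasks, stop them, return tasks executed. */
int run_demo(int workers, int tasks_per_worker) {
    worker_t *pool = calloc((size_t)workers, sizeof *pool);
    int *done = calloc((size_t)workers, sizeof *done);
    assert(pool != NULL && done != NULL);
    context_created = 0;
    num_workers = workers;
    for (int i = 0; i < workers; i++) {
        pthread_mutex_init(&pool[i].queue_lock, NULL);
        pthread_cond_init(&pool[i].queue_cond, NULL);
        pthread_create(&pool[i].tid, NULL, worker_routine, &pool[i]);
    }
    for (int i = 0; i < workers; i++)
        for (int j = 0; j < tasks_per_worker; j++)
            enqueue(&pool[i], count_task, &done[i]);
    int total = 0;
    for (int i = 0; i < workers; i++) {
        pthread_mutex_lock(&pool[i].queue_lock);
        pool[i].destroy = 1;                 /* request shutdown */
        pthread_cond_signal(&pool[i].queue_cond);
        pthread_mutex_unlock(&pool[i].queue_lock);
        pthread_join(pool[i].tid, NULL);
        total += done[i];
    }
    free(pool);
    free(done);
    return total;
}
```

Calling run_demo(2, 5) starts two workers, queues five counting tasks on each, and returns the number of tasks executed after both workers have drained their queues and exited. Each task runs after the queue lock is released, so the thread that enqueues work is never blocked for the duration of a task.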