Application codes in a variety of areas are being updated for performance on the latest architectures. In this paper we examine an application from magnetic fusion and study how to accelerate its performance, with particular emphasis on methods that apply to manycore/multicore and future architectural designs. We take an important magnetic fusion particle code that already includes several levels of parallelism, including hybrid MPI combined with OpenMP. We study how to incorporate new advanced hybrid models that extend the applicability of OpenMP tasks and exploit multi-threaded MPI support to overlap communication and computation. Experiments carried out on Cray XT4 and XT5 machines show a speed-up of up to 35% for the investigated GTS particle shifter kernel, demonstrating the benefits and applicability of this approach.