Abstract
The aim of this study is to present an approach to the introduction into pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. The modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is the essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel modelcentered framework and develops the relevant learning objects. It allows implementing an educational platform of constructivist learning process, thus enabling learners’ experimentation with the provided programming models, obtaining learners’ competences of the modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and message passing interface (MPI) and OpenMP parallelization tools have been chosen for implementation.
1. Background and Introduction
Teaching of scientific and parallel computing, and advanced programming are under permanent attention of scientists and educators. Different approaches, models, and solutions for teaching, assessments, and evaluation are proposed. The constructivist model for advanced programming education is one out of the presented approaches [1, 2]. In the model, a learner constructs the relevant knowledge, experiments with the provided environment, observes the results, and draws conclusions. The constructivist approach incorporates the modelcentered learning as well as comparative programming teaching methods. Experimenting with the provided model, “turning” the model and observing it from different sides, comparing different solutions, and analysing and drawing conclusions, the learner improves the relevant knowledge and competences.
Modern technologies widely involve parallel computing, and plenty of scientific and industrial applications use parallel programming techniques. Teaching of parallel computing is one of the most important and challenging topics in the scientific computing and advanced programming education. In this research, we present a methodology for the introduction to scientific and parallel computing. This methodology is based on the constructivist technology and uses a modelcentered approach and learning by comparison.
The paper proposes a set of programming models, based on stochastic simulations of the provided model of a multiphase queueing system. The multiphase queueing system is selected due to the simplicity of the primary definitions and wide possibilities for parallelization. We implement different parallelization techniques for programming the model. That allows us to carry out a series of experiments with different programming models, compare the results, and investigate the effectiveness of parallelization and different parallelization methods. Such parallelization methods include shared memory, distributed memory, and hybrid parallelization and are implemented by MPI and OpenMP APIs. Figure 1 presents the modelcentered approach to the introduction into scientific computing and parallel programming.
The paper continues the earlier authors’ research, presented in [3]. A possible application scope of this research could be the second level course in scientific computing and programming with an emphasis on pipeline and parallel computing programming and modelling.
Scientific Computing in Science and Engineering Education. Scientific computing plays an important role in science and engineering education. World leading universities and organizations pay an increasing attention to the curriculum and educational methods. Allen et al. report on a new graduate course in scientific computing that was taught at the Louisiana State University [4]. “The course was designed to provide students with a broad and practical introduction to scientific computing which would provide them with the basic skills and experience to very quickly get involved in research projects involving modern cyber infrastructure and complex real world scientific problems.” Shadwick stresses the importance of more comprehensive teaching of scientific computing [5]: “…computational methods should ideally be viewed as a mathematical tool as important as calculus, and receive similar weight in curriculum.”
The Scope of Scientific Computing Education. One of the tasks in the scientific computing education is to provide a general understanding of solving scientific problems. Heath writes, “…try to convey a general understanding of the techniques available for solving problems in each major category, including proper problem formulation and interpretation of results…” [6]. He offers a wide curriculum to be studied including a system of linear equations, eigenvalue problems, nonlinear equation, optimization, interpolation, numerical integration and differentiation, partial differential equations, fast Fourier transform, random numbers, and stochastic simulation. All these topics require a large amount of computations and could require parallelization solutions to be solved. Karniadakis and Kirby II define, “scientific computing is the heart of simulation science” [7]. “With the rapid and simultaneous advances in software and computer technology, especially commodity computing, the socalled supercomputing, every scientist and engineer will have on her desk an advanced simulation kit of tools consisting of a software library and multiprocessor computers that will make analysis, product development, and design more optimal and costeffective.” The authors suggest the integration of teaching of MPI tools to the educational process. A large number of MPI implementations are currently available, each of which emphasizes different aspects of highperformance computing or is intended to solve a specific research problem. Other implementations deal with a grid, distributed, or cluster computing, solving more general research problems, but such applications are beyond the scope of this paper. Heroux et al. describe the scientific computing as “…a broad discipline focused on using computers as tools for scientific discovery” [8]. The authors claim, “The impact of parallel processing on scientific computing varies greatly across disciplines, but we can strongly argue that it plays a vital role in most problem domains and has become essential in many.”
Teaching of Parallel Computing. NSF/IEEETCPP Curriculum Initiative on parallel and distributed computing (PDC), Core Topics for Undergraduates, contains a comprehensive research on the curriculum for parallel computing education [9]. The authors suggest including teaching of PDC: “In addition to enabling undergraduates to understand the fundamentals of ‘von Neumann computing,’ we must now prepare them for the very dynamic world of parallel and distributed computing.” Zarza et al. report, “highperformance computing (HPC) has turned into an important tool for modern societies, becoming the engine of an increasing number of applications and services. Along these years, the use of powerful computers has become widespread throughout many engineering disciplines. As a result, the study of parallel computer architectures is now one of the essential aspects of the academic formation of students in Computational Science, particularly in postgraduate programs” [10]. The authors notice significant gaps between theoretical concepts and practical experience: “In particular, postgraduate HPC courses often present significant gaps between theoretical concepts and practical experience.” Wilkinson et al. offer, “… an approach for teaching parallel and distributed computing (PDC) at the undergraduate level using computational patterns. The goal is to promote higherlevel structured design for parallel programming and make parallel programming easier and more scalable” [11]. Iparraguirre et al. share their experience in a practical course of PDC for Argentina engineering students [12]. One of the suggestions is that “Shared memory practices are easier to understand and should be taught first.”
Constructivist and ModelCentered Learning. R. N. Caine and G. Caine in their research [13] propose the main principles of constructivist learning. One of the most important principles is as follows: “The brain processes parts and wholes simultaneously.” Under this approach, a wellorganized learning process should provide details as well as underlying ideas. In his research [1], BenAri develops a constructivist methodology for computerscience education. Wulf [2] reviews “the application of constructivist pedagogical approaches to teaching computer programming in high school and undergraduate courses.” The modelcentered approach could enhance constructivist learning proposing the tool for studying and experimentation. Using modelcentered learning, we first present the goal of the research after providing a model for simulation experiments. That allows us to analyze the results and to draw the relevant conclusions. Gibbons introduced modelcentered instruction in 2001 [14]. The main principles are as follows:(i)learner’s experience is obtained by interacting with models;(ii)learner solves scientific and engineering problems, using the simulation on models;(iii)problems are presented in a constructed sequence;(iv)specific instructional goals are specified;(v)all the necessary information within a solution environment is provided.
Xue et al. [15] introduce “teaching reform ideas in the ‘scientific computing’ education by means of modeling and simulation.” They suggest, “…the use of the modeling and simulation to deal with the actual problem of programming, simulating, data analyzing….”
2. Parallel Computing for Multiphase Queueing Systems
2.1. Multiphase Queueing Systems and Stochastic Simulations
A general multiphase queueing system consists of a number of servicing phases that provide service for arriving customers. The arriving customers move through the phases stepbystep from entrance to exit. If the servicing phase is busy with servicing the previous customer, the current customer waits in the queue in front of the servicing phase. The extended Kendall classification of queueing systems uses 6 symbols: , where is the distribution of intervals between arrivals, is the distribution of service duration, is the number of servers, is the queueing discipline (omitted for FIFO, first in first out), is the system capacity (omitted for unlimited queues), and is the number of possible customers (omitted for open systems) [16, 17]. For example, M/M/1 queue represents the model having a single server, where arrivals are determined by a Poisson process, service times have an exponential distribution, and population of customers is unlimited. The interarrival and servicing times both are independent random variables. We are interested in the sojourn time of the customer in the system and its distribution. The general schema of the multiphase queueing system is presented in Figure 2.
We consider both interarrival and servicing times as exponentially distributed random variables. Depending on the parameters of the exponential distributions, we distinguish different traffic conditions for the observed queueing system. Those include ordinary traffic, critical traffic, or heavy traffic conditions. We are interested to investigate a distribution of the sojourn time for these different cases [18–20] and we will use MonteCarlo simulations for collecting the relevant data.
2.2. Recurrent Equation for Calculating the Sojourn Time
In order to design a modelling algorithm of the previously described queueing system, some additional mathematical constructions should be introduced. Our aim is to calculate and investigate the distribution of the sojourn time of the number customer in the multiphase queueing system of phases. We can prove the next recurrent equation [19]: let us denote by the time of arrival of the th customer; let us denote by the service time of the th customer at the th phase; ; ; . The following recurrence equation for calculation of the sojourn time of the th customer at the th phase is valid:
Proposition 1. The recurrence equation to calculate the sojourn time of a customer in a multiphase queuing system.
Proof. It is true that if the time , the waiting time in the th phase of the th customer is 0. In the case , the waiting time in the th phase of the th customer is and . Taking into account the above two cases, we finally have the proposition results.
2.3. Theoretical Background: Parallel Computing
In this research, we emphasize a multiple instruction, multiple data (MIMD) parallel architecture and presume the high performance computer cluster (HPC) as a target platform for calculations. Such a platform allows us to study different parallelization techniques and implement shared memory, distributed memory, and hybrid memory solutions. The main goal of parallelization is to reduce the program execution time by using the multiprocessor cluster architecture. It also enables us to increase the number of MonteCarlo simulation trials during the statistical simulation of the queueing system and to achieve more accurate results in the experimental construction of the sojourn time distribution. To implement parallelization, we use OpenMP tools for the shared memory model, MPI tools for the distributed memory model, and the hybrid technique for the hybrid memory model.
For the shared memory decomposition, we use tasks and the dynamic decomposition technique in the case of the pipeline (transversal) decomposition and we use the loop decomposition technique in the case of threads (longitudinal) decomposition. For the distributed memory decomposition, we use the standard message parsing MPI tools. For the hybrid decomposition, we use the shared memory (loop decomposition) for the longitudinal decomposition and MPI for the pipeline decomposition.
2.4. Parallelization for Multiphase Queueing Systems
Statistical sampling for modelling the sojourn time distribution (Figure 3) presents the general schema of the imitational experiment on the queueing system.
The programming model of the multiphase queueing system is based on the recurrent equation, presented in one of the upper sections. The MonteCarlo simulation method is used to obtain the statistical sampling for modelling the sojourn time distribution on the exit of the system.
To introduce the topics as decomposition and granularity we present a space model of the simulation process of the queueing system. First, we start from a onedimensional model. The onedimensional model allows introducing a sequential programming model with no parallelism and could serve as a basic model for further improvements. Afterwards, we proceed with a twodimensional model. Such a model allows introducing the programming models for the longitudinal decomposition. As the last step, we introduce a threedimensional space model. Such a model allows constructing the programming models for the transversal and hybrid decompositions.
2.5. Stochastic Simulations and Longitudinal Decomposition
One of the solutions is to use the longitudinal decomposition (trials) and to parallelize the MonteCarlo trials. Thus we can map each or a group of trials depending on the preferred granularity and the total number of desired trials. The schema of the longitudinal decomposition is presented in Figure 4. A threedimensional model of the longitudinal decomposition is presented in Figure 5.
2.6. Pipelining and Transversal Decomposition
In another dimension (customers) the parallelization technique is not as straightforward as it was in the previous case of parallelization of the statistical trials dimension. There arises a difficulty as we have the pipelining structure of the algorithm. It is obvious, as the customer moves through the system from the previous to current servicing phase, and we need to have all the data from the previous stage for calculations in the current stage. We use the transversal decomposition and the number of customers in each stage depends on the preferred granularity and the total number of customers. Figure 6 presents the decomposition in the case of the customer’s dimension.
2.7. Shared Memory Implementation
The shared memory implementation is based on the OpenMP tools. The loop parallelization technique is used for the longitudinal decomposition. For the transversal decomposition, the OpenMP tasking model and dynamic scheduling are used.
2.8. Distributed Memory Implementation
The distributed memory implementation is based on MPI tools. In both cases, that is, longitudinal and transversal decompositions, the message parsing interface provides synchronization tools and there is no need for additional programming constructions.
2.9. Hybrid Models and HPC
Hybrid models provide a natural solution to computational platforms, based on highperformance computer clusters. It uses MPI tools to perform a decomposition and OpenMP tools for multithreading.
2.10. Dynamic and Static Scheduling
Pipelining requires dynamic scheduling, since there is a connection between various nodes in the pipeline. In the shared memory case we must take care of scheduling. If we use the OpenMP tasking model, the relevant approach could be twofold. First, it is possible to obtain a dynamic scheduling by monitoring a critical shared memory resource and using a task yielding construction. The other one is to use the dynamic task creation technique. In the case of MPI, synchronization is performed by the interface system, and then there is no need for additional programming constructions.
3. Sequential Programming Model
The flowchart of the sequential program model is presented in Figure 7. The algorithm uses the recurrent equation and cycles for modelling the queueing system phases, customers, and statistical trials.
The program model of the sequential approach uses the programming language C and GSL (GNU scientific library) and it is presented in Appendix A of this paper including the comments.
4. Programming Model for Distributed Memory Parallelization
4.1. Longitudinal Decomposition
The programming model for the distributed memory parallelization is based on MPI tools and it is optimal for the multicore/multimode computer architecture. All the processes receive a full copy of the programming code and the rooting is made by using the number of the process. The flowchart of the longitudinal decomposition is presented in Figure 8. The programming language C model with the comments is presented in Appendix B.
4.2. Transversal Decomposition
The flowchart for the transversal decomposition is presented in Figure 9. Processes are attached to the customer’s axis which is divided into chunks. We use a mutual message parsing technique and MPI is responsible for scheduling. Statistical trials are divided into the relevant chunks and provide the desired granularity. The programming language C model with the comments is presented in Appendix C.
5. Programming Model for Shared Memory Parallelization
5.1. Longitudinal Decomposition
In the case of the shared memory model, the OpenMP loop parallelization is the natural solution to the longitudinal decomposition. The scope of MC trials is divided into chunks and each of such chunks is attached to the relevant thread. The flowchart of the shared memory model is presented in Figure 10.
The programming model for the shared memory longitudinal decomposition uses the programming language C, GSL (GNU scientific library), and OpenMP libraries. The model and comments are presented in Appendix D of this paper.
5.2. Transversal Decomposition
One of the most comprehensive programming models is the model of the shared memory pipelining. We use the transversal decomposition to construct such a model. The model uses the OpenMP tasking technique and dynamic scheduling of tasks. The scheduler plays the central role in the model and it is responsible for creating new tasks and finishing the program, after completing all the tasks. Each task uses its own random generator which allows avoiding timeconsuming critical sections. The flowchart is presented in Figure 11. Programming model for the shared memory transversal decomposition uses C programming language, GSL (GNU scientific library), and OpenMP libraries. The model and comments are presented in Appendix E of this paper.
6. Programming Model of Hybrid Parallelization
The MPI transversal decomposition model could be transformed into a hybrid model by adding the OpenMP threads to the MC trials axis. The flowchart of the hybrid model is presented in Figure 12. The programming model with comments is presented in Appendix F.
7. Conclusions
7.1. Theoretical and Programming Models: The Basis of the ModelCentered Approach
The paper provides a number of programming models for the introduction to scientific and parallel computing. All these programming models, sequential, distributed memory, distributed memory pipelining, shared memory, shared memory pipelining, and the hybrid model, are based on statistical simulations of the theoretical model of the multiphase queueing system. After providing a theoretical background to the learner and explaining the main features of the theoretical model, we start experiments with programming models. The relevant problems could be provided to the learners. These could include the comparative investigation of the effectiveness of programming models, taking into account different computational platforms as well as different input parameters of the queueing system. For the advanced learner, the emphasis could be put on variation of the parameters of interarrival and servicing time exponential distributions, moving from the heavy traffic to the nonheavy traffic case, since that could fundamentally change the distribution of the sojourn time of the customer.
7.2. Introduction to Scientific Computing: Research Tasks and Research Methods
While investigating the theoretical model, studying the recurrent equation, and experimenting with the input parameters of the queueing system, we provide an introduction to the scientific research tasks and methods. It includes studying the distribution of the sojourn time of the customer, varying the parameters of interarrival and servicing time exponential distributions, comparing the results, analyzing the provided theoretical constructions, and studying the MonteCarlo method for statistical simulations, which is one of the basic methods in studying the topics related with probability.
7.3. Introduction to Parallel Computing: Terminology and Methodology
Studying and experimenting with the programming models, the introduction to parallel computing terminology and methodology is provided. It includes the basic concepts such as shared and distributed memory parallelization techniques, homogenous and heterogeneous computational platforms, and HPC and multicore programming and it explains scheduling, mapping, and granularity. Using MPI and OpenMP tools, we provide the introduction to OpenMP tasking, MPI programming methods, synchronization, load balancing, decomposition techniques, and other important topics of parallel computing.
7.4. Problems: Studying Effectiveness, Debugging, and Benchmarking
We could enhance the learner’s understanding by providing a set of problems such as debugging, benchmarking, and comparative studying of the effectiveness of programming models. That includes variation of different computing platforms for one of the models as well as testing different models for a definite platform. It could be done by applying a single processor, multicore, and multiprocessor machines and computer clusters. As an example, the model with efficient results achieved by a singleprocessor machine could be inefficient on the other platforms. When modifying the model, the relevant debugging tools and methods must be implemented. So the learner could proceed by modifying the model, that is, changing the distribution parameters and tuning the granularity.
7.5. Further Studies: Queueing Networks for Constructing Learning Objects
The theory of queueing systems renders wide possibilities for relevant theoretical constructions. The next obvious step could be studies of queueing networks of various types including open, closed, or mixed networks, constructing the relevant learning objects, and investigating the respective theoretical and programming models.
Appendices
A. Sequential Programming Model
See Listing 1.

B. Distributed Memory Programming Model
See Listing 2.

C. Distributed Memory Pipeline Model
See Listing 3.

D. Shared Memory Programming Model
See Listing 4.

E. Shared Memory Pipeline Programming Model
See Listing 5.

F. Hybrid Programming Model
See Listing 6.

Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.