Research Article | Open Access
A Dynamic Programming-Based Sustainable Inventory-Allocation Planning Problem with Carbon Emissions and Defective Item Disposal under a Fuzzy Random Environment
There is growing concern that business enterprises focus primarily on their economic activities and ignore the impact of these activities on the environment and society. This paper investigates a novel sustainable inventory-allocation planning model with carbon emissions and defective item disposal over multiple periods under a fuzzy random environment. A carbon credit price and a carbon cap are introduced to show how carbon emissions costs affect the costs of the inventory-allocation network. The percentage of poor-quality products from manufacturers that must be rejected is assumed to be fuzzy random. Because of the complexity of the model, a dynamic programming-based particle swarm optimization with multiple social learning structures (DP-based GLNPSO) and a fuzzy random simulation are proposed to solve it. A case study is then presented to demonstrate the efficiency and effectiveness of the proposed model and the DP-based GLNPSO algorithm. The results show that total costs across the inventory-allocation network vary with changes in the carbon cap and that carbon emissions reductions can be used to gain greater profits.
The need for environmental awareness has affected several aspects of the global economy, including supply chain management. Traditionally, supply chain network design problems have been analyzed from a fixed and variable cost perspective without considering the carbon footprint [1, 2]. However, the focus has now shifted toward more environmentally conscious supply chain planning optimization models in which economic aspects (profit maximization and cost minimization) are integrated with clear environmental goals such as carbon footprint reductions [3–5]. Research interest in sustainable supply chain network design has been increasing, with most studies suggesting that environmental sustainability be viewed as an opportunity rather than a risk [6–8]. Many companies have recently realized that sustainability is a bottom-line requirement that can no longer be ignored. Despite these studies, there is still an urgent need for quantitative models that address these sustainability issues.
Current global efforts to minimize environmental impacts have encouraged companies to change their practices to increase efficiency and reduce negative externalities, which has led to a greater focus on sustainable practices such as recycling and waste management [10, 11]. Shaw et al. designed a sustainable location-allocation model that considered consumers' environmental behavior, which affects consumer demand for low carbon emissions products. Torabi et al. proposed a generic model for a sustainable wine manufacturer-distribution network that encompassed economic, environmental, and social objectives. Diabat and Al-Salem developed a nonlinear mixed integer program that minimized the cost of a stochastic inventory-allocation network and included a carbon emissions cost term to account for environmental concerns. In their model, they introduced an emission cap, under which the company pays for any carbon emissions that exceed the cap. Sustainable supply chains can be achieved by designing supply chains that either incorporate environmental concerns or incorporate reverse logistics such as recycling. The most notable international framework for minimizing greenhouse gas (GHG) emissions was the Kyoto Protocol, an international agreement adopted under the United Nations Framework Convention on Climate Change, in which emissions trading schemes and a carbon credit market were outlined so that countries that had not exceeded their nominated carbon emissions targets could sell the excess to other countries, thereby giving GHGs the status of an international commodity.
The uncertain competitive environment means that inventory-allocation management needs to be more flexible and efficient: enterprises must not only reduce their storage and distribution costs but also ensure that downstream retailers are not unduly affected by stockouts at critical times. Without proper inventory control, a retailer's losses can directly affect the interests of the whole supply chain; therefore, supply chain inventory and distribution management has become an important element of supply chain efficiency in recent years. Time has also become a very important factor in managing products, especially when multiple time periods are involved. Pal et al. proposed a model to determine order quantities between suppliers at the initial stage and the optimal inventory levels over multiple periods for all stages in the inventory-allocation network. Radhi and Zhang extended multiobjective nonlinear mixed integer models for multiperiod allocation planning problems involving multiple suppliers and multiple products. In addition, because production systems are imperfect, defective products appear randomly, and their occurrence follows a probability distribution. Kennedy and Eberhart considered a single-vendor, single-buyer inventory model that accounted for the impact of varying percentages of defective goods, storage costs, and disposal schemes. There has also been significant research interest in different aspects of imperfect production inventory models. However, as most of these studies have focused on economic order quantities or economic production quantities, defective item disposal has not been applied across the whole inventory-allocation network. Further, in most studies, the defective item rate has been assumed to be constant, which does not accurately reflect reality.
Product defect rates are characterized by both fuzzy uncertainty and randomness, or the so-called twofold uncertainty. Therefore, an inventory-allocation management dynamic programming model with a fuzzy random defect rate and fuzzy annual demand is proposed in this paper.
In recent years, higher levels of uncertainty in inventory-allocation management have been shown to be extremely costly for manufacturers and for the supply chain as a whole [24, 25]; therefore, inventory-allocation models that explicitly account for uncertainty are needed to avoid incorrect and costly decisions. As a general theoretical framework for modeling practical problems with unknown parameters, uncertain random programming was introduced by Lin and was later extended to uncertain random multiobjective programming and uncertain random multilevel programming. More recent studies applying fuzzy set theory to inventory-allocation problems can be found in [29, 30].
In this paper, an inventory-allocation planning model with carbon emissions and defective item disposal under a fuzzy random environment is considered, in which annual demand, transportation costs, inventory conversion factors, and product defect percentages are fuzzy random variables. Among the many heuristic and metaheuristic algorithms, particle swarm optimization with global best, local best, and near neighbor best learning (GLNPSO) has proven to be a strong competitor for nondeterministic polynomial-time hard (NP-hard) optimization problems. Because of the relationships between the state equation, the constraint conditions, and the objective functions, a dynamic programming-based GLNPSO (DP-based GLNPSO) algorithm was developed [32, 33] that reduces the particle dimensions by using the state equation. In this paper, a DP-based GLNPSO algorithm is developed to solve the proposed model, with initialization and adjustment methods designed to avoid infeasible solutions.
The main contributions of this paper are as follows. First, a sustainable inventory-allocation model with carbon emissions and defective item disposal is developed, with several constraints included to make the model more realistic. Second, a modified particle swarm optimization algorithm, the DP-based GLNPSO, is constructed to solve the model. Finally, a representative example is used to tune the parameters of the DP-based GLNPSO. The remainder of this paper is organized as follows. The problem statement for the inventory-allocation planning model with carbon emissions and defective item disposal (IAPCEDID) under an uncertain environment is introduced in Section 2. In Section 3, the proposed model and its formulations are described. Section 4 describes the development of the DP-based GLNPSO to solve the IAPCEDID, and the efficiency of the proposed model is illustrated with a representative example in Section 5. Finally, Section 6 discusses the conclusions and limitations and outlines future research directions.
2. Key Problem Statement
In the supply chain, inventory-allocation management with effective quality and carbon emissions controls is essential for an efficient manufacturer-retailer network. As retailers order products from different manufacturers at specified time periods, the planning horizon consists of multiple stages, with replenishment taking place at the beginning of each stage [34, 35]. Under government regulation of carbon emissions (a carbon cap), transport emissions must be kept below a specified level. This sustainable manufacturer-retailer network is based on allocating carbon units in line with established carbon emissions reduction targets. At the end of each period, the company's emissions are verified, and each emitter must offset its carbon emissions against the target established by the government. Any excess of actual emissions over the imposed target may be offset by purchasing carbon units in the domestic market. Conversely, for each ton of CO2 emissions avoided, the company receives a carbon emissions certificate that can be sold on the futures market.
In many inventory-allocation problems, all products are assumed to be of suitable quality; in the real world, however, some items are likely to be defective, and the percentage of defective items is uncertain. Items are classified as suitable or defective, and all defective items found during screening are returned to the manufacturer. For convenience, the manufacturers take back the defective items as a batch with the next shipment [28, 37, 38]. However, when defective items occur, shortages may be difficult to avoid, so a penalty cost is charged to account for losses from possible shortages. Because of production constraints, manufacturers cannot supply more than a specified quantity, and they provide the products to the retailers under all-unit and incremental quantity discount policies. Accordingly, a sustainable inventory-allocation planning model with carbon emissions and defective item disposal (IAPCEDID) is formulated based on dynamic programming under a fuzzy random environment [40, 41]. The flow of items in the proposed supply chain network is shown in Figure 1. The IAPCEDID problem can be described as follows. The network consists of multiple manufacturers, warehouses, and retailers. The manager purchases the required items from specific manufacturers at the beginning of each stage. On receipt from the manufacturer, items are classified as suitable or defective. Defective items are returned to the manufacturer, while suitable items are transported to the corresponding warehouses and allocated according to retailer demands.
The following assumptions are made in this study:
(1) Item demand, transportation costs, and the percentage of defective items in each stage are fuzzy random variables.
(2) All stages have the same length.
(3) Shortages are allowed, and a penalty cost is charged for shortages.
(4) The manufacturer is liable for the costs incurred for returned defective items.
(5) Every item type has a corresponding warehouse with a maximum storage capacity. Items are first transported from the manufacturers to the warehouses and then from the warehouses to the retail stores.
(6) The order lead time is negligible: all items purchased at the beginning of a stage arrive at the corresponding warehouses at the beginning of that stage.
(7) Retailer demands are independent of one another and fixed within a stage.
In this section, a dynamic programming model for the IAPCEDID that considers fuzziness and randomness is constructed.
The following notations are adopted.
Indices: stage index; item index; retailer index; price break point index.
Fuzzy random variables: the unit transport price of an item per kilometer in each stage; the demand for an item in each stage at each retailer; the total demand for an item in each stage; the inventory conversion factor of an item in each stage; the fraction of defective units of an item in each stage.
Decision variables: the inventory level of an item in its warehouse at the beginning of each stage; the purchase quantity of an item in each stage.
Parameters: the maximum and minimum purchase quantities of an item in each stage; the maximum inventory level of an item in each stage; the initial inventory level of an item in its warehouse at the beginning of the first stage; the terminal inventory level of an item in its warehouse at the end of the planning horizon; the unit storage cost of an item in each stage; the current-inventory function of an item over the whole process; the distance between each manufacturer and the corresponding warehouse; the distance between each warehouse and each retail store; the inspection price of an item; the return price of a defective item; the stockout penalty price of an item; the unit cost of an item from its manufacturer at each price break point; the price break points of an item in each stage; the total purchase budget of the retailer for the planning horizon; the fuel consumption per kilometer of a transportation vehicle; the CO2 emissions per unit of gasoline burned by a transportation vehicle; the carbon cap over the network; the carbon credit price per ton.
3.2. Objective Functions
The objective function defines the total cost of the complete manufacturer-retailer network. The aim of the project manager is to determine the order quantity and inventory level for each item in each stage so that total manufacturer-retailer network costs are minimized. The total costs are made up of purchasing costs, transport costs, inventory costs, penalty costs, and carbon emissions costs.
In the proposed inventory-allocation model, the retailer orders products under several discount policies. This paper considers an incremental quantity discount, under which products are delivered in packets containing a known number of items and each unit price applies only to the units that fall within its quantity interval. Under this policy, the purchase cost of an item in each stage depends on the quantity ordered. Each price break point is obtained by

Therefore, the purchase cost under this policy is

Consider the unit inventory cost of an item. Because not all items are stored in the warehouse for the whole stage, the actual inventory cost is less than the unit inventory cost applied to the full inventory level. To deal with this, an inventory conversion factor is introduced to capture the difference between the actual inventory quantity and the inventory level in each stage. The current-inventory function of an item describes its inventory across the whole manufacturer-retailer network, with one stage as the unit of time; the inventory conversion factor can therefore be defined as follows:

Before being transported to the warehouse, each item is inspected for a per-unit inspection fee, after which all defective items are returned to the manufacturer and a return price is requested. As every purchased unit is inspected, the total inspection fee is the unit inspection fee multiplied by the purchase quantity, and the share of defective units in each stage is given by the defect percentage of the item. The total inventory cost is therefore

As the transportation distances between the manufacturers, warehouses, and retail stores all differ and different vehicles are used, total transportation costs are difficult to determine. The transportation cost is therefore expressed through the transportation price of an item per kilometer, the transportation distance between each manufacturer and the corresponding warehouse, and the transportation distance between each warehouse and each retail store.
The transportation quantity of an item from the manufacturer to the corresponding warehouse in each stage follows from its purchase quantity, and the quantity shipped to each retail store equals that store's demand in the stage; therefore, the total transportation cost of an item over the manufacturer-retailer network is as follows:
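The purchase and transportation cost terms above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's equations: the tier layout (`breaks`, `prices`), the hypothetical function names, and the choice of which shipped quantity to pass in are all illustrative. Under an incremental discount, each unit price applies only to the units inside its own quantity interval, and the transport term charges a unit price per item-kilometre on both legs of the network.

```python
def incremental_discount_cost(quantity, breaks, prices):
    """Purchase cost under an incremental quantity discount (hypothetical helper).

    breaks: ascending upper bounds [b1, ..., bK] of the price intervals, with
            b0 = 0 implied; use float("inf") as the last bound.
    prices: unit price for each interval (same length as breaks).
    """
    if quantity < 0 or len(breaks) != len(prices):
        raise ValueError("invalid quantity or tier definition")
    if quantity > breaks[-1]:
        raise ValueError("quantity exceeds the last price break")
    cost, lower = 0.0, 0.0
    for upper, price in zip(breaks, prices):
        if quantity <= lower:
            break
        # only the units inside (lower, upper] are charged at this tier's price
        cost += (min(quantity, upper) - lower) * price
        lower = upper
    return cost


def transport_cost(price_per_km, shipped_qty, dist_to_warehouse,
                   retailer_demands, dist_to_retailers):
    """Unit-kilometre transport cost for one item in one stage (assumed form)."""
    if len(retailer_demands) != len(dist_to_retailers):
        raise ValueError("one distance per retailer is required")
    inbound = dist_to_warehouse * shipped_qty  # manufacturer -> warehouse
    outbound = sum(d * q for d, q in zip(dist_to_retailers, retailer_demands))  # warehouse -> stores
    return price_per_km * (inbound + outbound)


print(incremental_discount_cost(250, [100, 300, float("inf")], [10.0, 9.0, 8.0]))  # 2350.0
print(transport_cost(0.5, 100.0, 20.0, [30.0, 50.0], [10.0, 4.0]))  # 1250.0
```

In the first call, the first 100 units cost 10 each and the next 150 cost 9 each; an all-unit discount would instead price all 250 units at the single tier they fall into.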
A penalty cost is incurred when the demand for an item cannot be met. Given the unit penalty for unmet demand of each item in each stage, the penalty cost of an item can be determined as follows:
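A minimal sketch of the shortage penalty follows. It assumes, for illustration only, that the units available in a stage are the opening inventory plus the non-defective share of the purchase, and that every unit of unmet demand is charged the same penalty; `shortage_penalty` and its parameter names are hypothetical.

```python
def shortage_penalty(penalty_per_unit, demand, on_hand, purchased, defect_rate):
    """Penalty for unmet demand in one stage (assumed form)."""
    available = on_hand + (1.0 - defect_rate) * purchased  # defective units are returned
    shortage = max(0.0, demand - available)                # no penalty when demand is covered
    return penalty_per_unit * shortage


print(shortage_penalty(5.0, 120.0, 10.0, 100.0, 0.1))
```

With 10 units on hand and 90 suitable units out of 100 purchased, a demand of 120 leaves a shortage of 20 units and a penalty of 100.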
The carbon emissions cost acts as a penalty or a reward in a carbon-constrained scenario, and it comprises two terms: the transport emissions from the manufacturers to the warehouses and those from the warehouses to the retail stores. Given the fuel consumption of a transportation vehicle per kilometer and the CO2 emissions per unit of gasoline, the vehicle's carbon emissions per kilometer are the product of these two quantities.
Given a carbon cap on transport emissions, and with the same carbon price applied to both the purchase and the sale of carbon credits, the carbon emissions cost of an item over the complete manufacturer-retailer network is as follows:
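The cap-and-trade term can be sketched as below. The sketch assumes that emissions are driven by total vehicle-kilometres travelled and, as stated above, that one price applies to both buying and selling credits, so a negative result means surplus allowance is sold. `carbon_cost` and the vehicle-kilometre input are assumptions made for illustration, not the paper's exact expression.

```python
def carbon_cost(vehicle_km, fuel_l_per_km, co2_kg_per_l, cap_tonnes, price_per_tonne):
    """Cap-and-trade cost of transport emissions (assumed form).

    Positive: credits must be bought for emissions above the cap.
    Negative: unused allowance below the cap is sold at the same price.
    """
    emissions_tonnes = vehicle_km * fuel_l_per_km * co2_kg_per_l / 1000.0  # kg -> t
    return price_per_tonne * (emissions_tonnes - cap_tonnes)


print(carbon_cost(10000.0, 0.245, 2.63, 5.0, 189.29))
```

Here 10,000 vehicle-kilometres emit about 6.44 t of CO2, so roughly 1.44 t must be covered by purchased credits.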
Because objective functions with fuzzy random factors are very difficult to handle directly, Khan et al. developed a method that converts the fuzzy random variables in both the objective function and the constraints into fuzzy variables similar to trapezoidal fuzzy numbers. Following the expected value of a fuzzy number proposed by Heilpern, the expected value operator is then applied to transform the fuzzy random objective functions and constraints into crisp equivalents, which converts the uncertain model into a deterministic one.
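The expected-value conversion can be sketched as follows. Heilpern's expected value of a trapezoidal fuzzy number (a, b, c, d) is (a + b + c + d)/4, so a triangular number (a, b, c) gives (a + 2b + c)/4. For the fuzzy random case, this sketch assumes a triangular number whose centre is normally distributed with fixed spreads (one reading of "triangular fuzzy numbers with normally distributed parameters" in the case study) and estimates the expectation by plain Monte Carlo simulation; the function names are hypothetical.

```python
import random


def heilpern_expected_value(a, b, c, d):
    """Expected value of a trapezoidal fuzzy number (a, b, c, d)."""
    return (a + b + c + d) / 4.0


def triangular_expected_value(left, mode, right):
    """A triangular fuzzy number (a, b, c) is the trapezoid (a, b, b, c)."""
    return heilpern_expected_value(left, mode, mode, right)


def fuzzy_random_expected_value(mu, sigma, left_spread, right_spread,
                                n_samples=100_000, seed=0):
    """Monte Carlo estimate for a triangular fuzzy number whose centre ~ N(mu, sigma^2).

    The normal centre and fixed spreads are assumptions of this sketch.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        centre = rng.gauss(mu, sigma)  # random part
        total += triangular_expected_value(centre - left_spread, centre,
                                           centre + right_spread)  # fuzzy part
    return total / n_samples


print(triangular_expected_value(1.0, 2.0, 4.0))  # 2.25
```

Under these assumptions the exact answer is mu + (right_spread - left_spread)/4, which makes the simulation easy to check; the sampling route is shown because it generalises to parameters whose distributions have no convenient closed form.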
3.3. State Equation
The state equation describes the relationship between consecutive stages, linking the inventory level, the purchase quantity, and the demand. Only items that pass inspection are transported to the warehouse; therefore, the inventory level of an item in the corresponding warehouse at the beginning of the next stage equals the current inventory level plus the suitable portion of the purchase quantity minus the demand, or zero if this value is negative. The relationship between the inventory level, purchase quantity, and demand can be modeled as follows:
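A sketch of the state equation as described in words above, I_next = max(I + (1 - r)Q - D, 0), rolled forward over the stages for a single item. The function name and the use of crisp (already defuzzified) values for the defect rate r and the demand D are assumptions.

```python
def roll_inventory(initial_level, purchases, defect_rates, demands):
    """Apply I_next = max(I + (1 - r) * Q - D, 0) stage by stage for one item."""
    if not (len(purchases) == len(defect_rates) == len(demands)):
        raise ValueError("one purchase, defect rate and demand per stage")
    levels = [initial_level]
    for q, r, d in zip(purchases, defect_rates, demands):
        # only the suitable share (1 - r) of the purchase reaches the warehouse
        levels.append(max(levels[-1] + (1.0 - r) * q - d, 0.0))
    return levels


print(roll_inventory(10.0, [100.0, 50.0], [0.1, 0.0], [80.0, 70.0]))  # [10.0, 20.0, 0.0]
```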
3.4. Initial and Terminal Conditions
The initial condition specifies the storage level of each item before the first stage, and the terminal condition specifies its storage level at the end of the planning horizon. In practice, these two levels are set to the given initial and terminal inventory levels of each item. The initial and terminal conditions can be presented mathematically as follows:
3.5. Constraint Conditions
If a manager decides to purchase an item in a given stage, the purchase quantity must lie between the minimum and maximum purchase quantities specified for that item and stage:

The retailer also has a financial constraint: the total purchase cost over the planning horizon must not exceed the total purchase budget.

As maximum storage levels must be taken into consideration, the inventory level of each item in each stage cannot exceed the maximum storage level of that item. The storage level should satisfy the following condition:
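The initial, terminal, and constraint conditions above can be checked for a single item with a small helper like the one below. It is a sketch under assumptions: a zero quantity is read as "no order" (so the bounds apply only when an order is placed), the terminal condition is treated as an equality, and the budget check receives the total purchase cost of all items; `check_constraints` is a hypothetical name.

```python
def check_constraints(purchases, levels, q_min, q_max, i_max, terminal_level,
                      total_purchase_cost, budget, tol=1e-6):
    """Return the names of violated constraints for one item (empty = feasible).

    purchases: quantity per stage; levels: inventory per stage, including the
    initial level first and the terminal level last.
    """
    violations = []
    for t, q in enumerate(purchases):
        # a zero quantity is read as "no order placed"; otherwise bounds apply
        if abs(q) > tol and not (q_min[t] - tol <= q <= q_max[t] + tol):
            violations.append(f"purchase bounds, stage {t + 1}")
    if any(level < -tol or level > i_max + tol for level in levels):
        violations.append("storage capacity")
    if abs(levels[-1] - terminal_level) > tol:  # terminal condition as equality
        violations.append("terminal condition")
    if total_purchase_cost > budget + tol:
        violations.append("budget")
    return violations


print(check_constraints([100.0, 50.0], [10.0, 20.0, 0.0], [20.0, 20.0],
                        [120.0, 120.0], 50.0, 0.0, 1000.0, 500.0))  # ['budget']
```

Returning the list of violated constraints, rather than a single boolean, makes it easier to see which repair an infeasible particle needs.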
3.6. Global Model
The IAPCEDID determines the quantity of each item to be purchased from the manufacturer and distributed to the retailers in each stage so as to minimize the total expected cost under the considered constraints, with a carbon emissions cost added to account for environmental considerations. The model is based on dynamic programming over a multiperiod planning horizon, with initial and terminal conditions and state equation constraints. The objective function comprises the purchase costs, transportation costs, inventory costs, penalty costs, and carbon costs. As items are classified as suitable or defective, both item inspection and defective item disposal are included. In summary, the global model is as follows:
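The structure of the global model can be summarised schematically as below. This is a sketch in symbols chosen here, not a reproduction of the paper's equation (14): Q denotes purchase quantities, I inventory levels, a tilde marks fuzzy random quantities, C^P, C^T, C^I, C^S, and C^C are the purchase, transportation, inventory, penalty, and carbon cost terms, and B is the budget. The max-form of the state equation follows the verbal description of the state equation, and reading a zero order as "no purchase" is an assumption.

```latex
\begin{aligned}
\min_{Q}\; & E\Big[\sum_{t=1}^{T}\sum_{i=1}^{N}\big(C^{P}_{ti}+C^{T}_{ti}+C^{I}_{ti}+C^{S}_{ti}\big)+C^{C}\Big]\\
\text{s.t.}\; & I_{t+1,i}=\max\big\{I_{ti}+(1-\tilde{r}_{ti})\,Q_{ti}-\tilde{D}_{ti},\;0\big\},\\
& I_{1i}=I^{0}_{i},\qquad I_{T+1,i}=I^{E}_{i},\\
& Q_{ti}=0\ \ \text{or}\ \ Q^{\min}_{ti}\le Q_{ti}\le Q^{\max}_{ti},\\
& \textstyle\sum_{t=1}^{T}\sum_{i=1}^{N} C^{P}_{ti}\le B,\\
& 0\le I_{ti}\le I^{\max}_{i},\qquad t=1,\dots,T,\ \ i=1,\dots,N.
\end{aligned}
```

Because the state equation fixes every inventory level once the purchase quantities are chosen, only Q needs to be searched, which is the property the DP-based GLNPSO exploits.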
4. Dynamic Programming-Based GLNPSO
4.1. General Mechanism of DP-Based GLNPSO
Building on the particle swarm optimization (PSO) proposed by Kennedy, the main algorithm in this study is based on GLNPSO, a PSO variant with multiple social learning structures. Using an iterative dynamic programming model and appropriate model transformations, a dynamic programming-based GLNPSO (DP-based GLNPSO) algorithm is developed to solve the IAPCEDID. The DP-based GLNPSO is a variant of the GLNPSO whose main difference is the reduced dimensionality of the particles. The goal is to search for satisfactory solutions to (14) by repeatedly moving the particles toward better positions. The following notation is used: the iteration index; the dimension index; the particle index; the inertia weight; the velocity of a particle in a given dimension and iteration; the position of a particle in a given dimension and iteration; the personal best position of a particle in a given dimension; the local best position of a particle in a given dimension; the near neighbor best position of a particle in a given dimension; the global best position in a given dimension; the personal best, global best, local best, and near neighbor best acceleration constants; the position, velocity, personal best, local best, and near neighbor best vectors of a particle; the global best position vector; and a given part of a particle in a given generation.
In the GLNPSO, the algorithm is initialized with a swarm of random particles. Each particle is associated with its personal best position, the global best position, its local best position, and its near neighbor best position. The local best is the best position among several adjacent particles, and the near neighbor best is a social learning behavior determined by the fitness-distance-ratio (FDR). Each particle is represented by its position in a space whose dimension equals the problem dimension. Unlike the GLNPSO, the DP-based GLNPSO uses the state equation of the dynamic programming model to reduce the particle dimensions, as detailed in Figure 2. In this problem, the problem dimension comprises decision variables and state variables, which are related to the objectives and the constraints, respectively. Note that once the decision variables are known, the state variables can be determined from the state equation.
The essential difference between the DP-based GLNPSO and the GLNPSO is that the DP-based GLNPSO exploits the iterative mechanism of the dynamic programming model to reduce the particle dimensions, thereby significantly reducing the solution search space. If a standard GLNPSO were used in this study, each particle would have to encode both the decision variables and the state variables, whereas a DP-based GLNPSO particle encodes only the decision variables. Each particle is divided into parts, and every part of a particle is itself a vector whose components are individual dimensions of that particle in a given generation.
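The dimension reduction can be illustrated by decoding a decision-only particle: the particle stores purchase quantities only, and the inventory levels are derived from the state equation rather than searched over. The stage-major layout (one part per stage, one component per item), the function name, and the crisp defect rates and demands are assumptions of this sketch.

```python
def decode_particle(position, n_stages, n_items, initial_levels, defect_rates, demands):
    """Decode a decision-only particle into purchase quantities and inventory levels.

    position: flat list of length n_stages * n_items, stage-major
              (assumed layout: one part per stage, one component per item).
    Returns (purchases[t][i], levels[t][i]); levels has n_stages + 1 rows.
    """
    if len(position) != n_stages * n_items:
        raise ValueError("particle length must equal n_stages * n_items")
    purchases = [position[t * n_items:(t + 1) * n_items] for t in range(n_stages)]
    levels = [list(initial_levels)]
    for t in range(n_stages):
        prev = levels[-1]
        # state equation: states follow from decisions, so they need no dimensions
        levels.append([max(prev[i] + (1.0 - defect_rates[t][i]) * purchases[t][i]
                           - demands[t][i], 0.0) for i in range(n_items)])
    return purchases, levels


purchases, levels = decode_particle([100.0, 40.0, 50.0, 30.0], 2, 2, [10.0, 0.0],
                                    [[0.1, 0.0], [0.0, 0.0]],
                                    [[80.0, 30.0], [70.0, 20.0]])
print(purchases)  # [[100.0, 40.0], [50.0, 30.0]]
```

Under this encoding, the case study's four stages and four items give a 16-dimensional particle; a particle that also carried the inventory levels would need extra dimensions for every state variable.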
4.2. Initializing Strategy
Based on the state equation from dynamic programming theory, an initialization strategy is used to initialize the particles and avoid infeasible positions.
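Step 1. Set the stage and item counters to the first stage and the first item.
Step 2. Initialize the purchase quantity by generating a random real number between its minimum and maximum purchase quantities.
Step 3. Compute the inventory level at the beginning of the next stage from the state equation, starting from the initial inventory level of the item in the warehouse at the beginning of the first stage. If this level does not exceed the maximum storage level, go to Step 4. Otherwise, return to Step 2.
Step 4. If the stopping criterion is met, that is, the last stage and the last item have been reached, then the initialization of this particle is completed. Otherwise, advance to the next item (or stage) and return to Step 2.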
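The initialization steps above can be sketched as a rejection loop: draw a quantity within its bounds, roll the state equation forward, and redraw if storage capacity would be exceeded. The stage-major layout, the hypothetical function name, and the `max_tries` safeguard (which the original steps do not mention) are assumptions.

```python
import random


def initialize_particle(n_stages, n_items, q_min, q_max, i_max, initial_levels,
                        defect_rates, demands, rng, max_tries=1000):
    """Generate one feasible decision-only particle (stage-major layout).

    q_min, q_max, defect_rates, demands: indexed [stage][item];
    i_max, initial_levels: indexed [item]. `max_tries` is an added safeguard.
    """
    position = []
    levels = list(initial_levels)
    for t in range(n_stages):  # Steps 1 and 4: walk through stages and items
        for i in range(n_items):
            for _ in range(max_tries):
                # Step 2: random purchase quantity within its bounds
                q = rng.uniform(q_min[t][i], q_max[t][i])
                # Step 3: next inventory level from the state equation
                nxt = max(levels[i] + (1.0 - defect_rates[t][i]) * q - demands[t][i], 0.0)
                if nxt <= i_max[i]:
                    break  # storage constraint satisfied: accept q
            else:
                raise RuntimeError(f"no feasible quantity for stage {t + 1}, item {i + 1}")
            position.append(q)
            levels[i] = nxt
    return position


rng = random.Random(1)
particle = initialize_particle(3, 2, [[0.0, 0.0]] * 3, [[100.0, 100.0]] * 3,
                               [60.0, 60.0], [10.0, 10.0], [[0.05, 0.05]] * 3,
                               [[50.0, 50.0]] * 3, rng)
```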
4.3. Adjusting Strategy
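An adjustment strategy is used to move each updated particle back into the feasible region. After the particle is updated, it is adjusted as follows to avoid infeasible positions.
Step 1. Set the stage and item counters to the first stage and the first item.
Step 2. If the purchase quantity exceeds its maximum, set it to the maximum purchase quantity. If it is below its minimum, set it to the minimum purchase quantity.
Step 3. Compute the inventory level at the beginning of the next stage from the state equation.
Step 4. If this inventory level does not exceed the maximum storage level, go to Step 5. Otherwise, reduce the purchase quantity accordingly and return to Step 3.
Step 5. If the current item is the last item, reset the item counter and go to Step 6. Otherwise, advance to the next item and return to Step 2.
Step 6. If the stopping criterion is met, that is, the last stage and the last item have been reached, then the adjustment of this particle is completed. Otherwise, advance to the next stage and return to Step 2.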
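The adjustment can be sketched as a clamp followed by a capacity repair. The repair rule used here (lower the quantity to the largest value that keeps the next inventory level within storage capacity) is an assumption standing in for the exact reduction in Step 4, and `adjust_particle` is a hypothetical name.

```python
def adjust_particle(position, n_stages, n_items, q_min, q_max, i_max,
                    initial_levels, defect_rates, demands):
    """Repair an updated particle (stage-major) toward the feasible region.

    Clamps each purchase quantity to its bounds, then, if the resulting
    inventory would exceed storage capacity, lowers the quantity to the largest
    value that fits (an assumed repair rule). If even q_min overflows, the
    overflow is left for the fitness function to penalise.
    """
    adjusted = []
    levels = list(initial_levels)
    for t in range(n_stages):
        for i in range(n_items):
            q = min(max(position[t * n_items + i], q_min[t][i]), q_max[t][i])  # Step 2
            good = 1.0 - defect_rates[t][i]
            nxt = levels[i] + good * q - demands[t][i]                          # Step 3
            if nxt > i_max[i] and good > 0.0:                                   # Step 4
                q = max(q_min[t][i], (i_max[i] - levels[i] + demands[t][i]) / good)
                nxt = levels[i] + good * q - demands[t][i]
            adjusted.append(q)
            levels[i] = max(nxt, 0.0)                                           # Steps 5-6
    return adjusted


print(adjust_particle([150.0], 1, 1, [[0.0]], [[120.0]], [50.0], [10.0],
                      [[0.0]], [[60.0]]))  # [100.0]
```

In the example, 150 is first clamped to 120, which would leave 70 units in a 50-unit warehouse, so the quantity is lowered to 100.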
4.4. Updating Strategy and Decoding Strategy
Throughout the DP-based GLNPSO optimization process, the social learning component includes the global best, the local best, and the near neighbor best, so the search benefits from sharing the particles' discoveries and past experiences with the whole population. In each generation, the global best is the best position reached by the swarm; the local best is the best position among several adjacent particles; the near neighbor best is a social learning behavior determined by the fitness-distance-ratio (FDR); and the inertia weight controls the impact of the previous velocity on the current velocity, which governs the trade-off between global and local exploration during the search. Each particle first updates its velocity to approach its personal best, the global best, its local best, and its near neighbor best, and then updates its position using the new velocity:

The DP-based GLNPSO decoding strategy transforms a particle into the purchase quantity of each item at the beginning of each stage: each dimension of the particle is decoded into the purchase quantity of the corresponding item at the beginning of the corresponding stage, and the inventory levels then follow from the state equation.
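The velocity and position update and the FDR-based choice of the near neighbor best can be sketched as follows, using the usual GLNPSO form in which each of the four attractors (personal, global, local, and near neighbor best) is weighted by its acceleration constant and a fresh uniform random number. The FDR shown, (f(x_i) - f(p_j)) / |p_jd - x_id|, is the minimisation form; the function names and the small `eps` guard against division by zero are assumptions of this sketch.

```python
import random


def near_neighbor_best(i, positions, pbest_positions, pbest_fitness, fitness_i, eps=1e-12):
    """Per dimension, take the personal-best component of the particle j != i
    that maximises FDR = (f(x_i) - f(p_j)) / |p_jd - x_id| (minimisation)."""
    nbest = []
    for d in range(len(positions[i])):
        best_ratio, best_value = -float("inf"), pbest_positions[i][d]
        for j in range(len(positions)):
            if j == i:
                continue
            dist = abs(pbest_positions[j][d] - positions[i][d]) + eps
            ratio = (fitness_i - pbest_fitness[j]) / dist
            if ratio > best_ratio:
                best_ratio, best_value = ratio, pbest_positions[j][d]
        nbest.append(best_value)
    return nbest


def update_particle(x, v, pbest, gbest, lbest, nbest, w, cp, cg, cl, cn, rng):
    """GLNPSO update: new velocity from four attractors, then new position."""
    new_v = [w * v[d]
             + cp * rng.random() * (pbest[d] - x[d])
             + cg * rng.random() * (gbest[d] - x[d])
             + cl * rng.random() * (lbest[d] - x[d])
             + cn * rng.random() * (nbest[d] - x[d])
             for d in range(len(x))]
    new_x = [x[d] + new_v[d] for d in range(len(x))]
    return new_x, new_v


rng = random.Random(0)
print(near_neighbor_best(0, [[0.0], [5.0], [1.0]], [[0.0], [5.0], [1.0]],
                         [10.0, 2.0, 8.0], 10.0))  # [1.0]
```

In the example, particle 1 has the better fitness but is five units away, while particle 2 is only one unit away, so the FDR favours particle 2.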
4.5. Overall Procedure
Based on the above sections, the overall procedure for the DP-based GLNPSO algorithm can be given. The algorithm is shown in Figure 3, the details for which are as follows.
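Step 1. Initialize the particle positions and velocities using the initialization strategy.
Step 2. Check the constraints based on the DP-based GLNPSO to avoid infeasible positions.
Step 3. Evaluate the initial particles to obtain their fitness values and best positions.
Step 4. Update the velocities and positions of all particles.
Step 4.1. Update the personal best: if the current fitness of a particle is better than the fitness of its personal best, replace the personal best with the current position.
Step 4.2. Update the global best: if the current fitness of a particle is better than the fitness of the global best, replace the global best with the current position.
Step 4.3. Update the local best by setting it to the position with the least fitness value among the adjacent particles.
Step 4.4. Generate the near neighbor best by selecting, for each dimension, the position that maximizes the FDR according to (20).
Step 5. Adjust the particles to the feasible region using the adjustment strategy.
Step 6. If the stopping criterion is met, go to Step 7; otherwise, increment the iteration counter and return to Step 4.
Step 7. Determine the fitness value and the global best position.
Step 8. Decode the global best particle into the purchase quantities and inventory levels of every item in every stage.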
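The overall procedure can be condensed into one loop, shown below as a compact sketch rather than the authors' implementation. Assumptions: feasibility is reduced to box bounds (the adjustment step is a simple clamp), the local best is taken over a ring of adjacent particles, the inertia weight decreases linearly, and the personal and global bests are updated right after each move; `dp_glnpso` and its parameters are hypothetical. A toy quadratic stands in for the inventory cost, whereas in the DP-based setting `cost_fn` would decode the purchase quantities through the state equation and sum the cost terms.

```python
import random


def dp_glnpso(cost_fn, dim, lower, upper, n_particles=20, n_iters=200,
              w_start=0.9, w_end=0.4, cp=1.0, cg=1.0, cl=1.0, cn=1.0,
              n_local=3, seed=0):
    """Compact GLNPSO-style loop over a box-constrained, decision-only particle."""
    rng = random.Random(seed)

    def adjust(x):  # Step 5 stand-in: clamp every dimension into [lower, upper]
        return [min(max(xd, lower), upper) for xd in x]

    # Steps 1-3: random feasible particles, zero velocities, initial bests
    xs = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    fit = [cost_fn(x) for x in xs]
    pbest, pfit = [list(x) for x in xs], list(fit)
    g = min(range(n_particles), key=lambda k: pfit[k])
    gbest, gfit = list(pbest[g]), pfit[g]
    half = n_local // 2

    for it in range(n_iters):
        # linearly decreasing inertia weight (a common choice, assumed here)
        w = w_start - (w_start - w_end) * it / max(n_iters - 1, 1)
        for i in range(n_particles):
            # Step 4.3: local best among a ring of adjacent particles
            ring = [(i + off) % n_particles for off in range(-half, half + 1)]
            lb = min(ring, key=lambda k: pfit[k])
            # Step 4.4: near neighbour best per dimension via the FDR
            nbest = []
            for d in range(dim):
                j = max((k for k in range(n_particles) if k != i),
                        key=lambda k: (fit[i] - pfit[k])
                        / (abs(pbest[k][d] - xs[i][d]) + 1e-12))
                nbest.append(pbest[j][d])
            # Step 4: velocity first, then position
            for d in range(dim):
                x = xs[i][d]
                vs[i][d] = (w * vs[i][d]
                            + cp * rng.random() * (pbest[i][d] - x)
                            + cg * rng.random() * (gbest[d] - x)
                            + cl * rng.random() * (pbest[lb][d] - x)
                            + cn * rng.random() * (nbest[d] - x))
            xs[i] = adjust([xs[i][d] + vs[i][d] for d in range(dim)])
            fit[i] = cost_fn(xs[i])
            # Steps 4.1-4.2: personal and global best updates
            if fit[i] < pfit[i]:
                pbest[i], pfit[i] = list(xs[i]), fit[i]
                if fit[i] < gfit:
                    gbest, gfit = list(xs[i]), fit[i]
    return gbest, gfit  # Steps 7-8: best position, ready to be decoded


best, best_cost = dp_glnpso(lambda x: sum((xd - 3.0) ** 2 for xd in x),
                            dim=4, lower=0.0, upper=10.0)
```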
5. Case Study
To illustrate the performance of the proposed DP-based method and the effect of a carbon cap on the optimization results, the method was applied to a case study. A sustainable logistics structure made up of five main parts is considered, as shown in Figure 1. Each stage is one month, and the case involves four periods, five retail stores, and four items, each with a corresponding warehouse. After the items are inspected, suitable items are transported to the warehouses and defective items are returned to the corresponding manufacturers. A strategy is generated to minimize the inventory, allocation, and carbon emissions costs. Carbon emissions are converted into costs through the carbon credit price, so they are measured in the same units as the economic costs.
This case has four items and five retail stores. Each retail store's monthly demand for each item is shown in Table 1; the purchase information and item inventory information are shown in Tables 2 and 3; and the distribution information is shown in Tables 4 and 5. All fuzzy random variables are represented by triangular fuzzy numbers whose parameters follow normal distributions. The fuel consumption is 0.245 l/km, the CO2 emissions per liter of gasoline are 2.63 kg/l, and the carbon credit price is 189.29 CNY/ton. These emission parameters were taken from the Environmental Data for International Cargo Transport & Road Transport.
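The case parameters imply a per-kilometre emission factor and carbon cost that can be checked with simple arithmetic. The sketch assumes the fuel figure is per vehicle-kilometre and converts kilograms to tonnes to match the CNY/ton price; the variable names are illustrative.

```python
fuel_l_per_km = 0.245         # fuel consumption (l/km), from the case data
co2_kg_per_l = 2.63           # CO2 per litre of gasoline (kg/l), from the case data
price_cny_per_tonne = 189.29  # carbon credit price (CNY/ton), from the case data

emission_kg_per_km = fuel_l_per_km * co2_kg_per_l  # kg CO2 per vehicle-km
carbon_cny_per_km = emission_kg_per_km / 1000.0 * price_cny_per_tonne  # kg -> t, then price

print(round(emission_kg_per_km, 5))  # 0.64435
print(round(carbon_cny_per_km, 4))   # 0.122
```

Each vehicle-kilometre therefore emits about 0.644 kg of CO2, which is worth roughly 0.122 CNY at the stated carbon credit price.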