Abstract

The Internet of Things (IoT) is spreading much faster than the supporting technology is maturing. Today, tens of wireless technologies compete for IoT connectivity, and a myriad of IoT devices exist with disparate capabilities and constraints. Moreover, each of the many verticals employing IoT networks dictates distinctive and differentiated network qualities. In this work, we present a context-aware framework that jointly optimises the connectivity and computational speed of the IoT network to deliver the qualities required by each vertical. Based on a smart port application, we identify energy efficiency, security, and response time as essential quality features and consider a wireless realisation of IoT connectivity using short-range and long-range technologies. We propose a reinforcement learning technique and demonstrate a significant reduction in energy consumption while meeting the quality requirements of all related applications.

1. Introduction

The Internet of Things (IoT) is today’s buzzword, often coupled with Big Data and Artificial Intelligence (AI). However, there is much ambiguity about what the term means and scepticism about the actual value generated by the IoT. IoT devices have become pervasive but cover a broad range of technologies and standards. Wireless technology is key to connecting these devices through gateways or aggregation points; but, similarly, a wide range of wireless protocols and standards are available and competing [1]. Once these devices are connected, they start reporting the sensed or measured data to the platform. Again, multiple choices are possible in this respect, each with different strengths and weaknesses. Reporting raw data to the cloud is very costly, as every bit gets charged, and may also exhaust the battery of the device; this results in massive data. On the other hand, running scripts locally on the device and reporting only the resulting events to the cloud reduces the cloud service cost but limits the visibility into the actual data; this still results in big data. Moreover, local scripts enable real-time actions and do not expose private data, whereas cloud computing incurs latency due to the transmission network and requires stringent security measures to protect the data.

An environment that is rich in IoT devices connected to a platform qualifies as digitised, and often as intelligent. Analytics, which uses AI, is the added layer that transforms such an environment into a smart one. The default application of AI is to draw actionable insights from the data in order to generate value for the given vertical. In this work, we argue that IoT solutions should not be addressed through a layered perspective; instead, a holistic optimisation approach is needed to generate the desired added value efficiently. In such a holistic approach, machine learning, among other AI tools, is employed in every stage of the solution, including connectivity, storage, computing, and analytics.

Since there are many use-cases of the IoT paradigm [2], it should be approached from the perspective of a given vertical, e.g., smart health, smart cities, smart manufacturing (Industry 4.0), smart transport, etc. Each of these verticals comprises multiple IoT-based applications with various requirements. In [3], for example, signalling measurements and modelling are performed for both static and vehicular machine-to-machine (M2M) applications, as the two have different signalling overhead characteristics. As another example, remote monitoring in smart cities requires full compliance with privacy regulations, whereas security-related applications rank response time highest among all key performance indicators (KPIs).

In this article, we adopt the smart port use-case to demonstrate context-aware smart connectivity, since it includes various types of applications and has a clear need for monetisation (as opposed to smart cities, which are primarily developed for the well-being and productivity of society). According to figures from the World Trade Organization, 80% of worldwide freight is transported through ports (https://www.wto.org/). The smart port concept entails the use of technologies to transform the different public services at ports into interactive systems with the purpose of meeting the needs of port users with a greater level of efficiency, transparency, and value. European smart port initiatives include the following, among many others:
(i) The port of Rotterdam, where IoT sensors are used to generate a digital twin and enable augmented intelligence.
(ii) The port of Hamburg, which exploits 5G networks to enable virtual reality for vital infrastructure monitoring.
(iii) The port of Antwerp, which employs blockchain technology to enable a secure transfer of rights between often competing parties.
(iv) The port of Seville, which, through the Tecnoport 2025 project, uses mobile network technology for traffic and goods tracking at the port and their logistical transfer on land.

Smart ports present a particular challenge due to the necessity of information exchange among competing stakeholders, including port authorities, port operators, terminal operators, logistics companies, shipping companies, etc. It is therefore likely that multiple IoT networks would coexist, consisting of partly private and partly public or shared infrastructure. As described in [4], there are various communication standards, with different strengths and weaknesses, which may be used for connecting IoT networks in the context of smart ports. Mobile IoT, i.e., connectivity over licensed mobile wireless networks, is often the preferred solution for handling private data, since it is reliable, end-to-end secure (owing to the eSIM card), scalable, ubiquitous, and mature. Two main technologies have been introduced by mobile networks to connect IoT devices: eMTC and NB-IoT [5]. Both of these technologies are compatible with LTE (the state-of-the-art commercial mobile network technology), which means that a software update suffices to deploy the IoT options. The former is geared towards higher data rates and supports VoIP (Voice over IP based on the ITU H.323 protocol (https://www.itu.int/rec/T-REC-H.323/e)) and flexible mobility. The latter is designed for low data rates and long range but with limited mobility. The NB-IoT technology consists of concentrating the energy of a normal LTE carrier into a narrow band, hence allowing a higher maximum coupling loss than LTE [6]. Mobile IoT is a public service enabled by telecom carriers and may be used by any party who subscribes to it. Other long-range and low-power solutions, such as LoRa (https://www.lora-alliance.org/) and Sigfox (https://www.sigfox.com/en), are unlicensed and can reach similar coverage and data rates as NB-IoT and eMTC. These may be privately owned but require the use of a gateway to connect to the Internet and are often considered less secure. Many short-range unlicensed wireless connectivity solutions are available, such as WiFi (IEEE 802.11), Bluetooth, ZigBee, etc., as described in [7], and may be shared, public, or private.

In the presence of multiple wireless technologies, disparate IoT applications, competing parties, and a broad range of static and moving IoT devices with multiple connectivity options, it is of key importance to identify the best way to collect, store, cache, and process the IoT data. What qualifies as the best way depends on the device capabilities (e.g., connectivity options, available battery); the wireless conditions; the security requirements; the processing complexity and availability; the cost of storage/caching/uploading, etc.

As energy consumption is one of the main challenges for IoT networks [8], recent works, such as [9, 10], study the trade-off between local and cloud computing in terms of device energy consumption. The former proposes an analytical framework that minimises the energy consumption by optimising the offloading decisions of multiple user devices. The latter elaborates a theoretical framework for establishing trade-offs between the energy consumption and the IoT infrastructure billing when cloud computing is involved. Mobile wireless networks are a prime contender in the race to connect IoT networks owing to their well-established and ubiquitous coverage and secure communication based on the subscriber identity module (eSIM card). In [11], the authors investigate the connectivity of NB-IoT and LoRa in terms of both area and population coverage in order to highlight the importance of the network deployments. In [12], user-centric smart connectivity based on big data analytics is advocated, and the corresponding research challenges are outlined.

Although data aggregation seems a promising solution to ease the signalling overhead, it is also one of the causes of transmission delay. In [13], the authors discuss the trade-off between delay and signalling overhead in order to demonstrate the impact of data aggregation. The authors in [14] analyse the joint optimisation of caching and task offloading in such networks with mobile edge computing. They present an efficient online algorithm, based on Lyapunov optimisation and Gibbs sampling, that succeeds in reducing computation latency while keeping the energy consumption low. In [15], a recommendation system is proposed to address the challenge of link selection in a cloud radio access network. A data-driven scheme is introduced that results in an optimised classification of link strengths between remote radio heads and IoT devices.

A deep learning algorithm for edge computing is introduced in [16] to boost the learning performance in IoT networks; the authors also attempt to increase the number of tasks handled at the edge while respecting the edge capacity constraints. An open-source database is designed in [17] for the edge computation of Industrial IoT (IIoT) networks. The authors use time-series analysis to predict the conditions of IIoT machines in order to decrease the number of condition reports to be sent to the cloud. A holistic view of communication, computation, and caching is presented in [18], using graph-based representations as learning methods for innovative resource allocation techniques. The performance of edge caching, as well as the energy efficiency and delivery time, is investigated in [19] under quality of service (QoS) constraints.

In this work, we employ machine learning techniques, based on reinforcement learning, in order to manage multiple optimisation objectives jointly and to dynamically identify the best connection and route for each device. We identify four key quality features that dominate IoT applications in general and smart ports in particular: security, energy, latency, and cost. This work is the first to address these multiple IoT optimisation objectives jointly using reinforcement learning. We compare our novel approach to state-of-the-art connectivity solutions and demonstrate significant gains in all aspects. Moreover, our approach is the only one able to meet the context-aware requirements fully while minimising the cost and the energy consumption. The advantage of the adopted machine learning scheme is primarily its low complexity and its ability to optimise in a dynamic environment such as a smart port.

The rest of the paper is organised as follows. In Section 3, we define the system model of our research. In Section 4, we present our novel machine-learning-based solution to the multiobjective problem. Section 5 elaborates on the results and analysis, and Section 6 concludes the article.

3. System Model

The novel energy-aware smart connectivity approach proposed in this work applies to any IoT network with diverse options for connectivity and processing. For the sake of clarity in the presentation, we build the system model around a smart port scenario such as the one shown in Figure 1. All IoT devices are battery operated and have different battery lives. They all have some processing power to perform basic tasks and can either offload the task to the gateway (fog), i.e., the WiFi access point, or to the evolved node B (eNB, i.e., the cloud).

Differently from the state-of-the-art research, we propose to decide simultaneously on the best connectivity and the best location for processing the tasks by jointly optimising energy, response time, security, and cost. A two-stage approach, which describes the decision and optimisation processes, is presented in Figure 2. It is assumed that every IoT device is controlled by a given application and they jointly determine the context-aware constraints. Each combination of connectivity option and processing location offers specific characteristics and limitations. Stage 1 consists of optimising these decisions based on the context-aware constraints, while Stage 2 refines the trade-off between energy consumption and cost. In the following paragraphs, we describe the models adopted to capture the propagation loss, energy consumption, and response time for the proposed system. Table 1 lists all the parameters that are pertinent to our simulations.

3.1. Propagation Model

There are three wireless connection types that require modelling: (a) Device-to-Gateway (WiFi), (b) Device-to-eNB (NB-IoT), and (c) Gateway-to-eNB (LTE). Connections (a) and (c) are often interference limited, as the employed spectrum is likely to be shared by other neighbouring connections. Connections of type (b) are, however, considered to be noise limited, as we assume that there is no other eNB in the surroundings employing NB-IoT technology. The objective of the propagation modelling is to determine the transmission power required to cater for each of the wireless connection types; the energy consumption is then calculated accordingly. We start with the propagation loss, which is modelled as a function of two technology-specific parameters, the propagation constant $K$ and the propagation exponent $\eta$, and the distance $d$ of the wireless hop, as shown below:

$L(d) = K \, d^{\eta}$.  (1)

Moreover, the probability of having line of sight between the device and the gateway is much higher than in the case of the other types of wireless connections; hence the propagation loss per decade is lower [20]. On the other hand, NB-IoT connections suffer the same propagation loss per decade as LTE links but are successfully received with less power (i.e., a lower receiver sensitivity threshold). For all types of links, the received power (in mWatt) at a distance $d$ from the transmitting device can be expressed as $P_{\mathrm{rx}} = P_{\mathrm{tx}} / L(d)$. Next, we calculate the required received power $P_{\mathrm{rx}}$ (in mWatt) in order to achieve the transmission of the target data volume of $D$ bits:

$D = \tau B \log_2\!\left(1 + \frac{P_{\mathrm{rx}}}{N_0 B + I}\right)$,  (2)

where $\tau$ is the time period, $B$ is the channel bandwidth, $N_0$ is the noise power spectral density, and $I$ is the cumulative interference power on the given channel during time period $\tau$. Please note that $I$ is null for wireless connections of type (b). Using (2) and solving for $P_{\mathrm{rx}}$, we get

$P_{\mathrm{rx}} = \left(2^{D/(\tau B)} - 1\right)\left(N_0 B + I\right)$.  (3)
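
To make the link-budget relation concrete, the following is a minimal Python sketch of how the required transmit power can be derived from (1)–(3); the function name and the numeric values are illustrative placeholders rather than the parameters of Table 1.

```python
def required_tx_power_mw(bits, period_s, bandwidth_hz, distance_m,
                         prop_const, prop_exp,
                         noise_psd_mw_per_hz=4e-18, interference_mw=0.0):
    """Transmit power (mW) needed to deliver `bits` within `period_s`,
    following (1)-(3): loss L = K * d^eta, P_rx = P_tx / L, and
    bits = tau * B * log2(1 + P_rx / (N0*B + I))."""
    loss = prop_const * distance_m ** prop_exp                      # (1)
    noise_plus_interference = noise_psd_mw_per_hz * bandwidth_hz + interference_mw
    required_rx = (2 ** (bits / (period_s * bandwidth_hz)) - 1) * noise_plus_interference  # (3)
    return required_rx * loss                                       # invert (2)

# Type (b) NB-IoT-style link: noise limited, so the interference term is zero.
print(required_tx_power_mw(bits=1e4, period_s=1.0, bandwidth_hz=180e3,
                           distance_m=500, prop_const=1e3, prop_exp=3.5))
```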

3.2. Energy Consumption Model

There are two major processes that consume energy in an IoT network: wireless transmission and task computation. Denoting the energy consumption of the former by $E_{\mathrm{tx}}$ and that of the latter by $E_{\mathrm{comp}}$, the total energy consumption is the sum of both, $E = E_{\mathrm{tx}} + E_{\mathrm{comp}}$. Depending on the route of communication taken by the device, the energy consumed due to transmission power is incurred either over one hop using NB-IoT or over two hops using WiFi for the first link and LTE for the second. The energy consumed for processing the task is a function of the data requirement of the device and the computational power of the processor (see Table 1).
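
As an illustration, a simple energy bookkeeping consistent with this description could look as follows; the multiplicative processing-energy model and all names are assumptions, since the exact expression and the Table 1 values are not reproduced here.

```python
def transmission_energy(per_hop_tx_power_mw, period_s):
    """Radio energy: transmit power times transmission time, summed over the
    one (NB-IoT) or two (WiFi + LTE) hops of the chosen route."""
    return sum(p * period_s for p in per_hop_tx_power_mw)

def processing_energy(data_bits, proc_rate_bits_per_s, proc_power_mw):
    """Computation energy: assumed as the power drawn by the chosen processor
    times the time it needs to work through the data."""
    return proc_power_mw * data_bits / proc_rate_bits_per_s

# Two-hop route (WiFi then LTE) with processing at the gateway.
total_energy = (transmission_energy([0.08, 0.2], period_s=1.0)
                + processing_energy(1e4, proc_rate_bits_per_s=1e6, proc_power_mw=50))
```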

3.3. Response Time Model

The response time perceived by the IoT device is the combination of the uplink and downlink delays between the IoT device and the server. In this work, only the uplink delay is modelled explicitly, while the downlink delay is assumed to be the same for all devices.

The uplink delay is caused by two phenomena: task processing (processing delay, $t_{\mathrm{proc}}$) and data transmission (transmission delay, $t_{\mathrm{tx}}$). The processing delay depends on the processor's computational power, which is measured in the number of computational cycles per data element, $c$; i.e., the higher $c$, the lower the computational power. Naturally, a server has higher computational power than a small gateway and much higher than a simple IoT device ($c_{\mathrm{server}} < c_{\mathrm{gateway}} < c_{\mathrm{device}}$). Thus, in this work, $t_{\mathrm{proc}}$ is modelled based on the computational powers of the processing locations, i.e., it scales with $c$ at the chosen location. In addition, while the input to the task processing stage is large raw data, the output is compressed data of comparably smaller volume. To that end, the compression rate between the input and output data volumes is given as $r = V_{\mathrm{proc}} / V_{\mathrm{raw}} \le 1$, where $V_{\mathrm{raw}}$ and $V_{\mathrm{proc}}$ are the volumes of raw and processed (compressed) data, respectively.
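
A minimal sketch of this delay and compression model, assuming a linear cycles-per-bit relation; the clock rates and argument names are illustrative.

```python
def processing_delay_s(raw_bits, cycles_per_bit, clock_hz):
    """Processing delay grows with the cycles-per-data-element figure c of the
    chosen location (c_server < c_gateway < c_device); linear model assumed."""
    return raw_bits * cycles_per_bit / clock_hz

def compressed_volume(raw_bits, compression_rate):
    """Volume leaving the processing stage; compression_rate = V_proc / V_raw <= 1."""
    return raw_bits * compression_rate

# Processing on the device is slower than on the server for the same payload.
device_delay = processing_delay_s(1e4, cycles_per_bit=100, clock_hz=50e6)
server_delay = processing_delay_s(1e4, cycles_per_bit=5, clock_hz=2e9)
```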

The transmission delay is affected by the type of radio access technology and the volume of data to be transmitted. Since WiFi access employs the unlicensed frequency bands, it often suffers from higher retransmission rates due to frequent collisions, which results in increased transmission delays. Therefore, in this work, this effect is captured by a factor $\phi \ge 1$, whereby the delay incurred for transmitting the same volume of data over WiFi is $\phi$ times higher than that over LTE or NB-IoT. This model is represented in Figure 3, in which the source can be either the IoT device or the gateway, and the recipient can be either the gateway or the cloud.

Consequently, the overall response time for each action is calculated as

$T = \sum_{h=1}^{H} \left( t_{\mathrm{proc}}^{(h)} + t_{\mathrm{tx}}^{(h)} \right)$,  (4)

where $H \in \{1, 2\}$ is the number of hops, and $t_{\mathrm{proc}}^{(h)}$ and $t_{\mathrm{tx}}^{(h)}$ represent the values of $t_{\mathrm{proc}}$ and $t_{\mathrm{tx}}$ for the $h$-th hop, respectively. The calculated values then populate Table 2 after the application of feature scaling into the range (0, 1] using the function

$\tilde{x} = \dfrac{x}{\max(\mathcal{X})}$,  (5)

where $\mathcal{X}$ is the set of values taken by $x$. Note that both (a) and (b) type connections constitute the first hop, while the connection type (c) is the second hop.
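
Putting the pieces together, a short Python sketch of the uplink response time in (4) and of the feature scaling in (5); the WiFi factor and rates are placeholders.

```python
def transmission_delay_s(bits, rate_bps, wifi_factor=1.0):
    """Per-hop transmission delay; WiFi hops are penalised by a retransmission
    factor >= 1 relative to LTE/NB-IoT."""
    return wifi_factor * bits / rate_bps

def response_time_s(proc_delays_s, tx_delays_s):
    """Overall uplink response time as in (4): processing plus transmission
    delay summed over the one or two hops of the route."""
    return sum(proc_delays_s) + sum(tx_delays_s)

def feature_scale(values):
    """Scale positive values into (0, 1] by dividing by the maximum, cf. (5)."""
    top = max(values)
    return [v / top for v in values]

# Two-hop route: WiFi to the gateway (compression there), then LTE to the cloud.
t = response_time_s([0.02, 0.0],
                    [transmission_delay_s(1e4, 1e6, wifi_factor=3.0),
                     transmission_delay_s(2e3, 5e6)])
```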

4. Machine Learning-Based Solution

In this work, we propose to employ reinforcement learning (RL), a machine learning technique based on a goal-seeking approach. It is a trial-and-error approach in which the agent (or learning device) learns to take the correct action by interacting with its surroundings and being rewarded or penalised in each iteration. RL is selected in this work due to its strong fit with the presented problem. For example, IoT devices need to interact with their environment in order to assess the circumstances and to take subsequent actions, namely, determining the connection type and the data processing location. RL maps to this requirement very well, since it allows optimisation through environmental interactions.

Being one of the most prominent reinforcement learning techniques, Q-learning aims to find the optimum policy for a given problem, that is, the best action to take in any given state. To do this, the agent takes an action and evaluates the subsequent reward/cost of taking that action given that it was in a certain state. This reward/cost is then used to update a look-up table known as the Q-table, which is later utilised by the agent to select the best action. Further, the agent calculates the Q-value for every possible state/action pair. Therefore, a simple implementation can result in the agent learning the best actions online, regardless of the policy being followed.

Moreover, Q-learning offers two key features which enable an efficient solution to our problem. First, as it is a model-free learning approach [21, 22], it is (1) capable of operating in dynamically changing environments and (2) a low-complexity algorithm which does not require much computational power, thus reducing the energy consumption of the IoT network. Second, Q-learning is known to converge in most cases [23], which has also been demonstrated in multiagent noncooperative environments [24], such as IoT networks.

We propose a two-stage approach to solve the energy-aware smart IoT connectivity problem, where each of the stages employs Q-learning.

4.1. First Stage Learning

Stage 1 consists of learning the best combination of connectivity and processing location in view of the device and application requirements and the limitations of each of these options. Thus, there are five possible actions that may be taken by each device, as described in Table 2. As a side note, all the variables in Table 2 are feature-scaled values (into the range (0, 1]) calculated through (5). The tuples shown represent the limitations of each action, namely, the response time and energy consumption described in Sections 3.3 and 3.2, respectively, the available processing capacity, the processing cost (as defined in Table 1), and the security level. The security level refers to the protection offered by the wireless technology, indicating either eSIM protection (only provided by NB-IoT) or the absence of it. Moreover, each device may be in four different states, as shown in Table 3, depending on the context-aware constraints defined jointly by the device and application. These constraints represent the response time, security level, and computational power requirements, respectively.
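
Since Tables 2 and 3 are not reproduced here, the following sketch only illustrates one plausible in-memory representation of the five actions and of the mapping from a device's requirements to a state; the field names, labels, and state ordering are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One row of Table 2 (assumed layout): a connectivity route plus a
    processing location, with its feature-scaled characteristics."""
    name: str             # e.g. "NB-IoT + cloud processing" (hypothetical label)
    response_time: float  # in (0, 1]
    energy: float         # in (0, 1]
    capacity: float       # available processing capacity
    cost: float           # processing cost
    esim_secure: bool     # True only for routes over NB-IoT

def device_state(needs_fast_response, needs_esim, needs_high_compute):
    """Map the context-aware constraints onto one of four states; Table 3 is
    not reproduced, so this ordering is illustrative only."""
    return int(needs_fast_response) + int(needs_esim) + int(needs_high_compute)
```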

4.1.1. Penalty Function Determination

Each device estimates the penalty function associated with each possible action it is able to take, following the scheme shown in Table 3, where the dissatisfaction terms capture the difference between the available and required characteristics. The fourth penalty compares the processing cost offered by the action with the available budget.

The penalty function determination policy aims to satisfy the optimisation objective by including the elements that are to be minimised. As seen from Table 3, the penalty functions consist of three main elements: a constant term, a dissatisfaction level, and an energy consumption term. The constant value is the cost of being in a given state, and it decreases as the level of the state increases. This element compels the agent to try to reach the highest possible state level, as this is one of the objectives of the optimisation problem. The dissatisfaction element, complementing the constant term, incurs a cost for not satisfying the device requirements, in order to improve the satisfaction levels. Lastly, the energy consumption element drives the minimisation of the end-to-end energy consumption (connection and data processing). This element depends on the battery level, ranging from an empty battery to a full charge, and on a battery priority parameter that specifies the priority given to the energy consumption. For instance, low values of this parameter prioritise the energy consumption only once the battery level is very low (e.g., 5%), while high values prioritise the energy consumption even when the battery level is high (e.g., 50%).

In addition, the algorithm normally tends to select an option with cloud processing, as it is the most energy-efficient one. However, some amount of data will not be offloaded due to budget constraints and will then be processed locally, which is the most energy-consuming option. Note that this amount is determined by the second-stage learning. Thus, the option selected by the first stage could end up consuming more energy than an option that includes fog processing, as the processing will be split between the cloud and the device. Therefore, the last parts of the penalty functions (inside the square brackets) prevent the algorithm from making blind decisions that ignore the budget availability, by including the average energy consumption of the actions that involve device processing. The average value is used because the final action is yet to be determined during the learning process. The coefficients of these three elements are determined empirically; however, they can be adjusted to prioritise whichever element is to be minimised more aggressively.
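
As Table 3 is not reproduced, the following is only a rough sketch of how such a penalty could be assembled; the functional forms (in particular the battery weighting) and the coefficients are assumptions chosen to reproduce the behaviour described above.

```python
def stage1_penalty(state_level, dissatisfaction, energy, battery, priority,
                   c_state=1.0, c_dissat=1.0, c_energy=1.0):
    """Rough shape of a Table 3 penalty: a constant that shrinks as the state
    level rises, a dissatisfaction term, and an energy term whose weight grows
    as the battery drains. All forms and coefficients are assumptions."""
    constant = c_state * (4 - state_level)                # cheaper to be in a higher state
    battery_weight = (1.0 - battery) ** (1.0 / priority)  # low priority: only near-empty batteries count
    return constant + c_dissat * dissatisfaction + c_energy * battery_weight * energy
```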

The Q-table entries are then updated according to the following expression, where $s$, $s'$, $p$, and $a$ are the current state, next state, penalty function, and action under evaluation, and $\alpha$ and $\gamma$ are the learning rate and discount factor, respectively:

$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ p + \gamma \min_{a'} Q(s', a') \right]$.  (6)
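
In code, the update in (6), together with an epsilon-greedy action choice, can be sketched as follows; the hyperparameter values are illustrative.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration (illustrative)
Q = defaultdict(float)                   # Q[(state, action)], zero-initialised

def select_action(state, actions):
    """Epsilon-greedy selection: usually the lowest-penalty action seen so far."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return min(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, penalty, next_state, actions):
    """The update in (6), written for a penalty (cost) that is minimised."""
    best_next = min(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * (penalty + GAMMA * best_next)
```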

4.2. Second Stage Learning

The second stage aims to find the best policy for task offloading by considering the budget and the availability of the fog or cloud. To this end, the second stage is activated only when the action taken in Stage 1 does not result in local (device) processing. In Stage 2, Q-learning is also employed, with 21 possible actions, each corresponding to a different share of the data to be offloaded, and the constraints are the available budget and the availability of the fog and/or cloud. The resulting states and penalty functions for this stage are listed in Table 4.

4.2.1. Penalty Function Determination

The penalty function of this stage is determined with a procedure similar to that of the first stage; hence, there are three cost elements: a constant term, the energy consumption, and the monetary cost. As in the first stage, the constant value ensures ending up in the highest possible state. Having the energy consumption and monetary cost elements simultaneously enables finding the best trade-off between the two. However, unlike in the first stage, these elements are calculated for the share of data that is planned to be offloaded, as specifying the best amount is the objective of this learning stage. Similarly, the coefficients are obtained empirically.
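
Again, since Table 4 is not reproduced, the sketch below only illustrates one plausible shape of such a penalty for a candidate offload share; every functional form and coefficient is an assumption.

```python
def stage2_penalty(state_level, offload_share, data_bits, budget,
                   energy_per_bit_local, energy_per_bit_offload, price_per_bit,
                   c_state=1.0, c_energy=1.0, c_cost=1.0):
    """Rough shape of a Table 4 penalty for offloading `offload_share` of the
    data: the offloaded part is cheap in energy but billed, the remainder is
    processed on the device. All forms and coefficients are assumptions."""
    offloaded = offload_share * data_bits
    local = data_bits - offloaded
    energy = offloaded * energy_per_bit_offload + local * energy_per_bit_local
    money = offloaded * price_per_bit
    over_budget = max(0.0, money - budget)            # discourage exceeding the budget
    return c_state * (4 - state_level) + c_energy * energy + c_cost * (money + 10 * over_budget)
```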

The two learning stages and their interaction are depicted in Algorithms 1 and 2, respectively.

Data: Context-aware constraints, available computational capacity in gateway and eNB, budget
Result: Combination of connectivity route and processing venue
initialization;
for all IoT devices do
    Determine the current state using Table 3;
    Evaluate all the actions;
    Calculate the penalty using Table 3;
    Select the best action;
    Jump to the next state;
    Update the Q-table;
    if the selected action includes fog (gateway) or cloud (eNB) processing then
        go to Algorithm 2;
    end
end
Data: Action selected by the first stage, available computational capacity in gateway and eNB, budget
Result: Share of data to be offloaded
initialization;
for all IoT devices do
    Determine the current state using Table 4;
    Evaluate all the actions;
    Calculate the penalty using Table 4;
    Select the best action;
    Jump to the next state;
    Update the Q-table;
end
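
For readers who prefer code, the two listings can be condensed into the following Python skeleton; the `env` callbacks stand in for the state and penalty definitions of Tables 3 and 4 and are therefore placeholders, and the helper functions mirror the Q-learning update sketched in Section 4.1.1.

```python
import random
from collections import defaultdict

def greedy(q, state, actions, eps=0.1):
    """Epsilon-greedy choice over a penalty-valued Q-table."""
    if random.random() < eps:
        return random.choice(actions)
    return min(actions, key=lambda a: q[(state, a)])

def q_update(q, s, a, penalty, s_next, actions, alpha=0.1, gamma=0.9):
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (
        penalty + gamma * min(q[(s_next, b)] for b in actions))

def run_iteration(devices, env, stage1_actions, offload_shares, q1, q2):
    """One pass of Algorithms 1 and 2 over all devices; q1 and q2 are
    defaultdict(float) Q-tables carried over between iterations."""
    for dev in devices:
        s1 = env.state_stage1(dev)                    # Table 3 state
        a1 = greedy(q1, s1, stage1_actions)
        p1, s1_next = env.penalty_stage1(dev, a1)     # Table 3 penalty
        q_update(q1, s1, a1, p1, s1_next, stage1_actions)
        if env.uses_fog_or_cloud(a1):                 # otherwise keep everything local
            s2 = env.state_stage2(dev)                # Table 4 state
            a2 = greedy(q2, s2, offload_shares)       # share of data to offload
            p2, s2_next = env.penalty_stage2(dev, a2) # Table 4 penalty
            q_update(q2, s2, a2, p2, s2_next, offload_shares)
    return q1, q2
```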

5. Results and Analysis

In this section, we implement the proposed reinforcement learning approach in a simulation environment, as shown in Figure 4, using the parameter values defined in Table 1. We consider that half of the IoT devices connect with NB-IoT in view of their data privacy and related security requirements; these represent Group A. The remaining devices connect to the eNB through the WiFi gateway, hence over two wireless hops, and represent Group B. Consequently, there are six possible fixed scenarios that may be formed by selecting the processing location of each group of devices; these are listed in Table 6. A large number of iterations is conducted and, in each, random battery levels are allocated to the devices.

We compare the results obtained with our method to the six listed scenarios in terms of five different parameters: energy, cost, dissatisfaction, number of out-of-budget devices, and joint penalty. First, energy represents the end-to-end energy consumption caused by both the connection and the data processing. Second, cost is the overall monetary cost incurred by the use of the data processing locations, such as the fog and the cloud. Third, dissatisfaction is a measure of the total number of device requirements that are not satisfied. Fourth, the number of out-of-budget devices reflects the count of devices that exceed their available monetary budgets while performing their tasks. Finally, the joint penalty is the cumulative combination of the previous four parameters (energy, cost, dissatisfaction, and number of out-of-budget devices).

The results in terms of gain (positive values) and loss (negative values) are shown in Figure 5. Note that the gain/loss values for the parameters energy, cost, dissatisfaction, and joint penalty are obtained as the relative difference $(v_{\mathrm{s}} - v_{\mathrm{q}})/v_{\mathrm{s}}$, where $v_{\mathrm{s}}$ and $v_{\mathrm{q}}$ are the values from Table 5 for Scenarios A–F and Q-learning, respectively.

The gain/loss values for the number of out-of-budget devices in Figure 5, on the other hand, are calculated with a similar relative measure applied to the corresponding device counts in Table 5.

It is worth noting that the results provided in Figure 5 are evaluated using the average values given in Table 5 along with 95% confidence intervals. Moreover, the joint penalty parameter in Table 5 is calculated by summing the other four parameters (energy consumption, cost, dissatisfaction, and number of out-of-budget devices). However, before the summation, these four parameters are feature scaled into the range (0, 1] using the function in (5) in order to keep their impacts on the same scale.
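
A compact sketch of how the joint penalty column and the gain/loss bars could be computed from the per-scenario metrics; the relative-gain normalisation mirrors the assumed form above.

```python
def joint_penalty(energy, cost, dissatisfaction, out_of_budget):
    """Feature-scale each metric list (one value per scenario) into (0, 1]
    with (5) and sum across metrics, mirroring the joint penalty of Table 5."""
    columns = [energy, cost, dissatisfaction, out_of_budget]
    scaled = [[v / max(col) for v in col] for col in columns]
    return [sum(vals) for vals in zip(*scaled)]

def relative_gain_percent(scenario_value, q_learning_value):
    """Gain (positive) or loss (negative) of Q-learning relative to a fixed
    scenario; the normalisation by the scenario value is an assumption."""
    return 100.0 * (scenario_value - q_learning_value) / scenario_value
```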

Our method outperforms every fixed combination when examining the joint, or holistic, gain. Similarly, the reinforcement learning technique results in a better match between the context-aware constraints and the availability of the IoT network compared to any other scenario. Although the processing cost of our proposed method is higher than that of Scenario A, the resulting gain in energy saving, as well as in context-aware constraint compliance, is even more important. The closest contender to reinforcement learning, with respect to the generated results, is Scenario C, in which the processing of the Group A IoT devices is conducted locally while that of Group B occurs in the gateway. Nonetheless, reinforcement learning allows for a device-driven, context-aware connectivity that improves the compliance criteria by more than a factor of two while also saving energy, resulting in a clear holistic gain. Scenario D manages to reduce the energy consumption more than our proposed approach at the same total cost; however, some of the devices end up out of budget, resulting in incomplete or interrupted computational tasks. Moreover, in this scenario, connected devices are more than twice as likely to be dissatisfied with one or more of the context-aware requirements.

Next, we examine the impact of the battery priority factor on the energy efficiency. As shown in Figure 6, low values of this factor result in almost neglecting the battery life of the device in the optimisation process until it drops to very low levels. Very high values prioritise the reduction of energy consumption for all devices except those with a nearly full battery. To this end, it is possible to tune this parameter depending on the scenario at hand and in a device-specific manner. For instance, some devices may be part of a moving vehicle with the possibility of agile and low-cost battery replenishment. Such devices may benefit from low settings of the factor to allow more flexibility in meeting the remaining constraints. Other devices may be in hard-to-reach places and would require skilled personnel, special equipment, and hence high cost to replace a dead battery. In this case, higher settings are more suitable and would result in a better cost-to-quality ratio.

The simulation results achieved in this work are very promising, as they indicate a large margin for improvement that is not attainable with fixed connection schemes. The proposed reinforcement learning method relies on centralised intelligence, which has access to all the constraints and requirements of all devices, gateways, and connections. Hence, the Q-learning-based method selects the best action (the connection type/processing location pair in the first stage, and the share of data to be offloaded in the second stage) after convergence. We appreciate that such a deployment is not always realistic and propose to explore the feasibility and corresponding gains of multiagent and distributed reinforcement learning, as adopted in [24], in our future work. Nonetheless, this work is the first to highlight the importance of context-aware connectivity in the IoT context that jointly addresses security, energy, and computational power as well as cost. We present a new application, smart ports, quantify the potential margin for improvement offered by the novel scheme, and highlight its effects on the application.

6. Conclusion

In this work, we have presented a novel approach for energy-aware and context-aware IoT connectivity that jointly optimises the energy, security, computational power, and response time of the connection. The proposed scheme employs reinforcement learning and achieves a significant holistic gain compared to deterministic routes. Although some deterministic scenarios may result in a lower computational cost or lower energy consumption, none is able to meet the holistic context-aware performance target. In addition, we presented an analysis of the impact of the energy prioritisation factor, in which we demonstrated the importance of tuning this parameter in a device-centric manner in order to achieve a better optimisation of the whole system.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partly funded by EPSRC Global Challenges Research Fund—the DARE Project—EP/P028764/1. The first author was supported by the Republic of Turkey Ministry of National Education (MoNE-1416/YLSY).