Abstract

This paper reviews the application of intelligent vision enhancement technology in the battlefield environment and explores new research directions. The paper comprises three main parts. First, we introduce solutions for enhancing the battlefield situation and for using head-mounted displays in soldier combat. Second, we summarize three core technologies supporting intelligent vision enhancement technology: 3D environment reconstruction, tracking and registration, and situational awareness. Third, we summarize three application directions of intelligent vision enhancement technology on the battlefield. Finally, the problems and challenges in this line of research are discussed, including current issues such as accurate battlefield situational awareness with augmented reality technology and multiperson collaborative management and data flow, as well as challenges in future battlefield situation enhancement and perception, and the future development trend of intelligent vision enhancement technology on the battlefield is forecast. The significance of this paper is twofold: first, it reviews the research status of intelligent vision enhancement technology at the technical level and identifies the key technical points that may restrict development in the future; second, it analyzes the advantages and disadvantages of intelligent vision enhancement technology at the level of battlefield application and user roles. In addition, this paper proposes how to seize the lead and the initiative in future wars.

1. Introduction

Given the intelligent and integrated needs of future battlefields, military training based solely on the firing-range environment can hardly cope with a battlefield characterized by diversified settings, complicated individual soldier behavior, and diversified combat effectiveness [1–3]. To accelerate the processing, integration, and visualization of complex battlefield data and to improve battlefield perception and decision-making ability, this paper explores the development and technological innovation of intelligent vision enhancement (IVE) technology for situation awareness and decision-making assistance in complex battlefield environments from the perspective of actual soldier engagement.

In recent years, advances in optical devices and graphics computing power have enabled IVE technology based on augmented reality (AR) to become deeply involved in soldier combat, collaborative operation command, campaign and tactical confrontation, battlefield situation calculation, and the equipment research of various military services. How to apply IVE technology to actual soldier combat has become a research hotspot in the military field, where commanders and soldiers have distinct core user needs. Although augmented reality sand tables and immersive battlefield maps oriented to the command level have been widely used, problems such as significant differences in users’ battlefield situational awareness and the lack of “unity” in their cognition at the tactical and even battle level remain unsolved, leaving much room for development.

Compared with traditional methods, augmented reality-based military applications have the following advantages [4, 5]: (1) AR military applications usually rely on 3D simulation engines to provide users with realistic battlefield scenes, with the two significant features of “immersion” and “imagination,” enabling commanders to better control large-scale military tasks and develop refined combat operations, forming a perception of combat space, time, and force; (2) by using AR glasses, AR headphones, motion capture sensors, positioning devices, and other devices to fuse virtual and real battlefield information, users gain a realistic sense of the scene, the training cost of field soldiers can be effectively reduced, and real-time battlefield perception can be improved; (3) AR military applications can effectively simulate the performance of “generation-gap” weapons and equipment and realize genuine “red versus blue” confrontation; (4) AR military applications can test the confrontation effect of equipment not yet in service on either side and support “predictive” advanced training; and (5) overlaying virtual objects that do not exist in the real world onto the real scene to build a battlefield environment with a high sense of realism can effectively solve the lack of “realism” in ordinary computer simulation and modeling [6].

By using recognition and tracking based on an augmented display to obtain battlefield information and weapon firing state, the battlefield environment and soldier combat can be realistically simulated, and a sense of immersion and confrontation in the virtual battlefield can be constructed to improve participants’ presence and scene awareness and to strengthen battlefield adaptability [7]. After years of research, many such systems have entered practical use and are widely applied in individual soldier training, combat command training, and campaign and tactical confrontation training across various arms [8]. When AR was first proposed in the United States, it was used in the military field, and after years of development, AR technology has been applied there to the greatest extent. The AR training systems developed by the US military include the following: (1) the Dismounted Soldier Training System (DSTS) [9]; (2) the Battlefield Augmented Reality System (BARS) [8], an augmented reality system developed by the US military to provide individual soldiers with environmental location and collaboration information; (3) the ULTRA-Vis project [10], a US Defense Advanced Research Projects Agency research project whose purpose is to develop a lightweight, low-power wearable device that provides soldiers with battlefield situational awareness and improves individual combat capability; and (4) the TAR system [11]: with this AR combat assistance system, called “tactical augmented reality (TAR),” it has become possible to accurately locate the enemy and control the overall situation at 360° without blind spots.

The research and development of high-sensitivity sensors and innovations in augmented reality display technology provide more ideas for the development of IVE technology, so that military training and actual combat exercises are no longer limited to the traditional technical system; instead, research targets how to make full use of weapons and equipment integrating sensor and IVE technology to enhance combat effectiveness in the battlefield environment.

This paper mainly explores the development of IVE technology and its application in the military field. The main contributions are summarized as follows: (1) based on tactical augmented reality, this paper reviews solutions for enhancing battlefield situations and the use of head-mounted display (HMD) devices in actual soldier combat; (2) three core technologies supporting IVE technology are summarized: 3D environment reconstruction, tracking and registration, and situational awareness, including implementation methods and an analysis of their advantages and disadvantages; and (3) from the perspective of users in actual soldier combat, IVE technology is divided into three main application directions: control for commanders, situational awareness for actual soldiers, and collaborative decision-making for unmanned combat.

The organizational structure of this paper is as follows: the first section introduces the research background; the second section reviews solutions for enhancing the battlefield; the third section introduces the core technologies of IVE in detail; the fourth section introduces battlefield IVE technology; and the fifth section concludes and looks forward to future work.

2. Solutions for Enhancing the Battlefield

This chapter focuses on how to enhance battlefield information and how to use HMD devices in actual soldier combat. The main method for enhancing battlefield information is tactical augmented reality. The chapter briefly reviews battlefield information-enhanced display methods, starting from the head-up display (HUD) of fighter aircraft through to the current state of portable HMDs integrated with IVE technology, and then introduces HMDs for actual soldier combat, including their development status and a performance comparison of different optical components.

2.1. Tactical Augmented Reality

Augmented reality superimposes computer-generated objects or information onto real scenes, and users can interact with the virtual objects naturally, achieving a visual augmentation of the physical world [12]. In 1966, Ivan Sutherland, the father of computer graphics and augmented reality, and his students jointly developed the world’s first AR system device, which used CRT optical see-through technology to present 3D perspective views in real time; it became the pioneer of 3D human-computer interaction and VR technology and is known as the Sword of Damocles. In 1968, Furness and others first applied AR technology to the military field, drawing on Sutherland’s work to design an AR display that superimposes military information such as range and shooting targets on the pilot’s field of vision. In the early 1990s, Tom Caudell of Boeing and colleagues coined the term “augmented reality” in connection with an auxiliary wiring system. In 1999, Hirokazu Kato of the Nara Institute of Science and Technology developed the first open-source AR framework under the GPL license, ARToolKit [13]; the framework integrates 6-DOF pose tracking and achieves recognition and detection through square fiducials and template-based methods. In 2000, Bruce Thomas and others publicly demonstrated ARQuake, a first-person-perspective augmented reality version of the game “Quake,” whose tracking system mainly uses GPS, a digital compass, and fiducial markers [14]. It was not until the birth of Google Glass, Google’s AR glasses, in 2012 that public attention returned to AR. Then, around 2015, many typical devices and applications emerged, represented by the mobile augmented reality game “Pokémon GO” released by Niantic and the first-generation HoloLens AR device released by Microsoft.

The scope of AR research is broad, including signal processing, computer graphics and image processing, human-computer interaction and psychology, mobile computing, computer networks, distributed computing, information acquisition and visualization, and the design of new displays and sensors [15]. The earliest visual enhancement was the head-up display, which superimposed critical information (including spatial orientation data and weapon aiming) onto the pilot’s display panel or face mask for better situational awareness (see Figure 1).

The appearance of AR smart glasses has led to rapid development of battlefield situation perception technology. An augmented reality-based HMD can help soldiers obtain real-time battlefield information, such as their equipment status, target intelligence, and topographic data, without obstructing their view, and coordinate with friendly fighters and rear units to achieve long-range, large-scale effects, seizing the battlefield initiative, reducing battlefield risks, and improving capability and efficiency. Additionally, AR smart glasses can simulate complex battlefield environments and be used for military training and combat command to assist in formulating combat plans and evaluating combat effectiveness [16]. The United States Army Communications-Electronics Research, Development and Engineering Center (CERDEC) of the US Army Research, Development and Engineering Command (RDECOM) has begun a major effort on such devices and is actively studying the potential of augmented reality technology. More visual enhancement possibilities are provided by tactical augmented reality (TAR), which shows the exact position of soldiers as well as allied and enemy positions. The system is mounted on the helmet in the same way as goggles and works both at night and during the day. TAR essentially replaces the typical handheld GPS and goggles, so soldiers no longer have to look down to check their GPS position. The visual sighting of recognized targets and point-of-interest icons in a head-mounted display is based on intelligent vision enhancement technology (see Figure 2). This enhancement provides more maneuverability for live combat: adding enemy target information, minimaps, friend-or-foe marks, and mission guidance to the visual interface in real time brings great benefits in updating the battlefield situation and guiding collaborative actions. Realizing this functionality depends on the positioning and registration technology of AR. Because soldiers move quickly and their pose must be updated in real time, positioning and registration technology requires continued research into portability and high computing power, especially in the single-soldier combat scenario [17].
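
To make the registration step above concrete, the following minimal Python sketch projects a target’s position in the soldier’s local world frame into HMD screen pixels using a pinhole camera model. All names, intrinsics, and coordinates are illustrative assumptions, not the interface of any fielded TAR system.

```python
# Minimal sketch of AR registration: projecting a target's 3D position (in the
# soldier's local world frame) into HMD screen pixels. All values are
# illustrative assumptions, not a specific fielded system.
import numpy as np

def project_to_screen(p_world, R_wc, t_wc, K):
    """Project a world-frame point into pixel coordinates.

    R_wc, t_wc: rotation/translation taking world points into the camera frame
    (supplied by the HMD's tracking-and-registration module).
    K: 3x3 camera/display intrinsic matrix.
    """
    p_cam = R_wc @ p_world + t_wc
    if p_cam[2] <= 0:            # behind the viewer: nothing to draw
        return None
    u, v, w = K @ p_cam
    return np.array([u / w, v / w])

K = np.array([[900.0, 0, 640],   # assumed focal length / principal point
              [0, 900.0, 360],
              [0, 0, 1]])
R = np.eye(3)                    # head pose from the tracker (identity here)
t = np.zeros(3)
enemy = np.array([12.0, -3.0, 40.0])  # target roughly 40 m ahead, in metres

pixel = project_to_screen(enemy, R, t, K)
print(pixel)                     # where to draw the enemy-target icon
```

In a real HMD pipeline, R and t would be refreshed every frame by the tracking module, which is precisely why registration robustness under fast soldier movement matters.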

In addition, a handheld weapon can be fitted with a wireless hotspot that links it to the tactical augmented reality eyepiece and a mobile computing module carried by the soldier, allowing the soldier to see where the weapon is aimed and the distance to the target. The display can be split into two parts, showing both where the gun is pointed and the view of a forward-facing camera on the helmet. For example, a soldier can look around a corner or over a wall without any risk of being shot in the head.

2.2. HMD for Actual Soldier Combat

According to the near-eye display element used, AR display devices can be mainly divided into those based on traditional optical lenses and those based on optical waveguide elements. In the lens-based approach, the light emitted by the optical engine and the light from the external environment are superimposed through a prism and coupled into the human eye, achieving the superposition of the virtual and the real. The most representative lens-based AR display device is Google Glass, released by Google in 2012. This solution has a small field of view and low brightness, and the use of prisms makes it difficult to reduce product weight [18].

In an AR HMD, the virtual image formed by the display is coupled into the human eye through the optical element, while external light is transmitted to the eye through the same optical device. The user therefore experiences a more realistic and natural superposition of the virtual and real scene, achieving the purpose of augmented reality [19, 20]. The working principle of AR glasses is shown in Figure 3.

The optical waveguide near-eye display is a new solution for optical near-eye displays. The optical waveguide module strongly affects the weight, field-of-view angle, and clarity of AR glasses. Dispelix focuses on developing surface relief grating optical waveguides that can be combined with optical wafers with a refractive index of 2.0, aiming to provide high-quality transparent optical waveguide devices for AR. Its unique etching process offers greater freedom, meaning that optical waveguide performance can be further improved: the process allows multiple coatings (up to 4 layers) to be superimposed on the wafer, giving RGB single-layer optical waveguide designs more freedom. A waveguide lens is used as the transmission medium to realize pupil expansion and increase the eye box. Compared with other display solutions, the optical transmittance is higher, the field of view is larger, the imaging quality is better, and the size is more compact, making it the mainstream development solution in the consumer market [12].

The structure and layout of holographic waveguides are important considerations in optical system design, especially for free-form surface holographic waveguide helmet-mounted display systems. Holographic waveguide structures are diverse, and each has its advantages and disadvantages; a reasonable structure type must be selected according to actual needs, with the goal of realizing a free-form surface holographic waveguide display. The imaging principle of the free-form surface holographic waveguide helmet display system is similar to that of an ordinary optical system, in which the free-form surface and the grating can be regarded as reflective or refractive elements. Because of the complex surfaces and grating structures of free-form holographic waveguides, the imaging laws are more complicated, and further research is needed [21]. A head-mounted display based on free-form technology is shown in Figure 4 [22].

Table 1 summarizes the various optical combiner devices used in AR head-mounted displays and their related performance parameters. Among them, the birdbath, free-form mirror, and free-form prism structures offer better imaging quality, but their larger volume and thick lenses limit their application in AR head-mounted displays. The arrayed waveguide has the advantages of light weight, a wide eye-movement range, and color uniformity; its design process has matured, but it has not grown into a mainstream solution for AR head-mounted displays. The surface relief grating (SRG) structure has the advantages of a large field of view (FOV) and a large eye-movement range and can leverage nanofabrication technology, so it is attracting attention. Both the Microsoft HoloLens and Magic Leap One head-mounted displays adopt the SRG solution, which can be considered the current mainstream solution for AR head-mounted displays, although the design scheme and mass production process still need improvement. Off-axis holographic lenses offer a large field of view and low weight but are limited by a relatively small eye-movement range. A volume holographic grating waveguide, which uses holographic gratings as the input and output coupling gratings of the waveguide combiner, can effectively reduce device weight compared with an arrayed waveguide [23].

BMI (Battelle Memorial Institute), a private nonprofit research and development company based in Columbus, Ohio, demonstrated a tactical augmented reality application based on Google Glass (TARA: tactical AR application). The TARA application uses video recognition technology to provide an assessment of potentially harmful substances for emergency personnel carrying chemical, biological, and radiological detection equipment. A “tactical-grade augmented reality application” could also be used with the Oculus Rift virtual reality headset as part of an immersive training environment: using built-in head sensors and a pair of handheld sensors, trainees can interact with virtual targets in hazmat, explosives, and weapons-of-mass-destruction training scenarios. The US Air Force Research Laboratory (AFRL) has reexplored the prospect of using Google Glass for airborne rescue and joint tactical control in a program dubbed BATMAN II [24].

BAE Systems has developed Q-Warrior, a Google Glass-like head-mounted display (HMD) for the individual soldier that uses a monocular display. It can improve the situational awareness of individual soldiers and provide them with intelligence through superimposed data and video. The main features of the device are enhanced nighttime imaging and path-planning abilities, the capacity to identify and track enemy forces, the tracking of personnel and equipment, and support for team coordination [25].

The US Office of Naval Research is also working with eyewear maker Vuzix Corp. and head-up display maker Six15 Technologies to create a Google Glass-like device that can be integrated with standard military goggles. The US Navy plans to use this head-mounted augmented reality display for military training: trainees can add virtual characters, targets, and various special effects to their field of vision through the device. In March 2022, Mojo Vision, an augmented reality (AR) contact lens manufacturer based in Saratoga, California, USA, announced that it was one step closer to its goal of placing a standalone head-up display on the eye of its customers [26].

3. Core Technology of Intelligent Vision Enhancement for Battlefield

This chapter introduces the core technologies of IVE in three parts: 3D environment reconstruction, spatial tracking and registration, and situational awareness. These three core technologies are essential to support augmented reality applications, and together they form the complex system that IVE must realize in the battlefield environment [27].

3.1. Intelligent 3D Environment Reconstruction

In 1963, Roberts [28] first proposed the possibility of using computer vision to obtain three-dimensional information about objects from two-dimensional images; from this time onward, vision-based three-dimensional reconstruction developed rapidly, and many new methods emerged. In 1995, Kiyasu et al. [29] of the University of Tokyo, Japan, used M-array coded light patterns reflected by an object to reconstruct its surface. In 2006, Snavely et al. [30] developed two 3D reconstruction systems that enable interactive browsing and exploration of large unstructured collections of photographs of a scene through a novel 3D interface. In 2013, the KinectFusion project [31] of Microsoft Research made a breakthrough in 3D reconstruction: unlike 3D point cloud stitching, it uses a Kinect to continuously scan an object and reconstruct its 3D model in real time, effectively improving reconstruction accuracy.

Commonly used vision-based three-dimensional reconstruction techniques fall into the following two categories:

3.1.1. 3D Reconstruction Technology Based on Active Vision

3D reconstruction technology based on active vision mainly includes the laser scanning method [32], the structured light method [33], time-of-flight (TOF) technology [34], radar technology [35], and Kinect-based technology [36]. These methods use optical instruments to scan the surface of the object and then rebuild its three-dimensional structure by analyzing the scanned data. In addition, they can obtain other detailed information about the target surface, allowing the three-dimensional structure of the target object to be reconstructed accurately.
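
As a concrete illustration of how active-sensor output becomes 3D structure, the following Python sketch back-projects a depth map (as produced by a TOF or Kinect-style sensor) into a point cloud using the pinhole model; the intrinsic parameters and synthetic depth values are assumptions for illustration.

```python
# Minimal sketch, assuming a metric depth map from an active sensor
# (TOF/Kinect): back-project each pixel into a 3D point cloud.
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5  # assumed Kinect-like intrinsics

def depth_to_point_cloud(depth):
    """depth: HxW array of metric depth values (0 = no sensor return)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx        # back-project along each pixel ray
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]    # drop pixels with no return

depth = np.random.uniform(0.5, 4.0, size=(480, 640))  # synthetic scan
cloud = depth_to_point_cloud(depth)
print(cloud.shape)               # (N, 3) surface points for reconstruction
```

Systems such as KinectFusion accumulate many such per-frame clouds into a single volumetric model while tracking the sensor pose.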

3.1.2. 3D Reconstruction Technology Based on Passive Vision

Passive vision-based 3D reconstruction technology obtains image sequences through visual sensors (one or more cameras) and then performs 3D reconstruction; methods are commonly classified according to the number of cameras. The technology first acquires image sequences through the visual sensors, then extracts useful information from them, and finally reconstructs the three-dimensional structural model of the object by reverse engineering that information. The advantage of this approach is that it can be applied in various complex environments and is a good complement to active vision methods; it is also inexpensive, simple to operate, highly real time, undemanding with respect to lighting and scene, and easy to deploy. The disadvantage is that the reconstruction accuracy is not very high.

3D reconstruction based on passive vision [37, 38] includes the monocular vision method, binocular vision method, multiocular vision method, feature-based methods, and machine learning methods. The accuracy evaluation indexes recognized in the industry include the mean square error (MSE), intersection over union (IoU), and mean cross-entropy loss (CE). The main measures taken to improve accuracy include improving the quality of the training data set (in addition to the external contour of the object, attention should be paid to strengthening feature-based 3D annotation supervision), refined 3D modeling, accelerated reconstruction based on photometric cues, and other deep learning methods; the core idea is to seek breakthroughs in reconstruction accuracy at the level of surfaces, volumes, and point granularity [39]. Simultaneous localization and mapping (SLAM), which originated in robotics, is also an important part of reconstruction technology [40–42]; in computer vision, the analogous technology is structure from motion (SFM). At present, mainstream V-SLAM methods can be roughly divided into three categories: filter-based, keyframe-BA-based, and direct-tracking-based V-SLAM [43–45].
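
A minimal sketch of the accuracy indexes just mentioned, computed on toy voxel occupancy grids, is given below; the grid sizes and shapes are illustrative assumptions.

```python
# Minimal sketch of two reconstruction-accuracy metrics mentioned above,
# computed on voxel grids: MSE on an occupancy volume and IoU on its
# binarization. The toy volumes are illustrative assumptions.
import numpy as np

def mse(pred, gt):
    """Mean square error between predicted and ground-truth volumes."""
    return np.mean((pred - gt) ** 2)

def iou(pred_occ, gt_occ):
    """Intersection over union of two boolean occupancy grids."""
    inter = np.logical_and(pred_occ, gt_occ).sum()
    union = np.logical_or(pred_occ, gt_occ).sum()
    return inter / union if union else 1.0

gt = np.zeros((32, 32, 32)); gt[8:24, 8:24, 8:24] = 1   # toy ground truth
pred = gt.copy(); pred[8:24, 8:24, 20:28] = 1            # imperfect result

print("MSE:", mse(pred, gt))
print("IoU:", iou(pred > 0.5, gt > 0.5))
```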

Williamson et al. present an approach that tracks agile human head movements by using an array of RGB-D sensors and reconstructing these sensor data into 360° of features for their SLAM algorithm; this method is a valuable reference for tracking soldiers’ rapid head movements [46]. Runz et al., Zhai et al., and Ren et al. presented MaskFusion, a real-time, object-aware, semantic, and dynamic RGB-D SLAM system. Its advantage over previous dynamic SLAM systems is that it enriches the dynamic map with semantic information for many object classes in real time, which is useful for accurately identifying multiple moving targets on the battlefield [47–49].

Parallel tracking and mapping (PTAM) is a camera tracking system for augmented reality that requires no markers, premade maps, known templates, or inertial sensors [50]. PTAM is the first SLAM algorithm to separate tracking and mapping into two threads; it is a monocular, keyframe-based visual SLAM algorithm. PTAM captures feature points from the camera images, detects a plane, builds virtual 3D coordinates on the detected plane, and then combines the camera images with computer graphics; it is distinctive in its three-dimensional plane detection and image synthesis using parallel processing. Its advantages are the parallelized tracking and mapping processes and the use of a nonlinear optimization scheme to achieve real-time localization and virtual object overlay. The open-sourcing of PTAM has had far-reaching significance for the development of V-SLAM, and many current V-SLAM systems improve on PTAM’s algorithmic framework. According to Mur-Artal et al. [51], ORB-SLAM, proposed and open-sourced in 2015, is the best-performing monocular V-SLAM system. Rückert et al. present FragmentFusion, a real-time dense reconstruction pipeline that combines sparse camera tracking with image-space volumetric fusion. Its tracking is based on ORB-SLAM, which constructs and optimizes a sparse global map of 3D points and keyframes. FragmentFusion is lightweight in computing power and memory consumption: it can fuse several hundred keyframes in real time with quality comparable to other approaches, achieving real-time frame rates on a notebook at approximately 20% CPU and GPU utilization with low memory consumption. This paper believes that FragmentFusion offers material for meeting the demand for lightweight battlefield equipment [52]. Similarly, Stranner et al. reported on the design and implementation of a sensor cube, a companion device intended to address the need for high-precision global localization; it combines a differential GPS receiver, an inertial measurement unit (IMU), an altimeter, and a Wi-Fi radio with a battery in a compact enclosure, an attempt to keep the device lightweight [53]. Both filter-based and keyframe-BA-based V-SLAM usually need to extract and match feature points in the image, so they are very sensitive to the richness of environmental features and to image quality (such as blur and noise) [54].
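
To illustrate the feature-based front end shared by PTAM-style and ORB-SLAM-style systems, the following Python sketch estimates the relative pose between two frames from ORB matches via the essential matrix, using OpenCV; the intrinsics and file names are placeholders, and this is a two-view simplification rather than any of the full systems cited above.

```python
# Minimal sketch of a keyframe-based V-SLAM front end: ORB matching between
# two frames, then relative pose from the essential matrix. Requires OpenCV;
# the intrinsics and image file names are placeholder assumptions.
import cv2
import numpy as np

K = np.array([[718.9, 0, 607.2], [0, 718.9, 185.2], [0, 0, 1]])

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming-distance brute-force matching suits binary ORB descriptors
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

p1 = np.float32([kp1[m.queryIdx].pt for m in matches])
p2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# RANSAC on the essential matrix rejects outlier matches
E, mask = cv2.findEssentialMat(p1, p2, K, cv2.RANSAC, 0.999, 1.0)
_, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=mask)
print("relative rotation:\n", R, "\ntranslation direction:", t.ravel())
```

A full SLAM system would run this tracking loop in one thread while a mapping thread triangulates points and optimizes keyframes by bundle adjustment, which is exactly the split PTAM introduced.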

3.2. Intelligent Tracking and Registration

The classification and characteristics of the current main tracking registration technologies are shown in Table 2 [55].

Sensor-based tracking registration is an early approach that must provide robust tracking registration with high accuracy, low latency, and low jitter to adapt to a wide variety of environments [55]. Machine vision-based tracking registration uses a camera to capture the scene image and applies image processing and pose localization algorithms to solve for the camera pose. Owing to the development of computer vision and the large number of image sensors on devices such as smartphones, tablets, and HMDs, vision-based tracking registration has gradually become the mainstream technology. Marker-based target tracking is realized by manually placing artificial markers, such as ARToolKit or ARTag markers, SCR markers, or ring marker points. This method has the advantages of low computation, high speed, and high tracking and positioning accuracy, but it still has disadvantages: tracking registration fails when the camera momentarily loses sight of the marker; recognition is difficult in strongly lit environments; and markers are easily occluded. In 2010, Chen et al. [56] proposed an augmented reality tracking and registration algorithm based on keyframe matching for the fountain landscape of the Yuanmingyuan (Old Summer Palace), using random-tree feature recognition and classification to realize feature matching between images; on this basis, a mobile augmented reality system based on a video see-through (VST) helmet display was constructed.
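
The following minimal Python sketch illustrates marker-based tracking registration using OpenCV’s ArUco module: detected marker corners are passed to a PnP solver to recover the camera pose. The intrinsics and marker size are assumed values, the input frame is a placeholder, and the ArucoDetector API requires OpenCV 4.7 or later.

```python
# Minimal sketch: marker-based tracking registration with OpenCV's ArUco
# module. Intrinsics, marker size, and input frame are assumptions.
import cv2
import numpy as np

# Camera intrinsics (assumed; obtain via calibration in practice)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
dist = np.zeros(5)               # assume negligible lens distortion

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

frame = cv2.imread("frame.png")  # placeholder input frame
corners, ids, _ = detector.detectMarkers(frame)

marker_len = 0.05                # marker side length in metres (assumed)
# 3D corners of the marker in its own coordinate frame
obj_pts = np.array([[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]],
                   dtype=np.float32) * marker_len / 2

if ids is not None:
    for c in corners:
        # solvePnP recovers the camera pose relative to the marker,
        # which is exactly the registration step for overlaying content
        ok, rvec, tvec = cv2.solvePnP(obj_pts, c.reshape(4, 2), K, dist)
        if ok:
            print("rotation:", rvec.ravel(), "translation:", tvec.ravel())
```

The failure modes noted above map directly onto this sketch: if the marker leaves the frame, is occluded, or is washed out by strong light, detectMarkers returns nothing and registration is lost.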

With the iterative development of hardware, computing power has improved significantly, and markerless techniques such as model-based methods, natural feature-based methods, and simultaneous localization and mapping (SLAM) have come into wide use. The goal of markerless methods is the same as that of marker-based methods, namely, to obtain the camera pose; the difference is that markerless methods can use any natural object with sufficient feature points for target tracking and registration [57]. Model-based methods require modeling the target object or scene in advance and then solving for the pose from the correspondence between the 2D projections of target features and their three-dimensional coordinates, which better achieves real-time target tracking and positioning in complex environments.

Visual SLAM refers to a subject equipped with a specific sensor that, without prior information about the environment, builds an environment model during movement while simultaneously estimating its own motion [58], mostly using a monocular camera, a depth camera, or a binocular camera system. Natural features can be corners, edges, or even the point cloud data of objects.

Edge-based methods are invariant to viewpoint changes, illumination changes within a certain range, self-occlusion, and so on and are more stable than corner features. To address situations in which tracking accuracy and real-time requirements cannot be met at the same time, a new natural feature-based tracking method has been designed that combines face recognition with natural feature tracking, and a prototype virtual glasses try-on wearable system has been constructed. In-depth studies of 3D registration based on natural features have also produced a marker-free registration method based on nonlinear scale space. Targeting the high-gloss surfaces of cars, Stanimirovic et al. [59] adopted a method combining edge and texture point features, which significantly improves the robustness and stability of tracking registration compared with algorithms that rely only on texture or edge information. Cao et al. [60] proposed a markerless tracking and registration algorithm that uses an optical flow tracker to follow the measured target in real time, which solves the problem of slow tracking speed.
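
As an illustration of optical-flow-based markerless tracking of the kind Cao et al. [60] describe, the following Python sketch tracks natural feature points between two frames with pyramidal Lucas-Kanade in OpenCV; the input file names are placeholders, and this is a generic sketch rather than their specific algorithm.

```python
# Minimal sketch of optical-flow-based markerless tracking: natural feature
# points are followed across frames with pyramidal Lucas-Kanade. Input file
# names are placeholder assumptions.
import cv2
import numpy as np

prev = cv2.imread("t0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("t1.png", cv2.IMREAD_GRAYSCALE)

# Detect corner features on the measured target (here: the whole frame)
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=300, qualityLevel=0.01,
                             minDistance=7)

# Pyramidal LK propagates each point to its location in the next frame
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                           winSize=(21, 21), maxLevel=3)

good_new = p1[status.ravel() == 1]   # keep only successfully tracked points
good_old = p0[status.ravel() == 1]
flow = (good_new - good_old).reshape(-1, 2)
print("tracked points:", len(good_new),
      "median motion:", np.median(flow, axis=0))
```

Because flow is computed incrementally from the previous frame, this style of tracker is fast enough for real-time registration but must periodically redetect features as points drift or leave the view.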

3.3. Intelligent Situational Awareness

The enhanced situation is proposed to meet the need for three-dimensional, high-precision, all-round, holographic display and real-time multimodal interaction with geographic information. It not only provides high-definition, high-resolution three-dimensional digital situations but also combines the virtual-real fusion, three-dimensional registration, and real-time interaction of augmented reality to realize holographic display at any angle. Furthermore, through the gaze tracking of physical equipment, gesture recognition, and speech recognition, it achieves real-time interaction with multiple sensing and control modalities.

In the modern military, diversified reconnaissance and antireconnaissance methods make the battlefield more complicated, so quickly and accurately perceiving one’s own and the enemy’s battlefield situation in a complex environment is very important. On the one hand, in a highly adversarial state, fighters must not only complete their tactical actions but also collect and analyze on-site situational information, such as enemy and allied placements, firepower arrangements, and threat sources in the combat environment, and complete decision-making. On the other hand, the battlefield environment is complex and changeable; in most cases, combat individuals cannot directly observe the enemy’s behavior within line of sight, and the combat plan is often passive. This poses new challenges to the combat ability of warfighters. Augmented reality technology enables soldiers to visually perceive imagery that fuses processed information with the real scene, making combat more efficient and convenient [61]. Augmented reality glasses can naturally superimpose battlefield information on the soldier’s forward field of vision and combine it with interactive methods such as voice and gestures for control. In 2000, the United States launched the first battlefield augmented reality prototype system (Battlefield Augmented Reality System, BARS) [62] and on this basis started the ULTRA-Vis (urban leader tactical response, awareness and visualization) project [63], aiming to develop AR software and hardware suitable for battlefield environments. In 2017, the US Army released TAR (tactical augmented reality), a concept system for individual soldier tactical augmented reality that focuses more on indoor environments [64]. In 2019, the US Army released the Integrated Visual Augmentation System (IVAS), a military visual augmentation system based on the HoloLens 2 augmented reality glasses, through which soldiers can see a three-dimensional map superimposed on the real environment showing the locations of friend and foe, with the map updated in real time according to the position and posture of the glasses [65].

4. Intelligent Vision Enhancement Technology in Battlefield

The battlefield is gradually transforming toward informatization, digitization, and intelligence [66]. In this new form of war, the concept of the traditional soldier combat unit has made a qualitative leap, and the combat ability of the individual soldier directly determines the overall combat effectiveness on the battlefield [67, 68]. IVE technology will therefore offer more possibilities and advantages for improving the combat ability of individual soldiers. In this chapter, from the perspective of users in actual soldier combat, IVE technology is divided into three main application directions: control for commanders, situational awareness for individual soldiers, and collaborative decision-making for unmanned combat.

4.1. Battlefield Command Based on IVE

Enhancing the battlefield for the commander first requires a battlefield sand table that captures the overall posture. Research on AR electronic sand table technology has achieved certain results at home and abroad. Yong et al. [69] built a practical augmented reality electronic sand table using the Kinect depth camera, which was used in US Army military simulation confrontation. Jianhua et al. [70] designed a military training sand table based on augmented reality technology and used virtual tactical maps to achieve marking and filing. Yang used photoelectric encoders and machine vision methods for 3D registration and tracking and realized an augmented reality electronic sand table that does not depend on fixed markers. Bo and Hui proposed an augmented reality sand table system based on real-time video streaming in which the electronic sand table is projected onto the physical sand table [71]. The 3D battlefield construction for commanders is shown in Figure 5. As shown in the figure, the situation construction process for commanders follows two lines: the database and the military elements. For data, real data and simulated data complement and verify each other, providing users with information that is as accurate as possible. The military elements mainly comprise battlefield terrain, weapons and equipment, fortifications, and 3D military markings; they chiefly provide battlefield spatial location awareness, and an effective comprehensive battlefield situation is formed after the military markings are superimposed.

Battlefield situational enhancement for commanders also requires coordination among multiple people. Sasikumar et al. present a wearable remote fusion system, a remote collaboration system that supports spatial annotation and view frustum sharing with natural nonverbal communication cues (eye gaze and hand gestures). They describe the design and implementation of the prototype and report a pilot user study investigating how sharing natural gaze and gesture cues affects collaborative performance and user experience; this provides a research direction for collaborative control methods that integrate multiple interaction data streams [72]. Gao and Itoh, in a poster paper, propose the Kuroko Paradigm, a concept that enhances user engagement during interaction with an augmented reality (AR) avatar by adding a physical object to the avatar; this offers a design for improving augmented reality interactions between commanders and soldiers [73].

4.2. Situational Awareness Based on IVE for Soldiers

The battlefield situation refers to the distribution of enemy and allied forces on the battlefield and the state and development trend of the battlefield environment; it mainly includes two parts, situation estimation and threat estimation. In terms of the situation generation process and combat application needs, the battlefield situation can be divided into observed, estimated, and forecast situations.

The battlefield situation is an important support of the joint perception system for joint operations and an important guarantee for transforming information superiority into decision superiority. A battlefield situation map has the following six important characteristics: (1) time dimension, (2) space dimension, (3) interaction objects, (4) service objects, (5) platform support, and (6) data support. Data at all levels can form a complete map, and the data processing chain can be mined, collected, discovered, and traced [74, 75]. Applying augmented reality (AR) technology to military simulation training can ensure realism while taking economy and safety into account: an AR military training system can effectively simulate the performance of “generation-gap” weapons and equipment, achieve “red versus blue” confrontation in a true sense, and test the confrontation effect of equipment not yet in service on either side, achieving the purpose of advanced training.

The Norwegian Army is currently studying the use of augmented reality technology to improve the situational awareness of armored vehicle crews and the practicality of command-and-control information systems. Since 2013, the warfare laboratory of the Norwegian Army Ground Warfare Center has carried out vehicle information system integration experiments, and the project ended in May 2014 with a series of field trials. In the experiments, armored vehicle occupants drove while wearing augmented reality glasses providing a 360° viewing range, with data from the battlefield management system also integrated into the glasses’ display [76].

Fan et al. and Verma et al. presented a multivehicle cooperative military training system. The vehicle-based cooperative AR system not only provides each soldier with battlefield situational awareness, such as direction and the positions and attributes of virtual forces, but also offers the ability to engage in virtual military confrontation with computer-generated forces. Soldiers driving AR-equipped real vehicles can work as a team to fight against virtual forces: each soldier sees the virtual forces accurately geo-registered from their own perspective and simultaneously sees the blast effects when a teammate hits a virtual force. The benefit of the vehicle-based cooperative AR system is that it gives soldiers better judgment and perspective on large-scale missions, significantly improving each soldier’s perception of space, time, and forces [77, 78].
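
Geo-registration of this kind ultimately reduces to converting the geodetic coordinates of a virtual force into the observer’s local frame. The following Python sketch converts a WGS-84 position into East-North-Up (ENU) coordinates around the vehicle; the coordinates are illustrative assumptions, and a real system would further account for vehicle heading and sensor calibration.

```python
# Minimal sketch of geo-registration: converting a virtual force's WGS-84
# position into local East-North-Up (ENU) coordinates around the vehicle so
# it can be rendered at the correct real-world location. Values illustrative.
import numpy as np

A, E2 = 6378137.0, 6.69437999014e-3   # WGS-84 semi-major axis, eccentricity^2

def lla_to_ecef(lat, lon, alt):
    """Geodetic (deg, deg, m) to Earth-centered Earth-fixed coordinates."""
    lat, lon = np.radians(lat), np.radians(lon)
    n = A / np.sqrt(1 - E2 * np.sin(lat) ** 2)   # prime vertical radius
    x = (n + alt) * np.cos(lat) * np.cos(lon)
    y = (n + alt) * np.cos(lat) * np.sin(lon)
    z = (n * (1 - E2) + alt) * np.sin(lat)
    return np.array([x, y, z])

def ecef_to_enu(p, ref_lla):
    """Express ECEF point p in the local ENU frame at reference ref_lla."""
    lat, lon = np.radians(ref_lla[0]), np.radians(ref_lla[1])
    d = p - lla_to_ecef(*ref_lla)
    # Rotation from ECEF into the local East-North-Up frame
    R = np.array([
        [-np.sin(lon),               np.cos(lon),              0],
        [-np.sin(lat)*np.cos(lon), -np.sin(lat)*np.sin(lon), np.cos(lat)],
        [ np.cos(lat)*np.cos(lon),  np.cos(lat)*np.sin(lon), np.sin(lat)]])
    return R @ d

vehicle = (59.91, 10.75, 100.0)            # observer's GPS fix (lat, lon, alt)
virtual_tank = lla_to_ecef(59.912, 10.752, 100.0)
print(ecef_to_enu(virtual_tank, vehicle))  # metres east/north/up of vehicle
```

The resulting ENU vector can then be fed into the same world-to-screen projection used for registration, so every crew member sees the same virtual force at the same real-world spot.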

4.3. Situation Awareness and Collaborative Decision-Making for Unmanned Combat Based on IVE

In function and structure, advanced foreign individual soldier digital equipment mainly comprises five subsystems: the integrated helmet system, the computer subsystem, the weapon subsystem, the advanced military uniform system, and the microclimate adjustment system. Individual electronic warfare equipment is used by military personnel who perform ground electronic countermeasure tasks in the army, marines, and airborne troops. Its main feature is that it is easy to carry and use, so it is also called portable electronic warfare equipment. Together with the electronic warfare equipment of other carrying platforms, such as fixed ground stations and vehicle-, ship-, air-, missile-, and space-based platforms, it forms a complete electronic warfare equipment system [79, 80].

Miniature aerial unmanned combat platforms, ground unmanned combat platforms, individual cruise missiles, and other light weapons platforms developed in the future will be important combat forces on the future battlefield. We must focus on collaborative operations and study new collaborative combat modes based on the combat needs of individual squads. Future ground and aerial unmanned combat platforms will trend toward digitization, systematization, and intelligence. Unmanned combat platforms equipped with reconnaissance modules can intelligently identify, automatically extract, locate, and track targets in the collected graphics and imagery. The data then undergo fusion processing, such as spatiotemporal registration, data association, noise reduction, and threat judgment. Detected targets, or target information issued by superiors, are displayed on the map in real time, and danger warnings carrying attributes such as warning time, type, level, and content are issued to alert each combat platform [81, 82].
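
As a minimal sketch of the fusion processing described above, the following Python example runs a constant-velocity Kalman filter that merges noisy position detections from two platforms into a single smoothed track; the motion model, noise levels, and detections are all illustrative assumptions.

```python
# Minimal sketch of the fusion step: a constant-velocity Kalman filter that
# associates noisy detections from multiple platforms into one smoothed
# track. All parameters are illustrative assumptions.
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], float)   # constant-velocity model
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)    # sensors report position
Q = np.eye(4) * 0.01                                 # process noise
x = np.zeros(4)                                      # state [x, y, vx, vy]
P = np.eye(4) * 100.0                                # initial uncertainty

def fuse(z, r):
    """One predict/update cycle; z = detection, r = that sensor's variance."""
    global x, P
    x = F @ x                                        # predict forward in time
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + np.eye(2) * r                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                   # Kalman gain
    x = x + K @ (z - H @ x)                          # update with detection
    P = (np.eye(4) - K @ H) @ P

# Detections of the same target from two platforms (after spatiotemporal
# registration into a common frame); lower r = more trusted sensor
fuse(np.array([100.0, 50.0]), r=4.0)   # UAV sensor, higher noise
fuse(np.array([101.0, 50.5]), r=1.0)   # ground robot sensor, lower noise
print("fused track state:", x)
```

The fused track state (position and velocity) is what would be plotted on the shared map and checked against warning thresholds before alerting each combat platform.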

5. Conclusions and Future Work

This paper has analyzed IVE on the battlefield from the perspective of tactical augmented reality, summarized the three core technologies that support IVE, and analyzed the application scenarios and development directions of IVE technology from the perspectives of three types of battlefield users. In addition to the content reviewed above, this paper briefly summarizes other current constraints. The chip accounts for approximately 25% of the production cost of AR smart glasses and is the foundation of the hardware technology; its performance determines the functionality, real-time behavior, and stability of AR smart glasses. The sensors realize the precise positioning, tracking, and interaction functions of AR smart glasses, ensuring the user’s augmented reality experience. For chips and sensors, there are few domestic high-end suppliers, core technology is lacking, and a certain gap with foreign countries remains. AR smart glasses have no operating devices such as mice and keyboards, so to obtain a better augmented reality experience, the interaction method is the first problem to be solved. At present, the interaction methods of AR smart glasses are mainly based on speech recognition, gesture control, and somatosensory control.

Although speech recognition is a good method of human-computer interaction, its recognition accuracy is not very high, so it serves mainly as an auxiliary tool; its level of intelligence cannot currently meet the needs of AR. Gesture control has high interaction efficiency and a pleasant user experience, but frequent use of both hands causes problems such as arm soreness. Somatosensory systems can achieve a better augmented reality experience, but such devices have a single function and leave great room for improvement [83–85]. Beyond the need to develop hardware such as augmented reality headsets and pose tracking devices that are miniaturized, inexpensive, portable, and low power, the technical challenges facing practical augmented reality applications mainly come from high-precision virtual-real information registration, intuitively cognizable information expression, and robust and convenient interaction methods, specifically (1) high-precision virtual and real information registration, (2) intuitively cognizable information expression and collaboration [86, 87], and (3) robust and convenient interaction while maintaining a natural “head-up” battlefield awareness [64, 88].

This paper proposes the following suggestions: (1) popularize IVE technology in actual soldier combat, improve the performance and practicability of vision enhancement equipment, and accelerate the exposure of practical problems; the systems should be applied to military maintenance and actual combat outside the laboratory, with targeted improvements made according to feedback from the troops. (2) Promote the domestic intelligent manufacture of equipment, especially chips and control systems, to achieve autonomy and controllability and avoid being constrained by European and American technologies in future battlefield environments. (3) Make full use of new positioning technologies, such as the BeiDou system and 5G, accelerate the integration of edge computing and cloud computing capabilities, improve the security of data link transmission, and speed up software service equipment upgrades and technology reserves. Future IVE technology will bring disruptive achievements and innovations to future wars.

Data Availability

The data used to support the findings of this study have not been made available because of commercial reasons.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was jointly supported by the project of Digital Media Art, Key Laboratory of Sichuan Province (Sichuan Conservatory of Music, Project No. 21DMAKL01), the First Batch of Industry-University Cooperation Collaborative Education Project funded by the Ministry of Education of the People’s Republic of China (Minjiang University, Project No. 202101071001), Minjiang University 2021 School-Level Scientific Research Project (Minjiang University, Project No. MYK21011), Open Fund Project of Fuzhou Technology Innovation Center of Intelligent Manufacturing Information System (Minjiang University, Grant No. MJUKF-FTICIMIS2022), Open Fund Project of Engineering Research Center for ICH Digitalization and Multi-Source Information Fusion (Fujian Polytechnic Normal University, Grant No. G3-KF2204), Guiding Project of Fujian Province (Minjiang University, Project No. 2020H0046), and Key Technology Research and Industrialization Project for Software Industry Innovation in Fujian Province (Minjiang University and Fujian Guotong Information Technology Co., Ltd., Project No. 36).