Multitask Learning by Multiwave Optical Diffractive Network
Optical neural networks have recently attracted intense research interest because they can perform comparatively complex computation using optical properties while dissipating far less energy than electrical networks. Existing optical neural networks are built on an optical grating platform with different diffractive phases at different diffractive points (Chen and Zhu, 2019; Mo et al., 2018). In this study, we propose a multiwave deep diffractive network with approximately 10^6 synapses that lends itself to hardware implementation of neuromorphic networks. The optical architecture uses diffraction and different wavelengths to perform different tasks. The wavelengths, and hence the task inputs, are independent of each other, so the network can run inference on several tasks simultaneously. Experiments show that the network achieves performance comparable to a single-wavelength, single-task network. Compared with using multiple networks, a single network saves the cost of lithographic fabrication. We train the network on MNIST and MNIST-Fashion, two different datasets, to classify 32 × 32 inputs into 10 classes. Our method achieves competitive results on both. In particular, on the more complex MNIST-Fashion task, our framework improves accuracy by 3.2%, and on MNIST it improves accuracy by 1.15%.
1. Introduction
Storing and retrieving data in the von Neumann architecture is far more time-consuming and power-hungry than in an optical device [1–6]. Unlike modern computers, CIS, which integrates data computation, storage, and fetch, is more efficient, uses less power, has a large storage capacity, and offers a higher integration level [7–9]. Moreover, artificial neural networks store and process data in a way similar to humans and animals, and they have been successful in a wide range of tasks such as image analysis, speech recognition, and language translation. With increasing data volume, problem complexity, and network depth, artificial neural networks can match or even surpass human performance. However, most of these tasks cannot be migrated well to smart portable devices because of their complexity and power demands. Lower power, higher efficiency, and faster speed are therefore becoming increasingly critical for deep learning on embedded devices.
Neuromorphic computing seeks brain-like processing that overcomes the limitations of conventional computers. IBM built TrueNorth, a fully functional digital chip with 5.4 billion transistors and 4096 neurosynaptic cores. To approach the extreme complexity of the human cerebral cortex, Cianciulli et al. combined complementary metal oxide semiconductor (CMOS) circuits with two-terminal resistive devices. Spin-transfer torque magnetic memory (STT-MRAM), with its nonvolatility, high speed, and high endurance, is suitable as a stochastic memristive device, given the functional implications of synaptic plasticity. Tait, inspired by spiking neural networks, integrated laser devices to process highly interactive information at high speed with optical-electronic systems. This approach promises to incorporate photonic spike processing in the training system. In addition, Rios et al. dramatically improved storage capacity by implementing all-photonic nonvolatile multilevel memory. Electronic-photonic interconnect technologies bring not only opportunities but also challenges to unconventional circuits and systems. To avoid the losses of optical-electrical conversion and coupling, an all-photonic device can operate at high computational speed and low power. On-chip nonvolatile photonic devices could dramatically improve the performance of existing brain-like neural networks by eliminating electronic latency and reducing electrical power consumption. The on-chip optical architecture is designed around a computational element for the network protocol and a waveguide medium for communication among high-performance spiking neurons.
The fully optical network architecture based on Mach–Zehnder interferometers promises to reduce the energy cost of data movement. The all-optical diffractive deep neural network (D2NN) architecture uses passive components and optical diffraction. D2NN can be easily scaled up and fabricated by 3D lithography in a power-efficient manner.
In general, optical networks have more trainable parameters because complex-valued modulation provides both the phase and the amplitude of each neuron, rather than only the amplitude as in electrical networks. However, building neural networks from optical devices raises some problems. First, an all-optical neural network is designed for a single task, whereas multitask learning is important in practice. Second, the learning rates for different tasks strongly affect accuracy, and balancing the tasks and their learning rates is nontrivial.
In this paper, to address these two issues, we exploit optical characteristics to express different tasks with different wavelengths. One wavelength is used as the baseband and the other as the carrier. The baseband wave can therefore be assigned a large learning rate and the carrier wave a small one. Extensive experiments on MNIST and MNIST-Fashion are conducted to investigate the efficacy and properties of the proposed multiwavelength diffractive network (MWDN). On both tasks, MWDN outperforms the single-task baselines while using a single shared network.
2. Multiwavelength Diffractive Network
2.1. Forward Propagation of the Network
In the spatial domain, diffraction is analyzed wave by wave as in-plane propagation at a particular phase and frequency, which allows waves travelling in different directions to be analyzed and combined. The network itself operates in the frequency domain. The wave distribution on the observation and aperture planes can be viewed as a linear combination of many monochromatic plane waves propagating in different directions. The amplitude and phase of each plane wave are given by the angular spectrum, which can be obtained by FFT analysis. Plane wave propagation is a complex process that depends on many factors, such as direction, phase, and amplitude.
As shown in the top row of Figure 1, we first determine the optical grating parameters (height and complex refractive index). The height is then set by 3D printing, and finally the complex refractive index is set by laser light of different powers, since different powers induce different refractive indices in phase change materials. This row shows the manufacturing process of the whole mask. The phase transition threshold of the substrate (the phase transition changes the complex refractive index) is set through the calculated film thickness, the photolithography step (which adjusts the corresponding optical threshold), and the photolithography power. After the whole system is fabricated, as shown in the middle row, different tasks are input through the same network, and the corresponding output area, which represents the label information, is illuminated. We use two different wavelengths to represent the different task categories. One wavelength is long and the other is short, which acts like a carrier wave, so that the tasks do not interfere with each other.
We input images from MNIST and MNIST-Fashion simultaneously, with a separate input optical wavelength for each task. The diffractive network has the same optical parameters for both tasks. The bottom of the figure shows the optical carriers at the two wavelengths.
The top left of Figure 2 shows the front-view plan corresponding to the bottom of Figure 2, and the top right of Figure 2 shows the physical simulation system. The bottom of Figure 2 shows the framework of the diffractive network in side view, where different colors denote different refractive indices. The task is input through the front mask and then passes through the following masks, which have different thicknesses and different phases.
First, we convert the image information into the phase and amplitude of the optical input to the system. The optical grating is then manufactured with different heights by a 3D-printing device. In the following sections, we describe how MWDN handles the tasks, predominantly using the angular spectrum. The 3D-printed MWDN modulates the amplitude and phase of the wave over the ranges 0∼1 and 0∼2π, respectively, for two tasks in the same network. For each layer of MWDN, we set the neuron size in the range 200 μm to 700 μm, which is a tunable parameter.
Following the Fresnel diffraction equation, we can transform the optical signal from the spatial domain to the frequency domain. The angular spectrum method of plane waves explains how a wave propagates and is the primary method of analyzing diffraction in the frequency domain. Based on the angular spectrum, the free-space transfer function governs free propagation. The wave plane is converted to its angular spectrum by the FFT, where the diffractive processing becomes more evident, as follows:
where fx and fy are the spatial frequencies corresponding to the x and y locations (fx = 1/(x − x0)), x − x0 is the gap in the optical map, N and M are the numbers of grooves on the optical grating in the height and width directions, U0(x, y) is the original field distribution, U(x, y) is the field distribution after free-space transfer, H(fx, fy) is the transfer function, A0(fx, fy) is the original angular spectrum, and A(fx, fy) is the angular spectrum after free-space transfer. The inverse Fourier transform of the transfer function is the impulse response function. The equation can be viewed as the Fourier transform:
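For orientation, the textbook angular spectrum method can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `angular_spectrum_propagate`, the square sampling interval `pitch`, the decision to discard evanescent components, and the transfer function H(fx, fy) = exp(j2πz·sqrt(1/λ² − fx² − fy²)) follow the standard textbook form and are assumed here rather than taken from the paper. The numerical values in the usage lines are hypothetical.

```python
import numpy as np

def angular_spectrum_propagate(u0, wavelength, pitch, distance):
    """Propagate a sampled complex field u0 over `distance` in free space.

    Assumed textbook form: A0 = FFT(U0), A = A0 * H, U = IFFT(A).
    All lengths must share one unit (millimetres in the example below).
    """
    rows, cols = u0.shape
    fx = np.fft.fftfreq(cols, d=pitch)   # spatial frequencies along x
    fy = np.fft.fftfreq(rows, d=pitch)   # spatial frequencies along y
    fxx, fyy = np.meshgrid(fx, fy)

    # Squared longitudinal frequency; non-positive values are evanescent.
    fz_sq = (1.0 / wavelength) ** 2 - fxx ** 2 - fyy ** 2
    fz = np.sqrt(np.maximum(fz_sq, 0.0))

    # Free-space transfer function; evanescent components are set to zero.
    h = np.where(fz_sq > 0, np.exp(2j * np.pi * fz * distance), 0.0)

    a0 = np.fft.fft2(u0)       # original angular spectrum A0(fx, fy)
    a = a0 * h                 # angular spectrum after free-space transfer
    return np.fft.ifft2(a)     # field distribution U(x, y)

# Hypothetical example: a uniform plane wave only picks up a global phase.
u0 = np.ones((8, 8), dtype=complex)
u = angular_spectrum_propagate(u0, wavelength=0.75, pitch=0.2, distance=3.0)
```

A uniform input is a useful sanity check because its spectrum contains only the zero-frequency term, so the output should differ from the input by a single phase factor.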
The output wave plane distribution propagates through the 3D material, and the field distribution is changed by the refractive index:
where α is the extinction efficiency, n is the real part of the refractive index of the 3D material, n0 is the real part of the refractive index of the vacuum, k is the imaginary part of the refractive index, λ is the wavelength, Δd is the height of the material map, and ϕ is the phase difference. If we choose a transparent material (k → 0) and ignore optical losses, the transmission coefficient of a neuron consists of a phase term only; if we select a nontransparent material, the transmission coefficient of a neuron consists of both amplitude and phase terms in the MWDN architecture.
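The dependence of a neuron's transmission on height and refractive index can be illustrated with a short sketch. It assumes the commonly used thin-layer form t = exp(−2πkΔd/λ)·exp(j2π(n − n0)Δd/λ); the paper's own expression is not available here, so the function name `neuron_transmission`, its default arguments, and the numerical values are illustrative assumptions.

```python
import numpy as np

def neuron_transmission(height, wavelength, n, kappa=0.0, n0=1.0):
    """Complex transmission coefficient of one neuron (assumed thin-layer form).

    height     : material height (same unit as wavelength)
    n, kappa   : real and imaginary parts of the refractive index
    n0         : refractive index of the surrounding medium
    """
    phase = 2.0 * np.pi * (n - n0) * height / wavelength        # phase difference
    amplitude = np.exp(-2.0 * np.pi * kappa * height / wavelength)  # absorption loss
    return amplitude * np.exp(1j * phase)

# Transparent material (kappa = 0): phase-only modulation, |t| = 1.
t_phase_only = neuron_transmission(height=0.3, wavelength=0.75, n=1.5)

# Absorbing material (kappa > 0): both amplitude and phase are modulated.
t_complex = neuron_transmission(height=0.3, wavelength=0.75, n=1.5, kappa=0.05)
```

With these hypothetical values the two cases share the same phase and differ only in magnitude, which mirrors the phase-only versus complex modulation distinction made in the text.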
According to the size of the input data, an effective and flexible linear interpolation algorithm is used to fit the data to the diffractive input layer. The interconnection rate between adjacent layers depends on the distance and the diffraction angle and approaches the critical value (1.0).
Furthermore, the number of network layers and the axial distance are also tunable. The output layer is partitioned into ten regions corresponding to the ten classes, and the summed light intensity is detected in each region of the wave plane. The mean square error (MSE) against the target is used to train the MWDN parameters. We minimize a loss function that increases the wave intensity in the target region and decreases it in the other regions. The training batch size is set to 10 for the classifier.
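The readout described above (ten detector regions, summed intensity, MSE against the target) can be sketched as follows. The region layout, the normalization of the region sums, and the one-hot target are assumptions made for illustration; the paper does not specify them in the text available here.

```python
import numpy as np

def region_intensities(field, regions):
    """Sum the light intensity |U|^2 inside each rectangular detector region."""
    intensity = np.abs(field) ** 2
    return np.array([intensity[r0:r1, c0:c1].sum() for r0, r1, c0, c1 in regions])

def mse_loss(scores, target):
    """Mean square error between normalized region scores and the target."""
    return float(np.mean((scores - target) ** 2))

# Hypothetical layout: ten 20 x 20 regions on a 40 x 100 output plane.
regions = [(r * 20, r * 20 + 20, c * 20, c * 20 + 20)
           for r in range(2) for c in range(5)]

# Toy output field with all light falling inside region 3.
field = np.zeros((40, 100), dtype=complex)
field[5:10, 65:70] = 1.0

sums = region_intensities(field, regions)
scores = sums / sums.sum()           # normalize so the scores add up to 1
target = np.eye(10)[3]               # one-hot label for class 3
predicted = int(np.argmax(scores))   # region with the highest intensity
loss = mse_loss(scores, target)
```

Minimizing this loss pushes intensity into the target region and out of the other nine, which is the behaviour the text describes.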
2.2. Backward Propagation of the Network
To train MWDN, we use the backpropagation algorithm with the Adam optimization method. We focus on the wave intensity and define the loss function as the MSE between the output and the target:
where K is the number of training samples, o_k is the output of the MWDN, and t_k is the label of the corresponding input. The optimization problem can be written as follows:
where l is the layer index and i is the location within the lth layer. The gradient of the loss with respect to all parameters can be calculated and is used to update the MWDN parameters during training. Each batch of training data is fed into MWDN, and the gradient of each layer is calculated to update that layer.
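As a rough illustration of the update rule, the sketch below applies the standard Adam step to a toy MSE problem in which a hypothetical height map is fitted to a random target. It does not model diffraction or the MWDN layers; the names `adam_step`, `heights`, and `target`, the hyperparameters, and the toy loss are all assumptions.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

rng = np.random.default_rng(0)
target = rng.random((8, 8))          # toy target, stands in for the label
heights = np.zeros_like(target)      # toy trainable parameters
m = np.zeros_like(heights)
v = np.zeros_like(heights)

initial_loss = float(np.mean((heights - target) ** 2))
for t in range(1, 2001):
    # Gradient of the MSE loss with respect to each parameter.
    grad = 2.0 * (heights - target) / heights.size
    heights, m, v = adam_step(heights, grad, m, v, t)
final_loss = float(np.mean((heights - target) ** 2))
```

In an actual diffractive network the gradient would come from backpropagating through the propagation and modulation steps of every layer instead of from this closed-form expression.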
Because we use multiple wavelengths, we first adjust a base value through one wavelength and then fine-tune through the other wavelength without affecting the other task, as follows:
The refractive index takes discrete values. Following the principle of low-bit networks, the refractive index n is treated as continuous in the backward pass and is sampled from a Bernoulli distribution over the discrete values N1, N2, …, Nm in the forward pass. There are m discrete values of N. The refractive index value depends on the phase value. In this paper, m = 4 is used, and the reasoning is as follows:
For the forward step, we normalize n:
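One common way to realize such a scheme is stochastic rounding in the forward pass, with the continuous value kept for the gradient (a straight-through treatment). The sketch below assumes that interpretation; the paper's exact sampling rule is not recoverable from the text, and the function name `stochastic_quantize` and the four equally spaced normalized levels are hypothetical.

```python
import numpy as np

def stochastic_quantize(n_cont, levels, rng):
    """Forward pass: sample each value onto one of its two neighbouring levels.

    The upper level is chosen with a Bernoulli probability proportional to the
    distance from the lower level, so the sample equals n_cont in expectation.
    In the backward pass (not shown) the continuous n_cont would be updated as
    if this step were the identity.
    """
    levels = np.asarray(levels, dtype=float)
    n = np.clip(np.asarray(n_cont, dtype=float), levels[0], levels[-1])
    # Index of the lower neighbouring level for every entry.
    idx = np.clip(np.searchsorted(levels, n, side="right") - 1, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    p_hi = (n - lo) / (hi - lo)                 # Bernoulli probability of 'hi'
    pick_hi = rng.random(n.shape) < p_hi
    return np.where(pick_hi, hi, lo)

rng = np.random.default_rng(0)
levels = np.linspace(0.0, 1.0, 4)               # m = 4 normalized levels (assumed)
n_cont = rng.random((4, 4))                     # continuous normalized index
n_disc = stochastic_quantize(n_cont, levels, rng)
```

Every entry of `n_disc` is one of the four levels, while averaging many samples recovers the continuous value, which is what allows the continuous parameter to keep receiving useful gradients.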
2.3. The Multiwavelength Setting
The optical diffractive network differs markedly from a deep neural network. Its function is determined by the wavelength and the parameters of the optical grating (height and complex refractive index). A multiwavelength diffractive network therefore has a broad range of requirements that differ from those of a conventional network.
Different wavelengths have different effects, so we assign a different wavelength to each task. At the same time, the network must ensure that the wavelengths do not affect each other. By treating one wavelength as the baseband and the other as the carrier, the diffractive network can adjust each optical plane wave independently. The approach can be regarded as an efficient carrier algorithm. The ratio of the baseband and carrier wavelengths is 1 : 30. The short wavelength has little influence on the long wavelength and vice versa. The phase differences of the long and short wavelengths are related as follows:
The equation can therefore be written as follows:
The second term can be ignored relative to the first term, and the equation becomes:
The multiwavelength diffractive network can be effective and more powerful than a deep neural network. The phase difference for each wavelength (i = 1, 2) is easily obtained by adjusting the height in the diffractive network. Consequently, we use a small learning rate for one wavelength and a large learning rate for the other, without one affecting the other:
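The scale separation between the two wavelengths can be checked numerically. The sketch assumes the usual phase delay ϕ = 2π(n − n0)Δd/λ and takes only the 1 : 30 wavelength ratio from the text; the refractive index, the absolute wavelengths, and the function name `phase_delay` are hypothetical. It illustrates the scaling only and is not the derivation used in the paper.

```python
import numpy as np

def phase_delay(height, wavelength, n=1.7, n0=1.0):
    """Phase delay in radians of a layer of the given height (assumed form)."""
    return 2.0 * np.pi * (n - n0) * height / wavelength

# Hypothetical values; only the 1 : 30 wavelength ratio comes from the text.
lambda_long = 750.0                  # micrometres
lambda_short = lambda_long / 30.0    # micrometres

# Height change that sweeps the short wave through one full 2*pi cycle.
fine_step = lambda_short / (1.7 - 1.0)

d_phi_short = phase_delay(fine_step, lambda_short)   # change seen by short wave
d_phi_long = phase_delay(fine_step, lambda_long)     # change seen by long wave
ratio = d_phi_short / d_phi_long
```

Under these assumptions, a height adjustment that covers the whole phase range of the short wave shifts the long wave by only one thirtieth of a cycle, which is why fine height changes can tune one task while barely disturbing the other.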
In this work, we apply the proposed MWDN to two different datasets, MNIST and MNIST-Fashion.
3.1. Model Setup
Compared with state-of-the-art methods in accuracy and speed, our method achieves better performance on MNIST and MNIST-Fashion. The network size is set to 200 × 200, 500 × 500, and 1500 × 1500, each with a trainable height map. The optical network comes in two types, one with phase-only modulation and the other with complex modulation. MNIST and MNIST-Fashion use different optical wavelengths, and the input is modulated by the optical grating mask.
Using backpropagation, the model is trained on the two task datasets alternately, and its effectiveness is validated. We train the network with a different learning rate for each task, which helps avoid poor local optima. All parameters of the network are adjusted by gradient descent to minimize the error.
We evaluate the approach on two datasets, whose inputs are fed to the neurons of the network in the form of phase. The two datasets have different data distributions, which makes them difficult to classify in the same network. Conventional networks require the inputs to be independent and identically distributed, whereas our task is to handle data from two different distributions in the same network.
3.3. Experimental Analysis
For better performance, we set a different learning rate and a different signal frequency for each of the two datasets. The maximum half-cone diffraction angle is formulated as follows:
The light frequency is 0.4 THz for MNIST and 14.4 THz for MNIST-Fashion. The neuron size is set to 200 μm. The height map and the axial distance between two successive layers are trainable. Table 1 compares MWDN with the single-task DN methods. MWDN improves the accuracy by 1.15% on MNIST and by 3.2% on MNIST-Fashion. To evaluate multiwavelength operation for multitask learning, Table 2 compares the multitask network with the single-task networks. The multitask diffractive network delivers performance consistent with the single-task networks while using the same parameters. Setting 1 uses the same wavelength for both tasks, for comparison, and setting 2 uses different wavelengths. DN-FASION and DN-MNIST are evaluated with independent diffractive networks.
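To get a feel for the numbers, the half-cone angle can be evaluated for the two frequencies and the 200 μm neuron size quoted above. The sketch assumes the expression commonly used for diffractive networks, φ_max = arcsin(λ / (2·d)) with d the neuron size; the paper's own formula is not available here, and the function name and the clamping of the arcsine argument at 1 are assumptions.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def max_half_cone_angle_deg(frequency_hz, neuron_size_m):
    """Maximum half-cone diffraction angle in degrees (assumed formula).

    Uses phi_max = arcsin(wavelength / (2 * neuron_size)). When the neuron is
    smaller than half a wavelength the argument exceeds 1, and the angle is
    clamped to 90 degrees.
    """
    wavelength = SPEED_OF_LIGHT / frequency_hz
    s = wavelength / (2.0 * neuron_size_m)
    return float(np.degrees(np.arcsin(min(s, 1.0))))

neuron_size = 200e-6                                        # 200 micrometres
angle_low = max_half_cone_angle_deg(0.4e12, neuron_size)    # 0.4 THz wave
angle_high = max_half_cone_angle_deg(14.4e12, neuron_size)  # 14.4 THz wave
```

Under this assumed formula the low-frequency wave spreads over the full half-space while the high-frequency wave diffracts into a cone of only a few degrees, so the two waves couple neighbouring layers very differently for the same neuron size.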
3.4. Convergence Analysis
Figures 3 and 4 show two classifiers for the two datasets, each of which has ten target classes. One classifier maps both datasets to the same output regions, and the other maps them to different regions. Assigning different frequencies to the different datasets is effective, and using different regions in the MWDN gives higher accuracy and lower loss.
Finally, we report the performance on the validation data of MNIST and MNIST-Fashion, a challenging setting with two different datasets. Using only a single network, both datasets can be classified simultaneously. We investigate the effects of various combinations of datasets for MWDN; the results are shown in Figures 3 and 4. When the accuracy keeps increasing and the loss keeps decreasing, the task is converging. We find that two classification tasks can be implemented in the same network with the MWDN algorithm. Compared with approaches that use only a single dataset as input, our approach even yields an improvement.
4. Conclusion
In this paper, we propose a novel multitask optical network named the multiwavelength diffractive network (MWDN). Based on plane wave propagation, our method achieves accuracy comparable to that of single-task networks. We successfully apply MWDN to multiple tasks with different dataset distributions and provide a multiwavelength method for different model sizes. In the future, we aim to develop a more effective network to handle complex tasks and reach better performance.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors acknowledge the financial support from the National Key R&D Program of China (Grant no. 2017YFE0112000) and the Shanghai Municipal Science and Technology Major Project (Grant no. 2017SHZDZX01). The authors wish to thank Doctor Y. C. Sun. Furthermore, the authors would like to express sincere thanks for the light source and controller provided by 3 Lights and Light-ca Technology Corporation.
References
T. F. Cianciulli, J. A. Lax, F. E. Cerruti et al., “Complementary value of transthoracic echocardiography and cinefluoroscopic evaluation of mechanical heart prosthetic valves,” Echocardiography, vol. 21, no. 2, p. 211, 2010.
A. N. Tait, M. A. Nahmias, T. Yue, B. J. Shastri, and P. R. Prucnal, Photonic Neuromorphic Signal Processing and Computing, Springer, Berlin, Germany, 2013.
M. Dülk, S. Fischer, M. Bitter et al., “Ultrafast all-optical demultiplexer based on monolithic Mach–Zehnder interferometer with integrated semiconductor optical amplifiers,” Optical & Quantum Electronics, vol. 33, no. 7–10, pp. 899–906, 2001.
A. del Campo and C. Greiner, “SU-8: a photoresist for high-aspect-ratio and 3D submicron lithography,” Journal of Micromechanics & Microengineering, vol. 17, no. 6, pp. R81–R95, 2007.
C. F. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, Philadelphia, PA, USA, 1992.