Abstract

One of the biggest challenges in unravelling the complexity of living systems is to fully understand the neural logic that translates sensory input into the highly nonlinear motor outputs observed when simple organisms crawl. Recent work has shown that organisms such as larvae that exhibit klinotaxis (i.e., orientation through lateral movements of portions of the body) can perform normal exploratory routines even in the absence of a brain: abdominal and thoracic networks control the alternation between crawls and turns. This motivates the search for decentralized models of movement that can produce nonlinear outputs resembling those seen in experiments. Here, we present such a complex system model, in the form of a population of decentralized decision-making components (agents) whose aggregate activity resembles that observed in klinotactic organisms. Despite the simplicity of each component, the complexity created by their collective feedback of information and actions, akin to proportional navigation, drives the model organism towards a specific target. Our model organism’s nonlinear behaviors are consistent with empirically observed reorientation rate measures for Drosophila larvae as well as the nematode C. elegans.

1. Introduction

Biological systems, as well as many economic and social systems, are characterized by highly nonlinear, complex dynamics which reflect their high degree of adaptability to the ever-changing environment that surrounds them. Such complex adaptive systems are often modeled as large collections of interacting agents that evolve in time to produce a complicated interplay between deterministic and stochastic outputs [1–3]. Understanding the neural circuit mechanisms underlying an organism’s constant search for optimal living conditions is an active research field [4–6]. Target systems such as nematodes and larvae are often studied to unveil the neural logic responsible for transforming external stimuli into directed locomotion [7–10]. From bacteria to vertebrates, there is an open challenge to understand the feedback loop in which sensory inputs and motor outputs are coupled, resulting in the organism’s locomotion. In addition to the interest in understanding living systems, the lessons learned from such studies may have useful implications for the design of more efficient cyberphysical systems comprising collections of sensors and actuators [11, 12]. Indeed, the model that we present can be seen as equivalent to such a cyberphysical system, with each component (agent) acting simultaneously as a sensor and an actuator.

In bacteria, a goal-directed behavior is realized through biased random walks where paths are extended in the direction of the stimulus (klinokinesis) [13, 14]. Organisms with highly complex nervous systems (e.g., vertebrates) perform simultaneous spatial comparisons of stimulus samples (e.g., odor intensity) that are detected by independent sensors (tropotaxis) [15, 16]. Lying in between, organisms such as fruit fly larvae detect temporal changes in the stimulus intensity in order to guide their motion [8]. The crawling larva alternates forward movements or runs with head-sweeping movements or turns [8–10, 17]. This is an error-correction mechanism of temporal sampling known as weathervaning or klinotaxis, and it encompasses a sensorimotor memory developing from former reorientation actions [18].

Carefully designed anatomical models of larvae following a combination of weathervaning and head casts have been proposed to explain several features of the organisms’ behavior [19] and the flow of information [20]. A key question is whether a simpler control scheme, inspired by empirical findings and involving a reduced number of parameters, can produce satisfactory results. It has been found that for Drosophila larvae under the influence of an odor gradient (chemotaxis), the subesophageal zone controls the selection of different behavioral programs, including the regulation of transitions from runs to turns [21]. However, it has been shown empirically that these organisms actually produce exploratory routines of runs and turns in the absence of a brain, which suggests that there is a basic motility pattern produced at the neural cord [19, 22]. In addition, for thermal stimuli (thermotaxis), chordotonal neurons found in the body wall have been shown to produce different response patterns to thermal changes [23, 24]. These findings motivate us to develop a decentralized modeling approach in order to understand the underlying sensorimotor circuits in simple organisms.

Here, we present such a decentralized model that regulates the direction of motion of a system (e.g., larva) in search of a specific target, and as a result produces nonlinear motion that is consistent with that observed empirically. We stress that our model is not unique, and is purposely minimal in that it lacks a wealth of known biological details. Hence, it should be seen very much as a possible prototype. However, its value comes from the fact that, despite its minimal structure, it does produce nontrivial nonlinear behaviors, which are consistent with the observed empirical measurements. This in turn suggests that our minimal model may indeed be capturing some core principles (e.g., reward/penalty mechanism), albeit in a very crude way, which may eventually become generalizable to other organisms and systems in the future. The nonlinear output trajectories produced by our model share some interesting commonalities with those observed in klinotaxis dynamics. In addition to using temporal sampling to regulate turns, our analysis shows that the relationship between the turning rate and the optimal direction is akin to that found in the chemotaxis of organisms such as C. elegans and Drosophila larvae. Our model could hence shed light on open questions regarding the specifics of the neural circuits connecting the sensory and motor neurons in these organisms.

2.1. Description of Our Model

Consider a system comprising N agents, and let N be odd for concreteness. Based on the set of strategies that each agent holds, each of which is a look-up table of actions to take given a particular history of recent global outcomes, the aggregate action of these agents will dictate the trajectory of the system during the next timestep. This process then continues for all timesteps. Each agent can be considered, for example, to represent a piece of the organism’s machinery (e.g., a segment of the body or a piece of the neural system). There is a specific target or destination, and the global outcome produced at each timestep concerns whether the aggregate behavior was good or bad in terms of moving the organism toward or away from the target. There is no communication among the agents beyond the fact that they receive the same feedback of information about the global outcomes at previous timesteps. The model setup is crudely analogous to the situation of a set of N people in a canoe, without any central coordinator, with each of them deciding by themselves at each timestep whether to put their oar into the water (and hence row) on the left or right of the canoe. They then all see in which direction the canoe actually moved, and whether this was beneficial to the canoe in terms of moving toward a target, and then they each decide by themselves again and the whole process iterates. The heterogeneity of the individuals means that they each have their own set of strategies that they use to decide the action that they will take at a given timestep, given the global information about previous outcomes being good or bad in terms of the canoe moving toward its target. We shall not pursue this analogy more, though the rest of the paper can indeed be read with this analogy in mind for concreteness.

Initially, the system is located at an arbitrary position away from the target, and pointing in an arbitrary initial direction. Suppose that at the end of the timestep , the system reaches the point with an instantaneous velocity of which makes an angle with respect to the horizontal (see Figure 1(a)). The target is located at a fixed point and we define the vector to point from the system to the target at time . The angle between the vectors and serves as an indicator of the system-target alignment. At the beginning of the next timestep, the system (i.e., organism) rotates so that it now makes a new angle θ(t) with the horizontal and subsequently advances a distance ℓ along the new direction to the new location . The travelled distance ℓ per timestep is constant throughout the simulation and, hence, so too is the system’s speed, though this could of course be generalized. An additional angle is defined between the previous target vector and the new direction , which helps us determine whether the rotation was a good move for the system or not. The angles Ω and are always taken to be positive.
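The rotate-then-advance update described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `advance` and the argument names are hypothetical, angles are assumed to be in radians measured counterclockwise from the horizontal, and the step length `ell` plays the role of ℓ.

```python
import math

def advance(x, y, theta_prev, rotation, ell=1.0):
    """Rotate the heading by `rotation`, then move a fixed distance `ell`.

    Hypothetical helper: `theta_prev` is the previous heading angle and
    the returned `theta_new` corresponds to the new angle with the horizontal.
    """
    theta_new = theta_prev + rotation
    x_new = x + ell * math.cos(theta_new)
    y_new = y + ell * math.sin(theta_new)
    return x_new, y_new, theta_new
```

Because `ell` is the same at every call, the speed of the simulated system is constant, as in the text; a variable step length would be one way to generalize it.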

Figure 1(b) shows schematically the details of the rotation. At each timestep, each agent decides on one of two possible actions: act to rotate the system clockwise (action −1) or act to rotate the system counterclockwise (action +1). Every agent’s contribution has the same magnitude. This means that if any two agents select opposite actions, their contributions cancel out. By contrast, if two agents select the same action, their contributions add up. The net change in the direction of the system is given by the sum of the agents’ individual contributions, which can be positive or negative. Here, we choose the magnitude of an individual contribution to be 90/N degrees, so that if all N agents decide on the same action, the system will change its direction by 90 degrees. Each agent decides on a particular action based on the strategies that it holds. Each agent holds strategies that are randomly assigned to it from the entire set of strategies. Each strategy dictates an action given a particular history of prior winning groups (i.e., winning decisions). If action +1 is better than action −1, the winning group is 1, and it is 0 otherwise. The length of the history is known as the memory m and provides the number of timesteps that are used as input. For example, if m = 2, the last two winning groups are taken as input in order to represent the last two global outcomes.
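As a rough illustration of how the individual actions combine, the following Python sketch sums the ±1 actions and scales them so that a unanimous vote produces a 90-degree turn. The function name `net_rotation` and the parameter `max_turn` are hypothetical, and angles are assumed to be in radians.

```python
import math

def net_rotation(actions, max_turn=math.pi / 2):
    """Net change of direction produced by a list of +1/-1 actions.

    Each agent contributes max_turn / N, so opposite actions cancel and
    a unanimous population turns the system by max_turn (90 degrees here).
    """
    n = len(actions)
    if n == 0:
        raise ValueError("at least one agent is required")
    if any(a not in (-1, 1) for a in actions):
        raise ValueError("actions must be +1 (counterclockwise) or -1 (clockwise)")
    return (max_turn / n) * sum(actions)
```

With an odd number of agents the sum of actions is never zero, so the system always rotates by at least one individual contribution.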

Given the binary nature of the action, there are 2^m possible winning histories corresponding to the global outcomes and hence 2^(2^m) possible strategies. For example, for m = 2 the four possible winning histories are 11, 10, 01, and 00, and there are 16 possible strategies (see Figure 1(b)). Strategies are rewarded or penalized depending on whether they predict the winning group or not. Out of its own set of strategies, each agent uses the strategy that has the highest score at that timestep. If two or more strategies have the same highest score, the agent chooses one of them at random. For example, if the history for a given timestep is 10 and the highest scored strategy an agent holds is strategy number 2 in Figure 1(b), the agent will choose action −1. However, under the same conditions, if the history is 11, the agent will choose action +1 instead. The chosen strategy will either increase or decrease its score by one point depending on whether the action yields a better or worse system-target alignment, respectively.
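The look-up-table picture of a strategy, and the rule of playing the highest-scoring strategy with random tie-breaking, can be sketched as follows. All names here are hypothetical, and the ordering of histories inside a strategy (binary value of the history, so 00, 01, 10, 11 for a memory of two) is an assumption made for illustration rather than the ordering used in Figure 1(b).

```python
import itertools
import random

def all_strategies(m):
    """Every map from the 2**m histories to an action in {-1, +1}."""
    return list(itertools.product((-1, 1), repeat=2 ** m))

def history_index(history):
    """Read a tuple of winning groups (0/1) as a binary number."""
    idx = 0
    for bit in history:
        idx = 2 * idx + bit
    return idx

def agent_action(strategies, scores, history, rng):
    """Play the best-scoring strategy; ties are broken at random."""
    best = max(scores)
    candidates = [i for i, sc in enumerate(scores) if sc == best]
    k = rng.choice(candidates)
    return k, strategies[k][history_index(history)]

def update_score(scores, k, action, winning_action):
    """Reward (+1) or penalize (-1) the strategy that was played."""
    scores[k] += 1 if action == winning_action else -1
```

For example, an agent whose best strategy is the (hypothetical) table `(1, 1, -1, 1)` plays −1 after history 10 and +1 after history 11, mirroring the example in the text.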

To determine whether an individual action is good or not, we look at the change it produces to the direction of the system at time compared to time . A good action would improve the alignment between the vector and the vector . Therefore, if is the angle between these two vectors, the goal is to minimize . To this end, a comparison is made between , that is, before individuals perform the action that changes the direction, and defined as the angle between the new vector (i.e., after the action is taken but before the system displaces) and the target vector . Thus, the effect of the combination of individual actions is evaluated at the same position in the plane. Hence, the difference is used as a criterion to determine whether the action is good or not as follows: when , the action that has led to is the bad (losing) action and the other one is the winning action. By contrast, when , the action that has led to is the good (winning) action. Finally, if and , the action that makes smaller wins, and if and , both actions are winners.
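A minimal sketch of this evaluation, under stated assumptions, is given below. It compares the unsigned heading-target angle before and after the rotation at the same position. The function names are hypothetical, angles are in radians, and the special tie cases described in the text are not reproduced: the sketch simply returns `None` when the alignment does not change.

```python
import math

def unsigned_angle(heading, target_dir):
    """Unsigned angle in [0, pi] between a heading and the target direction."""
    d = (target_dir - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(d)

def winning_action(theta_prev, rotation, target_dir):
    """Return +1 or -1 for the winning action, or None if undecided.

    The rotation direction that was taken wins if it reduced the
    heading-target angle; otherwise the opposite action wins.
    Tie handling is an assumption of this sketch.
    """
    before = unsigned_angle(theta_prev, target_dir)
    after = unsigned_angle(theta_prev + rotation, target_dir)
    if rotation == 0 or after == before:
        return None
    taken = 1 if rotation > 0 else -1
    return taken if after < before else -taken
```

Note that a large rotation in the right direction can still lose: starting 0.1 rad away from the target direction and turning by 90 degrees overshoots, so the opposite action is recorded as the winner.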

It is important to understand the correlations among the different strategies, which in turn affect the system behavior in the model. The root of these correlations lies in the specifics of the strategy space for a given value of . When looking at the full strategy space (i.e., dimension ), we can find subsets of strategies where each pair within the subset can be either uncorrelated or anticorrelated. Uncorrelated strategies refer to pairs of strategies that would take the same actions for half of the histories and would take the opposite actions for the remaining half. If the histories occur equally often, the actions of the two agents will be on average uncorrelated. For as an example, this is the case for strategies and . Anticorrelated strategies refer to pairs of strategies that take opposite actions independent of the sequence of previous outcomes. For example, for , any two agents using strategies and do opposite actions and as a result their net effect on the system’s trajectory cancels out and will not contribute to the trajectory fluctuations. The group of agents using these strategies at a given timestep are called the crowd and the anticrowd [25] since they tend to cancel each other’s contribution.
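One simple way to make these definitions concrete is to count the fraction of histories on which two strategies disagree. The sketch below uses hypothetical function names and hypothetical example strategies written as tuples of ±1 actions, one per history; they are not the specific strategies referred to in the text.

```python
def disagreement_fraction(a, b):
    """Fraction of histories on which strategies a and b take opposite actions."""
    if len(a) != len(b):
        raise ValueError("strategies must cover the same set of histories")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def relation(a, b):
    """Classify a pair of strategies by how often they disagree."""
    f = disagreement_fraction(a, b)
    if f == 1.0:
        return "anticorrelated"   # opposite action for every history
    if f == 0.5:
        return "uncorrelated"     # same action for exactly half the histories
    if f == 0.0:
        return "identical"
    return "partially correlated"
```

Two agents playing an anticorrelated pair always cancel each other, which is the crowd-anticrowd cancellation described above.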

3. Results

3.1. Sample Trajectories

Sample trajectories from our model are shown in Figure 2(a) for different values of and for a population of agents. Each panel shows three trajectories for a specific value of and . The trajectories start at the points (−40, 0), (0, 0), and (+40, 0) in length units of . For all cases the initial angle is with the horizontal, and the target is located at the point (0, 100). The number of steps is such that the system arrives at the target at the last timestep if traveling in a straight line.

For m = 1 (left panel in Figure 2(a)), the strategy pool is small (4 strategies) and all strategies are in play. This results in a large-crowd effect where the majority outnumbers the minority by a large amount. Consequently, the majority action quickly becomes the wrong action, and soon afterwards the opposite action becomes the majority and subsequently the wrong action again. The outcomes move through the history space with some periodicity. This translates into a zigzag motion of the system with a short period and at a large angle to the vertical. The zigzag motion leaves the end point farther away from the target. The effect is larger when agents have access to more strategies as shown in the case. As m increases, the trajectories gradually lose the zigzag behavior and drive the system more effectively towards the target. This is because the anticrowd is effectively balancing the crowd. The system moves in a more direct way towards the target. However, since N is odd, it displaces from one side to the other of the optimal path with an angle that is smaller than that for m = 1.

Figure 2(b) illustrates the average time the system takes to reach the target given by , as a function of the number of agents N for different values of and . In this case we have chosen an initial system-target separation equal to , so that a search time close to 10^3 timesteps is an indication of a more efficient path. It is found that the average search time tends to reach a steady value as N grows while the fluctuations decrease rapidly as N increases. This indicates that a population of agents is a good choice if we want to avoid large-population effects where the fluctuations will likely significantly affect the dynamics.

3.2. Resemblance with Proportional Navigation

A navigation strategy known as proportional navigation has the objective of maintaining the line of sight (LOS) angle fixed while the system moves towards the target [26]. LOS is the direction that maximizes the efficiency of the movement measured with respect to a fixed reference. For example, the LOS angle of the central trajectories of each panel in Figure 2 is 90 degrees, or more generally in Figure 1(a) the LOS corresponds to the direction of the vector . It has been shown that the rate of change of the LOS angle (i.e., the turning rate) is proportional to the sine of the local bearing angle which is defined as the angle from vector to vector [27]. Note that , since is defined as positive. Figure 3(a) illustrates the instantaneous turning rate per timestep and the bearing angle for three step trajectories moving towards a single target. We measure the turning rate and bearing angle for a large set of trajectories generated by our model for different values of and . We analyze 1024 trajectories for each parameter set while randomizing the initial conditions (i.e., the system’s initial position and direction).
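For readers who wish to reproduce this kind of measurement, the sketch below computes a signed bearing angle between a heading vector and a target vector, together with the sine law that proportional navigation predicts for the turning rate. The function names and the `gain` parameter are hypothetical, and the sign convention (positive when the target lies counterclockwise from the heading) is an assumption of this sketch rather than the convention of the paper.

```python
import math

def bearing_angle(vx, vy, rx, ry):
    """Signed angle from heading (vx, vy) to target vector (rx, ry).

    Uses atan2(cross, dot), so the result lies in (-pi, pi].
    """
    cross = vx * ry - vy * rx
    dot = vx * rx + vy * ry
    return math.atan2(cross, dot)

def sine_law_turning_rate(bearing, gain=1.0):
    """Turning rate proportional to the sine of the bearing angle."""
    return gain * math.sin(bearing)
```

Averaging the measured change of heading per timestep within bins of the bearing angle, and comparing the result with `sine_law_turning_rate`, gives the kind of turning-rate curve discussed in this section.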

Figure 3(b) shows our results for the dependence of the mean turning rate on the mean bearing angle. For all cases, the relationship resembles that of a sine function as predicted by the proportional navigation guidance technique. The main panel shows how by changing and , the dependence alters the roughly sinusoidal pattern. For example, large changes in the turning rate are associated with small values of and large values of , which is a result of the zigzag behavior described in the previous section. The inset illustrates the effect of different noise levels in the information on the angular dependence. Noise is introduced by means of a probability that the winning group is identified incorrectly at a given timestep. This could model an obscured target as well as sensor malfunction. Noise in the information tends to reduce the turning rate without changing the original pattern.
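The noise mechanism can be sketched as a random flip of the winning group before it is fed back to the agents. The function name `noisy_winning_group` and the parameter name `p_error` are hypothetical labels for the misidentification probability.

```python
import random

def noisy_winning_group(true_group, p_error, rng):
    """Return the winning group (0 or 1), flipped with probability p_error."""
    if true_group not in (0, 1):
        raise ValueError("winning group must be 0 or 1")
    if rng.random() < p_error:
        return 1 - true_group
    return true_group
```

Setting `p_error` to 0 recovers the noiseless model, while 0.5 makes the feedback carry no information about the target.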

We find that the sinusoidal pattern is robust for a wide range of choices of the number of agents N. Figure 3(c) shows that even for a single agent, the overall pattern is preserved, while the sinusoidal shape improves when more agents are included. As found for the search time, larger effects occur for small populations. This finding indicates that the element of competition among a fair number of agents () tends to stabilize the system to the point that the model becomes scalable with N.

3.3. Comparison to Larval Chemotaxis

A comparable turning rate pattern is found for the nematode C. elegans [28] and Drosophila larva [18] when released on a surface subject to a chemical gradient (chemotaxis). This is a form of klinokinesis and klinotaxis where the organisms are guided by the stimulus intensity (e.g., odor). Figure 4(a) uses the measurements from [18] to compare trajectories from Drosophila larva chemotaxis using a sensory gradient formed by an odor droplet of 30 nM ethyl butyrate (top) and trajectories from our model with parameters of , , and (bottom). The stimulus (e.g., odor and target) is located close to the initial position of the system so that the target region is revisited frequently by the system (organism) and the reorientation mechanism (i.e., turning rate) can be analyzed. Each panel shows the overlap of 42 independent trajectories. Using the measurements of Gomez-Marin and Louis [18], Figure 4(b) contrasts the turning rate dependence with the bearing angle for both systems. For the model, we have used a sample of 1024 trajectories with at least 100 timesteps each and we use the same parameters as in Figure 4(a). Although the amplitude for the turning rate for the model is larger than that of the organism, both patterns compare reasonably well when scaled, as shown.

4. Discussion

Though we are certainly not arguing that the neural circuit responsible for the motion in these organisms follows the same mechanism as our model, it is suggestive that the resulting nonlinear motion shares nontrivial commonalities, which could provide new insights into how these neural circuits connect the sensory and motor neurons. This becomes particularly interesting for light-avoidance klinotaxis in Drosophila larvae, where it has been found that the sensorimotor system (i.e., alternation between crawls and turns) of the organism’s abdomen and thorax can function normally in the absence of brain activity. In addition, body wall neurons, in contrast to dorsal organ neurons, have shown an active response to thermal inputs in Drosophila melanogaster larvae [23]. These findings point to a potential decentralized neural machinery instead of a unified central network.

Our model can be extended to account for more complex environments beyond a single target point, for example, by considering a gradually (or temporally) changing signal of the kind often used to model chemotaxis. Considering a fixed-point target is nevertheless a reasonably strong first step that accounts for basic but illustrative environments in phototaxis as well as chemotaxis. In addition, for a very large initial system-target separation compared to a typical trajectory length, our current model could also be used to study one-dimensional thermotaxis.

Our work contributes to the ongoing transition of the biological sciences to a more quantitative subject in which nonlinear approximations can play a central role [29]. In particular, it provides a framework where the adaptability of a system can be tested against nonlinear interactions, both internal and external. Moreover, our work touches on the incorporation of randomness in biological processes such as perception and its potential implications for locomotion.

In summary, we have presented a decentralized multiagent model that, by using a decision-making mechanism based on strategies and scores, drives the trajectory of a system towards a specific target. Our model is scalable to any number of agents N and adaptable through a penalty-reward mechanism of the implemented strategies. The model follows a proportional navigation principle, which aims to fix the LOS angle along the trajectory. This is shown by calculating the dependence of the turning rate on the bearing angle, revealing a sinusoidal dependence. This same turning rate function is found in the klinotaxis dynamics of several organisms and compares qualitatively well with our nonlinear model.

Disclosure

The views and conclusions contained herein are solely those of the authors and do not represent official policies or endorsements by any of the entities named in this paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Neil Johnson gratefully acknowledges funding under National Science Foundation (NSF) Grant no. CNS 1522693 and Air Force Office of Scientific Research (AFOSR) Grant no. FA9550-16-1-0247. Pak Ming Hui acknowledges the support of a Direct Grant for Research in 2017–2018 from the Faculty of Science at the Chinese University of Hong Kong.