Abstract

Healthcare is increasingly turning to high technology. As the aging population grows faster than ever, researchers and health care providers are relying on robots to ease the symptoms of dementia and to help older adults stay where they would like to be, at home. Therapeutic robots such as Paro, recently introduced to the market, are a manifestation of this trend. In this paper, we propose a social robot whose mission is to autonomously capture images of people and feed multimedia content to a social network or to a hospital for social activities or health monitoring. The main technical barriers for such robots include autonomous navigation, human face detection, and distance and angle adjustment for clean, well-composed shots. To that end, we study autonomous mapping and navigation as well as optimal image capture via motion planning and visual servoing. To handle mapping and navigation in a crowded environment, we use potential field path planning combined with two competing potential update techniques. The robot is an agent navigating in a potential field where detected environmental significances provide sources of attractive force, while previously occupied locations estimated by a SLAM technique provide sources of repelling force. We also study a visual servo technique to optimize the image capturing process, which includes facial recognition, photographic distance and angle adjustment, and backlight avoidance. We tested several scenarios with the assembled robot to assess its usefulness.

1. Introduction

According to a study from the Pew Research Center’s Internet and American Life Project, four out of five Internet users participate in some kind of group in the “real” world, compared with just 56 percent of those who do not use the Internet regularly [1]. Those figures rise to 82 percent for users of social networks and to 85 percent for users of Twitter. This suggests that people who are socially active online are more likely to be social offline as well. However, many modern professionals work in highly constrained environments with little or no chance for social networking in daily life. The primary motivation of this paper is to help those professionals become more socially active, not only online but offline as well. The last decade has seen the evolution of social networking technologies and popular sites such as Facebook and Twitter, and the more recent rise of Instagram is another manifestation of this trend. However, photographing ourselves or our family is sometimes a chore, and we end up missing memorable moments in daily life. The selfie has become popular and has made a big impact on social networking. Even so, selfies often fall short of the photographic quality or quantity we want. We therefore propose a self-photographing device, “Dali the photographer,” as a breakthrough social networking device for daily life (see Figure 1). As interest in social robots has increased rapidly, many studies have evaluated socially assistive robots. For example, Fasola and Matarić [2] evaluated the effectiveness of spatial-language-based human-robot interaction (HRI) with elders using a fully autonomous mobile robot. Fasola and Matarić [3, 4] designed, implemented, and evaluated a socially assistive robot to engage older adults in physical exercise. Tapus and Matarić [5] evaluated a music therapy feature of a social robot that aimed to maintain the attention of people with cognitive impairments.

Many studies have addressed how social networking affects people’s daily lives and how social media is used, especially by women, older adults, and parents. Hampton et al. [3] found that women who use more social media report less stress than women who are nonusers. The article also states that people are more aware of events happening with their families and friends when they use social media. Pew Research Center [3] reported how seniors use social media. Six out of ten seniors now use social media, and higher-income and more highly educated seniors are more likely to do so. However, usage drops at around age 75, when physical difficulties make social media hard to use. Perrin [6] found that 65% of adults now use social media as a networking tool, a nearly tenfold increase over the past decade. Duggan et al. [3] reported how mothers use social media. They said that mothers are heavily engaged in social media, both giving and receiving support to their children. However, it has also been reported that posting multimedia content to a social network such as Facebook still needs improvement for mothers and elders. As part of a resolution, Dali, the photographer, is proposed and developed to investigate the impact of a photographing social robot, not only to promote daily social networking but also to offer a breakthrough as an aging-in-place technology. The primary function of Dali is to autonomously navigate around people and find human faces to take pictures. Several robotic photographers have been introduced in the market and have demonstrated their usefulness. For instance, the robot photographer “Luke” was built on a Turtlebot, using the open source autonomous navigation capabilities of the Robot Operating System (ROS) with the help of a Kinect RGB-D camera [7]. Hsu [8] also implemented a photographer robot using the Turtlebot platform and a facial tracking feature based on a facial landmark model. ROSBOT [9] is another ROS-based robot with a few social functions such as following people, taking photos, and uploading them to photo sharing websites or online albums. A micro aerial vehicle (MAV) has been tested as a flying photographer [10]. In this system, Google’s Yellowstone tablet was used as the main control unit for autonomous navigation, pose estimation, and photographing. The success of those systems, however, is still in doubt due to the factors listed below.
(1) Not cost effective
(2) Not human friendly
(3) Not able to navigate in crowded environments

The first factor is, without doubt, the most important in creating desire for a product. Some of the aforementioned examples may be cost effective, but they do not satisfy the second criterion. For instance, “Luke” seems to be a cost-effective robotic photographer. However, due to its limited height, it is not as human friendly as other robots such as ROSBOT, which in turn conflicts with the first criterion. Navigation in crowded environments is also an important feature for social robots. In this paper, we propose a solution that satisfies all of these criteria at the same time so that more people can be involved in online and offline social activities for a happier lifestyle.

2. Robot Mechanism Design

The social robot Dali was designed and prototyped at Phomatix, LLC. Design considerations are size, low power consumption, maneuverability, and stability for navigating indoor environments. Figure 1 illustrates Dali, which is composed of a wheel-based mobile platform, a selfie stick, and a smartphone. Other smart devices such as 3D or 360° cameras can also be mounted on the mobile platform. The mobile platform (the bottom part) is about 250 mm in diameter and 100 mm in height and weighs about 1 kg. Figure 2 illustrates the major components inside the mobile platform. Heavy components such as the actuation mechanisms and battery are distributed so that the center of mass of the robot is located at its geometric center.

The mobile platform has four wheels: two driving wheels and two ball caster wheels. The wheelbase of the mobile platform is designed to provide stability during navigation. Two DC geared motors with encoders drive the two driving wheels. A mounting boss is located at the center to hold a selfie stick (the middle part). Four sonar sensors are mounted on the upper case of the mobile platform to detect objects, and one distance sensor is mounted on the lower case to avoid falling during navigation. As shown in Figure 3, a compliant driving wheel mechanism is implemented to adapt to various surface conditions including wood floors, carpets, and tiles.

A wheel directly coupled to a DC geared motor rotates about a hinge joint, and a linear compression spring at the opposite side provides a repulsive force. The main function of the spring compliance is to minimize the contact force on the front and rear ball casters so that the robot can move over small steps such as carpet or tiles. Figure 4 illustrates a free body diagram of the simplified compliant wheel mechanism. A static moment equation at O (hinge) is formulated as

$$ (F_r - m g)\, l_1 - F_s\, l_2 = 0, \tag{1} $$

where $F_r$ and $F_s$ are the reaction force exerted on the wheel from the ground and the force generated by the deflection of the spring, $m$ is the mass of the wheel and geared DC motor, $g$ is gravity, and $l_1$ and $l_2$ are the distances from hinge point O to the center of mass of the wheel and to the spring. Values of the design parameters are $m$ = 0.15 kg, $l_1$ = 33 mm, and $l_2$ = 55 mm. As expressed in (1), the reaction force can be determined by adjusting the spring force. The nominal deflection of a spring when all four wheels are on the ground is 4 mm, and each spring generates 1.16 N of force from $F_s = k d$, where $k$ is the spring constant and $d$ is the nominal deflection. In this condition, the reaction force becomes 3.4 N (≈0.3 kgf); that is, about 57% of the weight of the robot is applied to the driving wheels and 43% is applied to the ball caster wheels, where the total robot weight is 1.2 kg. The weight balance between the driving wheels and the passive ball wheels can be determined by adjusting the spring constant, and the compliance of the wheel mechanism supports navigation over uneven and bumpy surfaces. Dali with this compliant wheel mechanism was validated while navigating indoor surfaces including wood floor, carpet, and tiles.
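The force balance above can be checked numerically. The following is a minimal sketch, assuming both moment arms are measured from the hinge O (33 mm to the wheel, 55 mm to the spring) so that the ground reaction equals the weight of the wheel assembly plus the spring force scaled by the arm ratio. The function and variable names are illustrative and are not taken from the original design.

```python
G = 9.81  # gravitational acceleration, m/s^2

def reaction_force(mass_kg, wheel_arm_mm, spring_arm_mm, spring_force_n):
    """Ground reaction on one driving wheel from a static moment balance
    about the hinge (assumed geometry: both arms measured from the hinge)."""
    return mass_kg * G + spring_force_n * spring_arm_mm / wheel_arm_mm

def driving_wheel_share(reaction_n, robot_mass_kg, n_driving_wheels=2):
    """Fraction of the total robot weight carried by the driving wheels."""
    return n_driving_wheels * reaction_n / (robot_mass_kg * G)

# Values quoted in the text: 0.15 kg wheel assembly, 1.16 N spring force,
# 1.2 kg total robot mass.
f_r = reaction_force(0.15, 33.0, 55.0, 1.16)
share = driving_wheel_share(f_r, 1.2)
print(f"reaction force: {f_r:.2f} N, driving-wheel share: {share:.1%}")
```

Under these assumptions the reaction force comes out at about 3.4 N per driving wheel, and the two driving wheels together carry a little under 60% of the robot weight, in line with the figures quoted above.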

3. System Architecture

The basic philosophy of the design is to make the robot simple and affordable. To that end, a personal cellphone is used as the main brain of the robot, while the mobile platform is controlled by a low-cost microcontroller. The system hardware architecture of the proposed social robot is shown in Figure 5. An Arduino 101 by Intel functions as an embedded robot controller for low-level motion control. The cellphone and the robot controller follow a client-server model: the cellphone works as a client that issues high-level motion requests, while the embedded controller works as a server that executes the detailed motion control actions.

The cellphone software also processes camera images and provides the result of face detection to the embedded controller. This cooperative work allows patterned motions and visual servoing for robotic photographing.

The robot moves autonomously under the control of a state machine (Figure 6). There are five states: navigation, spin, manual, pattern, and idle. Each state is triggered either by sensor inputs or by manual commands. For instance, the pattern state enables the robot to move in a preprogrammed pattern such as a linear, circular, arc, or rectangular pattern. The manual state allows a user to remote-control the robot using a cellphone app. The navigation state is the mode of potential field motion using sonar scanning and SLAM functions.
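A state machine of this kind can be sketched as a small transition table. The five state names follow the text (with the fifth state called IDLE here), but the event names and the specific transitions below are hypothetical, chosen only to illustrate how sensor inputs and manual commands could trigger state changes; they are not the transitions of Figure 6.

```python
from enum import Enum, auto

class State(Enum):
    NAVIGATION = auto()
    SPIN = auto()
    MANUAL = auto()
    PATTERN = auto()
    IDLE = auto()

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "start"): State.NAVIGATION,
    (State.NAVIGATION, "obstacle_close"): State.SPIN,
    (State.SPIN, "path_clear"): State.NAVIGATION,
    (State.NAVIGATION, "face_detected"): State.PATTERN,
    (State.PATTERN, "pattern_done"): State.NAVIGATION,
    (State.MANUAL, "release"): State.NAVIGATION,
}

def next_state(state, event):
    """Return the next state for an event; unknown events keep the state."""
    # Assumed policy: manual commands and stop requests override any state.
    if event == "manual_cmd":
        return State.MANUAL
    if event == "stop":
        return State.IDLE
    return TRANSITIONS.get((state, event), state)

state = State.IDLE
for event in ["start", "obstacle_close", "path_clear", "face_detected"]:
    state = next_state(state, event)
print(state.name)
```

Keeping the transitions in a table rather than in nested conditionals makes it easy to add or audit a trigger without touching the control loop.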

4. Localization

One of the most challenging functions of Dali is to navigate collision-free while generating random motion in search of a human face (or faces) to photograph. To that end, we use a potential field path planner whereby Dali can find objects to avoid or approach. Our approach is simple in that the robot is an agent navigating in a potential field where detected environmental significance provides sources of attractive force, while previously occupied locations estimated by the SLAM technique provide sources of repelling force (Figure 9). In order to install repelling sources for navigation and path planning, we use a SLAM-based localization technique. A seminal work in SLAM is the research of R. C. Smith and P. Cheeseman on the representation and estimation of spatial uncertainty [11, 12]. Other pioneering work in this field was conducted by the research group of Hugh F. Durrant-Whyte in the early 1990s [13], which demonstrated tractable solutions to SLAM in the infinite data limit. We implement SLAM by using a linear system model. The state model of a time-invariant linear system is

$$ x_t = F x_{t-1} + G u_t + \varepsilon_t, \qquad z_t = H x_t + \delta_t, \tag{2} $$

where $x_t$ is the state vector, $z_t$ is the observation vector, $u_t$ is the control vector, $F$, $G$, and $H$ are the constant system matrices, and $\varepsilon_t$ and $\delta_t$ are Gaussian noise for the state estimation and observation processes, respectively.

By using the nonholonomic platform shown in Figure 7, for a given control input $u_t = (v_t, \omega_t)^T$, where $v_t$ is the translational velocity and $\omega_t$ is the rotational velocity, the motion model or the odometry model becomes
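As an illustration only, one common odometry model for a differential-drive platform driven by a translational velocity and a rotational velocity is the Euler-discretized unicycle model. The sketch below assumes a planar pose (x, y, theta) and a fixed time step `dt`; these names, and the choice of this particular discretization, are assumptions and may differ from the exact form used for Dali.

```python
import math

def odometry_step(pose, v, omega, dt):
    """Advance a planar pose (x, y, theta) by one time step.

    v is the translational velocity, omega the rotational velocity,
    and dt the step duration. The heading is wrapped to [-pi, pi).
    """
    x, y, theta = pose
    x += v * dt * math.cos(theta)
    y += v * dt * math.sin(theta)
    theta = (theta + omega * dt + math.pi) % (2.0 * math.pi) - math.pi
    return (x, y, theta)

# Drive straight for 0.5 s at 1 m/s, then turn in place by 90 degrees.
pose = odometry_step((0.0, 0.0, 0.0), v=1.0, omega=0.0, dt=0.5)
pose = odometry_step(pose, v=0.0, omega=math.pi / 2, dt=1.0)
print(pose)
```

Because the platform is nonholonomic, the model has no sideways velocity term: the robot can only translate along its current heading.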

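Then SLAM becomes a state estimation problem of $x_t$. The recursive Bayes filter is the most common approach to state estimation. By definition, the belief of the Bayes filter is

$$ bel(x_t) = p(x_t \mid z_{1:t}, u_{1:t}). \tag{4} $$

Using the conditional probability equation of

$$ p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)} = \eta\, p(b \mid a)\, p(a), \tag{5} $$

where $\eta$ is a normalization factor that keeps the total probability from exceeding 1. To be more practical,

$$ bel(x_t) = \eta\, p(z_t \mid x_t, z_{1:t-1}, u_{1:t})\, p(x_t \mid z_{1:t-1}, u_{1:t}). \tag{6} $$

It is known that a stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present states) depends only upon the present state, not on the sequence of events that preceded it. Therefore, by applying the Markov assumption that the current observation is independent of past observations and controls once the current state of the system is given, the belief simplifies to

$$ bel(x_t) = \eta\, p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1}, u_{1:t}). \tag{7} $$

If we use the law of total probability, by which $p(a) = \int p(a \mid b)\, p(b)\, db$, the above equation becomes

$$ bel(x_t) = \eta\, p(z_t \mid x_t) \int A \cdot B \; dx_{t-1}, \tag{8} $$

where A = $p(x_t \mid x_{t-1}, z_{1:t-1}, u_{1:t})$ and B = $p(x_{t-1} \mid z_{1:t-1}, u_{1:t})$. If we apply the Markov assumption again on both conditional probabilities on the right hand side of the equation, $z_{1:t-1}$ and $u_{1:t-1}$ can be ignored in A. Taking into consideration that $u_t$ cannot make any contribution to $x_{t-1}$, the above equation becomes

$$ bel(x_t) = \eta\, p(z_t \mid x_t) \int p(x_t \mid x_{t-1}, u_t)\, p(x_{t-1} \mid z_{1:t-1}, u_{1:t-1})\, dx_{t-1} \tag{9} $$

or

$$ bel(x_t) = \eta\, p(z_t \mid x_t) \int p(x_t \mid x_{t-1}, u_t)\, bel(x_{t-1})\, dx_{t-1}. \tag{10} $$

Now the last equation is used as a prediction step such that

$$ \overline{bel}(x_t) = \int p(x_t \mid u_t, x_{t-1})\, bel(x_{t-1})\, dx_{t-1}. \tag{11} $$

To make the predicted state more consistent with the observation by a sensor, by using (4), we propose a correction step such that

$$ bel(x_t) = \eta\, p(z_t \mid x_t)\, \overline{bel}(x_t). \tag{12} $$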
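The prediction and correction steps of a recursive Bayes filter can be illustrated on a small discrete state space, where the integral over the previous state becomes a sum. This is a minimal sketch, not the estimator running on the robot: the three-cell corridor, the motion model table, and the sonar likelihood values are all hypothetical.

```python
def predict(belief, motion_model):
    """Prediction: bel_bar[j] = sum_i p(x_t=j | u_t, x_{t-1}=i) * bel[i]."""
    n = len(belief)
    return [sum(motion_model[i][j] * belief[i] for i in range(n))
            for j in range(n)]

def correct(bel_bar, likelihood):
    """Correction: bel[j] = eta * p(z_t | x_t=j) * bel_bar[j]."""
    unnormalized = [l * b for l, b in zip(likelihood, bel_bar)]
    eta = 1.0 / sum(unnormalized)  # normalization factor
    return [eta * u for u in unnormalized]

# Hypothetical corridor of three cells; the robot starts in cell 0 and is
# commanded one cell to the right, succeeding with probability 0.8.
belief = [1.0, 0.0, 0.0]
motion_model = [
    [0.2, 0.8, 0.0],  # row i: distribution over the next cell from cell i
    [0.0, 0.2, 0.8],
    [0.0, 0.0, 1.0],
]
likelihood = [0.1, 0.7, 0.2]  # hypothetical p(z | cell) from a sonar reading

bel_bar = predict(belief, motion_model)
belief = correct(bel_bar, likelihood)
print(bel_bar, belief)
```

The prediction spreads the belief according to the motion model, and the correction sharpens it again by weighting each cell with how well it explains the measurement.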

The probability $p(x_t \mid u_t, x_{t-1})$ in (11) is called the motion model, and the probability $p(z_t \mid x_t)$ in (12) is called the sensor model. If we substitute the system state model into the motion model,

$$ \bar{x}_t = F x_{t-1} + G u_t + \varepsilon_t, \tag{13} $$

where $F$ and $G$ are the state-transition and control matrices of the linear state model and $\varepsilon_t \sim \mathcal{N}(0, Q)$ is the Gaussian noise in state estimation. $\bar{x}_t$ here is the mean value of each state variable. By the same token, the linear observation model becomes

$$ \bar{z}_t = H \bar{x}_t + \delta_t, \tag{14} $$

where $H$ is the observation matrix and $\delta_t \sim \mathcal{N}(0, R)$ is the Gaussian noise in measurement. $\bar{z}_t$ is again the mean value of the measurement. In state estimation, the challenge lies in minimizing the noise $\varepsilon_t$ and $\delta_t$ for accurate state estimation. The Kalman filter is known to be optimal in state estimation when the noise is Gaussian. First, we define the covariance of uncertainty, $\bar{\Sigma}_t$, in the state estimation such that

$$ \bar{\Sigma}_t = F\, \Sigma_{t-1} F^T + Q. \tag{15} $$

In order to correct the system state due to the uncertainty in equation (13), we use the following Kalman gain:

$$ K_t = \bar{\Sigma}_t H^T \left( H \bar{\Sigma}_t H^T + R \right)^{-1}. \tag{16} $$

Then, the system state, $x_t$, is corrected by the Kalman gain in (16) such that

$$ x_t = \bar{x}_t + K_t \left( z_t - H \bar{x}_t \right). \tag{17} $$

Equation (17) replaces the predicted mean $\bar{x}_t$ in (13) with the corrected state $x_t$. The estimator extracts the most likely position from the possible location range matrix, namely the one corresponding to the highest probability in the probability matrix. The probability distribution is updated continuously during the recursive state estimation. Sonar sensor readings provide the observation vectors to the Bayes filter. The estimated state of the robot is then used in the potential field motion planner, as shown in Figure 8.
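The predict-and-correct cycle of the Kalman filter can be sketched for a scalar state, where every matrix reduces to a number. The coefficient names `f`, `g`, `h` (state transition, control, observation) and the noise variances `q` and `r` are notation assumed for this sketch, and the numeric values are illustrative. The final variance update is the standard Kalman form and is included only so that the function can be called repeatedly.

```python
def kalman_step(x, p, u, z, f=1.0, g=1.0, h=1.0, q=0.01, r=0.1):
    """One predict/correct cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    u, z : control input and measurement for this step
    q, r : process and measurement noise variances (illustrative values)
    Returns the corrected state, its variance, and the Kalman gain.
    """
    # Prediction: propagate the mean and the uncertainty.
    x_bar = f * x + g * u
    p_bar = f * p * f + q
    # Kalman gain: how much to trust the measurement over the prediction.
    k = p_bar * h / (h * p_bar * h + r)
    # Correction: move the prediction toward the measurement.
    x_new = x_bar + k * (z - h * x_bar)
    p_new = (1.0 - k * h) * p_bar  # standard variance update (assumed)
    return x_new, p_new, k

# Robot believed at 0.0 m, commanded 1.0 m forward, sonar suggests 1.2 m.
x_new, p_new, k = kalman_step(x=0.0, p=1.0, u=1.0, z=1.2)
print(x_new, p_new, k)
```

With a large prior variance relative to the measurement noise, the gain is close to 1 and the corrected state lands near the measurement; as the variance shrinks over repeated steps, the filter trusts its own prediction more.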

5. Motion Planning

The primary goal of motion planning is to find environmental significance so that the robot can autonomously navigate in a given environment. As mentioned earlier, we propose the potential field technique where detected environmental significance provides sources of attractive force, while previously occupied locations estimated by the SLAM technique provide sources of repelling force. The original gradient is drawn from the highest repelling potential on the left to the highest attraction on the right side. During the planning process, the attraction and repelling fields are updated online depending on the conditions (Figure 9). The detailed logic of each part of the planner shown in Figure 9 is given in Algorithms 1-3.

do while …
    … = SLAM(…)
    … = Visual_scan()
    if (…)
        …
    else
        … = Update_potential(…)
    end if
end do
… + …
return x_t, …

/* update attractive object */
if …
    …
end if
if …
    …
end if
if …
    …
else
    …
end if
/* update repelling object */
if …
    …
else
    …
end if
return …
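The general idea of moving along a potential field with attractive and repelling sources can be sketched as follows. This is not a reconstruction of Algorithms 1-3: it uses the classic textbook form (a linear attractive force plus a repulsive force that acts only within a cutoff distance), and the gains `k_att`, `k_rep`, the cutoff `d0`, and the step size are assumed values. In the planner described above, the attractors would be detected environmental significances and the repellers would be previously occupied locations.

```python
import math

def potential_force(pos, attractors, repellers, k_att=1.0, k_rep=0.5, d0=1.0):
    """Net force at pos from attractive and repelling point sources."""
    fx = fy = 0.0
    for ax, ay in attractors:
        # Attractive force grows linearly with distance to the source.
        fx += k_att * (ax - pos[0])
        fy += k_att * (ay - pos[1])
    for rx, ry in repellers:
        dx, dy = pos[0] - rx, pos[1] - ry
        d = math.hypot(dx, dy)
        if 1e-9 < d < d0:
            # Repulsive force acts only inside the cutoff distance d0.
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy

def step(pos, attractors, repellers, step_size=0.1):
    """Move a fixed distance along the net force direction."""
    fx, fy = potential_force(pos, attractors, repellers)
    norm = math.hypot(fx, fy)
    if norm < 1e-9:
        return pos  # zero net force: at the goal or stuck in a local minimum
    return (pos[0] + step_size * fx / norm, pos[1] + step_size * fy / norm)

# One attractor ahead and one previously visited location just behind.
pos = (0.0, 0.0)
for _ in range(5):
    pos = step(pos, attractors=[(2.0, 0.0)], repellers=[(-0.5, 0.0)])
print(pos)
```

The early return in `step` marks the well-known weakness of this family of planners: wherever attractive and repelling forces cancel, the robot stops making progress.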

6. Robotic Photography

Dali, the social robot, utilizes a cellphone as the main control unit for robotic photography. A cellphone app, shown in Figure 10, was developed for the Android operating system. The app takes user input on the robot’s required motions, detects human faces in the video stream, and implements visual servoing in communication with the robot motion controller. The app provides several photographing modes (navigation, arc motion, circle motion, etc.) and animated eyes that indicate the events of searching for and finding people.

The app uses the Google Mobile Vision API [10] for face detection and tracking. The API is capable of detecting single or multiple faces, so the app can apply different photographing strategies based on the number of people. The app allows the user to set photographing parameters for the desired face size and location in the photograph frame. This information is used as the basis when the robot generates visual servoing goals. For example, when multiple faces are detected, the app can generate a desired input that includes all of the detected faces in the camera frame while requesting a specific patterned trajectory. Figure 11 illustrates a flow chart of the robotic photographing algorithm in communication with the robot controller.

The user interface provides three control modes: manual, automatic, and patterned motion modes. Face detection is enabled in navigation and patterned motion modes. Faces are detected and tracked for any duration of appearance in the video stream [10], and the face detections generate asynchronous events. In response to these events, the app compares the number of faces and the size of each face with the user-predefined desired parameters. These parameters include the individual face size ratio relative to the camera frame, the location of faces in camera coordinates, and their tolerable error ranges. Although suitable camera positions are so far determined by these geometric parameters rather than by aesthetic criteria, the algorithm implements closed-loop visual servoing and photographing using asynchronous face detection events.
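The closed-loop comparison between a detected face and the desired framing parameters can be sketched as a simple proportional rule. This is an illustration under stated assumptions, not the app's actual logic: the bounding box is assumed to arrive as a left edge and width in pixels from whatever face detector is in use, and the desired ratio, desired horizontal center, tolerance, and gains are hypothetical values.

```python
def servo_command(face_left, face_width, frame_width,
                  desired_ratio=0.25, desired_cx=0.5, tol=0.05,
                  k_forward=1.0, k_turn=1.0):
    """Map one face detection to (forward, turn, ready_to_shoot).

    forward > 0 means move closer, turn > 0 means rotate toward the right.
    The photo is taken once both errors fall within the tolerance.
    """
    ratio = face_width / frame_width                  # face size vs. frame
    cx = (face_left + face_width / 2) / frame_width   # normalized face center
    size_err = desired_ratio - ratio     # > 0: face too small
    center_err = cx - desired_cx         # > 0: face right of the target
    if abs(size_err) <= tol and abs(center_err) <= tol:
        return 0.0, 0.0, True
    return k_forward * size_err, k_turn * center_err, False

# A small, centered face: the robot should approach without turning.
print(servo_command(face_left=300, face_width=40, frame_width=640))
# A face already at the desired size and position: ready to shoot.
print(servo_command(face_left=240, face_width=160, frame_width=640))
```

Because each detection event yields a fresh error, the rule can be driven directly by asynchronous face detection callbacks without a fixed control period.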

7. Experiments

In order to verify the performance of the proposed robot, we ran ten actual photo-taking trials (Table 1). Three performance measures are identified: percent coverage of the given environment, number of photos taken, and convergence to the target point from the start point. For simplicity, percent coverage is calculated as the total path length traveled multiplied by the sensing range and divided by the actual area of the space. Number of photos is the actual number taken by the robot during each navigation experiment. Convergence is a Boolean value indicating success or failure in reaching the goal point. Note in Figure 12 that there is a proportional trend between the percent coverage and the number of photos taken. Therefore, better coverage generally yields more photos. The two failures in the convergence test were due to local minima around a desk and chairs. While in a local minimum, the robot was not able to travel to the goal within the time limit (10 min). Better strategies to resolve the conflict between increases and decreases of the potential will be necessary.
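The coverage measure described above can be written as a short function. This is a minimal sketch with hypothetical inputs: the path is assumed to be given as a list of segment lengths, and capping the result at 100% (since overlapping passes would otherwise be counted twice) is an assumption of this sketch rather than something stated in the text.

```python
def percent_coverage(segment_lengths_m, sensing_range_m, area_m2):
    """Approximate percent of the floor area swept by the sensor.

    Swept area = total path length x sensing range, divided by the area
    of the space. Capped at 100 because overlapping passes are not
    subtracted (assumption of this sketch).
    """
    swept_m2 = sum(segment_lengths_m) * sensing_range_m
    return min(100.0, 100.0 * swept_m2 / area_m2)

# Hypothetical run: 10 m of travel, 2 m sensing range, 40 m^2 room.
coverage = percent_coverage([5.0, 3.0, 2.0], sensing_range_m=2.0, area_m2=40.0)
print(coverage)
```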

To investigate the social effect of robotic photography by Dali, a user study was conducted in three steps. First, a participant sat on a chair in a laboratory environment as shown in Figures 13 and 14. Second, Dali was set to navigation mode to move around in the space. Once the robot recognized the target participant’s face, it smoothly approached the participant until it was close enough to take a picture, and then automatically took a picture of the seated participant’s face. Third, the picture was uploaded to the participant’s social network webpage (in this study, we used the participant’s Facebook). This user study was conducted for 2 weeks, Monday through Friday, and responses to the posted photos were collected from the participant’s online friends and family via the participant’s Facebook. The collected responses, with the pros and cons of the experiments, are summarized in Table 2.

8. Conclusion

In this paper, we proposed Dali, a social robot whose mission is to autonomously capture images of users and feed pictures and live motion clips to social networks or to hospitals for health monitoring purposes. The robot autonomously navigates indoor environments with collision avoidance capability. To implement full autonomy in navigation, we proposed potential field based path planning. A Kalman filter-based localization method generates a repulsive potential field, while sonar-based object detection and visual servo based facial recognition generate an attractive potential field. We tested several scenarios with Dali, including a user study conducted to evaluate the usefulness of the social robot. From the user study, we found that robotic photography was very beneficial, especially for those who live apart from their family or friends. In positive feedback, the participant’s parents stated that they enjoyed being able to see their child’s activities on a daily basis from far away. This may result from the fact that real-time, up-to-date information on social media helps them feel more comfortable, closer, and more intimate than occasionally updated photos do. The participant mentioned that robotic photography made it easy to keep up with social media, which is beneficial for a user who does not have time to manage social media on a daily basis. The study also indicated that an autocharging capability is a must-have feature so that the robot can be activated automatically every day. In addition, a conversation or Q/A function based on voice recognition and AI, as in Amazon Echo, would greatly enhance the experience of the social robot in daily life.

Data Availability

Most of the data described in the paper are available to the public with certain limitations on use. Third parties are welcome to contact the authors regarding the use of any data described in the paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.