Abstract

Interactive surfaces are only just beginning to break into the market, and they still do not offer the advanced functionality demonstrated by many lab prototypes. The path from a prototype system to a finished product for use in real-world scenarios is a long one, and many obstacles must be overcome. The design of an interactive multi-touch table has to address issues such as optical recognition, hardware design, and ergonomics. This paper describes in detail the construction of a large, robust multi-touch table called mrT. It shows how to solve major problems of the diffuse illumination technique and other challenges of constructing a large-screen, high-resolution, self-contained interactive multi-touch surface that not only serves as a development system but can also be deployed in the real world. Additionally, to further motivate some of the design decisions, especially the choice of diffuse illumination technology, this paper discusses related ongoing research projects on the application side.

1. Introduction

Nearly twenty years after its invention [1] in the 1960s, the first marketed integrated mouse was shipped with a Xerox Star 8010 Information System in 1981. Just one year later, in 1982, a new paradigm of human-computer interaction was described [2] and first implemented in 1985 [3]: the multi-touch table. Again twenty years later, we are looking at a technology that is starting to change human-machine interaction once more.

As the invention of the mouse shows, user interfaces need well-designed hardware to fulfill their purpose. Especially for emerging interface technologies, developing and improving the hardware side are just as important as considering the software side. This is certainly the case with interactive smart surfaces.

First commercial products like JazzMutant's Lemur or Microsoft's Surface are already available to consumers but tend to be insufficient for many scientific and serious professional use cases, due to their restrictions in robustness, optical precision, or screen size. Until high-end commercial devices are widely available, there is a need to design adequate multi-touch solutions to drive the next generation of interactive applications in many different areas, from geographic data visualization and planning applications to collaborative design, games, 3d modeling, and animation. High-quality devices are needed as a testbed for inventing and refining those applications. While it is possible and practical to develop applications on hardware that is not fully developed, extrapolating and anticipating future developments is more complex and less convincing than using sophisticated systems.

This work will describe the construction process of an advanced multi-touch table (Figure 1) for scientific and professional use as well as emerging problems and their solutions. The mrT is a unique system regarding screen size, technical robustness, and optical precision in terms of resolution and infrared light distribution. Furthermore, this paper will present some of the latest research projects on the application side and how they will benefit from the use of DI technology in general and the mrT design in particular.

2. Related Work

Although multi-touch interaction just recently started to get attention from public media, it has a scientific history. Various techniques with different characteristics exist and are actively researched today. An overview of scientific work on multi-touch interaction is given by Buxton [4]. Despite this history, published works mainly concentrate on elementary techniques and algorithms as well as prototypical designs. Commercial multi-touch systems set out to make these techniques work in the real world, but little in-depth technical information is published on these products. This section will present examples of closely related work concerning different approaches to build interactive surfaces.

The tracking technology of frustrated total internal reflection, or FTIR for short, was introduced by Han in 2005 [5] as one way to realize multi-touch interaction. Han gives quite detailed descriptions of the crafted prototype, regarding, for example, the size of about  cm, the utilized “Rosco” foil, and the camera, which has a resolution of pixels at 30 frames per second.

Gesture recognition using an FTIR table has been delineated by Kim et al. [6]. The presented system is larger ( cm) and equipped with an inclined surface. A table employing FTIR is able to detect touches on the surface very well if an appropriate overlay material is chosen; false detections of fingers above the surface hardly occur. However, the usage of markers is impossible as patterns on the surface cannot be detected.

Furthermore, choosing the right overlay material is very important for recognition stability and user ergonomics, and most scientific papers in this area are unspecific in this regard. It is therefore not trivial to build a comparable system from their descriptions. However, there exists a large community of hobbyists and practitioners on the Internet discussing all kinds of overlay materials and reporting their findings. Still, due to the rather “chaotic” nature of this process, one has to “sift” through hundreds or even thousands of forum discussions or personal blog articles.

Using diffuse illumination, or DI for short, is, in contrast, more flexible in terms of surface material, thereby providing better user ergonomics. A prominent feature of DI is that finger and marker detection can be handled by employing established image processing techniques, since DI allows detection of arbitrary shapes of objects on the surface that reflect IR light. DI also allows tracking objects at some distance from the surface, enabling three-dimensional interaction to a certain extent. However, all of the above features require a well-designed illumination, projection, and tracking set-up.

The ReacTable [7] is an example of the application of DI and was developed with usage as a musical instrument [8] in mind. The employed framework [9] provides excellent support for the usage of markers, as this was the intended goal of the approach. Although finger touch detection was not the primary focus, it is supported in current versions. While the table is certainly famous for its application as a tangible musical instrument, it has not been designed for high-resolution output and tracking.

Dietz and Leigh [10] described a multiuser touch technology based on capacitive detection in 2001. Their prototype provides chairs which are used to distinguish between specific users. Recently, tables using the DiamondTouch technology have become commercially available. Nevertheless, the standard sizes are limited to 42-inch tables, and they suffer from the disadvantages of front projection. Another implementation based on the same technology was crafted by Rekimoto [11]. It can detect shapes of hands and distinguish special objects equipped with capacitive tags. This surface has a size of  cm. A major advantage of the employed technology is the ability to sense some centimeters above the surface, enabling hovering input. However, capacitive detection is not able to reach the same resolution as optical tracking in terms of shapes. Furthermore, interaction is only possible with markers and objects specifically designed for capacitive tracking.

Many implementations of multi-touch devices have in common that they are rather immobile and are therefore limited to stationary use. In contrast, the PlayAnywhere system by Wilson [12] was built with mobility in mind. It tracks from above, using shadows and optical flow via IR. While this has great potential, the current system is still in a very early stage and not ready to be operated by consumers.

ThinSight, introduced by Izadi et al. [13], allows multi-touch interaction with displays thin enough to be used for mobile computers. For that purpose, several IR light sources and sensors are integrated into an LCD. This technology seems very promising for many application scenarios; however, at this point it is still in an early development stage.

Among existing commercial products providing multi-touch interaction, Microsoft's Surface (http://www.microsoft.com/surface), a medium-sized table based on DI, and the Apple iPhone (http://www.apple.com/iphone), a small-sized smartphone, are probably the best known. The recently released Apple iPad (http://www.apple.com/ipad), a multi-touch enabled tablet computer, is also already a huge success and further establishes multi-touch interfaces in mainstream computing. Additionally, small to mid-sized LCD multi-touch displays (mostly based on capacitive sensing) can nowadays be bought from regular hardware vendors. However, these displays are still very limited in their capabilities (most can only detect 2–5 simultaneous touches). As already mentioned, information on the underlying technology of most commercial products is scarce. Therefore, an in-depth and general comparison is hardly possible, with the exception of two commercial products: the Microsoft Surface and the Epson xdesk (http://www.impressx.com). In addition to the presented system, the authors have access to an xdesk as well as occasional access to a Surface. Although no formal study was conducted, the impression is that the presented system is superior in size, resolution, and illumination. Its illumination system provides a good compromise between tracking robustness and flexibility concerning advanced techniques such as area-based versus point-based interaction or interaction near (slightly above) the surface, which sets it apart from today's commercially available systems.

3. Table Design and Construction

The two common vision-based object/finger tracking technologies, FTIR and DI, both have unique advantages and disadvantages, and implementing them still requires many case-by-case solutions and a lot of testing.

The DI technique was chosen because it makes recognition and tracking of objects as well as fingertips possible on the surface. This is achieved by illuminating the projection surface with an infrared light source from below. At the surface, it passes through a diffuser, which is required for the image projection. Any object on the surface reflects the light back through the diffuser. A camera directed at the projection surface from below can then register the reflected light.

This paper addresses several commonly cited disadvantages of the DI technique: the difficulty of achieving an even illumination of the surface, the low contrast of fingertip blob images, which makes them harder to recognize, and a greater chance of false blob detections compared to an FTIR approach. The presented system achieves a largely homogeneous illumination by crossing two high-powered light sources and shadowing their direct reflections in the casing interior. Figure 2 shows a raw 8-bit image taken by the IR camera in a room lit by daylight, illustrating the achieved homogeneity of the IR illumination. All values range between a luminosity of 7 and 123, with an image median of 73. To give a better impression of the illumination quality, Figure 3 shows the differences from the image median of the raw image in a pseudocolor representation.
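The homogeneity figures quoted above (minimum, maximum, median, and the per-pixel difference from the median) are simple image statistics. The following is a minimal sketch of how such a report could be computed, assuming the raw 8-bit frame is available as a list of rows; the function name and the tiny sample frame are hypothetical and are not data measured on the mrT.

```python
from statistics import median


def homogeneity_report(frame):
    """Return (minimum, maximum, median, deviation_map) for a grayscale frame.

    `frame` is a list of rows of integers in 0..255 (assumed input format).
    """
    pixels = [value for row in frame for value in row]
    med = median(pixels)
    # Signed per-pixel difference from the image median, i.e. the quantity
    # that a pseudocolor plot of illumination quality would visualize.
    deviation_map = [[value - med for value in row] for row in frame]
    return min(pixels), max(pixels), med, deviation_map


# Made-up 3x3 frame for illustration only.
sample = [
    [70, 73, 75],
    [68, 73, 80],
    [60, 74, 79],
]
lo, hi, med, dev = homogeneity_report(sample)
```

A narrow spread between minimum and maximum around the median indicates an evenly lit surface; large deviations near the borders would point to missing illumination there.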

A simple normalization of the light intensity, based on stored per-pixel minimum and maximum intensity values, makes it possible to work with absolute luminosity, with no need to employ local gradients. Figure 4 shows an image with applied normalization. Due to the high contrast resulting from homogeneous illumination, the marked features in Figure 5 could be extracted by applying only three different thresholds: 110 for fingertips (green), 50 for hands (teal), and 25 for arms (blue).
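The normalization and static thresholding described above can be sketched as follows. This is an illustration under stated assumptions, not the mrT implementation: it assumes that the stored per-pixel minimum comes from an empty surface and the maximum from a fully reflecting calibration target, and that the thresholds 110, 50, and 25 apply to the normalized 0..255 scale. All function names are hypothetical.

```python
# Thresholds taken from the text, checked from brightest to darkest.
THRESHOLDS = [(110, "fingertip"), (50, "hand"), (25, "arm")]


def normalize(raw, lo, hi):
    """Stretch one raw pixel to 0..255 using its stored min/max calibration."""
    span = hi - lo
    if span <= 0:
        return 0  # uncalibrated or dead pixel: treat as background
    value = (raw - lo) * 255 // span
    return max(0, min(255, value))


def classify(value):
    """Map a normalized intensity to a feature label via static thresholds."""
    for threshold, label in THRESHOLDS:
        if value >= threshold:
            return label
    return "background"


def label_frame(raw_img, min_img, max_img):
    """Label every pixel of a frame given per-pixel calibration images."""
    return [
        [classify(normalize(r, lo, hi)) for r, lo, hi in zip(raw_row, min_row, max_row)]
        for raw_row, min_row, max_row in zip(raw_img, min_img, max_img)
    ]


# One-row example with an assumed calibration range of 20..120 per pixel.
labels = label_frame([[20, 30, 40, 64]], [[20] * 4], [[120] * 4])
```

Because every pixel is rescaled with its own calibration values, a single global threshold per feature class suffices, which is what makes the static scheme cheaper than adaptive thresholding.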

Advantages of homogeneous IR illumination will also be discussed in detail in Section 5.

Multiuser collaborative scenarios demand a lot of screen real estate. A large surface requires a high resolution for adequate display of content. However, projectors with HD resolution still require a certain projection distance. The problem to address was fitting the projector into a reasonably sized casing, which was solved by directing the projection over three mirrors.

The presented multi-touch table mrT is 80 cm wide, 130 cm long, and 112 cm high. The projection surface is 130 cm (51 inches) in diagonal and covers an area of 0.72 sqm, which is nearly three times the size of Microsoft's Surface. It features Full HD (1920 × 1080 pixels) resolution and a four-speaker audio system. Easily accessible hubs for USB, FireWire, Ethernet, and power, as well as Bluetooth, ensure connectivity.
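The stated surface area follows from the diagonal and the 16:9 aspect ratio of the projected image. A short worked check, assuming the full 130 cm diagonal is used at exactly 16:9 (the helper name is hypothetical):

```python
import math


def screen_dimensions(diagonal_cm, aspect_w=16, aspect_h=9):
    """Return (width_cm, height_cm) of a screen from its diagonal and aspect ratio."""
    scale = diagonal_cm / math.hypot(aspect_w, aspect_h)
    return aspect_w * scale, aspect_h * scale


width_cm, height_cm = screen_dimensions(130.0)
area_sqm = width_cm * height_cm / 10_000  # cm^2 -> m^2
diagonal_in = 130.0 / 2.54                # cm -> inches

print(f"{width_cm:.1f} cm x {height_cm:.1f} cm, {area_sqm:.2f} sqm, {diagonal_in:.0f} in")
```

This gives a surface of roughly 113 cm by 64 cm, consistent with the quoted 0.72 sqm and 51 inches.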

The table was designed for collaborative use for up to six users [14]. To support spontaneous collaboration, the tabletop height was made to be comfortably usable in a standing position or from a bar chair. Furthermore the edge of the projection surface was kept thin, so that users do not need to bend forward, avoiding aches from bad posture. For users to stand as close as possible to the surface the table is narrower at the bottom, offering foot space. This was realized by two cuboids resting on top of each other, the upper head supporting the table surface, the lower, narrower part being the body.

To support multidevice work, two hatches were added that transform into table extensions at the far ends. Users can conveniently place mobile devices such as laptops, peripheral devices, or even just pens, papers, and cups of tea on them, next to the surface (cf. Figure 6). From the outset, the design of the table went beyond that of a lab prototype. As the design was intended to survive use in public spaces such as conventions, it is made as self-contained as possible, with a rugged exterior and provisions for easy transport and maintenance. Because the table is a rear-projection system, the projector's lens shift leaves enough room for a computer at one end of the body.

Materials for casing and surface were chosen with heavy use in mind. The casing consists of 19 mm plastic-laminated flake boards set within a solid frame of Abachi wood bars  mm thick. In contrast to many existing projection surfaces made of acrylic glass, the surface is made of mechanically stable glass, which is also more resistant to scratches and stains.

The interactive surface should be able to run for many hours. With the DI technique, the inside of the table can get rather hot. To ensure appropriate cooling, a ventilation system that augments the natural upward movement of warm air was used. The floor of the table has ventilation holes beneath projector, PC, and the infrared light power adapters. The latter are equipped with ventilators for additional air suction. Further, the long sides of the table support ventilation shafts beneath the infrared light boards. Warm air is exhausted from under the surface by an array of ventilators (cf. Figure 6), which push the air downwards between hatches and narrow sides. This keeps the average interior temperature at around Celsius under continuous usage.

Despite its stationary, heavy nature, the table was designed to be transported as easily as possible. This was realized by deploying electrically extensible wheels driven by linear actuators (Linak LA28). The wheels are only extended for transport. In normal use, the table stands on robust rubber pads.

4. Interaction Surface

Many DI tabletop systems use short-range projectors with wide-angle lenses, but these come at the cost of distortions and color fringes. Normal-range HD projectors require a projection distance of about 150 cm for a sharp 16:9 image with a diagonal of 130 cm. In order to project sharp HD images onto the surface without artifacts, an S-shaped redirection of the projection cone via three mirrors was applied (Figure 7). This leverages the fact that many projector models, including the Panasonic PT-AE2000E used in the presented system, offer two-dimensional lens shift, which made it possible to “fold” the projection efficiently and create room for further hardware, for example, a PC.

A three-mirror setup is to some degree sensitive to vibrations. The presented design solves this problem by using rigid, welded metal frames mounted with rubber rings as dampers. The mirrors are fixed on these metal frames. A prior system built without frames suffered from vibrations as well as from another effect: due to temperature differences, the mirror mounting warped and therefore distorted the projection. Fixing the mirror mounting onto the metal frames solves both problems adequately.

A Eye USB camera with 1/2 CMOS monochrome sensor and  pixels was used. The view cone of the camera was directed over the three mirrors, as this reduced wide-angle distortions. A Lensagon CVM45100 zoom lens was employed for an optimal mapping of the camera field of view to the projection surface. The camera resolution was flexibly adjusted for handling different trade-offs between resolution and frame rate. Best results were achieved at a resolution of pixels, that is, matching the 16:9 aspect ratio of the surface. At this camera resolution an overall optical tracking resolution of approximately 30 ppi is achieved.
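The relation between camera pixels and tracking resolution is a plain unit conversion. The sketch below treats the camera resolution as an input and only shows what "approximately 30 ppi" implies for a 16:9 surface with a 130 cm diagonal (about 113.3 cm by 63.7 cm, derived from the diagonal). It assumes the camera image maps exactly onto the surface; the function names are hypothetical.

```python
CM_PER_INCH = 2.54


def tracking_ppi(pixels_across, length_cm):
    """Pixels per inch when `pixels_across` camera pixels span `length_cm`."""
    return pixels_across / (length_cm / CM_PER_INCH)


def pixels_for_ppi(ppi, length_cm):
    """Camera pixels needed along `length_cm` to reach a given ppi."""
    return ppi * length_cm / CM_PER_INCH


# Surface dimensions derived from the 130 cm diagonal at 16:9.
SURFACE_WIDTH_CM = 113.3
SURFACE_HEIGHT_CM = 63.7

needed_width_px = pixels_for_ppi(30, SURFACE_WIDTH_CM)
needed_height_px = pixels_for_ppi(30, SURFACE_HEIGHT_CM)
```

Under these assumptions, 30 ppi corresponds to roughly 1340 by 750 camera pixels covering the surface.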

The projection screen is built as a two-layered glass “sandwich” with the diffusion foil in between. The distance traveled by the light between the diffuser and an object on the surface should be minimal. Thus the top glass layer of the sandwich is 2 mm thick, while the bottom layer carries the weight and is 6 mm thick.

The diffuser needs to fulfill several requirements. It needs to diffuse enough (visible) projector light so that the projected image is visible. Too much diffusion makes the image too dark, while too little results in a “directed” image, meaning that the light intensity of the projected image varies depending on where the user stands. At the same time, the diffuser needs to be as permeable to infrared light as possible, so that infrared light can pass through and be reflected back again. To determine the optimal trade-off between these conflicting requirements, test trials were made with rear projection screens and color filters. Rear projection screens proved to be great for displaying the visible image but were not very permeable to infrared. The LEE diffusion filter “Hollywood Frost 255” offered the best compromise.

5. Infrared Illumination

A common problem of existing DI systems is that external light sources, such as daylight, interfere with the surface illumination and thus decrease tracking performance. For the interactive surface, the infrared illumination was realized with high-power lights. With a strong source of infrared light, the signal-to-noise ratio was significantly improved. Further, the illumination solution eliminated the need for a computationally expensive adaptive thresholding algorithm, which was replaced with a much simpler static calibration scheme, providing more stable thresholding at lower CPU cost.

Fourteen infrared beacons were built from 2520 IR-LEDs of the type SFH-4550, which emit light in a narrow cone. This allows the light to be focused efficiently, rather than letting it spread through the body of the table and increase disruptive reflections in the camera image. The LEDs are soldered onto 14 circuit boards with 180 LEDs each. The LEDs form 9 rows along each long side of mrT. Every row of LEDs is individually oriented so as to illuminate a part of the surface (Figure ).

Electrically, the LEDs are connected in series in groups of 30 (Figure 8). Together with current regulation by an LM317 and a 48 V supply voltage, this keeps thermal power dissipation losses low. Each LED is driven at a power consumption of 64 mW, so the overall LED power is about 160 Watts.
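The figures above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers stated in the text; the derived per-LED voltage budget is an upper bound that still has to cover the regulator's own voltage drop, and no particular LED forward voltage is assumed.

```python
BOARDS = 14              # circuit boards (from the text)
LEDS_PER_BOARD = 180     # LEDs per board (from the text)
LEDS_PER_STRING = 30     # LEDs connected in series (from the text)
POWER_PER_LED_W = 0.064  # 64 mW per LED (from the text)
SUPPLY_V = 48.0          # supply voltage (from the text)

total_leds = BOARDS * LEDS_PER_BOARD                  # 2520
strings = total_leds // LEDS_PER_STRING               # 84 series strings
strings_per_board = LEDS_PER_BOARD // LEDS_PER_STRING # 6 strings per board
total_power_w = total_leds * POWER_PER_LED_W          # about 161 W

# Voltage available per LED if the whole supply were shared by the string.
voltage_budget_per_led = SUPPLY_V / LEDS_PER_STRING   # 1.6 V upper bound

print(total_leds, strings, strings_per_board, round(total_power_w, 1))
```

Long series strings mean that one regulator serves 30 LEDs, so most of the supply voltage drops across the LEDs rather than across the regulator, which is where the low thermal loss comes from.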

As the rows of LEDs do not continue at the narrow sides of the table, these border regions lack illumination power. Furthermore, the LED arrays illuminate installed equipment on the inside of the casing at the narrow side, which yields artifacts in the camera image. Both problems were solved by attaching mirrors parallel to the narrow sides (Figure 9). This lets the LED rows continue virtually and results in homogeneous infrared illumination on the projection surface.

6. Shadowing Infrared Illumination

DI approaches to multi-touch tracking have to avoid direct reflections of infrared light sources in the view of the camera. This problem intensifies with the high-power illumination described in the previous section. Many DI approaches use acrylic glass for the projection surface, as it is relatively cheap and reflects less of the infrared light from below, which otherwise interferes with the camera tracking. For reasons described in previous sections, a glass surface was employed instead, and the reflection problem was solved by shadowing direct sources of reflection with panels (Figure 10).

The arrangement of the devices is determined by several constraints. Without shadowing, the camera's receptive field is very large. The reflection of the line would limit the possible space under the surface line on the right-hand side of . This means that the IR-spotlights would have to be mounted on the right-hand side of directly under the surface. This would make it quite hard to realize a homogeneous illumination and would produce blurring in the IR-image because of parallax effects (although IR-rays are subject to optical refraction and are bent toward the surface normal). A lower mounting position would still have to be on top of the reflected line of and this would shift the mounting position to the right. This would require a very broad surface casing frame, which should be avoided for the usability reasons mentioned above.

The solution to the problem is a shadowing panel, which borders the cone of the projector and establishes camera blind space below the surface. The illumination has to be emitted from the opposite half of the space under the surface, because the space between panel and surface would not be reachable with direct illumination. This solution works only for spotlights where the LEDs are adjustable (bendable) in the plane depicted in Figure 10. The LEDs have to be laid out on the spotlight boards accordingly.

The optical axes of two spotlight LEDs are shown in Figure 10. They project under the surface equidistant at Points . At first glance, this seems to yield an inhomogeneous illumination, because the density of the projection areas decreases toward the surface border, caused by the increasing projection length. On closer examination, however, the illumination power from adjacent LED columns also increases in this direction, assuring constant density.
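The effect of aiming LEDs at equidistant points can be illustrated with a small geometric sketch. It is not the paper's derivation: the mounting position, target spacing, and emission half-angle are hypothetical parameters, the geometry is reduced to the two-dimensional plane of Figure 10, and the footprint is approximated as a circle perpendicular to the optical axis.

```python
import math


def aim_leds(mount_x, mount_depth, first_target_x, pitch, count, half_angle_deg):
    """Aim `count` LEDs from one mounting point at equidistant surface points.

    mount_x, mount_depth: LED position (horizontal coordinate and distance
    below the surface). Targets lie on the surface at first_target_x + i * pitch.
    Returns a list of (target_x, tilt_deg, path_length, footprint_radius),
    where the tilt is measured from the surface normal.
    """
    results = []
    for i in range(count):
        target_x = first_target_x + i * pitch
        dx = target_x - mount_x
        tilt_deg = math.degrees(math.atan2(dx, mount_depth))
        path_length = math.hypot(dx, mount_depth)
        # The light cone widens with distance, so far targets get a larger,
        # dimmer footprint from a single LED.
        footprint_radius = path_length * math.tan(math.radians(half_angle_deg))
        results.append((target_x, tilt_deg, path_length, footprint_radius))
    return results


# Hypothetical example values in centimeters and degrees.
rows = aim_leds(mount_x=0.0, mount_depth=30.0, first_target_x=30.0,
                pitch=5.0, count=3, half_angle_deg=3.0)
```

The path length and footprint radius grow toward the far border, which is the thinning of per-LED intensity mentioned above; in the described design this is offset by the overlapping contributions of adjacent LED columns.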

The length of the surface is divided into sections of two (left and right) spotlights with LEDs and one section in between. Thus the length of a section is given by

The position of the projection of an LED indexed by is given by .

All Points are aligned with the surface. Thus their -components are zero. Their -components are negated because of the coordinate system. They are calculated by

The projection of each LED on the surface should have a certain size. It should be big enough to blur the illumination variances of single LEDs, and it should be small enough to prevent unwanted reflections in the mrT housing. This is also a question of power efficiency.

For LEDs the strength of illumination depends on the angle of view relative to the optical axis. The angle should border the range, where the dominant part (e.g., 90%) of the illumination is emitted. Optimally, the LED type should be selected in dependence of its projection angle and the projection distance, in a way that the projection on the surface shows a radius of approximately . This size yields a good blurring and prevents missing illumination at due to the disruption of continuity of LED arrangement at the border. Figure 10 shows the optimal setting for LED. The range of the projection cone is bounded by the (dotted) line . The projection reaches the projection point of the first LED of the opposite IR-spotlight. This line determines the lower edge of the shadowing panel. The first equation expresses that is located on the line :

The shadowing panel borders the projection cone. Therefore is also located on :

The camera view geometry is given by a reflection at . Applying the equal alternate, incidence, and reflection angle , the following constraint can be stated:

The lower corner of the IR-spotlight is located on and : The upper corner of the spotlight is located on accordingly: In the setting, the IR-spotlight is arranged perpendicular to the surface, and the length between and is :

Arrangements with other angles are possible. In this case, the length can be apportioned to the differences in and by applying the Pythagorean theorem.

From equations Point can be calculated. The structure of devices for the opposite side can be calculated by mirroring, except the values and , which are negated in the opposite setting.

7. Applications

Traditional single-point graphical user interfaces, whether controlled by touch or by a pointing device like a mouse, did not have to deal with the question of how to handle multiple cursors. While this is already a challenge for a single user, it is an even bigger one to deal with more than one user at the same time. Most of the applications on the market today have not been built with multiple simultaneous users in mind. Making these applications suitable for multi-touch tabletop interfaces demands approaches that deal with concurrent application control.

Many different research directions for possible applications are currently pursued. Additionally, there is still interesting research on the tracking side. DI technology in contrast to FTIR and capacitive approaches makes it possible to some degree to track objects or fingers not only directly on the surface but also up to some distance over the surface. Although this still needs further investigation, first tests with the hardware in this regard have already been very successful. On the application side this feature could enable very interesting interaction possibilities by introducing a “hover” state in addition to just “touch” or “no-touch”.

Already many researchers have presented very interesting work. The following paragraphs will give a brief overview of current works to give an impression of the variety of possible applications. Again, it is important to stress that these applications are currently tested only on self-conceived hardware devices not providing heterogeneous hardware capabilities. These approaches can benefit from the high-resolution output of the presented hardware and from the tracking related performance due to its sophisticated illumination. Future application iterations could also benefit from possible “hover” and user identification features.

Before listing some very interesting end-user applications, it is important to take a look at current research that deals with the fundamentals of multi-touch interaction and more general questions of integrating multi-touch with application frameworks.

Hancock et al. [15], Benko et al. [16], and Moscovich and Hughes [17] (all in 2006) have investigated fundamental problems and solutions to multi-touch interaction. Hancock et al. [15] investigated different rotation and translation techniques for 2d manipulation of virtual objects on a multi-touch table. Benko et al. [16] presented techniques to overcome the typical occlusion and accuracy problems of multi-touch interaction that result from the physical size of fingers compared to typical GUI elements like buttons. Moscovich and Hughes [17] present interesting approaches and application ideas, most notably in the area of animation, to leverage the additional degrees of freedom offered by multi-touch input by using several fingers for concurrent control. In a more recent work Cao et al. [18] employed the optical recognition of the contact shape of an object on an interactive surface for calculating virtual forces to enable physically “plausible” interaction. While most of the works still deal with 2d interaction on multi-touch displays, recently, Hancock et al. [19] and Reisman et al. [20] have started to investigate not just 2d but 3d interaction on multi-touch surfaces.

Simple “ad hoc” translation and rotation techniques as well as more sophisticated techniques were successfully realised on the presented hardware. The advanced methods were based on approximation methods as well as on affine transformations and were integrated into a variety of frameworks and APIs, for example, OpenSceneGraph (OSG), Microsoft XNA, Adobe Flex/Flash, and Blender. Generally, integrating touch processing based on the TUIO protocol is easy from a technical point of view. The real challenge is the concurrent handling of several events and the communication between several tasks inside an application that have to share those events. Here more general research might provide best practices and techniques in the future. Moreno et al. [21] have started research in this direction, providing some insights into how they integrated the Ogre3D graphics engine with multi-touch.
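To make the event-sharing problem concrete, the following sketch tracks TUIO cursor state and fans every change out to several subscribers. It is an illustration only, not the integration used on the mrT. It assumes the OSC messages of the TUIO 1.1 `/tuio/2Dcur` profile have already been decoded into argument lists (`alive` followed by session ids, `set` followed by session id, position, velocity, and acceleration); the class and callback names are hypothetical, and network handling is left out.

```python
class CursorTracker:
    """Keeps the set of live touch cursors and notifies all subscribers."""

    def __init__(self):
        self._cursors = {}    # session id -> (x, y) in normalized coordinates
        self._listeners = []  # callbacks taking (kind, session_id, position)

    def subscribe(self, callback):
        self._listeners.append(callback)

    def _emit(self, kind, session_id, position):
        # Every subscriber sees every event, in arrival order.
        for callback in self._listeners:
            callback(kind, session_id, position)

    def handle(self, args):
        """Process the argument list of one decoded /tuio/2Dcur message."""
        command = args[0]
        if command == "alive":
            alive = set(args[1:])
            for session_id in sorted(set(self._cursors) - alive):
                position = self._cursors.pop(session_id)
                self._emit("remove", session_id, position)
        elif command == "set":
            session_id, x, y = args[1], args[2], args[3]
            kind = "update" if session_id in self._cursors else "add"
            self._cursors[session_id] = (x, y)
            self._emit(kind, session_id, (x, y))
        # "source" and "fseq" messages are ignored in this sketch.


tracker = CursorTracker()
log_a, log_b = [], []
tracker.subscribe(lambda *event: log_a.append(event))
tracker.subscribe(lambda *event: log_b.append(event))

tracker.handle(["alive", 1, 2])
tracker.handle(["set", 1, 0.25, 0.5, 0.0, 0.0, 0.0])
tracker.handle(["set", 2, 0.75, 0.5, 0.0, 0.0, 0.0])
tracker.handle(["alive", 2])  # cursor 1 was lifted
```

Delivery here is synchronous and single-threaded. Deciding which task may consume or claim a touch, and how tasks running in parallel coordinate, is the harder design question the paragraph above refers to.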

Among other topics, two interesting applications of multi-touch surfaces, especially in combination with tangibles, that is, markers, and possible extensions like hovering detection, and user identification, are certainly 3d modeling [22] and animation [23]. Moscovich et al. [24] demonstrated how multi-touch can provide an intuitive access to animation and might even increase creative freedom. Gingold et al. [25] already proved the suitability of multi-touch for texturing, which is an important step in the 3d modeling pipeline.

Other examples for possible applications include digital painting [26], manipulation of graphs and tag clouds [27], collaborative design activities [28], “virtual tinkering” [29], or entertainment games [30] as well as serious games, for example, health games [31].

8. Conclusion and Outlook

This paper on the design and implementation of a large interactive table demonstrated how existing multi-touch techniques can be used and improved for a large, ergonomic, and rugged design. The presented work emphasized how real-world issues such as uncertain lighting conditions and wear and tear reflect the requirements of many users and their work practice. To this end, several technical problems appearing in DI systems were solved, for example through a multiple-mirror setup and homogeneous illumination. The presented interactive surface is evenly illuminated by high-powered IR-spotlights, producing clear and high-contrast images. As shown, these images can be processed with simple static thresholding mechanisms. To minimize reflections the IR-spotlights were shadowed, and an analytical description for finding the correct spotlight position was developed. As a large surface requires high resolution, this work shows how to “fold” an HD image over three mirrors to produce a self-contained rear-projection solution, which is still portable.

The design and construction of a system like mrT are time-consuming tasks. Approximately 80 man-days were spent on design and construction. Building a complex system like the presented one for the first time involves a certain overhead, which applies only once. The overhead for the mrT system was 37 man-days. Building a similar system would take around 40 man-days without the mentioned overhead. A simplified version of the mrT design could be built in even less time, approximately 30 man-days of work.

A feature in development is a sensor system that will be able to detect the number and positions of users, enabling user position-aware applications [14]. With such an approach, the collaborative experience can be improved even further.

It is important to make the step from lab prototypes to working prototypes that are able to be used “out there”. The presented design supports collaborative work with the multi-touch table at the center of a multiuser working process. Applications that fully leverage the possibilities of multi-touch systems will most probably become a major research and development field in the future. The fundamental basis for this is well-designed, deployable hardware, just as the mrT prototype is the basis for the future work on smart multi-touch interfaces and applications.

Acknowledgment

This work was partially funded by the Klaus Tschira Foundation.