Abstract
Mouse picking is one of the most common and intuitive operations for interacting with 3D scenes in a wide range of 3D graphics applications. The operation must be fast so that users receive an immediate response. This paper proposes a fast and reliable mouse picking algorithm that uses graphics hardware for 3D triangular scenes. Our approach uses a multilayer rendering algorithm to perform the picking operation in linear time. The object-space ray-triangle intersection test is implemented in a highly parallel geometry shader. After hardware-supported occlusion queries are applied, only a small number of objects (or sub-objects) are rendered in subsequent layers, which speeds up picking. Experimental results demonstrate the high performance of our approach. Owing to its simplicity, our algorithm can be easily integrated into existing real-time rendering systems.
1. Introduction
Mouse picking, as the most intuitive way to interact with 3D scenes, is ubiquitous in interactive 3D graphics applications such as mesh editing, geometry painting, and 3D games. In many massively multiplayer role-playing games (MMRPGs), for instance, thousands of players compete against each other, and the picking operation is applied frequently. Such applications require picking to be performed as fast as possible so that players receive a response with minimal delay. In recent years, programmable graphics hardware has become increasingly powerful, so making full use of these coprocessors in the picking operation has become important.
The WYSIWYG method, which takes advantage of graphics hardware to re-render scene objects into an auxiliary frame buffer, was first proposed by Robin Forrest in the mid-1980s and used in 3D painting by Hanrahan and Haeberli [1]. In their method, each polygon is assigned a unique color value that serves as its identifier. Given the cursor position on the screen and the id buffer, the picked position on the surface can be found by retrieving data from the frame buffer. However, this approach is weak for complex scenes because all objects in the view frustum must be re-rendered, which can take a long time and therefore lowers picking performance. By integrating the WYSIWYG method with hardware bilinear interpolation [2], Lander presented a method to calculate the exact intersection information, that is, the barycentric coordinate within the intersected triangle. By assigning the additional color values (1, 0, 0), (0, 1, 0), and (0, 0, 1) (normalized, in floating-point precision) to the three triangle vertices respectively, he obtained the barycentric coordinate by interpolation after the rasterization stage. However, the computed barycentric coordinate lies in the projected screen-space rather than in object-space, which may restrict its application.
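The id-buffer lookup described above can be illustrated with a minimal CPU-side sketch. It assumes a 24-bit RGB id buffer stored as rows of (r, g, b) tuples and reserves white as the background color; the function names and the buffer layout are illustrative assumptions, not part of the method in [1].

```python
# Illustrative sketch only: names and buffer layout are assumptions.
BACKGROUND = (255, 255, 255)  # reserved color meaning "no polygon here"


def id_to_rgb(polygon_id):
    """Pack a polygon index into a unique 8-bit-per-channel RGB color."""
    if not 0 <= polygon_id < 0xFFFFFF:
        raise ValueError("polygon id does not fit in 24 bits (white is reserved)")
    return ((polygon_id >> 16) & 0xFF, (polygon_id >> 8) & 0xFF, polygon_id & 0xFF)


def rgb_to_id(rgb):
    """Recover the polygon index from its identifier color."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


def pick_polygon(id_buffer, cursor_x, cursor_y):
    """Read the pixel under the cursor; return a polygon id or None."""
    color = id_buffer[cursor_y][cursor_x]
    return None if color == BACKGROUND else rgb_to_id(color)


# A tiny 2x2 stand-in for the auxiliary frame buffer after the id pass.
id_buffer = [
    [BACKGROUND, id_to_rgb(70000)],
    [id_to_rgb(3), BACKGROUND],
]
print(pick_polygon(id_buffer, 1, 0))  # polygon drawn at column 1, row 0
print(pick_polygon(id_buffer, 0, 0))  # background pixel
```

The lookup itself is a single read; the cost noted in the text comes from having to re-render every visible polygon to fill the buffer before that read can happen.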
In this paper, we propose a simple, fast, and reliable mouse picking algorithm (FRMP) that uses graphics hardware for 3D triangular scenes. By combining the multilayer culling approach of Govindaraju et al. [3] with a GPU-based implementation of Möller and Trumbore's ray-triangle intersection test [4], picking can be performed in linear time. Our approach has the following features.
(1) It is fast: our approach is 2 to 14 times as fast as the traditional GPU-based picking method.
(2) It is reliable: our approach performs the operation in object-space, and the exact intersection information can be computed.
(3) It is parallel: the ray-triangle intersection test is implemented as a geometry shader.
(4) It is simple: our approach operates directly on triangular meshes and can be easily integrated into existing real-time rendering systems.
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes our new algorithm, and Section 4 presents experimental results and discussion. We conclude the paper and suggest future work in Section 5.
2. Related Work
Intersection detection is widely used in computer graphics. The mouse picking operation can be performed by an ordinary ray-object intersection test and accelerated by a variety of schemes.
Methods for interference detection are typically based on bounding volume data structures and hierarchical spatial decomposition techniques. These include k-d trees [5], sphere trees [6, 7], AABB trees [8, 9], k-DOP trees [10], and OBB trees [11]. The objects (triangles) are organized in clusters, which speeds up intersection detection. The spatial hierarchies are usually built in a preprocessing stage and must be updated from frame to frame when the scene changes, which is not appropriate in most cases for mouse picking.
Hardware occlusion queries are also used in collision detection for large environments to efficiently compute all the contacts at high frame rates by Govindaraju et al. [3, 12, 13]. These GPU-based algorithms use a linear-time multipass rendering algorithm to compute the potentially colliding set. They achieve interactive frame rates even for deformable models and breaking objects. In their method, the list of objects (triangles) is traversed from beginning to end, so no spatial organization (k-d trees or other trees) is required. The WYSIWYG method for mouse picking, which was first proposed by Robin Forrest in the mid-1980s, used in 3D painting by Hanrahan and Haeberli [1], and further studied by Lander [2] and by Akenine-Möller and Haines [14], belongs to this class. Its efficiency is high in many cases. However, it has the limitations discussed in the introduction. CPU methods for picking objects were introduced by [15] on the Direct3D platform and by [16] on the OpenGL platform. However, their efficiency decreases dramatically as the number of input primitives increases. Motivated by the multilayer culling approach of Govindaraju et al., we do not construct a time-consuming hierarchy. Instead, we use a multilayer rendering algorithm to perform a linear-time picking operation. In this paper, we perform the exact object-space ray-triangle intersection test [4] in a geometry shader, taking advantage of its geometric processing capability. The overall approach makes no assumptions about the objects' motion and can be applied directly to all triangulated models.
Our method also relies on some acceleration techniques for real-time rendering. Triangle strips and view frustum culling were introduced by [17, 18], respectively. The bounding boxes of objects can be triangulated as strips, and objects positioned outside the view frustum can be culled away. Hardware occlusion queries for visibility culling were studied by [19–21]. GPU-based visibility culling is also important in our algorithm.
3. Hardware Accelerated Picking
Our mouse picking operation takes the screen coordinate of the cursor and the scene to be rendered as input, and outputs the intersection information, such as object id, triangle id, and even the barycentric coordinate of the intersection point. In this section, we first present an overview of our algorithm and then we discuss it in detail.
3.1. Algorithm Overview
Our FRMP method exploits the new features of the 4th generation of PC-class programmable graphics processing units [22]. Figure 1 illustrates the algorithm workflow. The overall algorithm is outlined as follows.
The novel multilayer rendering pass on programmable graphics shaders is outlined below:
(1) Transform the per-vertex position to the view coordinate system in the vertex shader.
(2) Perform the object-space ray-triangle intersection test in the geometry shader, and output a point with picking information if the triangle is intersected. The x and y components of the intersection point are set to 0, and the z component is assigned the depth value of the point. The point is then passed to the rasterization stage.
(3) Output the picking information directly in the pixel shader.
3.2. New Features in the Shader Model 4.0 Pipeline
Shader Model 4.0 fully supports the 32-bit floating-point data format, which provides adequate precision for general-purpose GPU computing (GPGPU). An occlusion query can return either the number of pixels that pass the depth and stencil tests or just a boolean value indicating whether any pixel passes. In our case, we only need the boolean result, which tells us whether anything was rendered at all.
The geometry shader, first introduced in the Shader Model 4.0 pipeline, takes the vertices of a single primitive (point, line segment, or triangle) as input and generates the vertices of zero or more primitives. The input and output primitive types need not match, but they are fixed for a given shader program. We use a triangle as the input primitive, because the ray-triangle intersection test is implemented in this stage, and a point as the output primitive. If the intersection test passes, a point primitive carrying the intersection information is emitted; if it fails, no point is output.
3.3. Intersection Test in the Geometry Shader
In this section, we present the ray-triangle intersection test introduced by Möller and Trumbore [4]. We implement the algorithm in a geometry shader, taking advantage of its geometric processing capability.
A ray, $R(t)$, is defined by an origin point, $O$, and a normalized direction vector, $D$, as shown in (1):
$$R(t) = O + tD. \tag{1}$$
Here the scalar $t$ is a parameter that generates different points on the ray: values of $t$ greater than zero lie in front of the ray origin and so are part of the ray, whereas negative values lie behind it. Also, since the ray direction is normalized, a value $t$ generates a point on the ray that is $t$ distance units away from the ray origin.
When the user clicks the mouse, the screen coordinates of the cursor are transformed through the projection matrix into a view-space ray that goes from the eyepoint through the point clicked on the screen and into the screen.
A point, $T(u, v)$, on a triangle with vertices $V_0$, $V_1$, and $V_2$ is given by the explicit formula (2):
$$T(u, v) = (1 - u - v)V_0 + uV_1 + vV_2, \tag{2}$$
where $(u, v)$ is the barycentric coordinate, which satisfies $u \ge 0$, $v \ge 0$, and $u + v \le 1$. The point of intersection between the picking ray, $R(t)$, and the triangle, $T(u, v)$, satisfies the equation $R(t) = T(u, v)$, which yields
$$\begin{bmatrix} -D & V_1 - V_0 & V_2 - V_0 \end{bmatrix} \begin{bmatrix} t \\ u \\ v \end{bmatrix} = O - V_0. \tag{3}$$
An illustration of a ray and the barycentric coordinate for a triangle is shown in Figure 2. Denoting $E_1 = V_1 - V_0$, $E_2 = V_2 - V_0$, and $T = O - V_0$, the solution to (3) can be obtained by using Cramer's rule [23]:
$$\begin{bmatrix} t \\ u \\ v \end{bmatrix} = \frac{1}{\det(-D, E_1, E_2)} \begin{bmatrix} \det(T, E_1, E_2) \\ \det(-D, T, E_2) \\ \det(-D, E_1, T) \end{bmatrix}. \tag{4}$$
As a result, the intersection information is obtained by solving (4). Because this computation is independent for each triangle, it can be parallelized on graphics hardware. The equation is adopted with optimizations, since the determinant of a matrix is an intrinsic function in the High Level Shading Language (HLSL).
The intersection test is conducted in view space and, if it passes, we output a point primitive. The x and y components of its position are 0 because the render target used in our algorithm is only one pixel in size. The z component is the depth value, which is obtained by transforming the distance value into projection space. The GPU automatically adds a primitive id as the triangle identifier in the Input Assembler stage. In addition, the barycentric coordinate and the object id are also obtained from the picking information. The pseudocode of the geometry shader is presented in Algorithm 1.
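The ray construction and the solution (4) can be sketched on the CPU as follows. This is a plain Python illustration, not the geometry shader of Algorithm 1. It assumes a left-handed view space with the eye at the origin looking along +z and a symmetric perspective projection; the function names (`picking_ray`, `det3`, `intersect`) and their parameters are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def picking_ray(cursor_x, cursor_y, width, height, fov_y, aspect):
    """View-space ray through a pixel (assumed projection conventions)."""
    y_scale = 1.0 / math.tan(fov_y / 2.0)  # projection matrix entry (2,2)
    x_scale = y_scale / aspect             # projection matrix entry (1,1)
    ndc_x = 2.0 * cursor_x / width - 1.0   # screen x to [-1, 1]
    ndc_y = 1.0 - 2.0 * cursor_y / height  # screen y grows downwards
    d = (ndc_x / x_scale, ndc_y / y_scale, 1.0)
    length = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    return (0.0, 0.0, 0.0), (d[0] / length, d[1] / length, d[2] / length)


def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Solve (3) by Cramer's rule (4); return (t, u, v) or None."""
    e1, e2, t_vec = sub(v1, v0), sub(v2, v0), sub(origin, v0)
    neg_d = (-direction[0], -direction[1], -direction[2])
    denom = det3(neg_d, e1, e2)
    if abs(denom) < eps:  # ray parallel to the triangle plane
        return None
    t = det3(t_vec, e1, e2) / denom
    u = det3(neg_d, t_vec, e2) / denom
    v = det3(neg_d, e1, t_vec) / denom
    if t < 0.0 or u < 0.0 or v < 0.0 or u + v > 1.0:
        return None       # behind the eye or outside the triangle
    return t, u, v


# Click the centre of an 800x600 window; the triangle lies 5 units ahead.
origin, direction = picking_ray(400, 300, 800, 600, 1.0, 800 / 600)
print(intersect(origin, direction, (-1, -1, 5), (1, -1, 5), (0, 1, 5)))
```

Returning `None` on a miss mirrors the geometry shader emitting no point primitive, while a hit carries the distance and barycentric coordinate that the text describes as picking information.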
3.4. MultiLayer Visibility Queries
We use a multilayer rendering algorithm to perform linear-time intersection tests, taking advantage of the 4th generation of PC-class programmable graphics processing units. The overall approach makes no assumptions about the objects' motion and is directly applicable to all triangulated models.
First of all, we set a 1 × 1 sized texture as the render target after view frustum culling. Instead of rendering the actual triangles, we then render the bounding boxes of the visible objects, issuing a boolean occlusion query for each object during this rendering pass. The render state DepthClipEnable controls whether primitives whose depth values are outside the range [0, 1] are clipped; the render state DepthEnable determines whether depth testing is performed. After view frustum culling, some objects may still intersect the near plane or the far plane of the view frustum, so the depth values of some of their vertices may fall outside [0, 1]. To collect all possibly intersected objects for the next layer, we set DepthClipEnable and DepthEnable to FALSE. If an object's occlusion query passes, the object may be intersected by the picking ray, and its actual triangles will be rendered; otherwise, it is pruned. Since a large number of objects are not intersected during this step, we can greatly reduce the rendering time compared with the WYSIWYG method, which requires all objects to be rendered.
Second, we render the bounding boxes of all sub-objects of the objects whose occlusion query returned TRUE. Again, we issue a boolean occlusion query for each sub-object during this rendering pass. Since some systems need to handle large models that may not fit entirely into GPU memory, we group adjacent local triangles to form a sub-object and thereby prune the potential regions considerably, as suggested in [3].
Next, the actual triangles of the unpruned sub-objects are rendered. We issue only one occlusion query for all the triangles during this step, because this step must produce the exact intersection result. Triangles outside the view frustum are discarded, and only the closest triangle is needed. Thus the render states DepthClipEnable and DepthEnable are reset to TRUE.
Lastly, if the occlusion query passes, the triangle with the minimal distance from the eyepoint is picked, and its intersection information can be retrieved from the 1 × 1 sized render target texture. Reading data back from graphics memory to system memory causes an additional delay. In the WYSIWYG method, the window-sized texture must be locked to get the picking information, which is slow when the window is large. Our algorithm, in contrast, only needs to store the information in the smallest possible texture. If the occlusion query fails, we need not read the data from the render target at all, because we know that nothing has been picked. In the WYSIWYG method, however, one cannot know whether anything has been picked until the corresponding data has been read from the texture.
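The layered pruning can be emulated on the CPU to show the control flow. In the sketch below, a ray/bounding-box slab test stands in for each hardware occlusion query, and keeping the smallest distance stands in for the depth test of the final layer; it is not the GPU implementation. The scene layout (a dict from object id to a triangle list), the fixed-size grouping into sub-objects via `chunk`, and all function names are assumptions made for illustration.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(o, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test; returns (t, u, v) or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None
    t_vec = sub(o, v0)
    q = cross(t_vec, e1)
    u, v, t = dot(t_vec, p) / det, dot(d, q) / det, dot(e2, q) / det
    return None if (t < 0 or u < 0 or v < 0 or u + v > 1) else (t, u, v)


def bounds(triangles):
    """Axis-aligned bounding box (min corner, max corner) of a triangle list."""
    points = [vertex for tri in triangles for vertex in tri]
    return (tuple(min(p[i] for p in points) for i in range(3)),
            tuple(max(p[i] for p in points) for i in range(3)))


def ray_hits_box(o, d, box_min, box_max):
    """Slab test standing in for a boolean occlusion query."""
    t_near, t_far = 0.0, float("inf")
    for oi, di, lo, hi in zip(o, d, box_min, box_max):
        if abs(di) < 1e-12:
            if oi < lo or oi > hi:
                return False
        else:
            t0, t1 = sorted(((lo - oi) / di, (hi - oi) / di))
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far:
                return False
    return True


def pick(o, d, scene, chunk=2):
    """Return (t, object id, triangle id, u, v) of the closest hit, or None."""
    # Layer 1: one query per object bounding box.
    objects = [oid for oid, tris in scene.items()
               if ray_hits_box(o, d, *bounds(tris))]
    # Layer 2: one query per sub-object (groups of adjacent triangles).
    groups = []
    for oid in objects:
        tris = scene[oid]
        for start in range(0, len(tris), chunk):
            group = tris[start:start + chunk]
            if ray_hits_box(o, d, *bounds(group)):
                groups.append((oid, start, group))
    # Layer 3: exact test on the surviving triangles, keep the closest.
    best = None
    for oid, start, group in groups:
        for k, tri in enumerate(group):
            hit = ray_triangle(o, d, *tri)
            if hit and (best is None or hit[0] < best[0]):
                best = (hit[0], oid, start + k, hit[1], hit[2])
    return best


scene = {
    "near": [((5, 5, 5), (6, 5, 5), (5, 6, 5)), ((-1, -1, 5), (1, -1, 5), (0, 1, 5))],
    "far": [((-1, -1, 10), (1, -1, 10), (0, 1, 10))],
    "side": [((100, 0, 5), (101, 0, 5), (100, 1, 5))],
}
print(pick((0, 0, 0), (0, 0, 1), scene))
```

When every box test fails in the first layer, `pick` returns `None` without touching a single triangle, which corresponds to the case where no readback from the render target is needed.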
4. Experimental Results and Discussion
Our algorithm takes the screen coordinates of the cursor and the scene to be rendered as input, and outputs intersection information such as the object id, the triangle id, and even the barycentric coordinate of the intersection point. At present, our algorithm can be used on platforms that support the Direct3D 10 APIs. We have incorporated our FRMP method into a Direct3D 10-based scene graph library and tested it on four scenes in order to evaluate its efficiency for different scene types. All tests were conducted on a PC with a 1.83 GHz Intel Core 2 Duo 6320 CPU, 2 GB of main memory, an NVIDIA GeForce 8800 GTS GPU with 320 MB of graphics memory, and the Windows Vista 64-bit operating system.
4.1. The Test Scenes
The four test scenes comprise a toy elk model (3290 polygons), a Venus model (43 357 polygons), 2000 randomly rotated teapots (12.64 M polygons), and 10 000 randomly rotated tori (8 M polygons), all rendered at the same resolution. The test scenes are depicted in Figure 3.
Figure 3: The four test scenes (panels (a) to (d)).
The toy elk scene has only 3290 triangles, while the Venus scene consists of a larger number of triangles. Both are simple cases for the picking operation, because each contains only one object, which is not occlusion culled. These two scenes were tested in order to evaluate efficiency in simple cases, such as those that occur in mesh editing or geometry painting applications.
The teapots scene with 12.64 M triangles and the tori scene with 8 M triangles are complex cases whose objects are rotated randomly from frame to frame. They offer good occlusion, as most of their objects are occluded most of the time.
4.2. Comparison of the Results
For each test scene, we report the processing times of our fast and reliable mouse picking (FRMP) algorithm in comparison to the CPU implementation of our algorithm, and to the traditional GPU method (WYSIWYG) (see Figure 4). Note that in our tests we have picked an object. Had we not done so, our algorithm would have performed even better than the competition. This is because when no bounding box intersects with the picking ray, our approach will not render the actual triangles and return directly.
Figure 4: Processing times for the four test scenes (panels (a) to (d)).
As the scene statistics in Table 1 show, our method achieves a speedup of more than a factor of two over the traditional WYSIWYG method. In the toy elk scene, our method was 2469 milliseconds faster than the CPU method, while the WYSIWYG method was 3554 milliseconds slower than the CPU method. This is because the whole window-sized texture must be read back to main memory to check the intersection, even for small models. In the Venus scene, where the number of triangles is larger, our method and the WYSIWYG method achieve speedups of 22.831 and 3.549, respectively. Even in the teapots scene and the tori scene, our method maintained a good speedup over the WYSIWYG method. If a very large model cannot be loaded into video memory in its entirety, our GPU-based algorithm may be slower than the CPU-based approach. Fortunately, such cases are rare in many real-time applications.
5. Conclusions and Future Work
We have presented a novel algorithm for intersection tests between a picking ray and multiple objects in an arbitrarily complex 3D environment using some new features of graphics hardware. The algorithm is fast, reliable, parallelizable, and simple. It is applicable to all triangulated models, makes no assumptions about the input primitives, and can compute the exact intersection information in object-space. Furthermore, our FRMP picking operation achieves high efficiency compared with traditional methods. Owing to its simplicity, our algorithm can be easily integrated into existing real-time rendering applications. Our FRMP picking approach is of relevance to interactive graphics applications.
The presented approach still leaves room for improvement and extension. For instance, alternative acceleration techniques for real-time rendering may be applied to our FRMP method. Moreover, additional hardware features will become useful as graphics hardware progresses. In the future, we would like to extend our technique and apply it to the field of generic collision detection.

Acknowledgments
The authors would like to thank the Cybergames '08 conference and special issue reviewers for their dedicated help in improving the paper. Many thanks also to Xiaoyan Luo, Charlie C. L. Wang, and Feifei Wei for their help and their valuable advice. The models used for the test in our paper can be downloaded from http://shapes.aimatshape.net/. This work was supported by the National Natural Science Foundation of China (Grant nos. 60533080 and 60833007) and the Key Technology R&D Program (Grant no. 2007BAH11B03).