Abstract

A fundamental element of stereoscopic and/or autostereoscopic image production is the geometrical analysis of the shooting and viewing conditions required to obtain a high-quality 3D perception experience. This paper firstly compares the perceived depth with the shot scene depth, as derived from the viewing and shooting geometries of a shooting/rendering device couple. This yields a depth distortion model whose parameters are expressed from the geometrical characteristics of the shooting and rendering devices. Secondly, these expressions are inverted in order to design convenient shooting layouts yielding chosen distortions on specific rendering devices. Thirdly, this design scheme provides three shooting technologies (3D computer graphics software, photo rail, and camera box system) producing high-quality 3D content for various kinds of scenes (real or virtual, still or animated) and complying with any prechosen distortion when rendered on any formerly specified multiscopic technology or device.

1. Introduction

Three-dimensional display is on its way to becoming the next evolution of the image industry. 3D displays hold tremendous potential for many applications in entertainment, information presentation, reconnaissance, tele-presence, medicine, visualization, remote manipulation, and art. Both research and business in multiscopic display are increasing, and numerous multiscopic rendering systems, with or without glasses, are now available. Different technologies support these systems: stereoscopy with colorimetric or temporal mixing, such as anaglyph [1, 2], occultation and polarization [3], used for example in glasses-based projection in some movie theaters; and autostereoscopy [4, 5], such as parallax barrier and lenticular lens systems, used for example in 3D advertising billboards, autostereoscopic displays, or lenticular printing.

As shown in Figure 1, the different rendering modes and kinds of scenes to be shot are well known, but all these systems need content, and up to now there is no 3D shooting system specifically designed to acquire high-quality 3D content. Some works [6–9] compute a right-eye image from a left-eye image, and so obtain 3D content from 2D-plus-depth. The main disadvantage of these methods lies in the lack of information in occluded areas, which is impossible to overcome in a generic way. Hence, to meet our requirement of high-quality 3D content, we focus on multi-shooting technologies. Other works [10, 11] define the projective relations between the images shot by multiple cameras in order to calibrate them and then to reconstruct the 3D shot scene from these multiple views. There is no link with any viewing device, since the target is a reconstruction module. In our case, flat multiscopic viewing requires a simplified shooting layout, also called “rectified geometry”.

Moreover, some work has been done to improve the control of the viewer's 3D experience in the stereoscopy and computer graphics fields [12, 13]. These works usually compare shooting and viewing geometries in order to choose a shooting layout fitting a given depth range in virtual space to the “comfortable” depth range of the display. We believe that the choices that can be made in the shooting design are richer than a simple mapping of depth, and could differ for each observation position in the multi-view case. This requires a detailed model and a precise analysis of the possible distortions for a multiscopic shooting/viewing couple. Indeed, such a model provides the shooting characteristics which ensure the chosen distortions on the chosen viewing device. While some authors have described the transformation between the shot and the perceived scene [12] in the stereoscopic case, none of them has produced an analytic, multi-observer and reversible model able to pilot the shooting for all kinds of possible distortions. Thus, we propose a solution to produce 3D content according to the chosen rendering mode and the desired depth effect.

So we will explain how to model and quantify the depth distortion induced by given rendering and shooting geometries and, conversely, how to design the appropriate shooting layout from a chosen rendering device and a desired depth effect.

This article introduces a complete analysis of the geometrical quality of 3D content, based on distortion analysis linking shooting and viewing geometries. Starting from the prior and related knowledge (i.e., viewing and shooting geometries), we will show the remaining problems and model the possible depth distortions between the scene perceived by a viewer and the initially shot scene. Next, we will present a shooting layout design scheme ensuring a desired depth effect (controlled depth distortion or perfect depth effect) upon a predetermined rendering device. Finally, we will introduce three shooting technologies (which are patent pending) complying with this scheme and thus achieving high-quality 3D content on a formerly given rendering device: 3D computer graphics software, photo rail and camera box systems. We will present these prototypes and some of their results.

2. Viewing and Shooting Geometries

2.1. Viewing

3D image rendering, with or without glasses, is known to require “stereoscopic” or “autostereoscopic” devices. All these devices perform a spatial, colorimetric and/or temporal mixing, over a single region of interest (ROI: the area physically filled by the displayed image on the rendering device), of so-called “initial images” of one scene shot from several distinct viewpoints. These systems optically and/or temporally separate the images reaching each eye of one or more viewers. In stereoscopic systems, both images are emitted in a single optical beam, independently of the viewer's position in this beam [1, 2, 14]. Autostereoscopic systems, however, separate the images into several distinct optical beams, organized for example in a horizontal range of images [4, 5]. We can also imagine optical beams organized in both horizontal and vertical ranges, that is, a matrix disposition of optical beams, each one transporting a different image. Thus, all known devices broadcast their images alternately and/or simultaneously within one or several optical beams in such a way that both eyes of a correctly placed viewer get different consistent images (i.e., initial images and not combinations of them). Thereby the viewer's brain rebuilds his depth perception by stereoscopy [15]. Even if the human visual system has some tolerance as for epipolar alignment, the ideal positions within this tolerance correspond to an eye line parallel to the display's rows. Despite this human tolerance, we compute our images in such a way that they exhibit perfect epipolar alignment for a well-placed eye line.

So let's analyse the geometry of these devices, the “viewing geometry” (Figure 2), which will constrain the compatible shooting layouts.

A 3D rendering device mixes n images sharing out a ROI of dimensions W (width) and H (height). Each image (of index i) is supposed to be “correctly” visible (without much mixing with others) at least from a chosen preferential position P_{i,j}. These positions are aligned upon one or more lines (of index j) parallel to the rows of the ROI and located at distances d_j from the device ROI. The preferential positions are placed on those lines in order to guarantee that a viewer whose binocular gap is b_j (often identical to the human medium binocular gap of 65 mm, but possibly different according to the expected public: children, etc.), with eye line parallel to the device rows, would have his right eye in P_{i,j} and his left eye in P_{i+q_j,j}. The right eye in P_{i,j} would catch image number i, while the left eye in P_{i+q_j,j} would catch image number i + q_j, q_j being the gap between the indexes of the images composing the consistent stereoscopic couples visible with binocular gap b_j at distance d_j. Hence, associated right and left eye preferential positions R_{i,j} and L_{i,j} verify R_{i,j} = P_{i,j} and L_{i,j} = P_{i+q_j,j}.

We also define the lines' vertical positions (because viewers of various sizes use the device) by their overhang o_j, that is, the vertical gap between the eye lines and the ROI center O. If o_j is not known, we use a medium overhang corresponding to a viewer of medium size, which has to be chosen at design stage. Assuming p_{i,j} and p_{i+q_j,j} are stereoscopic homologous points of images i and i + q_j, their perception by the right and left eyes of a viewer from P_{i,j} and P_{i+q_j,j} leads this spectator's brain to perceive a 3D point M_v. The viewing geometry analysis is expressed thanks to a global reference frame R_v = (O, x, y, z), chosen at the ROI center O, with x parallel to its rows and turned towards the right of the spectators, and y parallel to its columns and turned towards the bottom.
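
To make this notation concrete, the following minimal sketch enumerates such preferential positions for a hypothetical 8-view panel. The spacing rule b_j/q_j between consecutive positions and all numeric values are illustrative assumptions, not measurements of any actual device.

```python
# Sketch: preferential viewing positions P[i, j] for a hypothetical 8-view
# autostereoscopic panel. Line j lies at distance d[j], with binocular gap
# b[j], index gap q[j] and overhang o[j]. Axes follow the text: x rightward,
# y downward, z behind the ROI (so viewers stand at z = -d[j]).

def preferential_positions(n, d, b, q, o):
    """Return {(i, j): (x, y, z)}: a viewer with right eye on P[i, j] and
    left eye on P[i + q[j], j] catches the consistent couple (i, i + q[j])."""
    positions = {}
    for j in range(len(d)):
        step = b[j] / q[j]                  # spacing of consecutive positions
        for i in range(n):
            x = ((n - 1) / 2 - i) * step    # abscissas decrease with i
            positions[(i, j)] = (x, o[j], -d[j])
    return positions

# One positions line at 4.2 m, 65 mm binocular gap, adjacent-image couples.
P = preferential_positions(n=8, d=[4.2], b=[0.065], q=[1], o=[0.0])
print(P[(3, 0)], P[(4, 0)])   # right and left eye of one well-placed viewer
```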

2.2. Shooting

In order to “feed” such devices with 3D content, we need sets of n images of a single scene acquired from several distinct and judicious viewpoints and with a specific projective geometry, as the rendering upon flat multiscopic devices involves coplanar mixing of these images. This major issue is well known in multiscopy.

The image viewing is achieved according to distorted pyramids whose common base corresponds to the device ROI and whose tops are the viewer's eye positions P_{i,j}. Given that the vision axes are not necessarily orthogonal to the observed image area (ROI), the viewing of these images induces trapezoidal distortion if this slanted viewing is not taken into account during the shooting. This has an immediate consequence on depth perception: if the trapezoidal distortions are not similar for the two images seen by a spectator, the stereoscopic matching by the brain is more delicate, or even impossible, which reduces or cancels the depth perception. This constraint, well known in stereoscopy, is called the “epipolar constraint”.

Convergent systems (also called toe-in camera models) have been proposed [16, 17], but such convergent devices suffer from the trapezoidal distortions presented above. So, unless a systematic trapezoidal correction of the images is performed beforehand (which might not be desirable, as it loads down the processing line and degrades image quality), such devices cannot produce high-quality 3D content. As demonstrated in [18, 19], we must use devices with shooting pyramids sharing a common rectangular base (off-axis camera model) and with tops arranged on a line parallel to the rows of this common base in the scene. For example, Dodgson et al. use this shooting layout for their time-multiplexed autostereoscopic camera system [20].

Thus, the aiming axes are necessarily convergent at the center of the common base, and the tops of the shooting pyramids must lie on lines parallel to the rows of the common base. Figure 3(b) shows a perspective representation of such a shooting geometry. This figure defines the layout of the capture areas A_{i,j} and of the optical centers C_{i,j}, and specifies a set of parameters describing the whole shooting geometry completely. Figures 3(c) and 3(d) show top and full-face representations of this geometry, respectively.

The shooting geometry analysis is expressed using a shooting global reference frame R_s = (CP, x_s, y_s, z_s), chosen centered at the desired convergence point CP (which is also the center of the common base CB of the scene) and oriented in such a way that the two first vectors of the reference frame are parallel to the main directions of the common base of the scene and so parallel to the main directions of the capture areas. The physical size of CB is W_b (width) and H_b (height). Furthermore, the first axis x_s is supposed to be parallel to the rows of the capture areas and the second axis y_s is supposed to be parallel to the columns of these areas.

According to the principles previously explained to resolve this known issue, the pyramids representative of a shooting layout are specified by:

an optical axis of direction z_s; optical centers C_{i,j} (i.e., principal points) aligned on one or more “centers line(s)” parallel to the rows of the common base (so of direction x_s); and rectangular capture areas A_{i,j}.

These capture areas must be orthogonal to z_s, so parallel to one another, to the common base CB and to the centers lines (which are defined by their distances from CP: D_j along z_s and v_j along y_s). These capture areas are also placed at distances f_j c_{i,j}/D_j along x_s, f_j v_j/D_j along y_s and f_j along z_s from their respective optical centers C_{i,j}, where c_{i,j} denotes the abscissa of C_{i,j} along x_s. Their physical size is given by w_j = f_j W_b/D_j and h_j = f_j H_b/D_j. They are centered on points I_{i,j} in such a way that the lines (I_{i,j} C_{i,j}) defining the axes of sight are convergent at CP. The centers C_{i,j} and C_{i+q_j,j} of the pyramids shooting a stereoscopic couple must be on the same centers line, with a spacing B_{i,j} = c_{i,j} − c_{i+q_j,j}.
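
The convergence constraint above fully determines each capture area once its optical center and f_j are chosen. The sketch below encodes one pyramid under the notation just introduced; the numeric values are arbitrary placeholders.

```python
# Sketch of one shooting pyramid in the frame R_s (origin at the convergence
# point CP, z_s pointing from the cameras toward the scene). The names
# (c, v, D, f, Wb, Hb) follow the notation assumed in the text.

from dataclasses import dataclass

@dataclass
class Pyramid:
    c: float   # optical-center abscissa along x_s
    v: float   # optical-center height along y_s (centers-line offset)
    D: float   # distance of the centers line to the common base (along z_s)
    f: float   # distance of the capture area behind the optical center
    Wb: float  # common-base width
    Hb: float  # common-base height

    def optical_center(self):
        return (self.c, self.v, -self.D)

    def area_center(self):
        # Aiming the axis of sight (I, C) at CP places I on the line (C, CP),
        # at depth z = -(D + f), hence I = C * (D + f) / D.
        s = (self.D + self.f) / self.D
        return (self.c * s, self.v * s, -(self.D + self.f))

    def area_size(self):
        # The capture area must frame exactly the common base (Thales).
        return (self.f * self.Wb / self.D, self.f * self.Hb / self.D)

p = Pyramid(c=0.10, v=0.0, D=4.2, f=0.035, Wb=1.2, Hb=0.7)
print(p.area_center(), p.area_size())
```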

Such a shooting layout is necessary to obtain a depth effect on a multiscopic device. Nevertheless, it does not ensure that the perceived scene will not be distorted relative to the shot scene. Non-distortion implies that the viewing pyramids are perfect counterparts of the shooting pyramids (i.e., have exactly the same opening and main axis deviation angles in both horizontal and vertical directions). In case of pyramid dissimilarity, the 3D image corresponds to a complex distortion of the initially acquired scene. This can be desirable in some applications, to carry out special effects, and undesirable in others. Hence shooting and viewing must be designed in a consistent way, whether we desire a depth distortion or not. Let's now model the distortion effects implied by a couple of shooting and viewing geometries.

3. Distortion Analysis and Model

In this section, we consider perfect sensors and lenses, without any distortion. The practical issues raised by this assumption will be discussed for each derived technology.

Thanks to the two previous sections, we can link the coordinates (x, y, z), in the reference frame R_s, of a scene point M shot by the sensors previously defined, with the coordinates (x_v, y_v, z_v), in the reference frame R_v, of its counterpart M_v seen by an observer of the viewing device placed in a preferential position.

Assuming that the scene point M = (x, y, z) (coordinates in R_s) is visible on image number i, its projection m_{i,j} onto the capture area A_{i,j} verifies

\[ m_{i,j} = C_{i,j} - \frac{f_j}{z + D_j}\,\bigl(M - C_{i,j}\bigr). \]

Knowing that I_{i,j}, center of A_{i,j}, verifies

\[ I_{i,j} = \frac{D_j + f_j}{D_j}\, C_{i,j}, \]

the relative positions of the scene point M's projections in the various images are expressed as

\[ m_{i,j} - I_{i,j} = -\frac{f_j}{D_j}\,\frac{1}{z + D_j} \begin{pmatrix} D_j\, x + c_{i,j}\, z \\ D_j\, y + v_j\, z \\ 0 \end{pmatrix}. \]

As the images are captured behind the optical centers, the projection reverses the up/down and left/right axes, and the implicit axes of the images are thus opposite to those of the global shooting reference frame R_s. Moreover, the images are then scaled over the whole ROI of the rendering device. This relates the projections of M to their “rendered positions” on the ROI:

\[ X_{i,j} = \frac{W}{W_b}\,\frac{D_j\, x + c_{i,j}\, z}{z + D_j}, \qquad Y_{i,j} = \frac{H}{H_b}\,\frac{D_j\, y + v_j\, z}{z + D_j}. \]

Remarking that the ROI lies in the plane z = 0 of R_v, with its rows and columns along x and y, the rendered point p_{i,j} is expressed in the reference frame R_v as

\[ p_{i,j} = \bigl(X_{i,j},\; Y_{i,j},\; 0\bigr). \]

By this time, and assuming M was visible on both images i and i + q_j, we notice that p_{i,j} and p_{i+q_j,j} lie on the same row of the ROI (Y_{i,j} = Y_{i+q_j,j}). This fulfills the epipolar constraint and thus permits stereoscopic reconstruction of M_v from P_{i,j} and P_{i+q_j,j} according to

\[ z_v = \frac{d_j\,\Delta_{i,j}}{b_j - \Delta_{i,j}}, \qquad x_v = \frac{b_j\, X_{i,j} - \Delta_{i,j}\, p^x_{i,j}}{b_j - \Delta_{i,j}}, \qquad y_v = \frac{b_j\, Y_{i,j} - \Delta_{i,j}\, o_j}{b_j - \Delta_{i,j}}, \]

where \Delta_{i,j} = X_{i,j} - X_{i+q_j,j} is the on-screen disparity and p^x_{i,j} the abscissa of the right-eye position P_{i,j}.

Thus, after some calculus, the relation between the 3D coordinates (x, y, z) of the scene points and those (x_v, y_v, z_v) of their images perceived by a viewer may be characterized under homogeneous coordinates by

\[ \begin{pmatrix} x_v \\ y_v \\ z_v \\ w \end{pmatrix} \sim \begin{pmatrix} k & 0 & k\,\gamma_{i,j} & 0 \\ 0 & k\rho & k\rho\,\delta_{i,j} & 0 \\ 0 & 0 & k/\varepsilon_{i,j} & 0 \\ 0 & 0 & (1 - \mu_{i,j})/D_j & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \]

where the parameters k, \mu_{i,j}, \varepsilon_{i,j}, \rho, \gamma_{i,j} and \delta_{i,j} are defined below.

The above equation can be seen as the analytic distortion model for observer position P_{i,j}, which matches the stereoscopic transformation matrix given in [12]. As such, this model clearly exhibits the whole set of distortions to be expected in any multiscopic 3D experience, whatever the number of views implied or the very nature of these images (real or virtual). It also shows that these distortions are somehow independent from one another and may vary for each observer position. The following detailed analysis of this model and its further inversion will offer a novel multiscopic shooting layout design scheme acting from freely chosen distortion effects and for any specified multiscopic rendering device.

The above model exhibits some new parameters quantifying independent distortion effects. Those parameters may be analytically expressed from the geometrical parameters of both the shooting and rendering multiscopic devices. Their relations to these geometrical parameters and their impact on the distortion effects are now presented:

k = W/W_b controls the global enlarging factor,
μ_{i,j} = k B_{i,j}/b_j control(s) the potential nonlinear distortion which transforms a cube into a pyramid trunk, according to a global reducing rate possibly varying along z,
ε_{i,j} = b_j D_j/(d_j B_{i,j}) control(s) the width over depth relative enlarging rate(s), or horizontal/depth anamorphose factor,
ρ = H W_b/(W H_b) controls the height over width relative enlarging rate, or vertical/horizontal anamorphose factor,
γ_{i,j} = (k c_{i,j} − μ_{i,j} p^x_{i,j})/(k D_j) control(s) the horizontal “shear” rate(s) of the perceived depth effect,
δ_{i,j} = (k ρ v_j − μ_{i,j} o_j)/(k ρ D_j) control(s) the vertical “shear” rate(s) of the perceived depth effect by an observer whose overhang complies with what is expected.

Thus we have defined the depth distortion possibilities using the previously established shooting and viewing geometries. Moreover, this model makes it possible to quantify those distortions for any couple of shooting and viewing settings, by simple calculus upon their geometric parameters, as sketched below.
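
As an illustration, the following sketch evaluates the parameter expressions above and builds the homogeneous matrix for one observer position. The formulas mirror the reconstruction given in this section, and the sample values (a camera baseline slightly wider than the binocular gap) are hypothetical.

```python
# Sketch: quantify the distortions of a shooting/viewing couple via the
# parameters of the homogeneous model above. The notation (W, H, Wb, Hb, d,
# b, D, B, c, p_x, v, o) is the one assumed in this section.

import numpy as np

def distortion_parameters(W, H, Wb, Hb, d, b, D, B, c, p_x, v, o):
    k   = W / Wb                                  # global enlarging factor
    rho = (H * Wb) / (W * Hb)                     # height/width anamorphose
    mu  = k * B / b                               # non-linearity (1: linear)
    eps = (b * D) / (d * B)                       # width/depth anamorphose
    gam = (k * c - mu * p_x) / (k * D)            # horizontal shear
    dlt = (k * rho * v - mu * o) / (k * rho * D)  # vertical shear
    return k, mu, eps, rho, gam, dlt

def distortion_matrix(k, mu, eps, rho, gam, dlt, D):
    return np.array([[k, 0,       k * gam,       0],
                     [0, k * rho, k * rho * dlt, 0],
                     [0, 0,       k / eps,       0],
                     [0, 0,       (1 - mu) / D,  1]])

# Example: a baseline slightly wider than the binocular gap on a 4.2 m display.
pars = distortion_parameters(W=1.2, H=0.7, Wb=1.2, Hb=0.7, d=4.2, b=0.065,
                             D=4.2, B=0.080, c=0.040, p_x=0.0325, v=0.0, o=0.0)
M = distortion_matrix(*pars, D=4.2)
pt = M @ np.array([0.0, 0.0, 1.0, 1.0])   # scene point 1 m behind the base
print(pars, pt[:3] / pt[3])
```

With these numbers, μ ≈ 1.23, so a point 1 m behind the convergence plane is perceived about 1.30 m behind the screen: the depth amplification grows nonlinearly with z.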

4. Shooting Design Scheme for Chosen Distortion

One can use any multiscopic shooting device with any multiscopic viewing device while giving an effect of depth to any well-placed viewer (3D movie theater, e.g.), but Section 3 shows that the distortions will not be the same for each couple of technologies. In this section, we design the shooting geometry needed to obtain, on a given viewing device, a desired distortion: either a perfect depth effect or a chosen distortion of the shot scene.

Knowing how distortion, shooting and viewing parameters are related, it becomes possible to derive the shooting layout from prior distortion and viewing choices.

We will describe two shooting layout design schemes complying with this use of the distortion model:

a generic scheme allowing precise control of each distortion parameter, and a more dedicated one of huge practical interest, focused on “non-distortion” or “perfect depth”, allowing at most control of the global enlarging factor k while every other distortion parameter is set to its “non-distortion” value.
4.1. Controlled Depth Distortion

To define the shooting layout using this scheme, we control the global enlargement (by k) and 4 potential depth distortions:

(1) when μ_{i,j} ≠ 1, a global nonlinearity which results in a deformation of the returned volume into a “pyramid trunk” (as the reducing rate varies along the z axis), (2) when γ_{i,j} ≠ 0, a sliding or “horizontal shear” of the returned volume according to the depth, (3) when δ_{i,j} ≠ 0 and/or when the real overhang of the observer differs from the optimal overhang o_j, a sliding or “vertical shear” of the returned volume according to the depth, and (4) when ε_{i,j} ≠ 1 and/or ρ ≠ 1, an anamorphose producing uneven distensions of the 3 axes (width versus depth for ε_{i,j} and height versus width for ρ).

The controlled depth distortion is obtained by adjusting the enlarging factor k and the distortion parameters μ_{i,j}, ε_{i,j}, ρ, γ_{i,j} and δ_{i,j}. The condition on δ_{i,j} is more delicate because it depends on the height of the viewer, which inevitably affects the effective overhang towards the device. So the chosen vertical sliding can be reached only for observers whose overhang matches the o_j defined in the viewing settings for this observation position.

Thus, given the viewing settings and the desired distortion parameters, the shooting parameters can be calculated as follows:

\[ W_b = \frac{W}{k}, \qquad H_b = \frac{H}{k\rho}, \qquad D_j = \frac{\mu_{i,j}\,\varepsilon_{i,j}\, d_j}{k}, \qquad B_{i,j} = \frac{\mu_{i,j}\, b_j}{k}, \]
\[ c_{i,j} = \gamma_{i,j}\, D_j + \frac{\mu_{i,j}\, p^x_{i,j}}{k}, \qquad v_j = \delta_{i,j}\, D_j + \frac{\mu_{i,j}\, o_j}{k\rho}, \]

the distances f_j remaining free, with capture areas of size w_j = f_j W_b/D_j and h_j = f_j H_b/D_j.

This controlled depth distortion scheme yields the parameters of a shooting layout producing the desired 3D content for any rendering device and any combination of depth distortions; a direct transcription is sketched below.
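
A minimal transcription of these inversion formulas, under the notation assumed above (f is the free distance left to the designer):

```python
# Sketch of the inverse design scheme: from the viewing geometry and chosen
# distortion parameters, derive a shooting layout (assumed notation; f is a
# free focal-like distance, fixed here arbitrarily).

def design_shooting(W, H, d, b, o, p_x, k, mu, eps, rho, gam, dlt, f=0.035):
    Wb = W / k                         # common-base width
    Hb = H / (k * rho)                 # common-base height
    D  = d * mu * eps / k              # centers-line distance to the base
    B  = mu * b / k                    # baseline of a stereoscopic couple
    c  = gam * D + mu * p_x / k        # optical-center abscissa
    v  = dlt * D + mu * o / (k * rho)  # centers-line height
    w, h = f * Wb / D, f * Hb / D      # capture-area size
    return dict(Wb=Wb, Hb=Hb, D=D, B=B, c=c, v=v, w=w, h=h)
```

Feeding the resulting layout back into the parameter expressions of Section 3 returns exactly the requested (k, μ, ε, ρ, γ, δ), which provides a simple unit check of the scheme.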

4.2. Perfect Depth Effect

A particular case of controlled depth distortion is the perfect depth effect (depth perception without any distortion compared with the depth of the shot scene). To produce a perfect depth effect (whatever the enlarging factor k), we should configure the shooting in order to avoid the 4 potential distortions. This is obtained by making sure that the distortion parameters verify μ_{i,j} = 1, ε_{i,j} = 1, ρ = 1, γ_{i,j} = 0 and δ_{i,j} = 0. The last condition is more delicate, as it can be assured only for an observer complying with the defined overhang.

In case of shooting for a perfect depth effect, the shooting parameters reduce to:

\[ W_b = \frac{W}{k}, \qquad H_b = \frac{H}{k}, \qquad D_j = \frac{d_j}{k}, \qquad B_{i,j} = \frac{b_j}{k}, \qquad c_{i,j} = \frac{p^x_{i,j}}{k}, \qquad v_j = \frac{o_j}{k}, \]

that is, the shooting geometry is the viewing geometry scaled by 1/k.
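
For instance, reusing the hypothetical design_shooting sketch from the previous section, the perfect-depth settings reduce every shooting parameter to its viewing counterpart divided by k:

```python
# Perfect depth on the assumed 4.2 m panel: every shooting parameter equals
# its viewing counterpart divided by k (k = 1 here, i.e., life size).
layout = design_shooting(W=1.2, H=0.7, d=4.2, b=0.065, o=0.0, p_x=0.0325,
                         k=1.0, mu=1.0, eps=1.0, rho=1.0, gam=0.0, dlt=0.0)
print(layout)   # D = 4.2, B = 0.065, c = 0.0325, Wb = 1.2, ...
```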

This particular case is very interesting for its realism. For example, in order to convince financiers or deciders, it may be important to give the real volumetric perception of a building or a mechanical part in a computer-aided design (CAD) application, or of anatomy in medical visualization software or a surgical simulation application.

5. Derived Shooting Technologies

Thanks to these design schemes, we have created 3 different technologies to shoot 3D scenes: multi-viewpoint computer graphics software, a photo rail and a camera box system. These products have been developed under the brand “3DTV Solutions” and patents are pending for each of them. As shown in Table 1, these 3 solutions yield high-quality photo or video content for any relevant kind of scene (still or animated, real or virtual). We use anaglyphs to illustrate our results, even though their viewing on paper or on a 2D screen is not optimal, because the images have been computed to be rendered on specific devices.

5.1. 3D Computer Graphics Software

The common goal of our 3D computer graphics software is to virtually shoot different kinds of scenes, as photos of still or animated scenes as well as computer animations of animated scenes. Thanks to the previous shooting design scheme, we are able to place the virtual sensors around a usual monocular camera according to the chosen viewing device in order to obtain the desired depth effect. In this case, the virtual cameras are perfect and there is no issue with distortions due to sensors or lenses. One plausible realization of such virtual off-axis sensors is sketched below.
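
In a standard graphics API, each virtual sensor amounts to an asymmetric (off-axis) frustum per viewpoint. The sketch below derives its near-plane bounds by similar triangles from our layout parameters; it is an illustrative mapping under the assumed notation, not the actual plugin code.

```python
# Sketch: near-plane bounds of the asymmetric (off-axis) frustum of a virtual
# camera at abscissa c and height v, at distance D from the common base of
# size Wb x Hb ('near' is a hypothetical clip-plane distance).

def off_axis_frustum(c, v, D, Wb, Hb, near):
    s = near / D                      # similar-triangles scaling base -> near
    left   = (-Wb / 2 - c) * s
    right  = ( Wb / 2 - c) * s
    bottom = (-Hb / 2 - v) * s
    top    = ( Hb / 2 - v) * s
    return left, right, bottom, top

# Virtual camera 10 cm right of the device axis, converging 4.2 m away.
print(off_axis_frustum(c=0.10, v=0.0, D=4.2, Wb=1.2, Hb=0.7, near=0.1))
```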

Thus far, we have developed plugins and software to visualize and handle in real time files from CAD software such as AutoCAD, Archicad or Pro/Engineer, as well as medical data such as MRI. We are going to apply this technology to other virtual scenes.

In those pieces of software, we choose the rendering device parameters and the desired depth effect, and the software computes and uses the corresponding virtual shooting layout. It is possible to record different rendering devices and depth effect distortions, and then to switch easily between these devices and distortions.

Those pieces of software currently handle scenes of up to 7 million polygons at interactive rates.

Figure 4 shows example images of a Parametric Technology Corporation (PTC) part (an engine) shot as the software was tuned to achieve a perfect depth effect on a 57″ autostereoscopic parallax display (optimal viewing distance 4.2 m) (see Note).

Figure 5 shows medical data (image of an aneurysm). The software was tuned for the 57″ autostereoscopic parallax display (optimal viewing distance 4.2 m) and a perfect depth effect is obtained, as is usual with medical data, in order to allow the surgeon the most definite and efficient interpretation possible (see Note).

5.2. Photo Rail

The goal of this photo rail is to shoot 3D photos of still scenes. By using the photo rail (Figure 6) with its controlling software, it is possible to control both the usual operations of a professional digital camera and its movement along a linear axis parallel to its sensor rows.

This allows us to carry out any shooting configuration, whatever the chosen depth distortion settings and viewing device, provided we crop the needed capture area in each digital photo in order to comply with the needed shooting geometry (a sketch of this crop computation is given below). With this photo rail, there is a possibility of distortion due to the digital camera, but the distortions will be consistent across all images and of negligible magnitude, as it is professional equipment. We have not tried to correct those possible distortions, but such a work could be done easily.
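
The crop window can be derived directly from the geometry: each rail position shoots a symmetric pyramid, from which the needed off-axis capture area is extracted. Below is a sketch under hypothetical sensor parameters (focal length, sensor size, resolution); the actual rail software may proceed differently.

```python
# Sketch: pixel crop window, in a rail photo taken at abscissa c, realizing
# the off-axis capture area aimed at the convergence point CP. Hypothetical
# camera: focal length fl (m), sensor size ws x hs (m), resolution Wpx x Hpx;
# the principal point is assumed at the image center.

def crop_window(c, v, D, Wb, Hb, fl, ws, hs, Wpx, Hpx):
    mx, my = Wpx / ws, Hpx / hs            # pixels per meter on the sensor
    # Base edges project at fl/D times their offset as seen from the camera;
    # the sign flip of the projection is undone by the image read-out.
    x0 = Wpx / 2 + (-Wb / 2 - c) * fl / D * mx
    x1 = Wpx / 2 + ( Wb / 2 - c) * fl / D * mx
    y0 = Hpx / 2 + (-Hb / 2 - v) * fl / D * my
    y1 = Hpx / 2 + ( Hb / 2 - v) * fl / D * my
    return round(x0), round(x1), round(y0), round(y1)

print(crop_window(c=0.1, v=0.0, D=4.2, Wb=1.2, Hb=0.7,
                  fl=0.05, ws=0.036, hs=0.024, Wpx=6000, Hpx=4000))
```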

For example, Figure 7 illustrates the shooting of a room in the “Palais du Tau” [21] in Reims with a perfect depth effect for the 57″ autostereoscopic parallax display (optimal viewing distance 4.2 m). We made a 3D shooting of a large hall with a significant depth (see Note). To realize this sequence, we positioned the convergence point at 4.2 m from the photo rail. Moving the digital camera along the rail, taking the pictures and storing them took 39.67 s.

5.3. Camera Box System

The goal of this camera box system is to shoot different kinds of scenes, as photos of still or animated scenes as well as videos of animated scenes. Thanks to the previous shooting design method, we know how to build a camera box system containing several couples of lenses and image sensors in order to produce simultaneously the multiple images required by an autostereoscopic display with a desired depth effect. As these couples are multiple, their induced distortions can differ. We have therefore introduced a couple-by-couple calibration/correction process based upon the model of Zhang [22], along the lines sketched below.
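
As an indication of what such a couple-by-couple pass may look like, here is a minimal checkerboard calibration/correction sketch using OpenCV's implementation of Zhang's method; it is illustrative, not the authors' pipeline.

```python
# Sketch of a per-sensor calibration/correction pass in the spirit of Zhang's
# model [22], using OpenCV. A checkerboard is shown to each lens/sensor
# couple, and the estimated distortion is then undone on every frame.

import cv2
import numpy as np

def calibrate(images, pattern=(9, 6), square=0.025):
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts, size = [], [], None
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    _, mtx, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return mtx, dist

def correct(img, mtx, dist):
    return cv2.undistort(img, mtx, dist)   # applied couple-by-couple
```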

We have already produced two prototypes of camera box systems delivering multi-video streams in real time (25 fps). Their layout parameters have been defined for no distortion of specific scenes (see below) and set at manufacturing:

The 3D-CAM1 (Figure 8(a)) allows shooting a life-size scene (ratio k = 1) of the bust of a person, to be viewed on the 57″ autostereoscopic parallax display (optimal viewing distance 4.2 m). We have chosen to obtain a perfect depth effect. According to numerous viewers, both novice and expert, the 3D perception is of real quality. Unfortunately, such an experiment is impossible to reproduce on 2D media (see Note). Figure 9 shows some results: Figure 9(a) corresponds to a set of 8 simultaneously shot images, Figure 9(b) is the result after their autostereoscopic mixing, and Figure 9(c) is an anaglyph of the images indexed 3 and 4.

The 3D-CAM2 (Figure 8(b)) enables shooting small-size objects (in the order of 10–20 cm) and displaying them on a 24″ autostereoscopic lenticular display (optimal viewing distance 2.8 m) with a prechosen enlargement factor. We have chosen to obtain a perfect depth effect. Here again, the 3D perception is of real quality according to our numerous visitors. Some illustrations are given in Figure 10: Figure 10(a) corresponds to a set of 8 simultaneously shot images and Figure 10(b) gives the anaglyph result of the images indexed 3 and 4 (see Note).

6. Conclusion

This work firstly models the geometrical distortions between the shot scene and its multiscopically viewed counterpart. These distortions are related to the geometrical parameters of both the shooting and rendering devices or systems. This model enables quantitative objective assessment of the geometrical reliability of any multiscopic shooting and rendering couple.

The formulas expressing the distortion parameters from the geometrical characteristics of the shooting and rendering devices have then been inverted in order to express the shooting layout yielding a chosen distortion scheme upon a chosen rendering device. This design scheme ensures a priori that the 3D experience will meet the chosen requirements for each expected observer position. Such a scheme may prove highly valuable for applications needing reliably accurate 3D perception or specific distortion effects.

This design scheme has been applied to 3 different kinds of products covering all needs of multi-viewpoint scene shooting (real/virtual, still/animated, photo/video).

This work opens several perspectives. We take an interest in the combination of real and virtual 3D scenes, that is, 3D augmented reality, which allows virtual and real elements or scenes to be combined. In the CamRelief project, Niquin et al. [23] work on this subject and present their first results concerning accurate multi-view depth reconstruction with occlusion handling.

We are also developing a configurable camera whose geometric parameters can be adjusted to fit a chosen rendering device and a desired depth effect. Thus, we could test different depth distortions for a same scene. Moreover, we could produce high-quality 3D content for several rendering devices from a single camera box.

We will also need to carry out experiments on a delicate subject: validating that the perception is geometrically conform to our expectations. This will require a significant panel of viewers, but also the definition and setup of a perception test permitting to precisely quantify the distances between characteristic points of the perceived scene.

Note. The 3D content has been produced for autostereoscopic displays. Obviously, it can only be experienced with the chosen device and in no way upon 2D media such as paper or a conventional display. Nevertheless, anaglyphs help the reader to notice the depth effect on such 2D media.

Acknowledgments

We would like to thank the ANRT, the French National Agency of Research and Technology, for its financial support. The work reported in this paper was also supported as part of the CamRelief project by the French National Agency of Research. This project is a collaboration between the University of Reims Champagne-Ardenne and TéléRelief. We would like to thank Michel Frichet, Florence Debons, and the staff for their contribution to the project.