Abstract

Multiresolution hierarchy based on features (FMRH) has been applied in the field of terrain modeling and has obtained significant results in real engineering. However, it is difficult to schedule multiresolution data in FMRH from external memory. This paper proposes a new multiscale feature model and related strategies that cluster spatial data blocks and solve the scheduling problems of FMRH using spatial neighborhoods. In the model, nodes with similar errors in different layers are placed in one cluster. On this basis, a spatial index algorithm for each cluster, guided by the Hilbert curve, is proposed. It ensures that multiresolution terrain data can be loaded without traversing the whole FMRH; therefore, the efficiency of data scheduling is improved. Moreover, a spatial closeness theorem for clusters is put forward and proved. It guarantees that the union of data blocks composes a whole terrain without any data loss. Finally, experiments have been carried out on many different large-scale data sets, and the results demonstrate that the schedule time is shortened and the efficiency of I/O operations is clearly improved, which is important in real engineering.

1. Introduction

Terrain models are widely applied in real engineering, such as film, games, and simulation. One of the grand challenges is that they require huge data sets in order to guarantee precision. Especially in recent years, with the rapid development of spatial visualization technology, the requirement for terrain detail has grown higher and higher; therefore, terrain scenes have become more and more complex. Owing to the massive data, the establishment, management, and indexing of terrain models are very difficult in scientific visualization engineering.

The capacity of RAM is too limited to store such huge data sets; therefore, the support of external storage is needed when rendering large-scale terrain. According to the requirements of the scene, data must be loaded into memory dynamically and interactively to render the scene in real-time visualization engineering. However, there is a serious speed gap between RAM and external storage, so frequent interaction between them directly affects the efficiency of the system. It has become a bottleneck in processing large-scale data. As a result, how to establish a model for huge data sets and provide an effective scheduling method is one of the research hotspots.

The pyramid is a kind of data model composed of multiple resolution levels [1]. It can provide data at all resolutions without real-time sampling, because the model generates the different resolution levels of the digital elevation model (DEM) in advance. It has been widely applied in commercial software. However, it has two drawbacks. (1) The pyramid model increases the storage space of the data; that is to say, a lot of data is repeated across its layers. Hence, decreasing the amount of loaded data comes at the cost of consuming external storage. (2) There is still a mass of redundant data in the data set loaded into memory. No matter whether the terrain is flat or rugged, in the classical method the data resolution within each level is uniform, and interlaced sampling is adopted between adjacent levels, which does not consider the geometric features of the terrain.
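
For intuition, the extra storage in drawback (1) can be bounded with a geometric series. Assuming, as in the classical pyramid, that every coarser level keeps one quarter of the samples of the level below (interlaced sampling with factor 2 per axis), a base DEM of N samples costs

S = N (1 + 1/4 + 1/16 + …) = (4/3) N

in total, that is, roughly one third more external storage than the raw data.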

Lindstrom and Silva [2] proposed a dynamic data loading method using the file mapping mechanism. This method is extremely simple, and it takes advantage of the operating system to divide data pages automatically. Dai et al. [1] proposed a method of incremental data update, which dynamically updates local terrain data according to the movement of the viewpoint and the geometric center of the data page. Li et al. [3] adopted the technique of incremental horizon to eliminate the invisible parts of the model.

To further improve the efficiency of data scheduling, many scholars express the simplified grid model as a data stream [4]. Space-filling curves, including the Hilbert curve [5], the Z curve [6], and the Π curve [7], are a common approach for linearizing terrain data in external storage. These methods improve the efficiency of data scheduling in different ways, but they still take additional time to index external storage. In recent years, new approaches based on cluster analysis have been proposed to simplify the data model in external storage and improve the efficiency of data scheduling [8, 9]. However, these out-of-core methods based on clustering are almost all aimed at irregular data sets, whereas the primary research object in terrain modeling engineering is the large-scale grid model; therefore, the methods mentioned above are not capable of handling the large-scale data of real engineering. In addition, a parallel algorithm using hardware to process out-of-core terrain data has been proposed, but it is not widely applied because of its requirement for high-end hardware [10].

This paper utilizes a multiresolution out-of-core model based on geometric terrain features. It has the following advantages. It consists of a physical model and a logical model, in which the physical model is the original full-resolution data, so the model is independent of the data scale. The logical model is a multiresolution hierarchical structure on top of the physical model, which establishes the index of data blocks according to the geometric features of the terrain. It decreases the amount of data loaded into memory while keeping the external storage constant.

In this paper, we propose a data scheduling strategy based on the logical model after cluster analysis. It decreases the time cost of searching for the target data in external storage by clustering the data blocks whose static geometric errors are similar. Meanwhile, we put forward a method that sorts the data blocks in each cluster via a space-filling curve and present an encoding strategy for the multiresolution terrain model. This resolves two primary problems. One is to map terrain data blocks from a two-dimensional structure into one-dimensional form. The other is how to intercept the data blocks in a cluster for the multiresolution model of a local area.

2. Related Work

As we all know, the effect of 3D modeling based on vectors is usually good; however, there are still open problems such as the processing of large-scale data and accelerated rendering. In recent years, many scholars have achieved significant research progress in the following aspects.

Level of detail (LOD) is a classical simplification method for multiresolution models. There are three categories of LOD algorithms: triangulation algorithms based on quadtrees [11], simplified algorithms based on adaptive grids [12], and progressive mesh algorithms [13].

Triangulation algorithms based on quadtrees divide one data block into four according to a screen error and the viewpoint. Their advantage is that the hierarchy is built using quadtree codes, which makes it easy to solve the crack problem of adjacent blocks along common edges. Representative methods include the algorithm based on constrained quadtrees [14] and the simplified method based on implicit restricted quadtrees [15]. Such algorithms adopt an efficient strategy to refine layers from top to bottom. However, a mass of information must be stored for each quadtree node, which consumes massive storage resources.

The simplified algorithm based on adaptive regular grids evaluates whether model data should be eliminated according to the model error and then decides how to subdivide the grid adaptively [12].

Progressive mesh algorithms adopt embedded grids to remove cracks between layers by setting a "skirt" for the different multiresolution layers [13]. The literature [16, 17] takes advantage of geometry clipping to optimize the algorithm, creating a data buffer to accelerate real-time roaming of terrain scenes. Meanwhile, Sun et al. [18] described a technique to compress textures based on the mipmap structure. Although this kind of algorithm improves model precision, it demands higher CPU consumption, so it has difficulty dealing with large-scale data sets in general 3D vector models.

Cluster analysis is also an effective strategy for solving problems in the construction of multiresolution models, and it is widely applied in many areas. The literature [19, 20] describes methods that solve coordinate-mean computation problems by using cluster analysis.

When rendering a multiresolution terrain model, it is necessary to deal with huge-scale data. Jeong et al. [21] proposed a new method to render complex scenes that would have been hard to imagine rendering in earlier years. With the development of graphics hardware, the GPU has become an important computing resource for processing huge-scale data. The literature [22, 23] gives more details on this new approach.

3. Cluster Analysis Based on Similar Error

3.1. Cluster with Similar Error

Multiresolution hierarchy based on features (FMRH) is a structure built on the physical model. According to the structure of FMRH proposed in the literature [24], a mass of redundant data is eliminated from the view of terrain features, so that the multiresolution model can be stored in the minimum capacity of external storage. However, the efficiency of scheduling the data in external storage is not improved even though the amount of physical data decreases. In fact, the level of a data block is not inherently tied to its error; that is to say, the errors of data blocks in the same level may not be similar. The error of the current data block is only related to its source data block or its sub-data blocks, because the hierarchical structure proposed in our paper is realized by depth-first traversal. As shown in Figure 1, blocks B_k and B_{k+1}, where B_{k+1} is a child of B_k, lie on the same branch in FMRH, and their static errors δ(·) satisfy the following relation:

δ(B_k) ≥ δ(B_{k+1}).  (1)

The errors of nodes in the same layer are uncorrelated, and the difference between the errors of two nodes in the same layer may be large, so data cannot be loaded into memory layer by layer. Hence, it is necessary to traverse the whole hierarchical structure in external storage to load data. As a consequence, selecting multiresolution data blocks from different levels increases the burden of rendering the scene in real time.

A cluster algorithm that satisfies the spatial constraint and the maximum static error of closure based on FMRH is presented in this section. The main idea of this algorithm is to partition the nodes in external storage into clusters of similar error and to propose a spatial index method for each cluster. With this strategy, the multiresolution terrain data of the current scene is obtained rapidly for each frame, error threshold, viewpoint position, and visible area, without traversing the entire FMRH structure of the terrain.

According to the literature [24] and formula (1) in this paper, the static spatial error threshold can be computed, and a series of error thresholds is gained by adjusting a control parameter. The bigger the threshold, the smaller the amount of extracted data and the lower the resolution of the model; conversely, the smaller the threshold, the larger the amount of data and the higher the data resolution in memory. Let δ_min and δ_max be the minimum and maximum static spatial error thresholds. δ_min equals the threshold used when establishing the FMRH. δ_max is the basic condition that preserves the profile features of the terrain, so a fixed δ_max is adopted in the experiments. The threshold δ_min aims at guaranteeing the required higher-resolution data in memory through FMRH, while the goal of δ_max is to provide the lower-resolution data that maintains the profile features of the terrain with the lowest requirement on the simplified model. Thus, the static errors of all data blocks in FMRH range from δ_min to δ_max.
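
As an illustration, the threshold series can be generated, for example, geometrically between δ_min and δ_max. The spacing rule and the parameter names in the following Python sketch are assumptions, since the text only states that the series is obtained by adjusting a parameter.

def make_thresholds(delta_min, delta_max, m):
    """Generate m + 1 static error thresholds delta_min = d_0 < ... < d_m = delta_max.

    Geometric spacing is assumed here for illustration only; the text
    requires just a monotone series between the two extremes.
    """
    ratio = (delta_max / delta_min) ** (1.0 / m)
    return [delta_min * ratio ** k for k in range(m + 1)]

# Example: 8 threshold intervals between 0.5 and 2.5.
thresholds = make_thresholds(0.5, 2.5, 8)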

Definition 1. Let δ_{i-1} and δ_i be static error thresholds that satisfy the constraint δ_min ≤ δ_{i-1} < δ_i ≤ δ_max. Assume the set C = {B_1, B_2, …, B_n}, where each B_j is a data block possibly taken from a different layer; then C is defined as a cluster based on δ_{i-1} and δ_i (similar error cluster, SEC) if it satisfies the following constraints: (1) δ_{i-1} ≤ δ(B_j) < δ_i for every B_j ∈ C, where B_j may come from any spatial layer of the FMRH structure; (2) ∪_j B_j = Ω, where Ω denotes the entire closed terrain space.

3.2. Cluster Algorithm Based on FMRH

A cluster instance is illustrated in Figure 2, where Figure 2(a) is a cluster that satisfies the constraints of Definition 1. The data blocks marked within a transparent rectangle build the terrain area in Figure 2(b). These data blocks are distributed over four levels of FMRH and satisfy a uniform error threshold. These data blocks obviously construct a closed spatial area, which is proved in Section 3.3.

According to the construction algorithm of FMRH [24], the static error of closure of each data block is recorded when constructing the FMRH model. Hence, the work is to traverse the FMRH structure and put data blocks with similar static errors into the proper clusters. To satisfy the second constraint of Definition 1, traversing the FMRH structure in depth-first order is preferable. More details of the cluster algorithm based on FMRH are shown as follows (Algorithm 1).

Input: The structure in external storage of FMRH.
Output: The set of clusters SC.
Description: This algorithm generates the set of clusters SC
        by traversing the structure of FMRH in depth-first order,
        according to the relationship between static errors and thresholds.
(1) Initialization:
    (1.1) According to Definition 1, set the error thresholds
         δ_min = δ_0 < δ_1 < … < δ_m = δ_max,
         and make
         SC = {C_1, C_2, …, C_m}.
    (1.2) Clear the clusters in the aggregation SC.
(2) Recall the function Traverse_FMRH(FMRH, root_Block, 0).
(3) Generate the index file of clusters, and save the index of data blocks for each cluster.
(4) The end.
Traverse_FMRH (FMRH, current_Block, level_S)
Function Description: Load the static error of data blocks in
FMRH, and decide which cluster each belongs to.
Parameter Description: FMRH is a multi-resolution
        level structure based on features. The parameter
        current_Block is the current data block. The parameter
        level_S is the layer index of the current data block.
(1) Load the static error of current_Block from FMRH as current_Block.CovSE.
(2) Decide which cluster current_Block belongs to:
    (2.1) Construct intervals [δ_{i-1}, δ_i) according to the error
       thresholds δ_0, δ_1, …, δ_m.
    (2.2) Each threshold interval corresponds to a cluster, from C_1 to C_m:
     (2.2.1) Check the size of current_Block.CovSE; if it meets the two constraints
      δ_{i-1} ≤ current_Block.CovSE < δ_i,
       add the data block current_Block into the cluster C_i,
      if no ancestor data block of current_Block
      is included in the cluster C_i.
     (2.2.2) When current_Block.leaf is false, get the four data
       blocks of the next level
       from current_Block in FMRH, and recall the function
       Traverse_FMRH(FMRH, sub_Block, level_S + 1) for each of them.
(3) The end.
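
The following Python sketch restates Algorithm 1; the block attributes (cov_se, leaf, children, parent) are assumed names rather than the paper's storage layout.

import bisect

def cluster_fmrh(root, thresholds):
    # thresholds: sorted list d_0 < d_1 < ... < d_m; the interval
    # [d_i, d_{i+1}) maps to clusters[i].
    clusters = [[] for _ in range(len(thresholds) - 1)]
    membership = {}  # block -> cluster index, used for the ancestor test

    def has_ancestor_in(block, i):
        # Walk up the branch; reject if any ancestor already sits in cluster i.
        p = block.parent
        while p is not None:
            if membership.get(p) == i:
                return True
            p = p.parent
        return False

    def traverse(block):
        # Locate the threshold interval containing the block's static error.
        i = bisect.bisect_right(thresholds, block.cov_se) - 1
        i = min(max(i, 0), len(clusters) - 1)  # clamp boundary values
        if not has_ancestor_in(block, i):
            clusters[i].append(block)
            membership[block] = i
        if not block.leaf:
            for child in block.children:  # depth-first recursion, as in step (2.2.2)
                traverse(child)

    traverse(root)
    return clusters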

According to Algorithm 1, the data blocks whose errors fall in the same interval are aggregated into one cluster; these data blocks of different resolutions are distributed over different layers, and their union covers the terrain space of FMRH.

3.3. Proof of Cluster Space Closeness Proposition

By depth-first traversal, the data blocks of different resolutions on the same branch are mostly distributed into different clusters, and the static errors along a branch satisfy formula (1) from the top to the bottom of FMRH. The special case is that the errors of data blocks that are adjacent or distributed in close layers on the same branch are similar, so that they fall into the same error threshold interval. As shown in Figure 3, three branches are marked by black dotted lines. Assume that the difference between the errors of a parent block and one of its child blocks is small; the two blocks are probably classified into one cluster if only the error threshold interval of each data block is considered. But this goes against the second constraint of Definition 1; in fact, the regions covered by the two blocks overlap in space.

To solve this problem, in Algorithm 1 current_Block is not classified into cluster C_i if an ancestor data block of current_Block is already in C_i. Thus, there is no overlapped area among the regions covered by the different data blocks of each cluster, and the union of all data blocks constructs the complete closed terrain. The following proof demonstrates this property in theory.

Theorem 2. Let SC be the set of clusters generated by Algorithm 1; then for any C_i ∈ SC, ∪_{B∈C_i} B = Ω holds, where Ω denotes the complete terrain space, closed in space.

Proof. We prove the theorem by induction. Let L be the maximum spatial layer in FMRH.
When L = 1, there is only one spatial layer in FMRH.
According to Definition 1, each cluster then consists of a single data block, so the space is obviously closed.
Assume the theorem holds for every cluster C_i when the number of layers is k (1 ≤ k < L). When the number of layers is k + 1, each branch either gains one more layer of data blocks (denoted as Path A) or remains unchanged (denoted as Path B). In the case of Path B, since there is no change on the branch, all clusters generated on the basis of the k layers are already closed in space, so the closeness proposition holds. The remaining work is to prove the spatial closeness proposition for the branches of Path A in each cluster.
Without loss of generality, choose any branch on Path A (shown in Figure 3) for any cluster C_i. According to the depth-first traversal in Algorithm 1, for the branch on Path A from the top to the bottom, the current data block B at layer k + 1, shown in Figure 3, falls into one of two categories. If an ancestor data block of B is already included in C_i, then B is not added to C_i when its layer is traversed. Otherwise, B and its three brother data blocks are listed into C_i by the recursive traversal of the four sub-data blocks in step (2.2.2). In the former situation, C_i stays unchanged; thus the closeness proposition holds. In the latter situation, B and its three brother data blocks exactly cover the region of their father data block, so the closeness proposition still holds. This ends the proof of Theorem 2.

4. Index of Levels of Data Blocks in Cluster

The union of all data blocks in each cluster composes a closed terrain space through our algorithm. When loading data, the proper cluster is obtained accurately as long as the range of the error threshold is ascertained. Therefore, multiresolution data loading is realized with no redundancy and without further searching and judging. However, one frame of a roaming scene requires only a small part of the data to be loaded into memory. Thus, encoding and sorting the data blocks in each cluster by their spatial characteristics is necessary in order to establish the relationship between the levels of the multiresolution model and the data blocks in a cluster. It also realizes rapid indexing of local data blocks.

As a matter of fact, the reconstruction of large-scale grid data is applied widely in terrain visualization, volume rendering, and matrix operations, and its advantage is that it preserves the locality of data access. The best strategy for linearizing multidimensional data is the technique of space-filling curves. The reconstructed data can be intercepted according to the filling rule as required, while the closeness proposition in space still holds. This characteristic corresponds to the goal of indexing the feature clusters and takes advantage of the locality of space-filling curves. It can also realize the interception of data in any scope.

4.1. Strategy of Hilbert Space-Filling Curves

This paper focuses on two-dimensional space-filling curves, because the elevation data can be mapped into a two-dimensional space. The characteristics of these curves are as follows. They not only guarantee that every data block in the space is traversed, but also keep data blocks adjacent along local segments of the filling curve. This makes it easy to form triangle strips for real-time rendering, where shared vertices can be used more than once, which takes advantage of modern graphics hardware. This paper adopts Hilbert curves to guide the index coding of data blocks in clusters, which improves the efficiency of loading terrain data.

The basic approach to generating Hilbert curves is recursion, formed by subdividing the current square into four smaller ones and connecting their centers. The literature [5] proposes a series of indexing schemes for the Hilbert curve along different coordinate axes. However, the Hilbert curve cannot be applied directly, because the data blocks in each cluster have multiple resolutions.
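
For reference, the classical single-resolution conversion from a Hilbert index to grid coordinates can be sketched as follows; this is the standard iterative bit-manipulation formulation for an n × n grid with n a power of two, not the multiresolution variant developed below.

def hilbert_d2xy(n, d):
    """Map Hilbert index d to (x, y) on an n x n grid (n a power of two).

    Each pass fixes one level of the recursive subdivision and applies
    the corresponding rotation of the quadrant.
    """
    x = y = 0
    s = 1
    t = d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:          # rotate the quadrant when needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# The first four indices trace the four quadrants of a 2 x 2 grid in curve order.
assert [hilbert_d2xy(2, d) for d in range(4)] == [(0, 0), (0, 1), (1, 1), (1, 0)]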

It is necessary to consider the level information of data blocks in order to apply the Hilbert curve to the multiresolution model. The train of thought is as follows. First of all, ascertain the filling order of the child data blocks, whose number is at most four, which is determined by their father data block, by using the level information of the data blocks in the cluster. The traversed data blocks form a multiresolution filling curve in a certain arrangement from the top to the bottom, by repeating the above judgment. The details are stated as follows. (1) Divide the data blocks in a cluster into four types: type I, II, III, and IV. This division plays two roles. On one hand, it determines the sequence of the sub-data blocks of the current data block on the space-filling curve. On the other hand, it takes advantage of the self-similarity of the Hilbert curve to summarize production rules. Therefore, the type of a sub-data block can be ascertained and used to recognize the position on the curve of the data blocks of the next layer. (2) Sort the sub-data blocks according to the type of the current data block: for each of the types I, II, III, and IV, the four sub-data blocks are sorted as marks 0 to 3 in the corresponding order. (3) Let the type of the current data block be T; the type of each of its child data blocks is then I, II, III, or IV. To determine the types of the child data blocks, the following production rules are proposed, where T_i denotes the type of the sub-data block with mark i.

Rule 1. If T = I, then T_0 = III, T_1 = I, T_2 = I, and T_3 = IV.

Rule 2. If T = II, then T_0 = IV, T_1 = II, T_2 = II, and T_3 = III.

Rule 3. If T = III, then T_0 = I, T_1 = III, T_2 = III, and T_3 = II.

Rule 4. If T = IV, then T_0 = II, T_1 = IV, T_2 = IV, and T_3 = I.
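
The four production rules can be collected into one lookup table; a minimal sketch, with the child types listed in the sorted order 0 to 3 exactly as in Rules 1-4:

# Child types for each parent type, indexed by the child's mark (0..3)
# after sorting; each row mirrors one of Rules 1-4.
CHILD_TYPE = {
    "I":   ("III", "I",   "I",   "IV"),
    "II":  ("IV",  "II",  "II",  "III"),
    "III": ("I",   "III", "III", "II"),
    "IV":  ("II",  "IV",  "IV",  "I"),
}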

4.2. Algorithm of Generating Multiresolution Model Using Hilbert Space-Filling Curve

The necessary condition under which each cluster can be sorted by a space-filling curve is proved as follows. By Definition 1 and Theorem 2, there is only one data block per branch in the same cluster, and the set of all data blocks in a cluster composes a closed spatial area. Thus, a cluster is a set of non-intersecting, spatially closed data blocks. An instance of a cluster is shown as follows.

By traversing the data blocks in a cluster from the top layer of the cluster to its lowest layer, a space-filling curve is formed. The data blocks in a cluster are already sorted, because each cluster is produced in depth-first recursive order. The principle is that the traversal backtracks to the original data block when a branch has visited all of its data blocks in the current cluster, and then executes the depth-first recursion on its brother data blocks. As a result, adjacent data blocks selected from a cluster are the nearest ones in FMRH space. Taking one cluster as an example, Algorithm 2 describes the generation of the space-filling curve in detail.

Input: Cluster C.
Output: Space-filling curve (code arrangement of data blocks) CodeString.
Description: This algorithm generates the space-filling curve
   which contains all data blocks in cluster C, by judging
    the type and position of the data blocks in cluster C.
(1) Initialization:
      (1.1) Determine the top layer and bottom layer of cluster C.
      (1.2) Extract the first and the last data block from cluster C.
         According to these, decide the type of the data block in the top layer.
      (1.3) Clear CodeString.
(2) Recall the function DeepTraverse_Cluster(top_Block, top_Type, top_Layer).
(3) Modify the index file of the cluster, and save the new index codes
of the data blocks in the cluster.
(4) The end.
DeepTraverse_Cluster (currentBlock, Type, l)
Function description: Generate the codes of the data blocks in the
        cluster along the Hilbert space-filling curve.
Parameter description: currentBlock is the current data block,
        Type is the type of the current data block, and l is
        the layer of the current data block.
(1) Get the four sub-data blocks B_0, B_1, B_2, and B_3 of currentBlock.
(2) Sort B_0, B_1, B_2, and B_3 according to Type
    (shown in Figures 4(a)–4(d)).
(3) Judge whether each data block B_i is in cluster C:
      (3.1) If B_i is a data block in cluster C, then encode B_i
         by the order after sorting, and add it into
         CodeString, where i = 0, 1, 2, and 3.
      (3.2) If B_i is not a data block in cluster C, then ascertain
         the type of B_i by using the rules, and continue the recursion,
         where i = 0, 1, 2, and 3:
    (3.2.1) If Type is I, then the types of B_0, B_1, B_2,
          and B_3 are Type III, Type I, Type I, and Type IV.
    (3.2.2) If Type is II, then the types of B_0, B_1, B_2,
          and B_3 are Type IV, Type II, Type II, and Type III.
    (3.2.3) If Type is III, then the types of B_0, B_1, B_2, and B_3
           are Type I, Type III, Type III, and Type II.
    (3.2.4) If Type is IV, then the types of B_0, B_1, B_2, and B_3
           are Type II, Type IV, Type IV, and Type I.
    (3.2.5) According to the new order, judge the data
           blocks of the next layer.
           For i = 0 to 3:
         If B_i is not in cluster C, then recall
         DeepTraverse_Cluster(B_i, Type_i, l + 1),
         where Type_i denotes the new type of data block B_i.
(4) The end of function.
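
A Python sketch of Algorithm 2 follows, reusing the CHILD_TYPE table above; the sorted_children helper is an assumed stand-in for the per-type orderings of Figures 4(a)-4(d).

def encode_cluster(cluster, top_block, top_type, sorted_children):
    # cluster:          set of data blocks belonging to the cluster
    # top_block/type:   block covering the cluster's region and its curve type
    # sorted_children:  assumed callable (block, type) -> the four sub-blocks
    #                   in the fill order of that type (Figures 4(a)-4(d))
    code_string = []

    def traverse(block, block_type):
        children = sorted_children(block, block_type)
        child_types = CHILD_TYPE[block_type]
        for i, child in enumerate(children):
            if child in cluster:
                code_string.append(child)        # step (3.1): emit in curve order
            elif child is not None:
                traverse(child, child_types[i])  # step (3.2): descend with new type

    traverse(top_block, top_type)
    return code_string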

An instance of coding a cluster is as follows. Taking the area at the bottom-left corner as an example, it contains 11 data blocks. Checking the first and the last data blocks of the original cluster, their types can be deduced; suppose the type of the top layer is type III. Then the type of the data block of the next layer is type I, according to Rule 3. Sorting its sub-data blocks by type I, six of them are coded directly, because they are data blocks in the cluster. For the remaining two, the algorithm examines their sub-data blocks further, because they are not in the cluster; their types are type I and type IV, respectively, because the type of their father is type I.

The sub-data blocks of the type-I block keep their order after sorting, because its type is type I. In the same way, the sub-data blocks of the type-IV block change their order after sorting, and their codes change accordingly. As a result, the filling curve in the area at the bottom-left corner consists of the 11 data blocks in this order.

4.3. Update Model with Large Scale Data Sets

The out-of-core management framework allows updating the original data sets and expanding new data sets. The steps of updating the physical model are as follows (a file-level sketch is given below). (1) Divide the new data source into square physical data blocks of the prescribed size. For an irregularly shaped data source, the boundaries are mended to form the physical data block files. The arrangement of the physical file blocks keeps the original irregular extent of the terrain, and each file satisfies the size condition. (2) Store each physical block as a head file and a data body file. If a block is to be updated, replace the old data file with the new one and then modify the head file; otherwise, add a new head file and body file to the physical model. For each physical block, establish the corresponding logical model and add cluster analysis. The steps of updating the logical model are as follows. (1) Construct the structure of FMRH. (2) Get the set of clusters SC by cluster analysis of the FMRH. (3) Establish the adjacent cluster set for each pair of neighboring physical file blocks.
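
A minimal file-level sketch of the physical-model update, assuming one head file and one body file per block; the naming scheme and block_id are illustrative only.

from pathlib import Path

def update_physical_block(store, block_id, head_bytes, body_bytes):
    # store: directory of the physical model; block_id: hypothetical block name.
    head = Path(store) / (block_id + ".head")
    body = Path(store) / (block_id + ".body")
    if body.exists():
        body.write_bytes(body_bytes)   # update: replace the old data body
        head.write_bytes(head_bytes)   # then modify the head file
    else:
        head.write_bytes(head_bytes)   # expansion: add a new head file
        body.write_bytes(body_bytes)   # and a new body file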

5. Multiresolution Data Schedule

Most data schedule methods are view-dependent; they compute the error of the simplified model at the same time as scheduling data, so as to load the multiresolution model and decrease the amount of data loaded into memory. The efficiency of the I/O operations is low, because the burden on the schedule program is too heavy. In our approach, the multiresolution model is established directly, with little calculation, once the error threshold of the model is reached, because the schedule program is based on the multiresolution logical model and the data block clusters proposed previously. Figure 4 describes the framework of data scheduling based on this idea. The majority of the work has already been completed in Sections 3 and 4. The questions that still need to be solved are as follows. (1) Establish the relationship between the screen error of the view-dependent model and the static error of closure, so that the error threshold of the data to be loaded is ascertained according to the memory requirements, and the cluster to which the data belongs is determined. (2) Locate and obtain the real data blocks in the physical model according to the index of the cluster.

5.1. Searching Strategy of Target Cluster

View-dependent simplification selects the proper multiresolution terrain model according to the position and direction of the viewpoint: finer layers are used for terrain near the viewpoint, and coarser layers for areas distant from the viewpoint. We adopt the static error of closure to realize the multiresolution model in external storage, because there is no viewpoint information when organizing the data in external storage. In the schedule program, the needed physical data blocks are determined by the static error, which is calculated from the screen error in order to select the proper cluster index. The relation between the screen error of the view-dependent model and the static error of closure is set up in this section. Let ρ be the dynamic error threshold of the view-dependent model, and let δ be the static error threshold. The relation is stated below:

ρ = λ δ / d.  (2)

Let λ be the number of screen pixels per unit distance, and let d be the nearest distance from the viewpoint to the screen. Applying formula (2), set the value of ρ, such as 0.5, 1, 1.5, or 2 pixels; the required static error threshold δ of the current multiresolution scene is then obtained. Then the index of the cluster is ascertained from the relation between δ and the static error thresholds of the clusters established in advance. Assume that δ falls in the interval between δ_{i-1} and δ_i in the arrangement of static error thresholds; then the multiresolution data blocks recorded by cluster C_i are loaded into memory.
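
With the relation of formula (2), the cluster lookup reduces to one division and a binary search; a sketch under the assumption that the thresholds are kept in a sorted list:

import bisect

def select_cluster(rho, lam, d, thresholds):
    # rho: screen error threshold in pixels; lam: pixels per unit distance;
    # d: nearest distance from the viewpoint to the screen;
    # thresholds: sorted static error thresholds d_0 < ... < d_m.
    delta = rho * d / lam                       # invert formula (2)
    i = bisect.bisect_right(thresholds, delta) - 1
    return min(max(i, 0), len(thresholds) - 2)  # index of interval [d_i, d_{i+1})

# Example: rho = 1 pixel selects a coarser cluster than rho = 0.5 pixels.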

5.2. Data Prefetching and Strategy of Incremental Data Schedule

For the data prefetching operation, a double-thread pattern is adopted in order to render the terrain system in real time and guarantee coherent roaming of the scene. Incremental data scheduling is used to decrease the amount of data loaded into memory, by taking advantage of the spatial and temporal correlation of successive frames. The main thread renders the foreground scene, and the other thread realizes the prescheduling of data blocks stored in external storage.

For a single file, the cluster index of Section 5.1 is applied to ascertain the selected cluster C_i. The codes of the beginning and end data blocks in cluster C_i are determined according to the required position and size of the data in the current scene. For example, assuming that the codes of the beginning and end blocks are 10 and 24, this segment of multiresolution data is clipped to the required extent and loaded into memory by searching the related physical data block files through the FMRH structure.

For the multifile data schedule operation, the aim is to locate which files contain the required data blocks. The required data blocks are easily determined by traversing the latitude and longitude of the physical data blocks and their position relations with the current scene, according to the head file index of each data source. There are only five kinds of position relations, as shown in Figure 5, because the data blocks are divided by fixed rules. A set of clusters corresponds to each physical data block. The method in Section 5.1 selects the proper cluster for each physical data block from its cluster set and applies to each file the same process as in the single-file case. In addition, it is very important to match the data across the boundary of two files using their adjacent clusters. Finally, the obtained data is prepared in the data buffer through file mapping.

Most data schedule operations involve a single file, and the amount of data for one frame of a roaming scene is not large. The maximum number of files is four when the multifile condition is satisfied (illustrated in Figure 5), so at most four file mappings are created. This can be implemented with the file mapping API supported by VC. Another simple method that improves the efficiency of data scheduling and decreases the amount of loaded data is incremental data scheduling. A method similar to the literature [25] is adopted, which forms supplementary strip-shaped data blocks in the direction in which the viewpoint moves and puts them into a data buffer in order to supply data for the next frame. The difference is that our method is based on multiresolution data strips of the FMRH structure, whereas the method proposed in the literature [25] is based on a nested strip-shaped data structure, in which each strip consists of data blocks of the same resolution with no relation to terrain features, so the amount of loaded data is much larger than in our incremental data schedule. More details of data prefetching and the incremental data schedule strategy are described in Algorithm 3.

Input: The dynamic error of the view-dependent model ρ,
       the data scale of the current scene, and the data
       position: longitude LONG, latitude LAT.
Output: Prefetch data set D, and load the file mapping buffer.
Description of function: Locate the physical data blocks, and
        load the data buffer in memory through the threshold of screen
        error and the position of the scene, as required by
        the amount of data in memory.
(1) Initialization: Clear data set D.
(2) Traverse the head files of the data sources (physical model),
and determine which data blocks belong to the scene
according to (LONG, LAT); meanwhile, record the number fileN of involved files.
(3) For each file from 1 to fileN, do the following:
//If it is a single file, then execute once;
otherwise, execute more than once.
       (3.1) According to ρ and formula (2), compute the static error
         threshold δ needed by the current scene.
       (3.2) Get the cluster set SC, and let the arrangement of static error
           thresholds of SC be: δ_0 < δ_1 < … < δ_m.
            Determine which interval δ belongs to,
           and define the cluster which meets the requirement as C_i.
       (3.3) If the file counter is less than fileN, then go to step (3).
(4) According to the moving direction of the viewpoint, update the
values of LONG and LAT, and establish fileN file mapping buffers.
(5) If fileN is 1, then determine the initial position of C_i
(according to LONG, LAT), and intercept the data blocks into D.
(6) If fileN is greater than 1, then determine the positions
      of the clusters C_i, for i = 1 to fileN:
      (6.1) Judge the position relation of the files of two adjacent data blocks
     (as in Figures 5(b), 5(c), and 5(e));
      (6.2) Distribute the data blocks in the clusters C_i, and form D,
          Z-shape filling the adjacent data block clusters
          according to the position relation after that.
(7) If the intersection of D and the existing data in the data
      buffer is empty, then load D; otherwise load D minus the intersection,
      namely, load data incrementally.
(8) The end.
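
A sketch of the double-thread prefetch with incremental loading; the queue hand-off and the load_block callback are assumptions beyond what Algorithm 3 specifies.

import threading, queue

def prefetch_worker(requests, load_block, buffer):
    """Background thread: load only blocks not already in the buffer."""
    while True:
        block_ids = requests.get()          # codes of blocks needed next frame
        if block_ids is None:               # sentinel: stop the worker
            break
        increment = [b for b in block_ids if b not in buffer]
        for b in increment:                 # incremental schedule: skip hits
            buffer[b] = load_block(b)

# Usage sketch: the render thread posts the block codes predicted from the
# viewpoint's moving direction, then keeps drawing the current frame.
requests = queue.Queue()
buffer = {}
worker = threading.Thread(target=prefetch_worker,
                          args=(requests, lambda b: b, buffer), daemon=True)
worker.start()
requests.put(["C3_block10", "C3_block11"])  # hypothetical block codes
requests.put(None)
worker.join()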

6. Results and Discussion

We experimented on ten data sources and analyzed the results, in order to verify the model for huge data sets in external storage and the validity of the scheduling. These data sources contain the GTOPO30 global elevation data; the elevation data of Jilin province, Zhujiang, and the Tian-chi (crater) lake of the Changbai Mountains in China; and the Colorado Grand Canyon, Mount Rainier, Crater Lake, Puget Sound, Seattle, and Yakima in America. The total amount of elevation data is about 8,111,735,217, and the texture data amounts to 21.37 GB. We used a PC with a 3.0 GHz Pentium 4 CPU, 1 GB system memory, and a Geforce 5900fx graphics card, with VC++ as the IDE for software development.

We experimented with the amount of data loaded in memory and the schedule time for the 10 datasets, setting the error threshold to 1 and 2.5, respectively. The screen resolution and the clipped window sizes are fixed per data source: GTOPO30 and Zhujiang share one window size; Jilin province, Tian-chi crater, Colorado Grand Canyon, Puget Sound, and Mount Rainier share a second; and Seattle, Yakima, and Crater Lake share a third.

Table 1 describes the time and the amount of loaded data when the screen error is 0.5 pixels. There are four situations. The first is loading data from the physical block model directly; without loss of generality, levels of detail are adopted as well, similar to the pyramid model, but the data loaded at any one time has uniform resolution. The second is loading multiresolution data blocks through the FMRH structure of our logical model. The third is loading data blocks by using hierarchical clusters to index the FMRH structure. The fourth is loading data blocks by using clusters to index the FMRH structure, with the nodes in each cluster encoded and arranged along the space-filling curve.

Three results can be obtained from Table 1. First, compared with using physical files, the data quantity in memory and the data schedule time are clearly reduced after using the logical structure of FMRH. Second, the data quantity loaded in memory is invariant, but the data schedule time is obviously shortened after using the cluster-analysis-based data schedule. In addition, using the space-filling curve on the basis of cluster analysis is significant, shortening the data schedule time further. The main reason for these results is that the physical model has been simplified according to the ups and downs of the terrain by the FMRH structure, so a mass of redundant data is eliminated before loading into memory. This ensures that only a small amount of multiresolution data is loaded into memory each time, so the data schedule time is decreased. Another reason is that cluster analysis groups data blocks of similar errors in the FMRH structure when establishing the logical model; locating the suitable cluster when a data schedule proceeds shortens the time of searching for data blocks and realizes highly efficient data scheduling. Finally, coding and arranging the data blocks in each cluster along the space-filling curve is an efficient approach well suited to file mapping; meanwhile, owing to the locality of the space-filling curve, the method can retrieve local data blocks rapidly.

As the screen error threshold increases, the quantity of data loaded in memory and the schedule time decrease obviously. The data in Table 1 are the results of the first load into memory, but in fact the incremental data schedule technology is adopted when roaming the scene. As a result, as the frame count increases, the data schedule time and the I/O response time decrease further. With a screen error threshold of 2.5, the data quantity loaded in different frames is shown in Table 2. The data quantity and loading time of the first frame are large, while those of the 10th, 100th, and 1034th frames tend to be stable. Compared with the first frame, the data quantity in memory for the other frames decreases obviously, and the efficiency of data scheduling increases significantly, because the incremental data schedule is adopted.

7. Conclusions

This paper proposed a strategy of rapid data scheduling based on cluster analysis, which improves the efficiency of I/O operations. It is significant for extending terrain modeling methods to large-scale data in real engineering applications. More details follow.

Firstly, the cluster analysis method based on FMRH is proposed to solve the linearization problem of the multiresolution terrain model. In detail, each cluster satisfies two constraints. One is that data blocks with similar errors are classified into one cluster; the other is that the multiresolution data blocks in a cluster cover the whole closed terrain spatial zone. Because data must satisfy some screen error threshold when it is loaded into memory, the first constraint conforms to the data requirement of the simplified model in memory. The second constraint guarantees that the data of each cluster can be loaded as a unit, so that no data blocks are lost.

Moreover, the proposed strategy sorts each cluster along a space-filling curve and encodes it. Because the nodes within a cluster are closed in space and obtained through depth-first traversal, small adjustments and local updates of the clustering can be mapped onto a one-dimensional sequence of contiguous spatial data blocks. The advantage of these strategies is that there is no need to traverse the whole data set.

Finally, our strategy solves the problem of file-block splicing and adopts the incremental data scheduling strategy to further reduce the amount of data loaded into memory when roaming the scene. As a result, our method improves the scheduling efficiency, which is important in real engineering. In future work, the theory and ideas of scheduling [26], planning [27], and phase transitions [28] will be introduced to further improve the efficiency of terrain modeling.

Acknowledgments

The authors are grateful for the helpful comments and suggestions of the reviewers, which have improved the presentation. This work is supported by the Research Fund for the Doctoral Program of Higher Education of China (no. 20100043120012) and by the National Natural Science Foundation of China for Young Scholars (no. 41101434).