Hardware Support for Adaptive Subdivision Surface Rendering M. B´ooy M. Amoryy M. Doggett yyy J. Hirche yyy W. Strasseryyy y Dept. of Electronic and Computer Engineering. University of Santiago de Compostela. Spain yy Dept. of Electronic and Systems. University of La Coru~na. Spain yyy WSI/GRIS, University of T¨ubingen. Germany Abstract Adaptive subdivision of triangular meshes is highly desirable for surface generation algorithms including adaptive displacement mapping in which a highly detailed model can be constructed from a coarse triangle mesh and a displacement map. The communication requirements between the CPU and the graphics pipeline can be reduced if more detailed and complex surfaces are generated, as in displacement mapping, by an adaptive tessellation unit which is part of the graphics pipeline. Generating subdivision surfaces requires a large amount of memory in which multiple arbitrary accesses are required to neighbouring vertices to calculate the new vertices. In this paper we present a meshing scheme and new architecture for the implementation of adaptive subdivision of triangular meshes that allows for quick access using a small memory making it feasible in hardware, while at the same time allowing for new vertices to be adaptively inserted. The architecutre is regular and characterized by an efficient data management that minimizes the data storage and avoids the wait cycles that would be associated with the multiple data accesses required for traditional subdivision. This architecture is presented as an improvement for adaptive displacement mapping algorithms, but could also be used for adaptive subdivision surface generation in hardware. 1 Introduction A significant rendering bottleneck for high performance graphics systems is the CPU to graphics processor bus. High quality surfaces require the transmission of millions of triangles over this bus reducing the performamce of the system. Reducing this communication requirement has recieved significant attention in recent years. Different attempts to improve the communication problem have been developed including: compression [5], simplification of the images complexity [3], interactive multiresolution meshes [6], and displacement mapping [7]. Displacement Mapping is an effective technique for encoding the high levels of detail of surface models through the utilization of coarse triangle meshes together with displacement maps. Extending the hardware rendering pipeline to be capable of handing displacement maps as geometric primitives will allow highly detailed models to be constructed without requiring a large amount of triangles to be passed from the CPU to the graphics pipeline. The displaced subdivision surface strategy proposed in [8] combines the non-adaptive Loop subdivision scheme [9] with the displacement map representation. The original mesh is simplified and the displacement map accordingly sampled in order to be sent to the rendering pipeline. Although in [8] the details for rendering the surface using hardware is not presented, extensions have to be added to the pipeline to subdivide the coarse mesh according to the Loop algorithm and afterwards, to apply the displacements to the resulting coordinates. A simple and regular hardware implementation for the Loop subdivision algorithm has been proposed in [2]. This solution exploits the piecewise polynomial structure of the Loop algorithm with the application of forward difference techniques. This hardware can be easily extended for other non adaptive subdivision algorithms which also generate piecewise polynomial surfaces but it can not be generalized to any adaptive subdivision scheme. Using Displacement Mapping to generate detailed triangle meshes can generate a large number of triangles for rendering. To reduce this number of triangles, an adaptive strategy was presented in [4]. Each triangle is conditionally subdivided as a function of the displacement map information. The hardware proposal is based on the tessellation of individual edges of each triangle by means of the local information of each edge. Using an adaptive subdivision the number of triangles to be rendered are reduced, but neighbouring triangle vertex information is not used in each subdivision step. Smoother surfaces can be obtained if more than just the local information is employed for the adaptive subdivision procedure. This results in the final detailed mesh being obtained in a more coherent manner in accordance with the displacement map surface. This requires that each triangle to be tessellated uses the information of the adjoining triangles. In order not to send each triangle more than once from the CPU to the graphics pipeline, its information should be maintained in the hardware unit. On the other hand, this information has to be stored taking into account that multiple data has to be accessed simultaneously per subdivision step. In this paper we present a methodology and an architecture for the implementation of the adaptive subdivision of surfaces in which neighbour information is employed. The solution we propose starts by grouping the neighbouring triangles in the CPU before sending them only once to the graphics pipeline. Each group is fully processed in the new hardware unit before sending the following group. The efficient management of the data we propose permits the simultaneous access to all the neighbour information required for each partitioning step, avoiding wait cycles. This is due to the data distribution in the memories and to the simple and compact mesh connectivity specification proposed. The mesh representation strategy, together with the regular hardware structure developed, makes this solution interesting not only in the displacement map environment, but also in any algorithm where an adaptive tessellation using neighbouring triangle vertices is required. In the next section, we present the adaptive tessellation algorithm based on neighbour information and the strategy employed to minimize the storage requirements. In Section 3 we introduce the methodology we propose for the efficient data management and, in Section 4, the architectural model we have employed for its implementation. In Section 5 we evaluate the algorithm. Finally, in Section 6 we present the main conclusions of the work. 2 Adaptive Tessellation based on Neighbour Information Displacement mapping requires a coarse triangle mesh that approximates the surface to be modeled with a displacement map containing the finer geometric detail. If the base triangle mesh has a coarser resolution than the displacement map, the mesh should be retessellated according to the displacement map. To improve the previous Permission to make digital or hard copies of part or all of this work or personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. HWWS '01 Los Angeles, CA USA ACM 2001 1-58113-407-X...$5.00 33 Vb (a) l Va b b k j i h g f e d c a k Interior Edges d1 = number neighb(a)P i=1 d(a neighb(a i))=number neighb(a) d2 = number neighb(b)P i=1 d(b neighb(b i))=number neighb(b) New point: Vab = d1 d1+d2 coord(a) + d2 d1+d2 coord(b) Border Edges New point: Vbk = 1 2 coord(b) + 1 2 coord(k) (b) Figure 1: (a) Neighbour information required for vertex insertion. (b) Coordinates computation Standard Graphics Pipeline Graphics Pipeline Unit Adaptive Tessellation CPU Grouping (a) 1 L start=NULL; 2 B = NULL; 3 v0= select a vertex(mesh); 4 while (vertices)f 5 build group(v0); 6 list updating(L start B); 7 if(L start==NULL) v=select a vertex(B); 8 else v= select last vertex(L start); 9 v0 = find group center(v); 10 g (b) Figure 2: (a) Generic Structure of the System (b) Grouping algorithm adaptive tessellation algorithm implemented in hardware [4], not only local information of each edge to be split should be considered, but the neighbour information as well. Smoother surfaces can be obtained using masks such as those employed in subdivision surfaces algorithms [10], Loop [9] and modified Butterfly [11]. For the example in Figure 1(a) these algorithms split the edge between vertices a and b (vertex Vab) taking into account not only the information of the extreme vertices a and b but also the information of vertices c d l. The utilization of these subdivision algorithms for the adaptive scheme implies the recomputation of the original displacement map values. This is due to the fact that the computed coordinates do not lay on the original triangles so the initial displacement map would not be appropriate for the processed mesh. If the displacement map is not recomputed, other division masks could be proposed. Figure 1(b) shows our initial proposal for the coordinates computation. For the edge between two interior vertices aand bthe coordinate of the new point Vab inserted in the edge is computed through the weighted sum of the a and b coordinates. The weights associated with a (d1) and b (d2) are obtained as a function of the average distance (d) of the neighbour vertices to the vertices aand b respectively. In case at least one vertex of the edge is on the border of the mesh we propose the utilization of the current working edge information, that is, d1 = d2 = 1 2 . For example, in the figure the vertex k is on the border, so the new vertex Vbk implies computations based on local information. In this paper the non recomputation of the displacement map was considered so the new mask was used. In any case, it is important to note that the methodology and architecture we proposed are generic and valid for all masks employing the first order neighbour infor- mation. Each triangle to be tessellated requires the information of the adjoining triangles to be processed. In order not to increase the communication between CPU and graphics pipeline the computation of the full fine mesh is computed using the new hardware added to the graphics pipeline. The solution we propose is schematically presented in Figure 2(a). The neighbour triangles have to be grouped in the CPU in order to be sent to the graphics pipeline. These groups of triangles are processed in an adaptive tessellation unit and then final rendering is performed by a standard graphics pipeline. The number of triangles in each group has to be determined as a function of the storage requirements during the tessellation operation. The utilization of only local information to the edge on the borders (Figure 1(b)) avoids possible cracking effects in the joining among groups but, obviously, the number of border edges have to be minimized in order to reduce the subdivisions in which only local information is employed. In the following subsections the proposed grouping algorithm and its connectivity representation are presented. A detailed analysis can be found in [1]. 2.1 Grouping Algorithm In this section the grouping algorithm to subdivide the full mesh into small groups is presented. These groups are sent from the CPU to the graphics pipeline where the adaptive tessellation is processed (Figure 2(a)). The grouping algorithm we propose reduces the number of bounding triangles, covers the full mesh in an efficient way and minimizes the amount of information to be sent. On the other hand, the mesh representation employed simplifies the storage and data management in the adaptive tessellation unit. The group construction algorithm is based on the selection of a central vertex v0 and the contiguous concentric triangle strips around this central vertex. The central vertices are selected in such a way that successive processing groups are contiguous and without holes (non-sent tri- 34 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 2 1 2 1 2 3 4 5 6 7 8 9 1011 12 13 14 15 16 17 18 19 20 21 22 2324 2526 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 434445 46 (a) 444546 Level_Counter -> Child -> (b) 1 9 1521 9 5 3 1 3 3 2 2 3 1 0 3 3 2 3 1 3 3 2 3 3 2 2 3 2 1 2 3 4 5 6 7 8 9 10 111213141516171819202122232425 262728293031323334353637383940414243 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Sibling -> 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Vertex -> Figure 3: (a) Group example (b) Group representation 3 15 16 17 III I II IV V V1 2V 3V 3 15 16 New_vertex1 New_vertex2 i i i 1 2 3 New_vertex3 (b) (c) 1 2 3 4 5 6 7 8 9 1011 12 13 14 15 16 17 19 20 21 22 2324 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 434445 46 47 55 57 59 60 61 63 64 65 67 68 81 82 83 84 85 86 87 88 58 53 72 73 74 75 76 7778 80 79 56 54 48 49 50 51 89 90 18 251 2 3 4 5 11 12 13 14 15 16 17 18 19 26 2827 29 30 31 32 33 34 35 36 (a) Figure 4: (a) Scheduling example (b) Notation (c) Example of tessellated group angles) between them. The collection of all identified groups constitutes a global group, already sent to the graphics pipeline, characterized by a list B of vertices in the boundary. The search for each central point v0 is based on the utilization of a set of starting points from the list B of vertices in the boundary. Specifically, each time a new group is computed, two new starting points are identified: the non-interior common vertices between the previous global group and the new group. Each starting point is employed as a seed for the identification of the central point of a new group. Taking as a reference this starting point each central point candidate v0 is analyzed in order to build up an adequately sized group contiguous to the global group. The basic algorithm [1] is outlined in Figure 2(b). For the first processing group a random central vertex v0 is selected (line 3). Then, the group of triangles around this vertex is selected (line 5) and the list of working starting points Lstart and the list B of vertices in the boundary of the global group are updated (line 6). To construct another group a starting point has to be selected (lines 7 and 8) and the next central vertex v0 identified (line 9). This grouping process (from lines 4 to 10) is repeated whileever there are still non processed triangles remaining in the mesh. 2.2 Group Representation In this subsection the group representation we propose is analyzed. It is based on the identification of strips of triangles around the central vertex v0 and the identification of their connectivity. For example we will look at the group of triangles of Figure 3(a). The central vertex v0 = 1 is surrounded by a first strip of triangles defined by the lists f1g and f2 3 9 10g meanwhile the second strip is defined by the interior list f2 3 9 10g and the exterior f11 12 24 25g and so on. These lists are indicated in consecutive rows of the Vertex list of Figure 3(b) where the number of elements in each row is stored in the Level Counter list. In order to specify the vertex connectivity two additional lists are included: Child, that specifies the connectivity among rows of vertices, and Sibling, that specifies the connectivity inside a row. In the i-th row of the Child list the connectivity between the ith and i + 1-th rows of the Vertex list is specified. Specifically, it stores the number of vertices of the i+ 1-th row that are connected to each vertex of the i-th row. Note that two contiguous vertices of the i-th row are connected to two contiguous groups of children of i + 1-th row with a common child between them. To clarify let us considerer the example of Figure 3. The connectivity among f2 3 9 10g and f11 12 25g (second and third rows in Vertexlist) is given by children = f5 3 1 3 3 2 2 3 1g (second line in Child list). This means that the vertex 2 (second row of Vertex list) is connected with the first five points in the third row (11 12 13 14 15), the following vertex 3, with three children, is connected to the last child of vertex 2 (vertex 15) and another two children (16 and 17). The following vertex, 4, with one child, is connected to vertex 17, and so on. To complete the description, the connectivity among vertices in the same row of the Vertex list has to be specified. It is enough to indicate the connectivity of each vertex with the following vertices on the row (connectivity with the previous points can be easily deduced). This information is stored in the Sibling list. If the i-th vertex of a row has connectivity 1 this indicates that it is connected to the i+1-th vertex of the same row. In the same way, a connectivity 0 indicates that the vertex is non-connected to the following vertices. On the other hand, a connectivity value j > 1 indicates that the i-th vertex is connected to the non contiguous i + j-th vertex. If the i-th vertex has connectivities 1 j1 j2 jn with jr 6= 0 1 (r = 1 n) then it is connected to the fi+1 i+j1 i+jng-th 35 0 1 1 1 1 1 1 1 2 1 2 2 2 1 2 2 2 2 2 2 2 2 48 1 47 49 2 3 4 5 6 7 8 9 10 5950 53 55 58 57565451 Figure 5: Detailed example of vertex classification procedure Test Ordering Strip Memory From CPU Processor Vertex Tesellated Mesh Figure 6: Block diagram of the architecture vertices of the row. As an example let us consider the connectivity of the fourth row f26 27 45 46g of the Vertex list. The sibling connectivities of 26 31 is 1, that is, each vertex is connected with the following one. But the connectivity of vertex 32 is 1 2 because it is connected to vertices 33 and 34. 3 Towards Hardware Implementation of Adaptive Tessellation The adaptive tessellation implies conditional subdivision of each edge [4]. In our new proposal, each new vertex computation implies the utilization of not only the coordinates of the extreme points of the edge, but also the corresponding coordinates of the neighbours of these vertices. So the neighbours information has to be stored in such a way that it can be efficiently accessed. Moreover and taking into account that the adaptive tessellation has to be performed repeatedly over the full mesh, the resulting mesh obtained in each iteration has to be efficiently stored in order to be reused in the following iteration. In this section the data storage and management we propose to solve this problem is presented. The following presentation focuses on the way of covering the mesh first and the storage method employed in order to recompute it follows. It is important to note that the algorithm we propose is based on the efficient mesh representation previously presented. According to this representation, the mesh is traversed through the computation of the edges of the triangles along each strip before starting with the following strip [1]. The strings of vertices are computed sequentially through the analysis of all vertices in each row. For each vertex v of each string all the children c0 cchildren;1, that is, all the connected vertices of the following string of vertices are analyzed. Firstly, the computations related with the edge connecting vertex v and the first child c0 are performed. After this, two edges have to be analyzed, the edge connecting the root vertex v with the new child c1 and the edge connecting the children c0 and c1. To clarify let us consider the example of Figure 4(a) where the adaptive tessellation of the group of Figure 3 is considered. The vertex 1 (first list) and its children (2 3 4 5 ) are analyzed first. After this the second list f2 3 4 5 g is processed starting from the first vertex 2 and its corresponding children (11 12 13 14 15). Let us suppose that these edges have been already processed (indicated in the figure with dark lines) and let us consider the computations related with the vertex 3 and its corresponding children (15 16 17). The edges to be computed and the order of computation are detailed in the Figure 4(a). The first edge to be computed (label I) involves the edges 3 ; 15. The following pair of edges to be computed are edge 3 ; 16 (label II) and edge 15 ;16 (label String Edge (3-15) Edge (3-16) 1 1 1 2 2 3 4 2 3 4 3 14 15 16 17 15 16 17 4 31 32 33 33 Table 1: Example of required neighbour information III). Then, the last pair of edges related with vertex 3 are computed: edge 3 ;17 (label IV) and 16 ;17 (label V). An important characteristic of this way of covering the mesh can be outlined from this example: the neighbours required to compute one edge are mostly reused to compute the following edge. For example, the necessary neighbours to compute the edges 3;15 and 3 ;16 are indicated in Table 1. Note that four groups of data, from four consecutive strings of vertices, are required for each edge. For this specific example new data is not required for the computation of the second edge. This method of reemploying the data permits, as shown later, a reduction in the number of accesses to the memory where the data is stored. To complete the presentation the tessellated mesh storage method is analyzed [1]. The new tessellated mesh has to be stored according to the concentric triangles strips notation previously presented. The algorithm we propose is basically a vertex classification in which the vertices are classified in strings. Each time an edge is processed, the new vertex candidate coordinate is determined and conditionally inserted. After this, the surrounding vertices are assigned to a string. In Figure 4(b) the notation and algorithm we propose is briefly introduced. The first child of vertex 3, that is, vertex 15, is being processed. Once the midpoint, New vertex1, is computed a test [4] has to be made in order to determine if the splitting is performed. The decision is labeled as i1, where i1 = 0 if there is no splitting and i1 = 1 if the edge is split. When the second child of vertex 3 (vertex 16) is considered, two new vertices have to be computed, New vertex2 with decision i2 and New vertex3 with decision i3. Once the decisions have been performed a classification procedure has to be realized, that is, the vertices that delimit the concentric triangle strips have to be identified. This requires the application of simple string classification rules (tabulated in [1]) where each vertex is classified on a string as a function of the previously computed vertices and their classification. Similar rules are also applied to compute the sibling and child connectivity. To clarify the algorithm let us considerer the mesh of Figure 3 and the tessellation indicated in Figure 4(c). The new vertices originated during the tessellation operation are indicated in small labels. 36 0 0 0 0 00 0 0 0 0 00 9 3 3 3 2 2 3 15 0 3 2 1 3 3 1 3 3 2 3 3 2 2 3 2 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 0 1 11 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1/2 1/2 . . . . . . . . . 000 001 010 011 100 000 001 010 011 100 000 001 010 011 100 0000 0000 0001 0010 0010 0011 0100 MASK AND BARREL SHIFTER MASK AND BARREL SHIFTER MASK AND BARREL SHIFTER Fc0 Fc1 Fc2 Fc3 Fs0 Fs1 Fs2 Fs3 Fv1 Fv2 Fv3 Fv0 1 11 2 12 13 14 15 16 17 18 19 20 3 4 5 6 7 8 9 10 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 1000000000 0111111111 1111111111 1111100000 0000011111 1111111111 1111110000 Mask Address String 0 String 1 String 2 String 3 Address 21 Mask ADDRESS MAP VERTEX MEMORY CHILD MEMORY SIBLING MEMORY Figure 7: Memory unit architecture Six strips of triangles were identified and marked in different grey levels. As an example of string classification let us analyze the processing of the original first string of triangles delimited by lists f1g and f2 10g of Figure 4(c) detailed in Figure 5. In this figure the vertices related with the original mesh are indicated in grey boxes, the new vertices in thinner boxes and the non inserted vertices with empty boxes. The string identifier1 of each vertex, obtained with the classification rules in [1] is indicated as a number over the corresponding box. The classification method has an easy intuitive interpretation: the string identifier of all roots of a vertex are analyzed so that the minimum value plus one is assigned as its string identifier. As an example, all the children of the vertex 1, of the string 0, are assigned to the string 1. 4 Adaptive Tessellation Architecture Figure 6 shows the block diagram of the architecture for adaptive tessellation. The architecture has four modules: 1) Memory unit, 2) Vertex processor, 3) Test unit and 4) Ordering strip unit. The incoming mesh is stored in a Memory unit in order to be processed. This Memory unit is also employed for the storage of the intermediate processed meshes and also for the management of the data. It provides four consecutive strings of vertices (see Table 1 example) to the second unit (Vertex processor) in such a way that the wait cycles of this unit are minimized. The strings of vertices are processed by the Vertex processor where the coordinates of the vertex candidates are obtained. These coordinates are delivered to the Test unit [4] to decide if each edge is split. The decisions together with the corresponding coordinates (if the splitting is performed) are introduced in a final unit, the Ordering strip unit, where the vertices are reordered in strings of concentric vertices. The final mesh is stored in the first unit in order to be processed again. The recursive processing of the mesh is repeated a given number of iterations (defined by the user) or until the mesh is fully tessellated accordingly to the displacement map. Due to length constraints in the following subsections only the structure of the Memory and Order- 1The row number of the vertex in the current mesh group data structure. ing strip units are briefly introduced although an extended revision of all modules can be found in [1]. 4.1 Memory Unit To process each edge of the mesh, the Vertex processor requires the information of all the neighbours around the extreme vertices of the edge. All edges are processed covering a strip of triangles before starting with a new strip. This implies that all data of the four contiguous strings of vertices (see Table 1 example) will be required consecutively before starting with a new strip of triangles. Then, the data has to be stored in such a way that an easy access to the strings of vertices could be developed. The architecture we propose is depicted in Figure 7. Three different memories (Vertex, Child and Sibling) are employed to store the vertices, the corresponding children and the sibling information. As an example the data indicated in the figure corresponds to the group example of Figure 3. In order to optimize the storage utilization we propose m+ 1 word memories where m is the maximum number of vertices connected to a given one. The information has to be stored in a coherent way in the three memories. The best storage would imply that the information associated with consecutive vertices of a string would be in consecutive positions of the memory. But this distribution is possible only for the initial mesh (coming from the CPU) as during the subdivision the vertices of each string are not obtained in a consecutive way. A set of memories (indicated as address maps in the figure) to store the addresses of successive sections corresponding to the same string is employed. Specifically, the address map of the data corresponding to each string i of vertices is stored in a String i memory. The reason for this storage scheme is detailed in the next subsection. As usual the number of vertices in each string section does not fit with the word size of the memory, so a mask should be employed to indicate the part of the word that stores the correct information. For example, the string 2 of vertices of Figure 7 is stored in addresses 0001 with mask f1111111111g and 0010 with mask f1111100000g. In the vertex memory the information relative to vertices f11 12 25g is identified. Taking into account that the neighbours information can be 37 . . . 0001110000 0000 1111000000 1111111000 0110 0001 . . . 1011 0110 0000110000 0000011000 . . . String 4 0011 1001 1010 0110 1111000000 1111000000 0000010000 0000100000 01100000000100 1000000000 1011 . . . String 5 10001111111100 . . . String 6 01110000001110 1 59 2 4 50 5 7 53 8 55 9 58 10 47 3 48 49 6 51 54 56 57 11 60 61 63 65 68 67 15 16 90 17 18 20 19 72 21 73 22 24 78 75 74 76 23 77 79 27 82 28 46 80 26 81 83 84 29 30 85 31 88 32 89 86 87 33 34 35 36 37 38 39 40 41 42 43 44 0000 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 25 12 64 13 14 0001 Mask Address String 0 1000000000 0000 String 1 1111111110 0010 String 2 0000 0101 0110000000 1110000000 String 3 0100 1010 0101 1010 0101 1011 0111 1111110000 1111100000 0000010000 0000001100 0000001000 0111000000 1111110000 0000100000 01010001100000 1001 ADDRESS MAP VERTEX MEMORY Figure 8: Example of storage of a tessellated mesh reemployed for successive computations (see Table 1 example) the memory system can be optimized through the utilization of a system of FIFO (first-in-first-out) queues. As four strings of vertices are involved in each edge computation, four FIFO queues have to be employed. Specifically, three different sets of four FIFO queues are required for keeping the vertex (Fv), the children (Fc) and the siblings (Fs) information. To select the adequate part of the m+1word memory (mask information) a set of normalization units (barrel shifters) are employed. The FIFO system, together with the reutilization of the data assures an efficient management avoiding any wait cycles for the Vertex processor [1]. 4.2 Ordering Strip Unit The ordering strip unit performs the vertex strings classification and computes the children and sibling connectivities accordingly to the simple rules tabuled in [1]. The strips of triangles are processed so the new vertices have to be classified and adequately stored in the memory system. Due to length constrains we will focus the presentation on the memory information updating procedure. For the computation of each strip of triangles the strings of vertices of the tessellated system are obtained in sections. This can be observed in Figure 5 were the computation of consecutive triangles of the strip generates a non complete string of vertices with a identifier 2, that is, the string 2 is obtained in sections. Specifically, the computation of each triangle generates vertices of at most two strings (together with the already computed root vertices) and, it can be ensured that the computation of the contiguous triangle of the same strip will maintain the continuity of, at least, one of the vertex strings. Then, two FIFO queues can be employed for storing the one/two strings under construction. Once a string is broken, the content of the FIFO queue is stored in the memory and the corresponding address map updated. A procedure was developed [1] to reserve memory positions for storing the addresses of consecutive strings sections not already computed. As an example the final memory distribution after computing all the group (Figure 4(c)) is indicated in Figure 8. The different sections of strings stored in the vertex memory are indicated with dashed boxes. According to this storage, the address map memories are coherently updated in such a way that each string i of vertices can be reconstructed by reading the consecutive addresses (of the vertex memory) indicated in the String i memory. The same reconstruction operation can be performed over the memories of children and siblings in order to obtain the Child and Sibling lists. As an example let us reconstruct the string 3 corresponding to the tessellated mesh. In order to do this the seven addresses indicated in the corresponding String 3 memory have to be accessed. The stored information corresponding to the consecutive addresses is: f11 60 61 63 65 68g, f86 87 33 34 35g, f18g, f37 38g, f20g, f40 41 42g, f75 74 76 23 77 79g, which corresponds to the fourth string of vertices of Figure 4(c). 5 Results The methodology we propose permits the communication requirements between the CPU and the graphics pipeline to be reduced by performing complex surface generation in hardware. The classification unit is mainly compounded of memories and FIFO queues and simple additional hardware (adders, substractors, minimum and maximum selectors) for the vertex classification into strings. This reduces the communication problem with a reasonable increase in hardware requirements. The surface generation possibilities achieved through the introduction of this unit contrast with its simplificity. Let us suppose a group with ntriangles is to be processed a number i of iterations. The proposed system ensures that each edge is computed only once and that the processing rate is one edge per cycle. If the border of the group is not taken into account, each edge is shared by two triangles which implies, in the first iteration, the computation of n 3 2 edges. This is performed in n 3 2 cycles. If all the triangles are fully subdivided in each iteration, 4i n triangles are finally obtained. This leads to the maximum computational rate of the system given by: Rmax = 4i 3 2 i;1P j=0 4j triangles=cycle On the other hand the minimum computational rate is obtained if no triangles are subdivided in any iteration: Rmin = 1 3i=2 triangles=cycle The computational rate values of any group is in the interval [Rmin Rmax]. As an example, for 3 iterations Rmin = 0:22 38 (a) (b) (c) (d) (e) (f) (g) (h) (i) Figure 9: Donut, Volker's head and Crater images. (a), (d), (g) shows the original coarse meshes, (b) (e), (h) the adaptively subdivided meshes and (c) (f), (g) final meshes with displacements. and Rmax = 2:03 rates are obtained. Note that the Rmin value corresponds to a non practical case in which no subdivision is performed, which means that the original mesh is very detailed. The system also has no recursive structure (only for the recomputation of the groups), so it can be pipelined with a minimum cycle period. We have implemented our adaptive tessellation solution (Figure 6) through a software simulation. We have rendered several models to demonstrate the performance of the solution, the advantages/disadvantages of an adaptive tessellation based on neighbour information and the repercussions of the grouping strategy. In the following results 3 iterations per mesh were employed. Three different models have been rendered using the algorithm and are shown in Figure 9. Specifically, we have worked with a donut displacement over a flat square plane (Figures 9 (a),(b) and (c)), the Volker Blanz's head displacement map over a cylinder mesh (Figures 9 (d),(e) and (f)) and the Crater Lake also over a flat square plane (Figures 9 (g),(h) and (i)). In the first column (Figures 9 (a),(d) and (g)), the original coarse meshes are depicted, in the second column (Figures 9 (b),(e) and (h)) the meshes subdived according to the displacement map are shown and on the third column (Figures 9 (c),(f) and (i)) the final images after applying the displacements are indicated. While the original meshes, to be sent from the CPU to the graphics pipeline, have 200 triangles the final meshes have 1242, 4389 and 6379 triangles respectively. This method allows for high quality images to be obtained with very low transmision requirements. The adaptive structure of the subdivision can be observed in all images. For example the donut is less subdivided in regions close to the top where no detail is required. Note that the transition between high and low detailed regions is soft, resulting in smooth surfaces. The images correspond to the utilization of only 2 rings. This size was selected as a tradeoff between image quality and memory requirements as only slight differences are obtained using a greater number of rings per group. On the other hand, improvements are expected for other coordinate computation schemes alternative to the one proposed in Figure 1(b). Due to the repercussions on the memory requirements of the graphics pipeline the number of triangles generated per group has to be analyzed. In Figure 10 the number of triangles per group when 2 rings are employed is detailed. In Figure 10(a) the number of original triangles (information sent from the CPU to the pipeline) per group for the two original meshes employed are indicated. It can be deduced that the grouping algorithm we propose permits coverage of the original image with a balanced number of triangles per group. Obviously, when the number of non sent triangles is lower (high group number) the possibility of constructing a group with the full rings is lower, so lower numbers of triangles are ob- 39 square cylinder Numberoftriangles Group Group (a) (b) Donut Crater VolkerHead Numberoftriangles 4 6 8 10 12 142 30 25 20 0 200 400 600 800 1000 1200 2 4 6 8 10 12 14 15 10 5 0 Figure 10: Number of triangles per group (of 2 rings) (a) Original mesh (b) Subdivided mesh tained. On the other hand, in Figure 10(b) the number of triangles generated with the adaptive tessellation procedure are indicated. It is important to note that these results permit the estimation of the memory requirements. As we previously mentioned a greater number of rings would increase the memory size, but without important quality results. 6 Conclusions In this paper we have presented the architecture for an additional unit of the standard graphics pipeline to enable the rendering of subdivision surfaces in hardware. Our proposal permits the adaptive subdivision of each triangle based on the neighbour information of each edge. The method is based on the grouping of neighbour triangles in the CPU to be sent once to the graphics pipeline. The architecture we propose also permits a quick access to all the required information so that no wait cycles are required for each subdivision. To reduce the memory requirements a new mesh grouping scheme was presented. The grouping process we propose minimizes the number of exterior triangles and ensures a good covering of the full mesh at the same time. This grouping scheme could also be employed in other applications as, for example, non adaptive subdivision surface rendering. The architecture is simple and regular, and is mainly composed of a memory and a set of FIFO queues. The efficient memory storage method employed together with the new mesh codification proposed, permits the exploitation of the temporal and spatial locality of the data. The storage is performed in such a way that all the neighbours information required for computing each edge subdivision can be accessed simultaneously. As the methodology and architecture proposed permits an efficient management of irregular meshes the scheme can be easily extended to other applications in which meshes have to be recursively computed and stored. The biggest advantage of our proposal is the development of a new memory scheme for the storage of irregular meshes in which the access to multiple neighbour information does not require wait cycles. This permits the hardware implementation of adaptive subdivision surface rendering employing neighbouring information. Acknowledgements This work was partially developed during the stay of Dr. M. Amor and Dr. M. B´oo in the WSI/GRIS Department, University of T¨ubingen, Germany (stay supported by "Secretar´ia Xeral de Investigaci´on e Desenvolvemento", Xunta de Galicia, Spain). The work has been also partially funded by the SFB grant 382 of the German Research Council (DFG). Thanks to Volker Blanz for use of the digitized model of his head. References [1] M. Amor, M. B´oo, M. Doggett, J. Hirche, and W. Strasser. A Meshing Scheme for Memory Efficient Adaptive Rendering of Subdivision Surfaces. Technical Report WSI-2000- 21, Universit¨at T¨ubingen, WSI/GRIS (http://www.gris.unituebingen.de), 2000. [2] S. Bischoff, L.P. Kobbelt, and H-P. Seidel. Towards Hardware Implementation of Loop Subdivision. In SIGGRAPH/Eurographics Hardware Workshop, pages 41­50, 2000. [3] J. Cohen, M. Olano, and D. Manocha. Appearance-Preserving Simplification. In SIGGRAPH 98 Conference Proceedings, Annual Conference Series, pages 115­122. ACM SIGGRAPH, 1998. [4] M. Doggett and J. Hirche. Adaptive View Dependent Tessellation of Displacement Maps. In SIGGRAPH/Eurographics Hardware Workshop, pages 59­66, 2000. [5] S. Gumhold and W. Strasser. Real Time Compression of Triangle Mesh Connectivity. In SIGGRAPH 98 Conference Proceedings, Annual Conference Series, pages 133­140. ACM SIGGRAPH, 1998. [6] I. Guskov, K. Vidimce, W. Sweldens, and P. Schr¨oder. Normal Meshes. In SIGGRAPH 00 Conference Proceedings, Annual Conference Series, pages 95­102. ACM SIGGRAPH, 2000. [7] V. Krishnamurthy and M. Levoy. Fitting Smooth Surfaces to Dense Polygon Meshes. In SIGGRAPH 96 Conference Proceedings, Annual Conference Series, pages 313­324. ACM SIGGRAPH, 1996. [8] A. Lee, H. Moreton, and H. Hoppe. Displaced Subdivision Surfaces. In SIGGRAPH 00 Conference Proceedings, Annual Conference Series, pages 85­94. ACM SIGGRAPH, 2000. [9] C. Loop. Smooth Subdivision Surfaces Based on Triangles. PhD thesis, University of Utah, Dept. of Mathematics, 1987. [10] P. Schr¨oder and D. Zorin (Organizers). Subdivision for Modeling and Animation. In SIGGRAPH 00 Course Notes. ACM SIGGRAPH, 2000. [11] D. Zorin, P. Schr¨oder, and W. Sweldens. Interpolating Subdivision for Meshes with Arbitrary Topology. In SIGGRAPH 96 Conference Proceedings, Annual Conference Series, pages 189­192. ACM SIGGRAPH, 1996. 40