PV227 GPU programming
Marek Vinkler
Department of Computer Graphics and Design

Tools
  shaders are just strings → any editor you desire,
  RenderMonkey (http://developer.amd.com/resources/archive/archived-tools/gpu-tools-archive/rendermonkey-toolsuite/),
  FX Composer (https://developer.nvidia.com/fx-composer),
  OpenGL Shader Designer (http://www.opengl.org/sdk/tools/ShaderDesigner/),
  and many more, mostly discontinued,
  shader programming has become diverse, so IDEs now exist only for specialized tasks.

Tools
  NVIDIA NSight, for registered developers,
  AMD CodeXL, directly downloadable,
  gDEBugger (http://www.gremedy.com/), directly downloadable, up to OpenGL 3.2,
  VS2010, use what is already there: syntax highlighting, IntelliSense.

Project setup
  create folder H:\PV227 (not Desktop, Documents, ...),
  create subfolders Templates and Final,
  unzip the libraries into the Templates folder (optionally also into Final),
  unzip the source codes into these folders,
  launch the projects with Ctrl-F5 (keeps the console open).

GLUT
  multiplatform windowing system for OpenGL,
  not updated, alternatives exist:
    FreeGLUT (http://freeglut.sourceforge.net/),
    download built libraries at http://www.transmissionzero.co.uk/software/freeglut-devel/.

GLEW
  library for accessing OpenGL core and extension functionality,
  download built libraries at http://glew.sourceforge.net/.

VS2010 setup
  Project properties → set for All Configurations:
    VC++ Directories, Include Directories: \freeglut\include;\glew-1.10.0\include;
    VC++ Directories, Library Directories: \freeglut\lib;\glew-1.10.0\lib\Release\Win32;
    Debugging, Environment: PATH=\freeglut\bin;\glew-1.10.0\bin\Release\Win32;

VS2010 setup
  Syntax highlighting:
    Tools → Options, Text Editor → File Extension, add vert, geom, frag with Microsoft Visual C++ syntax,
    update usertype.dat in the VS2010 directory C:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE.

Workflow
  the same operation is executed exactly once for every vertex/patch/primitive/fragment,
  independent states, no communication,
  the program is for the entire pipeline,
  data can be passed between shaders.

Workflow
  Which shader to use for a given task?
  depends on the modified data:
    per vertex → vertex shader,
    per patch → tessellation shader,
    per primitive → geometry shader,
    per fragment → fragment shader,
    no idea → compute shader.

Workflow
  Which shader to use for a given task?
  may depend on special properties of the processors:
    cancel computation → fragment or geometry shader,
    some built-in functions are defined only for certain processors.

Workflow
  Shaders replace the entire fixed pipeline.
  If we want to modify the vertex transformation behaviour, we also have to write code for lighting, texture generation, ...
  This may be tedious when only small changes are desired.
  In bigger projects you usually rewrite it anyway ;-).

Vertex processor
  Replaces the following fixed functionality:
    Vertex transformation by modelview and projection matrix.
    Texture coordinates transformation by texture matrices.
    Transformation of normals to eye coordinates.
    Rescaling and normalization of normals.
    Texture coordinate generation.
    Per vertex lighting computations.
    Color material computations.
    Point size distance attenuation.

Vertex processor
  Fixed functionality applied to the result:
    Perspective division on clip coordinates.
    Viewport mapping.
    Depth range scaling.
    View frustum clipping.
    Front face determination.
    Culling.
    Flat-shading.
    Associated data clipping.
    Final color processing.

Vertex processor
  Figure: Scan from OpenGL Shading Language 3rd edition

Input data
  vertex attributes (user-defined),
  uniforms (built-in, user-defined),
  textures,
  special built-in variables.

Attributes
  user-defined per vertex data,
  consist of a number of indexed locations called current vertex state,
  limited number of attributes,
  attributes are set with the glVertexAttrib family of functions,
  one indexed location can hold a quadruple,
  matrix attributes are stored in column-major order in successive attribute locations,
  the same value can be set for all vertices (that do not have it otherwise specified).

Attributes
  void glBindAttribLocation(GLuint program, GLuint index, const GLchar *name);
    program - the handle of the program.
    index - index of the generic vertex attribute to be bound.
    name - string containing the name of the vertex shader attribute variable to which index is to be bound.
  Used before linking to set the attribute name-index pairing.
  For a matrix attribute, the remaining columns are assigned automatically to index+1, [index+2, [index+3]].
  Reserved variables (gl_*) must not be bound this way.
  Can be used to keep the pairing of attributes from the same array consistent across different shaders.

Attributes
  GLint glGetAttribLocation(GLuint program, const GLchar *name);
    program - the handle of the program.
    name - string containing the name of the vertex shader attribute variable to be queried.
  Used after linking to get the attribute name-index pairing.
  For a matrix attribute the returned index is that of the first column (the others follow at index+1, [index+2, [index+3]]).
  For non-existent attributes or reserved variables (gl_*), -1 is returned.

Attributes
  void glEnableVertexAttribArray(GLuint index);
  void glDisableVertexAttribArray(GLuint index);
    index - index of the generic vertex attribute to be enabled/disabled.
  Enable/disable vertex attributes for use in the draw calls.
  By default all generic attributes are disabled.

Attributes
  void glVertexAttribPointer(GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const GLvoid *pointer);
  void glVertexAttribIPointer(GLuint index, GLint size, GLenum type, GLsizei stride, const GLvoid *pointer);
    index - index of the generic vertex attribute to be modified.
    size - the number of components of the generic attribute (1|2|3|4).
    type - the type of each component.
    normalized - whether fixed-point data should be normalized.
    stride - byte offset between consecutive vertex attributes (0 means tightly packed).
    pointer - offset of the first attribute in the buffer bound to the GL_ARRAY_BUFFER target.
  Specifies the location and format of vertex attributes.
  The I variant passes integer attributes unchanged.

Vertex arrays and buffers
  All attributes are bound to a single vertex array object (VAO).
  This VAO consists of a number of buffers holding the individual attributes.
  The VAO holds all the information for the draw call, e.g. glDrawArrays or glDrawElements.
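
The stride and pointer arguments of glVertexAttribPointer are easiest to see on an interleaved buffer, where several attributes share one struct per vertex. The following is a minimal sketch in plain C, not code from the course: the Vertex struct, the AttribLayout type and the helper functions are hypothetical names, and the layout is only computed here, not passed to OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: position followed by color. */
typedef struct {
    float position[4];
    float color[3];
} Vertex;

/* Values that would be passed to glVertexAttribPointer for one attribute. */
typedef struct {
    int    size;    /* number of components (1..4) */
    size_t stride;  /* bytes between consecutive vertices */
    size_t offset;  /* byte offset of the first value in the buffer */
} AttribLayout;

/* Layout of the position attribute inside the interleaved buffer. */
AttribLayout position_layout(void)
{
    AttribLayout layout = { 4, sizeof(Vertex), offsetof(Vertex, position) };
    return layout;
}

/* Layout of the color attribute inside the interleaved buffer. */
AttribLayout color_layout(void)
{
    AttribLayout layout = { 3, sizeof(Vertex), offsetof(Vertex, color) };
    return layout;
}

/* Byte offset, from the start of the buffer, of the attribute data
 * belonging to vertex number `index`. */
size_t attrib_byte_offset(AttribLayout layout, size_t index)
{
    assert(layout.size >= 1 && layout.size <= 4);
    return layout.offset + index * layout.stride;
}
```

With such a layout, the color attribute would be described by a call along the lines of glVertexAttribPointer(colorIndex, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, color)), where colorIndex stands for whatever index the attribute was bound to.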

Vertex arrays and buffers
    GLuint vao;

    // Create the VAO
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    // Create buffers for our vertex data
    GLuint buffers[2];
    glGenBuffers(2, buffers);

    // Vertex coordinates buffer
    glBindBuffer(GL_ARRAY_BUFFER, buffers[0]);
    glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
    glEnableVertexAttribArray(VERTEX_COORD_ATTRIB);
    glVertexAttribPointer(VERTEX_COORD_ATTRIB, 4, GL_FLOAT, GL_FALSE, 0, 0);

    // Index buffer
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buffers[1]);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(faceIndex), faceIndex, GL_STATIC_DRAW);

Vertex arrays and buffers (cont.)
    // Unbind the VAO
    glBindVertexArray(0);

    ...

    // Render VAO
    glBindVertexArray(vao);
    glDrawElements(GL_TRIANGLES, faceCount * 3, GL_UNSIGNED_INT, 0);

Uniforms
  user-defined:
    read-only in all shaders,
    constant per draw call, changed per primitive at most (not recommended for performance),
    can be initialized inside the shader,
    location indices are assigned during link,
    limited number of uniforms (both built-in and user-defined),
    uniforms can be grouped into named blocks.

Uniforms
  all variables outside a named block are in the default block,
  sampler variables must be in the default block,
  default-block uniforms cannot be used for another program,
  advantageous for variables tied to an individual shader/program.

Uniforms
  GLint glGetUniformLocation(GLuint program, const GLchar *name);
    program - the handle of the program.
    name - string containing the name of the uniform variable to be queried.
  Returns the memory location of a uniform variable.
  Must be called after linking the program (the location may change with each link).
  Not usable for structures, arrays of structures, or subcomponents of vectors and matrices.
  Individual elements of structures and arrays are addressed with "." and "[]".
  For non-existent uniforms or reserved names (gl_*), -1 is returned.

Uniforms
  during link uniforms are set to 0,
  their value can be modified only when their program is in use,
  the values are preserved when the program is switched off and on,
  uniforms are set with the glUniform family of functions.

Uniforms
  void glUniform{1|2|3|4}{f|i|ui}(GLint location, TYPE v);
    location - the location of the uniform variable.
    v - 1|2|3|4 component value of the uniform.
  void glUniform{1|2|3|4}{f|i|ui}v(GLint location, GLsizei count, const TYPE *v);
    location - the location of the uniform variable.
    count - number of array elements to be specified.
    v - array of values to be loaded.
  void glUniformMatrix{2|3|4|2x3|3x2|2x4|4x2|3x4|4x3}fv(GLint location, GLsizei count, GLboolean transpose, const GLfloat *v);
    location - the location of the uniform variable.
    count - number of matrices to be specified.
    transpose - load from row-major order?
    v - array of values to be loaded.

Uniforms
  types and sizes of the uniform variables must match the functions,
  locations for array elements and other variables cannot be computed: loc("A[n]") != loc("A")+n.

Uniforms
    uniform struct
    {
        struct
        {
            float a;
            float b[10];
        } c[2];
        vec2 d;
    } e;

    // f is assumed to be an array of at least 10 floats
    loc1 = glGetUniformLocation(prog, "e.d");          // valid: vec2
    loc2 = glGetUniformLocation(prog, "e.c[0]");       // invalid: struct
    loc3 = glGetUniformLocation(prog, "e.c[0].b");     // valid: array
    loc4 = glGetUniformLocation(prog, "e.c[0].b[2]");  // valid: array element

    glUniform2f(loc1, 1.0f, 2.0f);  // valid: vec2
    glUniform2i(loc1, 1, 2);        // invalid: not ivec2
    glUniform1f(loc1, 1.0f);        // invalid: not float
    glUniform1fv(loc3, 10, f);      // valid: b[0] (+10)
    glUniform1fv(loc4, 10, f);      // invalid: out of range
    glUniform1fv(loc4, 8, f);       // valid: b[2] (+8)

Samplers
  only glUniform1i and glUniform1iv can be used to load samplers,
  the loaded value is the index of the texture unit to be used,
  the same unit cannot be loaded into samplers of different types.

Special built-in variables
  gl_VertexID - implicit vertex index passed by e.g. glDrawArrays,
  gl_InstanceID - implicit instance index passed by instanced draw calls, e.g. glDrawArraysInstanced.

Output data
  special built-in variables,
  varying variables (user-defined).

Special built-in variables
  out vec4 gl_Position;
    homogeneous position in clip space (modelview, projection),
    must be set, used by the rest of the pipeline,
  out float gl_PointSize;
    size of the rasterized points,
    must be set if points are rasterized,
  out float gl_ClipDistance[];
    array of distances to user clipping planes,
    must be set if user clipping is enabled.

Varying variables
  passed from the vertex processor to the rasterizer,
  anything can be passed,
  more variables can be output than are used by the follow-up shader,
  interpolation type can be set,
  limited number of interpolated values.

Vertex processor example
  Project triangle!
  Rotate and project triangle!

Geometry processor
  Optional (no fixed pipeline equivalent).
  Receives assembled primitives, outputs zero (culling) or more primitives.
  May receive adjacency information.
  The type of input and output primitives need not match (triangles → points).
  Designed for moderate geometry amplification, not tessellation.

Geometry processor
  Input primitives: points, lines, lines_adjacency, triangles, triangles_adjacency.
  Output primitives: points, line_strip, triangle_strip.

Input data
  interpolated varying variables (built-in, user-defined),
  uniforms (built-in, user-defined),
  textures,
  special built-in variables.

Varying variables
  built-in and user-defined varying variables for each vertex,
  in the form of an array of structures (user-defined or gl_PerVertex),
  definition must match the vertex shader.

Uniforms
  defined the same way as for the vertex shader,
  can be the same set of variables as in the vertex shader,
  no need to set up uniforms for each shader,
  limited number of uniforms (both built-in and user-defined).

Output data
  same output as the vertex shader,
  definition of primitives,
  special built-in variables,
  varying variables (user-defined).

Fragment processor
  Replaces the following fixed functionality:
    Texture environments and texture functions.
    Texture application.
    Color sum.
    Fog.

Fragment processor
  Fragment shader does not change the following operations:
    Texture image specification.
    Alternate texture image specification.
    Compressed texture image specification.
    Texture parameters that behave as specified even when a texture is accessed from within a fragment shader.
    Texture state and proxy state.
    Texture object specification.
    Texture comparison modes.

Fragment processor
  Figure: Scan from OpenGL Shading Language 3rd edition

Input data
  interpolated varying variables (built-in, user-defined),
  uniforms (built-in, user-defined),
  textures,
  special built-in variables.
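
To make "interpolated" concrete: before the fragment shader runs, the rasterizer blends the per-vertex values of each varying into one value per fragment. The following is a minimal sketch in plain C, assuming simple barycentric weights and ignoring perspective correction; the Vec3 type and the interpolate_varying function are hypothetical illustrations, not part of the OpenGL API.

```c
#include <assert.h>

/* Hypothetical three-component varying, e.g. an RGB color. */
typedef struct {
    float x, y, z;
} Vec3;

/* Blend the values output at the three triangle vertices using the
 * barycentric weights (w0, w1, w2) of a fragment. The weights are
 * expected to be non-negative and to sum to 1. Perspective correction
 * is deliberately left out of this sketch. */
Vec3 interpolate_varying(Vec3 v0, Vec3 v1, Vec3 v2,
                         float w0, float w1, float w2)
{
    Vec3 result;

    assert(w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f);

    result.x = w0 * v0.x + w1 * v1.x + w2 * v2.x;
    result.y = w0 * v0.y + w1 * v1.y + w2 * v2.y;
    result.z = w0 * v0.z + w1 * v1.z + w2 * v2.z;
    return result;
}
```

A fragment lying exactly on a vertex (weights 1, 0, 0) receives that vertex's value unchanged, while a fragment halfway along an edge receives the average of the edge's two endpoints.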

Varying variables
  in vec4 gl_FragCoord;
    window coordinate position (xy), fragment depth (z),
  in bool gl_FrontFacing;
    whether the fragment originated from a front-facing primitive,
  in vec2 gl_PointCoord;
    position of the fragment within the point (only for point primitives),
  user-defined varying variables,
    definition must match the vertex/geometry shader.

Uniforms
  defined the same way as for the vertex/geometry shader,
  can be the same set of variables as in the vertex/geometry shader,
  no need to set up uniforms for each shader,
  limited number of uniforms (both built-in and user-defined).

Output data
  special built-in variables,
  user-defined output.

Special built-in variables
  out float gl_FragDepth;
    replaces the fragment depth (the fragment can also be discarded),
    the fragment's x,y position cannot be changed.

User-defined output
  output color or discard fragment,
  multiple buffers may be updated.

User-defined output
  void glDrawBuffers(GLsizei n, const GLenum *bufs);
    n - number of render targets.
    bufs - array of output buffers.
  sets the output rendering targets,
  void glBindFragDataLocation(GLuint program, GLuint colorNum, const char *name);
    program - the handle of the program.
    colorNum - the color number to bind the user-defined varying out variable to.
    name - the name of the varying out variable whose binding to modify.
  colorNum is the index of the target as specified in glDrawBuffers,
  the binding can also be set from shader code.

Fragment processor example
  Shade triangle!

"Advanced" example
  Rotate triangle.
  Set varying attribute (color).
  Draw inverse color.

Built-in constants
  values accessible from the OpenGL API by glGet,
  give the minimum value for an OpenGL conforming implementation.
    const int gl_MaxVertexAttribs = 16;
    const int gl_MaxVertexUniformComponents = 1024;
    const int gl_MaxFragmentUniformComponents = 1024;
    ...

    glGetIntegerv(GL_MAX_{VERTEX|GEOMETRY|FRAGMENT}_UNIFORM_COMPONENTS, &nComponents);
    glGetIntegerv(GL_MAX_VARYING_FLOATS, &nFloats);
    glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, &nAttribs);
    glGetIntegerv(GL_MAX_DRAW_BUFFERS, &nBuffers);
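
As a rough illustration of how such limits are used, the sketch below counts how many generic attribute locations a set of single-precision vertex attributes needs and compares the total with the guaranteed minimum of 16 quoted above. It relies on the rule stated earlier for attributes, that a matrix occupies one location per column. The AttribDesc type, the attribute names and both functions are hypothetical; a real program would compare against the value returned by glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, ...) rather than a constant.

```c
#include <assert.h>
#include <stddef.h>

/* Guaranteed minimum for gl_MaxVertexAttribs quoted in the slides. */
enum { MIN_GUARANTEED_VERTEX_ATTRIBS = 16 };

/* Hypothetical description of one vertex shader attribute. */
typedef struct {
    const char *name;  /* attribute name in the shader (illustrative only) */
    int columns;       /* 1 for float/vec2/vec3/vec4, N for matN */
} AttribDesc;

/* Total number of generic attribute locations the attributes occupy:
 * one per scalar or vector, one per column for a matrix. */
int count_attrib_locations(const AttribDesc *attribs, size_t count)
{
    int total = 0;
    size_t i;

    for (i = 0; i < count; ++i) {
        assert(attribs[i].columns >= 1 && attribs[i].columns <= 4);
        total += attribs[i].columns;
    }
    return total;
}

/* Non-zero if the attributes fit into the minimum every conforming
 * implementation has to provide. */
int fits_guaranteed_minimum(const AttribDesc *attribs, size_t count)
{
    return count_attrib_locations(attribs, count)
           <= MIN_GUARANTEED_VERTEX_ATTRIBS;
}
```

For example, a position, a normal and a per-instance mat4 need 1 + 1 + 4 = 6 locations, which fits comfortably; four mat4 attributes plus one extra vector need 17 and would only work on implementations that report more than the minimum.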