One disclosed embodiment includes memory allocation methods for use by a graphics processing unit in rendering graphics data for display. The method includes receiving a buffer attachment associated with a first rendering pass. The hardware prerequisites for operation of the first rendering pass are determined. The method also includes receiving an indication not to allocate system memory for the received buffer attachment. Thereafter, it may be determined whether the received buffer attachment will be loaded from or stored to by subsequent rendering passes. If it is determined that the buffer attachment will be accessed by a subsequent rendering pass, an error message may be generated indicating that system memory must be allocated. If it is determined that the buffer attachment will not be accessed by any subsequent rendering pass, the buffer attachment is rendered without allocating system memory.
|
1. A method of allocating memory during graphics processing, the method comprising:
receiving a buffer attachment associated with a first rendering pass;
determining hardware prerequisites for operation of the first rendering pass;
receiving an indication not to allocate memory on a system memory for the received buffer attachment;
determining whether the received buffer attachment will be loaded from or stored to by a subsequent rendering pass;
allocating, in response to determining that the received buffer attachment will not be loaded from or stored to, sufficient memory space on an on-chip memory;
rendering, in response to determining that the received buffer attachment will not be loaded from or stored to, the received buffer attachment without allocating memory on the system memory for the received buffer attachment; and
generating, in response to determining the received buffer attachment will be loaded from or stored to, an error message indicating that memory must be allocated on the system memory for the received buffer attachment.
8. A non-transitory computer readable medium comprising instructions stored thereon to support graphics processing, wherein the instructions, when executed, cause one or more processors to:
receive a buffer attachment associated with a first rendering pass;
determine hardware prerequisites for operation of the first rendering pass;
receive an indication not to allocate memory on a system memory for the received buffer attachment;
determine whether the received buffer attachment will be loaded from or stored to by a subsequent rendering pass;
allocate, in response to a determination that the received buffer attachment will not be loaded from or stored to, sufficient memory space on an on-chip memory;
render, in response to the determination that the received buffer attachment will not be loaded from or stored to, the received buffer attachment without allocating memory on the system memory for the received buffer attachment; and
generate, in response to a determination that the received buffer attachment will be loaded from or stored to, an error message indicating that memory must be allocated on the system memory for the received buffer attachment.
14. An apparatus, comprising:
a processing device comprising a central processing unit (CPU) and a graphics processing unit (GPU);
a system memory;
an on-chip memory;
a processor embedded in the processing device, configured to execute program code stored in the system memory to:
receive a buffer attachment associated with a first rendering pass;
determine hardware prerequisites for operation of the first rendering pass;
receive an indication not to allocate memory on the system memory for the received buffer attachment;
determine whether the received buffer attachment will be loaded from or stored to by a subsequent rendering pass;
allocate, in response to a determination that the received buffer attachment will not be loaded from or stored to, sufficient memory space on the on-chip memory;
render, in response to the determination that the received buffer attachment will not be loaded from or stored to, the received buffer attachment without allocating memory on the system memory for the received buffer attachment; and
generate, in response to a determination that the received buffer attachment will be loaded from or stored to, an error message indicating that memory must be allocated on the system memory for the received buffer attachment.
2. The method of
3. The method of
4. The method of
5. The method of
dynamically allocating sufficient memory space on the on-chip memory.
6. The method of
monitoring a remaining available memory space on the on-chip memory;
suspending the rendering of the received buffer attachment when the remaining available memory space on the on-chip memory reaches a specific threshold;
allocating, in response to suspending the rendering of the received buffer attachment, additional memory space on the on-chip memory for the rendering of the received buffer attachment; and
resuming, in response to allocating additional memory space on the on-chip memory, the rendering of the received buffer attachment.
7. The method of
9. The non-transitory computer readable medium of
10. The non-transitory computer readable medium of
11. The non-transitory computer readable medium of
12. The non-transitory computer readable medium of
13. The non-transitory computer readable medium of
monitor a remaining available memory space on the on-chip memory;
suspend the instructions that, when executed, cause the one or more processors to render the received buffer attachment when the remaining available memory space on the on-chip memory reaches a specific threshold;
allocate, in response to suspension of the instructions that, when executed, cause the one or more processors to render the received buffer attachment, additional memory space on the on-chip memory to perform the instructions that, when executed, cause the one or more processors to render the received buffer attachment; and
resume, in response to the allocation of additional memory space on the on-chip memory, the instructions that, when executed, cause the one or more processors to render the received buffer attachment.
15. The apparatus of
16. The apparatus of
17. The apparatus of
18. The apparatus of
19. The apparatus of
monitor a remaining available memory space on the on-chip memory;
suspend the execution of the program code stored in the system memory to render the received buffer attachment when the remaining available memory space on the on-chip memory reaches a specific threshold;
allocate, in response to suspension of the execution of the program code stored in the system memory to render the received buffer attachment, additional memory space on the on-chip memory to execute the program code stored in the system memory to render the received buffer attachment; and
resume, in response to the allocation of additional memory space on the on-chip memory, the execution of the program code stored in the system memory to render the received buffer attachment.
20. The apparatus of
|
This application claims the benefit of U.S. Provisional Patent Application No. 62/349,024, filed Jun. 12, 2016, and entitled TEXTURE NOT BACKED BY REAL MAPPING, the entire contents of which are incorporated herein by reference.
This application claims the benefit of U.S. Provisional Patent Application No. 62/444,279, filed Jan. 9, 2017, and entitled ELIMINATING OFF SCREEN PASSES USING MEMORYLESS RENDER TARGET, the entire contents of which are incorporated herein by reference.
The inventions disclosed herein relate to the field of graphics processing. More specifically, but not by way of limitation, they relate to memory allocation techniques for use by graphics processing units in rendering graphics data for display.
Computers and other computational devices typically have at least one programmable processing element that is generally known as a central processing unit (CPU). They frequently also have other programmable processors that are used for specialized processing of various types, such as processors for graphics processing operations, which are typically called graphics processing units (GPUs). GPUs generally comprise multiple cores or processing elements designed for executing the same instruction on parallel data streams, making them more effective than general-purpose CPUs for algorithms in which large blocks of data are processed in parallel. In general, a CPU functions as the host and hands off specialized parallel tasks to the GPUs.
Vendors and standards organizations have created application programming interfaces (APIs) that make graphics data-parallel tasks easier to program because of the high level of developer programming interaction. Graphics applications executing on computational devices convey a description of a graphics scene by invoking API calls to GPUs in order to render an image for display.
Sometimes several rendering passes may be employed prior to committing a frame buffer's content for display. The multiple rendering passes are employed to incrementally move the data toward its displayable format. For example, effects such as lighting, shadows, reflections, and specular illumination may be sequentially applied to the same graphics element. An on-chip memory may be used by a GPU to store the intermediate results temporarily while the data is also backed up in the system memory. Later rendering passes in a sequence of rendering passes may access the intermediate results stored in the system memory for further computation. Backing up every intermediate result, including results that no later pass reads, consumes system memory space and memory bandwidth. As the foregoing demonstrates, a more efficient memory allocation approach is needed in rendering graphics data by GPUs.
One disclosed embodiment includes memory allocation methods for use by a graphics processing unit in rendering graphics data for display. The method includes receiving a buffer attachment associated with a first rendering pass, where the hardware prerequisites for operation of the first rendering pass are determined. The method also includes receiving an indication not to allocate system memory for the received buffer attachment. Thereafter, it may be determined whether the received buffer attachment will be loaded from or stored to by subsequent rendering passes. If it is determined that the buffer attachment will be accessed by a subsequent rendering pass, an error message may be generated indicating that system memory must be allocated. If it is determined that no subsequent rendering pass will access the buffer attachment, the buffer attachment is rendered without allocating system memory.
In one embodiment, in response to the determination that the subsequent rendering passes do not access the received buffer attachment, memory space in on-chip memory is dynamically allocated for rendering the received buffer attachment. In one embodiment, to avoid partial rendering, the method includes monitoring the remaining available space in the on-chip memory, suspending the rendering of the received buffer attachment when the remaining available memory space reaches a specific threshold, allocating additional memory space for the rendering of the received buffer attachment, and resuming the rendering of the received buffer attachment.
A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer typically intended for output to a display. A GPU is efficient at manipulating computer graphics and has a highly parallel structure that makes it more efficient than a general-purpose central processing unit (CPU) for algorithms in which large blocks of data are processed in parallel.
Embodiments described in more detail below provide a more efficient memory allocation technique for processing graphics data for display. More specifically, an embodiment of the disclosed subject matter allocates only on-chip memory, without a system memory backup, for buffer attachments that are renderable in a single rendering pass. Features of the disclosed subject matter reduce bandwidth traffic and memory usage when GPUs render graphics data.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention. References to numbers without subscripts or suffixes are understood to reference all instances of subscripts and suffixes corresponding to the referenced number. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter. Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention, and multiple references to “one embodiment” or “an embodiment” should not be understood as necessarily all referring to the same embodiment.
As used herein, the term “a computer system” can refer to a single computer system or a plurality of computer systems working together to perform the function described as being performed on or by a computer system. Similarly, a machine-readable medium can refer to a single physical medium or a plurality of media that may together contain the indicated information stored thereon. A processor can refer to a single processing element or a plurality of processing elements, implemented either on a single chip or on multiple processing chips.
It will be appreciated that in the development of any actual implementation (as in any development project), numerous decisions must be made to achieve the developers' specific goals (e.g., compliance with system- and business-related constraints), and that these goals may vary from one implementation to another. It will also be appreciated that such development efforts might be complex and time-consuming, but would nevertheless be a routine undertaking for those having the benefit of this disclosure and being of ordinary skill in the design and implementation of computing systems and/or graphics systems.
Referring to
As illustrated in
Computer System 100 may also have Device Sensors 124, which may include one or more of: depth sensors (such as a depth camera), 3D depth sensor(s), imaging devices (such as a fixed and/or video-capable image capture unit), RGB sensors, proximity sensors, ambient light sensors, accelerometers, gyroscopes, any type of still or video camera, LIDAR devices, Global Positioning Systems (GPS), microphones, CCDs (or other image sensors), infrared sensors, thermometers, etc. These and other sensors may work in combination with one or more GPUs, digital signal processors (DSPs), or conventional microprocessors along with appropriate programming so the sensor outputs may be properly interpreted and/or combined and interpreted.
Device Sensors 124 may capture contextual and/or environmental phenomena such as time; location information; the status of the device with respect to light, gravity, a magnetic field (e.g., a magnetometer); and even still and video images. In addition, network-accessible information, such as weather information, may also be used as part of the context. All captured contextual and environmental phenomena may be used to provide context to user activity or information about user activity. For example, in assessing a gesture or the expression or emotion of a user, the contextual information may be used as part of the contextual analysis. Computer System 100 may react to environmental and contextual actions and reflect a reaction in real time on the display system through use of the Graphic Hardware 106.
Where volatile RAM is included in Memory 112, the RAM is typically implemented as dynamic RAM (DRAM), which requires continuous power in order to refresh or maintain the data in the memory. Graphic Hardware 106 may be special-purpose computational hardware for processing graphics and/or assisting Processor 116 in performing computational tasks. In some embodiments, Graphic Hardware 106 may include CPU-integrated graphics and/or one or more programmable GPUs.
Storage Device 114 is typically a magnetic hard drive, an optical drive, a non-volatile solid-state memory device, or other types of memory systems, which maintain data (e.g. large amounts of data) even after power is removed from the system (i.e., non-volatile). While
Turning now to
In one example, Desktop Workstation 210 may be a developer system, distributing a graphic application to Server 230, which in turn may distribute the graphic application to multiple devices 212, 214, and 216, each of which may employ a different GPU as well as other different components. Upon launch of the graphic application, one action performed by the application can be creation of a collection of pipeline objects that may include state information, fragment shaders, and vertex shaders.
As noted above, embodiments of the subject matter disclosed herein include software. As such, a description of common computing software architecture is provided as expressed in a layer diagram in
Returning to
Referring again to
Above the O/S services layer 385 is an Application Services layer 380, which includes SpriteKit 361, Scene Kit 362, Core Animation 363, and Core Graphics 364. The Application Services layer represents higher-level frameworks that are commonly directly accessed by application programs. In some embodiments of this disclosure, the Application Services layer may include graphics-related frameworks that are high level in that they are agnostic to the underlying graphics libraries (such as those discussed with respect to layer 385). In such embodiments, these higher-level graphics frameworks are meant to provide developer access to graphics functionality in a more user- and developer-friendly way and to allow developers to avoid working with shading and graphics primitives. By way of example, SpriteKit 361 is a graphics rendering and animation infrastructure made available by Apple Inc. SpriteKit 361 may be used to animate two-dimensional (2D) textured images, or “sprites.” Scene Kit 362 is a 3D-rendering framework from Apple Inc. that supports the import, manipulation, and rendering of 3D assets at a higher level than frameworks having similar capabilities, such as OpenGL. Core Animation 363 is a graphics rendering and animation infrastructure made available from Apple Inc. Core Animation 363 may be used to animate views and other visual elements of an application. Core Graphics 364 is a two-dimensional drawing engine from Apple Inc. Core Graphics 364 provides 2D rendering for applications.
Above the application services layer 380, there is the application layer 375, which may comprise any number and type of application programs. By way of example,
In evaluating O/S services layer 385 and applications services layer 380, it may be useful to realize that different frameworks have higher- or lower-level application program interfaces, even if the frameworks are represented in the same layer of the
With reference again to
Referring now to
Data Bus 405 connects different elements of the Computing System 400, including CPU 401, System Memory 430, and Graphic Processing System 403. In an embodiment, System Memory 430 includes instructions that cause CPU 401 and/or Graphic Processing System 403 to perform the functions ascribed to them in this disclosure. More specifically, Graphic Processing System 403 can receive instructions transmitted by CPU 401 and process the instructions to render and display graphics images on Display 402.
System Memory 430 may include Application Program 431 and GPU Driver 432. In an embodiment, Frame Buffer 424 is also located in System Memory 430. In an embodiment, Application Program 431 includes code written using an application programming interface (API). An API includes a predetermined, standardized set of commands that are executed by associated hardware. Application Program 431 generates API commands to render an image by one or more shading engines of GPU 420 for display. GPU Driver 432 translates high-level shading programs into machine code shading programs that are configured for each of the shading engines, e.g., Vertex Shader 421, Geometry Shader 422, and Fragment Shader 423.
Graphic Processing System 403 includes GPU 420, On-Chip Memory 425 and Frame Buffer 424. In one embodiment, CPU 401 transmits API commands to GPU 420 to render graphic data and store rendered images in Frame Buffer 424 to be displayed on Display 402. In an embodiment, a frame of graphic data is divided into multiple tiles. Each tile may be rendered to On-chip Memory 425 space by GPU 420. Upon completion of all tiles of a frame, Frame Buffer 424 may output the image to Display 402.
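The tile-by-tile flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed hardware: the 32-pixel tile edge (`TILE`) and the `shade()` function are hypothetical choices, and Python lists stand in for the on-chip scratch space and Frame Buffer 424.

```python
# Minimal sketch of tile-based rendering: the frame is split into tiles, each
# tile is rendered into a small "on-chip" scratch buffer, and the finished tile
# is copied into the frame buffer. TILE and shade() are hypothetical choices.
TILE = 32  # assumed tile edge length in pixels


def shade(x, y):
    """Hypothetical per-pixel shading function (a simple gradient)."""
    return (x + y) % 256


def render_frame(width, height):
    frame_buffer = [[0] * width for _ in range(height)]
    tiles_rendered = 0
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            h = min(TILE, height - ty)  # edge tiles may be partial
            w = min(TILE, width - tx)
            # The scratch buffer only ever holds one tile's worth of data.
            on_chip = [[shade(tx + i, ty + j) for i in range(w)] for j in range(h)]
            # Write the completed tile out to the frame buffer.
            for j in range(h):
                frame_buffer[ty + j][tx:tx + w] = on_chip[j]
            tiles_rendered += 1
    return frame_buffer, tiles_rendered
```

For a 100×70 frame this sketch renders 4×3 = 12 tiles, the last row and column of which are partial.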
GPU 420 can include a plurality of multiprocessors that are configured to execute a large number of threads in parallel. In an embodiment, each of the multiprocessors is configured as a shading engine that includes one or more programmable shaders. Each shading engine executes a machine code shading program to perform image rendering operations. In an embodiment according to
In one embodiment, deferred rendering techniques may be used to render images. In deferred rendering, the step of shading pixels is decoupled from geometry computations. In the first stage, geometric information (position vectors, color vectors, normal vectors, and/or specular values) of an image is computed on a pixel-by-pixel basis and stored in On-chip Memory 425 as a geometry buffer (G-buffer). Next, using the stored geometric information, a deferred shader can operate on each pixel of the image just before the scene is displayed.
On-Chip Memory 425 is typically used to store shading data. On-chip Memory 425 provides fast access and reduces the latency of the shading engines in the pipeline. However, On-chip Memory 425 takes up valuable die area, which makes it relatively expensive.
Sometimes several rendering passes may be employed prior to committing content to Frame Buffer 424 for display. The multiple rendering passes are performed to incrementally move the data toward its displayable form. For instance, effects such as lighting, shadows, reflections, and specular illumination may be sequentially applied to the same graphics element. Alternatively, the output of a first rendering pass may be transmitted to a second rendering pass for further computation. Typically, On-chip Memory 425 may be used to temporarily store the results of a rendering pass, and this data is then backed up in System Memory 430. According to one embodiment, render results do not need to be backed up to System Memory 430 when data in On-chip Memory 425 does not need to be propagated from one rendering pass to the next. Reducing the usage of System Memory 430 not only improves bandwidth usage but also saves memory space. Various embodiments described in further detail below disclose methods for allocating system memory only where it is necessary.
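The two stages of deferred rendering can be sketched as follows. This is a minimal illustration, not the disclosed shader pipeline: the fragment tuple layout, the dictionary-per-pixel G-buffer, and the Lambertian (N·L) lighting model with a single directional light are all assumptions chosen for brevity.

```python
# Minimal sketch of deferred rendering. Pass 1 writes per-pixel geometric
# attributes into a G-buffer; pass 2 shades every pixel from the G-buffer alone.
# The Lambertian lighting model and the data layout are illustrative assumptions.


def geometry_pass(fragments, width, height):
    """fragments: list of (x, y, depth, normal, albedo). Keeps the nearest one."""
    gbuffer = [[None] * width for _ in range(height)]
    for x, y, depth, normal, albedo in fragments:
        current = gbuffer[y][x]
        if current is None or depth < current["depth"]:  # depth test
            gbuffer[y][x] = {"depth": depth, "normal": normal, "albedo": albedo}
    return gbuffer


def lighting_pass(gbuffer, light_dir):
    """Shade each covered pixel once, using only data stored in the G-buffer."""
    image = []
    for row in gbuffer:
        out_row = []
        for texel in row:
            if texel is None:
                out_row.append(0.0)  # background pixel
                continue
            n_dot_l = sum(n * l for n, l in zip(texel["normal"], light_dir))
            out_row.append(texel["albedo"] * max(0.0, n_dot_l))
        image.append(out_row)
    return image
```

Because the lighting pass reads only the G-buffer, each pixel is shaded once no matter how many fragments overlapped it during the geometry pass.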
Dependent Rendering Passes
In an embodiment, a frame of graphics data is divided into multiple tiles. Each tile may be rendered in one or more rendering passes by GPU 420 into one or more render targets. In general, a “render target” or a “buffer attachment” is an allocated memory space in which the GPU draws pixels for an image being rendered. On-chip Memory 425 is usually large enough to hold the data associated with one tile at any given time. An image may be composed of content from a plurality of render targets. For example, GPU 420 may render a plurality of buffer attachments comprising texture data, color data, depth data, and stencil data and integrate the content to generate an output image. In one embodiment of the disclosed subject matter, a sequence of dependent rendering passes may be performed in processing a graphics element. Referring to FIG. 5, the dependency between two rendering passes, Rendering Pass A and Rendering Pass B, is illustrated. Rendering Pass A may render Buffer Attachments 501-504. Each buffer attachment may hold a certain type of graphics data. For example, Rendering Pass A may render multiple attachments for color, depth, or stencil textures. In an embodiment, Rendering Pass B may rely on at least some of the outputs generated by Rendering Pass A. Render targets that will be accessed by subsequent rendering passes must be backed up in System Memory 530. For instance, in the case of creating a shadow map of a scene, a single depth map is generated first. Subsequently, an image is created with multi-pass rendering, generating a shadow map for each light source. Therefore, the generated depth map must persist so that later rendering passes can access it.
Initially, Rendering Pass A performs its computations in On-chip Memory 540. However, On-chip Memory 540 is not large enough to permanently store the rendered data; at any given time, only a tile's worth of data is stored in On-chip Memory 540. Render targets are conventionally backed up from On-chip Memory 540 to System Memory 530. Intermediate data (e.g., depth map information) stored in System Memory 530 may be accessed during subsequent rendering passes, for example to add lighting. However, some of the data backed up in System Memory 530 may not be required by subsequent rendering passes. Data that is rendered in a single pass, i.e., data that is not loaded from or stored to by subsequent passes, does not need to be stored in System Memory 530. For example, a depth buffer is used to ensure that the visibility of fragments, primitives, and geometry is respected. The depth buffer is therefore needed to produce the correct image, but it is often not needed by subsequent rendering passes. A depth buffer is thus an example of a render target that needs no system memory allocation.
Single-pass render targets that exist only in on-chip memory are called “memory-less render targets” herein. In an embodiment, memory-less render targets may be flagged so that no memory space in the system memory is allocated for them. In an embodiment, memory-less render targets are identified by programmers using API commands. The system may validate, at render pass creation, whether the memory-less flag is appropriate. In an embodiment, the disclosed graphics processing operation automatically performs a dependency analysis of the different rendering passes in order to determine whether a render target is appropriately flagged as a memory-less render target. In an embodiment, memory-less render targets are identified on a per-attachment basis when a render target is created. The memory-less render target is then attached to a rendering pass at an attachment point.
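The distinction between attachments that must be backed up and attachments that can stay on chip can be expressed as a small dependency analysis over an ordered list of passes. The sketch below is illustrative only: `RenderPass`, its `writes`/`reads` fields, and the `final_outputs` parameter (standing in for content committed for display) are hypothetical names, and a real driver would work from the API's load/store descriptions rather than plain sets of strings.

```python
# Minimal sketch of a dependency analysis over an ordered list of rendering
# passes. An attachment needs system (backing) memory only if a later pass
# reads it, or if it is the frame's final output. RenderPass and its fields
# are hypothetical names introduced for illustration.
from dataclasses import dataclass, field


@dataclass
class RenderPass:
    name: str
    writes: set = field(default_factory=set)  # attachments this pass renders
    reads: set = field(default_factory=set)   # attachments loaded from earlier passes


def needs_system_memory(passes, final_outputs):
    """Attachments that must be backed by system memory."""
    needed = set(final_outputs)  # e.g., the image committed for display
    for i, p in enumerate(passes):
        later_reads = set().union(*(q.reads for q in passes[i + 1:]))
        needed |= p.writes & later_reads
    return needed


def memoryless_candidates(passes, final_outputs):
    """Attachments rendered and consumed within a single pass."""
    written = set().union(*(p.writes for p in passes))
    return written - needs_system_memory(passes, final_outputs)


# Shadow-mapping style example: the depth map from pass A is read by pass B,
# while pass B's own depth buffer is only used for visibility inside pass B.
passes = [
    RenderPass("A", writes={"shadow_depth"}),
    RenderPass("B", writes={"color", "depth"}, reads={"shadow_depth"}),
]
```

With `final_outputs={"color"}`, the shadow depth map and the color output need backing memory, and only pass B's depth buffer qualifies as memory-less.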
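Validation of a per-attachment memory-less flag at render pass creation might look like the following sketch. `Attachment`, `RenderPass`, `MemorylessError`, and `validate_passes` are hypothetical names, not an actual graphics API; the rule enforced is the one stated above, namely that a flagged attachment must not be loaded by any later pass, and the error message mirrors the claimed "system memory must be allocated" error.

```python
# Minimal sketch of validating per-attachment memoryless flags when render
# passes are created. Attachment, RenderPass, MemorylessError and
# validate_passes are hypothetical names used only for illustration.
from dataclasses import dataclass, field


class MemorylessError(Exception):
    """Raised when a memoryless attachment would need system memory."""


@dataclass(frozen=True)
class Attachment:
    name: str
    memoryless: bool = False  # True: do not allocate system memory


@dataclass
class RenderPass:
    name: str
    writes: list = field(default_factory=list)
    reads: list = field(default_factory=list)


def validate_passes(passes):
    """Check memoryless flags when the sequence of passes is created."""
    for i, p in enumerate(passes):
        for att in p.writes:
            if not att.memoryless:
                continue
            for later in passes[i + 1:]:
                if att in later.reads:
                    raise MemorylessError(
                        f"{att.name} is loaded by pass {later.name}; "
                        "system memory must be allocated for it")
```

Raising at creation time, rather than at draw time, means a misflagged attachment is reported before any rendering work is submitted.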
Referring back to
Referring to
Conventionally, the amount of memory allocated for rendering operations is predetermined. Sometimes the allocated memory space is too small to complete the operation, because the required memory space is difficult to predict with perfect precision. In that case, a sequence of partial rendering operations is performed, and the partial rendering results are frequently stored to and accessed from system memory. Such an operation is referred to as splitting the process by hardware, which is very expensive and undesirable.
For example, a system architecture may adopt two phases of execution: (1) vertex-level processing and (2) pixel-level processing. In the first phase, all the geometry is classified into the system memory. The part of the system memory allocated for this purpose may also be called the “pram buffer”. The size of the pram buffer is determined in advance. In the second phase, the geometry is read from the pram buffer and converted into pixels. After rasterization, shaders are invoked to shade the pixels. When the pram buffer is not large enough to complete the vertex processing, the system conventionally pauses and begins the pixel processing to free memory space. Once space in the pram buffer is freed, the vertex processing resumes. Therefore, in these circumstances, the intermediate render targets must be stored back in the system memory.
However, when render targets are flagged as memory-less render targets, no system backing memory is available. Memory-less render targets therefore cannot be subjected to partial rendering operations; instead, a sufficient amount of memory space must be provided to ensure that the entire rendering operation is completed in one pass. In an embodiment, a more aggressive approach to allocating memory space is adopted to ensure that enough memory space is available to complete the operation in a single rendering pass. The predetermined allocation may be based on similar operations previously performed by the GPU. In other embodiments, instead of allocating a large memory space in advance, the memory space is expanded dynamically.
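The conventional behavior with a fixed-size pram buffer can be modeled to show when partial renders occur. This is a simplified sketch: the per-primitive `cost` values and `pram_capacity` are hypothetical abstract units, and each counted partial render stands for one round trip in which the intermediate render target is stored to, and later reloaded from, system memory.

```python
# Minimal sketch of the conventional two-phase scheme with a fixed-size
# parameter ("pram") buffer. When vertex processing fills the buffer, pixel
# processing runs on what has been binned so far (a partial render) and the
# intermediate render target is spilled to system memory. Costs and capacity
# are hypothetical abstract units.


def count_partial_renders(primitive_costs, pram_capacity):
    """Return how many partial renders a fixed-size pram buffer forces."""
    pram_used = 0
    partial_renders = 0
    for cost in primitive_costs:
        if cost > pram_capacity:
            raise ValueError("primitive does not fit in an empty pram buffer")
        if pram_used + cost > pram_capacity:
            # Buffer full: pause vertex processing, run pixel processing on the
            # binned geometry, and store the intermediate render target in
            # system memory so it can be reloaded when rendering resumes.
            partial_renders += 1
            pram_used = 0  # binned geometry consumed, buffer space freed
        pram_used += cost  # vertex phase: bin the primitive into the buffer
    # The final pixel phase consumes whatever is left; it is not a partial render.
    return partial_renders
```

Each partial render requires system backing memory for the render target, which is exactly what a memory-less render target lacks.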
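One way to base the up-front allocation on similar previous operations is to track recent usage and over-allocate relative to the worst recent case. The sketch below is a hypothetical heuristic, not the disclosed method: the class name `AllocationEstimator`, the window of 8 recent passes, the 1.5× safety factor, and the 1 MiB default are all assumptions chosen for illustration.

```python
# Minimal sketch of an aggressive, history-based up-front allocation for a
# memoryless render target. AllocationEstimator, the window size, the 1.5x
# safety factor, and the 1 MiB default are hypothetical tuning choices.
import math
from collections import deque


class AllocationEstimator:
    def __init__(self, window=8, safety_factor=1.5, default_bytes=1 << 20):
        self.history = deque(maxlen=window)  # usage of recent similar passes
        self.safety_factor = safety_factor
        self.default_bytes = default_bytes   # used when there is no history

    def record(self, bytes_used):
        self.history.append(bytes_used)

    def estimate(self):
        if not self.history:
            return self.default_bytes
        # Over-allocate relative to the worst recent case so the pass is
        # likely to finish in one go without partial rendering.
        return math.ceil(max(self.history) * self.safety_factor)
```

A bounded window lets the estimate shrink again when a scene becomes simpler, at the cost of under-allocating if an old worst case recurs; the dynamic expansion described next covers that risk.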
Dynamic Memory Allocation
Referring to
At stage 610, the GPU begins rendering buffer attachments. The rendering operation may be directed to any of the lighting, shadow, reflection, and specular illumination effects of graphics processing.
At stage 615, the memory allocated for the memory-less render target is regularly monitored during the rendering operation. At stage 620, the remaining available memory space is compared to a specified threshold level. If the available memory space is sufficient, the operation returns to stage 615. However, if the available memory space reaches the threshold level, the operation proceeds to stage 625.
Upon determining that the available memory space is not sufficient to complete the rendering of the memory-less render target in one pass, the rendering operation may be paused at stage 625. The allocated memory space is increased at stage 630, and the rendering operation is subsequently resumed at stage 635. Therefore, in such a scenario there is no need to store intermediate results in system memory. In an embodiment, the memory space increase must be sufficient to complete the rendering operation. In an embodiment, the pause at stage 625 is so short that no interruption in the rendering operation occurs. In another embodiment, when the remaining memory space at stage 620 is determined to be within a threshold, the allocated memory space is increased without a pause, i.e., stage 625 is omitted.
Upon receiving an indication from hardware at stage 620 on shortage of the allocated memory space, the firmware on GPU 403 in
In an embodiment, the memory increase is performed in increments. After the rendering is resumed, the operation proceeds to stage 640 to determine whether the rendering operation is complete. If the rendering operation is not finished, the process returns to stage 615 to monitor the available memory space in on-chip memory. The operation is repeated as many times as necessary until the rendering operation is complete, after which the next rendering operation may begin.
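The monitor, pause, grow, and resume loop of stages 610 through 640 can be sketched as a simple simulation. Rendering is modeled, as an assumption, as a list of work items that each consume some on-chip memory; `threshold` and `increment` are hypothetical parameters, and the check is performed before each work item so that the allocation never overflows.

```python
# Minimal sketch of the dynamic allocation loop of stages 610-640. Rendering
# is modeled as work items that each consume some on-chip memory. The
# threshold, growth increment, and work model are hypothetical.


def render_memoryless(work_costs, initial_alloc, threshold, increment):
    if increment <= 0:
        raise ValueError("increment must be positive")
    allocated, used, growths = initial_alloc, 0, 0
    for cost in work_costs:  # stage 610: rendering proceeds item by item
        # Stages 615/620: compare the space that would remain after this item
        # with the threshold.
        while allocated - used - cost < threshold:
            # Stage 625: rendering is paused (some embodiments skip the pause).
            allocated += increment  # stage 630: grow the on-chip allocation
            growths += 1
            # Stage 635: rendering resumes with the larger allocation.
        used += cost
    # Stage 640: the whole target was rendered in one pass; nothing was
    # spilled to system memory and no partial render occurred.
    return allocated, growths
```

For example, three items of 10 units with an initial allocation of 25, a threshold of 5, and an increment of 10 trigger exactly one growth, ending with an allocation of 35.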
Multi-Sample Anti-Aliasing
In some embodiments of the disclosed subject matter, memory-less render targets may be used for multi-sample anti-aliasing (MSAA) data. Real-world objects that are captured in images typically have continuous surfaces, smooth curves, and unbroken lines. A display, however, presents images composed of discrete pixels, each of which has a uniform color and shape. As a result, representing real-world objects with pixels may sometimes produce images containing jagged edges.
In MSAA techniques, multiple samples (e.g., a factor of 4, 8, 16, or other value) may be generated for a single pixel. A “pixel”, as used here, refers to a single fragment (point) in a graphic image. A “sample”, as used here, may refer to a single value intended to represent the whole pixel. In an embodiment, a sample may be a color value representing the color of a pixel in the graphic image. In other examples, a sample may be a depth value. MSAA samples may then be combined (e.g., averaged) to generate a resolve attachment representing a final pixel value in the graphic image.
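The resolve step described above can be sketched as a per-pixel average of the samples. This sketch assumes a plain box-filter average of RGBA tuples, which matches the "combined (e.g., averaged)" wording; the nested-list layout of the sample attachment is an illustrative assumption, and real hardware may use other resolve filters.

```python
# Minimal sketch of an MSAA resolve: each pixel owns N color samples (the
# sample attachment) and the resolve step averages them into one value per
# pixel (the resolve attachment). Box-filter averaging is assumed here.


def resolve(sample_attachment):
    """sample_attachment[y][x] is a list of (r, g, b, a) samples for that pixel."""
    resolved = []
    for row in sample_attachment:
        out_row = []
        for samples in row:
            n = len(samples)
            out_row.append(tuple(sum(s[c] for s in samples) / n for c in range(4)))
        resolved.append(out_row)
    return resolved
```

A pixel on a red edge with two of its four samples covered resolves to half-intensity red, which is what softens the jagged edge.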
For instance, a 4K-resolution screen (3840×2160 pixels) in which each pixel is 4 bytes (e.g., a single RGBA value) requires approximately 32 MB of storage. In a four-sample MSAA mode, four red, green, blue, and alpha (RGBA) samples may be associated with each pixel, so approximately 128 MB of storage may be necessary. In addition to the color samples, there may be additional samples for depth and stencil. As such, MSAA data typically requires more memory space and higher bandwidth.
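The storage figures above follow from multiplying pixels, samples per pixel, and bytes per sample. The sketch below assumes 4 bytes per sample (8 bits per RGBA channel) as in the text; it shows that the rounded "32 MB" and "128 MB" figures correspond to about 31.6 MiB and 126.6 MiB.

```python
def attachment_bytes(width, height, bytes_per_sample, samples_per_pixel=1):
    """Storage for one attachment: pixels x samples per pixel x bytes per sample."""
    return width * height * samples_per_pixel * bytes_per_sample

single = attachment_bytes(3840, 2160, 4)                      # one RGBA8 value per pixel
msaa4 = attachment_bytes(3840, 2160, 4, samples_per_pixel=4)  # four RGBA8 samples per pixel

MIB = 2 ** 20
print(f"{single} bytes = {single / MIB} MiB")   # 33177600 bytes = 31.640625 MiB
print(f"{msaa4} bytes = {msaa4 / MIB} MiB")     # 132710400 bytes = 126.5625 MiB
```

Depth and stencil samples, if present, would be additional attachments computed the same way.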
Conventionally, system memory is allocated to back both the MSAA sample attachments and the resolve attachment. However, it may be possible to render the sample attachments and the resolve attachment in a single pass. Upon completion of the rendering pass, the outcome is written into the resolve attachment and not into any of the sample attachments. Therefore, in one embodiment, a memory-less flag may be used for MSAA sample attachments because they will not later be loaded from or stored to memory. The MSAA sample attachments then need only exist in on-chip memory, while the resolve attachment is backed by system memory.
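The effect of flagging only the sample attachments can be estimated with a small bookkeeping sketch. The `Attachment` class and its `memoryless` field are hypothetical names for this illustration; Apple's Metal API exposes a comparable per-texture option, `MTLStorageModeMemoryless`, but the sketch does not model that API. Sizes assume the 4K, 4-byte, four-sample case discussed above.

```python
from dataclasses import dataclass

@dataclass
class Attachment:
    name: str
    size_bytes: int
    memoryless: bool  # True: exists only in on-chip memory, never backed

def system_memory_required(attachments):
    """Bytes of system (backing) memory the attachments need."""
    return sum(a.size_bytes for a in attachments if not a.memoryless)

W, H = 3840, 2160
samples = Attachment("msaa_color_4x", W * H * 4 * 4, memoryless=True)
resolve = Attachment("resolve_color", W * H * 4, memoryless=False)

conventional = samples.size_bytes + resolve.size_bytes
with_flag = system_memory_required([samples, resolve])
print(conventional, with_flag)   # 165888000 33177600
```

Under these assumptions, only the resolve attachment needs backing memory, a fifth of the conventional total.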
Memory-Less Render Targets
At stage 705, graphic data is received by GPU 420 for processing. In an embodiment, a frame of graphic data may be divided into multiple tiles. Each tile may be rendered in one or more passes by GPU 420. For example, the GPU 420 may render a plurality of buffer attachments comprising texture data, color data, depth data, and stencil data and assemble the content to produce a final scene. Referring to
At stage 710, Computing System 400 verifies whether the hardware prerequisites for a rendering pass are met. One prerequisite may be that the buffer attachments of the rendering pass must fit within the allocated space in On-chip Memory 425. In an embodiment, if a rendering pass is not compatible with the system hardware, an error message is issued at stage 715. If the rendering pass requirements are compatible with the available hardware, the system proceeds with creating render targets.
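One way to picture the stage 710 check is a per-tile budget test. This is a hypothetical sketch: it assumes the on-chip requirement is simply tile area times the summed bytes per pixel of all attachments, and it models the stage 715 error message as a raised `ValueError`.

```python
def check_pass_fits_on_chip(tile_width, tile_height, attachment_bytes_per_pixel, on_chip_budget):
    """Stage 710 sketch: do all attachments of a tile fit in on-chip memory?

    Returns the bytes needed per tile, or raises ValueError (the stage 715 error).
    """
    needed = tile_width * tile_height * sum(attachment_bytes_per_pixel)
    if needed > on_chip_budget:
        raise ValueError(
            f"pass needs {needed} bytes of on-chip memory per tile, "
            f"but only {on_chip_budget} are available")
    return needed
```

For example, a 32×32 tile with 4-byte color, 4-byte depth, and 1-byte stencil attachments needs 9216 bytes, so it passes a 16384-byte budget and fails an 8192-byte one.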
At stage 720, GPU 420 receives an instruction not to create system memory backing for a render target (a memory-less render target). In an embodiment, memory-less render targets are identified on a per-attachment basis. In an embodiment, users identify, using API commands, the buffer attachments that do not need to be saved in System Memory 530.
At stage 725, GPU 420 determines whether the memory-less flag was properly assigned to the render targets. Memory-less render targets exist only in On-chip Memory 425, so GPU 420 must make sure that no subsequent rendering pass relies on them. Buffer attachments that are rendered in a single pass may not require system backing memory; buffer attachments that will be loaded from or stored to by subsequent rendering passes, however, cannot be designated as memory-less render targets. If the memory-less flag is incorrectly assigned to a render target, an error message may be issued to the user at stage 715. If the memory-less flag is correctly assigned, the system proceeds with creating the render target in On-chip Memory 425.
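The stage 725 validation can be sketched as a scan over the passes that follow the current one. The `RenderPass` structure with `loads` and `stores` sets and the `MemorylessFlagError` exception are hypothetical; like the text, the check only considers subsequent passes.

```python
from dataclasses import dataclass, field

@dataclass
class RenderPass:
    name: str
    loads: set = field(default_factory=set)   # attachments loaded from memory at pass start
    stores: set = field(default_factory=set)  # attachments stored to memory at pass end

class MemorylessFlagError(Exception):
    """Stage 715 sketch: system memory must be allocated for this attachment."""

def validate_memoryless(passes, pass_index, attachment):
    """Stage 725 sketch: reject the memory-less flag if any later pass
    loads from or stores to `attachment`."""
    for later in passes[pass_index + 1:]:
        if attachment in later.loads or attachment in later.stores:
            raise MemorylessFlagError(
                f"{attachment!r} is accessed by {later.name!r}; "
                "system memory must be allocated for it")
    return True
```

If a first pass produces `depth` and `albedo` and a later pass loads only `albedo`, the memory-less flag is accepted for `depth` and rejected for `albedo`.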
At stage 730, it is determined whether the allocated system memory (e.g., pram buffer) for rendering the memory-less render target is sufficient. As explained previously with reference to
Finally, at stage 735, the buffer attachments are fully rendered. In the case of memory-less render targets, the buffer attachments may be rendered in a single rendering pass. The render targets are then ready to be committed from On-chip Memory 425 to the frame buffer for display.
Merging Rendering Passes
As previously explained, several rendering passes may be employed before a frame buffer's content is committed for display. The multiple rendering passes incrementally move the data toward its displayable format. For example, referring to
Switching between several rendering passes in an operation can be expensive and inefficient. Every time the system switches from one rendering pass to another, it stops rendering to the frame buffer and instead renders to a texture that is used by subsequent passes. To conserve memory bandwidth and improve performance, embodiments of this disclosure reduce the number of rendering passes. Referring back to
In the user interface example noted above, Combined Rendering Pass 820 could render the user interface in a single pass. Initially, Render Target 831 may capture intermediate data. In other embodiments, multiple render targets may be designated to capture the intermediate data. Combined Rendering Pass 820 may read from Render Target 831 while simultaneously writing to it. To generate the output, Combined Rendering Pass 820 therefore accesses the intermediate data captured by Render Target 831 (pixel by pixel) to perform further graphic operations and generate the output render target, for example, Render Target 832. Render Target 832 may then be committed to the frame buffer to display the user interface.
In an embodiment, the one or more render targets used to capture intermediate data may be flagged as memory-less render targets. In the example above, Render Target 831 captures the intermediate data, which is used to generate the user interface rendered by Combined Rendering Pass 820. However, Render Target 831 is never accessed again by the subsequent rendering passes and therefore does not need to be backed up in the system memory. In this way, combining multiple rendering passes into one rendering pass provides faster and more efficient processing. In addition, designating the intermediate render target(s) (i.e., render targets in which intermediate results are stored) as memory-less render targets saves memory space.
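The benefit of merging passes can be illustrated by comparing a two-pass pipeline with a single tile-by-tile pass. This is a hypothetical sketch: `f` and `g` stand in for the two graphic operations, `tile` is an assumed tile size, and the per-tile `scratch` dictionary models a memory-less intermediate render target such as Render Target 831.

```python
def two_passes(src, f, g):
    """Baseline: pass 1 writes a full intermediate render target, which must
    be stored to system memory so pass 2 can load it."""
    intermediate = [[f(v) for v in row] for row in src]   # stored, then reloaded
    return [[g(v) for v in row] for row in intermediate]

def combined_pass(src, f, g, tile=2):
    """One pass, tile by tile. The per-tile `scratch` dict stands in for a
    memory-less intermediate render target: it is written, read back at the
    same (x, y), and discarded when the tile is finished."""
    h, w = len(src), len(src[0])
    out = [[None] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            scratch = {}
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    scratch[(x, y)] = f(src[y][x])     # write intermediate data
            for (x, y), v in scratch.items():
                out[y][x] = g(v)                      # read it back, same pixel
    return out
```

Both functions produce the same image, but in `combined_pass` the intermediate values never outlive a tile. That equivalence holds here only because `g` reads the same pixel that `f` wrote; an operation that needs neighboring pixels would not fit this pattern.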
Rendering Rounded Corner Icons on a User Interface
Referring to
Referring to
However, the method described above uses several dependent rendering passes, and the render targets are not committed to the frame buffer until all of the rendering passes have been performed. Because the render target generated at each step is consumed by a later rendering pass, each of these render targets requires system backing memory.
To improve the efficiency, several of the graphic processing operations described above with respect to
In an embodiment, Combined Rendering Pass 910 generates a background, specifies the locations corresponding to the corners of the icons, and finally blends or combines the icons with the background in a single pass. Combined Rendering Pass 910 merges Rendering Passes 901, 902, and 903. In one embodiment, at least one render target may be designated by Combined Rendering Pass 910 to store intermediate data. Here, Render Target 921 is used as a scratch pad to save the pixel values corresponding to where the corners of the icons are located. Render Target 921 can be flagged as a memory-less render target since it does not require system backing memory.
In an embodiment, the icons may be combined with the background layers using programmable blending. In one embodiment, the square icons are drawn onto Render Target 920 on top of the background. To clip the icons' square corners, Combined Rendering Pass 910 reads pixel values one by one from Render Target 921. If the pixel belongs to a position outside the icons, it can be placed as a top layer above the square icons on Render Target 920, creating a rounded shape. If the pixel belongs to a position within the icons, the value is obtained from Render Target 920. Render Target 920 is then committed (e.g., stored) to the frame buffer for display.
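The corner-clipping step can be sketched in software on a small pixel grid. The function names, the pixel-centre coverage test, and the corner radius `r` are assumptions for illustration. `rt920` plays the role of Render Target 920, and `rt921` plays the role of the memory-less scratch Render Target 921 that remembers background values at the icon's corners.

```python
def inside_rounded_square(x, y, x0, y0, size, r):
    """True if the centre of pixel (x, y) lies inside a square icon of the
    given size at (x0, y0) whose corners are rounded with radius r."""
    px, py = x + 0.5, y + 0.5
    # Nearest point on the square shrunk by r on every side.
    cx = min(max(px, x0 + r), x0 + size - r)
    cy = min(max(py, y0 + r), y0 + size - r)
    return (px - cx) ** 2 + (py - cy) ** 2 <= r * r

def draw_rounded_icon(background, icon, x0, y0, size, r):
    h, w = len(background), len(background[0])
    rt920 = [row[:] for row in background]      # background layer
    rt921 = [[None] * w for _ in range(h)]      # memory-less scratch target
    for y in range(max(0, y0), min(h, y0 + size)):
        for x in range(max(0, x0), min(w, x0 + size)):
            if not inside_rounded_square(x, y, x0, y0, size, r):
                rt921[y][x] = background[y][x]  # remember corner pixels
            rt920[y][x] = icon                  # draw the square icon
    # Programmable blending: read RT 921 at the same (x, y) only.
    for y in range(h):
        for x in range(w):
            if rt921[y][x] is not None:
                rt920[y][x] = rt921[y][x]       # clip the square corner
    return rt920
```

Drawing an 8×8 icon with value `1` over a background of `0` with `r=3` restores the background at the four corner pixels and leaves the other 60 pixels as icon.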
In one or more embodiments, Render Target 921 can be flagged as a memory-less render target and therefore requires no system backing memory. Because Render Target 921 exists only in on-chip memory in such an embodiment, no memory traffic goes beyond the on-chip memory, which improves the efficiency of the graphic operation.
Combined Rendering Pass to Perform Multiple Graphic Operations
Referring to
At stage 1010, it is determined whether the series of graphic processing operations is in a sequence. A series of operations is in a sequence if the output generated by a first operation is consumed by the subsequent operation(s). For example, in displaying a user interface, a first operation is in sequence with a second operation when the second operation receives and uses the intermediate result generated by the first operation. Dependent rendering passes, as described previously in this disclosure, are in a sequence.
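The stage 1010 test can be sketched as a producer/consumer check over consecutive operations. The tuple layout `(name, inputs, outputs)` and the render-target names are hypothetical; the rule implemented is the one stated above, namely that each operation consumes an output of the one before it.

```python
def in_sequence(ops):
    """ops: (name, inputs, outputs) tuples in submission order, where inputs
    and outputs are sets of render-target names. The series is treated as a
    sequence when every operation consumes at least one output of the
    operation before it."""
    return all(prev_out & cur_in
               for (_, _, prev_out), (_, cur_in, _) in zip(ops, ops[1:]))
```

For the rounded-icon user interface, a background operation feeding a corner-mask operation, which in turn feeds a compositing operation, is a sequence; two operations that share no render targets are not.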
At stage 1015, the series of graphic processing operations is defined or designated to be performed in a single rendering pass. In an embodiment, the single rendering pass renders graphic data into multiple render targets. The render targets could be, for example, color attachments, depth buffers, or stencil buffers.
At stage 1020, at least one of the multiple render targets may be designated to store intermediate data. The intermediate data is graphic data required to complete the rendering pass. Such a render target, however, will be consumed within the same rendering pass and will not be accessed again by any subsequent rendering pass. Because of this, the render target can be flagged as a memory-less render target at stage 1025. As such, no system backing memory need be allocated for the render target. In some embodiments, a plurality of render targets may be designated to store intermediate data.
At stage 1030, the one or more render targets designated to store intermediate data may be accessed to generate the output. In one embodiment, each pixel of a render target designated to hold intermediate data may be sampled based on its x-y location coordinates. In an embodiment, only the pixel corresponding to that x-y coordinate need be sampled, not its neighboring pixels. The sampled pixels are combined with data from other render targets to generate the output. At stage 1035, the output is committed (e.g., stored) to the frame buffer for display.
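Stages 1020 through 1030 suggest a simple planning rule: an intermediate render target is a memory-less candidate when every read of it in the pass is at the pixel's own x-y coordinate and no later pass reads it. The sketch below encodes that rule under assumed names (`Read`, `memoryless_candidates`); treating any nonzero neighbor offset as disqualifying is an assumption, since such reads may cross tile boundaries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Read:
    target: str
    offset: tuple  # (dx, dy) relative to the pixel currently being shaded

def memoryless_candidates(intermediate_targets, reads_in_pass, read_by_later_passes):
    """Map each intermediate render target to True if it can be flagged
    memory-less: every read in this pass is at the same x-y coordinate,
    and no later pass reads the target."""
    result = {}
    for target in intermediate_targets:
        offsets = [r.offset for r in reads_in_pass if r.target == target]
        pixel_local = all(o == (0, 0) for o in offsets)
        result[target] = pixel_local and target not in read_by_later_passes
    return result
```

A corner mask read only at `(0, 0)` qualifies, whereas a blur source read at `(1, 0)` and `(-1, 0)` does not.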
Referring to
At stage 1110, the programmable platform can be configured to designate at least one of the plurality of render targets for storing intermediate data. Although the intermediate data is never saved to the frame buffer for display, the rendering pass needs it to generate the desired output. In an embodiment, a plurality of render targets are designated to store intermediate data. In one embodiment, the designated render targets store geometric properties such as depth, position, surface normal information, and specular coefficients. In another embodiment, the designated render target may store lighting properties. In the example of
In an embodiment, the programmable platform receives an indication that the designated render target is a memory-less render target. One procedure for verifying the memory-less flag is explained with reference to
At stage 1125, the graphic processing system starts sampling from the designated render target. The samples from the designated render target may be combined (e.g., blended) with data from other render target(s) at stage 1125 to generate the output. In an embodiment, the blending occurs pixel-by-pixel, where at each given time the pixel addressed by a specific x-y location coordinate is sampled. In the example of
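When the designated render targets hold geometric properties such as surface normals, the pixel-by-pixel combination at stage 1125 resembles deferred shading. The sketch below is a hypothetical example of that pattern: it swaps in a simple Lambertian diffuse term as the combining operation and assumes unit-length normals and light direction. Each output pixel reads the G-buffer values at its own x-y coordinate only, which is what allows those targets to be memory-less.

```python
def shade_from_gbuffer(normals, albedo, light_dir):
    """normals/albedo: per-pixel G-buffer data written earlier in the same pass
    (memory-less targets). Each output pixel reads only its own (x, y)."""
    h, w = len(albedo), len(albedo[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            n = normals[y][x]
            n_dot_l = max(0.0, sum(a * b for a, b in zip(n, light_dir)))
            out[y][x] = albedo[y][x] * n_dot_l       # Lambertian diffuse term
    return out
```

A surface facing the light keeps its full albedo, and a surface perpendicular to the light receives no diffuse contribution.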
It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention therefore should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Avkarogullari, Gokhan; Imbrogno, Michael; Swift, Michael J.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Apr 04 2017 | IMBROGNO, MICHAEL | Apple Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 043365 | /0225 | |
Apr 06 2017 | SWIFT, MICHAEL J | Apple Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 043365 | /0225 | |
May 04 2017 | Apple Inc. | (assignment on the face of the patent) | / | |||
Aug 15 2017 | AVKAROGULLARI, GOKHAN | Apple Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 043365 | /0225 |
Date | Maintenance Fee Events |
Nov 23 2022 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Date | Maintenance Schedule |
Jun 11 2022 | 4 years fee payment window open |
Dec 11 2022 | 6 months grace period start (w surcharge) |
Jun 11 2023 | patent expiry (for year 4) |
Jun 11 2025 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jun 11 2026 | 8 years fee payment window open |
Dec 11 2026 | 6 months grace period start (w surcharge) |
Jun 11 2027 | patent expiry (for year 8) |
Jun 11 2029 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jun 11 2030 | 12 years fee payment window open |
Dec 11 2030 | 6 months grace period start (w surcharge) |
Jun 11 2031 | patent expiry (for year 12) |
Jun 11 2033 | 2 years to revive unintentionally abandoned end. (for year 12) |