Archive:GSoC - GPU Assisted Video Decoding
{{mininav|[[Development]]|[[:Category:Google Summer of Code]]}}
{{info|This page describes a previous GSoC project. '''For the most recent GSoC pages please check [[:Category:Google Summer of Code]]'''.}}


GPU Hardware Accelerated Video Decoding of H.264 (MPEG-4 part-10, a.k.a. MPEG-4 AVC) encoded video.


** Proposed by student: '''Rudd'''
** Proposed primary mentor: [[User:D4rk|D4rk]]
** Proposed backup mentor: [[User:Elupus|Elupus]]
 
 
* XBMC community forum discussion threads:
** [http://forum.kodi.tv/showthread.php?t=33802 GPU assisted H.264 Decoding (GSoC Project 2008)]
 
= GSoC proposal for 2008 =
This is a discussion about a proposal for XBMC's [[Google Summer of Code 2008]]


== Current Draft ==


=== Name ===
Robert Rudd (IRC & XBMC.org username: Rudd)
=== E-Mail ===


=== Project Title ===
'''GPU Hardware Accelerated Video Decoding of MPEG-4 encoded video'''


=== Benefits ===
# This project will improve the end-user experience tremendously. XBMC is a prime choice for many users who have an HTPC. These machines traditionally don't have excessive amounts of processing power. Any computations that could be offloaded to unused hardware would be ideal.
# Exploring new territory: Much talk has been given to the use of GPUs for general purpose computation. An actual implementation of a video decoder in a non-GPU specific manner would be quite the achievement!




=== Project Abstract ===
It is not atypical for a modern computer to have two primary processing units: a Central Processing Unit (CPU), capable of on the order of 1 GFLOPS, and a Graphics Processing Unit (GPU), capable of on the order of 100 GFLOPS. Despite this, when playing a high-definition H.264 video, which requires on the order of tens of millions of operations per second, only the CPU is used. Even accounting for the fact that the GPU is highly optimized for floating-point operations while the H.264 codec strives to avoid them, it is quite obvious that a large amount of processing power is being left unused.


I propose that hardware-assisted H.264 decoding be integrated into XBMC. While I plan on spending some of the time leading up to this summer researching the ideal architecture for the implementation, I am currently leaning towards an OpenGL Shading Language (GLSL) implementation to keep it as portable and vendor-neutral as possible.


=== Detailed Description ===
This proposal outlines an implementation of hardware H.264 decoding for [[DVDPlayer]] (XBMC's own in-house video player, based on FFmpeg). The project can be broken into two primary components: architecture research and implementation.




==== Architecture Research ====
While it pains me to include something as nebulous as "research" in my project proposal, I do feel that the time spent researching alternatives is necessary and will be well spent. Using a GPU for general-purpose computing (the GPGPU movement) is a rather new field of inquiry. There are very few accepted standards, with each vendor proposing their own solutions. Thus far, possible initiatives for GPU accelerated video coding/decoding include Intel's VAAPI, Nvidia's CUDA/FreeVideo, and ATI's Avivo/CTM. This is in addition to any video capabilities within the DirectX and OpenGL APIs and several GPGPU libraries such as Brook and Sh.


At the time of this writing, I am strongly leaning towards an implementation in the OpenGL Shading Language (GLSL). While, unlike many of the other architectures, GLSL was not made with the intent of allowing a GPU to perform general-purpose calculations, it has the advantage of providing enough functionality for video decoding calculations while being part of an API that is almost universally supported by GPUs and operating systems. Since XBMC strives to be as portable as possible, I feel that GLSL is the most advantageous. However, I will definitely not overlook other architectures; the GPGPU libraries in particular look very promising.




==== Implementation ====
While the above research is not to be underestimated, the core of the project will be the implementation of hardware acceleration in whichever architecture I choose. Over the course of the summer, I hope to offload as much of the H.264 processing workload as I can onto the GPU. I will concentrate on the following two areas:


===== Motion Compensation =====
Motion compensation is one of the most computationally intensive portions of the H.264 decoding process. Because H.264 supports up to quarter-pixel interpolation, a single frame may require an enormous amount of computation. Additionally, since each macroblock is not only allowed to point to any position within a given reference picture, but can also point to a different reference picture, this can wreak havoc on the system's cache.
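As a concrete illustration of the interpolation cost, H.264 derives half-pel luma samples with a 6-tap filter (1, -5, 20, 20, -5, 1). A minimal one-dimensional sketch in Python (function name and layout are mine, not XBMC's or FFmpeg's):

```python
def halfpel_1d(samples, i):
    """Interpolate the half-pel position between samples[i] and samples[i+1]
    using the H.264 6-tap luma filter (1, -5, 20, 20, -5, 1)."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * samples[i + k - 2] for k, t in enumerate(taps))
    # Round and normalize (the taps sum to 32), then clip to the 8-bit range.
    return min(255, max(0, (acc + 16) >> 5))

# A flat signal is preserved: the half-pel sample between two 10s is 10.
row = [10] * 8
print(halfpel_1d(row, 2))
```

Quarter-pel positions are then formed by averaging neighbouring full- and half-pel samples, so each fractional position adds further per-pixel work, which is what makes this stage so expensive on the CPU.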


However, motion compensation is parallelizable across macroblocks. Each macroblock can be interpolated independently of the others, presenting a problem that maps easily onto the GPU.


The approach taken will most likely be one of the two presented in the following papers:


* [http://spiedl.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=PSISDG006696000001669606000001&idtype=cvips&gifs=yes Performance evaluation of H.264/AVC decoding and visualization using the GPU]


This paper outlines a 6-pass strategy: one pass for fullpel positions, two for halfpel, and three for quarterpel. A quad is rendered for each block/subblock in the frame; the motion vector is repeated at each of its vertices, and the vertex shader translates the texture coordinates appropriately. The vertex shader also pushes blocks irrelevant to the current pass out of the viewing frustum to reduce computation. A fragment shader then performs the actual interpolation.
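The batching idea behind such multi-pass strategies can be sketched on the CPU side: group each block by the fractional part of its motion vector, so every rendering pass runs one specialized shader over one batch. This is a deliberately simplified sketch of the grouping rule, not the paper's exact scheme:

```python
def interp_class(mv):
    """Classify a motion vector (in quarter-pel units) by the interpolation
    it needs: fullpel (no filtering), halfpel, or quarterpel."""
    fx, fy = mv[0] & 3, mv[1] & 3
    if fx == 0 and fy == 0:
        return "fullpel"
    if fx % 2 == 0 and fy % 2 == 0:
        return "halfpel"
    return "quarterpel"

def batch_blocks(blocks):
    """Group (block_id, mv) pairs into per-pass batches."""
    batches = {"fullpel": [], "halfpel": [], "quarterpel": []}
    for block_id, mv in blocks:
        batches[interp_class(mv)].append(block_id)
    return batches

# Blocks 0 and 3 land on integer positions, 1 on a half-pel, 2 on a quarter-pel.
blocks = [(0, (4, 0)), (1, (2, 0)), (2, (1, 3)), (3, (-8, 4))]
print(batch_blocks(blocks))
```

Batching this way keeps each GPU pass branch-free: the fragment shader for a pass only ever executes one interpolation path.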




* [http://spiedl.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=PSISDG00669600000166960X000001&idtype=cvips&gifs=yes Real-time high definition H.264 video decode using the Xbox 360 GPU]


This paper outlines a 3-pass strategy: one pass for fullpel positions, one for off-center interpolations, and one for center interpolations. The approach is similar to the one above but batches the blocks differently.


===== Inverse Transformation and Reconstruction =====
 
The H.264 codec uses a frequency transform similar to the IDCT, but based on integer math, to reduce spatial redundancy. The decoder must perform an inverse transform to recover the residual frame: the frame containing the error between the actual frame and the motion-compensated frame. This residual is added to the motion-compensated frame to produce the final frame to be displayed.
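For reference, the 4x4 inverse core transform that replaces the IDCT in H.264 needs only adds and shifts, which is part of why it is attractive for shader code. A sketch of the butterfly defined in the standard (assuming coefficients have already been dequantized, so the final rounding is the standard (x + 32) >> 6):

```python
def inverse_transform_4x4(coeff):
    """H.264 4x4 inverse core transform: the integer butterfly applied to
    rows, then columns, followed by (x + 32) >> 6 rounding."""
    def butterfly(d0, d1, d2, d3):
        e0, e1 = d0 + d2, d0 - d2
        e2, e3 = (d1 >> 1) - d3, d1 + (d3 >> 1)
        return e0 + e3, e1 + e2, e1 - e2, e0 - e3

    rows = [butterfly(*row) for row in coeff]          # horizontal pass
    out_cols = [butterfly(*col) for col in zip(*rows)] # vertical pass
    res = list(zip(*out_cols))                         # transpose back
    return [[(x + 32) >> 6 for x in row] for row in res]

# A DC-only block spreads evenly: coefficient 64 yields a flat residual of 1.
dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(inverse_transform_4x4(dc_only))
```

Because every step is integer arithmetic, the transform is exact and bit-reproducible, unlike a floating-point IDCT.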
 
FFmpeg makes heavy use of SIMD operations and is relatively fast at the inverse transform; however, this portion will also ideally need to move onto the GPU, if only to reduce GPU<->CPU transfers.
 
 
=== Deadlines and Deliverables ===
This is only a very tentative outline. Any given module may be completed in a longer or shorter period. It simply serves as a guideline for what I hope to achieve and the order I plan to set about doing it. At the very least I am hoping I can get the two primary modules (motion compensation and IDCT) implemented and integrated with the SVN repository over the summer.


# Architecture Research - 2 weeks - Researching and choosing the architecture of the hardware decoder. At the end of this period I will have code running from within XBMC that uses the given architecture.
# H.264 Decoder Skeleton - 1 week - A skeleton of my H.264 decoder will be written. It will be comprised mostly of the software implementation currently within XBMC (which I believe is handled by FFmpeg at the moment), and will be replaced with the hardware components as they are written.
# Motion Compensation - 3 weeks - Implementation of motion compensation using the chosen architecture.
# Inverse Transformation - 3 weeks - Implementation of the inverse transform. Success will be judged by lowering the number of CPU cycles required to decode a given H.264 video relative to the initial software implementation.
# Testing and Final Commit - 2 weeks - Extensive testing, as well as modifying the code to fit within the XBMC repository. Any leftover time will be spent researching other areas to accelerate with hardware.




=== Personal Statement ===
I am currently a 21-year-old senior at the Massachusetts Institute of Technology majoring in Computer Science and Engineering. In the fall of 2008, I will enter an MIT Master's program in Electrical Engineering and Computer Science. Systems programming and signal processing have always been interests of mine. I feel that this project will offer me a unique chance to blend my interest in low-level systems programming with my passion for signal processing.


I have a lot of experience coding at all levels of complexity, from the creation of Java chess software to the implementation of a Unix Version 6-level operating system in C and x86 assembly. Working on this project will allow me to grow as a programmer as well as contribute to excellent open source software.


 
[[Category:Google Summer of Code]]

Revision as of 00:43, 13 May 2015
