On a technical level, the entire Integrate process consists of two steps:
The first step divides the screen into 8x8 pixel tiles and sorts those tiles into three lists according to the Integrate method they require.
The second step runs the Integrate calculation on each of the three lists separately.
This design improves parallelism because it keeps the workload within a thread group uniform. Without it, some threads in a group would have heavy workloads while others had light ones, and the entire group would still have to wait for its slowest thread to finish.
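The two steps can be sketched on the CPU as follows. This is only a minimal illustration, not the shader's actual code. The mode names other than 'Support Importance Sample BRDF', the function names `classify_tiles` and `integrate`, and the policy that a tile takes the most expensive mode of any pixel inside it are all assumptions made for the example.

```python
TILE_SIZE = 8  # tile edge in pixels, as stated in the text

# Illustrative mode ids, ordered from cheapest to most expensive (assumed names).
SIMPLE_DIFFUSE = 0
SUPPORT_IMPORTANCE_SAMPLE_BRDF = 1
SUPPORT_ALL = 2


def classify_tiles(pixel_modes):
    """Step 1: sort 8x8 tiles into one list per integrate mode.

    pixel_modes is a 2D list indexed as pixel_modes[y][x].
    Assumed policy: a tile takes the most expensive mode of any pixel in it.
    Returns {mode: [(tile_x, tile_y), ...]}.
    """
    height = len(pixel_modes)
    width = len(pixel_modes[0]) if height else 0
    tile_lists = {
        SIMPLE_DIFFUSE: [],
        SUPPORT_IMPORTANCE_SAMPLE_BRDF: [],
        SUPPORT_ALL: [],
    }
    for tile_y in range(0, height, TILE_SIZE):
        for tile_x in range(0, width, TILE_SIZE):
            # Clamp to the screen edge so partial tiles are handled too.
            tile_mode = max(
                pixel_modes[y][x]
                for y in range(tile_y, min(tile_y + TILE_SIZE, height))
                for x in range(tile_x, min(tile_x + TILE_SIZE, width))
            )
            tile_lists[tile_mode].append((tile_x // TILE_SIZE, tile_y // TILE_SIZE))
    return tile_lists


def integrate(tile_lists, kernels):
    """Step 2: run one pass per list, so all tiles in a pass do similar work."""
    return {
        mode: [kernels[mode](tile) for tile in tiles]
        for mode, tiles in tile_lists.items()
    }


# Usage: a 16x16 screen where a single pixel needs the most expensive mode.
screen = [[SUPPORT_IMPORTANCE_SAMPLE_BRDF] * 16 for _ in range(16)]
screen[3][12] = SUPPORT_ALL
lists = classify_tiles(screen)
```

In this sketch only the tile containing the expensive pixel lands in the `SUPPORT_ALL` list, so the other three tiles are processed together in a pass where every tile costs roughly the same.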
<aside> ℹ️ Engine/Shaders/Private/Lumen/LumenScreenProbeGather.usf
</aside>
<aside> ℹ️ Engine/Shaders/Private/Lumen/LumenScreenProbeGather.usf : ScreenProbeIntegrateCS
</aside>
In our Diffuse Cube test, every pixel was classified as 'Support Importance Sample BRDF', so we follow that path through the rest of this section.
Overall, this step includes three sub-phases: