Calculate more S values per work group
This is particularly important for the Alm rotation and translation kernels, where a large matrix is loaded into a local array and then used for only a few Alm elements, so the cost of the load is amortized over very little work.
Options:
- Put a loop over S inside the kernel
- Load Alm elements into float vectors and use vector operations
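
A minimal host-side sketch of the first option, written in plain C rather than as a real kernel. It only illustrates the idea: the matrix is copied into a local buffer once, and that copy is then reused for a whole range of S values instead of a single one. The function name apply_matrix_batched, the size N_COEFF, and the row-major data layout are all hypothetical and are not taken from the existing kernels.

```c
#include <stddef.h>

#define N_COEFF 4  /* hypothetical number of Alm elements per S value */

/* Stand-in for the work done by one work group (assumed layout, not the
 * project's actual kernel): the matrix is copied into a local buffer once
 * and then reused for every S value in [s_begin, s_end). */
static void apply_matrix_batched(const float *matrix,  /* N_COEFF x N_COEFF, row-major */
                                 const float *alm_in,  /* n_s x N_COEFF */
                                 float *alm_out,       /* n_s x N_COEFF */
                                 size_t s_begin, size_t s_end)
{
    /* One-time load, analogous to filling the local array in the kernel. */
    float local_matrix[N_COEFF * N_COEFF];
    for (size_t i = 0; i < N_COEFF * N_COEFF; ++i)
        local_matrix[i] = matrix[i];

    /* Loop over S inside the "kernel": each iteration reuses local_matrix. */
    for (size_t s = s_begin; s < s_end; ++s) {
        const float *in = alm_in + s * N_COEFF;
        float *out = alm_out + s * N_COEFF;
        for (size_t row = 0; row < N_COEFF; ++row) {
            float acc = 0.0f;
            for (size_t col = 0; col < N_COEFF; ++col)
                acc += local_matrix[row * N_COEFF + col] * in[col];
            out[row] = acc;
        }
    }
}
```

With this shape, the number of S values handled per work group (s_end - s_begin) becomes a tuning parameter: larger ranges amortize the matrix load better but leave fewer work groups to schedule. The second option is not shown here; it would presumably replace the inner col loop with operations on float vectors.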