NPRG042 Programming in Parallel Environment
Labs 04 - SYCL
This lab uses the Chimera cluster (hpc.troja.mff.cuni.cz) instead of parlab. The Chimera head node uses the same LDAP credentials but a different NFS storage for home directories, and it gives you access to our GPU worker nodes.
Listing SYCL platforms and devices
Copy the /home/teaching/NPRG042/labs/sycl-list-devs directory to your home and execute the run.sh script. It compiles and runs the executable that lists all SYCL platforms and devices available on the system. Experiment with the number of GPUs used for execution (check the trailing 1 in ${SARGS}:1; it adjusts the number of GPUs in the --gpus option of srun), and observe the output. One Volta server has 4 GPUs (do not use higher numbers).
At present, only the Volta servers have SYCL properly installed, but those are the servers we would like to use anyway (since they have 11 GPUs in total).
Image blur
The main objective of this lab is to implement a data-parallel version of the image Blur stencil using SYCL. The blur stencil receives a grey-scale image (pixels are represented as floats) and a radius parameter as input. It then recalculates the pixel values independently by computing a weighted average of all pixels in a surrounding area, the size of which is determined by the radius. The weights are inverse Manhattan distances from the central pixel, where 5 is used as a constant weight for the central pixel.
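As a worked illustration of the weighting, assume radius = 1 and a pixel far enough from the border that no clamping applies. The 3×3 window then holds the central pixel (weight 5), four edge neighbours at Manhattan distance 1 (weight 1 each), and four diagonal neighbours at distance 2 (weight 1/2 each). The symbols W, w and p are introduced here only for this illustration:

```latex
W = 5 + 4 \cdot \frac{1}{1} + 4 \cdot \frac{1}{2} = 11,
\qquad
p'_{x,y} = \frac{1}{W} \sum_{i=-1}^{1} \sum_{j=-1}^{1} w_{i,j} \, p_{x+i,\,y+j},
\qquad
w_{i,j} =
\begin{cases}
5 & \text{if } i = j = 0, \\
\dfrac{1}{|i| + |j|} & \text{otherwise.}
\end{cases}
```

So, for example, a single bright pixel of value 11 surrounded by zeros would be blurred to 5 · 11 / 11 = 5 at its own position.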
Assuming x and y are the coordinates of the pixel, the pseudocode for accumulating the sum and weights values, from which the new pixel is computed as sum / weights, is as follows:
for (int dy = y - radius; dy <= y + radius; ++dy) {
    for (int dx = x - radius; dx <= x + radius; ++dx) {
        // Manhattan distance of (dx, dy) from the central pixel (x, y)
        int distance = std::abs(dx - x) + std::abs(dy - y);
        // inverse distance; the central pixel gets the constant weight 5
        T weight = (distance > 0) ? (T)1 / (T)distance : (T)5;
        weights += weight;
        sum += image[dx, dy] * weight;
    }
}
Let us emphasize that the actual window traversed by dx, dy needs to be clamped to the image boundaries.
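The clamping can be sketched in plain serial C++ as follows. This is only an illustration under stated assumptions, not the code from the lab sources: the helper name blur_pixel and the row-major std::vector<float> image layout are hypothetical, and the provided image wrapper may look different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of the lab sources): computes the blurred
// value of pixel (x, y) in a row-major grey-scale image of width * height floats.
float blur_pixel(const std::vector<float>& image, int width, int height,
                 int x, int y, int radius)
{
    assert(radius >= 0 && x >= 0 && x < width && y >= 0 && y < height);

    // Clamp the window to the image boundaries.
    const int x0 = std::max(x - radius, 0);
    const int x1 = std::min(x + radius, width - 1);
    const int y0 = std::max(y - radius, 0);
    const int y1 = std::min(y + radius, height - 1);

    float sum = 0.0f;
    float weights = 0.0f;
    for (int dy = y0; dy <= y1; ++dy) {
        for (int dx = x0; dx <= x1; ++dx) {
            // Manhattan distance from the central pixel.
            const int distance = std::abs(dx - x) + std::abs(dy - y);
            // Inverse distance; the central pixel gets the constant weight 5.
            const float weight =
                (distance > 0) ? 1.0f / static_cast<float>(distance) : 5.0f;
            weights += weight;
            sum += image[static_cast<std::size_t>(dy) * width + dx] * weight;
        }
    }
    // The window always contains the central pixel, so weights >= 5.
    return sum / weights;
}
```

Because the result for each pixel depends only on the input image, such a routine could be evaluated for all pixels independently, which is what makes the stencil a natural fit for a data-parallel kernel.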
An example of an image blurred with radius = 5 is shown below:
Compilation and execution
The initial source code with the serial solution is ready in the /home/teaching/NPRG042/labs/sycl-blur directory on hpc.troja.mff.cuni.cz. The initial directory holds the main file blur.cpp that you need to modify. Implement the sycl_blur() function using the serial_blur() function as a reference. The shared directory contains the image wrapper implementation and the stopwatch class. Images are available in the data directory.
For compilation, you may use the provided Makefile. Note that it uses the Intel DPC++ compiler (icpx) with the appropriate SYCL flags.
For your convenience, you may use the provided run.sh. This is a bash script (not an sbatch script); it calls make and executes the compiled executable using separate srun invocations, since this task should be quite fast and it might be better to run it directly in the terminal.
If you wish to use srun yourself, do not forget to add the --gpus option to allocate a GPU. --gpus=1 will allocate one GPU (any available). --gpus=V100:1 will allocate one Volta GPU (those you should use primarily since we have 11 of them and only Voltas are SYCL-enabled). Please note that there are two versions of Volta GPUs in the HPC cluster; for reference measurements, you may fix which worker is being targeted using the -w option (nodes are named volta01-volta05).
Always remember to use the --gpus option with srun. Otherwise, srun will hang (due to an error), even if you do not require a GPU.