NPRG058: Advanced Programming in Parallel Environment
Final Assignment
The final assignment is to implement the k-medoids algorithm to cluster image feature signatures using SQFD as a distance function.
The SQFD was introduced in the second CUDA lecture and is reviewed below. The serial implementation is available on NFS at /home/_teaching/advpara/final-kmedoids, and you can find the same code in your prepared Git repository. You can use any parallel approach and any technologies you deem fit; however, we expect you to use a combination of at least two different technologies.
Algorithm description
The k-medoids algorithm is quite similar to k-means clustering. It divides a set of objects into k clusters based on their similarity, defined by a given distance function. Unlike k-means, where the objects are points in R^d and Euclidean distance defines (dis)similarity, k-medoids makes more relaxed assumptions. The clustered objects and the given (dis)similarity function d(o_1, o_2) can be anything; the only assumption is that d conforms to the metric axioms. Therefore, the assignment step of the algorithm is virtually the same as in k-means, but the construction of means (centroids) is quite different.
Since we cannot construct the centroid of a cluster using vector operations (like in k-means), we choose one of the assigned objects as a representative (medoid) of the cluster. The best object for the job is the one that lies closest to the center, so we compute pairwise distances between all objects within one cluster and then select the object with the lowest sum of distances to the other cluster members.
The input dataset comprises feature signatures of images. A feature signature is a list of tuples (f_i, w_i), where f_i is a feature vector in R^d of fixed dimensionality (d = 7 in our case) and w_i is its weight (in R+). Signatures may differ in size, so that simpler images are represented by shorter lists and vice versa. Our signatures are typically tens to hundreds of features (tuples) long.
The SQFD (Signature Quadratic Form Distance) is used as the dissimilarity function; it naturally handles input signatures of different sizes. Its computation was described in detail in the lectures; however, here is a quick refresher:
sqfd(o, q) = sqrt((w_o | -w_q) · A_f · (w_o | -w_q)^T), where o and q are the input feature signatures, | is vector concatenation (note that q's weights enter with negated sign), and A_f is a similarity matrix defined as
A_f(i,j) = exp(-α L_2^2(f_i, f_j)), where f_i is the i-th feature of the concatenated vector of features (o's features followed by q's), L_2^2 is the squared Euclidean distance, and α is a constant tuning parameter.
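A minimal serial sketch of the formulas above (illustrative C++; the `Feature`/`Signature` types and the layout are our own assumptions, not the provided baseline's data structures):

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t DIM = 7;  // fixed feature dimensionality d = 7

struct Feature {
    std::array<double, DIM> f;  // feature vector in R^7
    double w;                   // weight in R+
};

using Signature = std::vector<Feature>;

// Gaussian similarity A_f(i,j) = exp(-alpha * ||f_i - f_j||^2).
static double similarity(const Feature& a, const Feature& b, double alpha) {
    double l2sq = 0.0;
    for (std::size_t k = 0; k < DIM; ++k) {
        double diff = a.f[k] - b.f[k];
        l2sq += diff * diff;
    }
    return std::exp(-alpha * l2sq);
}

// sqfd(o, q) = sqrt(w * A_f * w^T), where w = (w_o | -w_q) is the
// concatenation of o's weights with q's negated weights.
double sqfd(const Signature& o, const Signature& q, double alpha) {
    // Conceptually concatenate the two signatures, negating q's weights.
    std::vector<Feature> all(o);
    for (Feature feat : q) { feat.w = -feat.w; all.push_back(feat); }

    double sum = 0.0;
    for (std::size_t i = 0; i < all.size(); ++i)
        for (std::size_t j = 0; j < all.size(); ++j)
            sum += all[i].w * all[j].w * similarity(all[i], all[j], alpha);

    // The quadratic form is non-negative for this similarity; clamp tiny
    // negative values caused by floating-point rounding before the root.
    return std::sqrt(sum < 0.0 ? 0.0 : sum);
}
```

Note that identical signatures yield distance 0, as the metric axioms require.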
Remember that the best way to parallelize SQFD internally is to evaluate all elements of A_f independently, multiply them by the appropriate weights (according to distributivity rules), and sum them up using a parallel reduction.
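A CPU-side sketch of this scheme using OpenMP's reduction clause (the same idea maps to a CUDA grid with a block-level reduction); the flattened weight/feature layout and the function name are our own assumptions:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Parallel SQFD sketch: every (i, j) element of A_f is independent, so we
// evaluate all m*m elements in one flat loop, multiply each by its pair of
// weights, and sum them up with a reduction.
// `w` holds the concatenated weights (o's weights, then q's negated ones);
// `f` holds the matching features, flattened as m rows of `dim` doubles.
double sqfd_parallel(const std::vector<double>& w,
                     const std::vector<double>& f,
                     std::size_t dim, double alpha) {
    const std::ptrdiff_t m = (std::ptrdiff_t)w.size();
    double sum = 0.0;
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t idx = 0; idx < m * m; ++idx) {
        std::size_t i = (std::size_t)(idx / m), j = (std::size_t)(idx % m);
        double l2sq = 0.0;
        for (std::size_t k = 0; k < dim; ++k) {
            double diff = f[i * dim + k] - f[j * dim + k];
            l2sq += diff * diff;
        }
        // One element of A_f, already multiplied by its weights.
        sum += w[i] * w[j] * std::exp(-alpha * l2sq);
    }
    return std::sqrt(sum < 0.0 ? 0.0 : sum);
}
```

The flat index over m*m pairs keeps the work perfectly balanced, which is exactly what a CUDA thread grid or a parallel-for scheduler wants.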
Technical details
There are only two strong evaluation criteria:
- The solution must utilize a combination of (at least) two parallel technologies. A natural choice would be CUDA + MPI; however, using CUDA (multi-GPU) with smart host-based scheduling or MPI + an elaborate CPU parallel solution is also acceptable.
- The solution must exhibit sufficient speedup (given the selected technologies) -- i.e., the speedup one would expect from a straightforward, decent implementation.
Also provide a README file with:
- a brief summary explaining how your solution was implemented (mainly which parallel technologies were used and how),
- information on how to build and execute the solution (especially if you use MPI or if you added some extra CLI arguments),
- and the measured (expected) speedup compared to the provided serial baseline (corresponding to the execution arguments described in the previous point).
Submit your solution into the prepared Git repository and, once it is complete, notify the teachers via Mattermost. There is no deadline, but it is strongly recommended to submit your solution at least one week before you plan to attend the oral exam, so that there is enough time to review the solution and award you credit.