NPRG042 Programming in Parallel Environment

Assignment 1


Parasheet
C# .NET Tasks
Assigned: 2.3.2026
Deadline: 13.3.2026 23:59 (CET)
ReCodEx: assignment
Results: w201 (32 threads), w202 (64 threads)
speedup points
2.5× or less 0
2.5× to 6× 1
6× to 10× 2
10× to 15× 3
15× or more 4

The objective is to write an updater for an Excel-like sheet (with values and functions) using C# .NET built-in parallel abstractions (e.g., Tasks). The sheet is represented as a 2D array of cells. Each cell may contain either a literal value (we use only integers for simplicity) or a function. A function is an arithmetic operation that takes values from other cells as inputs and evaluates to an integer as well (i.e., each cell has an integer value that is either fixed or computed). When a cell is updated (its value is changed), all function cells that (transitively) depend on it must be recalculated.

To simplify debugging, we will use only three functions (SUM, MIN, and MAX). However, to simulate heavy computations (and to justify parallel evaluation), function cells have randomly assigned computational cost (a positive integer), which denotes how many times the function computed (within each update). This also helps with the verification of the parallel solution, as the source cells are accessed during each iteration and we can verify that each iteration yields the same result (i.e., when a cell is being recalculated, its input cells should not change). Furthermore, the dependencies between cells do not form a loop (i.e., the sheet is a directed acyclic graph), so we can always find a valid order of evaluation.

To achieve better performance, the updates and recalculations should run in parallel. The updates are issued sequentially (as if the user were editing the sheet), but they may be resolved concurrently. For the sake of simplicity, only the cells with literal values are being updated in our scenarios, so the dependency graph of the functions does not change over time. Once in a while, the user may ask for the current state of the sheet (a printout). At that point, the main thread waits for all pending updates to complete and then prints the sheet (cell values). The printing time is not measured, but the waiting time is.

Interfaces

The code is designed so that the updating algorithm is separated from the sheet representation, and all the related tasks (data loading, time measurement, etc.) are already implemented in the framework. Your job is to implement the IParasheetUpdater interface, which is defined as follows:

    public interface IParasheetUpdater
    {
        void AttachSheet(IParasheet sheet);
        Task? UpdateValue(int row, int column, int value);
    }

The AttachSheet method is called once at the beginning of the execution. You may perform initialization of any support structures there. The UpdateValue method is called whenever a cell with a literal value is updated. It must correctly handle the update of the cell value and trigger the recalculation of all dependent function cells. The method returns a Task that represents the update operation, so the caller can wait for its completion if needed (e.g., before printing the sheet). You may return null if there is no need to wait for the update (i.e., when the update was handled synchronously).

    public interface IParasheet
    {
        int RowCount { get; }
        int ColumnCount { get; }
        ICell GetCell(int row, int column);
    }

The Parasheet implementation is merely a 2D container for cells, which have the following interface:

    public interface ICell
    {
        int Row { get; }
        int Column { get; }
        IEnumerable<ICell> GetDependencies();
        IEnumerable<ICell> GetDependents();
        int GetValue();
        void SetValue(int newValue);
        void SetFunction(Func<IEnumerable<ICell>, int> function, IEnumerable<ICell> dependencies);
        bool UpdateValue();
    }

The value of literal cells is set by SetValue. It should not be used on function cells. The UpdateValue method is used to trigger the recalculation of a function cell, and it returns true if the value of the cell has changed (false result could be used for optimizations of transitive recalculations). GetDependencies returns a container representing all cells that are used as inputs for the function (value cells have no dependencies). GetDependents returns a container representing all cells that use the current cell as an input (i.e., reverse dependencies). GetValue returns the current value of the cell (either a literal or a function result after the last update). SetFunction should not be called in your updater implementation (it is used only for initialization of the sheet in the framework).

The parasheet and cell classes are already implemented by the framework; your job is to implement the updater using the prescribed interfaces. The cell methods are not thread-safe, and it is your responsibility to ensure that they are not called concurrently on the same cell (except for the cases of concurrent reads).

Implementation details

The updater must specifically ensure that the following conditions are met:

On the other hand, it is allowed to implement optimizations that will put the sheet (temporarily) in an inconsistent state. Namely, you may skip a scheduled update of a function cell if a new affecting change is detected in the sheet before the update itself is performed. For example, let us have value cells V1 and V2 and function cells F1 (computed from V1) and F2 (computed from F1 and V2). If V1 is changed, F1 needs to be updated, which may then trigger the update of F2. If V2 is changed before the update of F2 begins, it does not have to be scheduled for update again. In the sequential case, F2 would be updated twice (first after the V1 change and then after the V2 change), but in the parallel case, it may be updated only once (if the timing is right).

If one of the cells is being updated and one of its inputs is changed during the update, it will be detected by the framework and the cell will be marked as inconsistent (this flag is internal, not accessible via the interface). Your implementation must make sure that such a situation does not happen or that inconsistent cells are re-evaluated (before the next printout). Using a similar sheet from the previous example (V1, V2, F1 <- V1, F2 <- (V2, F1)), let us consider the following scenario: V2 is updated so F2 begins to update. Meanwhile, V1 is updated, so F1 begins to update. F1 completes first, which changes the input for F2 during its computation. This is OK only if F2 is re-evaluated again (after its pending, inconsistent update completes and before the next printout of the sheet).

Submission

Your implementation must be wrapped in the ParasheetUpdater class (implementing IParasheetUpdater interface) in the ParasheetUpdater.cs file, which is also the only file you must submit in ReCodEx. Do not rely on any implementation aspects of the framework, only the provided interfaces.

Your solution will be tested on .NET Core 9.0 (which is both in ReCodEx and on Parlab). The testing framework, serial implementation, and some testing data are available at /home/_teaching/para/01-parasheet or you may simply download them here. The framework can be compiled using the dotnet command:

$> dotnet build -c Release ./Parasheet.sln

The compiled binary will appear in the ./bin subdirectory.

When using parlab cluster, perform even the compilation on the workers using srun:

$> srun -p mpi-homo-short -A nprg042s dotnet build -c Release ./Parasheet.sln

Testing

Please note that ReCodEx is only a first-pass verification. Your solution will be subjected to performance evaluation on Parlab after the deadline (all solutions are collected and tested together). It will be measured using 32 and 64 cores on w20x machines. You may (and should) try it yourself:

$> srun -c 64 -n 1 -p mpi-homo-short -A nprg042s ./Parasheet --minCost 1000 --maxCost 20000 --updater parasheet <sheet.csv> <update.plan>

To determine your speedup, use --updater serial to execute the serial version of the updater and compare the times. The program writes two numbers to stderr: the first is the measured time (in seconds), which will be used in the evaluation. The second is total blocking time (excluding time spent waiting for tasks started by the framework), and it is for your information only.

For the final evaluation, we will use the same execution parameters and similar testing data you are given in the data sub-directory (just generated with different random seeds). You may generate more testing data using the generator.py. The testing data were generated as follows:

python generator.py --rows 32 --cols 32 --steps 40 --functions 800 --avg-degree 16 --pattern random random-32
python generator.py --rows 32 --cols 32 --steps 40 --pattern grid grid-32
python generator.py --rows 128 --cols 64 --steps 1024 --print-frequency 0 --clusters 1024 --pattern clusters clusters-1k
python generator.py --rows 32 --cols 32 --steps 4 --print-frequency 1.0 --functions 1000 --layers 4 --pattern layers layers-32