Phase I Report¶

Parallel Performance Evaluation (Process-based Matrix Multiplication)

This page includes the full standalone HTML report inside the Phase I section of the website.

Use the button below if you want to open the report as a separate full page.

Embedded Report¶

6) Discussion¶

In theory, parallel processes can reduce execution time by dividing computation across cores. However, Phase I results show that for the tested sizes (1200– 2400), using 4 processes provided minimal or negative performance impact.

Why performance did not improve¶

Process Creation Overhead¶

Creating multiple processes with fork() introduces overhead that can dominate short executions.

Synchronization Cost¶

The parent must wait for all children (wait()), so total time includes coordination overhead.

Problem Size Threshold¶

For these matrix sizes, the computation may not be large enough to outweigh overhead.

For significantly larger matrices (e.g., N=10000), performance gains may become more likely because computation time would dominate process overhead.

6.1 Implications for Parallel Programming¶

Parallelism improves performance only when workload exceeds overhead cost
Using all available cores does not guarantee better performance
There is a minimum problem size where parallelization becomes beneficial

6.2 Experimental Limitations¶

Virtualization overhead: VM layer may add performance penalties
ARM64 architecture: Different characteristics than x86 systems
Initialization overhead: Random initialization adds non-parallelizable work

7) Conclusion¶

In Phase I, we successfully:

Set up Ubuntu Linux (24.04.3 LTS ARM64) in a virtual machine
Installed GCC and development tools
Compiled and executed the provided program
Tested 6 configurations
Collected execution-time data across all runs

Key findings¶

Parallel processes did not provide meaningful speedup for these sizes
Process creation and synchronization overhead likely dominated runtime
Larger problem sizes may be required to observe clear benefits

Working VM

Installing Kernel

Log in and update

Passing Memory Tests

Future Work Recommendations¶

Thread-Based Parallelism¶

Evaluate pthread-based implementations and compare overhead.

Improved Work Distribution¶

Explore alternative decomposition strategies for better load balance.

Overhead Reduction¶

Investigate techniques such as process pooling or reduced synchronization.

Larger Problem Sizes¶

Test matrices larger than 2400×2400 to identify the crossover point.

Comparative Benchmarks¶

Compare against sequential and optimized baselines to quantify overhead costs.

Downloadable Version¶

Open the PDF report

Phase I Report¶

Embedded Report¶

6) Discussion¶

Why performance did not improve¶

Process Creation Overhead¶

Synchronization Cost¶

Problem Size Threshold¶

6.1 Implications for Parallel Programming¶

6.2 Experimental Limitations¶

7) Conclusion¶

Key findings¶

Future Work Recommendations¶

Thread-Based Parallelism¶

Improved Work Distribution¶

Overhead Reduction¶

Larger Problem Sizes¶

Comparative Benchmarks¶

Downloadable Version¶