
Phase I Report

Parallel Performance Evaluation (Process-based Matrix Multiplication)


6) Discussion

In theory, parallel processes can reduce execution time by dividing computation across cores. However, the Phase I results show that for the tested matrix sizes (1200–2400), using 4 processes yielded minimal or even negative performance gains.

Why performance did not improve

  • Process Creation Overhead

Creating multiple processes with fork() introduces overhead that can dominate short executions.

  • Synchronization Cost

The parent must wait for all children (wait()), so total time includes coordination overhead.

  • Problem Size Threshold

For these matrix sizes, the computation may not be large enough to outweigh overhead.

For significantly larger matrices (e.g., N=10000), performance gains may become more likely because computation time would dominate process overhead.


6.1 Implications for Parallel Programming

  • Parallelism improves performance only when workload exceeds overhead cost
  • Using all available cores does not guarantee better performance
  • There is a minimum problem size where parallelization becomes beneficial

6.2 Experimental Limitations

  • Virtualization overhead: VM layer may add performance penalties
  • ARM64 architecture: Different characteristics than x86 systems
  • Initialization overhead: Random initialization adds non-parallelizable work

7) Conclusion

In Phase I, we successfully:

  • Set up Ubuntu Linux (24.04.3 LTS ARM64) in a virtual machine
  • Installed GCC and development tools
  • Compiled and executed the provided program
  • Tested 6 configurations
  • Collected execution-time data across all runs

Key findings

  • Parallel processes did not provide meaningful speedup for these sizes
  • Process creation and synchronization overhead likely dominated runtime
  • Larger problem sizes may be required to observe clear benefits

[Setup screenshots: Working VM; Installing Kernel; Log in and update; Passing Memory Tests]


8) Future Work Recommendations

  • Thread-Based Parallelism

Evaluate pthread-based implementations and compare overhead.

  • Improved Work Distribution

Explore alternative decomposition strategies for better load balance.

  • Overhead Reduction

Investigate techniques such as process pooling or reduced synchronization.

  • Larger Problem Sizes

Test matrices larger than 2400×2400 to identify the crossover point.

  • Comparative Benchmarks

Compare against sequential and optimized baselines to quantify overhead costs.

