☕Implementation of Parallel Matrix Multiplication Methods Using the FOX Algorithm on Peking University's High-performance Computing System
Yes We Code
- Reference Documents
- Thomas Anastasio, Example of Matrix Multiplication by Fox Method
- Jaeyoung Choi, A New Parallel Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
- Ned Nedialkov, Communicators and Topologies: Matrix Multiplication Example
- Source Codes
- C language
- Fortran
- Source Codes' Contents
- Code Tests
- Dell XPS8900
- Code Test on Dell XPS8900 Workstation (Intel® Core™ i7-6700K Processor)
- Analyzing MPI Performance Using Intel Trace Analyzer
- PKU-HPC
- Lenovo X8800 Supercomputer Platform
- Code Performance Tests on X8800 Supercomputer Platform's CPU Node (Intel® Xeon® Processor E5-2697A v4)
- Code Performance Tests on X8800 Supercomputer Platform's MIC Node (Intel® Xeon Phi™ Processor 7250)
- Code Tests' Contents
- Dell XPS8900
- Reports
- 1801111621_洪瑶_并行程序报告.pdf
- 并行程序报告.docx
- 洪瑶_1801111621并行程序设计报告.pptx
- Parallel FOX Algorithm Project Report.pptx (will be added in the future)
- Parallel FOX Algorithm Project Report Paper.tex (will be added in the future)
- Parallel FOX Algorithm Project Report Paper.pdf (will be added in the future)
- Reports' Contents
- Imagines
- FOX.png
- FOX Stage Whole.JPG
- FOX Stage Loading Balance.png
- Reduction
- Owner-Computes Rule
- Pipeline Parallelism:
- On each process, the matrix computation is divided into P stages (P Supercomputing Steps in a Process)
- Data Parallelism:
- Every process computes its local matrix product simultaneously, within the same Computing Step
- Mathematical Modeling of Matrix Multiplication
- Time Complexity
- Storage Complexity
- Example Implementation in C Language
    /* A, B and C are n x n arrays; C must be zero-initialized beforehand. */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
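For reference, the triple loop above computes the standard matrix product. Assuming square $n \times n$ matrices with zero-based indices (an assumption taken from the loop bounds, not stated explicitly in the source), the model and its costs can be written as:

```latex
\begin{aligned}
c_{ij} &= \sum_{k=0}^{n-1} a_{ik}\, b_{kj}, \qquad 0 \le i, j \le n-1, \\
T_{\text{serial}}(n) &= n^{2} \cdot n \cdot O(1) = O(n^{3}), \\
S(n) &= 3n^{2} = O(n^{2}).
\end{aligned}
```

Each of the $n^2$ entries of $C$ needs $n$ multiply-add pairs, which gives the cubic time, and the storage term counts the three matrices $A$, $B$ and $C$.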
- Basic Flow
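- Matrix $A$ has dimension $m \times n$, and matrix $B$ has dimension $n \times k$.
- Compute matrix $C = AB$ in parallel.
- Let $p$ be the number of processors, and let $q = \sqrt{p}$ be an integer that divides $m$, $n$ and $k$.
- Create a Cartesian topology with process mesh $P_{ij}$, where $i = 0, \dots, q-1$ and $j = 0, \dots, q-1$.
- Denote $\hat{m} = m/q$, $\hat{n} = n/q$, $\hat{k} = k/q$.
- Distribute $A$ and $B$ by blocks over the $p$ processes such that $A_{ij}$ is an $\hat{m} \times \hat{n}$ block and $B_{ij}$ is an $\hat{n} \times \hat{k}$ block, both stored on process $P_{ij}$.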
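The grid setup in the steps above can be sketched in plain C, without MPI, to show the arithmetic involved. This is a minimal sketch under stated assumptions: the names `GridPos`, `grid_order`, `grid_position` and `block_edge` are hypothetical and do not come from the repository, and ranks are assumed to be laid out on the mesh in row-major order, which is the numbering MPI Cartesian topologies use for coordinates.

```c
#include <assert.h>

/* Hypothetical helper type, for illustration only. */
typedef struct {
    int q;    /* the process mesh is q x q */
    int row;  /* mesh row of this process, 0..q-1 */
    int col;  /* mesh column of this process, 0..q-1 */
} GridPos;

/* Return q such that q * q == p, or -1 if p is not a perfect square. */
int grid_order(int p)
{
    int q = 0;
    if (p <= 0)
        return -1;
    while (q * q < p)
        q++;
    return (q * q == p) ? q : -1;
}

/* Map a rank in 0..q*q-1 onto the q x q mesh in row-major order. */
GridPos grid_position(int rank, int q)
{
    GridPos g;
    assert(q > 0 && rank >= 0 && rank < q * q);
    g.q = q;
    g.row = rank / q;
    g.col = rank % q;
    return g;
}

/* Edge length of one block when q divides dim exactly, -1 otherwise. */
int block_edge(int dim, int q)
{
    return (q > 0 && dim % q == 0) ? dim / q : -1;
}
```

For example, with 16 processes `grid_order(16)` gives a 4 x 4 mesh, rank 6 lands on row 1, column 2, and a dimension of 12 yields blocks with edge length 3.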
- Details
- Partitions of Matrices A, B and C (indices in the mathematical notation start from 1)
- Data Distribution on the 2-D Cartesian Topology Process Mesh (indices in the mathematical formulas start from 1)
- Data Mapping
- The partition may not be perfect, so not every sub-matrix is square. That is not a problem in itself, apart from the load imbalance it causes across processes.
- Unbalanced Partition
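One common way to handle a dimension that the mesh order does not divide is to give the first few blocks one extra row or column. The sketch below illustrates that scheme only; it is an assumption, not necessarily the partition used in this repository, and the function names `block_len` and `block_start` are hypothetical.

```c
#include <assert.h>

/* Length of block number idx when dim items are split into q blocks.
   The first dim % q blocks receive one extra item. */
int block_len(int dim, int q, int idx)
{
    assert(q > 0 && dim >= 0 && idx >= 0 && idx < q);
    return dim / q + (idx < dim % q ? 1 : 0);
}

/* Index of the first item of block number idx under the same split. */
int block_start(int dim, int q, int idx)
{
    int rem;
    assert(q > 0 && dim >= 0 && idx >= 0 && idx < q);
    rem = dim % q;
    return idx * (dim / q) + (idx < rem ? idx : rem);
}
```

Splitting 10 rows over 3 mesh rows this way gives blocks of 4, 3 and 3 rows starting at rows 0, 4 and 7, so block sizes differ by at most one and the imbalance stays small.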
- Mathematical Modeling of Sub-Matrix Multiplication
- Parallelism type: Data parallelism with Pipeline parallelism
- Rewrite the formula of Sub-Matrix Multiplication as q−1 Supercomputing Steps
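As a sketch of this rewriting, assume a $q \times q$ block mesh with zero-based block indices (the figures in this document count from 1). The block product can then be regrouped so that, for each row $i$, the sum starts at the diagonal block and wraps around, which is the order in which Fox's algorithm visits the terms:

```latex
C_{ij} \;=\; \sum_{k=0}^{q-1} A_{ik}\, B_{kj}
       \;=\; \sum_{s=0}^{q-1} A_{i,\,(i+s) \bmod q}\; B_{(i+s) \bmod q,\; j},
\qquad 0 \le i, j \le q-1 .
```

At stage $s$, the process at mesh position $(i, j)$ uses the block $A_{i,(i+s) \bmod q}$, which is broadcast along mesh row $i$, together with the block $B_{(i+s) \bmod q,\, j}$, which reaches it after $s$ upward shifts along mesh column $j$.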
- Parallel Modeling Algorithm Operations on each step:
- Algorithm Analysis on each Supercomputing Step
- Communication in total
- Computing in total
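A rough cost model for the two totals above can be written down under explicit assumptions: square $n \times n$ matrices, $p = q^2$ processes, a per-flop time $t_c$, a linear message cost of $t_s + t_w w$ for $w$ words, and a tree-based broadcast taking $\lceil \log_2 q \rceil$ message steps. The symbols $t_c$, $t_s$ and $t_w$ are introduced here for illustration, and real `MPI_Bcast` implementations may behave differently.

```latex
\begin{aligned}
T_{\text{comp}} &= q \cdot 2\left(\frac{n}{q}\right)^{3} t_c \;=\; \frac{2n^{3}}{p}\, t_c, \\
T_{\text{comm}} &\approx q \left(\lceil \log_2 q \rceil + 1\right)\left(t_s + t_w\, \frac{n^{2}}{p}\right).
\end{aligned}
```

Each of the $q$ stages multiplies two $(n/q) \times (n/q)$ blocks, broadcasts one block of $n^2/p$ words along a mesh row, and shifts one block of the same size along a mesh column.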
    n_bar = n / grid->q;
    Set_to_zero(local_C);

    /* Ring neighbours in this process's mesh column: B blocks shift upward. */
    source = (grid->my_row + 1) % grid->q;
    dest = (grid->my_row + grid->q - 1) % grid->q;

    /* Buffer for the A block received from the broadcast root. */
    temp_A = Local_matrix_allocate(n_bar);

    for (stage = 0; stage < grid->q; stage++) {
        /* Mesh column of the process in this row whose A block is broadcast. */
        bcast_root = (grid->my_row + stage) % grid->q;
        if (bcast_root == grid->my_col) {
            MPI_Bcast(local_A, 1, local_matrix_mpi_t, bcast_root, grid->row_comm);
            Local_matrix_multiply(local_A, local_B, local_C);
        } else {
            MPI_Bcast(temp_A, 1, local_matrix_mpi_t, bcast_root, grid->row_comm);
            Local_matrix_multiply(temp_A, local_B, local_C);
        }
        /* Send local_B to the process above and receive from the one below. */
        MPI_Sendrecv_replace(local_B, 1, local_matrix_mpi_t, dest, 0, source, 0,
                             grid->col_comm, &status);
    }
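To check the stage schedule of the kernel above without an MPI installation, the same order of block products can be emulated serially. This is a minimal sketch under stated assumptions: the function names `block_multiply_add` and `fox_emulate` are hypothetical, the matrices are square, row-major and of size n x n, and q divides n. It replaces the broadcast and the shift with direct indexing, so it shows only which blocks meet at each stage, not the communication itself.

```c
#include <assert.h>
#include <stddef.h>

/* c += a * b for nb x nb blocks stored inside n x n row-major matrices. */
static void block_multiply_add(const double *a, const double *b, double *c,
                               int n, int nb)
{
    for (int i = 0; i < nb; i++)
        for (int k = 0; k < nb; k++)
            for (int j = 0; j < nb; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Serial emulation of the Fox schedule on a q x q mesh.
   A, B and C are n x n row-major arrays; q must divide n; C is overwritten. */
void fox_emulate(const double *A, const double *B, double *C, int n, int q)
{
    assert(q > 0 && n > 0 && n % q == 0);
    int nb = n / q;

    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (int stage = 0; stage < q; stage++) {
        for (int row = 0; row < q; row++) {
            /* Block column of A that mesh row `row` broadcasts at this stage;
               it is also the block row of B held after `stage` upward shifts. */
            int k = (row + stage) % q;
            for (int col = 0; col < q; col++) {
                const double *a = A + (size_t)row * nb * n + (size_t)k * nb;
                const double *b = B + (size_t)k * nb * n + (size_t)col * nb;
                double *c = C + (size_t)row * nb * n + (size_t)col * nb;
                block_multiply_add(a, b, c, n, nb);
            }
        }
    }
}
```

Calling `fox_emulate(A, B, C, 4, 2)` on 4 x 4 inputs should produce the same C as the plain triple loop, since every block pair (A_ik, B_kj) is visited exactly once over the q stages.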
- Intel® Trace Analyzer statistics collected while the FOX kernel is executing
- Intel® Trace Analyzer load-balance analysis while the FOX kernel is executing
There may be many mistakes in both the documents and the code, owing to the limits of our knowledge and ability. As a result: THESE DOCUMENTS AND CODES ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. I MAKE NO WARRANTIES, EXPRESS OR IMPLIED, THAT THEY ARE FREE OF ERROR.
You may use and copy these works for any academic purpose, except copying them merely to finish your homework or republishing them without properly crediting the original author.