What is blocked matrix multiplication?

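When two block matrices have the same shape and their diagonal blocks are square matrices, they multiply in the same way as ordinary matrices, with blocks taking the place of scalar entries. For example,

$$
\begin{pmatrix} A_1 & B_1 \\ C_1 & D_1 \end{pmatrix}
\begin{pmatrix} A_2 & B_2 \\ C_2 & D_2 \end{pmatrix}
=
\begin{pmatrix} A_1 A_2 + B_1 C_2 & A_1 B_2 + B_1 D_2 \\ C_1 A_2 + D_1 C_2 & C_1 B_2 + D_1 D_2 \end{pmatrix}
$$

Note that the usual rules of matrix multiplication hold even when the block matrices are not square, provided the block sizes correspond.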
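The 2x2 block product rule can be checked numerically. The following is a minimal sketch in plain Python, using lists of lists; the helper names (`matmul`, `matadd`, `split`, `join`, `block_matmul`) and the split positions `r`, `s`, `c` are illustrative assumptions, not part of the original answer. It assumes every block is non-empty.

```python
def matmul(X, Y):
    """Ordinary matrix product of two lists-of-lists."""
    inner = len(Y)
    assert all(len(row) == inner for row in X), "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(inner))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def matadd(X, Y):
    """Entry-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M, r, c):
    """Partition M into 2x2 blocks: first r rows, first c columns."""
    top, bottom = M[:r], M[r:]
    return ([row[:c] for row in top], [row[c:] for row in top],
            [row[:c] for row in bottom], [row[c:] for row in bottom])


def join(A, B, C, D):
    """Reassemble the block matrix [[A, B], [C, D]] into one matrix."""
    return ([ra + rb for ra, rb in zip(A, B)] +
            [rc + rd for rc, rd in zip(C, D)])


def block_matmul(M1, M2, r, s, c):
    """Multiply M1 by M2 block by block.

    M1 is split after row r and column s; M2 after row s and column c.
    Sharing s is what makes the block sizes correspond.
    """
    A1, B1, C1, D1 = split(M1, r, s)
    A2, B2, C2, D2 = split(M2, s, c)
    return join(
        matadd(matmul(A1, A2), matmul(B1, C2)),
        matadd(matmul(A1, B2), matmul(B1, D2)),
        matadd(matmul(C1, A2), matmul(D1, C2)),
        matadd(matmul(C1, B2), matmul(D1, D2)),
    )


# A non-square example: a 3x4 matrix times a 4x2 matrix.
M1 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
M2 = [[1, 0], [0, 1], [2, 3], [4, 5]]
print(block_matmul(M1, M2, r=1, s=3, c=1) == matmul(M1, M2))
```

The column split of the first matrix and the row split of the second use the same position `s`, which is the sense in which the block sizes must correspond.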

Why is blocked matrix multiplication faster?

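The major difference from an unblocked matrix multiplication is that, because of blocking, we can no longer hold a whole row of A in fast memory. For each iteration of kk, we need to load the block A[ii, kk] into fast memory. However, blocking increases computational density: each block that is loaded is reused for many arithmetic operations before it is replaced, so fewer slow-memory accesses are needed per operation.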
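The loop structure described above can be sketched as follows. This is a plain-Python illustration, not a tuned implementation: the block indices `ii` and `kk` follow the text, while `jj`, the function name `blocked_matmul`, and the `block` parameter are assumptions. Pure Python will not show the cache speedup itself; the point is the order in which blocks are visited.

```python
def blocked_matmul(A, B, block=2):
    """Compute C = A * B by visiting block-sized tiles of A, B, and C."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for ii in range(0, n, block):          # block row of A and C
        for kk in range(0, m, block):      # block column of A, block row of B
            # The tile A[ii, kk] is the data that must be brought into
            # fast memory for this iteration of kk.
            for jj in range(0, p, block):  # block column of B and C
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, p)):
                            C[i][j] += a * B[k][j]
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]
B = [[1, 2], [3, 4], [5, 6]]
print(blocked_matmul(A, B, block=2))
```

The `min(...)` bounds let the matrix dimensions be any size, not only multiples of the block size.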

What is blocking in cache?

Cache blocking is a technique that rearranges data accesses so that subsets (blocks) of data are pulled into cache and operated on there, which avoids repeatedly fetching the same data from main memory.

How do you optimize a matrix multiplication?

Efficient matrix multiplication relies on blocking your matrix and performing several smaller blocked multiplies. Ideally, the size of each block is chosen so that the blocks being worked on fit into cache, which greatly improves performance. The ideal block size depends on the underlying memory hierarchy (how big is the cache?).
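As a rough starting point for choosing a block size, one common rule of thumb (an assumption here, not something stated in the answer above) is to require that three square blocks, one each from the two inputs and the output, fit in the cache at once. The function name, the 8-byte element size, and the 32 KiB cache size below are all illustrative; real tuning is usually done by measurement.

```python
import math


def max_block_size(cache_bytes, elem_bytes=8, resident_blocks=3):
    """Largest b such that resident_blocks tiles of b*b elements fit in cache."""
    elems_per_block = cache_bytes // (resident_blocks * elem_bytes)
    return math.isqrt(elems_per_block)


# Hypothetical 32 KiB cache holding 8-byte floating-point values.
b = max_block_size(32 * 1024)
print(b, 3 * b * b * 8)  # 36 31104
```

In practice the best value is often smaller than this bound, since the cache also holds other data and its associativity limits which addresses can coexist.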

What is the compatibility of block matrix multiplication?

The only requirement is that the blocks be compatible. That is, the sizes of the blocks must be such that all matrix products of blocks that occur make sense. This means that the number of columns in each block of the first matrix must equal the number of rows in the corresponding block of the second matrix.
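The compatibility condition can be expressed as a small check on the partition sizes. This is a hedged sketch: the function name `product_block_shapes` and the representation of a partition as a list of block sizes are assumptions made for illustration.

```python
def product_block_shapes(row_sizes_first, col_sizes_first,
                         row_sizes_second, col_sizes_second):
    """Return the shape of each block of the product, or raise if incompatible.

    The column partition of the first matrix must match the row partition
    of the second, so that every block product that occurs is defined.
    """
    if list(col_sizes_first) != list(row_sizes_second):
        raise ValueError("column blocks of the first matrix do not match "
                         "row blocks of the second")
    return [[(r, c) for c in col_sizes_second] for r in row_sizes_first]


# First matrix: rows split 1+2, columns split 3+1.
# Second matrix: rows split 3+1, columns split 1+1.
print(product_block_shapes([1, 2], [3, 1], [3, 1], [1, 1]))
# [[(1, 1), (1, 1)], [(2, 1), (2, 1)]]
```

The row partition of the first matrix and the column partition of the second are unconstrained; they only determine the block shapes of the result.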

How is block size cache calculated?

In a nutshell, the block offset bits determine your block size (how many bytes are in a cache row, or how many columns, if you will). The index bits determine how many rows are in each set. The capacity of the cache is therefore 2^(blockoffsetbits + indexbits) * #sets. With 4 block offset bits, 4 index bits, and 4 sets, that is 2^(4+4) * 4 = 256 * 4 = 1024 bytes, or 1 kilobyte.
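The arithmetic above can be written out as a short calculation. The function and parameter names are illustrative assumptions; the formula is the one given in the text.

```python
def cache_capacity_bytes(block_offset_bits, index_bits, num_sets):
    """Capacity = 2^(block offset bits + index bits) * number of sets."""
    bytes_per_row = 1 << block_offset_bits  # block size in bytes
    rows_per_set = 1 << index_bits          # rows addressed by the index bits
    return bytes_per_row * rows_per_set * num_sets


print(cache_capacity_bytes(4, 4, 4))  # 1024
```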

How do you find a block in a cache?

To search for a word in the cache:

  1. The set is identified by the index bits of the address.
  2. The tag bits derived from the memory block address are compared with the tag bits associated with the set. If the tag matches, then there is a cache hit and the cache block is returned to the processor.
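The two lookup steps can be sketched as a toy model. Everything here is an assumption made for illustration: the 4-bit offset and index widths, the function names, and the representation of each set as a dictionary from tag to block contents. Real hardware compares all tags in the set in parallel rather than using a dictionary.

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def lookup(cache, addr, offset_bits=4, index_bits=4):
    """Return the byte at addr on a hit, or None on a miss.

    cache is a list of sets; each set maps a tag to its block (bytes).
    """
    tag, index, offset = split_address(addr, offset_bits, index_bits)
    cache_set = cache[index]        # step 1: the index bits identify the set
    block = cache_set.get(tag)      # step 2: compare the tag within that set
    if block is None:
        return None                 # cache miss
    return block[offset]            # cache hit: pick the requested byte


cache = [dict() for _ in range(1 << 4)]   # 16 sets, initially empty
cache[3][0x12] = bytes(range(16))         # install one 16-byte block
print(split_address(0x1234, 4, 4))        # (18, 3, 4)
print(lookup(cache, 0x1234))              # 4
print(lookup(cache, 0x5234))              # None
```

The second lookup maps to the same set but carries a different tag, so it misses.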

What is the inverse of a block matrix?

Notice that the inverse of a block diagonal matrix is also block diagonal. Similarly, the inverse of a block secondary diagonal matrix is block secondary diagonal too, but in transposed partition so that there is a switch between B and C.
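Written out for the 2x2 block case, and assuming every block shown is square and invertible, the two statements read as follows. The names A and D for the diagonal blocks are chosen here for illustration; B and C are the off-diagonal blocks mentioned above.

```latex
\begin{pmatrix} A & 0 \\ 0 & D \end{pmatrix}^{-1}
=
\begin{pmatrix} A^{-1} & 0 \\ 0 & D^{-1} \end{pmatrix},
\qquad
\begin{pmatrix} 0 & B \\ C & 0 \end{pmatrix}^{-1}
=
\begin{pmatrix} 0 & C^{-1} \\ B^{-1} & 0 \end{pmatrix}
```

Multiplying each matrix by its claimed inverse with the block product rule gives the identity matrix, which confirms both formulas. In the second one, B^{-1} sits where C was and C^{-1} where B was, which is the switch the text refers to.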
