We now generalize the vector on the right-hand side into a matrix, which gives us matrix-matrix multiplication.
We have previously discussed using gradient descent to determine the parameters of a linear regression model. However, with matrix multiplication at hand, we can solve this problem without using gradient descent.
Let's start with an example of calculating the product of two matrices, shown in the graph below:
How can we calculate this? In the graph above, the matrix on the left is a 2×3 matrix, and the one on the right is a 3×2 matrix. We can extract the first column of the matrix on the right, which turns the problem into a 2×3 matrix multiplied by a 3×1 column vector, exactly the case we discussed previously. As shown in the graph below, we obtain a 2×1 column vector:
Similarly, by extracting the second column of the matrix on the right and multiplying, we obtain another 2×1 column vector. Concatenating these two column vectors gives the product of the two matrices.
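The column-by-column procedure above can be sketched in Python with NumPy. The matrix values here are illustrative, since the numbers from the graph are not reproduced in the text:

```python
import numpy as np

# Hypothetical matrices standing in for the ones in the graph:
# A is 2x3, B is 3x2 (the original values are not shown here).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7,  8],
              [9, 10],
              [11, 12]])

# Multiply A by each column of B (a 2x3 matrix times a 3x1 vector
# gives a 2x1 vector), then concatenate the resulting columns.
columns = [A @ B[:, j] for j in range(B.shape[1])]
C = np.column_stack(columns)

# The column-by-column result matches the built-in matrix product.
assert np.array_equal(C, A @ B)
print(C)
```

The `assert` at the end confirms that stacking the per-column products reproduces what the library's own matrix multiplication computes.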
In the previous example, the matrix on the left was a 2×3 matrix and the one on the right was a 3×2 matrix. Does a similar requirement on the dimensions hold in the general case? Let's take a look.
For the general case, the form of matrix multiplication is shown in the graph below:
From the graph above, we can see that for the product AB, the only requirement is that the number of columns of A equals the number of rows of B; the resulting matrix C has the same number of rows as A and the same number of columns as B.
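This dimension rule is easy to check in NumPy. The sizes m, n, and o below are arbitrary placeholders:

```python
import numpy as np

# An m-by-n matrix A can multiply an n-by-o matrix B; the product C
# is m-by-o. The sizes here are arbitrary examples.
m, n, o = 4, 3, 2
A = np.ones((m, n))
B = np.ones((n, o))
C = A @ B
print(C.shape)  # (4, 2): rows of A, columns of B
```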
As we learned from the previous example, the multiplication of matrix A and matrix B can be reduced to multiplying matrix A by each column vector of matrix B. The results are then concatenated to form C, completing the matrix-matrix multiplication.
In reducing matrix-matrix multiplication to matrix-vector multiplication, the matrix on the left (A) is used o times (once per column of matrix B), while matrix B is split into o column vectors. It's quite interesting to think about this.
Let's calculate the product of two 2×2 matrices. The operation is shown in the graph below:
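Since the numbers from the graph are not reproduced here, the following 2×2 example uses illustrative values:

```python
import numpy as np

# A small 2x2 worked example (the values in the original graph are
# not shown, so these numbers are illustrative).
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Entry (i, j) of the product is the dot product of row i of A
# with column j of B, e.g. C[0, 0] = 1*5 + 2*7 = 19.
C = A @ B
print(C)
# [[19 22]
#  [43 50]]
```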
Now let's see how establishing the rule for matrix multiplication makes it easier to express practical problems.
Assume we have four houses with different areas (in square feet), as shown in the graph below:
In this case, the relationship between area and selling price is no longer described by a single function. Let's assume we have three candidate formulas (models) for calculating the selling price of a house. With matrix multiplication, we can describe this problem concisely.
We place the parameters of the first model in the first column of matrix B, the parameters of the second model in the second column, and the parameters of the third model in the third column. Then we have the matrix multiplication equation shown below:
By evaluating the equation above, we obtain a 4×3 result matrix. Each column of the result matrix contains the corresponding model's predicted selling prices for the four houses, as shown in the graph below:
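As a sketch, here is how this might look in NumPy. The areas and model parameters below are made up, since the actual numbers from the graphs are not reproduced in the text; each column of B holds one model's intercept and slope:

```python
import numpy as np

# Hypothetical data: four house areas in square feet (illustrative).
areas = np.array([2104, 1416, 1534, 852], dtype=float)

# Design matrix X: a column of ones (for the intercept) next to the areas.
X = np.column_stack([np.ones_like(areas), areas])  # 4x2

# Each column of B holds one model's parameters: [intercept, slope].
# These values are also illustrative.
B = np.array([[-40.0, 200.0, -150.0],
              [ 0.25,   0.1,    0.4]])  # 2x3

# One multiplication gives all 12 predictions: predictions[i, j] is
# model j's predicted price for house i.
predictions = X @ B  # 4x3
print(predictions)
```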
With a single matrix operation, we complete the selling-price predictions of the four houses for all three models. Twelve house-price calculations are captured in one matrix multiplication equation, which is quite exciting to think about.
What's even better is that almost every widely used programming language has a good linear algebra library that implements matrix multiplication. Furthermore, if we want to compare the performance of different models, we only need to compare the columns of the result matrix.
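For instance, one way to compare the models is to measure each column of the prediction matrix against observed prices. All numbers below are illustrative, not taken from the graphs:

```python
import numpy as np

# Hypothetical setup: observed prices for the four houses and a 4x3
# prediction matrix (one column per model); all values illustrative.
observed = np.array([460.0, 232.0, 315.0, 178.0])
predictions = np.array([[486.0, 250.4, 691.6],
                        [314.0, 181.6, 416.4],
                        [343.5, 193.4, 463.6],
                        [173.0, 125.2, 190.8]])

# Mean squared error of each model: compare each column to the prices.
errors = ((predictions - observed[:, None]) ** 2).mean(axis=0)
best = np.argmin(errors)
print(errors, best)
```

Broadcasting subtracts the observed prices from every column at once, so the per-model errors come out of one vectorized expression.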