Monday, May 15, 2017

Jacobian

The Jacobian matrix is used when performing a change of variables, often to simplify an integral. This is because the Jacobian tells us how much the area (or volume, in higher dimensions) around each point is stretched or compressed by the coordinate change. Intuitively we might expect that for some change of variables function
$$T(u,v)\ =\ (x(u,v), y(u,v))$$ 
that the integral 
$$\iint_D f(x,y)\,dx\,dy = \iint_{D^*} f(x(u,v), y(u,v))\,du\,dv$$
However, it is easy to find cases where this does not hold. For example, take
$$T(u, v) = (-u^2 + 4u,\ v)$$
over the unit square D* = [0,1] x [0,1]. The area of the unit square is obviously 1, but after applying the change of variables T, we arrive at the region shown below.

Please note that in this graph the region extends to x = 3: since x(u) = -u^2 + 4u increases from 0 at u = 0 to 3 at u = 1, the image of the unit square is the rectangle [0,3] x [0,1], which has area 3. You can prove this to yourself by seeing that
$$T(1,1) = (-1 + 4,\ 1) = (3, 1)$$
and similarly for the other corners of the unit square. Taking f(x, y) = 1, the left-hand side is the area of the image, 3, while the right-hand side is the area of the unit square, 1, showing that
$$\iint_D f(x,y)\,dx\,dy \neq \iint_{D^*} f(x(u,v), y(u,v))\,du\,dv$$
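We can also confirm this numerically. Below is a quick sketch in Python (the grid resolution n and the quad_area helper are just illustrative choices of mine): it pushes a fine grid over the unit square through T and sums the areas of the little mapped patches.

```python
import numpy as np

def T(u, v):
    # The change of variables from the example above.
    return -u**2 + 4*u, v

# Map every corner of an n x n grid over the unit square through T.
n = 200
U, V = np.meshgrid(np.linspace(0, 1, n + 1),
                   np.linspace(0, 1, n + 1), indexing="ij")
X, Y = T(U, V)

def quad_area(x, y):
    # Shoelace formula for each small mapped quadrilateral with
    # corners (i,j), (i+1,j), (i+1,j+1), (i,j+1).
    x0, y0 = x[:-1, :-1], y[:-1, :-1]
    x1, y1 = x[1:, :-1],  y[1:, :-1]
    x2, y2 = x[1:, 1:],   y[1:, 1:]
    x3, y3 = x[:-1, 1:],  y[:-1, 1:]
    return 0.5 * np.abs(x0*y1 - x1*y0 + x1*y2 - x2*y1
                        + x2*y3 - x3*y2 + x3*y0 - x0*y3)

print(quad_area(X, Y).sum())  # ~3.0, the area of the mapped region
print(1.0)                    # the naive du dv integral over the unit square
```

The mapped patches sum to 3, not 1: each tiny patch of the unit square has its area rescaled by T, and that per-patch scaling factor is exactly what the Jacobian will give us below.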
What this really shows is that our intuition was wrong: we need to multiply the integrand by some value indicating how sensitive the mapped points are to changes in the new coordinates. That value comes from the Jacobian. More specifically, we multiply by the absolute value of the determinant of the Jacobian matrix, giving the correct change of variables formula
$$\iint_D f(x,y)\,dx\,dy = \iint_{D^*} f(x(u,v), y(u,v))\,\left|\det J\right|\,du\,dv$$
Please take a look at the bottom of this post to learn about the man himself, Carl Gustav Jacob Jacobi.
The Jacobian matrix is defined entrywise as
$$J_{ij} = \frac{\partial x_i}{\partial u_j}$$
That is, we take the partial derivative of each component of the change of variables function with respect to each of the new variables. The result is a matrix, but we are looking for a single value to multiply our integrand by, so we take the determinant. For a 2 x 2 matrix, where subscripts indicate the (row, column) position of each entry, the determinant is
$$\det\begin{pmatrix} a_{0,0} & a_{0,1} \\ a_{1,0} & a_{1,1} \end{pmatrix} = a_{0,0}\,a_{1,1} - a_{0,1}\,a_{1,0}$$
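Putting the two together, we can apply this to the example from earlier as a worked check. The Jacobian of T(u,v) = (-u^2 + 4u, v) is
$$J = \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix} = \begin{pmatrix} 4 - 2u & 0 \\ 0 & 1 \end{pmatrix}$$
so the determinant is 4 - 2u, and the corrected integral recovers the true area:
$$\int_0^1 \int_0^1 (4 - 2u)\,du\,dv = \left[4u - u^2\right]_0^1 = 3$$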
Above is the most well-known application of the Jacobian matrix. However, the Jacobian is really just a set of derivatives that indicate the sensitivity of functions to changes in their input values. In fact, the Jacobian matrix comes up in machine learning when using backpropagation to propagate errors through the components of a modular neural network.
In the modular network above, note that minimizing the error with respect to the parameter w requires finding the intermediate derivatives of each module's outputs with respect to its inputs. Since the Jacobian of all the outputs y_k with respect to all the inputs x_i tells us how sensitive the outputs are to changes in the inputs, small errors in the inputs propagate to the outputs approximately as
$$\Delta y_k \approx \sum_i \frac{\partial y_k}{\partial x_i} \Delta x_i$$
We can think of this as a measure of the sensitivity of the error with respect to the inputs. For a more detailed explanation please see Pattern Recognition and Machine Learning by Christopher M. Bishop, section 5.3.4 (page 247), from which I took this example and the above image.
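To make that concrete, here is a minimal NumPy sketch (this is not Bishop's code; the single tanh module, the weights W, and the input x are made-up illustrative choices) showing that the Jacobian predicts how a small input error shows up at a module's outputs:

```python
import numpy as np

# A toy module y = tanh(W x). Its Jacobian J[k, i] = dy_k / dx_i
# measures how sensitive each output y_k is to each input x_i.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # made-up module weights
x = rng.normal(size=4)        # made-up input

y = np.tanh(W @ x)
J = (1 - y**2)[:, None] * W   # dy_k/dx_i = (1 - y_k^2) * W[k, i]

# A small error on the inputs propagates through the module
# approximately as delta_y ~= J @ delta_x.
delta_x = 1e-6 * rng.normal(size=4)
print(J @ delta_x)                     # the Jacobian's prediction
print(np.tanh(W @ (x + delta_x)) - y)  # actual change in the outputs
```

The two printed vectors closely agree, which is exactly the sensitivity statement above.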
The Jacobian matrix itself is interesting in the information it provides and its applications, but before looking at it in more detail I knew nothing about its creator, Carl Gustav Jacob Jacobi. A quick look at the Wikipedia page shows he was an impressive mathematician and part of a family of distinguished people. His older brother Moritz von Jacobi contributed to physics and engineering, and Carl was homeschooled by his uncle until the age of 12. At 12 he was moved to a private school, and within half a year was put into senior-level courses. The only reason he didn't go to college immediately was that the universities were not accepting applicants under 16 years old. Before going to college, by then bored with his education, he tried to solve the quintic equation by radicals (whatever that is). At 21 years old he was lecturing on the theory of curved surfaces at the University of Berlin. When I was 21, on the other hand, I was either learning or forgetting about the Jacobian matrix in my multivariate calculus course.

graph 1 (unit square) - source
graph 2 - Wolfram Alpha
example - Vector Calculus, 5th edition, by Marsden and Tromba
modular network image - Pattern Recognition and Machine Learning by Bishop, pg. 247
Jacobian application - Pattern Recognition and Machine Learning by Bishop, pg. 247
