CS231n- 4강

from http://cs231n.stanford.edu/

CS231n

: 각 연산을 node로 나타내서 복잡한 연산의 모든 절차를 그래프로 표현하는 방법

df/dx = df/dq * dq/dx

Add gate :gradient distrubutor

Q : max gate?

A : gradient router (gradient/0)

Q: mul gate?

A: gradient switcher(둘이 바꿔준다)

input이 vector라면?
local gradient - Jacobian matrix
- derivative of each element of z w.r.t each element of x )
- 벡터 미적분학에서, 야코비 행렬(영어: Jacobian matrix)은 다변수 벡터 함수의 도함수 행렬이다. 야코비 행렬식(영어: Jacobian determinant)은 야코비 행렬의 행렬식이다. (https://ko.wikipedia.org/wiki/%EC%95%BC%EC%BD%94%EB%B9%84_%ED%96%89%EB%A0%AC)

Q. 4096 dimention의 output vector jacobian matrix’s size?

A. 4096 * 4096

Q. size 100의 minibatch를 사용하면 Jacobian matrix의 사이즈는?

A. 409600 * 409600 - 더 큰 크기의 행렬이 된다.

Q. 첫 번째 질문 상황일 때 Jacobian matrix는 어떻게 생겼는가?

A :diagonal( 대각행렬)

Summary

Nueral Network

Before

Linear score function : f=Wx

Now (activiation function)

2-layer Neural Network : f=W2 max(0,W1x)