GNU General Public License v3.0 licensed. Source available on github.com/zifeo/EPFL.
Fall 2016: Unsupervised and Reinforcement Learning in Neural Networks
[TOC]
- learning : act of acquiring new or modifying/reinforcing existing knowledge/behaviors/skills
- long term memory
- explicit (declarative) : episodic (events), semantic (concepts)
- implicit : procedural (skills), emotional conditioning, priming effect, conditioned reflex
- types of learning
- supervised : labeled data
- unsupervised : no explicit teachings, concept formulation, detect intrinsic structure
- reinforced : choice of actions, reward upon correct one
- synaptic plasticity : change in connection strength
- neuron models
- compartmental models
- cable models
- point models
- Hodgkin-Huxley : biologically realistic, non-linear, computationally expensive, analytically intractable
- integrate and fire models
- Hebbian learning : if cell $j$ repeatedly takes part in firing cell $i$, then the efficiency of the $j$ to $i$ connection is increased, local rule, correlations
- synaptic dynamics
- unsupervised vs reinforcement
- PCA : orthogonal transformation to convert correlated variables into linearly uncorrelated variables
- subtract mean : $\v x-\mean{\v x}$
- calculate covariance matrix : $C_{kj}^0=\mean{(x_k-\mean{x_k})(x_j-\mean{x_j})}$
- calculate eigenvectors : $C^0\cdot\v e_n=\lambda_n\cdot \v e_n$ with $\lambda_1\ge\cdots\ge\lambda_n$ (keep only first component)
- neural implementation : needs more than hebbian learning (sketch below)
- all directions : equally likely
- Oja rule
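A minimal numpy sketch of the PCA recipe above together with an online Oja update; the toy data, learning rate, and variable names are illustrative assumptions, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # toy correlated data

# batch PCA: subtract mean, covariance matrix, eigendecomposition
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(C)          # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                          # principal component = largest eigenvalue

# online Oja rule: w converges to the first principal component
w = rng.normal(size=2)
eta = 0.01
for x in Xc:
    y = w @ x
    w += eta * y * (x - y * w)                # Hebbian term + normalizing decay

print(pc1, w / np.linalg.norm(w))             # same direction, up to sign
```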
- ICA : find natural axis
- renormalize : mean zero, unit variance
- whiten : PCA, divide each component by $\sqrt{\lambda_n}$
- signal : mixed $x=As$, recover through $s\approx y=Wx$
- non-gaussianity : maximise $J(y)=[E(F(y))-E_{Gauss}(F(v))]^2$
- $J(w)=[\gamma(w)-a]^2$ : $\frac{\partial J}{\partial w}=0\iff\frac{\partial\gamma}{\partial w}=0$
- kurtosis : $K(y)=E(y^4)-3E(y^2)^2$
- gradient ascent : $\Delta w_n=\eta E[x_n g(\sum_k w_kx_k)]$ for $J(\v w)=E[F(\v w^\top\v x)]=E[F(y)]$
- kurtosis : of $y=\v w^\top\v x$ is at a local max if $\v w$ projects in the direction of an independent component
- ICA gradient ascent
- batch : $\v w^{new}=\v w^{old}+\eta E[\v x g(\v w^\top\v x)]$
- online : $\Delta\v w=\eta\v x g(\v w^\top\v x)$
- online-components : $\Delta w_{ij}=\eta x_j g_i(\sum_k w_{ik}x_k)$
- newton methods
- fast ICA
- initialize : $\v w$
- newton step : $\v w^{new}=E[\v x g(\v w^\top\v x)-\v w g'(\v w^\top \v x)]$
- renormalize : $\v w=\frac{\v w^{new}}{\abs{\v w^{new}}}$
- loop if not converged (sketch below)
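A sketch of the one-unit fastICA loop above, assuming already-whitened data `Z` and the common choice $g=\tanh$ (the notes do not fix $g$):

```python
import numpy as np

def fast_ica_unit(Z, n_iter=100, tol=1e-6, seed=0):
    """One-unit fastICA on whitened data Z (samples x dims), with g = tanh (assumed)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = Z @ w                                    # projections w^T x
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w   # Newton step
        w_new /= np.linalg.norm(w_new)               # renormalize
        if np.abs(np.abs(w_new @ w) - 1.0) < tol:    # converged (up to sign)
            return w_new
        w = w_new
    return w
```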
- temporal ICA : find unmixing matrix $W$ s.t. outputs are time-independent
- $\mean{y_i(t)y_k(t-\tau)}=\delta_{ik}\lambda(\tau)$ for all delays $\tau$, and $\mean{y_i(t)y_k(t)}=\delta_{ik}$
- application : development of localized edge-detector receptive fields (Gabor wavelets)
- hebbian learning = unsupervised learning
- in linear neurons : PCA
- in nonlinear neurons : ICA
- network wiring is adaptive, not fixed
- prototype : $\v w_k$
- closest rule : $\abs{\v w_k-\v x}\le\abs{\v w_i-\v x};\forall i$
- clustering : discretization, compression, quantization
- reconstruction error : mean $E=\sum_k\sum_{\mu\in C_k}(\v w_k-\v x^\mu)^2$
- k-means online :
- determine winner $k$ : $\abs{\v w_k-\v x^\mu}\le\abs{\v w_i-\v x^\mu};\forall i$ (network dynamics)
- update this prototype : $\Delta\v w_k=-\eta(\v w_k-\v x^\mu)$ (Oja's hebbian learning)
- reduce : $\eta$ (e.g. $1/N_k$, developmental change)
- k-means batch
- for every datapoint find the closest prototype
- update weights of all prototypes : $\Delta\v w_k=-\eta\sum_{\mu\in C_k}(\v w_k-\v x^\mu)$
- equivalence : batch and online give the same result for small learning rate (sketch below)
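A small numpy sketch contrasting the online and batch k-means updates above; initialization from data samples (to avoid dead units), the learning rate, and the epoch counts are assumptions:

```python
import numpy as np

def kmeans_online(X, K, eta=0.05, epochs=10, seed=0):
    """Online k-means: move only the winning prototype towards each sample."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), K, replace=False)].astype(float)   # init from data samples
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            k = np.argmin(np.linalg.norm(W - x, axis=1))        # winner
            W[k] -= eta * (W[k] - x)                            # Delta w_k = -eta (w_k - x)
    return W

def kmeans_batch(X, K, epochs=20, seed=0):
    """Batch k-means: assign all points, then move each prototype to its cluster mean."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(epochs):
        labels = np.argmin(np.linalg.norm(X[:, None, :] - W[None], axis=2), axis=1)
        for k in range(K):
            if np.any(labels == k):
                W[k] = X[labels == k].mean(axis=0)              # full step, i.e. eta = 1/N_k
    return W
```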
- neuronal implementation
- equivalence : assuming weights normalized, $\v w_k^\top\v x^\mu\ge\v w_i^\top\v x^\mu;\forall i \iff \abs{\v w_k-\v x^\mu}\le\abs{\v w_i-\v x^\mu};\forall i$
- dead units : due to local minima, initialize weights to data samples and update winners with neighboring losers
- winner take all : winner excites itself, inhibits others, add noise
- model
- Kohonen's model : topographical feature map, spatially bounded excitation zone, input space mapped onto cortical space
- Kohonen algorithm
- choose input : $\v x^\mu$
- determine winner $k$ : $\abs{\v w_k-\v x^\mu}\le\abs{\v w_i-\v x^\mu};\forall i$
- update weights of winner and neighbors : $\Delta \v w_j=\eta\Lambda(j,k)(\v x^\mu-\v w_j)$
- decrease size of neighborhood function
- iterate until stabilization (sketch below)
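A sketch of the Kohonen algorithm on a 1D chain of units, with a Gaussian neighborhood $\Lambda(j,k)$ whose width shrinks over epochs; the grid topology and the decay schedule are illustrative assumptions:

```python
import numpy as np

def kohonen_som(X, n_units=20, epochs=30, eta=0.1, sigma0=5.0, seed=0):
    """1D Kohonen map: winner-take-all plus Gaussian neighborhood on a chain of units."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)                                   # positions in output space
    for epoch in range(epochs):
        sigma = sigma0 * (0.1 / sigma0) ** (epoch / (epochs - 1))   # shrink neighborhood
        for x in X[rng.permutation(len(X))]:
            k = np.argmin(np.linalg.norm(W - x, axis=1))            # winner
            Lam = np.exp(-(grid - k) ** 2 / (2 * sigma ** 2))       # Lambda(j, k)
            W += eta * Lam[:, None] * (x - W)                       # update winner + neighbors
    return W
```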
- elastic net algorithm
- choose input
- calculate relative importance of $j$ : $\alpha(\abs{\v x^\mu -\v w_j})$
- update everybody : $\Delta \v w_j=\eta\alpha(\abs{\v x^\mu-\v w_j})(\v x^\mu-\v w_j)+\sum_{k\in T_j}(\v w_k-\v w_j)$ with $T_j$ neighbors in input space
- iterate until stabilization
- importance factor : $\alpha(\abs{\v x^\mu -\v w_j})=\frac{\exp(-\gamma\abs{\v x^\mu-\v w_j}^2)}{\sum_k\exp(-\gamma\abs{\v x^\mu-\v w_k}^2)}$
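A sketch of one elastic-net update, following the update rule above literally; the neighbor topology `neighbors` and the constants are assumptions:

```python
import numpy as np

def elastic_net_step(W, x, neighbors, eta=0.05, gamma=10.0):
    """One elastic-net update for input x: softmax importance alpha_j, then pull every
    prototype towards x (weighted by alpha_j) and towards its neighbors.
    `neighbors` maps index j -> list of neighbor indices T_j (topology assumed)."""
    d2 = np.sum((x - W) ** 2, axis=1)
    alpha = np.exp(-gamma * d2)
    alpha /= alpha.sum()                                   # importance factor
    W_new = W.copy()
    for j in range(len(W)):
        pull = sum(W[k] - W[j] for k in neighbors[j])      # elastic term towards neighbors
        W_new[j] = W[j] + eta * alpha[j] * (x - W[j]) + pull
    return W_new
```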
- self-organization of cortex
- neurons have localized receptive fields (RF)
- neighboring neurons have similar RF
- RF and neighbors change with experience
- simple cells and RF as Gabor filters
- complex cells and energy model
- position invariant cells in inferotemporal cortex (IT) : side (bottom) part of the ventral pathway
- hebbian learning and RF development
- localized receptive fields with orientation tuning : hebbian learning + linear neurons + arborization function
- spike-based hebbian learning
- spike-time dependent plasticity by local variable
- induction rule : abs
- STDP model yields BCM
- conditioning : Pavlov's dog
- RL theory
- exploration exploitation dilemma
- explore : estimate reward probabilities
- exploitation : take action which looks optimal to maximize reward
- strategy/policy
- greedy : best action $Q(s,a^*)>Q(s,a_j)$
- $\epsilon$-greedy : best action with probability $P=1-\epsilon$
- softmax : action with probability $P(a')=\frac{\exp(\beta Q(a'))}{\sum_a\exp(\beta Q(a))}$
- temperature : $T=1/\beta$
- optimistic : start with Q-values that are too big (sketch below)
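A sketch of the three action-selection policies above applied to a Q-table row; the parameter defaults are arbitrary:

```python
import numpy as np

def select_action(Q, s, rng, mode="epsilon", epsilon=0.1, beta=2.0):
    """Pick an action from the Q-table row Q[s] with a greedy, epsilon-greedy, or softmax policy."""
    q = Q[s]
    if mode == "greedy":
        return int(np.argmax(q))
    if mode == "epsilon":                        # best action with probability 1 - epsilon
        if rng.random() < epsilon:
            return int(rng.integers(len(q)))
        return int(np.argmax(q))
    # softmax: P(a) proportional to exp(beta * Q(a)), temperature T = 1/beta
    p = np.exp(beta * (q - q.max()))             # shift for numerical stability
    p /= p.sum()
    return int(rng.choice(len(q), p=p))
```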
- multi-step horizon
- Bellman equation : $Q(s,a)=\sum_{s'}P_{s\to s'}^a(R_{s\to s'}^a+\gamma\sum_{a'}\pi(s',a')Q(s',a'))$
- sarsa (sketch below)
- initialize Q-values, start from initial state
- in state $s$, choose action $a$ according to policy
- observe reward $r$ and next state $s'$
- choose action $a'$ in state $s'$ according to policy
- update Q-values
- iterate
- sarsa convergence in expectation : when $\mean{\Delta Q(s,a)}=0$
- to $Q(s,a)=\sum_{s'}P_{s\to s'}^aR_{s\to s'}^a$ if $\Delta Q(s,a)=\eta[r-Q(s,a)]$
- to the Bellman equation if $\Delta Q(s,a)=\eta[r-(Q(s,a)-\gamma Q(s',a'))]$
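A tabular sarsa sketch with an $\epsilon$-greedy policy; the environment interface (`reset()` returning a state, `step(a)` returning `(s', r, done)`) is an assumption, not something fixed by the notes:

```python
import numpy as np

def sarsa(env, n_states, n_actions, episodes=500, eta=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular sarsa with an epsilon-greedy policy (env interface assumed, see above)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def policy(s):                                # epsilon-greedy on current Q
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = policy(s_next)
            target = r + (0.0 if done else gamma * Q[s_next, a_next])
            Q[s, a] += eta * (target - Q[s, a])   # Delta Q = eta [r - (Q(s,a) - gamma Q(s',a'))]
            s, a = s_next, a_next
    return Q
```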
- v-values
- Q-learning : encode piece of trajectory, no eligibility trace needed
- off-policy : another policy for update (e.g. greedy)
- eligibility traces : at the moment of reward, update also previous action values along trajectories
- continuous state spaces : $Q(s,a)=\sum_j w_{aj}\Phi(s-s_j)$
- temporal difference (TD) sarsa algo
- RL with linear function approximation (sketch below)
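A sketch of a TD-sarsa update with the linear function approximation $Q(s,a)=\sum_j w_{aj}\Phi(s-s_j)$, using Gaussian basis functions on a 1D state space (the centers and width are assumptions):

```python
import numpy as np

def make_features(centers, width=0.5):
    """Gaussian basis Phi(s - s_j) on a 1D state space (centers and width assumed)."""
    def phi(s):
        return np.exp(-((s - centers) ** 2) / (2 * width ** 2))
    return phi

def td_sarsa_update(w, phi, s, a, r, s_next, a_next, eta=0.05, gamma=0.95, done=False):
    """One TD-sarsa step for Q(s,a) = sum_j w[a, j] * Phi(s - s_j)."""
    q = w[a] @ phi(s)
    q_next = 0.0 if done else w[a_next] @ phi(s_next)
    delta = r + gamma * q_next - q                 # TD error
    w[a] += eta * delta * phi(s)                   # gradient of Q w.r.t. w[a] is phi(s)
    return w
```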
- experience replay : learning works best on uncorrelated data so store all state, action, reward transitions and train by randomly sampling from list
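A minimal replay-buffer sketch for the idea above (capacity and batch size are arbitrary):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store (s, a, r, s', done) transitions and sample uncorrelated minibatches."""
    def __init__(self, capacity=10000, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```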
- network multiplexing : solve moving target problem by keeping the target constant for some time
- action values Q vs state values V : risk-seeking/aversive
- when to avoid TD methods (Q-learning, sarsa) : hard to tune, can diverge; only good for fully observable Markov processes
- policy gradient : forget Q-values, optimize directly the reward for an action
- associate actions with stimuli in a stochastic fashion
- change policy parametrically using gradient
- optimize total expected reward : $J=\mean{\mean{R}_a}_s=\sum_sP(s)\sum_a\pi(a\mid s)R(a,s)$ with stimuli $s$ (states implicitly)
- log-likelihood trick
- policy gradient estimation : sample average as Monte Carlo approximation $\Delta w_i=\eta\frac{\partial}{\partial w_i}\mean{\mean{R}_a}_s\approx\eta\frac{1}{N}\sum_{n=1}^N\frac{\partial}{\partial w_i}\ln\pi(a_n\mid s_n)R(a_n,s_n)$
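A Monte Carlo policy-gradient sketch for the immediate-reward setting above, with a softmax policy and the log-likelihood trick; the toy reward table `R[s, a]` and the stimulus distribution `p_s` are assumptions:

```python
import numpy as np

def policy_gradient(R, p_s, episodes=5000, eta=0.1, seed=0):
    """Sample-based gradient ascent on J for immediate rewards R[s, a], stimuli drawn
    from p_s, and a softmax policy pi(a|s) proportional to exp(w[s, a]).
    Log-likelihood trick: grad J = E[ grad log pi(a|s) * R(a, s) ]."""
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    w = np.zeros((n_s, n_a))
    for _ in range(episodes):
        s = rng.choice(n_s, p=p_s)                        # sample stimulus
        pi = np.exp(w[s] - w[s].max()); pi /= pi.sum()    # softmax policy
        a = rng.choice(n_a, p=pi)                         # sample action
        grad_logpi = -pi; grad_logpi[a] += 1.0            # d log pi(a|s) / d w[s, :]
        w[s] += eta * R[s, a] * grad_logpi                # Monte Carlo gradient step
    return w
```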
- biology gradient
- matching law : average reward is the same across multiple sources