# CS-434: Unsupervised and Reinforcement Learning in Neural Networks

# Introduction

This compendium was made for the course CS-434 Unsupervised and Reinforcement Learning in Neural Networks at École Polytechnique Fédérale de Lausanne (EPFL) and summarises the lectures and lecture notes. It is not the complete curriculum, but rather a list of reading material.

## Prerequisites

### Eigendecomposition

Compute eigenvalues and eigenvectors
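As a quick refresher, eigendecomposition in practice (the matrix is an illustrative example):

```python
import numpy as np

# Eigendecomposition of a symmetric 2x2 example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is for symmetric (Hermitian) matrices; eigenvalues come out ascending.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Check the defining property A v = lambda v for each eigenpair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

print(eigenvalues)  # [1. 3.]
```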

### Matrix operations

#### Positive semi-definite

A symmetric matrix $A$ is positive semi-definite if $\vec{x}^T A \vec{x} \geq 0$ for all vectors $\vec{x}$; equivalently, all its eigenvalues are non-negative. Covariance matrices are always positive semi-definite.

# Unsupervised Learning

**Synaptic Plasticity**: In neuroscience, synaptic plasticity is the ability of synapses to strengthen or weaken over time, in response to increases or decreases in their activity.

## Hebbian learning

"When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Hebb, 1949)

### Rate-based Hebbian Learning

active = high rate = many spikes per second. In rate-based Hebbian learning, the weight change is proportional to the product of pre- and postsynaptic rates: $\Delta w_{ij} = \eta\, \nu_i^{\text{pre}} \nu_j^{\text{post}}$.

### Oja rule (1989)

Detects the first principal component: $\Delta \vec{w} = \eta\, y\, (\vec{x} - y \vec{w})$ with output $y = \vec{w}^T \vec{x}$. The decay term keeps the weight vector at unit norm, and it converges to the leading eigenvector of the input covariance matrix.
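A minimal sketch of Oja's rule on correlated 2-D data; the data generation and learning rate are illustrative assumptions:

```python
import numpy as np

# Oja's rule on 2-D data whose main variance lies along the [1, 1] direction.
rng = np.random.default_rng(0)
x = rng.normal(size=(10_000, 2)) @ np.array([[1.0, 0.75],
                                             [0.75, 1.0]])

w = np.array([1.0, 0.0])            # initial weight vector
eta = 0.01                          # learning rate (illustrative)
for x_mu in x:
    y = w @ x_mu                    # postsynaptic output y = w^T x
    w += eta * y * (x_mu - y * w)   # Hebbian growth with implicit normalisation

# w converges (up to sign) to the unit-norm first principal component,
# here approximately the direction [1, 1] / sqrt(2).
```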

### Spike time dependent learning window

The learning window is a function that maps the time difference between pre- and postsynaptic spikes to the resulting change in synaptic weight (spike-timing-dependent plasticity, STDP).

## Component Analysis

### Principal component analysis

- Subtract mean
- Calculate covariance matrix
- Find the eigenvalues and corresponding eigenvectors
- The vector with the greatest eigenvalue points in the direction of the principal component
$FinalData = RowFeatureVector \times RowDataAdjust$

$RowOriginalData = (RowFeatureVector^T \times FinalData) + OriginalMean$
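The steps above can be sketched as follows (data and variable names are illustrative; rows are samples, so the projection is written transposed relative to the formulas above):

```python
import numpy as np

# PCA recipe: centre, covariance, eigendecomposition, project, reconstruct.
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]]) + 5.0

# 1. Subtract the mean.
mean = data.mean(axis=0)
adjusted = data - mean

# 2. Covariance matrix of the centred data.
cov = np.cov(adjusted, rowvar=False)

# 3. Eigenvalues and eigenvectors, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order]          # columns = principal directions

# 4. Project, then reconstruct (lossless when all components are kept).
final_data = adjusted @ feature_vector
reconstructed = final_data @ feature_vector.T + mean
assert np.allclose(reconstructed, data)
```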

### Independent component analysis

- Subtract mean
- Whitening (unit variance in all dimensions)
- Change to coordinates of maximum variance: PCA
- Normalize: divide each component by $\sqrt{\lambda_n}$
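The whitening preprocessing above can be sketched as follows (data and names are illustrative; finding the final rotation that maximises independence comes after this step):

```python
import numpy as np

# Whitening: rotate to the PCA axes, then divide each component by
# sqrt(lambda) so every direction has unit variance.
rng = np.random.default_rng(2)
x = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

x = x - x.mean(axis=0)                       # subtract mean
eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
z = (x @ eigvecs) / np.sqrt(eigvals)         # PCA rotation, then normalisation

# The whitened data has identity covariance.
assert np.allclose(np.cov(z, rowvar=False), np.eye(2))
```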

#### Independence

A normalized (whitened) Gaussian distribution has no preferred axis, so independent directions cannot be identified for Gaussian sources; ICA therefore relies on non-Gaussianity.

#### Kurtosis

The classical measure of non-Gaussianity is kurtosis, the fourth-order cumulant. The kurtosis of a random variable $y$ is defined as

$kurt(y) = E[y^4] - 3\,(E[y^2])^2$

Since the data is whitened, $E[y^2] = 1$, so $kurt(y) = E[y^4] - 3$, and we know that the Gaussian has fourth moment $E[y^4] = 3$, so its kurtosis is zero. Positive kurtosis indicates a super-Gaussian (heavy-tailed) distribution, negative kurtosis a sub-Gaussian (flat) one.
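Kurtosis is easy to estimate empirically; a small sketch (function name and sample sizes are illustrative):

```python
import numpy as np

# Empirical kurtosis as a non-Gaussianity measure.
def kurtosis(y):
    y = (y - y.mean()) / y.std()          # normalise to zero mean, unit variance
    return np.mean(y**4) - 3.0            # E[y^4] - 3 (E[y^2])^2 with E[y^2] = 1

rng = np.random.default_rng(3)
n = 200_000
k_gauss = kurtosis(rng.normal(size=n))    # ~0: Gaussian
k_laplace = kurtosis(rng.laplace(size=n)) # positive: super-Gaussian (heavy tails)
k_uniform = kurtosis(rng.uniform(size=n)) # negative: sub-Gaussian (flat)
```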

#### Temporal ICA

Find the unmixing matrix $W$ such that $\vec{s}(t) = W \vec{x}(t)$ recovers independent source signals, exploiting the temporal structure of the sources.

## Clustering

### K-means

- Determine winner $k$: $\| \vec{w}_k - \vec{x}^{\mu} \| \leq \| \vec{w}_i - \vec{x}^{\mu} \|$ for all $i$

- Update winner
$\Delta \vec{w}_k = \eta ( \vec{x}^{\mu} - \vec{w}_k )$
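The two steps above, as an online (competitive-learning) sketch; the data, initial prototypes, and learning rate are illustrative assumptions:

```python
import numpy as np

# Online K-means on two well-separated clusters around (0, 0) and (5, 5).
rng = np.random.default_rng(4)
x = np.vstack([rng.normal(0.0, 0.5, (500, 2)),
               rng.normal(5.0, 0.5, (500, 2))])
rng.shuffle(x)

w = np.array([[1.0, 1.0], [4.0, 4.0]])   # K = 2 prototypes, chosen by hand
eta = 0.05
for x_mu in x:
    k = np.argmin(np.linalg.norm(w - x_mu, axis=1))  # determine winner
    w[k] += eta * (x_mu - w[k])                      # update winner only

# The prototypes end up near the two cluster centres.
```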

### Kohonen maps

An extension of K-means in which the prototypes are arranged on a grid, and the neighbours of the winner are updated as well, yielding a topology-preserving map.

**Learning rule:** $\Delta \vec{w}_i = \eta\, \Lambda(i, k)\, (\vec{x}^{\mu} - \vec{w}_i)$, where $k$ is the winner and $\Lambda(i, k)$ is a neighbourhood function (e.g. a Gaussian in grid distance) centred on $k$.
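A sketch of a 1-D Kohonen map with a Gaussian neighbourhood; the data, chain length, and parameters are illustrative assumptions:

```python
import numpy as np

# 1-D Kohonen map: like K-means, but neighbours of the winner are pulled along.
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, size=(2000, 2))      # inputs in the unit square

n_units = 10
w = rng.uniform(0.0, 1.0, size=(n_units, 2))   # prototypes on a 1-D chain
positions = np.arange(n_units)                 # grid coordinates of the units

eta, sigma = 0.1, 1.0                          # rate and neighbourhood width
for x_mu in x:
    k = np.argmin(np.linalg.norm(w - x_mu, axis=1))        # winner
    neighbourhood = np.exp(-(positions - k) ** 2 / (2 * sigma ** 2))
    w += eta * neighbourhood[:, None] * (x_mu - w)         # update all units
```

In practice both `eta` and `sigma` are usually annealed over time so that the map first unfolds globally and then fine-tunes locally.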

# Reinforcement Learning

Reinforcement learning = Hebbian learning + reward

## Bellman equation

Exploration vs. exploitation dilemma.
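In one standard form, the Bellman (optimality) equation expresses the Q-value of a state-action pair through its successor states, where $P^{a}_{s s'}$ is the transition probability, $R^{a}_{s s'}$ the expected reward, and $\gamma$ the discount factor:

```latex
Q(s, a) = \sum_{s'} P^{a}_{s s'} \left[ R^{a}_{s s'} + \gamma \max_{a'} Q(s', a') \right]
```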

**On-policy**: The same policy is used to select the next action and to update the Q-values.

**Off-policy**: The action is selected using policy A (e.g. soft-max). The Q-values are updated using policy B (e.g. $\epsilon$-greedy).

## SARSA (on-policy)

**TD**: Temporal Difference.

**Eligibility traces**: A mixture between TD and Monte Carlo methods; recently visited state-action pairs remain eligible for updates, with exponentially decaying weight.
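The SARSA update $\Delta Q(s, a) = \eta\,[r + \gamma Q(s', a') - Q(s, a)]$ can be sketched on a toy problem; the chain environment, parameters, and helper functions are illustrative assumptions, not from the course:

```python
import numpy as np

# SARSA on a toy chain: states 0..4, actions left/right, reward 1 at state 4.
rng = np.random.default_rng(6)
n_states, n_actions = 5, 2                         # action 0 = left, 1 = right
Q = rng.uniform(0.0, 0.01, (n_states, n_actions))  # tiny random init breaks ties
Q[-1] = 0.0                                        # terminal state: no future value
eta, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

def policy(s):                                     # epsilon-greedy selection
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

for episode in range(500):
    s, a = 0, policy(0)
    while s != n_states - 1:
        s_next, r = step(s, a)
        a_next = policy(s_next)                    # on-policy: next action first
        Q[s, a] += eta * (r + gamma * Q[s_next, a_next] - Q[s, a])
        s, a = s_next, a_next

# The greedy policy learns to move right in every non-terminal state.
```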

## Q-learning (off-policy)
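Q-learning differs from SARSA in its target: it uses the greedy maximum over next-state actions, $\Delta Q(s, a) = \eta\,[r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$, regardless of which action the behaviour policy actually takes next. A sketch on a toy chain (environment and parameters are illustrative assumptions):

```python
import numpy as np

# Q-learning on a toy chain: states 0..4, actions left/right, reward 1 at state 4.
rng = np.random.default_rng(7)
n_states, n_actions = 5, 2                         # action 0 = left, 1 = right
Q = rng.uniform(0.0, 0.01, (n_states, n_actions))  # tiny random init breaks ties
Q[-1] = 0.0                                        # terminal state: no future value
eta, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Behaviour policy: epsilon-greedy.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon \
            else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Off-policy target: greedy max over next actions, independent of the
        # action the behaviour policy will actually take next.
        Q[s, a] += eta * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# Q converges to the optimal values Q(s, right) = gamma^(3 - s) for s = 0..3.
```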

## Policy Gradient

Forget Q-values; optimise the expected reward directly with respect to the policy parameters.

Stimuli/observations are mapped directly to action probabilities by a parameterised policy.
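The idea can be sketched with a REINFORCE-style update on a two-armed bandit: no Q-values, just increase the log-probability of actions that were rewarded. The environment, soft-max parameterisation, and learning rate are illustrative assumptions:

```python
import numpy as np

# Policy gradient (REINFORCE-style) on a two-armed bandit.
rng = np.random.default_rng(8)
theta = np.zeros(2)                 # one soft-max preference per action
eta = 0.1
p_reward = np.array([0.2, 0.8])     # arm 1 pays off more often

for t in range(2000):
    pi = np.exp(theta) / np.exp(theta).sum()     # soft-max policy
    a = rng.choice(2, p=pi)
    r = float(rng.random() < p_reward[a])        # stochastic 0/1 reward
    # Gradient of log pi(a) for a soft-max is one_hot(a) - pi.
    grad_log_pi = np.eye(2)[a] - pi
    theta += eta * r * grad_log_pi               # push up rewarded actions

# The policy concentrates its probability mass on the better arm.
```

Subtracting a baseline from $r$ (e.g. the running mean reward) reduces the variance of this estimator without changing its expectation.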

**When not to use TD algorithms**:

- Continuous state spaces need hard-to-tune function approximations
- Can diverge with function approximation, even in fully observable environments
- Continuous actions are difficult to represent with TD algorithms