TDT4195: Visual Computing Fundamentals

# The Gaussian

First things first: you're going to see the Gaussian appear all over this course, especially in the image processing part. You might as well learn it by heart from the get-go.

The Gaussian in one dimension:

$$ N(x,\sigma) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}} $$

The Gaussian in two dimensions:

$$ N(x,y,\sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} $$

We can see from this that the Gaussian is separable, since $ N(x,y,\sigma) = N(x,\sigma) \cdot N(y,\sigma) $. This means that we typically apply two one-dimensional Gaussian filter operations (one in the x direction and one in the y direction) instead of applying a two-dimensional Gaussian directly over the entire image.

# Graphics

## Quaternions

Quaternions are basically 4D vectors that are useful for doing rotation. The rotation quaternion for a rotation of $ \theta $ radians around an axis defined by a unit vector $ \vec{u} $ through the origin is:

$$ q = \cos{\frac{\theta}{2}} + \vec{u} \sin{\frac{\theta}{2}} $$

To rotate a point $p$ to a rotated point $p'$ by the quaternion $q$, compute:

$$ p' = qpq^{-1} $$

where

$$ q^{-1} = e^{ -\frac{1}{2} \theta ( u_x i + u_y j + u_z k ) } = \cos{\frac{\theta}{2}} - \vec{u} \sin{\frac{\theta}{2}} $$

# Digital Image Processing

## Typical image processing steps

Image acquisition -> Image enhancement -> Image restoration -> Morphological processing -> Segmentation -> Representation and description -> Object recognition.

## The Human Eye

The human eye has two types of receptors: cones and rods. A typical eye has 6-7 million cones, each connected to a dedicated nerve end; cones enable color vision. It also has 75-150 million rods, with several rods sharing one nerve end; rods enable low-light sensitivity.

## Sampling and Quantization

Sampling and quantization are used when converting a stream of continuous data into digital form. Formalized: a continuous function $f(t)$ is sampled every $T$ steps of $t$ (which typically represents time). We say that $T$ is the *sampling interval*. The sampled function is then the sequence of values $f_n = f(nT), n \in \mathbb{N}$. Images are typically represented as digitized streams in two dimensions, x and y.
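To make sampling and quantization concrete, here is a minimal sketch in Python/NumPy; the signal, the sampling interval and the number of gray levels are made-up illustration values, not anything prescribed by the course.

```python
import numpy as np

# A "continuous" signal f(t), here just an arbitrary function of t.
def f(t):
    return 0.5 + 0.5 * np.sin(2 * np.pi * 3.0 * t)  # values in [0, 1]

T = 0.01                     # sampling interval (arbitrary choice)
n = np.arange(100)           # sample indices
samples = f(n * T)           # sampling: f_n = f(nT)

levels = 8                   # quantize to 8 gray levels (3 bits, arbitrary)
quantized = np.round(samples * (levels - 1)).astype(int)
```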
### Sampling Theorem

When a stream contains frequencies higher than the sampling frequency can represent, unwanted artifacts known as aliasing are produced. The Nyquist-Shannon sampling theorem formalizes this:

$$f_s \geq 2 \cdot f_{max}$$

That is, sampling should be performed with a frequency at least twice as large as the highest frequency that occurs in the signal in order to avoid aliasing.
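As a quick illustration of the theorem, the sketch below samples a sine wave both above and below the Nyquist rate; the frequencies and duration are arbitrary example values.

```python
import numpy as np

f_signal = 9.0                 # signal frequency in Hz (arbitrary)
nyquist_rate = 2 * f_signal    # minimum sampling frequency to avoid aliasing (18 Hz)

def sample(f_s, duration=1.0):
    t = np.arange(0.0, duration, 1.0 / f_s)   # sample times, interval T = 1/f_s
    return np.sin(2 * np.pi * f_signal * t)

ok = sample(f_s=20.0)        # 20 Hz >= 18 Hz: the 9 Hz sine is captured correctly
aliased = sample(f_s=10.0)   # 10 Hz < 18 Hz: the samples look like a 1 Hz sine (aliasing)
```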
## Image enhancement

Image enhancement typically aims to do things like noise removal, highlighting interesting details, and making the image more visually appealing. There are two main categories of techniques: spatial domain techniques and transform domain techniques.

### Histograms

The histogram of an image provides information about the distribution of intensity levels in the image. Both global (entire image) and local (parts of the image) histograms are useful.

### Spatial domain enhancement techniques

These involve direct manipulation of pixels, with or without considering neighboring pixels: working on single pixels, on neighbourhoods of pixels, or applying filters. Spatial image enhancement techniques that do not consider a pixel's neighborhood are called intensity transformations or point processing operations. Intensity transformations change the value of each pixel based on its intensity alone. Examples include: image negatives, contrast stretching, gamma transform, and thresholding/binarization.

#### Neighborhood

A neighborhood, informally, consists of the pixels close to a given pixel. Formally: $\delta(i,j) = \left\{(k,l) \mid 0 \lt (k-i)^2 + (l-j)^2 \le c;\ k, l \in \mathbb{N} \right\}$.

#### Spatial Filtering

A spatial filter consists of a neighborhood, associated weights for each pixel in the neighborhood, and a predefined operation on the weighted pixels. When the weights sum to 1, the average gray value of the image is not changed.

##### Smoothing

We can make an averaging spatial filter to smooth an image. Consider a square 8-neighborhood with the following weights:

$$ \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} $$

This results in a smoother version of the image, which reduces noise. This averaging filter is also known as the box filter. The averaging spatial filter is a linear filter. A popular non-linear smoothing filter is the median filter, which sets each pixel to the median of itself and its neighbors.

#### Convolution

#### Sharpening

We use the Laplacian for this. TODO: write about this.

### Transform domain enhancement techniques

These involve transforming the image into a different representation. Examples of transforms include Fourier transforms and wavelet transforms.

#### Frequency domain

Filtering can be done in the frequency domain. We use the discrete Fourier transform to enable this. The Discrete Fourier Transform (DFT) is defined as:

$$ F(u) = \sum\limits_{x=0}^{M-1} f(x) e^{-i 2 \pi \frac{ux}{M}}, \quad f(x): x \in [0, M-1] $$

The DFT is reversible, and the Inverse DFT (IDFT) looks like this:

$$ f(x) = \frac{1}{M} \sum\limits_{u=0}^{M-1} F(u) e^{+i 2 \pi \frac{ux}{M}} $$
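As a sketch of how frequency-domain filtering can look in practice, here is an ideal low-pass filter built on NumPy's FFT routines; the random placeholder image and the cutoff radius are assumptions made purely for illustration.

```python
import numpy as np

image = np.random.rand(64, 64)          # placeholder grayscale image

F = np.fft.fft2(image)                  # 2D DFT of the image
F = np.fft.fftshift(F)                  # move the zero frequency to the center

# Ideal low-pass filter: keep only frequencies within a radius of the center.
rows, cols = image.shape
u = np.arange(rows) - rows // 2
v = np.arange(cols) - cols // 2
U, V = np.meshgrid(u, v, indexing="ij")
mask = (U**2 + V**2) <= 10**2           # cutoff radius of 10 (arbitrary)

# Apply the filter and transform back to the spatial domain.
filtered = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
```

Keeping only the low frequencies smooths the image; suppressing them instead (a high-pass filter) emphasizes edges, which connects to the sharpening idea above.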