# TMA4240: Statistics

## Words and terms and stuff

*Intersection* | Snitt | $\cap$

The intersection of two events

If you're familiar with logic, think of

Example: Let

*Mutual exclusion* | disjoint sets | disjunkte mengder

Two events

*Union* | $\cup$

The union of two events

If you are familiar with logic, think of

### Balanced object | Balansert objekt

An object (e.g. a die or a coin) is balanced if all possible outcomes are equally likely when it is rolled/flipped/thrown.

### Discrete sample space | Diskrete utfallsrom

A discrete sample space contains either a finite number of possibilities or an unending sequence with as many elements as there are whole numbers.

Typically represents count data, such as the number of heads after

### Continuous sample space | Kontinuerlig utfallsrom

A continuous sample space contains an infinite number of possibilities equal to the number of points on a line segment. Note that between any two different points on a line segment there are an infinite number of points.

Typically represents measured data, such as height or weight.

### Degrees of freedom | Frihetsgrader

"The degrees of freedom is the number of values in the final calculation of a statistic that are free to vary."

It is the quantity

It could also be the parameter

### Population | Populasjonen | Univers

The population (often of an experiment) is the totality of observations with which we are concerned.

The number of observations in the population is defined to be the size of the population.

Each observation in a population is a value of a random variable

### Sample | Utvalg

A sample is a subset of a population.

### Statistic

A statistic is a function of the random variables that constitute a random sample.

### Little blue book | Tabeller og formler i statistikk

A little blue book titled "Tabeller og formler i Statistikk", available from Akademika, which one is allowed to bring to this course's exam. Its ISBN is 978-82-519-1595-3.

Note that the edition used at NTNU is a 2011 revision, by NTNU's IMF institute, of the original 2001 edition by Jan Terje Kvaløy and Håkon Tjelmeland.

*Combinatorics* | Kombinatorikk

*Multiplication rule* | Rule of product | Multiplication principle

More information: Wikipedia

"The fundamental principle of counting," states that if an operation can be performed in

*Permutation* | Permutasjon | $_{n}P_{r}$

More information: Wikipedia

A permutation is an ordered arrangement of all or part of the members of a set.
Example: The set

A set containing

Example:

**Used when/for**: Selecting people (distinct objects) from a group/set,

A *circular permutation* is a circular arrangement of objects. There are

Distinguishing things. If a set consists of
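The counting rules above can be sketched with Python's standard library (the sets and numbers below are made up for illustration):

```python
import math

# nPr: ordered arrangements of r of n distinct objects, n!/(n-r)!
print(math.perm(5, 3))        # seat 3 of 5 people in a row -> 60

# Circular permutations of n distinct objects: (n - 1)!
print(math.factorial(4 - 1))  # 4 people around a round table -> 6

# Arrangements of n objects where some are alike: n! / (n1! n2! ...)
# e.g. the letters of "TATT" (three T's, one A)
print(math.factorial(4) // (math.factorial(3) * math.factorial(1)))  # -> 4
```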

*Combination* | Kombinasjon | $_{n}C_{r}$

More information: Wikipedia

"A combination is an unordered permutation." - Lao Tze-Te, fictional Chinese statistician/tea-leaf interpreter.

The number of combinations of *permutations* (

In general, when one is calculating a series of possible combinations (poker hands, anyone?) the order is irrelevant **but** each selection of

**Note**: While a combination ignores the elements' ordering, a series of selections from the same source (i.e. a card deck) has to be calculated from most specific to least specific, as the more specific selections tend to be subsets of the least specific selections. For example, any specific combination of

Example: You want to pull some cards out of a standard issue playing card deck at random without caring about the order. A standard card deck has

What if we wanted both cards to be of the same suit, but didn't care about their rank?

What if we wanted one of the cards to be a spade but didn't care about the other cards' suit? There are

But what if we want one spade, and one non-spade?

**Used for/when**: Playing cards (poker hands and such), random draws without replacement (tilfeldig trekk uten tilbakelegging).
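The card-deck counts discussed above can be sketched with `math.comb` (standard 52-card deck, 13 cards per suit):

```python
import math

# nCr: unordered selections of r of n objects
print(math.comb(52, 2))                      # any two cards -> 1326

# Both cards from the same suit: choose the suit, then 2 of its 13 ranks
print(4 * math.comb(13, 2))                  # -> 312

# Exactly one spade and one non-spade
print(math.comb(13, 1) * math.comb(39, 1))   # -> 507
```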

*Probability* | Sannsynlighet

The probability of an event,

If each and every possible outcome is equally likely, then the probability of a specific event (

Example: What's the probability of drawing a spade from a deck of cards? There are

The probability that at least one of two events occurs is denoted by

*Complementary events* | Complement

### Additive rule | addisjonssetningen

If

However if

**Additive rule for three events**,

## Random Variables and Probability Distributions

*Random Variable* | stochastic variable | stokastisk variabel

A **random variable** is a function that associates a real number with each element in the sample space.
Think of it as a *variable* in an experiment: if we sample a population (e.g. we select a bunch of people from Oslo at random), their height, weight, eye colour and feelings towards fish are all random variables.
Anything that can vary between sample points can be called a random variable.

In this course, random variables are usually denoted by capital letters (e.g.

Example: The sample space describing the possible outcomes of flipping a balanced coin -- that is a coin which is equally likely to land as heads or tails when flipped -- is

*Discrete Probability Distributions* | Diskrete sannsynlighetsfordelinger

"A discrete random variable assumes each of its values with a certain probability."

Example: If we toss a coin three times and denote

#### Probability function | Probability mass function | Probability distribution | Sannsynlighetsfordeling

The set of ordered pairs

1. $f(x) \geq 0$,
2. $\sum_{x}f(x) = 1$,
3. $P(X = x) = f(x)$.

But what does this mean?

The points in the list mean:

- The probability of $X$ assuming any possible $x$ is equal to or greater than zero. Negative probability is not covered by this course, so we assume it's impossible.
- The probability of $X$ not assuming any of the possible values $x$ is $0$: if we add up the probability of $X$ assuming each and every possible value $x$, the answer should be $1$. In other words, $X$ should not be able to assume any value not covered by $x$, nor should the sum of the probabilities of all possible outcomes exceed $1$.
- $f(x)$ is shorthand for $P(X = x)$. We could use $g(x)$ instead, or whatever you want. Typically the corresponding lower case letter is used if a random variable is denoted by a capital letter.
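A minimal sketch of the three conditions, using the pmf for the number of heads in three flips of a balanced coin:

```python
# pmf for X = number of heads in three flips of a balanced coin
f = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# f(x) >= 0 for every possible x
assert all(p >= 0 for p in f.values())
# the probabilities sum to 1
assert abs(sum(f.values()) - 1) < 1e-12
# f(x) is shorthand for P(X = x)
print(f[2])   # P(X = 2) -> 0.375
```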

**Note**: in the book,

Example: Susie receives a shipment of

##### Conditional probability and other stuff within the probability distribution expression

**Note that this is NOT the same as conditional distribution**

What do you do if you're asked to find

#### (discrete) Cumulative distribution function | (diskret) Kumulativ fordelingsfunksjon

What if we want to know the probability of our random variable assuming a value less than or equal to some possible value of

The cumulative distribution function

Example: Returning to our friend Susie in the previous example (the gal with the bananas and apples), what is the probability of Susie selecting
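The discrete CDF can be sketched numerically by summing the pmf (again using the heads-in-three-coin-flips pmf as a stand-in):

```python
# F(x) = P(X <= x) = sum of f(t) for all t <= x
f = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}   # pmf: heads in three flips of a balanced coin

def F(x):
    return sum(p for t, p in f.items() if t <= x)

print(F(1))          # P(X <= 1) -> 0.5
print(F(2) - F(0))   # P(1 <= X <= 2) = F(2) - F(0) -> 0.75
```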

### Continuous Probability Distributions | Kontinuerlig sannsynlighetsfordeling

"A continuous random variable has a probability of $0$ of assuming *exactly* any of its possible values."

Wait, what?
Assume we have a device which allows us to accurately measure the weight of an object in kilograms to twenty decimal places.
Further, assume this device is being used to weigh every single adult above *exactly*

The values of continuous random variables are plotted as a graph. The probability that the random variable assumes some value within a given range is equal to the area beneath the graph within the range.

#### Probability density function (pdf) | density function | tetthetsfunksjon

The function

1. $f(x) \geq 0$, for all $x \in \mathbb{R}$.
2. $\int_{-\infty}^{\infty}f(x)\,dx = 1$.
3. $P(a < X < b) = \int_{a}^{b} f(x)\,dx$.

Which means

- Negative probabilities are not covered by this course.
- It is not possible for $X$ to assume a value outside the scope of $f(x)$.
- When we write $P(a < X < b)$ it is shorthand for $\int_{a}^{b} f(x)\,dx$.
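A numerical sketch of these properties for a made-up density $f(x) = 2x$ on $[0, 1]$, with midpoint Riemann sums standing in for the integrals:

```python
# Hypothetical density: f(x) = 2x on [0, 1], 0 elsewhere
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def integral(a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

assert abs(integral(0, 1) - 1) < 1e-6    # total area under f is 1
print(round(integral(0.2, 0.5), 4))      # P(0.2 < X < 0.5) = 0.5^2 - 0.2^2 = 0.21
```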

**Note**: When

#### (continuous) Cumulative distribution function | (kontinuerlig) Kumulativ fordelingsfunksjon

The cumulative distribution function

Which allows us to write

*Joint Probability Distributions* | Simultan sannsynlighetsfordeling

#### Joint probability distribution (discrete case) | Simultan sannsynlighetsfordeling (diskret tilfelle)

For two discrete random variables $X$ and $Y$, the function $f(x,y)$ is a **joint probability distribution** iff:

1. $f(x,y) \geq 0$ for all $(x,y)$,
2. $\sum_{x} \sum_{y} f(x,y) = 1$,
3. $P(X = x, Y = y) = f(x,y)$.

#### Joint density function (continuous case)| Simultan tetthetsfunksjon (kontinuerlig tilfelle)

For two continuous random variables $X$ and $Y$, the function $f(x,y)$ is a **joint density function** iff:

1. $f(x,y) \geq 0$ for all $(x,y)$,
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$,
3. $P[(X,Y) \in A] = \int\int_{A} f(x,y)\,dx\,dy$ for any region $A$ in the $xy$-plane.

#### Marginal distributions | marginalfordeling

The marginal distribution of

In other words, if we have a joint probability distribution or joint density function we can obtain the probability function or distribution for either of the random variables by summing or integrating over the values of the other random variable.

The marginal distributions of
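A sketch with a made-up joint pmf: the marginal of $X$ sums over $y$, and the marginal of $Y$ sums over $x$:

```python
# Made-up joint pmf of two discrete random variables (X, Y)
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

g, h = {}, {}                      # marginals of X and Y
for (x, y), p in joint.items():
    g[x] = g.get(x, 0) + p         # sum over y for fixed x
    h[y] = h.get(y, 0) + p         # sum over x for fixed y

print(round(g[0], 10), round(g[1], 10))   # 0.3 0.7
print(round(h[0], 10), round(h[1], 10))   # 0.4 0.6
```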

### Expected value of a random variable | mean | forventningsverdi | $\mu$ | $E(X)$

Also known as: mean of the random variable

Let

What if we have a random variable that depends on the random variable

#### Expected value of random variables with joint probability distribution

If

**Note**: if you are asked to find

#### Expected value/mean of linear combinations of random variables

If

Also:
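The discrete expected value and its linearity can be sketched with a made-up pmf:

```python
# E(X) = sum of x * f(x); linearity: E(aX + b) = a E(X) + b
f = {0: 0.2, 1: 0.5, 2: 0.3}   # made-up pmf

EX = sum(x * p for x, p in f.items())
print(round(EX, 10))            # 0*0.2 + 1*0.5 + 2*0.3 = 1.1

a, b = 3, 2
E_lin = sum((a * x + b) * p for x, p in f.items())
assert abs(E_lin - (a * EX + b)) < 1e-12   # both sides equal 5.3
```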

*Variance* | varians | $\sigma^{2}$

The variance of the random variable

The variance of a random variable

#### Variance of the random variable g(X)

If

*Standard Deviation* | standardavvik | $\sigma$

The standard deviation of a random variable

It shows how much variation exists from the mean/expected value. A low standard deviation indicates that the data points tend to be very close to the mean, while a high standard deviation indicates that the data points are spread out over a large range of values.

*Covariance* | kovarianse | Cov(X,Y)

The covariance of two random variables

*Statistical independence* | uavhengighet

## Discrete Probability Distributions | Diskrete sannsynlighetsfordelinger

A note on probability distributions and cumulative probability distributions:

Most often we are asked to find

However, the formulas and definitions given for the various probability distributions are almost always of the type

### The Bernoulli Process

More information: Wikipedia

*Bernoulli was an 18th-century Swiss science-guy after whom the Bernoulli Process is named. Bernoulli's encounters with women were the first Bernoulli trials, and he'd divide them into two categories: "I totally hit that" and "haven't hit it yet". Today, it is more common to use a more PC example when introducing students to the Bernoulli Process, such as a coin flip.*

Experiments often consist of repeated trials, each with two possible outcomes that could be labeled success or failure; such a sequence of trials is called a **Bernoulli process**.
Each trial (e.g. coin flip, drawing a ball at random and then putting it back) is called a **Bernoulli trial**.
The Bernoulli process is a finite or infinite sequence of binary random variables.
A common example of a Bernoulli process is repeated coin flipping (possibly with an unfair coin, but consistent unfairness).
Drawing from a deck of cards is a Bernoulli process if the outcome is either success or failure (e.g. we could be interested in drawing a spade) and the cards are replaced between each selection, so that the probability of drawing a given card remains the same between draws.

Strictly speaking, a Bernoulli process is a finite or infinite sequence of independent random variables

- For each $i$, the value of $X_{i}$ is either $0$ or $1$.
  - (The experiment consists of repeated trials with the outcome $X_{i}$, which is either $0$ or $1$, i.e. the outcome is binary. We are free to redefine $1$ as "success" and $0$ as "failure" or $1$ as "vagina" and $0$ as "cats".)
- For all values of $i$, the probability that $X_{i} = 1$ is the same number $p$.
  - (The probability of either of the two possible outcomes remains constant from trial to trial; this implies that the trials are independent.)

### Binomial Distribution | Binomialfordeling | $b(x;n,p)$

A binomial distribution is the probability distribution of the discrete random variable

The probability distribution of the binomial random variable

Note that

#### Cumulative distribution function for Binomial distributions | binomial sums

We can find
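A sketch of the binomial pmf and its cumulative sum, built from `math.comb` (the coin-flip numbers are made up for illustration):

```python
import math

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1-p)^(n-x)"""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """B(x; n, p) = sum of b(t; n, p) for t = 0..x"""
    return sum(binom_pmf(t, n, p) for t in range(x + 1))

# e.g. 10 flips of a balanced coin
print(round(binom_pmf(5, 10, 0.5), 4))   # P(X = 5)  -> 0.2461
print(round(binom_cdf(3, 10, 0.5), 4))   # P(X <= 3) -> 0.1719
```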

#### Mean and variance of binomial distributions

### Multinomial distribution | multinomisk fordeling

If we allow the trials of a binomial experiment to have more than two possible outcomes, it becomes a multinomial experiment.

If a given trial can result in one of the

### Hypergeometric Distribution

Hypergeometric distributions are kind of like binomial ones except that they don't require independence between trials, which means that we can do stuff like draw from a deck of cards without replacing the card (and shuffling the deck) between each draw.

A hypergeometric distribution is the probability distribution of a hypergeometric random variable,

- A random sample of size $n$ is selected without replacement from $N$ items.
- Of the $N$ items, $k$ may be classified as successes and $N-k$ are classified as failures.

As previously stated, the hypergeometric random variable $X$ is the number of successes in such an experiment.

The values of a hypergeometric distribution are denoted by
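A sketch of the hypergeometric pmf $h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$, applied to a made-up card example:

```python
import math

def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k) = C(k, x) C(N-k, n-x) / C(N, n)"""
    return math.comb(k, x) * math.comb(N - k, n - x) / math.comb(N, n)

# Draw 5 cards from a 52-card deck (13 spades) without replacement:
# probability of getting exactly 2 spades
print(round(hypergeom_pmf(2, 52, 5, 13), 4))   # -> 0.2743
```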

#### Mean and variance of the hypergeometric distribution

### Negative binomial distribution | negativ-binomisk fordeling

The negative binomial distribution is the probability distribution of the number of successes in a sequence of Bernoulli trials before a specified number of failures occurs, or equivalently the number of failures before a specified number of successes occurs.

For example, we could throw a six sided die until we get a five for the third time. The probability distribution of the number of non-fives that we got will be negative binomial.

If repeated independent trials can result in success with probability

Example: Team

If we wanted to find the probability of team

### Geometric distribution | Geometrisk fordeling

The geometric distribution is a special case of the negative binomial distribution where $k = 1$.

The values of the geometric distribution is denoted by

Example: We are flipping a balanced coin. What is the probability that we don't get heads until the fifth flip?
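This coin-flip example can be sketched with the geometric pmf $g(x; p) = p(1-p)^{x-1}$ (probability that the first success happens on trial $x$):

```python
# g(x; p) = p (1-p)^(x-1)
def geom_pmf(x, p):
    return p * (1 - p) ** (x - 1)

# Balanced coin: first head on the fifth flip
print(geom_pmf(5, 0.5))   # 0.5 * 0.5^4 = 0.03125

# Numerical check that the mean is 1/p (truncating the infinite sum)
p = 0.5
mean = sum(x * geom_pmf(x, p) for x in range(1, 1000))
assert abs(mean - 1 / p) < 1e-9
```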

#### Mean and variance of a geometric distribution

Note that the mean and variance of a random variable following the geometric distribution are $\mu = \frac{1}{p}$ and $\sigma^{2} = \frac{1-p}{p^{2}}$.

### Poisson Distribution

The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

Given only the average rate at which some event occurs over a certain period of observation (for example, that Bob gets punched four times a day), and assuming that the process producing the event is essentially random, the Poisson distribution specifies how likely it is that Bob will get punched 2, or 5, or 10, or any other number of times during one period of observation.

The probability distribution of the Poisson random variable

Note that the little blue book uses

#### Variance and mean of the Poisson distribution

## Continuous Probability Distributions | kontinuerlige sannsynlighetsfordelinger

### Continuous Uniform Distribution | Kontinuerlig uniformfordeling

### Normal Distribution | Gauss distribution | Normalfordeling | Gaussfordeling

The normal distribution is super double important.

The density of a normal random variable

Did you also notice how the table of values for the normal distribution in the little blue book only covers

Tasked with finding the probability that the normal random variable

If tasked to find

**Note**: This "formula" is in the little blue book on page 31 (the map of relations between probability distributions), above the arrow going from Normal(

#### Normal approximation to the Binomial

If

In other words, if you have a binomial random variable

**Continuity correction**:

(Long version) Note that since continuous random variables have a probability of

Check out pages 189-190 in the book (9e) for an explanation with pictures.

(Short version) if

**Quality of the approximation**:

If
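A numerical sketch of the approximation, using the standard normal CDF via `math.erf` and the continuity correction (the binomial parameters are made up for illustration):

```python
import math

def phi(z):
    """Standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# X ~ b(n=100, p=0.5); approximate P(X <= 45) by a normal with mu = n*p and
# sigma^2 = n*p*(1-p), using the continuity correction (45 + 0.5).
n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = phi((45 + 0.5 - mu) / sigma)
exact = sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(46))
print(round(approx, 4), round(exact, 4))   # the two should agree closely
```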

### Gamma Distribution | Gammafordeling | $\Gamma$

The continuous random variable

The mean and variance of the gamma distribution are $\mu = \alpha\beta$ and $\sigma^{2} = \alpha\beta^{2}$.

(Note that

The gamma function is defined by

Properties of the gamma function

So yeah, the Gamma function is pretty dope and you better hope we don't have to do any calculations with it by hand.

### Exponential distribution | Eksponensialfordeling

The exponential distribution is a special case of the gamma distribution where $\alpha = 1$.

The continuous random variable

The mean and variance of the exponential distribution are $\mu = \beta$ and $\sigma^{2} = \beta^{2}$.

The exponential distribution is pretty closely related to the Poisson process. This is apparently super useful. Check out page 196.
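A sketch of the exponential CDF $P(X \leq x) = 1 - e^{-x/\beta}$ and the memoryless property, with a made-up $\beta$:

```python
import math

# Exponential with mean beta: f(x) = (1/beta) e^(-x/beta) for x > 0
beta = 2.0

def exp_cdf(x, beta):
    return 1 - math.exp(-x / beta)

print(round(exp_cdf(2.0, beta), 4))   # P(X <= beta) = 1 - e^-1 -> 0.6321

# Memoryless property: P(X > s + t | X > s) = P(X > t)
s, t = 1.0, 3.0
lhs = (1 - exp_cdf(s + t, beta)) / (1 - exp_cdf(s, beta))
assert abs(lhs - (1 - exp_cdf(t, beta))) < 1e-12
```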

### Chi-Squared Distribution | Kjikvadratfordeling | $\chi^{2}$ -distribution

The Chi-Squared distribution is another special case of the gamma distribution where $\alpha = v/2$ and $\beta = 2$, with $v$ a positive integer called the degrees of freedom.

The continuous random variable

The mean and variance are $\mu = v$ and $\sigma^{2} = 2v$.

#### Degrees of freedom and the chi-squared distribution

For every piece of information we estimate based on supplied information, a degree of freedom is lost. In this course it would seem like we never lose more than one degree of freedom, and the whole thing is succinctly summarized on page 27 of the little blue book.

## Functions of Random Variables

### Transformation of Variables

Suppose that **discrete** random variable with probability distribution

What does this mean?

If we have a discrete random variable

Suppose that **discrete** random variables with joint probability distribution

**Examples of one-to-one transformations**

Example:
Let

### Moment-generating functions | Momentgenererende funksjoner

The moment-generating function of a random variable is an alternative specification of its probability distribution.

The moment-generating function of the random variable

**Neat properties of moment-generating functions**
If

#### What is moment?

More information: Wikipedia

"Loosely speaking it is a quantative measure of the shape of a set of points."

Moments are ordered.
The function

The

Beyond that it's all a bit diffuse. The course only covers the formula for moment-generating functions.

## Sampling Distributions

### Random sample | tilfeldig utvalg

Random sampling means that a subset of the population is selected at random to eliminate any possibility of bias in the sampling procedure.

Let

Since the

If

#### Sample mean | Gjennomsnitt

The sample mean is the numerical average of the observations in a sample

#### Sample median

The sample median is the middle value of the observations in a sample.
It is obtained by sorting the values in either ascending or descending order and then selecting the middle value (if the sample size is odd) or the average of the two values closest to the middle (if the sample size is even).

#### Sample mode | typetall | modus | modalverdi

The sample mode is the most frequently occurring value in a sample.

#### Sample variance | varians

The sample variance is defined as the average of the squares of the deviations of the observations from their mean.
If

We could also write the sample variance as

#### Sample standard deviation | standardavvik

The standard deviation is still the positive square root of the sample variance.

#### Sample range | variasjonsbredde

The sample range is the difference between the largest and smallest observed values.
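The sample statistics above can be sketched with Python's `statistics` module on a made-up sample:

```python
import statistics

sample = [2, 3, 3, 5, 7, 10]   # made-up observations

print(statistics.mean(sample))      # sample mean: 30/6 -> 5
print(statistics.median(sample))    # even size: average of 3 and 5 -> 4.0
print(statistics.mode(sample))      # most frequent value -> 3
print(statistics.variance(sample))  # sample variance (n - 1 divisor) -> 9.2
print(max(sample) - min(sample))    # sample range -> 8
```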

### Sampling distribution

The probability distribution of a statistic is called a sampling distribution.
The sample average (

The sampling distribution of

### The central limit theorem

This is some hot shit right here.

If

According to the book, if

**Note**: The book says (pg 263) that the Central Limit Theorem cannot be used unless *and then use the central limit theorem anyway*.

Example:
You are told that the variance of a population is presumed to be

#### Central limit theorem for two populations

If independent samples of size

##### Comparing two populations

(this is a rewrite of example 8.6)

We have two populations with the following information:

We find

### Sampling distribution of $S^{2}$

If

In other words, "the sampling distribution of

The values of the random variable

The probability that a random sample produces a variable with a

**How to read the $\chi^{2}$-table**:

So, given a

Example:
Given a random sample, from a population with a presumed standard deviation of

And then we look it up in the table and see that a

### t-distribution | student t-distribution | t-fordeling

The

If our random sample was selected from a normal population we can write

Let

A

#### Finding t-values

The

## Estimation problems

### Estimators

A point estimate of some population parameter

Similarly,

An estimator is **unbiased** (forventningsrett) if its mean is equal to the parameter it estimates.
From the book: A statistic

If we consider all possible unbiased estimators of some parameter, the one with the smallest variance is called the **most efficient estimator** of that parameter.

### Confidence intervals | konfidensintervall

A

The general formulation of a request for a confidence interval is

If

Example:

#### How large should the sample size be?

Confidence intervals get better as the sample size increases.
Ideally, all measurements would be made on the entire population.
However that is usually either unfeasible or, like, a bunch of work, but we still might want to know how large our sample size should be in order to be certain that the confidence interval is *tight* (note that "tight" is not an actual statistical term).

Luckily, if

#### Finding a confidence interval

**On $\mu$ with known $\sigma^{2}$**
If

Example:
An estimate,

Since the variance is known, we can use the above method.
*Kritiske verdier i standard normalfordelingen/critical values in the standard normal distribution*).
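This kind of interval can be sketched with `statistics.NormalDist` (the sample numbers below are made up): $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.

```python
from statistics import NormalDist

# 95% confidence interval on mu when sigma is known (made-up numbers)
x_bar, sigma, n = 2.6, 0.3, 36
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96

half_width = z * sigma / n**0.5
print(round(x_bar - half_width, 3), round(x_bar + half_width, 3))  # 2.502 2.698
```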

**One-sided confidence bounds on $\mu$ with known $\sigma^{2}$**
If

**On $\mu$ with unknown $\sigma^{2}$**
If

### Prediction interval | prediksjonsintervall

A prediction interval is a prediction of the possible value of a future observation.
It is something we do if we don't know what

Prediction intervals can be used to determine if an observation is an outlier.

** $\mu$ unknown, $\sigma^{2}$ known**
For a normal distribution of measurements with unknown mean

** $\mu$ unknown, $\sigma^{2}$ unknown**
For a normal distribution of measurements with unknown mean

### Maximum likelihood estimator | Sannsynsmaksimeringsestimator

Given independent observations

This is, however, not that useful, as computing the MLE from this form can be quite challenging. To overcome this hurdle we use a trick, which is quite simply to take the logarithm of our likelihood function

Using this trick allows us to rewrite our likelihood function into a more computable form by rewriting it as a sum. This is a valid approach because the logarithm is a strictly increasing function, which means maximizing this function is equivalent to maximizing our original equation.

Now simply (or not so simply) solve the equation
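A minimal sketch of the idea for a Bernoulli parameter $p$ (made-up 0/1 observations): the log-likelihood is $\sum_i \big(x_i \ln p + (1 - x_i)\ln(1 - p)\big)$, and it is maximized at the sample mean, which a crude grid search confirms.

```python
import math

data = [1, 0, 1, 1, 0, 1, 1, 0]   # made-up Bernoulli observations

def log_likelihood(p):
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in data)

p_hat = sum(data) / len(data)              # closed-form MLE -> 0.625
grid = [i / 200 for i in range(1, 200)]    # candidate p values in (0, 1)
best = max(grid, key=log_likelihood)
print(p_hat, best)                         # 0.625 0.625
```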

## Hypotheses testing | hypothesis | hypoteser | hypotesetesting

A statistical hypothesis is an assertion or conjecture concerning one or more populations. The absolute truth or falsity of a statistical hypothesis cannot be known unless the entire population is examined. Which has been shown to be a major hassle. So instead we take a random sample and see if it either supports or refutes our hypothesis.

### The null hypothesis | null-hypotesen

When performing a hypothesis test we present two mutually exclusive hypotheses: a **null hypothesis** (denoted $H_{0}$) and an **alternative hypothesis** (denoted $H_{1}$).

Typically, one of the following conclusions is reached when testing a hypothesis:

An analogy is the hypothesis testing done in those American jury trials we've seen on TV, where the null hypothesis is that the defendant is not guilty (innocent until proven guilty), while the alternative hypothesis is that the defendant is guilty.

Typically, the null hypothesis states that the probability is equal to some value. The alternative hypothesis states that the probability is either higher or lower than the null hypothesis states, or equal to some other value.

### Testing a statistical hypothesis

Illustrated with a table!

A hypothesis test generally has a requirement that the outcome needs to fulfill in order for the experimenters to reject the null hypothesis. The probability of committing a type I error is computed by evaluating the probability that the requirement is met or exceeded given that the null hypothesis is true (there is an example on page 323).

The probability of committing a type II error is impossible to compute unless the alternative hypothesis is specific.
If the alternative hypothesis is