TMA4240: Statistics
Table of Contents
  1. Words and terms and stuff
    1. Intersection | Snitt | $\cap$
    2. Mutual exclusion | disjoint sets | disjunkte mengder
    3. Union | $\cup$
    4. Balanced object | Balansert objekt
    5. Discrete sample space | Diskrete utfallsrom
    6. Continuous sample space | Kontinuerlig utfallsrom
    7. Degrees of freedom | Frihetsgrader
    8. Population | Populasjonen | Univers
    9. Sample | Utvalg
    10. Statistic
    11. Little blue book | Tabeller og formler i statistikk
  2. Combinatorics | Kombinatorikk
    1. Multiplication rule | Rule of product | Multiplication principle
    2. Permutation | Permutasjon | $_{n}P_{r}$
    3. Combination | Kombinasjon | $_{n}C_{r}$
  3. Probability | Sannsynlighet
    1. Complementary events | Complement
    2. Additive rule | addisjonssetningen
  4. Random Variables and Probability Distributions
    1. Random Variable | stochastic variable | stokastisk variabel
    2. Discrete Probability Distributions | Diskrete sannsynlighetsfordelinger
      1. Probability function | Probability mass function | Probability distribution | Sannsynlighetsfordeling
        1. Conditional probability and other stuff within the probability distribution expression
      2. (discrete) Cumulative distribution function | (diskret) Kumulativ fordelingsfunksjon
    3. Continuous Probability Distributions | Kontinuerlig sannsynlighetsfordeling
      1. Probability density function (pdf) | density function | tetthetsfunksjon
      2. (continuous) Cumulative distribution function | (kontinuerlig) Kumulativ fordelingsfunksjon
    4. Joint Probability Distributions | Simultan sannsynlighetsfordeling
      1. Joint probability distribution (discrete case) | Simultan sannsynlighetsfordeling (diskret tilfelle)
      2. Joint density function (continuous case)| Simultan tetthetsfunksjon (kontinuerlig tilfelle)
      3. Marginal distributions | marginalfordeling
    5. Expected value of a random variable | mean | forventningsverdi | $\mu$ | $E(X)$
      1. Expected value of random variables with joint probability distribution
      2. Expected value/mean of linear combinations of random variables
    6. Variance | varianse | $\sigma^{2}$
      1. Variance of the random variable g(X)
    7. Standard Deviation | standardavvik | $\sigma$
    8. Covariance | kovarianse | Cov(X,Y)
    9. Statistical independence | uavhengighet
      1. Ways to rule out statistical independence | stuff to do with independent random variables | uncorrelated variables
  5. Discrete Probability Distributions | Diskrete sannsynlighetsfordelinger
    1. The Bernoulli Process
    2. Binomial Distribution | Binomialfordeling | $b(x;n,p)$
      1. Cumulative distribution function for Binomial distributions | binomial sums
      2. Mean and variance of binomial distributions
    3. Multinomial distribution | multinomisk fordeling
    4. Hypergeometric Distribution
      1. Mean and variance of the hypergeometric distribution
    5. Negative binomial distribution | negativ-binomisk fordeling
    6. Geometric distribution | Geometrisk fordeling
      1. Mean and variance of a geometric distribution
    7. Poisson Distribution
      1. Variance and mean of the Poisson distribution
  6. Continuous Probability Distributions | kontinuerlige sannsynlighetsfordelinger
    1. Continuous Uniform Distribution | Kontinuerlig uniformfordeling
    2. Normal Distribution | Gauss distribution | Normalfordeling | Gaussfordeling
      1. Normal approximation to the Binomial
    3. Gamma Distribution | Gammafordeling | $\Gamma$
    4. Exponential distribution | Eksponensialdistribusjon
    5. Chi-Squared Distribution | Kjikvadratfordeling | $\chi^{2}$-distribution
      1. Degrees of freedom and the chi-squared distribution
  7. Functions of Random Variables
    1. Transformation of Variables
    2. Moment-generating functions | Momentgenererende funksjoner
      1. What is moment?
  8. Sampling Distributions
    1. Random sample | tilfeldig utvalg
      1. Sample mean | Gjennomsnitt
      2. Sample median
      3. Sample mode | typetall | modus | modalverdi
      4. Sample variance | varianse
      5. Sample standard deviation | standardavvik
      6. Sample range | variasjonsbredde
    2. Sampling distribution
    3. The central limit theorem
      1. Central limit theorem for two populations
        1. Comparing two populations
    4. Sampling distribution of $S^{2}$
    5. t-distribution | student t-distribution | t-fordeling
      1. Finding t-values
  9. Estimation problems
    1. Estimators
    2. Confidence intervals | konfidensintervall
      1. How large should the sample size be?
      2. Finding a confidence interval
    3. Prediction interval | prediksjonsintervall
    4. Maximum likelihood estimator | Sannsynsmaksimeringsestimator
  10. Hypotheses testing | hypothesis | hypoteser | hypotesetesting
    1. The null hypothesis | null-hypotesen
    2. Testing a statistical hypothesis
  11. Hustling those Cumulative Probability Distribution exercises
    1. Only valid for discrete probability distributions
    2. Only valid for continuous probability distributions
      1. Normal distribution


Tags:
  • statistikk
  • tma4245
  • maths
  • tma4240

Words and terms and stuff

Intersection | Snitt | $\cap$

The intersection of two events $A$ and $B$, denoted by the symbol $A\cap B$, is the event containing all elements that are common to both $A$ and $B$. I.e. all the elements that are both in $A$ and $B$.

If you're familiar with logic, think of $\cap$ as the logical operator "and" ($\wedge$) or a conjunction.

Example: Let $D$ be the event that the outcome of a die roll is prime while $E$ is the event that the outcome of said die roll is odd. $D = \{2, 3, 5\}$ and $E = \{1, 3, 5\}$. Then the intersection of $D$ and $E$ is $D\cap E = \{3, 5\}$.

Mutual exclusion | disjoint sets | disjunkte mengder

Two events $A$ and $B$ are mutually exclusive, or disjoint, if $A\cap B = \emptyset$, i.e. if $A$ and $B$ have no elements in common.

Union | $\cup$

The union of two events $A$ and $B$, denoted $A\cup B$, is the event containing all the elements that belong to $A$ or $B$ or both. A trick to remember if $\cup$ or $\cap$ is the symbol for union is that the one representing union looks like the first character in the word "union".

If you are familiar with logic, think of $\cup$ as the logical operator "or" ($\vee$) or a disjunction.

Balanced object | Balansert objekt

An object (e.g. a die or a coin) is balanced if all possible outcomes are equally likely when it is rolled/flipped/thrown.

Discrete sample space | Diskrete utfallsrom

A discrete sample space contains either a finite number of possibilities or an unending sequence with as many elements as there are whole numbers.

Typically represents count data, such as the number of heads after $k$ coin flips or how many children are born on a given day.

Continuous sample space | Kontinuerlig utfallsrom

A continuous sample space contains an infinite number of possibilities equal to the number of points on a line segment. Note that between any two different points on a line segment there are an infinite number of points.

Typically represents measured data, such as height or weight.

Degrees of freedom | Frihetsgrader

"The degrees of freedom is the number of values in the final calculation of a statistic that are free to vary."

It is the quantity $n-1$, where $n$ is the sample size of the experiment.

It can also refer to the parameter $v$ of a chi-squared ($\chi^{2}$) distribution, since that parameter is also referred to as "$v$ degrees of freedom".

Population | Populasjonen | Univers

The population (often of an experiment) is the totality of observations with which we are concerned.

The number of observations in the population is defined to be the size of the population.

Each observation in a population is a value of a random variable $X$ having some probability distribution $f(x)$.

Sample | Utvalg

A sample is a subset of a population.

Statistic

A statistic is a function of the random variables that constitute a random sample.

Little blue book | Tabeller og formler i statistikk

A little blue book titled "Tabeller og formler i Statistikk", available from Akademika, which one is allowed to bring to this course's exam. Its ISBN is 978-82-519-1595-3.

Note that the one used at NTNU is a 2011 revision by NTNU's IMF institute of the original 2001 edition by Jan Terje Kvaløy and Håkon Tjelmeland.

Combinatorics | Kombinatorikk

Multiplication rule | Rule of product | Multiplication principle

More information: Wikipedia

"The fundamental principle of counting," states that if an operation can be performed in $n_{1}$ ways, and if for each of these ways a second operation can be performed in $n_{2}$ ways, ..., and if for each of these ways a $k$th operation can be performed in $n_{k}$ ways, then the sequence of $k$ operations can be performed in $n_{1}n_{2}...n_{k}$ ways.

Permutation | Permutasjon | $_{n}P_{r}$

More information: Wikipedia

A permutation is an ordered arrangement of all or part of the members of a set. Example: The set $S = \{a, b\}$ has two permutations: $\{a, b\}$ and $\{b, a\}$.

A set containing $n$ elements/objects has $n!$ possible permutations.

$r$ objects can be selected from a set of $n$ distinct objects in $$_{n}P_{r} = \frac{n!}{(n-r)!}$$ ways.

Example: $2$ objects are selected from the set $S = \{a, b, c\}$. There are $\frac{3!}{(3-2)!} = 6$ possible solutions (permutations): $\{a, b\}$, $\{a, c\}$, $\{b, a\}$, $\{b, c\}$, $\{c, a\}$ and $\{c, b\}$.
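
To sanity-check counts like these, Python's standard library (3.8+) has `math.perm`, and `itertools.permutations` lets you enumerate the arrangements outright. A minimal sketch of the example above:

```python
from itertools import permutations
from math import perm

# nPr = n! / (n - r)!: ordered selections of r objects from n distinct objects
print(perm(3, 2))  # 6

# Enumerating confirms the six ordered selections from S = {a, b, c}
print(list(permutations(["a", "b", "c"], 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
```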

Used when/for: Selecting people (distinct objects) from a group/set.

A circular permutation is a circular arrangement of objects. There are $(n-1)!$ circular permutations of $n$ objects. Rotating a circular permutation does not produce a new permutation.

Distinguishing things. If a set consists of $n$ objects of which $n_{1}$ are of one kind, ..., $n_{k}$ are of a $k$th kind, its number of distinct permutations is $$\frac{n!}{n_{1}!\cdots n_{k}!}$$ Example: We have $10$ coloured balls, of which $3$ are red, $2$ are green and $5$ are vomit. There are $\frac{10!}{(3!)(2!)(5!)} = 2520$ distinct permutations of these balls.

Combination | Kombinasjon | $_{n}C_{r}$

More information: Wikipedia

"A combination is an unordered permutation." - Lao Tze-Te, fictional chinese statistician/tea-leaf interpreter.

The number of combinations of $n$ distinct objects taken $r$ at a time is $$_{n}C_{r} = \binom{n}{r} = \frac{n!}{r!(n-r)!}$$ A combination is an unordered permutation. Example: Look at the set $\{a, b\}$. It has two permutations ($\{a, b\}$ and $\{b, a\}$) but only one possible combination of size $2$, as $\{a, b\}$ and $\{b, a\}$ are the same combination.

In general, when one is calculating a series of possible combinations (poker hands, anyone?) the order is irrelevant but each selection of $n$ elements removes that many elements from the set ("tilfeldig trekk uten tilbakelegging").

Note: While a combination ignores the elements' ordering, a series of selections from the same source (i.e. a card deck) has to be calculated from most specific to least specific, as the more specific selections tend to be subsets of the least specific selections. For example, any specific combination of $n$ cards is a subset of the unspecified selection of $n$ cards ("any $n$ cards"): the royal straight flush is a subset of both the straight and the flush.

Example: You want to pull some cards out of a standard issue playing card deck at random without caring about the order. A standard card deck has $52$ distinct cards, each with one of $4$ distinct suits and $13$ distinct ranks. There are $\binom{52}{2} = 1326$ possible combinations of $2$ cards out of $52$ if we don't care about their suit or rank. But what if we did? How many ways can we draw $2$ cards, both of which are jacks (of a specific rank)? Since there's $4$ cards with the same rank, there are $\binom{4}{2} = 6$ combinations of two jacks. In other words, in $6$ out of the $1326$ possible ways to draw two cards, we get two jacks. But what if we just want both cards to have the same rank? There's $13$ different ranks, of which we are interested in either: $\binom{13}{1}\binom{4}{2} = 78$ of the $1326$ possible ways to draw two cards of the same rank (a pair).

What if we wanted both cards to be of the same suit, but didn't care about their rank? $\binom{13}{2}\binom{4}{1} = 312$.

What if we wanted at least one of the cards to be a spade, but didn't care about the other card's suit? Count the complement: there are $\binom{39}{2} = 741$ ways to draw two cards with no spades at all, so there are $1326 - 741 = 585$ ways (out of $1326$) to end up with at least one spade if you draw two cards. (Multiplying $13$ spades by the $51$ remaining cards gives $663$, but that overcounts: every two-spade hand gets counted twice.)

But what if we want exactly one spade, and one non-spade? $\binom{13}{1}\binom{39}{1} = 507$ possibilities. (Add the $\binom{13}{2} = 78$ two-spade hands and you recover the $585$ from above.)
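
All of the card counts above can be reproduced with `math.comb` (Python 3.8+); a minimal sketch, with the variable names being our own:

```python
from math import comb

total = comb(52, 2)                              # 1326 two-card hands
two_jacks = comb(4, 2)                           # 6
any_pair = comb(13, 1) * comb(4, 2)              # 78
same_suit = comb(4, 1) * comb(13, 2)             # 312
at_least_one_spade = total - comb(39, 2)         # 1326 - 741 = 585
one_spade_one_other = comb(13, 1) * comb(39, 1)  # 507

print(total, two_jacks, any_pair, same_suit,
      at_least_one_spade, one_spade_one_other)
```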

Used for/when: Playing cards (poker hands and such), tilfeldig trekk uten tilbakelegging.

Probability | Sannsynlighet

The probability of an event, $A$, occurring is denoted $P(A)$ (read as "the probability of $A$"). This is a real number between $0$ and $1$.

If each and every possible outcome is equally likely, then the probability of a specific event ($A$) occurring can be calculated by dividing the number of possible outcomes in which it occurs ($n$) by the total number of possible outcomes ($N$): $P(A) = \frac{n}{N}$.

Example: What's the probability of drawing a spade from a deck of cards? There are $52$ cards in a deck, of which $13$ are of the desired suit. This translates to there being $52$ possible outcomes to the act of drawing a card, $13$ of which contain the desired event (that the drawn card is a spade). The probability of drawing a spade is thus $\frac{13}{52} = \frac{1}{4} = 0.25$.

The probability that at least one of two events occurs is denoted by $P(A\cup B)$. The probability that both of these events occur is denoted by $P(A\cap B)$.

Complementary events | Complement

$A$ is an event. $A'$ is the event that $A$ does not occur, i.e. all outcomes not in $A$. For two such complementary events, $P(A) + P(A') = 1$. If you are into logic you can think of a complementary event as a negation: $A'$ is all other events than $A$, or simply "not $A$".

Additive rule | addisjonssetningen

If $A$ and $B$ are two events then $P(A\cup B) = P(A) + P(B) - P(A\cap B)$. That is, the probability of either $A$ or $B$ occurring is equal to the sum of the probability of $A$ occurring and the probability of $B$ occurring, minus the probability of $A$ AND $B$ occurring. This is because the event that both $A$ and $B$ occur ($A\cap B$) is a subset of both $A$ and $B$: $(A\cap B) \subset A$ and $(A\cap B) \subset B$. You could draw a Venn diagram to verify this -- nothing fancy, just a box with two intersecting circles. If we simply added $P(A)$ and $P(B)$, the overlap $P(A\cap B)$ would be counted twice, so we subtract it once.

However if $A$ and $B$ are mutually exclusive (if $A\cap B = \emptyset$) then $P(A\cup B) = P(A) + P(B)$.

Additive rule for three events, $A$, $B$, $C$: $P(A\cup B\cup C) = P(A) + P(B) + P(C) - P(A\cap B) - P(A\cap C) - P(B\cap C) + P(A\cap B\cap C)$
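
The rule is easy to verify by brute-force enumeration. A minimal sketch using the die events from the Intersection section ($A$ = prime outcome, $B$ = odd outcome) and exact fractions:

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # a balanced six-sided die
A = {2, 3, 5}                # the outcome is prime
B = {1, 3, 5}                # the outcome is odd

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

assert P(A | B) == P(A) + P(B) - P(A & B)  # the additive rule
print(P(A | B))                            # 2/3
```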

Random Variables and Probability Distributions

Random Variable | stochastic variable | stokastisk variabel

A random variable is a function that associates a real number with each element in the sample space. Think of it as a variable in an experiment: if we sample a population (e.g. we select a bunch of people from Oslo at random), their height, weight, eye colour and feelings towards fish are all random variables. Anything that can vary between sample points can be called a random variable.

In this course, random variables are usually denoted by capital letters (e.g. $X$, $Y$), while the corresponding lower case/small letter (e.g. $x$, $y$) denotes one of its values.

Example: The sample space describing the possible outcomes of flipping a balanced coin -- that is, a coin which is equally likely to land as heads or tails when flipped -- three times is $S = \{TTT,TTH,THT,HTT,THH,HTH,HHT,HHH\}$. We can define $X$ as "the number of heads" when flipping a coin three times, which would cause $X$ to assume either $0$, $1$, $2$ or $3$. One can also say that $X$ takes on all values $x \in \{0, 1, 2, 3\}$. The book seems to do both. If the outcome is $TTT$ then $X$ is $0$. If $X$ is $2$ then the possible outcomes are $\{THH, HTH, HHT\}$.

Discrete Probability Distributions | Diskrete sannsynlighetsfordelinger

"A discrete random variable assumes each of its values with a certain probability."

Example: If we toss a coin three times and denote $X$ as "the number of heads", $X$ assumes the value $1$ with probability $\frac{3}{8}$, because all $8$ sample points (possible outcomes) are equally likely and $3$ of them contain exactly $1$ head.

$$ \begin{array}{c|cccc} x & 0 & 1 & 2 & 3 \\ \hline P(X = x) & \frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8} \end{array}$$ In the table above, the first row contains the possible values of $x$ while the second row contains the probability that $X$ will assume the given value of $x$. E.g. Row 2, column 2 contains $\frac{1}{8}$: the probability that $X$ (the number of heads in three coin flips) is $0$. Notice how the fractions in the second row add up to $1$, and that $X$ cannot assume a value not featured in the first row?

Probability function | Probability mass function | Probability distribution | Sannsynlighetsfordeling

The set of ordered pairs $(x,f(x))$ is a probability function of the discrete random variable $X$ if, for each possible outcome $x$

  1. $f(x)\geq 0$
  2. $\sum_{x}f(x) = 1$
  3. $P(X = x) = f(x)$

But what does this mean? $(x,f(x))$ is an ordered pair: an ordered pair in this sense is a pair of mathematical objects (variables, functions, numbers, etc). That it is "ordered" simply means that $(x,f(x))$ is not the same pair as $(f(x),x)$ (i.e. that the order of the elements in the pair matter). We say that the given ordered pair is a "probability function" of the discrete random variable $X$ IF the following statements are true for every possible outcome, $x$. Here, $x$ is a given, possible value of $X$.

And the points on the list means:

  1. The probability of $X$ assuming any possible $x$ is equal to or greater than zero. Negative probability is not covered by this course, so we assume it's impossible.
  2. The probability of $X$ not assuming any of the possible values $x$ is $0$; If we add up the probability of $X$ assuming each and every possible value $x$ the answer should be $1$. In other words $X$ should not be able to assume any value not covered by $x$, nor should the sum of the probability of each possible outcome exceed $1$.
  3. $f(x)$ is shorthand for $P(X = x)$. We could use $g(x)$ instead, or whatever you want to. Typically the corresponding lower case letter is used if a random variable is denoted by a capital letter.

Note: in the book, $f(x)$ alone is sometimes referred to as the probability distribution. Yeah, I don't know why they're not being consistent either. It's like they don't really know what they're doing.

Example: Susie receives a shipment of $20$ bananas, of which $4$ are actually apples (Susie has poor eyesight). If Susie randomly selects $2$ of these, what is the probability distribution for the number of apples? We say that $X$ is a random variable whose values $x$ are the possible numbers of apples selected. Since Susie only selects two things, $x\in \{0, 1, 2\}$. So: $$ \begin{array}{l} f(0) = P(X = 0) = \frac{\binom{4}{0}\binom{16}{2}}{\binom{20}{2}} = \frac{60}{95} \\ f(1) = P(X = 1) = \frac{\binom{4}{1}\binom{16}{1}}{\binom{20}{2}} = \frac{32}{95} \\ f(2) = P(X = 2) = \frac{\binom{4}{2}\binom{16}{0}}{\binom{20}{2}} = \frac{3}{95} \end{array}$$ With this, we see that the probability distribution of $X$ (which is $(x, f(x))$) can be put into a table: $$\begin{array}{c|ccc|c} x & 0 & 1 & 2 & \text{sum} \\ \hline f(x) & \frac{60}{95} & \frac{32}{95} & \frac{3}{95} & 1\end{array}$$
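
Susie's distribution is quick to reproduce with `math.comb` and exact fractions; a minimal sketch (the parameter names are ours, not the book's):

```python
from fractions import Fraction
from math import comb

def f(x, apples=4, total=20, drawn=2):
    """P(X = x): x apples among Susie's two random picks."""
    return Fraction(comb(apples, x) * comb(total - apples, drawn - x),
                    comb(total, drawn))

dist = {x: f(x) for x in range(3)}
print(dist)                     # f(0) = 12/19 (= 60/95), f(1) = 32/95, f(2) = 3/95
assert sum(dist.values()) == 1  # requirement 2 of a probability function
```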

Conditional probability and other stuff within the probability distribution expression

Note that this is NOT the same as conditional distribution

What do you do if you're asked to find $P(X\geq 0 | X \leq 1)$? Well the "|" means it's a conditional probability question, that is $P(X\geq 0 | X \leq 1)$ is read as "the probability of $X\geq 0$ given that $X \leq 1$", and it is solved like other conditional probability problems: $$P(X\geq 0 | X \leq 1) = \frac{P(X\geq 0\cap X\leq 1)}{P(X\leq 1)}$$ where $P(X\geq 0\cap X\leq 1) = P(0\leq X \leq 1)$.

(discrete) Cumulative distribution function | (diskret) Kumulativ fordelingsfunksjon

What if we want to know the probability of our random variable assuming a value less than or equal to some possible value of $x$? We could use a cumulative distribution function! It's defined as follows:

The cumulative distribution function $F(x)$ of a discrete random variable $X$ with the probability distribution $f(x)$ is $$\begin{array}{ll} F(x) = P(X \leq x) = \sum_{t\leq x}f(t)\text{,} & \text{for } -\infty < x < \infty \text{.}\end{array}$$ That is, we add up the probabilities of $X$ assuming each possible value of $x$ up to and including some given value. Note that $F(x)$ (the cumulative distribution function) is a monotone nondecreasing function, defined not only for the values assumed by the given random variable but for all real numbers. That is, $-\infty < t \leq x$. However, in practice we are only concerned with the known possible values of $X$ and start out with $t$ being the lowest possible of these and iterating throughout the rest of them as long as the condition $t \leq x$ holds.

$F(x)$ can be read as "the probability that $X$ assumes some value equal to or lower than $x$".

Example: Returning to our friend Susie in the previous example (the gal with the bananas and apples), what is the probability of Susie selecting $0$ or $1$ apples? $F(1) = P(X \leq 1) = \sum_{t = 0}^{1}f(t) = \frac{60}{95} + \frac{32}{95} = \frac{92}{95}$
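
Since the discrete CDF is just a running sum of the probability function, `itertools.accumulate` does the job; a minimal sketch continuing Susie's example:

```python
from fractions import Fraction
from itertools import accumulate

pmf = [Fraction(60, 95), Fraction(32, 95), Fraction(3, 95)]  # f(0), f(1), f(2)
cdf = list(accumulate(pmf))  # F(x) = sum of f(t) for t <= x
print(cdf)                   # F(0) = 12/19, F(1) = 92/95, F(2) = 1
```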

Continuous Probability Distributions | Kontinuerlig sannsynlighetsfordeling

"A continuous random variable has a probability of $0$ of assuming exactly any of its possible values."

Wait, what? Assume we have a device which allows us to accurately measure the weight of an object in kilograms to twenty decimal places. Further, assume this device is being used to weigh every single adult above $21$ years of age. Because of the device's ridiculous accuracy, the probability that an individual will weigh any given weight is vanishingly small. In fact this probability is so small we declare it to be $0$ -- the probability of selecting an individual at random who weighs, say, exactly $65.00000000000000000000$ kilograms is said to be $0$. So what do we do? We ask about intervals -- what is the probability that a randomly selected individual weighs at least $65$ but not more than $66$ kilograms, for example.

The values of continuous random variables are plotted as a graph. The probability that the random variable assumes some value within a given range is equal to the area beneath the graph within the range.

Probability density function (pdf) | density function | tetthetsfunksjon

The function $f(x)$ is a probability density function (pdf) for the continuous random variable $X$, defined over the set of real numbers, if

  1. $f(x) \geq 0\text{, for all } x \in \mathbb{R}\text{.}$
  2. $\int_{-\infty}^{\infty}f(x)dx = 1\text{.}$
  3. $P(a < X < b) = \int_{a}^{b} f(x)dx\text{.}$

Which means

  1. Negative probabilities are not covered by this course.
  2. The total probability is $1$: $X$ is certain to assume some real value, and the total area under the curve of $f(x)$ is $1$.
  3. When we write $P(a < X < b)$ it is shorthand for $\int_{a}^{b} f(x)dx$.

Note: When $X$ is continuous the probability of it assuming any specific given value is $0$, which means that $P(a < X \leq b) = P(a < X < b) + P(X = b) = P(a < X < b)$.

(continuous) Cumulative distribution function | (kontinuerlig) Kumulativ fordelingsfunksjon

The cumulative distribution function $F(x)$ of a continuous random variable $X$ with density function $f(x)$ is $$ \begin{array}{cc} F(x) = P(X\leq x) = \int_{-\infty}^{x}f(t)dt\text{, } && \text{for } -\infty < x < \infty \end{array}$$

Which allows us to write $P(a < X < b) = F(b) - F(a)$ and $f(x) = \frac{dF(x)}{dx}$
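
A quick numeric sanity check of $P(a < X < b) = F(b) - F(a)$, using an Exponential($1$) density as a stand-in example (our choice, not the book's) and a crude midpoint Riemann sum instead of any library integrator:

```python
from math import exp

f = lambda x: exp(-x)      # density of an Exponential(1) random variable
F = lambda x: 1 - exp(-x)  # its closed-form cumulative distribution function

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum; good enough for a sanity check."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.5, 2.0
print(integrate(f, a, b))  # ~0.4712 = P(a < X < b)
print(F(b) - F(a))         # 0.4712..., the same probability via F
```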

Joint Probability Distributions | Simultan sannsynlighetsfordeling

Joint probability distribution (discrete case) | Simultan sannsynlighetsfordeling (diskret tilfelle)

For two discrete random variables $X$ and $Y$, the probability of their simultaneous occurrence can be described by a function. The joint probability distribution $f(x,y) = P(X = x, Y = y)$ gives the probability that $X = x$ and $Y = y$ occur simultaneously. $f(x,y)$ is a joint probability distribution iff.

  1. $f(x,y) \geq 0$ for all $(x,y)$,
  2. $\sum_{x} \sum_{y} f(x,y) = 1,$
  3. $P(X = x, Y = y) = f(x,y)$

Joint density function (continuous case)| Simultan tetthetsfunksjon (kontinuerlig tilfelle)

For continuous random variables $X$ and $Y$, the probability of their simultaneous occurrence can be described by a function $f(x,y)$, called the joint density function iff.

  1. $f(x,y) \geq 0$ for all $(x,y)$,
  2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)dxdy = 1,$
  3. $P[(X,Y) \in A] = \int\int_{A} f(x,y)dxdy$, for any region $A$ in the $xy$-plane (for continuous variables any single point has probability $0$, so we ask about regions)

Marginal distributions | marginalfordeling

The marginal distribution of $X$ is the probability distribution/probability function of $X$, obtained by summing $f(x,y)$ (the joint probability distribution of $X$ and $Y$) over the possible values of $Y$. Vice versa for $Y$. If $X$ and $Y$ are continuous rather than discrete and we have a joint density function, the summation is replaced by integrals.

In other words, if we have a joint probability distribution or joint density function we can obtain the probability function or distribution for either of the random variables by summing or integrating over the values of the other random variable.

The marginal distributions of $X$ alone and $Y$ alone are, respectively for the discrete and continuous cases: $$\begin{array}{llll} \text{Discrete: } & g(x) = \sum_{y}f(x,y) & \text{and} & h(y) = \sum_{x}f(x,y) \\ \text{Continuous: } & g(x) = \int_{-\infty}^{\infty}f(x,y)dy & \text{and} & h(y) = \int_{-\infty}^{\infty}f(x,y)dx \end{array}$$

Expected value of a random variable | mean | forventningsverdi | $\mu$ | $E(X)$

Also known as: mean of the random variable $X$, mean of the probability distribution of $X$. It is the expected value of $X$. Which means that if the experiment is repeated an infinite number of times, one could expect that $X$ would be, on average, $E(X)$.

Let $X$ be a random variable with probability distribution $f(x)$. The mean, or expected value, of $X$ is $$ \begin{array}{ll} \text{Discrete: } & \mu_{X} = E(X) = \sum_{x}xf(x) \\ \text{Continuous: } & \mu_{X} = E(X) = \int_{-\infty}^{\infty}xf(x)dx \end{array}$$

What if we have a random variable that depends on the random variable $X$? For example a function defined with $X$ as its input: $g(X)$. The expected value of the random variable $g(X)$ is $$ \begin{array}{lc} \text{Discrete: } && \mu_{g(X)} = E[g(X)] = \sum_{x}g(x)f(x) \\ \text{Continuous: } && \mu_{g(X)} = E[g(X)] = \int_{-\infty}^{\infty}g(x)f(x)dx \end{array}$$ As you can see, if $g(X) = X$ then these expressions become equal to the previous ones.
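
A minimal sketch of both formulas in the discrete case, using the coin-flip distribution from the table earlier (the choice $g(X) = (X-1)^{2}$ is arbitrary, just for illustration):

```python
from fractions import Fraction

# pmf of X = number of heads in three flips of a balanced coin
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

E_X = sum(x * p for x, p in f.items())      # E(X) = sum of x * f(x) = 3/2
g = lambda x: (x - 1) ** 2                  # an arbitrary g(X)
E_gX = sum(g(x) * p for x, p in f.items())  # E[g(X)] = sum of g(x) * f(x) = 1
print(E_X, E_gX)
```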

Expected value of random variables with joint probability distribution

If $X$ and $Y$ are random variables with joint probability distribution $f(x,y)$, then the expected value of the random variable $g(X,Y)$ is $$\begin{array}{ll} \text{Discrete: } && \mu_{g(X,Y)} = E[g(X,Y)] = \sum_{x}\sum_{y}g(x,y)f(x,y) \\ \text{Continuous: } && \mu_{g(X,Y)} = E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x,y)f(x,y)dxdy \end{array}$$

Note: if you are asked to find $E(XY)$, try defining $g(X,Y) = XY$ and finding $E(g(X,Y))$ instead.

Expected value/mean of linear combinations of random variables

If $a$ and $b$ are constants, then $$\begin{array}{l} E(aX + b) = aE(X) + b \\ E(b) = b \\ E(aX) = aE(X) \\ \end{array}$$

Also: $$\begin{array}{l} E[g(X) \pm h(X)] = E[g(X)] \pm E[h(X)] \\ E[g(X,Y) \pm h(X,Y)] = E[g(X,Y)] \pm E[h(X,Y)] \\ E[g(X) \pm h(Y)] = E[g(X)] \pm E[h(Y)] \\ E[X \pm Y] = E(X) \pm E(Y) \end{array}$$

Variance | varianse | $\sigma^{2}$

The variance of the random variable $X$, or the variance of the probability distribution of $X$, is a measure of how far a set of numbers is spread out. It describes how far the numbers lie from the mean/expected value.

The variance of a random variable $X$ is $$\sigma_{X}^{2} = E(X^{2}) - \mu_{X}^{2}$$ That is, the expected value of $X^{2}$ minus the square of the expected value of $X$.
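
Computing $\sigma^{2} = E(X^{2}) - \mu_{X}^{2}$ for the same coin-flip variable; a minimal sketch:

```python
from fractions import Fraction

# pmf of X = number of heads in three flips of a balanced coin
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in f.items())       # E(X) = 3/2
E_X2 = sum(x**2 * p for x, p in f.items())  # E(X^2) = 3
print(E_X2 - mu**2)                         # 3/4, matching np(1-p) = 3(1/2)(1/2)
```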

Variance of the random variable g(X)

If $X$ is a random variable with probability distribution $f(x)$, then the variance of the random variable $g(X)$ is $$\begin{array}{ll} \text{Discrete: } && \sigma_{g(X)}^{2} = E([g(X) - \mu_{g(X)}]^{2}) = \sum_{x}[g(x) - \mu_{g(X)}]^{2}f(x) \\ \text{Continuous: } && \sigma_{g(X)}^{2} = E([g(X) - \mu_{g(X)}]^{2}) = \int_{-\infty}^{\infty}[g(x) - \mu_{g(X)}]^{2}f(x)dx \end{array}$$

Standard Deviation | standardavvik | $\sigma$

The standard deviation of a random variable $X$ is the positive square root of the variance of $X$. $$\sigma_{X} = \sqrt{\sigma_{X}^{2}}$$

It shows how much variation exists from the mean/expected value. A low standard deviation indicates that the data points tend to be very close to the mean, while a high standard deviation indicates that the data points are spread out over a large range of values.

Covariance | kovarianse | Cov(X,Y)

The covariance of two random variables $X$ and $Y$ with means $\mu_{X}$ and $\mu_{Y}$, respectively, is given by $$Cov(X,Y) = \sigma_{XY} = E(XY) - \mu_{X}\mu_{Y}$$

Statistical independence | uavhengighet

$X$ and $Y$ are two random variables with joint probability distribution $f(x,y)$ and marginal distributions $g(x)$ and $h(y)$ respectively. We say that the random variables $X$ and $Y$ are statistically independent, if and only if $$f(x,y) = g(x)h(y)$$

Ways to rule out statistical independence | stuff to do with independent random variables | uncorrelated variables

If $X$ and $Y$ are independent then $Cov(X,Y) = \sigma_{XY} = 0$. So if the covariance of two random variables isn't $0$, then they are not independent.

Furthermore (still only applies if $X$ and $Y$ are independent): $$\mu_{XY} = E(XY) = E(X)E(Y)$$ $$Var(X+Y) = Var(X) + Var(Y)$$ $$Var(XY) = [E(X)]^{2}Var(Y) + [E(Y)]^{2}Var(X) + Var(X)Var(Y)$$

($a$ and $b$ are constants) $$\begin{array}{l} \sigma^{2}_{aX+bY} = a^{2}\sigma^{2}_{X} + b^{2}\sigma^{2}_{Y} \\ \sigma^{2}_{aX-bY} = a^{2}\sigma^{2}_{X} + b^{2}\sigma^{2}_{Y} \\ \sigma^{2}_{\sum_{i=1}^{n}a_{i}X_{i}} = \sum_{i=1}^{n}a^{2}_{i}\sigma_{X_{i}}^{2} \end{array}$$

Discrete Probability Distributions | Diskrete sannsynlighetsfordelinger

A note on probability distributions and cumulative probability distributions:

Most often we are asked to find $P(X \leq x)$. This value is the probability that $X$ assumes a value equal to or less than $x$, i.e. the sum of the probabilities that $X$ assumes each value $y$ with $X_{\text{min}} \leq y \leq x$: $\sum_{y = X_{\text{min}}}^{x}P(X = y)$ (note that this particular comparison does not hold for continuous probability distributions, but bear with me).

However, the formulas and definitions given for the various probability distributions are almost always of the type $P(X = x)$. For example, the probability distribution of a Poisson random variable $X$ is $p(x;E(X))$. When asked to find $P(X \leq a)$ for a given Poisson random variable, you'd probably just look it up in a table. But if you were to try and compute $p(a;E(X))$ yourself, you would find that it does not equal the value you find in the table. This is because $P(X \leq x)$ is a cumulative distribution function, while $p(x;E(X))$ is a probability mass function. That is, $p(a;E(X)) = P(X = a)$. If you want to compute the value of $P(X\leq a)$ yourself, you would have to compute $\sum_{i=X_{min}}^{a}p(i;E(X))$. Typically, if a lower case letter is used to denote the probability function, then the equivalent upper case letter is used to denote the cumulative probability function.

The Bernoulli Process

More information: Wikipedia

Bernoulli was an 18th-century Swiss science-guy after whom the Bernoulli Process is named. Bernoulli's encounters with women were the first Bernoulli trials, and he'd divide them into two categories: "I totally hit that" and "haven't hit it yet". Today, it is more common to use a more PC example when introducing students to the Bernoulli Process, such as a coin flip.

Experiments often consist of repeated independent trials, each with two possible outcomes that could be labeled success or failure; $1$ or $0$; heads or tails. This is called a Bernoulli process. Each trial (e.g. a coin flip, or drawing a ball at random and then putting it back) is called a Bernoulli trial. The Bernoulli process is a finite or infinite sequence of binary random variables. A common example of a Bernoulli process is repeated coin flipping (possibly with an unfair coin, but consistent unfairness). Drawing from a deck of cards is a Bernoulli process if the outcome is either success or failure (e.g. we could be interested in drawing a spade) and the cards are replaced between each selection, so that the probability of drawing a given card remains the same between draws.

Strictly speaking, a Bernoulli process is a finite or infinite sequence of independent random variables $X_{1}$, $X_{2}$, $X_{3}$, $\ldots$, with the following properties:

  1. For each $i$, the value of $X_{i}$ is either $0$ or $1$.
    • (The experiment consists of repeated trials with the outcome $X_{i}$ which is either $0$ or $1$, i.e. the outcome is binary. We are free to redefine $1$ as "success" and $0$ as "failure" or $1$ as "vagina" and $0$ as "cats".)
  2. For all values of $i$, the probability that $X_{i} = 1$ is the same number $p$.
    • (The probability of either of the two possible outcomes remains constant from trial to trial: this implies that the trials are independent.)

Binomial Distribution | Binomialfordeling | $b(x;n,p)$

A binomial distribution is the probability distribution of the discrete random variable $X$: the number of successes in $n$ Bernoulli trials. The values of a binomial distribution are denoted by $b(x;n,p)$, where $x$ is the value of the random variable $X$; $n$ is the number of trials and $p$ is the probability of the outcome being $1$/"success".

The probability distribution of the binomial random variable $X$ (the number of successes/$1$s in $n$ independent trials) in a Bernoulli trial with probability $p$ of success/$1$, is $$\begin{array}{cc} b(x;n,p) = \binom{n}{x}p^{x}(1-p)^{n-x}\text{,} && x = 0,1,2,\ldots,n.\end{array}$$

Note that $\sum_{x=0}^{n}b(x;n,p) = 1$: the sum of the probabilities of $0$, $1$, $2$, ..., $n$ successes occurring is $1$, and the number of successes in a binomial experiment is an integer between $0$ and $n$ (inclusive).

Cumulative distribution function for Binomial distributions | binomial sums

We can find $P(X < r)$ or $P(a \leq X \leq b)$ for a binomial distribution. $$P(X < r) = B(r-1;n,p) = \sum_{x=0}^{r-1}b(x;n,p)$$ $$P(X \leq r) = B(r;n,p) = \sum_{x=0}^{r}b(x;n,p)$$ $$P(X \geq r) = 1 - P(X < r) = 1 - B(r-1;n,p)$$ $$P(a \leq X \leq b) = \sum_{x=a}^{b}b(x;n,p) = B(b;n,p) - B(a-1;n,p)$$ Although typically we just look it up in the tables in the little blue book ("Tabeller og formler i statistikk"). Also note that $$P(X < 1) = B(0;n,p) = \sum_{x=0}^{0}b(x;n,p) = \binom{n}{0}p^{0}(1-p)^{n-0} = (1-p)^n$$
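
The pmf and its cumulative sums are easy to compute directly when the table doesn't cover your $n$ and $p$; a minimal sketch:

```python
from math import comb

def b(x, n, p):
    """Binomial pmf b(x; n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def B(r, n, p):
    """Cumulative binomial sum: P(X <= r)."""
    return sum(b(x, n, p) for x in range(r + 1))

# Three flips of a balanced coin: P(X <= 1) = 1/8 + 3/8 = 0.5
print(B(1, 3, 0.5))
```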

Mean and variance of binomial distributions

$$\begin{array}{lc} \text{Mean/Expected value:} && \mu = np \\ \text{Variance:} && \sigma^{2} = np(1-p) \end{array}$$

Multinomial distribution | multinomisk fordeling

If we allow the trials of a binomial experiment to have more than two possible outcomes, it becomes a multinomial experiment.

If a given trial can result in one of the $k$ possible outcomes $E_{1}$, $E_{2}$, ..., $E_{k}$ with probabilities $p_{1}$, $p_{2}$, ..., $p_{k}$, then the multinomial distribution returns the probability that $E_{1}$, $E_{2}$, ..., $E_{k}$ occurs $x_{1}$, $x_{2}$, ..., $x_{k}$ times in $n$ independent trials: $$f(x_1,x_2,\ldots,x_k;p_1,p_2,\ldots,p_k,n) = \binom{n}{x_1,x_2,\ldots,x_k}p_{1}^{x_{1}}p_{2}^{x_{2}}\cdots p_{k}^{x_{k}}$$ Assuming $$\begin{array}{ccc} \sum_{i=1}^{k}x_{i} = n && \text{and} && \sum_{i=1}^{k}p_{i} = 1 \end{array}$$

Hypergeometric Distribution

Hypergeometric distributions are kind of like binomial ones except that they don't require independence between trials, which means that we can do stuff like draw from a deck of cards without replacing the card (and shuffling the deck) between each draw.

A hypergeometric distribution is the probability distribution of a hypergeometric random variable, $X$, classified as the number of successes of a hypergeometric experiment -- an experiment with the following properties:

  1. A random sample of size $n$ is selected without replacement from $N$ items
  2. Of the $N$ items, $k$ may be classified as successes and $N-k$ are classified as failures. As previously stated, the hypergeometric random variable $X$ is the number of successes in such an experiment.

The values of a hypergeometric distribution are denoted by $h(x;N,n,k)$: $$\begin{array}{ll}h(x;N,n,k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}\text{,} & \text{max}\{0,n-(N-k)\}\leq x\leq \text{min}\{n,k\}\end{array}$$
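
The pmf translates directly into `math.comb`; a minimal sketch, reusing Susie's bananas-and-apples numbers ($N = 20$, $n = 2$, $k = 4$):

```python
from math import comb

def h(x, N, n, k):
    """Hypergeometric pmf h(x; N, n, k)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

print([h(x, 20, 2, 4) for x in range(3)])
# [0.6316..., 0.3368..., 0.0316...], i.e. 60/95, 32/95 and 3/95 as before
```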

Mean and variance of the hypergeometric distribution

$$\mu = \frac{nk}{N}$$ $$\sigma^{2} = \frac{N-n}{N-1}\cdot n\cdot \frac{k}{N}(1-\frac{k}{N})$$

Negative binomial distribution | negativ-binomisk fordeling

The negative binomial distribution is the probability distribution of the number of successes in a sequence of Bernoulli trials before a specified number of failures occurs. Or the number of failures before a specified number of successes occur.

For example, we could throw a six sided die until we get a five for the third time. The probability distribution of the number of non-fives that we got will be negative binomial.

If repeated independent trials can result in a success with probability $p$ and a failure with probability $1-p$, then the probability distribution of the random variable $X$ (the number of the trial on which the $k$th success occurs) is $$\begin{array}{ll}P(X = x) = b^{*}(x;k,p) = \binom{x-1}{k-1}p^{k}(1-p)^{x-k}\text{,} & x=k, k+1, k+2, \ldots.\end{array}$$

Example: Team $A$ and team $B$ are facing off in a best-of-seven match of rock, paper, scissors. Suppose that team $A$ has probability $0.65$ of winning a game over team $B$. What is the probability that $A$ will win the match in $6$ games? We denote "$A$ wins a game" as success, with probability $p = 0.65$. Furthermore: $x = 6$ and $k = 4$ (we are interested in the probability that $A$'s fourth win comes in the sixth game). $b^{*}(6;4,0.65) = \binom{6-1}{4-1}0.65^{4}(1-0.65)^{6-4} = \binom{5}{3}0.65^{4}(1-0.65)^{2} \approx 0.2187$.

If we wanted to find the probability of team $A$ winning the match at all, we could add the probabilities of $A$ getting its fourth win in game 4, 5, 6 or 7: $b^{*}(4;4,0.65) + b^{*}(5;4,0.65) + b^{*}(6;4,0.65) + b^{*}(7;4,0.65)$.
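
A minimal sketch of the rock-paper-scissors computation:

```python
from math import comb

def b_star(x, k, p):
    """Negative binomial pmf: the kth success lands on trial x."""
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

print(b_star(6, 4, 0.65))                            # ~0.2187: A wins in game 6
print(sum(b_star(x, 4, 0.65) for x in range(4, 8)))  # ~0.800: A wins the match
```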

Geometric distribution | Geometrisk fordeling

The geometric distribution is a special case of the negative binomial distribution where $k=1$. That is, when the specified number of successes is $1$. $$b^{*}(x;1,p) = \binom{x-1}{1-1}p^{1}(1-p)^{x-1} = p(1-p)^{x-1}$$

The values of the geometric distribution are denoted by $g(x;p)$: $$\begin{array}{ll}g(x;p) = p(1-p)^{x-1}\text{,} & x = 1, 2, 3, \ldots.\end{array}$$

Example: We are flipping a balanced coin. What is the probability that we don't get heads until the fifth flip? $g(5;0.5) = 0.5(1-0.5)^{5-1} = 0.5(1-0.5)^{4} = 0.03125$

Mean and variance of a geometric distribution

Note that the mean and variance of a random variable following the geometric distribution are: $$\mu = \frac{1}{p} \text{ and } \sigma^{2} = \frac{1-p}{p^2} $$

Poisson Distribution

The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

Given only the average rate of some event occurring for a certain period of observation (for example that Bob gets punched four times a day), and assuming that the process that produces the event is essentially random, the Poisson distribution specifies how likely it is that Bob will get punched 2, or 5, or 10, or any other number of times, during one period of observation.

The probability distribution of the Poisson random variable $X$, representing the number of outcomes occurring in a given time interval or specified region denoted by $t$, is $$\begin{array}{ll}p(x;\lambda t) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}\text{,} & x = 0, 1, 2, \ldots.\end{array}$$ where $\lambda$ is the average number of outcomes per unit time, distance, area or volume. In other words, $\lambda t$ is the expected value of $X$ for the time period $t$. So an easier way to think about it is $$p(x;E(X)) = \frac{e^{-E(X)}(E(X))^x}{x!}$$

Note that the little blue book uses $\mu$ for $\lambda t$, and that you are typically given $\mu$, $E(X)$ or $Var(X)$ in assignments related to the Poisson distribution.
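
A minimal sketch of the pmf and the cumulative value you would otherwise read from the table, using Bob's made-up punching rate of $\lambda t = 4$ per day:

```python
from math import exp, factorial

def p(x, mu):
    """Poisson pmf p(x; mu), with mu = lambda * t = E(X) for the period."""
    return exp(-mu) * mu**x / factorial(x)

print(p(2, 4))                         # ~0.1465: exactly 2 punches tomorrow
print(sum(p(x, 4) for x in range(3)))  # ~0.2381: P(X <= 2), the table value
```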

Variance and mean of the Poisson distribution

$$E(X) = \lambda t = Var(X)$$

Continuous Probability Distributions | kontinuerlige sannsynlighetsfordelinger

Continuous Uniform Distribution | Kontinuerlig uniformfordeling

Normal Distribution | Gauss distribution | Normalfordeling | Gaussfordeling

The normal distribution is super double important.

The density of a normal random variable $X$, with mean $\mu$ and variance $\sigma^{2}$, is $$\begin{array}{ll} n(x;\mu,\sigma) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}\text{,} & -\infty < x < \infty \end{array}$$ Did you notice how the density of $X$ is defined solely by its expected value and variance?

Did you also notice how the table of values for the normal distribution in the little blue book only covers $\mu = 0$ and $\sigma^{2} = 1$? Good on you. Thankfully we don't have to compute the integral of $n(x;\mu,\sigma)$ to find the probability of $X$ assuming some value. Although it is doable, the result involves the error function, and I don't even know what course that's covered in.

Tasked with finding the probability that the normal random variable $X$ assumes some value less than (or equal to, remember how the probability of a continuous random variable assuming any single value is $0$?) $a$ -- i.e. $P(X < a)$, with a given variance $\sigma^{2}$ and mean $\mu$, we calculate $$z_{a} = \frac{a - \mu}{\sigma}$$ and find $P(Z < z_{a})$ instead, where $Z$ is a normal random variable with mean $\mu = 0$ and variance $\sigma^{2} = 1$. This means we can use the tables!

If tasked to find $P(a < X < b)$, calculate $z_{a}$ and $z_{b}$ and evaluate $P(z_{a} < Z < z_{b})$.

Note: This "formula" is in the little blue book on page 31 (the map of relations between probability distributions), above the arrow going from Normal($\mu$, $\sigma^{2}$) to Normal($0,1$).

Normal approximation to the Binomial

If $X$ is a binomial random variable with probability $p$ and $n$ trials, then the limiting form of the distribution of $$Z = \frac{X-np}{\sqrt{np(1-p)}}$$ as $n\to \infty$, is the standard normal distribution $n(z;0,1)$ (that is, $Z$ is a normal random variable with mean $\mu =0$ and variance $\sigma^{2} = 1$). Note that $n\to \infty$ is something the book uses but these approximations are apparently applicable for $n = 100$ (which isn't very close to $\infty$), so just roll with it.

In other words, if you have a binomial random variable $X$ (with probability $p$ and $n$ trials) and are asked to find some approximation for $P(X \leq a)$, compute $Z$ and find $P(Z \leq a)$ instead.

Continuity correction:

(Long version) Note that since continuous random variables have a probability of $0$ of assuming any given value and binomial random variables can only assume nonnegative integers ($0 \leq X \leq n$), the value "$X$" used to compute $Z$ doesn't exactly equal $a$ in $P(X \leq a)$, but rather is off by $0.5$. This is because the distribution of a binomial random variable is a histogram, where each bar's center is directly on top of the integers along the x-axis with a width of $1$. That is, if $B$ is a binomial random variable that has some probability of assuming $4$, there will be a bar with its center above $4$ on the x-axis that extends to $3.5$ and $4.5$. So if we want to find $P(X \leq 4)$ we need to include this entire bar, which means we calculate $Z$ for $X=4.5$. That is, if we want to find the area under the curve to the left of $x$, we add $0.5$ to $x$. If we want to find the area under the curve to the right of $x$, we subtract $0.5$ from $x$.

Check out page 189-90 in the book (9e) for an explanation with pictures.

(Short version) if $X$ is a binomial random variable with probability $p$ and $n$ trials (with mean $\mu = np$ and variance $\sigma^{2} = np(1-p)$) and $z_{a} = \frac{a-np}{\sqrt{np(1-p)}}$ then the normal approximations of the following cumulative distributions of $X$ are: $$P(X \leq a) \approx P(Z < z_{a+0.5})$$ $$P(X \geq a) = 1 - P(X \leq (a-1)) \approx 1 - P(Z < z_{(a-1)+0.5}) = 1 - P(Z < z_{a-0.5})$$ $$P(a \leq X \leq b) \approx P(z_{a-0.5} < Z < z_{b+0.5})$$

Quality of the approximation:

If $np\geq 5$ and $n(1-p) \geq 5$, the normal approximation to the binomial is of good quality. (pg 241)
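
A minimal sketch comparing the exact binomial sum to the continuity-corrected approximation, with made-up numbers $n = 15$, $p = 0.4$, $a = 4$:

```python
from math import comb, sqrt
from statistics import NormalDist  # Python 3.8+

n, p, a = 15, 0.4, 4
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(a + 1))

mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = NormalDist().cdf((a + 0.5 - mu) / sigma)  # z_{a+0.5}, with the 0.5 shift
print(exact, approx)                               # ~0.2173 vs ~0.2146
```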

Gamma Distribution | Gammafordeling | $\Gamma$

The continuous random variable $X$ has a gamma distribution with parameters $a$ and $b$ if its density function is given by $$f(x;a,b) = \begin{cases}\frac{1}{b^{a}\Gamma(a)}x^{a - 1}e^{-x/b}\text{,} & x > 0\text{,} \\ 0\text{,} & \text{elsewhere,}\end{cases}$$ where $a > 0$ and $b > 0$.

The mean and variance of the gamma distribution are $$E(X) = \mu = ab$$ $$Var(X) = \sigma^{2} = ab^{2}$$

(Note that $\alpha$ and $\beta$ are used instead of $a$ and $b$ in both the book and the little blue book, but who's got time for that jazz. People still know what's up.)

The gamma function is defined by $$\begin{array}{ll}\Gamma(a) = \int_{0}^{\infty}x^{a-1}e^{-x}dx\text{,} & \text{for } a > 0\end{array}$$

Properties of the gamma function $$\Gamma(a) = (a-1)\Gamma(a-1)$$ $$\begin{array}{ll} \Gamma(n) = (n-1)!\text{,} & \text{if } n \text{ is a positive integer}\end{array}$$ $$\Gamma(1) = 1$$ $$\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$$

So yeah, the Gamma function is pretty dope and you better hope we don't have to do any calculations with it by hand.
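
Luckily `math.gamma` exists; a minimal sketch checking the properties above:

```python
from math import factorial, gamma, pi, sqrt

print(gamma(5) == factorial(4))                    # True: Gamma(n) = (n-1)!
print(abs(gamma(0.5) - sqrt(pi)) < 1e-12)          # True: Gamma(1/2) = sqrt(pi)
print(abs(gamma(3.5) - 2.5 * gamma(2.5)) < 1e-12)  # True: Gamma(a) = (a-1)Gamma(a-1)
```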

Exponential distribution | Eksponensialdistribusjon

The exponential distribution is a special case of the gamma distribution where $a = 1$.

The continuous random variable $X$ has an exponential distribution, with parameter $b$, if its density function is given by $$f(x;b) = \begin{cases} \frac{1}{b} e^{-x/b}\text{,} & x > 0\text{,} \\ 0\text{,} & \text{elsewhere,}\end{cases}$$ where $b > 0$.

The mean and variance of the exponential distribution are $$E(X) = \mu = b$$ $$Var(X) = \sigma^{2} = b^{2}$$

The exponential distribution is closely related to the Poisson process: the waiting time between events in a Poisson process is exponentially distributed. This is apparently super useful. Check out page 196.

Chi-Squared Distribution | Kjikvadratfordeling | $\chi^{2}$-distribution

The Chi-Squared distribution is another special case of the gamma distribution where $a = \frac{v}{2}$ and $b=2$ ($v$ is a positive integer). The Chi-squared distribution has a single parameter, $v$, called the degrees of freedom.

The continuous random variable $X$ has a chi-squared distribution with $v$ degrees of freedom if its density function is given by $$f(x;v) = \begin{cases} \frac{1}{2^{v/2}\Gamma(v/2)}x^{(v/2)-1}e^{-x/2}\text{,} & x > 0\text{,} \\ 0\text{,} & \text{elsewhere} \end{cases}$$ ($v$ is a positive integer).

The mean and variance are $$E(X) = \mu = v$$ $$Var(X) = \sigma^{2} = 2v$$

Degrees of freedom and the chi-squared distribution

For every piece of information we estimate based on supplied information, a degree of freedom is lost. In this course it would seem like we never lose more than one degree of freedom, and the whole thing is succinctly summarized on page 27 of the little blue book.

Functions of Random Variables

Transformation of Variables

Suppose that $X$ is a discrete random variable with probability distribution $f(x)$. Let $Y = u(X)$ define a one-to-one transformation between the values of $X$ and $Y$ so that the equation $y = u(x)$ can be uniquely solved for $x$ in terms of $y$, say $x = w(y)$. Then the probability distribution of $Y$ is $g(y) = f[w(y)]$

What does this mean?

If we have a discrete random variable $X$ with a known probability distribution, and another function that defines a one-to-one mapping between the values of $X$ and the values of another discrete random variable $Y$, then the probability distribution of $Y$ is obtained by plugging the inverse mapping (the function $w$ that translates from a value of $Y$ back to the corresponding value of $X$) into the probability distribution of $X$.

Suppose that $X_{1}$ and $X_{2}$ are discrete random variables with joint probability distribution $f(x_{1}, x_{2})$. Let $Y_{1} = u_{1}(X_{1}, X_{2})$ and $Y_{2} = u_{2}(X_{1}, X_{2})$ define a one-to-one transformation between the points $(x_{1}, x_{2})$ and $(y_{1}, y_{2})$ so that the equations $$\begin{array}{lcl} y_{1} = u_{1}(x_{1}, x_{2}) & \text{and} & y_{2} = u_{2}(x_{1}, x_{2}) \end{array}$$ may be uniquely solved for $x_{1}$ and $x_{2}$ in terms of $y_{1}$ and $y_{2}$, say $x_{1} = w_{1}(y_{1}, y_{2})$ and $x_{2} = w_{2}(y_{1}, y_{2})$. Then the joint probability distribution of $Y_{1}$ and $Y_{2}$ is $$g(y_{1}, y_{2}) = f[w_{1}(y_{1}, y_{2}), w_{2}(y_{1}, y_{2})]$$

Examples of one-to-one transformations $$Y = X$$ $$Y = X^{2}$$ $$Y = X_{1} + X_{2}$$ (Note: $Y = X^{2}$ is only one-to-one if $X$ is restricted to nonnegative values, as in the example below.)

Example: Let $X$ be a continuous random variable with probability distribution $f(x) = xe^{-x^{2}}$ for $x > 0$, and let $Y$ be defined as $Y = u(X) = X^{2}$ with the probability distribution $g(y)$. In accordance with the formula on page 34 of the little blue book, we now have: $$w(y) = \sqrt{y}$$ $$w'(y) = \frac{1}{2\sqrt{y}}$$ because $w(y)$ is the inverse of $u(x)$, solved for $x$ in terms of $y$. $$g(y) = f(w(y)) \cdot |w'(y)| = \sqrt{y}e^{-(\sqrt{y})^{2}} \cdot \left|\frac{1}{2\sqrt{y}}\right| = \frac{1}{2}e^{-y}$$

Moment-generating functions | Momentgenererende funksjoner

The moment-generating function of a random variable is an alternative specification of its probability distribution.

The moment-generating function of the random variable $X$ is given by $E(e^{tX})$ and is denoted by $M_{X}(t)$:

$$ M_{X}(t) = E(e^{tX}) = \begin{cases} \sum_{x} e^{tx}f(x)\text{,} & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty}e^{tx}f(x)dx\text{,} & \text{if } X \text{ is continuous.} \end{cases}$$

Neat properties of moment-generating functions: If $X$ and $Y$ are two random variables whose moment-generating functions are equal for all values of $t$, then $X$ and $Y$ have the same probability distribution.

$$M_{X+a}(t) = e^{at}M_{X}(t)$$ $$M_{aX}(t) = M_{X}(at)$$ If $Y = \sum_{i=1}^{n}X_{i}$, where the $X_{i}$ are independent, then $M_{Y}(t) = \prod_{i=1}^{n}M_{X_{i}}(t)$.

What is moment?

More information: Wikipedia

"Loosely speaking it is a quantative measure of the shape of a set of points."

Moments are ordered. The function $\mu_{r}' = E(X^{r})$ (see page 218) defines the $r$th moment about the origin of the random variable $X$. $$E(X^{r}) = \begin{cases} \sum_{x}x^{r}f(x)\text{,} & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty}x^{r}f(x)dx\text{,} & \text{if } X \text{ is continuous.} \end{cases}$$ We see that the $0$th moment ($E(X^{0})$) is equal to $1$ (look up the three requirements for continuous and discrete random variables).

The $1$st moment is equal to the mean of $X$, $E(X)$.

Beyond that it's all a bit diffuse. The course only covers the formula for moment-generating functions.

Sampling Distributions

Random sample | tilfeldig utvalg

Random sampling means that a subset of the population is selected at random to eliminate any possibility of bias in the sampling procedure.

Let $X_{i}$, $i = 1, 2, \ldots, n$ represent the $i$th measurement or sample value that we observe. The random variables $X_{1}$, $X_{2}$, ..., $X_{n}$ will then constitute a random sample from the population $f(x)$ with numerical values $x_{1}$, $x_{2}$, ..., $x_{n}$ if the measurements are obtained by repeating the experiment $n$ independent times under essentially the same conditions. Because of this (the identical conditions under which the elements of the sample are selected), it is reasonable to assume that the $n$ random variables are independent and that they all have the same probability distribution $f(x)$.

Since the $n$ random variables can be assumed to be independent, their joint probability distribution is $$f(x_{1}, x_{2}, \ldots, x_{n}) = f(x_{1})f(x_{2})\cdots f(x_{n})$$

If $p$ is the proportion of a population that fulfills some criterion and a random sample is collected from that population, then $\hat{p}$ is the proportion of the objects in said sample that fulfill the criterion (e.g. $\hat{p}$ could be how many people enjoy a good dicking in a random sample of all the people in Oslo). The value of $\hat{p}$ (the proportion of objects in a sample that fulfill some criterion) is then used to make an inference concerning the true proportion $p$ (in the population).

$\hat{p}$ might vary from random sample to random sample from the same population, and as such it is a random variable, which can be represented by $\hat{P}$ and called a statistic.

Sample mean | Gjennomsnitt

The sample mean is the numerical average of the observations in a sample $$ \bar{x} = \sum_{i=1}^{n}\frac{x_{i}}{n}$$

Sample median

The sample median is the middle value of the observations in a sample. It is obtained by sorting the values in either ascending or descending order and then selecting the middle value (if the sample size is odd) or the average of the two values closest to the middle (if the sample size is even) $$\tilde{x} = \begin{cases} x_{(n+1)/2}\text{,} & \text{if } n \text{ is odd,} \\ \frac{1}{2}(x_{n/2}+x_{n/2+1})\text{,} & \text{if } n \text{ is even.} \end{cases}$$

Sample mode | typetall | modus | modalverdi

The sample mode is the most frequently occurring value in a sample.

Sample variance | varianse

The sample variance is defined as the average of the squares of the deviations of the observations from their mean. If $X_{1}$, ..., $X_{n}$ represent $n$ random variables then the sample variance is $$S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}$$ The computed value of $S^{2}$ for a given sample is denoted by $s^{2}$.

We could also write the sample variance as $$S^{2} = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n}X_{i}^{2} - (\sum^{n}_{i=1}X_i)^2\right]$$

Sample standard deviation | standardavvik

The standard deviation is still the positive square root of the sample variance. $$S = \sqrt{S^{2}}$$

Sample range | variasjonsbredde

The sample range is the difference between the largest and smallest observed values. $$R = X_{max} - X_{min}$$
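
All of these sample statistics are one-liners with Python's `statistics` module (note that `statistics.variance` uses the $n-1$ divisor, matching $S^{2}$ above); a minimal sketch on a made-up sample:

```python
import statistics

sample = [3, 5, 5, 8, 9, 12]        # a made-up sample, n = 6
print(statistics.mean(sample))      # sample mean: 7.0
print(statistics.median(sample))    # (5 + 8) / 2 = 6.5 (n is even)
print(statistics.mode(sample))      # 5
print(statistics.variance(sample))  # S^2 = 10.8 (n - 1 divisor)
print(statistics.stdev(sample))     # S = sqrt(S^2) ~ 3.286
print(max(sample) - min(sample))    # sample range R = 9
```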

Sampling distribution

The probability distribution of a statistic is called a sampling distribution. The sample average ($\bar{X}$) and sample variance ($S^{2}$) are used to make inferences on the parameters $\mu$ (expected value/mean) and $\sigma^{2}$ (variance).

The sampling distribution of $\bar{X}$ with sample size $n$ is the distribution that results when an experiment is conducted over and over with sample size $n$, producing many values of $\bar{X}$. This sampling distribution describes the variability of sample averages around the population mean $\mu$.

The central limit theorem

This is some hot shit right here.

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^{2}$, then the limiting form of the distribution of $$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$ as $n\to \infty$ is the standard normal distribution $n(z;0,1)$ (the standard normal distribution has $\mu = 0$ and $\sigma^{2} = 1$).

According to the book, if $n \geq 30$ this is generally a good approximation. Which could be taken to imply that $n\to \infty$ holds for $n \geq 30$. But don't do that. Think of it as an approximation.

Note: The book says (pg 263) that the Central Limit Theorem cannot be used unless $\sigma$ (standard deviation) is known, but if $\sigma$ isn't known then we can replace it with $s$ (the sample standard deviation) and then use the central limit theorem anyway.

Example: You are told that the variance of a population is presumed to be $\sigma^{2}$, but nothing about the mean. However, you are also told that a random sample of size $n$ is present and that the involved random variables are independent and normal. You can then find the probability that the random variable $Y$, which is the average value of some random variable in the sample, differs from the mean by more than some amount $a$: $$P(|Y - \mu| > a)$$ What do you do? First: since $Y$ is derived from normal, independent random variables, it is normal. As such it is symmetrical about the mean. So $$P(|Y - \mu| > a) = 2P(Y - \mu > a) = 2(1 - P(Y - \mu \leq a))$$ Second: You whip out the central limit theorem, that's what you do! Because $Y$ was defined to be the average value of a random variable in the sample (in other words, the sample mean): $$z = \frac{a}{\sigma/\sqrt{n}} = \frac{a\sqrt{n}}{\sigma}$$ And then you find $P(Z \leq z)$ by looking it up in the standard normal distribution's table and bam. Central limit theorem's your daddy.
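A minimal sketch of the same computation in Python; $\sigma$, $n$ and $a$ are made-up numbers, and scipy's norm stands in for the standard normal table:

```python
# A sketch with assumed numbers: P(|Y - mu| > a) via the CLT.
from scipy.stats import norm

sigma, n, a = 2.0, 36, 0.5  # assumed population sd, sample size, deviation
z = a / (sigma / n**0.5)    # z = a / (sigma / sqrt(n)) = 1.5
p = 2 * (1 - norm.cdf(z))   # symmetry: 2 * P(Y - mu > a)
print(p)                    # approx 0.134
```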

Central limit theorem for two populations

If independent samples of size $n_{A}$ and $n_{B}$ are drawn at random from two populations (discrete or continuous) with means $\mu_A$ and $\mu_B$ and variances $\sigma^{2}_{A}$ and $\sigma^{2}_{B}$, then the sampling distribution of the difference of means, $\bar{X}_A - \bar{X}_B$, is approximately normally distributed with mean and variance given by $$\mu_{\bar{X}_A - \bar{X}_B} = \mu_{A} - \mu_{B}$$ $$\sigma^{2}_{\bar{X}_A - \bar{X}_B} = \frac{\sigma^{2}_{A}}{n_{A}}+\frac{\sigma_{B}^{2}}{n_{B}}$$ and so $$Z=\frac{(\bar{X}_A - \bar{X}_B) - \mu_{\bar{X}_A - \bar{X}_B}}{\sqrt{\frac{\sigma^{2}_{A}}{n_{A}}+\frac{\sigma_{B}^{2}}{n_{B}}}} = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sqrt{\sigma^{2}_{\bar{X}_A - \bar{X}_B}}}$$ is approximately a standard normal variable.

Comparing two populations

(this is a rewrite of example 8.6)

We have two populations with the following information: $$\begin{array}{cc} \hline \textbf{Population A} & \textbf{Population B} \\ \hline \mu_{A} = 6.5 & \mu_{B} = 6.0 \\ \sigma_{A} = 0.9 & \sigma_{B} = 0.8 \\ n_{A} = 36 & n_{B} = 49 \\ \hline \end{array}$$ What is the probability that a random sample of size $n_{A}$ will have a mean that is at least $1$ more than the mean of a random sample of size $n_{B}$? In other words, the probability that the difference between $\bar{X}_A$ and $\bar{X}_B$ is equal to or greater than one? $P(\bar{X}_A - \bar{X}_B \geq 1.0)$? If we use the central limit theorem for two populations, the sampling distribution $\bar{X}_A - \bar{X}_B$ can be assumed to be approximately normal and will have a mean of $$\mu_{\bar{X}_A - \bar{X}_B} = \mu_{A} - \mu_{B} = 6.5-6.0 = 0.5$$ and standard deviation of $$\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma^{2}_{A}}{n_{A}}+\frac{\sigma_{B}^{2}}{n_{B}}} = \sqrt{\frac{0.9^{2}}{36} + \frac{0.8^{2}}{49}} = 0.189$$

We find $$z = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sqrt{\sigma^{2}_{\bar{X}_A - \bar{X}_B}}} = \frac{1.0 - 0.5}{0.189} = 2.65$$ and look $P(\bar{X}_A - \bar{X}_B \geq 1.0) = P(Z > 2.65) = 1 - P(Z < 2.65)$ up in a table and see that it is equal to $1-0.9960 = 0.0040$. In other words, it is not very likely.
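The same lookup, done with scipy instead of the table (numbers from the example above):

```python
# The rewritten example 8.6, checked with scipy.
from scipy.stats import norm

mu_diff = 6.5 - 6.0                           # mean of X_A - X_B
sd_diff = (0.9**2 / 36 + 0.8**2 / 49) ** 0.5  # approx 0.189
z = (1.0 - mu_diff) / sd_diff                 # approx 2.65
print(1 - norm.cdf(z))                        # P(X_A - X_B >= 1.0), approx 0.004
```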

Sampling distribution of $S^{2}$

If $S^{2}$ is the variance of a random sample of size $n$ taken from a normal population having the variance $\sigma^{2}$, then the statistic $$\chi^{2} = \frac{(n-1)S^{2}}{\sigma^{2}} = \sum_{i=1}^{n}\frac{(X_{i} - \bar{X})^{2}}{\sigma^{2}}$$ has a chi-squared distribution with $v = n-1$ degrees of freedom.

In other words, "the sampling distribution of $S^{2}$" is employed when we want to make inferences about the variance of a population based on a random sample without use of the true mean of the population. (That is, we use the mean of the sample rather than the population)

The values of the random variable $\chi^{2}$ are calculated from each sample by the formula $$\chi^{2} = \frac{(n-1)s^{2}}{\sigma^2}$$

The probability that a random sample produces a $\chi^{2}$ value greater than some specified value $x$ is equal to the area under the $\chi^{2}$-distribution's curve to the right of $x$.

How to read the $\chi^{2}$-table: $\chi^{2}_{v,a}$ is the $\chi^{2}$-value with $v$ degrees of freedom that leaves an area of $a$ to the right on the $\chi^{2}$-distribution.

So, given a $\chi^{2}$-value of $x$ with $v$ degrees of freedom we look up the $\chi^{2}$-table in the little blue book and find the row for $v$ degrees of freedom and traverse the columns until we locate two columns with values that $x$ lies between.

Example: Given a random sample, from a population with a presumed standard deviation of $1$, of size $5$ with values $1.9$, $2.4$, $3.0$, $3.5$, $4.2$, should we be convinced the standard deviation really is $1$? First we find the sample variance: $$s^{2} = \frac{1}{5(5-1)}\left(5\sum^{5}_{i=1}x_{i}^{2} - \left(\sum^{5}_{i=1}x_i\right)^{2}\right) = \frac{1}{20}((5)(48.26) - 15^{2}) = 0.815$$ With this we calculate the $\chi^{2}$-value: $$\chi^{2} = \frac{(5-1)s^{2}}{\sigma^{2}} = \frac{(4)(0.815)}{1} = 3.26$$ Note that the degrees of freedom in this case is $n-1$, because we estimated the mean by calculating $\bar{X}$ and replacing $\mu$ with it.

And then we look it up in the table and see that a $\chi^{2}$-value of $3.26$ lies between the table values for $a = 0.950$ and $a = 0.050$ (an interval that contains $90\%$ of the $\chi^{2}$-distribution), and because of this we don't really have any reason to suspect that the standard deviation isn't $1$.
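If you'd rather skip the table hunting: a minimal sketch with scipy (sample and presumed $\sigma$ from the example above); chi2.sf gives the area to the right directly:

```python
# The chi-squared example, checked with scipy.
from scipy.stats import chi2

sample = [1.9, 2.4, 3.0, 3.5, 4.2]
n = len(sample)
x_bar = sum(sample) / n
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # 0.815

chi2_value = (n - 1) * s2 / 1.0**2                    # presumed sigma = 1, gives 3.26
print(chi2.sf(chi2_value, df=n - 1))                  # approx 0.52, far from either tail
```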

t-distribution | student t-distribution | t-fordeling

The $t$-distribution is used when we want to make inferences about $\mu$. Or: the $t$-distribution arises when we estimate the mean of a normally distributed population. Or: the $t$-distribution is involved when we don't know what the variance is (e.g. confidence and prediction intervals with unknown variance).

If our random sample was selected from a normal population we can write $$T = \frac{Z}{\sqrt{V/(n-1)}}$$ where $Z$ is a standard normal variable (a variable with the standard normal distribution) defined as $$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$ and $V$ is a chi-squared variable with $v=n-1$ degrees of freedom defined as $$V=\frac{(n-1)S^{2}}{\sigma^{2}}$$ $Z$ and $V$ are independent because $\bar{X}$ and $S^{2}$ are independent, because the sample was drawn from a normal population and the book says so.

Let $X_{1}$, $X_{2}$, ..., $X_{n}$ be independent random variables that are all normal with mean $\mu$ and standard deviation $\sigma$. Let $$\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_{i}$$ and $$S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^{2}$$ Then the random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t$-distribution with $v = n-1$ degrees of freedom.

A $t$-distribution with $\infty$ degrees of freedom looks exactly like the standard normal distribution.

Finding t-values

$t_{x,v}$ represents the $t$-value with $v$ degrees of freedom which we find an area equal to $x$ to the right of. So $P(T > t_{x,v})$ is the probability that $T$ will assume some value that lies to the right of $t_{x,v}$ in the distribution of $T$.

The $t$-distribution is symmetric about a mean of zero, so $t_{x} = -t_{1-x}$.

$95\%$ of the values of a $t$-distribution with $v$ degrees of freedom lie between $-t_{0.025}$ and $t_{0.025}$. So the probability of a $t$-value not falling between these two points is $5\%$. Which is more than the probability of not getting either heads or tails on a coin flip but small enough of a probability that the book tells us to RAISE OUR EYEBROWS SUSPICIOUSLY and question the assumed value of $\mu$.
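Finding $t$-values with scipy instead of the table is a one-liner; a sketch with an assumed $v = 9$ degrees of freedom:

```python
# A sketch of t-value lookup; v = 9 is an assumed number.
# isf(x, df) gives the value leaving an area x to the right.
from scipy.stats import t

v = 9
print(t.isf(0.025, df=v))   # t_{0.025, 9}, approx 2.262
print(-t.isf(0.975, df=v))  # same value, via the symmetry t_x = -t_{1-x}
```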

Estimation problems

Estimators

A point estimate of some population parameter $\theta$ is a single value $\hat{\theta}$ of a statistic $\hat{\Theta}$. For example, the value $\bar{x}$ (sample mean) of the statistic $\bar{X}$, computed from a sample of size $n$, is a point estimate of the population parameter $\mu$ (population mean).

Similarly, $\hat{p} = \frac{x}{n}$ is a point estimate of the true proportion $p$ for a binomial experiment (with $n$ trials, $x$ successes).

An estimator $\hat{\Theta}$ is unbiased (forventningsrett) if its mean is equal to the parameter it estimates. From the book: A statistic $\hat{\Theta}$ is said to be an unbiased estimator of the parameter $\theta$ if $\mu_{\hat{\Theta}} = E(\hat{\Theta}) = \theta$.

If we consider all possible unbiased estimators of some parameter $\theta$, the one with the smallest variance is called the most efficient estimator of $\theta$.

Confidence intervals | konfidensintervall

A $p\%$ confidence interval for $\mu$ is an interval, computed from a random sample, constructed so that over many repeated samples, $p\%$ of the intervals so computed will contain the true value of $\mu$.

The general formulation of a request for a confidence interval is $100(1-a)\%$. That is, if you are asked to find a $95\%$ confidence interval, the value of $a$ follows from $95 = 100(1-a) \implies a = 1- \frac{95}{100} = 0.05$.

If $\bar{x}$ is used as an estimate of $\mu$, we can be $100(1-a)$% confident that the error will not exceed $z_{\frac{a}{2}}\frac{\sigma}{\sqrt{n}}$.

Example: about $95\%$ of the values of a standard normal distribution lie within $2$ standard deviations of the mean (the exact value is $z_{0.025} \approx 1.96$). Which means that an approximate $95\%$ interval for the standard normal variable $Z$ is $[-2\sigma, 2\sigma] = [-2, 2]$.

How large should the sample size be?

Confidence intervals get better as the sample size increases. Ideally, all measurements would be made on the entire population. However, that is usually either unfeasible or, like, a bunch of work, but we still might want to know how large our sample size should be in order to be certain that the confidence interval is tight (note that "tight" is not an actual statistical term).

Luckily, if $\bar{x}$ is used as an estimate of $\mu$ we can be $100(1-a)$% confident that the error will not exceed a specified amount $d$ when the sample size is $$n= \lceil(\frac{z_{\frac{a}{2}}\sigma}{d})^{2}\rceil$$ (the answer is rounded up)
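A minimal sketch of this formula in Python; $\sigma$ and $d$ are assumed numbers:

```python
# A sketch of the sample-size formula with assumed inputs.
import math
from scipy.stats import norm

a, sigma, d = 0.05, 2.0, 0.5            # 95% confidence, assumed sd, max error
z = norm.isf(a / 2)                     # z_{a/2}, approx 1.96
print(math.ceil((z * sigma / d) ** 2))  # rounds up to 62
```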

Finding a confidence interval

On $\mu$ with known $\sigma^{2}$ If $\bar{x}$ is the sample mean of a random sample of size $n$ from a population with a known variance $\sigma^{2}$, a $p$% confidence interval for $\mu$ is given by $$\bar{x} - z_{\frac{a}{2}}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\frac{a}{2}}\frac{\sigma}{\sqrt{n}}$$ where $a = 1 - \frac{p}{100}$ and $z_{\frac{a}{2}}$ is the $z$-value leaving an area of $\frac{a}{2}$ to the right. To find the $z$-value, use the "critical values in the standard normal distribution"-table on page 3 of the little blue book.

Example: An estimate, $\bar{x} = 6.76$, of $\mu$ has been made based on a random sample of size $n = 5$. The variance of the population is known to be $\sigma^{2} = 0.060^{2}$. An old man appears outside your window, "show me a $95\%$ confidence interval for the mean of this population", he whispers, somehow piercing your eardrums despite the barrier of glass separating the two of you.

Since the variance is known, we can use the above method. $p = 95$, so $a = 0.05$. To find $z_{\frac{a}{2}}$, compute $\frac{a}{2} = 0.025$ and look it up in the table on page 3 of the little blue book (Kritiske verdier i standard normalfordelingen/critical values in the standard normal distribution); $z_{\frac{a}{2}}$ is the value in the right-hand column on the same row as $\frac{a}{2}$, here $z_{0.025} = 1.96$. The interval is then $$6.76 \pm 1.96\cdot\frac{0.060}{\sqrt{5}} \implies 6.71 < \mu < 6.81$$ and the old man shuffles off, satisfied.
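The old man's interval, checked with scipy (numbers from the example):

```python
# The known-variance confidence interval from the example.
from scipy.stats import norm

x_bar, sigma, n, a = 6.76, 0.060, 5, 0.05
half_width = norm.isf(a / 2) * sigma / n**0.5  # z_{0.025} * sigma / sqrt(n)
print(x_bar - half_width, x_bar + half_width)  # approx (6.71, 6.81)
```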

One-sided confidence bounds on $\mu$ with known $\sigma^{2}$ If $\bar{X}$ is the mean of a random sample of size $n$ from a population with variance $\sigma^{2}$, the one-sided $100(1-a)\%$ confidence bounds for $\mu$ are given by $$\begin{array}{rr} \text{upper one-sided bound:} & \bar{x} + z_{a}\frac{\sigma}{\sqrt{n}} \\ \text{lower one-sided bound:} & \bar{x} - z_{a}\frac{\sigma}{\sqrt{n}} \end{array}$$ In other words it's almost like finding the whole confidence interval except we find $z_{a}$ rather than $z_{a/2}$.

On $\mu$ with unknown $\sigma^{2}$ If $\bar{x}$ and $s$ are the mean and standard deviation of a random sample from a normal population with unknown variance $\sigma^{2}$, a $100(1-a)\%$ confidence interval for $\mu$ is $$\bar{x} - t_{\frac{a}{2}}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\frac{a}{2}}\frac{s}{\sqrt{n}}$$ Where $t_{\frac{a}{2}}$ is the $t$-value with $v=n-1$ degrees of freedom, leaving an area of $\frac{a}{2}$ to the right. (Use the "kritiske verdier i $t$-fordelingen/critical values of the $t$-distribution" table on page 4 of the little blue book to find $t_{\frac{a}{2}}$)
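A minimal sketch of the unknown-variance interval; treating the $\chi^{2}$-example's sample as if its variance were unknown is our assumption:

```python
# A sketch of the t-based confidence interval for mu.
from scipy.stats import t

sample = [1.9, 2.4, 3.0, 3.5, 4.2]
n = len(sample)
x_bar = sum(sample) / n
s = (sum((x - x_bar) ** 2 for x in sample) / (n - 1)) ** 0.5

half_width = t.isf(0.025, df=n - 1) * s / n**0.5  # t_{0.025, 4} * s / sqrt(n)
print(x_bar - half_width, x_bar + half_width)     # 95% CI for mu
```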

Prediction interval | prediksjonsintervall

A prediction interval is a prediction of the possible value of a future observation. It is something we do if we don't know what $\mu$ is.

Prediction intervals can be used to determine if an observation is an outlier.

$\mu$ unknown, $\sigma^{2}$ known For a normal distribution of measurements with unknown mean $\mu$ and known variance $\sigma^{2}$, a $100(1-a)\%$ prediction interval of a future observation $x_{0}$ is $$\bar{x} - z_{\frac{a}{2}}\sigma \sqrt{1+\frac{1}{n}} < x_0 < \bar{x} + z_{\frac{a}{2}}\sigma \sqrt{1+\frac{1}{n}}$$ Where $z_{a/2}$ is the $z$-value leaving an area of $a/2$ to the right.

$\mu$ unknown, $\sigma^{2}$ unknown For a normal distribution of measurements with unknown mean $\mu$ and unknown variance $\sigma^{2}$, a $100(1-a)\%$ prediction interval of a future observation $x_{0}$ is $$\bar{x} - t_{\frac{a}{2}}s \sqrt{1+\frac{1}{n}} < x_0 < \bar{x} + t_{\frac{a}{2}}s \sqrt{1+\frac{1}{n}}$$
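A minimal sketch of the $\mu$-and-$\sigma^{2}$-unknown prediction interval, reusing the same assumed sample; note the $\sqrt{1+\frac{1}{n}}$ in place of the confidence interval's $\frac{1}{\sqrt{n}}$:

```python
# A sketch of the prediction interval for a future observation x_0.
from scipy.stats import t

sample = [1.9, 2.4, 3.0, 3.5, 4.2]
n = len(sample)
x_bar = sum(sample) / n
s = (sum((x - x_bar) ** 2 for x in sample) / (n - 1)) ** 0.5

half_width = t.isf(0.025, df=n - 1) * s * (1 + 1 / n) ** 0.5
print(x_bar - half_width, x_bar + half_width)  # 95% prediction interval for x_0
```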

Maximum likelihood estimator | Sannsynsmaksimeringsestimator

Given independent observations $x_{1}$, $x_{2}$, ..., $x_{n}$ from a probability density function (continuous case) or probability mass function (discrete case) $f(\mathbf{x};\theta)$, the maximum likelihood estimator $\hat{\theta}$ is that which maximizes the likelihood function $$L(x_{1}, x_{2}, \ldots, x_{n};\theta) = f(\mathbf{x};\theta) = \prod_{i=1}^{n}f(x_{i};\theta)$$

This is, however, not that useful, as computing the MLE from this form can be quite challenging. To overcome this hurdle we use a trick, which is quite simply to take the logarithm of the likelihood function: $$l(\mathbf{x};\theta)=\ln(L(x_{1}, x_{2}, \ldots, x_{n};\theta))$$

Using this trick allows us to rewrite the likelihood function into a more computable form, namely a sum. This is a valid approach because the logarithm is a strictly increasing function, which means maximizing this function is equivalent to maximizing the original one. $$l(x_{1}, x_{2}, \ldots, x_{n};\theta) = l(\mathbf{x};\theta) = \sum_{i=1}^{n}\ln(f(x_{i};\theta))$$

Now simply (or not so simply) solve the equation $$\frac{d(l(\mathbf{x};\theta))}{d\theta}=0$$
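A worked example (the exponential density is our choice of illustration, not one from the text): with $f(x;\lambda) = \lambda e^{-\lambda x}$, $$l(\mathbf{x};\lambda) = \sum_{i=1}^{n}\ln(\lambda e^{-\lambda x_{i}}) = n\ln\lambda - \lambda\sum_{i=1}^{n}x_{i}$$ so $$\frac{d(l(\mathbf{x};\lambda))}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n}x_{i} = 0 \implies \hat{\lambda} = \frac{n}{\sum_{i=1}^{n}x_{i}} = \frac{1}{\bar{x}}$$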

Hypothesis testing | hypotheses | hypoteser | hypotesetesting

A statistical hypothesis is an assertion or conjecture concerning one or more populations. The absolute truth or falsity of a statistical hypothesis cannot be known unless the entire population is examined. Which has been shown to be a major hassle. So instead we take a random sample and see if it either supports or refutes our hypothesis.

The null hypothesis | null-hypotesen

When performing a hypothesis test we present two mutually exclusive hypotheses: a null hypothesis (denoted $H_{0}$) and an alternative hypothesis (denoted $H_{1}$). The null hypothesis is the hypothesis we want to test. If $H_{0}$ is rejected, we accept the alternative hypothesis, $H_{1}$.

Typically, one of the following conclusions is reached when testing a hypothesis: $$\begin{array}{rl} \textbf{reject } H_{0} & \text{in favor of } H_{1} \text{ because of sufficient evidence in the data, or} \\ \textbf{fail to reject } H_{0} & \text{because of insufficient evidence in the data.}\end{array}$$

An analogy is the hypothesis testing done in those American jury trials we've seen on TV, where the null hypothesis is that the defendant is not guilty (innocent until proven guilty) while the alternative hypothesis is that the defendant is guilty.

Typically, the null hypothesis states that the parameter in question (e.g. a probability $p$) is equal to some value. The alternative hypothesis states that the parameter is either higher or lower than the value stated in the null hypothesis, or equal to some other value.

Testing a statistical hypothesis

Illustrated with a table! $$\begin{array}{r|ll} \hline \hline & H_{0} \textbf{ is true} & H_{0} \textbf{ is false} \\ \hline \textbf{Do not reject } H_{0} & \text{Correct decision} & \text{Type II error} \\ \textbf{Reject }H_{0} & \text{Type I error} & \text{Correct decision} \\ \hline \end{array}$$

A hypothesis test generally specifies a requirement (a rejection criterion) that the outcome must fulfill for the experimenters to reject the null hypothesis. The probability of committing a type I error is computed by evaluating the probability that the requirement is exceeded or met given that the null hypothesis is true (there is an example on page 323).

The probability of committing a type II error is impossible to compute unless the alternative hypothesis is specific. If the alternative hypothesis is $p = 1/2$ and the null hypothesis is $p = 1/4$, then the probability of committing a type II error is equal to the probability that the requirement is not met (nor exceeded) given that the alternative hypothesis is true. If the null hypothesis is $p=1/4$ and the alternative hypothesis is $p > 1/4$, it is impossible to compute the probability of committing a type II error.
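A minimal sketch of both error probabilities for a binomial test. The hypotheses are the ones from the paragraph above ($H_{0}$: $p = 1/4$, $H_{1}$: $p = 1/2$); $n = 20$ and the rejection rule "reject $H_{0}$ if $X \geq 8$" are made-up numbers:

```python
# Type I and type II error probabilities under an assumed rejection rule.
from scipy.stats import binom

n, k = 20, 8
alpha = binom.sf(k - 1, n, 0.25)  # type I:  P(X >= 8 | p = 1/4), approx 0.10
beta = binom.cdf(k - 1, n, 0.50)  # type II: P(X <= 7 | p = 1/2), approx 0.13
print(alpha, beta)
```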

Hustling those Cumulative Probability Distribution exercises

$$P(X \leq a \mid X \leq b)$$ Remember how $P(A\mid B) = \frac{P(A\cap B)}{P(B)}$? We can do the same thing here: $$P(X\leq a \cap X\leq b) = \begin{cases} P(X \leq a), & \text{if } a \leq b \\ P(X \leq b), & \text{if } b \leq a \end{cases}$$

$$P(X \leq a \cap X \geq b) = \begin{cases} P(X = a), & \text{if } a = b \\ P(b \leq X \leq a), & \text{if } a > b \\ 0, & \text{if } b > a \end{cases}$$

$$P(X \leq a) = P(X < a) + P(X = a)$$ $$P(X > a) = 1 - P(X \leq a)$$ $$P(X \geq a) = 1 - P(X < a)$$

$$P(a \leq X \leq b) = P(X \leq b) - P(X < a)$$ (Assuming $a < b$)

$$P(|X-\mu| > a)$$ This one is fun. How can $X$ differ from the mean by more than $a$? By being either greater than $\mu+a$ or less than $\mu-a$. Which means that $$P(|X-\mu| > a) = P(X < \mu-a \cup X > \mu+a) = 1 - P(\mu-a \leq X \leq \mu+a)$$

Only valid for discrete (integer-valued) probability distributions

$$P(X < a) = P(X \leq (a-1))$$ $$P(X = a) = P(X \leq a) - P(X \leq (a-1))$$

Only valid for continuous probability distributions

$$P(X = a) = 0, \quad \text{for } a\in \mathbb{R}$$ (For continuous probability distributions, the probability that $X$ assumes any single given value is said to be $0$.) Which means: $$P(X \leq a) = P(X < a) + P(X = a) = P(X < a) + 0 = P(X < a)$$

Normal distribution

$$P(Z > a) = P(Z < -a)$$ ($Z$ is a normal random variable with mean $\mu = 0$ and variance $\sigma^{2} = 1$)

Binomial distribution

$$P(X < r) = B(r-1;n,p) = \sum_{x=0}^{r-1}b(x;n,p)$$ $$P(X \leq r) = B(r;n,p) = \sum_{x=0}^{r}b(x;n,p)$$ $$P(X \geq r) = 1 - P(X < r) = 1 - B(r-1;n,p)$$ $$P(a \leq X \leq b) = B(a,b;n,p) = \sum_{x=a}^{b}b(x;n,p)$$ $$P(a \leq X \leq b) = P(X \leq b) - P(X < a)$$
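A quick sanity check of these identities with scipy ($n$, $p$ and $r$ are arbitrary numbers); $B(r;n,p)$ is binom.cdf:

```python
# Sanity-checking the binomial-sum identities.
from scipy.stats import binom

n, p, r = 15, 0.4, 6
print(binom.cdf(r, n, p))      # P(X <= r) = B(r; n, p)
print(binom.cdf(r - 1, n, p))  # P(X < r)  = B(r-1; n, p)
print(binom.sf(r - 1, n, p))   # P(X >= r) = 1 - B(r-1; n, p)
```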
