TMA4240: Statistics
Words and terms and stuff
Intersection | Snitt | $\cap$
The intersection of two events $A$ and $B$, denoted by $A \cap B$, is the event containing all elements that are common to $A$ and $B$.
If you're familiar with logic, think of $\cap$ as logical AND ($\wedge$).
Example: Let
Mutual exclusion | disjoint sets | disjunkte mengder
Two events $A$ and $B$ are mutually exclusive (disjoint) if $A \cap B = \emptyset$, that is, if they cannot both occur at the same time.
Union | $\cup$
The union of two events $A$ and $B$, denoted by $A \cup B$, is the event containing all elements that belong to $A$ or $B$ or both.
If you are familiar with logic, think of $\cup$ as logical OR ($\vee$).
Balanced object | Balansert objekt
An object (e.g. a die or a coin) is balanced if all possible outcomes are equally likely when it is rolled/flipped/thrown.
Discrete sample space | Diskrete utfallsrom
A discrete sample space contains either a finite number of possibilities or an unending sequence with as many elements as there are whole numbers.
Typically represents count data, such as the number of heads after a series of coin flips.
Continuous sample space | Kontinuerlig utfallsrom
A continuous sample space contains an infinite number of possibilities equal to the number of points on a line segment. Note that between any two different points on a line segment there are an infinite number of points.
Typically represents measured data, such as height or weight.
Degrees of freedom | Frihetsgrader
"The degrees of freedom is the number of values in the final calculation of a statistic that are free to vary."
It is the quantity
It could also be the parameter
Population | Populasjonen | Univers
The population (often of an experiment) is the totality of observations with which we are concerned.
The number of observations in the population is defined to be the size of the population.
Each observation in a population is a value of a random variable
Sample | Utvalg
A sample is a subset of a population.
Statistic
A statistic is a function of the random variables that constitute a random sample.
Little blue book | Tabeller og formler i statistikk
A little blue book titled "Tabeller og formler i Statistikk", available from Akademika, which one is allowed to bring to this course's exam. Its ISBN is 978-82-519-1595-3.
Note that the one used at NTNU is a 2011 revision by NTNU's Department of Mathematical Sciences (IMF) of the original 2001 edition by Jan Terje Kvaløy and Håkon Tjelmeland.
Combinatorics | Kombinatorikk
Multiplication rule | Rule of product | Multiplication principle
More information: Wikipedia
"The fundamental principle of counting" states that if an operation can be performed in $n_{1}$ ways, and if for each of these ways a second operation can be performed in $n_{2}$ ways, then the two operations can be performed together in $n_{1}n_{2}$ ways.
Permutation | Permutasjon | $_{n}P_{r}$
More information: Wikipedia
A permutation is an ordered arrangement of all or part of the members of a set.
Example: The set
A set containing $n$ distinct elements has $n!$ possible permutations, and the number of permutations of $r$ elements taken from such a set is $_{n}P_{r} = \frac{n!}{(n-r)!}$.
Example:
Used when/for: Selecting people (distinct objects) from a group/set when the order of selection matters.
A circular permutation is a circular arrangement of objects. There are $(n-1)!$ circular permutations of $n$ distinct objects.
Distinguishing things. If a set consists of $n$ objects of which $n_{1}$ are of one kind, $n_{2}$ of a second kind, ..., and $n_{k}$ of a $k$-th kind, then the number of distinct permutations is $\frac{n!}{n_{1}!n_{2}!\cdots n_{k}!}$.
Combination | Kombinasjon | $_{n}C_{r}$
More information: Wikipedia
"A combination is an unordered permutation." - Lao Tze-Te, fictional Chinese statistician/tea-leaf interpreter.
The number of combinations of $n$ distinct objects taken $r$ at a time is $_{n}C_{r} = \binom{n}{r} = \frac{n!}{r!(n-r)!}$.
In general, when one is calculating a series of possible combinations (poker hands, anyone?) the order is irrelevant but each selection of
Note: While a combination ignores the elements' ordering, a series of selections from the same source (i.e. a card deck) has to be calculated from most specific to least specific, as the more specific selections tend to be subsets of the least specific selections. For example, any specific combination of
Example: You want to pull some cards out of a standard issue playing card deck at random without caring about the order. A standard card deck has $52$ cards, so the number of ways to select $2$ of them is $\binom{52}{2} = 1326$.
What if we wanted both cards to be of the same suit, but didn't care about their rank? First pick one of the $4$ suits, then $2$ of its $13$ ranks: $4\binom{13}{2} = 312$.
What if we wanted one of the cards to be a spade but didn't care about the other cards' suit? There are $\binom{39}{2} = 741$ two-card hands with no spades at all, so $\binom{52}{2} - \binom{39}{2} = 585$ hands contain at least one spade.
But what if we want one spade, and one non-spade? $\binom{13}{1}\binom{39}{1} = 507$.
Used for/when: Playing cards (poker hands and such), random draws without replacement.
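The card counting above can be sanity checked with a few lines of Python, assuming the two-card draw used in the example (the variable names are our own):

```python
from math import comb

# Ways to draw 2 cards from a 52-card deck, order ignored
total = comb(52, 2)                              # 1326
# Both cards of the same suit: pick a suit, then 2 of its 13 ranks
same_suit = 4 * comb(13, 2)                      # 312
# At least one spade: all hands minus the hands with no spades
at_least_one_spade = comb(52, 2) - comb(39, 2)   # 585
# Exactly one spade and one non-spade
one_spade = comb(13, 1) * comb(39, 1)            # 507
```

`math.comb(n, r)` computes $\binom{n}{r}$ directly, which saves a lot of factorial wrangling.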
Probability | Sannsynlighet
The probability of an event,
If each and every possible outcome is equally likely, then the probability of a specific event (
Example: What's the probability of drawing a spade from a deck of cards? There are $13$ spades in a $52$-card deck, so the probability is $13/52 = 1/4$.
The probability that at least one of two events occurs is denoted by
Complementary events | Complement
Additive rule | addisjonssetningen
If $A$ and $B$ are any two events, then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
However if $A$ and $B$ are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.
Additive rule for three events, $A$, $B$ and $C$: $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
Random Variables and Probability Distributions
Random Variable | stochastic variable | stokastisk variabel
A random variable is a function that associates a real number with each element in the sample space. Think of it as a variable in an experiment: if we sample a population (e.g. we select a bunch of people from Oslo at random), their height, weight, eye colour and feelings towards fish are all random variables. Anything that can vary between sample points can be called a random variable.
In this course, random variables are usually denoted by capital letters (e.g.
Example: The sample space describing the possible outcomes of flipping a balanced coin -- that is, a coin which is equally likely to land as heads or tails when flipped -- is $S = \{H, T\}$.
Discrete Probability Distributions | Diskrete sannsynlighetsfordelinger
"A discrete random variable assumes each of its values with a certain probability."
Example: If we toss a coin three times and denote
Probability function | Probability mass function | Probability distribution | Sannsynlighetsfordeling
The set of ordered pairs $(x, f(x))$ is a probability function, probability mass function, or probability distribution of the discrete random variable $X$ if, for each possible outcome $x$:
$f(x)\geq 0$, $\sum_{x}f(x) = 1$, $P(X = x) = f(x)$.
But what does this mean?
The points in the list mean:
- The probability of $X$ assuming any possible $x$ is equal to or greater than zero. Negative probability is not covered by this course, so we assume it's impossible.
- The probability of $X$ not assuming any of the possible values $x$ is $0$; if we add up the probability of $X$ assuming each and every possible value $x$, the answer should be $1$. In other words, $X$ should not be able to assume any value not covered by $x$, nor should the sum of the probabilities of each possible outcome exceed $1$.
- $f(x)$ is shorthand for $P(X = x)$. We could use $g(x)$ instead, or whatever you want. Typically the corresponding lower case letter is used if a random variable is denoted by a capital letter.
Note: in the book,
Example: Susie receives a shipment of
Conditional probability and other stuff within the probability distribution expression
Note that this is NOT the same as conditional distribution
What do you do if you're asked to find
(discrete) Cumulative distribution function | (diskret) Kumulativ fordelingsfunksjon
What if we want to know the probability of our random variable assuming a value less than or equal to some possible value of
The cumulative distribution function $F(x)$ of a discrete random variable $X$ with probability distribution $f(x)$ is $F(x) = P(X \leq x) = \sum_{t \leq x} f(t)$, for $-\infty < x < \infty$.
Example: Returning to our friend Susie in the previous example (the gal with the bananas and apples), what is the probability of Susie selecting
Continuous Probability Distributions | Kontinuerlig sannsynlighetsfordeling
"A continuous random variable has a probability of $0$ of assuming exactly any of its values."
Wait, what?
Assume we have a device which allows us to accurately measure the weight of an object in kilograms to twenty decimal places.
Further, assume this device is being used to weigh every single adult above
The values of continuous random variables are plotted as a graph. The probability that the random variable assumes some value within a given range is equal to the area beneath the graph within the range.
Probability density function (pdf) | density function | tetthetsfunksjon
The function $f(x)$ is a probability density function (pdf) for the continuous random variable $X$, defined over the set of real numbers, if:
$f(x) \geq 0\text{, for all } x \in \mathbb{R}\text{.}$ $\int_{-\infty}^{\infty}f(x)dx = 1\text{.}$ $P(a < X < b) = \int_{a}^{b} f(x)dx\text{.}$
Which means:
- Negative probabilities are not covered by this course.
- It is not possible for $X$ to assume a value outside the scope of $f(x)$.
- When we write $P(a < X < b)$, it is shorthand for $\int_{a}^{b} f(x)dx$.
Note: When
(continuous) Cumulative distribution function | (kontinuerlig) Kumulativ fordelingsfunksjon
The cumulative distribution function $F(x)$ of a continuous random variable $X$ with density function $f(x)$ is $F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)dt$, for $-\infty < x < \infty$.
Which allows us to write $P(a < X < b) = F(b) - F(a)$, and $f(x) = \frac{dF(x)}{dx}$ wherever the derivative exists.
Joint Probability Distributions | Simultan sannsynlighetsfordeling
Joint probability distribution (discrete case) | Simultan sannsynlighetsfordeling (diskret tilfelle)
For two discrete random variables $X$ and $Y$, the function $f(x,y)$ is a joint probability distribution if:
$f(x,y) \geq 0$ for all $(x,y)$, $\sum_{x}\sum_{y} f(x,y) = 1$, $P(X = x, Y = y) = f(x,y)$.
Joint density function (continuous case)| Simultan tetthetsfunksjon (kontinuerlig tilfelle)
For two continuous random variables $X$ and $Y$, the function $f(x,y)$ is a joint density function if:
$f(x,y) \geq 0$ for all $(x,y)$, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$, $P[(X,Y) \in A] = \int\int_{A} f(x,y)\,dx\,dy$ for any region $A$ in the $xy$-plane.
Marginal distributions | marginalfordeling
The marginal distribution of $X$ alone is $g(x) = \sum_{y} f(x,y)$ in the discrete case and $g(x) = \int_{-\infty}^{\infty} f(x,y)dy$ in the continuous case; the marginal distribution $h(y)$ of $Y$ alone is obtained the same way by summing or integrating over $x$.
In other words, if we have a joint probability distribution or joint density function we can obtain the probability function or distribution for either of the random variables by summing or integrating over the values of the other random variable.
The marginal distributions of
Expected value of a random variable | mean | forventningsverdi | $\mu$ | $E(X)$
Also known as: mean of the random variable
Let $X$ be a random variable with probability distribution $f(x)$. The mean, or expected value, of $X$ is $\mu = E(X) = \sum_{x} xf(x)$ if $X$ is discrete, and $\mu = E(X) = \int_{-\infty}^{\infty} xf(x)dx$ if $X$ is continuous.
What if we have a random variable that depends on the random variable
Expected value of random variables with joint probability distribution
If $X$ and $Y$ are random variables with joint probability distribution $f(x,y)$, the mean, or expected value, of $g(X,Y)$ is $E[g(X,Y)] = \sum_{x}\sum_{y} g(x,y)f(x,y)$ in the discrete case and $E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)f(x,y)\,dx\,dy$ in the continuous case.
Note: if you are asked to find
Expected value/mean of linear combinations of random variables
If $a$ and $b$ are constants, then $E(aX + b) = aE(X) + b$. More generally, $E(aX + bY) = aE(X) + bE(Y)$.
Also: $E[g(X) \pm h(X)] = E[g(X)] \pm E[h(X)]$.
Variance | varians | $\sigma^{2}$
The variance of the random variable $X$ with probability distribution $f(x)$ and mean $\mu$ is $\sigma^{2} = E[(X - \mu)^{2}]$.
The variance of a random variable $X$ can also be computed as $\sigma^{2} = E(X^{2}) - \mu^{2}$.
Variance of the random variable g(X)
If $X$ is a random variable with probability distribution $f(x)$, the variance of the random variable $g(X)$ is $\sigma_{g(X)}^{2} = E\{[g(X) - \mu_{g(X)}]^{2}\}$.
Standard Deviation | standardavvik | $\sigma$
The standard deviation of a random variable is the positive square root of its variance: $\sigma = \sqrt{\sigma^{2}}$.
It shows how much variation exists from the mean/expected value. A low standard deviation indicates that the data points tend to be very close to the mean, while a high standard deviation indicates that the data points are spread out over a large range of values.
Covariance | kovarians | Cov(X,Y)
The covariance of two random variables $X$ and $Y$ with means $\mu_{X}$ and $\mu_{Y}$ is $\sigma_{XY} = \mathrm{Cov}(X,Y) = E[(X - \mu_{X})(Y - \mu_{Y})] = E(XY) - \mu_{X}\mu_{Y}$.
Statistical independence | uavhengighet
Two random variables $X$ and $Y$ with joint distribution $f(x,y)$ and marginal distributions $g(x)$ and $h(y)$ are statistically independent if and only if $f(x,y) = g(x)h(y)$ for all $(x,y)$.
Discrete Probability Distributions | Diskrete sannsynlighetsfordelinger
A note on probability distributions and cumulative probability distributions:
Most often we are asked to find
However, the formulas and definitions given for the various probability distributions are almost always of the type
The Bernoulli Process
More information: Wikipedia
Bernoulli was an 18th century Swiss science-guy after whom the Bernoulli Process is named. Bernoulli's encounters with women were the first Bernoulli trials, and he'd divide them into two categories: "I totally hit that" and "haven't hit it yet". Today, it is more common to use a more PC example when introducing students to the Bernoulli Process, such as a coin flip.
Experiments often consist of repeated trials, each with two possible outcomes that could be labeled success or failure;
Strictly speaking, a Bernoulli process is a finite or infinite sequence of independent random variables
- For each $i$, the value of $X_{i}$ is either $0$ or $1$. (The experiment consists of repeated trials with the outcome $X_{i}$, which is either $0$ or $1$, i.e. the outcome is binary. We are free to redefine $1$ as "success" and $0$ as "failure", or the other way around.)
- For all values of $i$, the probability that $X_{i} = 1$ is the same number $p$. (The probability of either of the two possible outcomes remains constant from trial to trial; this implies that the trials are independent.)
Binomial Distribution | Binomialfordeling | $b(x;n,p)$
A binomial distribution is the probability distribution of the discrete random variable $X$: the number of successes in $n$ Bernoulli trials.
The probability distribution of the binomial random variable $X$ is $b(x;n,p) = \binom{n}{x}p^{x}q^{n-x}$, for $x = 0, 1, 2, \dots, n$, where $p$ is the probability of success and $q = 1 - p$.
Note that
Cumulative distribution function for Binomial distributions | binomial sums
We can find
Mean and variance of binomial distributions
The mean and variance of the binomial distribution $b(x;n,p)$ are $\mu = np$ and $\sigma^{2} = npq$.
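A minimal Python sketch of the binomial probability function and its cumulative sums (the function names and the coin-flip numbers are our own, not from the little blue book):

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """B(x; n, p): probability of at most x successes (a binomial sum)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Probability of exactly 2 heads in 5 flips of a balanced coin
n, p = 5, 0.5
p_two_heads = binomial_pmf(2, n, p)   # 10/32 = 0.3125

# Mean n*p and variance n*p*q of the binomial distribution
mean, var = n * p, n * p * (1 - p)    # 2.5 and 1.25
```

Summing the pmf up to $x$ like this is the same thing the cumulative binomial tables tabulate.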
Multinomial distribution | multinomisk fordeling
If we allow the trials of a binomial experiment to have more than two possible outcomes, it becomes a multinomial experiment.
If a given trial can result in one of the
Hypergeometric Distribution
Hypergeometric distributions are kind of like binomial ones except that they don't require independence between trials, which means that we can do stuff like draw from a deck of cards without replacing the card (and shuffling the deck) between each draw.
A hypergeometric distribution is the probability distribution of a hypergeometric random variable, $X$: the number of successes in an experiment where
- a random sample of size $n$ is selected without replacement from $N$ items, and
- of the $N$ items, $k$ may be classified as successes and $N-k$ are classified as failures.

As previously stated, the hypergeometric random variable $X$ is the number of successes in such an experiment.
The values of a hypergeometric distribution are denoted by $h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$.
Mean and variance of the hypergeometric distribution
The mean and variance of the hypergeometric distribution $h(x; N, n, k)$ are $\mu = \frac{nk}{N}$ and $\sigma^{2} = \frac{N-n}{N-1}\cdot n\cdot\frac{k}{N}\left(1 - \frac{k}{N}\right)$.
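A quick sketch in Python, reusing the card deck from the combinatorics section (the function name is our own):

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k): probability of x successes when drawing n items
    without replacement from N items, k of which count as successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Exactly 1 spade in a 2-card draw: N = 52 cards, k = 13 spades, n = 2
p_one_spade = hypergeom_pmf(1, 52, 2, 13)   # 507/1326

# Mean n*k/N of the distribution
N, n, k = 52, 2, 13
mean = n * k / N   # 0.5
```

Note how the pmf is just combinatorics divided by combinatorics, which is why no independence between draws is needed.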
Negative binomial distribution | negativ-binomisk fordeling
The negative binomial distribution is the probability distribution of the number of successes in a sequence of Bernoulli trials before a specified number of failures occurs. Or the number of failures before a specified number of successes occur.
For example, we could throw a six sided die until we get a five for the third time. The probability distribution of the number of non-fives that we got will be negative binomial.
If repeated independent trials can result in a success with probability $p$ and a failure with probability $q = 1 - p$, then the probability distribution of the random variable $X$, the number of the trial on which the $k$-th success occurs, is $b^{*}(x; k, p) = \binom{x-1}{k-1}p^{k}q^{x-k}$, for $x = k, k+1, k+2, \dots$
Example: Team
If we wanted to find the probability of team
Geometric distribution | Geometrisk fordeling
The geometric distribution is a special case of the negative binomial distribution where $k = 1$, i.e. we are counting trials until the first success.
The values of the geometric distribution are denoted by $g(x;p) = pq^{x-1}$, for $x = 1, 2, 3, \dots$
Example: We are flipping a balanced coin. What is the probability that we don't get heads until the fifth flip? $g(5; 0.5) = 0.5\cdot 0.5^{4} = 1/32 \approx 0.031$.
Mean and variance of a geometric distribution
Note that the mean and variance of a random variable following the geometric distribution are $\mu = \frac{1}{p}$ and $\sigma^{2} = \frac{1-p}{p^{2}}$.
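The coin-flip example can be checked with a tiny Python sketch (the function name is our own):

```python
def geometric_pmf(x, p):
    """g(x; p): probability that the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

# First head on the fifth flip of a balanced coin
p_fifth = geometric_pmf(5, 0.5)   # (1/2)^5 = 0.03125

# Mean 1/p and variance (1 - p)/p^2 of the geometric distribution
p = 0.5
mean, var = 1 / p, (1 - p) / p**2   # 2.0 and 2.0
```

A mean of $2$ matches intuition: on average it takes two flips of a balanced coin to see the first head.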
Poisson Distribution
The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
Given only the average rate of some event occurring for a certain period of observation (for example that Bob gets punched four times a day) and assuming that the process that produces the event is essentially random, the Poisson distribution specifies how likely it is that Bob will get punched 2, or 5, or 10, or any other number, during one period of observation.
The probability distribution of the Poisson random variable $X$, representing the number of outcomes occurring in a given time interval or specified region $t$, is $p(x;\lambda t) = \frac{e^{-\lambda t}(\lambda t)^{x}}{x!}$, for $x = 0, 1, 2, \dots$, where $\lambda$ is the average number of outcomes per unit time or region.
Note that the little blue book uses
Variance and mean of the Poisson distribution
Both the mean and the variance of the Poisson distribution $p(x;\lambda t)$ are equal to $\lambda t$.
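Bob's situation, sketched in Python (the function name is our own):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """p(x; mu): probability of exactly x events when the expected count is mu."""
    return exp(-mu) * mu**x / factorial(x)

# Bob is punched on average 4 times a day.
# Probability that he gets punched exactly twice tomorrow:
p_two = poisson_pmf(2, 4)   # 8 * e^-4, roughly 0.147
```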
Continuous Probability Distributions | kontinuerlige sannsynlighetsfordelinger
Continuous Uniform Distribution | Kontinuerlig uniformfordeling
Normal Distribution | Gauss distribution | Normalfordeling | Gaussfordeling
The normal distribution is super double important.
The density of a normal random variable
Did you also notice how the table of values for the normal distribution in the little blue book only covers
Tasked with finding the probability that the normal random variable
If tasked to find
Note: This "formula" is in the little blue book on page 31 (the map of relations between probability distributions), above the arrow going from Normal(
Normal approximation to the Binomial
If
In other words, if you have a binomial random variable
Continuity correction:
(Long version) Note that since continuous random variables have a probability of
Check out page 189-90 in the book (9e) for an explanation with pictures.
(Short version) if
Quality of the approximation:
If
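A sketch of the approximation in Python, with the continuity correction applied. The numbers ($n = 100$, $p = 0.5$, $x = 45$) are our own made-up example, and the `erf`-based normal CDF is a stand-in for the table in the little blue book:

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binomial_cdf(x, n, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# P(X <= 45) for X ~ b(x; 100, 0.5), exactly and via the normal approximation
n, p, x = 100, 0.5, 45
mu, sigma = n * p, sqrt(n * p * (1 - p))
exact = binomial_cdf(x, n, p)
approx = normal_cdf((x + 0.5 - mu) / sigma)   # continuity correction: x + 0.5
```

With $p = 0.5$ and a fairly large $n$ the two numbers agree to several decimals, which is exactly the regime where the approximation is supposed to be good.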
Gamma Distribution | Gammafordeling | $\Gamma$
The continuous random variable $X$ has a gamma distribution, with parameters $\alpha$ and $\beta$, if its density function is $f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}x^{\alpha - 1}e^{-x/\beta}$ for $x > 0$, and $f(x) = 0$ elsewhere, where $\alpha > 0$ and $\beta > 0$.
The mean and variance of the gamma distribution are $\mu = \alpha\beta$ and $\sigma^{2} = \alpha\beta^{2}$.
(Note that
The gamma function is defined by $\Gamma(\alpha) = \int_{0}^{\infty}x^{\alpha - 1}e^{-x}dx$, for $\alpha > 0$.
Properties of the gamma function
So yeah, the Gamma function is pretty dope and you better hope we don't have to do any calculations with it by hand.
Exponential distribution | Eksponensialfordeling
The exponential distribution is a special case of the gamma distribution where $\alpha = 1$.
The continuous random variable $X$ has an exponential distribution, with parameter $\beta$, if its density function is $f(x) = \frac{1}{\beta}e^{-x/\beta}$ for $x > 0$, and $f(x) = 0$ elsewhere, where $\beta > 0$.
The mean and variance of the exponential distribution are $\mu = \beta$ and $\sigma^{2} = \beta^{2}$.
The exponential distribution is pretty closely related to the Poisson process. This is apparently super useful. Check out page 196.
Chi-Squared Distribution | Kjikvadratfordeling | $\chi^{2}$ -distribution
The Chi-Squared distribution is another special case of the gamma distribution, where $\alpha = v/2$ and $\beta = 2$, with $v$ being a positive integer called the degrees of freedom.
The continuous random variable $X$ has a chi-squared distribution, with $v$ degrees of freedom, if its density function is $f(x) = \frac{1}{2^{v/2}\Gamma(v/2)}x^{v/2 - 1}e^{-x/2}$ for $x > 0$, and $f(x) = 0$ elsewhere.
The mean and variance are $\mu = v$ and $\sigma^{2} = 2v$.
Degrees of freedom and the chi-squared distribution
For every piece of information we estimate based on supplied information, a degree of freedom is lost. In this course it would seem like we never lose more than one degree of freedom, and the whole thing is succinctly summarized on page 27 of the little blue book.
Functions of Random Variables
Transformation of Variables
Suppose that
What does this mean?
If we have a discrete random variable
Suppose that
Examples of one-to-one transformations
Example:
Let
Moment-generating functions | Momentgenererende funksjoner
The moment-generating function of a random variable is an alternative specification of its probability distribution.
The moment-generating function of the random variable
Neat properties of moment-generating functions
If
What is moment?
More information: Wikipedia
"Loosely speaking it is a quantative measure of the shape of a set of points."
Moments are ordered.
The function
The
Beyond that it's all a bit diffuse. The course only covers the formula for moment-generating functions.
Sampling Distributions
Random sample | tilfeldig utvalg
Random sampling means that a subset of the population is selected at random to eliminate any possibility of bias in the sampling procedure.
Let
Since the
If
Sample mean | Gjennomsnitt
The sample mean is the numerical average of the observations in a sample
Sample median
The sample median is the middle value of the observations in a sample.
It is obtained by sorting the values in either ascending or descending order and then selecting the middle value (if the sample size is odd) or the average of the two values closest to the middle (if the sample size is even).
Sample mode | typetall | modus | modalverdi
The sample mode is the most frequently occurring value in a sample.
Sample variance | varians
The sample variance is essentially the average of the squares of the deviations of the observations from their mean, except that the sum of squares is divided by $n - 1$ rather than $n$.
If
We could also write the sample variance as
Sample standard deviation | standardavvik
The standard deviation is still the positive square root of the sample variance.
Sample range | variasjonsbredde
The sample range is the difference between the largest and smallest observed values.
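All of the sample statistics above are one-liners with Python's standard `statistics` module; the sample itself is made up for illustration:

```python
import statistics

# A made-up sample of six observations
sample = [3, 5, 5, 8, 9, 12]

mean = statistics.mean(sample)            # numerical average -> 7
median = statistics.median(sample)        # average of 5 and 8 -> 6.5
mode = statistics.mode(sample)            # most frequent value -> 5
variance = statistics.variance(sample)    # divides by n - 1, as in the notes
stdev = statistics.stdev(sample)          # positive square root of the variance
sample_range = max(sample) - min(sample)  # largest minus smallest -> 9
```

Note that `statistics.variance` uses the $n - 1$ denominator (the sample variance), while `statistics.pvariance` would divide by $n$.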
Sampling distribution
The probability distribution of a statistic is called a sampling distribution.
The sample average (
The sampling distribution of
The central limit theorem
This is some hot shit right here.
If
According to the book, if
Note: The book says (pg 263) that the Central Limit Theorem cannot be used unless
Example:
You are told that the variance of a population is presumed to be
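The theorem is easy to watch in action with a quick simulation. Here we use a made-up population (uniform on $[0,1]$, so $\mu = 0.5$ and $\sigma^{2} = 1/12$) and check that the sample means behave as the theorem promises:

```python
import random
random.seed(1)

# Population: uniform on [0, 1], which is decidedly not normal.
# Its mean is 0.5 and its variance is 1/12.
n = 30          # sample size
reps = 20000    # number of independent samples

means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in means) / reps

# The sample means pile up around mu = 0.5 with variance
# sigma^2 / n = (1/12)/30, roughly 0.00278, and their histogram
# looks approximately normal even though the population is uniform.
```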
Central limit theorem for two populations
If independent samples of size
Comparing two populations
(this is a rewrite of example 8.6)
We have two populations with the following information:
We find
Sampling distribution of $S^{2}$
If
In other words, "the sampling distribution of
The values of the random variable
The probability that a random sample produces a variable with a
How to read the
So, given a
Example:
Given a random sample, from a population with a presumed standard deviation of
And then we look it up in the table and see that a
t-distribution | student t-distribution | t-fordeling
The
If our random sample was selected from a normal population we can write
Let
A
Finding t-values
The
Estimation problems
Estimators
A point estimate of some population parameter
Similarly,
An estimator
If we consider all possible unbiased estimators of some parameter
Confidence intervals | konfidensintervall
A
The general formulation of a request for a confidence interval is
If
Example:
How large should the sample size be?
Confidence intervals get better as the sample size increases. Ideally, all measurements would be made on the entire population. However that is usually either unfeasible or, like, a bunch of work, but we still might want to know how large our sample size should be in order to be certain that the confidence interval is tight (note that "tight" is not an actual statistical term).
Luckily, if
Finding a confidence interval
On
Example:
An estimate,
Since the variance is known, we can use the above method.
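As a sketch, here is the known-variance confidence interval in Python. The numbers are hypothetical, and $z_{0.025} = 1.96$ is the usual value from the standard normal table:

```python
from math import sqrt

def confidence_interval(xbar, sigma, n, z=1.96):
    """CI for mu when sigma is known: xbar +/- z_{alpha/2} * sigma / sqrt(n).
    The default z = 1.96 gives a 95% interval."""
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Made-up numbers: sample mean 2.6, known sigma 0.3, sample size 36
low, high = confidence_interval(2.6, 0.3, 36)   # (2.502, 2.698)
```

Increasing `n` shrinks `half_width` like $1/\sqrt{n}$, which is exactly the "tightness" effect discussed above.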
One-sided confidence bounds on
On
Prediction interval | prediksjonsintervall
A prediction interval is a prediction of the possible value of a future observation.
It is something we do if we don't know what
Prediction intervals can be used to determine if an observation is an outlier.
Maximum likelihood estimator | Sannsynsmaksimeringsestimator
Given independent observations
This is, however, not that useful, as computing the MLE from this form can be quite challenging. To overcome this hurdle we use a trick, which is quite simply to take the logarithm of our likelihood function.
Using this trick allows us to rewrite our likelihood function into a more computable form by rewriting it as a sum. This is a valid approach because the logarithm is a strictly increasing function, which means maximizing this function is equivalent to maximizing our original equation.
Now simply (or not so simply) solve the equation
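A numerical sketch of the idea, using made-up data assumed to be Poisson distributed. Instead of solving the equation by hand we brute-force the log-likelihood over a grid, and compare with the closed-form answer (for the Poisson distribution, setting the derivative to zero gives $\hat{\lambda} = \bar{x}$):

```python
from math import log, factorial

# Made-up observations, assumed to come from a Poisson distribution
data = [2, 4, 3, 5, 1, 3]

def log_likelihood(lam):
    """ln L(lam) = sum of ln f(x_i; lam) for the Poisson pmf."""
    return sum(x * log(lam) - lam - log(factorial(x)) for x in data)

# Brute-force the maximizer over a grid instead of solving d/dlam ln L = 0
candidates = [k / 100 for k in range(1, 1001)]
mle = max(candidates, key=log_likelihood)

# Closed-form answer from solving the equation: lam_hat = sample mean
sample_mean = sum(data) / len(data)   # 3.0
```

The grid search lands on the sample mean, as the calculus says it should; the logarithm turns the product of pmfs into a sum, which is what makes `log_likelihood` a one-liner.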
Hypotheses testing | hypothesis | hypoteser | hypotesetesting
A statistical hypothesis is an assertion or conjecture concerning one or more populations. The absolute truth or falsity of a statistical hypothesis cannot be known unless the entire population is examined. Which has been shown to be a major hassle. So instead we take a random sample and see if it either supports or refutes our hypothesis.
The null hypothesis | null-hypotesen
When performing a hypothesis test we present two mutually exclusive hypotheses: a null hypothesis (denoted $H_{0}$) and an alternative hypothesis (denoted $H_{1}$).
Typically, one of the following conclusions is reached when testing a hypothesis:
An analogy is the hypothesis testing done in those American jury trials we've seen on TV, where the null hypothesis is that the defendant is not guilty (innocent until proven guilty) while the alternative hypothesis is that the defendant is guilty.
Typically, the null hypothesis states that the probability is equal to some value. The alternative hypothesis states that the probability is either higher or lower than the null hypothesis states, or equal to some other value.
Testing a statistical hypothesis
Illustrated with a table!
A hypothesis-test-experiment generally has a requirement that the outcome needs to fulfill in order for the experimenters to reject the null hypothesis. The probability of committing a type I error is computed by evaluating the probability that the requirement is exceeded or met given that the null hypothesis is true (there is an example on page 323).
The probability of committing a type II error is impossible to compute unless the alternative hypothesis is specific.
If the alternative hypothesis is
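Both error probabilities can be computed directly for a binomial test. The setup below is our own made-up example; in particular the decision rule (reject at 15 or more successes) is a choice, not something prescribed by the theory:

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# H0: p = 0.5 versus the specific alternative H1: p = 0.8, with n = 20 trials.
# Decision rule (our own choice): reject H0 if we see 15 or more successes.
n, cutoff = 20, 15

# Type I error: P(reject H0 | H0 true) = P(X >= 15 when p = 0.5)
alpha = sum(binomial_pmf(x, n, 0.5) for x in range(cutoff, n + 1))   # ~0.02

# Type II error: P(fail to reject H0 | H1 true) = P(X <= 14 when p = 0.8)
beta = sum(binomial_pmf(x, n, 0.8) for x in range(cutoff))           # ~0.20
```

Note how `beta` is only computable because the alternative pins $p$ to a specific value ($0.8$); with a vague alternative like $p > 0.5$ there is no single number to plug in.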