Univariate Statistics
Lauren Talluto
14.01.2025
The central limit theorem revisited
- Take a variable \(x\) that is normally distributed.
- We can take a small sample (\(n = 3\)) and compute the mean (vertical bar).
- We can repeat this many times!
- The mean of all of those sample means will converge on \(\bar{x}\)!
- The standard deviation of sample means is the standard error
of the mean
- Increasing sample size will reduce the standard error!
The central limit theorem revisited
Reminder: standard error of the mean:
\[
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\]
We can also standardize (scale) the sample means:
\[
{z} = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}
\]
These z-values follow a standard normal distribution
(with mean=0 and sd=1).
If we know \(\mu\) and \(\sigma\), we can define an interval around
\(\mu\) such that:
- For a sample of size \(n\), we know the sample mean \(\bar{x}\) will occur within an interval with some probability (called confidence).
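We can verify this by simulation; a minimal sketch (the values of \(\mu\), \(\sigma\), and \(n\) are illustrative, not from the lecture data):
set.seed(1)
mu = 10; sigma = 2; n = 3
# draw many samples of size n and compute their means
xbars = replicate(10000, mean(rnorm(n, mu, sigma)))
sd(xbars)         # close to the standard error of the mean
sigma/sqrt(n)
# standardize the sample means
z = (xbars - mu) / (sigma/sqrt(n))
c(mean(z), sd(z)) # close to 0 and 1: standard normal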
Putting the central limit theorem to work: Confidence intervals
- We can also reverse this logic!
- For a sample mean \(\bar{x}\), we
can define an interval that includes \(\mu\) with a certain
confidence
\[
z = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}
\] \[
P(-z_\frac{\alpha}{2}\leq
\frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}\leq+z_\frac{\alpha}{2})=1-\alpha
\] \[
P(\bar{x}-z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}}\leq \mu
\leq\bar{x}+z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}})=1-\alpha
\] \[
CI: \bar{x}\pm z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}}
\]
This confidence interval is constructed symmetrically around \(\bar{x}\).
With repeated sampling, 95% of similar intervals will contain
\(\mu\) (assuming \(\alpha = 0.05\))
Cool! One sample is enough for a good
interval estimate of \(\mu\).
But wait - how do I know \(\sigma\)??
The t-distribution
If the standard error is estimated using the sample
standard deviation, then standardized sample means
follow a Student’s t-distribution:
\[
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} =
S.E.M.
\] \[
{t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}}
\]
Relative to the normal distribution, the t-distribution:
- Has fatter tails (i.e., is leptokurtic)
- Has a shape defined by a location (usually 0) and degrees of freedom (usually \(n - 1\))
- Converges to a normal distribution as \(n\) approaches \(\infty\)
- Functions in R: qt, pt, dt, and rt.

Above 100 df, t is basically equivalent to a normal distribution.
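A quick numerical check of this convergence, using the quantile functions named above (a minimal sketch):
# upper 2.5% quantile of t for increasing df, versus the normal
qt(0.975, df = c(2, 10, 100, 1000))
qnorm(0.975) # about 1.96; the t quantiles approach this as df grows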
Confidence intervals: accuracy, precision, sample size
The t-distribution can be used to construct a confidence
interval:
\[
{t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}}
\] \[
P(\bar{x}-t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}}\leq \mu
\leq\bar{x}+t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}})=1-\alpha
\] \[
CI: \bar{x}\pm t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}}
\]
Allows an interval estimate of \(\mu\) with an estimate for \(\sigma\) derived from the same sample as
\(\bar{x}\).
data = read.table("data/Glaciers.txt", header=TRUE)
# radiocarbon in dissolved organic carbon from glacier ice samples
d14C = data$delta14C.permil
# Give mean and confidence interval with confidence = 0.95
(m = mean(d14C))
## [1] -426.2994
(s = sd(d14C))
## [1] 143.8792
(n = length(d14C))
## [1] 19
alpha = 0.05
accuracy = 1-alpha
(t = qt(p = 1 - alpha/2, df = n-1)) # two-sided, P goes 50/50 to both tails
## [1] 2.100922
(aprec = t*s/sqrt(n)) # absolute precision = half the CI width
## [1] 69.34756
(rprec = aprec/m*100) # relative precision
## [1] -16.26734
# confidence limits
c(lower = m - aprec, upper = m + aprec)
## lower upper
## -495.6469 -356.9518
Confidence intervals: accuracy, precision, sample size
- Confidence = accuracy = the probability close to 1 with which \(\mu\) is included in intervals constructed
around \(\bar{x}\)’s.
- Precision = half the width of the interval (in units of the variable). Precision may be expressed relative to the mean as a percentage.
\[
AP= t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}}
\]
\[
RP= \frac{t_{\frac{\alpha}{2},d.f.}}{\bar{x}} \frac{s}{\sqrt{n}}
\]
Confidence intervals: accuracy, precision, sample size
- Precision, accuracy, and sample size \(n\) are interdependent. A small precision value (i.e., a narrow interval) and high accuracy (e.g., 99% probability of including \(\mu\)) require a large sample size \(n\).
- With knowledge of \(s\) (from a pilot study!), we can pre-set RP (the needed precision) and a desired level of confidence in order to compute the necessary sample size:
\[
n=
\left(\frac{t_{\frac{\alpha}{2},d.f.}\cdot{s}}{{RP\cdot\bar{x}}}\right)^2
\]
We must solve this formula iteratively (\(n\) appears on both sides, via the degrees of freedom of \(t\))
# Regard data as pilot study
# Set accuracy and rprec as needed
accuracy = .95
rprec = 0.10 # assume more (or less) precise estimate is needed
alpha = 1-accuracy
n = 2 # left side of formula, low n as starting value
t = qt(1 - alpha/2, n-1)
(rhs = (t * s/(rprec * m))^2) # right side of formula
## [1] 1839.071
# right side minus left side
# when diff falls below zero - then enough sampling effort
diff = rhs - n
while(diff >= 0) {
  n = n + 1
  t = qt(1 - alpha/2, n - 1)
  diff = ((t * s/(rprec * m))^2) - n
}
# sample size needed to hit the required precision and accuracy
n
## [1] 47
Confidence intervals: accuracy, precision, sample size
- Precision, accuracy and sample size n are interdependent. A
low precision (i.e. narrow interval) and high accuracy
(e.g. 99% probability of including \(\mu\)) needs a large sample size n.
- With knowledge of s (pilot study!), we can pre-set RP, i.e. a needed
precision, and a desired level of confidence in order to compute a
necessary sample size:
\[
n=
\left(\frac{t_{\frac{\alpha}{2},d.f.}\cdot{s}}{{RP\cdot\bar{x}}}\right)^2
\]
We must solve this formula iteratively (\(n\) is on both sides)
The point where a function crosses zero is the function’s root. R has a built-in function, uniroot, that will find a root for us.
f = function(n, alpha = 0.05, rprec = 0.1) {
  # uses s (sd) and m (mean) from the pilot data above
  t = qt(1 - alpha/2, n - 1)
  ((t * s/(rprec * m))^2) - n
}
uniroot(f, lower = 2, upper = 1000)
## $root
## [1] 46.19821
##
## $f.root
## [1] -1.037638e-05
##
## $iter
## [1] 8
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 6.103516e-05
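Since \(n\) must be an integer, we round the root up: \(\lceil 46.2 \rceil = 47\), matching the result of the iterative loop above.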
Accuracy and precision illustrated

Statistical decision theory
… at the heart of inferential statistics: learning about the
population from sample(s).
- Statistical decision: a decision about the
population based on sample information.
- I conclude that the mean of group 1 is larger than group 2
- Statistical hypothesis: a statement about the population, usually informed by a research question
- I think the mean of group 1 might be larger than that of group
2
- Null hypothesis (\(H_0\)): The default assumption, that any
result is observed entirely due to chance
- The means of groups 1 and 2 are not different; if \(\bar{x}_1\) and \(\bar{x}_2\) differ, it is just due to random chance
- Alternative hypothesis (\(H_A\)): any hypothesis that differs from
the null.
- The group means are different
- Group 1 is larger than group 2
- Group 1 is smaller than group 2
Statistical decision theory
A decision is made by rejecting/accepting the hypotheses; our aim is to reject \(H_0\) and accept \(H_A\) (the other direction is harder).
Now, imagine you observe a difference between two samples: \(\bar{x}_1 > \bar{x}_2\)
How can you make a decision?
Look at the observed difference in means, and probably also at the variability.
Type I & II Errors
When making a decision based on (incomplete) sample
information you can make an error!
There are two types of error: \(\alpha\) (type I), \(\beta\) (type II)

Type I & II Errors
- Type I error: false alarm (or false
positive)
- There is no real effect, but you think there is one.
- Type II error: missed opportunity (or false
negative)
- There is a real effect, but you failed to find it.
Type I & II Errors
Power (\(1 - \beta\)): the probability of correctly identifying an existing real effect.
Power increases
- as the strength of the effect increases (e.g., a larger difference between populations)
- as the population variance decreases
- as we increase \(\alpha\)
- as we increase the sample size
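These relationships can be explored numerically with R’s built-in power.t.test (not used in the lecture; the effect sizes and sample sizes below are illustrative):
# power of a two-sample t-test for a given effect (delta) and n
power.t.test(n = 20, delta = 1, sd = 2)
# larger effect -> more power
power.t.test(n = 20, delta = 2, sd = 2)
# or solve for the n needed to reach a desired power
power.t.test(delta = 1, sd = 2, power = 0.9)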
Statistical hypothesis testing - strategy
Consider the example from before as a classical situation of a two
sample-test:
H0: The two populations do not differ. Any observed difference between the two samples is entirely due to chance. If the studied property is naturally variable, then samples will never be identical.
HA: The two populations differ. An observed difference between the two samples reflects this difference; the samples were collected from two different underlying populations.
To test this, we collect empirical information, i.e., each population is represented by one sample (of some size).
Statistical hypothesis testing - strategy
- We assume H0 is true.
- We are suspicious, however. Maybe the samples are really
different.
- Thus we compute \(p\), the
probability that, if H0 is true, we would get a result as
extreme as we did
- Compare \(p\) to a pre-set error
threshold (\(\alpha\)).
- If \(p < \alpha\) (i.e., our empirical results are VERY unlikely under H0), we decide H0 cannot be true and reject it.
Mechanics of testing illustrated step-by-step: one-sample test
Example: Mice population on an island. We do a
census and identify body size (as weight) of ALL MICE.
- Weight ~ Normal(\(\mu_0\), \(\sigma\))
- On the way home between island and mainland we find a single mouse
on a drifting log, it is surprisingly large.
- Is it from the island? From the mainland? We will use a one-sample test with sample size \(n = 1\).
data = read.table("data/IslandMice.txt", header=TRUE)
x = data$weight
x1 = 13 # Drifting mouse (x1)
(mu = mean(x)) # population mean of island mice
## [1] 10.04951
(n = length(x)) # population size
## [1] 999
(sigma = sqrt(sum((x - mu)^2)/n)) # population sd (sigma)
## [1] 2.04207
Mechanics of testing illustrated step-by-step: one-sample test
Testable hypotheses:
H0: The ‘new’ mouse belongs to the island population. Its weight is similar to those of other island mice; its relatively high weight is entirely due to chance.
HA: The ‘new’ mouse does not belong to the island
population. Its weight is so high that it must belong to some
other mouse population, say from the mainland.
Mechanics of testing: assuming H0 but remaining suspicious
- We believe in H0 (initial innocence), yet we are suspicious (the
mouse is heavy!).
- How likely is it to find such a heavy mouse x in the island
population? How likely is it to find an even heavier one?
- We can directly compute this probability from \(\mu_0\), \(\sigma\) and \(x\).
- This probability also corresponds to a standardized mouse weight,
which is the test statistic in this case.
Mechanics of testing: evaluating probabilities
\[
{TS} = z_1 = \frac{(x_1-\mu)}{\sigma}
\] \[
p = P(z\geq z_1) = ?
\]
Assume our population of body weights is normally distributed.
We can compute \(p\) as the integral
of the normal PDF.
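The p-value is literally this integral; a minimal numerical check (using the value of \(z_1\) computed in the code below):
z1 = 1.444853 # standardized weight of the drifting mouse (computed below)
integrate(dnorm, lower = z1, upper = Inf) # equals 1 - pnorm(z1), about 0.0742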
Mechanics of testing: assessing significance
- Now we compare \(p\) to a pre-set
threshold significance level \(\alpha\)
- conventionally in ecology \(\alpha =
0.05\)

# z-score for our drifter
(z1 = (x1 - mu)/sigma)
## [1] 1.444853
# Probability to find a mouse as heavy or
# heavier than x1 in the island population
# first on the standardized (z-) scale
(p = 1 - pnorm(z1, mean = 0, sd = 1))
## [1] 0.07424962
# and again on the original scale
(1 - pnorm(x1, mean = mu, sd = sigma))
## [1] 0.07424962
The probability that a mouse is as heavy as (or heavier than) the
drifting mouse is 7.4 %. Therefore, we fail to reject the null
hypothesis that the mouse is from the island.
Mechanics of testing: assessing error

For every test statistic, there will exist a critical value \(z_{crit}\) where \(P(z \ge z_{crit}) = \alpha\).
Mechanics of testing: assessing error
Case 1: \(z_1 > z_{crit}\)

Decision:
- Reject H0 (and accept HA).
- What is the chance of making an error with this decision?
- Anytime we observe \(z_1 >
z_{crit}\), we take the same risk
- Thus, the type I error rate = \(\alpha\) (normally 0.05)
Mechanics of testing: assessing error
Case 2: \(z_1 < z_{crit}\)

Decision:
- We cannot reject H0.
- Do we accept it then?
- What is the chance of making an error with this decision?
- We cannot assess the type II error rate without
knowledge of HA!
Mechanics of testing: computing beta and power
A well-defined HA could be: the mouse comes from the mainland, where \(weight \sim \mathcal{N}(\mu_{main}, \sigma)\) with \(\mu_0 < \mu_{main}\).

- A mouse found with \(x <
x_{crit}\) (i.e., fail to reject H0) still has a 22% chance to be
from the mainland!
- Accepting H0 has a high chance of error!
# Read file of mainland mice
data_main = read.table("data/MainlandMice.txt", header=TRUE)
(mu_main = mean(data_main$weight))
## [1] 14.98701
# sd of mainland mice is ~identical to island mice
c(sd(data_main$weight), sigma)
## [1] 2.00143 2.04207
# critical value on original scale
(x_crit = qnorm(0.95, mu, sigma))
## [1] 13.40842
# Type II error rate (orange)
# Probability to find a mouse lighter than xcrit
# under the assumption that HA is right
# (i.e., the mouse comes from the mainland)
(beta = pnorm(x_crit, mu_main, sigma))
## [1] 0.2197506
# Power is just 1 - beta
(1 - beta)
## [1] 0.7802494
Mechanics of testing: controlling decision errors
- Should we play it safe by reducing alpha?
- This will increase the type II error rate!
- A conservative test (small \(\alpha\)) helps avoid false positives
(reporting a difference when there is none)
- But it increases our chance for a false negative (failing
in detecting an existing effect).
- Solution: Increase effort, i.e., use a larger sample size!
Mechanics of testing: increasing sample size
What if we find more than one mouse?
A sample of n > 1 allows us to compute test
statistics on sample means!
Sampling with higher effort decreases both errors and thus allows a
conservative test with more power.
Mechanics of testing: increasing sample size
# Sample of 3 mice: do they belong to the island population?
x_sample = c(12.09, 15.48, 11.76)
(xbar = mean(x_sample))
## [1] 13.11
(SEM = sigma/sqrt(length(x_sample)))
## [1] 1.17899
# Standardised weight of the sample mean
# assuming it belongs to a "population of island means"
(z_sample = (xbar - mu)/SEM)
## [1] 2.595859
# Probability to find a mean at least as great as (or greater than)
# the sample mean (of 3 drifting mice) in the population of island means
(p = 1 - pnorm(z_sample, 0,1))
## [1] 0.004717744
# equivalent to:
(p = 1 - pnorm(xbar, mean = mu, sd = SEM))
## [1] 0.004717744
# Power (1 - beta), keeping x_crit from the single-mouse test above
1 - pnorm(x_crit, mean = mu_main, sd = SEM)
## [1] 0.9097045
One-sample z-test
- For the mice, we knew the underlying population was normally
distributed
- Additionally, we knew the population standard deviation \(\sigma\).
- The test is known as a z-test (sometimes Gauss test), with test statistic \(z \sim \mathcal{N}(0,1)\)
\[
TS =z= \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}
\]
One-sample t-test
- Generally, we must estimate \(\sigma\) from a sample
- The equivalent statistic using a sample standard deviation is:
\[
t = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}}
\]
- This test statistic follows Student’s
t-distribution with degrees of freedom \(\nu = n - 1\):
\[
t \sim \mathcal{T}(\nu)
\]
One-sample t-test
data = read.table("data/Crabs.txt", header=TRUE)
# Body temperature of crabs
x = data$body_temp
# null hypothesis: body temp is equal to air temp
# HA: body temp is greater (metabolic heat)
# one-sided test!
air_temp = 24.3
(xbar = mean(x))
## [1] 25.028
(s = sd(x))
## [1] 1.341802
n = length(x)
(SEM = s/sqrt(n))
## [1] 0.2683605
(t = (xbar - air_temp)/SEM)
## [1] 2.712769
(t_crit = qt(p = 0.95, df= n - 1))
## [1] 1.710882
(p = 1 - pt(t, n - 1))
## [1] 0.006072769
One-sample t-test
# Maybe simpler:
# note default alternative="two.sided"
t.test(x, mu = air_temp, alternative = "greater")
##
## One Sample t-test
##
## data: x
## t = 2.7128, df = 24, p-value = 0.006073
## alternative hypothesis: true mean is greater than 24.3
## 95 percent confidence interval:
## 24.56887 Inf
## sample estimates:
## mean of x
## 25.028
One-sample t-test
# confidence interval: 2-sided critical t-value
t_quantile = qt(1 - 0.05/2, df = n - 1)
# re-scale to the original measurement scale
precision = t_quantile * SEM
(interval = xbar + c(-precision, precision))
## [1] 24.47413 25.58187
# Air temperature is outside of this interval!
For 2-sided tests, we can equivalently construct a confidence
interval using the quantiles of the t-distribution

Interpretation: if H0 is true (i.e., our population has a mean = \(\mu\)), 95% of intervals constructed from similar samples will include \(\mu\).
Our interval does not include \(\mu\), so we can reject the hypothesis that the population mean is \(\mu\).
Steps to conduct a (two-sided) statistical test.
- Set \(\alpha\) (before collecting
data!)
- Define \(H_0\) and \(H_A\)
- Collect data \(X\)
- Compute the test statistic \(TS\)
and if needed the degrees of freedom \(\nu\)
- Compute the probability \(P(TS | \nu,
H_0)\) (this is the \(p\)-value)
- Decide:
- \(p \ge \alpha\): Fail to reject
\(H_0\)
- \(p < \alpha\): Reject \(H_0\)
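This recipe maps directly onto a few lines of R. A minimal sketch for a two-sided one-sample t-test, with an illustrative data vector and \(\mu_0\) (not from the lecture data):
alpha = 0.05                       # 1. set alpha before collecting data
mu0 = 10                           # 2. H0: mu = mu0; HA: mu != mu0
x = c(11.2, 9.8, 12.1, 10.4, 11.7) # 3. collect data
nu = length(x) - 1                 # degrees of freedom
TS = (mean(x) - mu0)/(sd(x)/sqrt(length(x))) # 4. test statistic
p = 2 * (1 - pt(abs(TS), nu))      # 5. two-sided p-value
p < alpha                          # 6. decide: TRUE -> reject H0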
Significance levels and example statements

One-sided vs two-sided hypotheses

Parametric tests
- A test is parametric if it assumes that the
underlying population comes from a known parameterized probability
distribution
- Often this distribution is normal, but many others are also
used
- Note that the distribution is a population
parameter!
- Checking a sample for normality is useful, but not
diagnostic!
Normality checks
Graphical checks
Histograms with normal density curves
Compare mean and median
Normality checks
Graphical checks
- Boxplots can compare multiple groups simultanously
- Careful! Easy to see skew, but harder to see kurtosis
Normality checks
Graphical checks
- QQ plots are perhaps the most useful diagnostic
- Compare observed quantiles (i.e., z-score of a sample) to
those from a standard normal distribution
- Expectation is that samples fall along the line
- Use qqnorm and qqline in R to make the plots
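A minimal sketch with simulated data (for illustration only):
set.seed(42)
x = rnorm(50)
qqnorm(x) # observed quantiles vs. theoretical normal quantiles
qqline(x) # reference line; normal samples fall close to it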
Normality tests
- It is tempting to seek a formal test for normality
- Example: shapiro.test(x) evaluates the null hypothesis that x comes from a normally distributed population
- Resist this temptation!
- Remember that if we fail to reject H0, that doesn’t mean we accept
H0
- These tests have low power with small samples: often we fail to
reject H0, but type II error rate is high!
- If power is high (high n), the test is very sensitive. Small
departures from normality will be significant.
- But most parametric tests work just fine with small departures from
normality when sample size is large
- This test will cause us to often make incorrect decisions!
Normality tests
Conclusions/advice:
shapiro.test and similar tests are very commonly used and almost completely useless. Forget them!
- Graphically evaluate your data
- If graphically obviously non-normal, consider the robustness of the
chosen test, or transformations
- Fall back to other distributions or to
non-parametric tests
F-distribution
Another thought experiment:
1. Take two samples from a population (\(\mathcal{N}[\mu, \sigma]\))
2. Calculate \(s_1^2\) (from sample 1 with \(n_1\)) and \(s_2^2\) (from sample 2 with \(n_2\))
3. Calculate the test statistic F: \[
F=\frac{s_1^2}{s_2^2}
\]
4. Repeat 1.–3. and build the distribution of F-values
- Since \(s_1^2\) and \(s_2^2\) are from two samples from the same population, they are both estimates of \(\sigma^2\); we thus expect \(F\approx1\).
- The F-distribution is asymmetrical. Its shape is determined by \(df_1=n_1-1\) and \(df_2=n_2-1\). There is a separate
F-distribution for each combination of \(df_1\) and \(df_2\).
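This thought experiment is straightforward to simulate; a minimal sketch (the sample sizes are illustrative):
set.seed(1)
n1 = 10; n2 = 12
# many pairs of samples from the SAME normal population
F_vals = replicate(10000, var(rnorm(n1)) / var(rnorm(n2)))
# simulated quantiles match the theoretical F-distribution
quantile(F_vals, 0.95)
qf(0.95, df1 = n1 - 1, df2 = n2 - 1)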
F-test: Variance homogeneity?
H0: The sample variances estimate the same parametric variance. Or:
\(\sigma_1^2 = \sigma_2^2\)
Variance homogeneity = homoscedasticity
HA: The sample variances estimate different parametric variances. Or:
\(\sigma_1^2 \neq \sigma_2^2\)
Variance heterogeneity = heteroscedasticity
Only use this to test variance heterogeneity (HA).
Do not use it to check whether two samples have the same variance (same
problems as Shapiro test)!
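In R, this test is var.test; a minimal sketch with simulated heteroscedastic groups (illustrative data):
set.seed(1)
g1 = rnorm(20, sd = 1)
g2 = rnorm(20, sd = 2) # twice the standard deviation
var.test(g1, g2)       # H0: the variance ratio is 1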

Standard tests for location
Considerations:
- Parametric (populations follow a known
distribution) vs non-parametric
- Parametric tests are more powerful
- Number of groups/samples: 2 or more than 2
- Independence of samples
- Independent samples do not depend on each other; they were independently collected.
- Dependent/paired samples depend on each other;
e.g., the same subjects measured multiple times.
- Tests for dependent samples are generally more powerful.
Standard tests for location

Two-sample tests: independent t-test
- Tests for differences of means between two independent samples
- H0: Samples are from populations with the same mean \(\mu\).
- Student’s test assumes populations have equal variance \(\sigma^2\)
- Very difficult to evaluate, so this test should be
used rarely
\[
\begin{align}
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}
\\ \\
s_p &= \sqrt{\frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1+n_2-2}} \\
\nu & = n_1 + n_2 - 2
\end{align}
\]
- Welch’s test relaxes this assumption
- Default in R (t.test)
- Power approaches Student’s test if variances are similar and sample
sizes are large
\[\begin{align}
t &= \frac{\bar{x}_1 -
\bar{x}_2}{\sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}} \\ \\
\nu & \approx \frac{\left( \frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}
\right )^2}
{\frac{s^4_1}{n^2_1\nu_1} + \frac{s^4_2}{n^2_2\nu_2}} \\
\end{align}\]
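In R, both versions are available via t.test; a minimal sketch with simulated data (illustrative, not from the lecture):
set.seed(1)
g1 = rnorm(15, mean = 10, sd = 2)
g2 = rnorm(20, mean = 12, sd = 3)
t.test(g1, g2)                   # Welch's test (default, var.equal = FALSE)
t.test(g1, g2, var.equal = TRUE) # Student's pooled-variance test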
Two-sample tests: paired t-test
- Tests for differences of means between two paired
samples.
- Exactly equal to a one-sample t-test on the differences between the samples, with the null hypothesis that the mean difference is zero.

x1 = c(gandalf = 6, saruman = 4, arwen = 7, frodo = 3)
x2 = c(gandalf = 4, saruman = 3, arwen = 5, frodo = 2)
x1 - x2
## gandalf saruman arwen frodo
## 2 1 2 1
t.test(x1, x2, paired = TRUE)
##
## Paired t-test
##
## data: x1 and x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 0.5813069 2.4186931
## sample estimates:
## mean difference
## 1.5
t.test(x1 - x2)
##
## One Sample t-test
##
## data: x1 - x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.5813069 2.4186931
## sample estimates:
## mean of x
## 1.5
Nonparametric location tests
- These tests search for a difference in medians
- Convert data to ordinal scale (i.e., rank
data)
- Use for small (ish) samples when population normality cannot be
assumed
- Loss of information = lower statistical power!
Independent samples: Mann-Whitney U-test
Dependent samples: Wilcoxon test
rank(c(x1, x2))
## gandalf saruman arwen frodo gandalf saruman arwen frodo
## 7.0 4.5 8.0 2.5 4.5 2.5 6.0 1.0
wilcox.test(x1, x2)
## Warning in wilcox.test.default(x1, x2): cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: x1 and x2
## W = 12, p-value = 0.3065
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(x1, x2, paired = TRUE)
## Warning in wilcox.test.default(x1, x2, paired = TRUE): cannot compute exact
## p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: x1 and x2
## V = 10, p-value = 0.09467
## alternative hypothesis: true location shift is not equal to 0