Univariate Statistics
Lauren Talluto
14.01.2025
The central limit theorem revisited
- Take a variable \(x\) that is normally distributed.
- We can take a small sample (\(n = 3\)) and compute the mean (vertical bar).
- We can repeat this many times!
- The mean of all of those sample means will converge on \(\bar{x}\)!
- The standard deviation of sample means is the standard error
of the mean
- Increasing sample size will reduce the standard error!
The central limit theorem revisited
Reminder: standard error of the mean:
\[
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\]
We can also standardize (scale) the sample means:
\[
{z} = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}
\]
These z-values follow a standard normal distribution
(with mean=0 and sd=1).
If we know \(\mu\) and \(\sigma\), we can define an interval around
\(\mu\) such that:
- For a sample of size \(n\), we know the sample mean \(\bar{x}\) will occur within an interval with some probability (called confidence).
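We can verify this by simulation; a minimal sketch (the values of \(\mu\), \(\sigma\), and \(n\) are illustrative, not from the lecture data):
set.seed(1)
mu = 10; sigma = 2; n = 3
# draw many samples of size n and compute their means
xbars = replicate(10000, mean(rnorm(n, mu, sigma)))
sd(xbars)         # close to the standard error of the mean
sigma/sqrt(n)
# standardize the sample means
z = (xbars - mu) / (sigma/sqrt(n))
c(mean(z), sd(z)) # close to 0 and 1: standard normal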
Putting the central limit theorem to work: Confidence intervals
- We can also reverse this logic!
- For a sample mean \(\bar{x}\), we
can define an interval that includes \(\mu\) with a certain
confidence
\[
z = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}
\] \[
P(-z_\frac{\alpha}{2}\leq
\frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}\leq+z_\frac{\alpha}{2})=1-\alpha
\] \[
P(\bar{x}-z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}}\leq \mu
\leq\bar{x}+z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}})=1-\alpha
\] \[
CI: \bar{x}\pm z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}}
\]
This confidence interval is constructed symmetrically around \(\bar{x}\).
With repeated sampling, 95% of similar intervals will contain
\(\mu\) (assuming \(\alpha = 0.05\))
Cool! One sample is enough for a good
interval estimate of \(\mu\).
But wait - how do I know \(\sigma\)??
The t-distribution
If the standard error is estimated using the sample
standard deviation, then standardized sample means
follow a Student’s t-distribution:
\[
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} =
S.E.M.
\] \[
{t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}}
\]
Relative to the normal distribution, the t-distribution:
- Has fatter tails (i.e., is leptokurtic)
- Has a shape defined by a location (usually 0) and degrees of freedom (usually \(n - 1\))
- Converges to a normal distribution as \(n\) approaches \(\infty\)
- Functions in R: qt, pt, dt, and rt.

Above 100 df, t is basically equivalent to a normal distribution.
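A quick numerical check of this convergence, using the quantile functions named above (a minimal sketch):
# upper 2.5% quantile of t for increasing df, versus the normal
qt(0.975, df = c(2, 10, 100, 1000))
qnorm(0.975) # about 1.96; the t quantiles approach this as df grows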
Confidence intervals: accuracy, precision, sample size
The t-distribution can be used to construct a confidence
interval:
\[
{t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}}
\] \[
P(\bar{x}-t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}}\leq \mu
\leq\bar{x}+t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}})=1-\alpha
\] \[
CI: \bar{x}\pm t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}}
\]
Allows an interval estimate of \(\mu\) with an estimate for \(\sigma\) derived from the same sample as
\(\bar{x}\).
data = read.table("data/Glaciers.txt", header=TRUE)
# radiocarbon in dissolved organic carbon from glacier ice samples
d14C = data$delta14C.permil
# Give mean and confidence interval with confidence = 0.95
(m = mean(d14C))
## [1] -426.2994
(s = sd(d14C))
## [1] 143.8792
(n = length(d14C))
## [1] 19
alpha = 0.05
accuracy = 1-alpha
(t = qt(p = 1 - alpha/2, df = n-1)) # two-sided, P goes 50/50 to both tails
## [1] 2.100922
(aprec = t*s/sqrt(n)) # absolute precision = half the CI width
## [1] 69.34756
(rprec = aprec/m*100) # relative precision
## [1] -16.26734
# confidence limits
c(lower = m - aprec, upper = m + aprec)
## lower upper
## -495.6469 -356.9518
Confidence intervals: accuracy, precision, sample size
- Confidence = accuracy = the probability close to 1 with which \(\mu\) is included in intervals constructed
around \(\bar{x}\)’s.
- Precision = half the width of the interval (in units of the variable). Precision may be expressed relative to the mean as a percentage.
\[
AP= t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}}
\]
\[
RP= \frac{t_{\frac{\alpha}{2},d.f.}}{\bar{x}} \frac{s}{\sqrt{n}}
\]
Confidence intervals: accuracy, precision, sample size
- Precision, accuracy, and sample size \(n\) are interdependent. A small precision value (i.e., a narrow interval) and high accuracy (e.g., 99% probability of including \(\mu\)) require a large sample size \(n\).
- With knowledge of \(s\) (from a pilot study!), we can pre-set RP (the needed precision) and a desired level of confidence in order to compute the necessary sample size:
\[
n=
\left(\frac{t_{\frac{\alpha}{2},d.f.}\cdot{s}}{{RP\cdot\bar{x}}}\right)^2
\]
We must solve this formula iteratively (\(n\) appears on both sides, via the degrees of freedom of \(t\))
# Regard data as pilot study
# Set accuracy and rprec as needed
accuracy = .95
rprec = 0.10 # assume more (or less) precise estimate is needed
alpha = 1-accuracy
n = 2 # left side of formula, low n as starting value
t = qt(1 - alpha/2, n-1)
(rhs = (t * s/(rprec * m))^2) # right side of formula
## [1] 1839.071
# right side minus left side
# when diff falls below zero - then enough sampling effort
diff = rhs - n
while(diff >= 0) {
  n = n + 1
  t = qt(1 - alpha/2, n - 1)
  diff = ((t * s/(rprec * m))^2) - n
}
# sample size needed to hit the required precision and accuracy
n
## [1] 47
Confidence intervals: accuracy, precision, sample size
- Precision, accuracy and sample size n are interdependent. A
low precision (i.e. narrow interval) and high accuracy
(e.g. 99% probability of including \(\mu\)) needs a large sample size n.
- With knowledge of s (pilot study!), we can pre-set RP, i.e. a needed
precision, and a desired level of confidence in order to compute a
necessary sample size:
\[
n=
\left(\frac{t_{\frac{\alpha}{2},d.f.}\cdot{s}}{{RP\cdot\bar{x}}}\right)^2
\]
We must solve this formula iteratively (\(n\) is on both sides)
The point where a function crosses zero is the function’s root. R has a built-in function, uniroot, that will find a root for us.
f = function(n, alpha = 0.05, rprec = 0.1) {
  # uses s (sd) and m (mean) from the pilot data above
  t = qt(1 - alpha/2, n - 1)
  ((t * s/(rprec * m))^2) - n
}
uniroot(f, lower = 2, upper = 1000)
## $root
## [1] 46.19821
##
## $f.root
## [1] -1.037638e-05
##
## $iter
## [1] 8
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 6.103516e-05
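Since \(n\) must be an integer, we round the root up: \(\lceil 46.2 \rceil = 47\), matching the result of the iterative loop above.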
Accuracy and precision illustrated

Statistical decision theory
… at the heart of inferential statistics: learning about the
population from sample(s).
- Statistical decision: a decision about the
population based on sample information.
- I conclude that the mean of group 1 is larger than group 2
- Statistical hypothesis: a statement about the population, usually informed by a research question
- I think the mean of group 1 might be larger than that of group
2
- Null hypothesis (\(H_0\)): The default assumption, that any
result is observed entirely due to chance
- The means of groups 1 and 2 are not different; if \(\bar{x}_1\) and \(\bar{x}_2\) differ, it is just due to random chance
- Alternative hypothesis (\(H_A\)): any hypothesis that differs from
the null.
- The group means are different
- Group 1 is larger than group 2
- Group 1 is smaller than group 2
Statistical decision theory
A decision is made by rejecting/accepting the hypotheses; our aim is to reject \(H_0\) and accept \(H_A\) (the other direction is harder).
Now, imagine you observe a difference between two samples: \(\bar{x}_1 > \bar{x}_2\)
How can you make a decision?
Look at the observed difference in means, and probably also at the variability.
Type I & II Errors
When making a decision based on (incomplete) sample
information you can make an error!
There are two types of error: \(\alpha\) (type I), \(\beta\) (type II)

Type I & II Errors
- Type I error: false alarm (or false
positive)
- There is no real effect, but you think there is one.
- Type II error: missed opportunity (or false
negative)
- There is a real effect, but you failed to find it.
Type I & II Errors
Power (\(1 - \beta\)): the probability of correctly identifying an existing real effect.
Power increases
- as the strength of the effect increases (e.g., a larger difference between populations)
- as the population variance decreases
- as we increase \(\alpha\)
- as we increase the sample size
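These relationships can be explored numerically with R’s built-in power.t.test (not used in the lecture; the effect sizes and sample sizes below are illustrative):
# power of a two-sample t-test for a given effect (delta) and n
power.t.test(n = 20, delta = 1, sd = 2)
# larger effect -> more power
power.t.test(n = 20, delta = 2, sd = 2)
# or solve for the n needed to reach a desired power
power.t.test(delta = 1, sd = 2, power = 0.9)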
Statistical hypothesis testing - strategy
Consider the example from before as a classical situation of a two
sample-test:
H0: The two populations do not differ. Any observed difference between the two samples is entirely due to chance. If the studied property is naturally variable, then samples will never be identical.
HA: The two populations differ. An observed difference between the two samples reflects this difference; the samples were collected from two different underlying populations.
To test this, we collect empirical information, i.e., each population is represented by one sample (of some size).
Statistical hypothesis testing - strategy
- We assume H0 is true.
- We are suspicious, however. Maybe the samples are really
different.
- Thus we compute \(p\), the
probability that, if H0 is true, we would get a result as
extreme as we did
- Compare \(p\) to a pre-set error
threshold (\(\alpha\)).
- If \(p < \alpha\) (i.e., our empirical results are VERY unlikely under H0), we decide H0 cannot be true and reject it.
Mechanics of testing illustrated step-by-step: one-sample test
Example: Mice population on an island. We do a
census and identify body size (as weight) of ALL MICE.
- Weight ~ Normal(\(\mu_0\), \(\sigma\))
- On the way home between island and mainland we find a single mouse
on a drifting log, it is surprisingly large.
- Is it from the island? From the mainland? We will use a one-sample test with sample size \(n = 1\).
data = read.table("data/IslandMice.txt", header=TRUE)
x = data$weight
x1 = 13 # Drifting mouse (x1)
(mu = mean(x)) # population mean of island mice
## [1] 10.04951
(n = length(x)) # population size
## [1] 999
(sigma = sqrt(sum((x - mu)^2)/n)) # population sd (sigma)
## [1] 2.04207
Mechanics of testing illustrated step-by-step: one-sample test
Testable hypotheses:
H0: The ‘new’ mouse belongs to the island population. Its weight is similar to those of other island mice; its relatively high weight is entirely due to chance.
HA: The ‘new’ mouse does not belong to the island
population. Its weight is so high that it must belong to some
other mouse population, say from the mainland.
Mechanics of testing: assuming H0 but remaining suspicious
- We believe in H0 (initial innocence), yet we are suspicious (the
mouse is heavy!).
- How likely is it to find such a heavy mouse x in the island
population? How likely is it to find an even heavier one?
- We can directly compute this probability from \(\mu_0\), \(\sigma\) and \(x\).
- This probability also corresponds to a standardized mouse weight,
which is the test statistic in this case.
Mechanics of testing: evaluating probabilities
\[
{TS} = z_1 = \frac{(x_1-\mu)}{\sigma}
\] \[
p = P(z\geq z_1) = ?
\]
Assume our population of body weights is normally distributed.
We can compute \(p\) as the integral
of the normal PDF.
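The p-value is literally this integral; a minimal numerical check (using the value of \(z_1\) computed in the code below):
z1 = 1.444853 # standardized weight of the drifting mouse (computed below)
integrate(dnorm, lower = z1, upper = Inf) # equals 1 - pnorm(z1), about 0.0742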
Mechanics of testing: assessing significance
- Now we compare \(p\) to a pre-set
threshold significance level \(\alpha\)
- conventionally in ecology \(\alpha =
0.05\)

# z-score for our drifter
(z1 = (x1 - mu)/sigma)
## [1] 1.444853
# Probability to find a mouse as heavy or
# heavier than x1 in the island population
# first on the standardized (z-) scale
(p = 1 - pnorm(z1, mean = 0, sd = 1))
## [1] 0.07424962
# and again on the original scale
(1 - pnorm(x1, mean = mu, sd = sigma))
## [1] 0.07424962
The probability that a mouse is as heavy as (or heavier than) the
drifting mouse is 7.4 %. Therefore, we fail to reject the null
hypothesis that the mouse is from the island.
Mechanics of testing: assessing error

For every test statistic, there will exist a critical value \(z_{crit}\) where \(P(z \ge z_{crit}) = \alpha\).
Mechanics of testing: assessing error
Case 1: \(z_1 > z_{crit}\)

Decision:
- Reject H0 (and accept HA).
- What is the chance of making an error with this decision?
- Anytime we observe \(z_1 >
z_{crit}\), we take the same risk
- Thus, the type I error rate = \(\alpha\) (normally 0.05)
Mechanics of testing: assessing error
Case 2: \(z_1 < z_{crit}\)

Decision:
- We cannot reject H0.
- Do we accept it then?
- What is the chance of making an error with this decision?
- We cannot assess the type II error rate without
knowledge of HA!
Mechanics of testing: computing beta and power
A well-defined HA could be: the mouse comes from the mainland, where \(weight \sim \mathcal{N}(\mu_{main}, \sigma)\) with \(\mu_0 < \mu_{main}\).

- A mouse found with \(x <
x_{crit}\) (i.e., fail to reject H0) still has a 22% chance to be
from the mainland!
- Accepting H0 has a high chance of error!
# Read file of mainland mice
data_main = read.table("data/MainlandMice.txt", header=TRUE)
(mu_main = mean(data_main$weight))
## [1] 14.98701
# sd of mainland mice is ~identical to island mice
c(sd(data_main$weight), sigma)
## [1] 2.00143 2.04207
# critical value on original scale
(x_crit = qnorm(0.95, mu, sigma))
## [1] 13.40842
# Type II error rate (orange)
# Probability to find a mouse lighter than xcrit
# under the assumption that HA is right
# (i.e., the mouse comes from the mainland)
(beta = pnorm(x_crit, mu_main, sigma))
## [1] 0.2197506
# Power is just 1 - beta
(1 - beta)
## [1] 0.7802494
Mechanics of testing: controlling decision errors
- Should we play it safe by reducing alpha?
- This will increase the type II error rate!
- A conservative test (small \(\alpha\)) helps avoid false positives
(reporting a difference when there is none)
- But it increases our chance for a false negative (failing
in detecting an existing effect).
- Solution: Increase effort, i.e., use a larger sample size!
Mechanics of testing: increasing sample size
What if we find more than one mouse?
A sample of n > 1 allows us to compute test
statistics on sample means!
Sampling with higher effort decreases both errors and thus allows a
conservative test with more power.
Mechanics of testing: increasing sample size
# Sample of 3 mice: do they belong to the island population?
x_sample = c(12.09, 15.48, 11.76)
(xbar = mean(x_sample))
## [1] 13.11
(SEM = sigma/sqrt(length(x_sample)))
## [1] 1.17899
# Standardised weight of the sample mean
# assuming it belongs to a "population of island means"
(z_sample = (xbar - mu)/SEM)
## [1] 2.595859
# Probability to find a mean at least as great as (or greater than)
# the sample mean (of 3 drifting mice) in the population of island means
(p = 1 - pnorm(z_sample, 0,1))
## [1] 0.004717744
# equivalent to:
(p = 1 - pnorm(xbar, mean = mu, sd = SEM))
## [1] 0.004717744
# Power (1 - beta), keeping x_crit from the single-mouse test above
1 - pnorm(x_crit, mean = mu_main, sd = SEM)
## [1] 0.9097045
One-sample z-test
- For the mice, we knew the underlying population was normally
distributed
- Additionally, we knew the population standard deviation \(\sigma\).
- The test is known as a z-test (sometimes Gauss test), with test statistic \(z \sim \mathcal{N}(0,1)\)
\[
TS =z= \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}
\]
One-sample t-test
- Generally, we must estimate \(\sigma\) from a sample
- The equivalent statistic using a sample standard deviation is:
\[
t = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}}
\]
- This test statistic follows Student’s
t-distribution with degrees of freedom \(\nu = n - 1\):
\[
t \sim \mathcal{T}(\nu)
\]
One-sample t-test
data = read.table("data/Crabs.txt", header=TRUE)
# Body temperature of crabs
x = data$body_temp
# null hypothesis: body temp is equal to air temp
# HA: body temp is greater (metabolic heat)
# one-sided test!
air_temp = 24.3
(xbar = mean(x))
## [1] 25.028
(s = sd(x))
## [1] 1.341802
n = length(x)
(SEM = s/sqrt(n))
## [1] 0.2683605
(t = (xbar - air_temp)/SEM)
## [1] 2.712769
(t_crit = qt(p = 0.95, df= n - 1))
## [1] 1.710882
(p = 1 - pt(t, n - 1))
## [1] 0.006072769
One-sample t-test
# Maybe simpler:
# note default alternative="two.sided"
t.test(x, mu = air_temp, alternative = "greater")
##
## One Sample t-test
##
## data: x
## t = 2.7128, df = 24, p-value = 0.006073
## alternative hypothesis: true mean is greater than 24.3
## 95 percent confidence interval:
## 24.56887 Inf
## sample estimates:
## mean of x
## 25.028
One-sample t-test
# confidence interval: 2-sided critical t-value
t_quantile = qt(1 - 0.05/2, df = n - 1)
# re-scale to the original measurement scale
precision = t_quantile * SEM
(interval = xbar + c(-precision, precision))
## [1] 24.47413 25.58187
# Air temperature is outside of this interval!
For 2-sided tests, we can equivalently construct a confidence
interval using the quantiles of the t-distribution

Interpretation: if H0 is true (i.e., our population has a mean = \(\mu\)), 95% of intervals constructed from similar samples will include \(\mu\).
Our interval does not include \(\mu\), so we can reject the hypothesis that the population mean is \(\mu\).
Steps to conduct a (two-sided) statistical test.
- Set \(\alpha\) (before collecting
data!)
- Define \(H_0\) and \(H_A\)
- Collect data \(X\)
- Compute the test statistic \(TS\)
and if needed the degrees of freedom \(\nu\)
- Compute the probability \(P(TS | \nu,
H_0)\) (this is the \(p\)-value)
- Decide:
- \(p \ge \alpha\): Fail to reject
\(H_0\)
- \(p < \alpha\): Reject \(H_0\)
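This recipe maps directly onto a few lines of R. A minimal sketch for a two-sided one-sample t-test, with an illustrative data vector and \(\mu_0\) (not from the lecture data):
alpha = 0.05                       # 1. set alpha before collecting data
mu0 = 10                           # 2. H0: mu = mu0; HA: mu != mu0
x = c(11.2, 9.8, 12.1, 10.4, 11.7) # 3. collect data
nu = length(x) - 1                 # degrees of freedom
TS = (mean(x) - mu0)/(sd(x)/sqrt(length(x))) # 4. test statistic
p = 2 * (1 - pt(abs(TS), nu))      # 5. two-sided p-value
p < alpha                          # 6. decide: TRUE -> reject H0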
Significance levels and example statements

One-sided vs two-sided hypotheses

Parametric tests
- A test is parametric if it assumes that the
underlying population comes from a known parameterized probability
distribution
- Often this distribution is normal, but many others are also
used
- Note that the distribution is a population
parameter!
- Checking a sample for normality is useful, but not
diagnostic!
Normality checks
Graphical checks
Histograms with normal density curves
Compare mean and median
Normality checks
Graphical checks
- Boxplots can compare multiple groups simultanously
- Careful! Easy to see skew, but harder to see kurtosis
Normality checks
Graphical checks
- QQ plots are perhaps the most useful diagnostic
- Compare observed quantiles (i.e., z-score of a sample) to
those from a standard normal distribution
- Expectation is that samples fall along the line
- Use qqnorm and qqline in R to make the plots
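A minimal sketch with simulated data (for illustration only):
set.seed(42)
x = rnorm(50)
qqnorm(x) # observed quantiles vs. theoretical normal quantiles
qqline(x) # reference line; normal samples fall close to it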
Normality tests
- It is tempting to seek a formal test for normality
- Example: shapiro.test(x) evaluates the null hypothesis that x comes from a normally distributed population
- Resist this temptation!
- Remember that if we fail to reject H0, that doesn’t mean we accept
H0
- These tests have low power with small samples: often we fail to
reject H0, but type II error rate is high!
- If power is high (high n), the test is very sensitive. Small
departures from normality will be significant.
- But most parametric tests work just fine with small departures from
normality when sample size is large
- This test will cause us to often make incorrect decisions!
Normality tests
Conclusions/advice:
shapiro.test and similar tests are very commonly used and almost completely useless. Forget them!
- Graphically evaluate your data
- If graphically obviously non-normal, consider the robustness of the
chosen test, or transformations
- Fall back to other distributions or to
non-parametric tests
F-distribution
Another thought experiment:
1. Take two samples from a population (\(\mathcal{N}[\mu, \sigma]\))
2. Calculate \(s_1^2\) (from sample 1 with \(n_1\)) and \(s_2^2\) (from sample 2 with \(n_2\))
3. Calculate the test statistic F: \[
F=\frac{s_1^2}{s_2^2}
\]
4. Repeat 1.–3. and build the distribution of F-values
- Since \(s_1^2\) and \(s_2^2\) are from two samples from the same population, they are both estimates of \(\sigma^2\); we thus expect \(F\approx1\).
- The F-distribution is asymmetrical. Its shape is determined by \(df_1=n_1-1\) and \(df_2=n_2-1\). There is a separate
F-distribution for each combination of \(df_1\) and \(df_2\).
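This thought experiment is straightforward to simulate; a minimal sketch (the sample sizes are illustrative):
set.seed(1)
n1 = 10; n2 = 12
# many pairs of samples from the SAME normal population
F_vals = replicate(10000, var(rnorm(n1)) / var(rnorm(n2)))
# simulated quantiles match the theoretical F-distribution
quantile(F_vals, 0.95)
qf(0.95, df1 = n1 - 1, df2 = n2 - 1)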
F-test: Variance homogeneity?
H0: The sample variances estimate the same parametric variance. Or:
\(\sigma_1^2 = \sigma_2^2\)
Variance homogeneity = homoscedasticity
HA: The sample variances estimate different parametric variances. Or:
\(\sigma_1^2 \neq \sigma_2^2\)
Variance heterogeneity = heteroscedasticity
Only use this to test variance heterogeneity (HA).
Do not use it to check whether two samples have the same variance (same
problems as Shapiro test)!
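In R, this test is var.test; a minimal sketch with simulated heteroscedastic groups (illustrative data):
set.seed(1)
g1 = rnorm(20, sd = 1)
g2 = rnorm(20, sd = 2) # twice the standard deviation
var.test(g1, g2)       # H0: the variance ratio is 1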

Standard tests for location
Considerations:
- Parametric (populations follow a known
distribution) vs non-parametric
- Parametric tests are more powerful
- Number of groups/samples: 2 or more than 2
- Independence of samples
- Independent samples do not depend on each other; they were independently collected.
- Dependent/paired samples depend on each other;
e.g., the same subjects measured multiple times.
- Tests for dependent samples are generally more powerful.
Standard tests for location

Two-sample tests: independent t-test
- Tests for differences of means between two independent samples
- H0: Samples are from populations with the same mean \(\mu\).
- Student’s test assumes populations have equal variance \(\sigma^2\)
- Very difficult to evaluate, so this test should be
used rarely
\[
\begin{align}
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}
\\ \\
s_p &= \sqrt{\frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1+n_2-2}} \\
\nu & = n_1 + n_2 - 2
\end{align}
\]
- Welch’s test relaxes this assumption
- Default in R (t.test)
- Power approaches Student’s test if variances are similar and sample
sizes are large
\[\begin{align}
t &= \frac{\bar{x}_1 -
\bar{x}_2}{\sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}} \\ \\
\nu & \approx \frac{\left( \frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}
\right )^2}
{\frac{s^4_1}{n^2_1\nu_1} + \frac{s^4_2}{n^2_2\nu_2}} \\
\end{align}\]
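In R, both versions are available via t.test; a minimal sketch with simulated data (illustrative, not from the lecture):
set.seed(1)
g1 = rnorm(15, mean = 10, sd = 2)
g2 = rnorm(20, mean = 12, sd = 3)
t.test(g1, g2)                   # Welch's test (default, var.equal = FALSE)
t.test(g1, g2, var.equal = TRUE) # Student's pooled-variance test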
Two-sample tests: paired t-test
- Tests for differences of means between two paired
samples.
- Exactly equal to a one-sample t-test on the differences between the samples, with the null hypothesis that the mean difference is zero.

x1 = c(gandalf = 6, saruman = 4, arwen = 7, frodo = 3)
x2 = c(gandalf = 4, saruman = 3, arwen = 5, frodo = 2)
x1 - x2
## gandalf saruman arwen frodo
## 2 1 2 1
t.test(x1, x2, paired = TRUE)
##
## Paired t-test
##
## data: x1 and x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 0.5813069 2.4186931
## sample estimates:
## mean difference
## 1.5
t.test(x1 - x2)
##
## One Sample t-test
##
## data: x1 - x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.5813069 2.4186931
## sample estimates:
## mean of x
## 1.5
Nonparametric location tests
- These tests search for a difference in medians
- Convert data to ordinal scale (i.e., rank
data)
- Use for small (ish) samples when population normality cannot be
assumed
- Loss of information = lower statistical power!
Independent samples: Mann-Whitney U-test
Dependent samples: Wilcoxon test
rank(c(x1, x2))
## gandalf saruman arwen frodo gandalf saruman arwen frodo
## 7.0 4.5 8.0 2.5 4.5 2.5 6.0 1.0
wilcox.test(x1, x2)
## Warning in wilcox.test.default(x1, x2): cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: x1 and x2
## W = 12, p-value = 0.3065
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(x1, x2, paired = TRUE)
## Warning in wilcox.test.default(x1, x2, paired = TRUE): cannot compute exact
## p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: x1 and x2
## V = 10, p-value = 0.09467
## alternative hypothesis: true location shift is not equal to 0