Univariate Statistics

Lauren Talluto

14.01.2025

The central limit theorem revisited

Reminder: standard error of the mean:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

We can also standardize (scale) the sample means:

\[ {z} = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}} \]

These z-values follow a standard normal distribution (with mean=0 and sd=1).
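As a quick illustration (simulated, not part of the lecture data), we can standardize the means of many samples from a non-normal population and check that the resulting z-values behave like standard normal draws:

```r
# Illustration only: standardized means of samples from a skewed population
set.seed(42)                    # arbitrary seed, for reproducibility
mu = 5                          # mean of a Poisson(5) population
sigma = sqrt(5)                 # sd of a Poisson(5) population
n = 30
z = replicate(10000, {
    xbar = mean(rpois(n, lambda = 5))
    (xbar - mu) / (sigma / sqrt(n))   # standardize the sample mean
})
mean(z)  # close to 0
sd(z)    # close to 1
```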

If we know \(\mu\) and \(\sigma\), we can define an interval around \(\mu\) such that:

Putting the central limit theorem to work: Confidence intervals

\[ z = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}} \] \[ P(-z_\frac{\alpha}{2}\leq \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}\leq+z_\frac{\alpha}{2})=1-\alpha \] \[ P(\bar{x}-z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}}\leq \mu \leq\bar{x}+z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}})=1-\alpha \] \[ CI: \bar{x}\pm z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}} \]
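A short numerical sketch of the final CI formula (the values here are made up):

```r
# Sketch with made-up numbers: 95% CI for the mean when sigma is known
xbar = 50; sigma = 4; n = 25
alpha = 0.05
z = qnorm(1 - alpha/2)                  # z_(alpha/2) = 1.959964
(ci = xbar + c(-1, 1) * z * sigma/sqrt(n))
## [1] 48.43203 51.56797
```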

But wait - how do I know \(\sigma\)??

The t-distribution

If the standard error is estimated using the sample standard deviation, then standardized sample means follow a Student’s t-distribution:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} = S.E.M. \] \[ {t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}} \]

Relative to the normal distribution, the t-distribution has heavier tails; the extra spread shrinks as the degrees of freedom increase.

Above 100 df, t is basically equivalent to a normal distribution.
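This convergence is easy to verify with the t quantile function:

```r
# Upper 2.5% quantile of t approaches the normal value as df grows
qt(0.975, df = c(5, 10, 30, 100, 1000))
## [1] 2.570582 2.228139 2.042272 1.983972 1.962339
qnorm(0.975)
## [1] 1.959964
```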

Confidence intervals: accuracy, precision, sample size

The t-distribution can be used to construct a confidence interval:

\[ {t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}} \] \[ P(\bar{x}-t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}}\leq \mu \leq\bar{x}+t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}})=1-\alpha \] \[ CI: \bar{x}\pm t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}} \]

Allows an interval estimate of \(\mu\) with an estimate for \(\sigma\) derived from the same sample as \(\bar{x}\).

data = read.table("data/Glaciers.txt", header=TRUE)

# radiocarbon in dissolved organic carbon from glacier ice samples 
d14C = data$delta14C.permil

# Give mean and confidence interval with confidence = 0.95

(m = mean(d14C))
## [1] -426.2994
(s = sd(d14C))
## [1] 143.8792
(n = length(d14C))
## [1] 19
alpha = 0.05
accuracy = 1-alpha

(t = qt(p = 1 - alpha/2, df = n-1)) # two-sided, P goes 50/50 to both tails
## [1] 2.100922

(aprec = t*s/sqrt(n)) # absolute precision = half the CI width
## [1] 69.34756
(rprec = aprec/m*100) # relative precision
## [1] -16.26734

# confidence limits
c(lower = m - aprec, upper = m + aprec)
##     lower     upper 
## -495.6469 -356.9518

Confidence intervals: accuracy, precision, sample size

\[ AP= t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}} \]

\[ RP= \frac{t_{\frac{\alpha}{2},d.f.}}{\bar{x}} \frac{s}{\sqrt{n}} \]

Confidence intervals: accuracy, precision, sample size

\[ n= \left(\frac{t_{\frac{\alpha}{2},d.f.}\cdot{s}}{{RP\cdot\bar{x}}}\right)^2 \]

We must solve this formula iteratively, since \(n\) appears on both sides (the t quantile depends on \(n\) through its degrees of freedom).

# Regard data as pilot study
# Set accuracy and rprec as needed
accuracy = .95
rprec = 0.10 # assume more (or less) precise estimate is needed

alpha = 1-accuracy

n = 2   # left side of formula, low n as starting value
t = qt(1 - alpha/2, n-1)  
(rhs = (t * s/(rprec * m))^2)  # right side of formula
## [1] 1839.071

# right side minus left side
# when diff falls below zero - then enough sampling effort 
diff = rhs - n

while(diff >= 0) {
    n = n + 1
    t = qt(1 - alpha/2, n - 1)
    diff = ((t * s/(rprec * m))^2) - n
}
# sample size needed to hit the required precision and accuracy
n 
## [1] 47

The point where a function crosses zero is the function’s root. R has a built-in function uniroot that will find a root for us.

f = function(n, alpha = 0.05, rprec = 0.1) {
    t = qt(1 - alpha/2, n-1)
    ((t * s/(rprec * m))^2) - n
}
uniroot(f, lower = 2, upper = 1000)
## $root
## [1] 46.19821
## 
## $f.root
## [1] -1.037638e-05
## 
## $iter
## [1] 8
## 
## $init.it
## [1] NA
## 
## $estim.prec
## [1] 6.103516e-05

Accuracy and precision illustrated

Statistical decision theory

… at the heart of inferential statistics: learning about the population from sample(s).

Type I & II Errors

When making a decision based on (incomplete) sample information you can make an error!

There are two types of error: \(\alpha\) (type I), \(\beta\) (type II)

  • Type I error: false alarm (or false positive)
    • There is no real effect, but you think there is one.
  • Type II error: missed opportunity (or false negative)
    • There is a real effect, but you failed to find it.

Power (\(1- \beta\)): The probability to correctly identify an existing real effect.


Power increases

  • as the strength of the effect increases (e.g., larger difference between populations)
  • as the population variance decreases
  • as we increase alpha
  • as we increase sample size
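All four of these relationships can be explored with R's built-in power.t.test; the numbers below are made up for illustration.

```r
# Power of a two-sample t-test under assumed (made-up) conditions
p_base   = power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05)$power
p_more_n = power.t.test(n = 50, delta = 1, sd = 2, sig.level = 0.05)$power  # larger sample
p_effect = power.t.test(n = 20, delta = 2, sd = 2, sig.level = 0.05)$power  # stronger effect
p_lowvar = power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)$power  # smaller variance
p_alpha  = power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.10)$power  # larger alpha
c(base = p_base, n = p_more_n, effect = p_effect, variance = p_lowvar, alpha = p_alpha)
# each change increases power relative to the base case
```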

Statistical hypothesis testing - strategy

Consider the example from before as a classical situation of a two sample-test:

H0: Two populations do not differ. Any difference observed between two samples is due entirely to chance. If the studied property is naturally variable, two samples will never be identical.

HA: Two populations differ. An observed difference between two samples reflects this: the samples were collected from two different underlying populations.


To test, we have collected empirical information, i.e., each population is represented by one sample (of some size).

Statistical hypothesis testing - strategy

  1. We assume H0 is true.
    • We are suspicious, however. Maybe the samples are really different.
    • Thus we compute \(p\), the probability that, if H0 is true, we would get a result as extreme as we did
  2. Compare \(p\) to a pre-set error threshold (\(\alpha\)).
  3. If \(p < \alpha\) (i.e., our empirical results are VERY unlikely) we decide H0 cannot be true and reject it.

Mechanics of testing illustrated step-by-step: one-sample test

Example: Mice population on an island. We do a census and identify body size (as weight) of ALL MICE.

data = read.table("data/IslandMice.txt", header=TRUE)
x = data$weight
x1 = 13 # Drifting mouse (x1)
(mu = mean(x)) # population mean of island mice
## [1] 10.04951
(n = length(x)) # population size
## [1] 999
(sigma = sqrt(sum((x - mu)^2)/n)) # population sd (sigma)
## [1] 2.04207

Mechanics of testing illustrated step-by-step: one-sample test

Testable hypotheses:

H0: The ‘new’ mouse belongs to the island population. Its weight is similar to those of other island mice: Its relatively high weight is entirely due to chance.

HA: The ‘new’ mouse does not belong to the island population. Its weight is so high that it must belong to some other mouse population, say from the mainland.

Mechanics of testing: assuming H0 but remaining suspicious

\[ {TS} = z_1 = \frac{(x_1-\mu)}{\sigma} \] \[ P(z\geq z_1) = ? \]

Mechanics of testing: evaluating probabilities

\[ {TS} = z_1 = \frac{(x_1-\mu)}{\sigma} \] \[ p = P(z\geq z_1) = ? \]

Assume our population of body weights is normally distributed.

We can compute \(p\) as the integral of the normal PDF.
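This integral can be evaluated numerically, or equivalently with the normal CDF (pnorm); both give the same p:

```r
# p-value as the upper-tail area of the standard normal PDF
z1 = 1.444853                                    # the drifter's z-score
integrate(dnorm, lower = z1, upper = Inf)$value  # ~0.0742
1 - pnorm(z1)                                    # ~0.0742
```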

Mechanics of testing: assessing significance

# z-score for our drifter
(z1 = (x1 - mu)/sigma)
## [1] 1.444853

# Probability to find a mouse as heavy or 
# heavier than x1 in the island population
# first on the standardized (z-) scale
(p = 1 - pnorm(z1, mean = 0, sd = 1))
## [1] 0.07424962
# and again on the original scale
(1 - pnorm(x1, mean = mu, sd = sigma))
## [1] 0.07424962

The probability that a mouse is as heavy as (or heavier than) the drifting mouse is 7.4 %. Therefore, we fail to reject the null hypothesis that the mouse is from the island.

Mechanics of testing: assessing error

For every test statistic, there will exist a critical test statistic where \(P(z \ge z_{crit}) = \alpha\).

If \(z_1 > z_{crit}\), then \(p < \alpha\). Decision: reject H0.

If \(z_1 < z_{crit}\), then \(p > \alpha\). Decision: fail to reject H0.
Mechanics of testing: computing beta and power

A well-defined HA could be: the mouse comes from the mainland, where

\(weight \sim \mathcal{N}(\mu_{main}, \sigma)\), with \(\mu < \mu_{main}\).

# Read file of mainland mice
data_main = read.table("data/MainlandMice.txt", header=TRUE)
(mu_main = mean(data_main$weight))
## [1] 14.98701
# sd of mainland mice is ~identical to island mice
c(sd(data_main$weight), sigma) 
## [1] 2.00143 2.04207

# critical value on original scale
(x_crit = qnorm(0.95, mu, sigma)) 
## [1] 13.40842

# Type II error rate (orange)
# Probability to find a mouse lighter than xcrit
# under the assumption that HA is right
# (i.e., the mouse comes from the mainland)
(beta = pnorm(x_crit, mu_main, sigma))
## [1] 0.2197506

# Power is just 1 - beta
(1 - beta)
## [1] 0.7802494

Mechanics of testing: controlling decision errors


Mechanics of testing: increasing sample size

What if we find more than one mouse?

A sample of n > 1 allows us to compute test statistics on sample means!

Sampling with higher effort decreases both errors and thus allows a conservative test with more power.

Mechanics of testing: increasing sample size

# Sample of 3 mice: do they belong to the island population?
x_sample = c(12.09, 15.48, 11.76)
(xbar = mean(x_sample))
## [1] 13.11
(SEM = sigma/sqrt(length(x_sample)))
## [1] 1.17899

# Standardised weight of the sample mean
# assuming it belongs to a "population of island means"
(z_sample = (xbar - mu)/SEM)
## [1] 2.595859

# Probability to find a mean at least as great as (or greater than)
# the sample mean (of 3 drifting mice) in the population of island means
(p = 1 - pnorm(z_sample, 0,1))
## [1] 0.004717744
# equivalent to:
(p = 1 - pnorm(xbar, mean = mu, sd = SEM))
## [1] 0.004717744

# Power (1 - beta)
1 - pnorm(x_crit, mean = mu_main, sd = SEM)
## [1] 0.9097045

One-sample z-test

\[ TS =z= \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}} \]
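Base R has no ready-made z-test function; a minimal sketch (the helper name z_test is ours), checked against the drifting-mice calculation from earlier:

```r
# Hypothetical helper: one-sample, upper-tailed z-test (sigma known)
z_test = function(x, mu, sigma) {
    z = (mean(x) - mu) / (sigma / sqrt(length(x)))  # test statistic
    list(statistic = z, p.value = 1 - pnorm(z))     # p = P(Z >= z) under H0
}
# The three drifting mice, with the island parameters from earlier
(res = z_test(c(12.09, 15.48, 11.76), mu = 10.04951, sigma = 2.04207))
# statistic ~2.5959, p.value ~0.0047, matching the earlier calculation
```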

One-sample t-test

\[ t = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}} \]

\[ t \sim \mathcal{T}(\nu) \]

One-sample t-test

data = read.table("data/Crabs.txt", header=TRUE)
# Body temperature of crabs
x = data$body_temp

# null hypothesis: body temp is equal to air temp
# HA: body temp is greater (metabolic heat)
# one-sided test!
air_temp = 24.3

(xbar = mean(x))
## [1] 25.028
(s = sd(x))
## [1] 1.341802
n = length(x)
(SEM = s/sqrt(n))
## [1] 0.2683605

(t = (xbar - air_temp)/SEM)
## [1] 2.712769
(t_crit = qt(p = 0.95, df= n - 1))
## [1] 1.710882
(p = 1 - pt(t, n - 1))
## [1] 0.006072769

One-sample t-test

# Or more simply, use t.test()
# (note: the default is alternative = "two.sided")
t.test(x, mu = air_temp, alternative = "greater")
## 
##  One Sample t-test
## 
## data:  x
## t = 2.7128, df = 24, p-value = 0.006073
## alternative hypothesis: true mean is greater than 24.3
## 95 percent confidence interval:
##  24.56887      Inf
## sample estimates:
## mean of x 
##    25.028

One-sample t-test

# confidence interval: 2-sided critical t-value
t_quantile = qt(1 - 0.05/2, df = n - 1)

# re-scale to the original measurement scale
precision = t_quantile * SEM

(interval = xbar + c(-precision, precision))
## [1] 24.47413 25.58187
# Air temperature is outside of this interval!

For two-sided tests, we can equivalently construct a confidence interval using the quantiles of the t-distribution.

Interpretation: if H0 is true (i.e., our population has mean \(\mu\)), 95% of intervals constructed from similar samples will include \(\mu\).

Our interval does not include \(\mu\), so we can reject the hypothesis that the population mean is \(\mu\).

Steps to conduct a (two-sided) statistical test.

Significance levels and example statements


One-sided vs two-sided hypotheses


Parametric tests

Normality checks

Graphical checks

Histograms with normal density curves

Compare mean and median

Normality tests

Conclusions/advice:

F-distribution

Another thought experiment:

  1. Take two samples from a population (\(\mathcal{N}[\mu, \sigma]\))
  2. Calculate \(s_1^2\) (from sample 1, size \(n_1\)) and \(s_2^2\) (from sample 2, size \(n_2\))
  3. Calculate test statistic F: \[ F=\frac{s_1^2}{s_2^2} \]
  4. Repeat 1.-3. and build distribution of F-values
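Steps 1-4 are easy to simulate (illustration only; the sample sizes here are arbitrary):

```r
# Simulate the sampling distribution of F = s1^2 / s2^2
set.seed(1)                 # arbitrary seed, for reproducibility
n1 = 10; n2 = 12
Fvals = replicate(10000, var(rnorm(n1)) / var(rnorm(n2)))
# The simulated distribution matches F with (n1 - 1, n2 - 1) df
quantile(Fvals, 0.95)                  # empirical 95% quantile
qf(0.95, df1 = n1 - 1, df2 = n2 - 1)   # theoretical value, ~2.90
```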

F-test: Variance homogeneity?

H0: The sample variances estimate the same parametric variance. Or: \(\sigma_1^2 = \sigma_2^2\)

Variance homogeneity = homoscedasticity


HA: The sample variances estimate different parametric variances. Or: \(\sigma_1^2 \neq \sigma_2^2\)

Variance heterogeneity = heteroscedasticity


Only use this to test variance heterogeneity (HA). Do not use it to check whether two samples have the same variance (same problems as Shapiro test)!
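In R this test is var.test; a quick sketch with simulated samples (the means and sds below are made up):

```r
# Built-in F-test for HA: unequal variances (simulated data)
set.seed(7)                     # arbitrary seed
a = rnorm(20, mean = 0, sd = 1)
b = rnorm(20, mean = 0, sd = 3) # much larger true variance
(p = var.test(a, b)$p.value)    # small p: evidence of heteroscedasticity
```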

Standard tests for location

Considerations:

Two-sample tests: independent t-test

\[ \begin{align} t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \\ \\ s_p &= \sqrt{\frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1+n_2-2}} \\ \nu & = n_1 + n_2 - 2 \end{align} \]

Two-sample tests: independent t-test

\[\begin{align} t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}} \\ \\ \nu & \approx \frac{\left( \frac{s^2_1}{n_1} + \frac{s^2_2}{n_2} \right )^2} {\frac{s^4_1}{n^2_1\nu_1} + \frac{s^4_2}{n^2_2\nu_2}}, \quad \nu_i = n_i - 1 \\ \end{align}\]
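In R, t.test uses this Welch version by default; setting var.equal = TRUE gives the pooled test instead. A sketch with simulated (made-up) samples:

```r
# Welch vs. pooled t-test on simulated samples with unequal variances
set.seed(3)                       # arbitrary seed
g1 = rnorm(15, mean = 10, sd = 1)
g2 = rnorm(25, mean = 12, sd = 3)
(df_welch  = t.test(g1, g2)$parameter)                    # fractional df
(df_pooled = t.test(g1, g2, var.equal = TRUE)$parameter)  # df = n1 + n2 - 2 = 38
```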

Two-sample tests: paired t-test

x1 = c(gandalf = 6, saruman = 4, arwen = 7, frodo = 3)
x2 = c(gandalf = 4, saruman = 3, arwen = 5, frodo = 2)
x1 - x2
## gandalf saruman   arwen   frodo 
##       2       1       2       1

t.test(x1, x2, paired = TRUE)
## 
##  Paired t-test
## 
## data:  x1 and x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.5813069 2.4186931
## sample estimates:
## mean difference 
##             1.5
t.test(x1 - x2)
## 
##  One Sample t-test
## 
## data:  x1 - x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.5813069 2.4186931
## sample estimates:
## mean of x 
##       1.5

Nonparametric location tests

Independent samples: Mann-Whitney U-test

Dependent samples: Wilcoxon test

rank(c(x1, x2))
## gandalf saruman   arwen   frodo gandalf saruman   arwen   frodo 
##     7.0     4.5     8.0     2.5     4.5     2.5     6.0     1.0

wilcox.test(x1, x2)
## Warning in wilcox.test.default(x1, x2): cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  x1 and x2
## W = 12, p-value = 0.3065
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(x1, x2, paired = TRUE)
## Warning in wilcox.test.default(x1, x2, paired = TRUE): cannot compute exact
## p-value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  x1 and x2
## V = 10, p-value = 0.09467
## alternative hypothesis: true location shift is not equal to 0