Hypothesis testing: discovery

Notes and in-class exercises

Notes

You can download a template file to work with here.
File organization: Save this file in the “Activities” subfolder of your “STAT155” folder.

Learning goals

By the end of this lesson, you should be able to:

Understand how standard errors and confidence intervals enable us to make statistical inferences
Articulate how we can formalize a research question as a testable, statistical hypothesis

Readings and videos

This is a discovery activity, so no assigned readings/videos today.

Exercises

Let’s return to the fish dataset. Recall that rivers contain small concentrations of mercury which can accumulate in fish. Scientists studied this phenomenon among 171 largemouth bass in the Wacamaw and Lumber rivers of North Carolina, recording the following:

variable	meaning
River	Lumber or Wacamaw
Station	Station number where the fish was caught (0, 1, …, 15)
Length	Fish’s length (in centimeters)
Weight	Fish’s weight (in grams)
Concen	Fish’s mercury concentration (in parts per million; ppm)

# Load the data & packages
library(tidyverse)
library(readr)
fish <- read_csv("https://mac-stat.github.io/data/Mercury.csv")

head(fish)

Exercise 1

Research question: Is there evidence that the mercury concentration in fish (Concen) differs according to the River they were sampled from?

part a: fit the model

Fit a simple linear regression model that would address our research question

mod_fish <- ___
summary(mod_fish)

Interpret the intercept from this model.

part b: construct a CI

Using the 68-95-99.7 rule, construct an approximate 95% confidence interval for the intercept term, and provide an appropriate interpretation.

Compare your CI to an exact 95% confidence interval for the model coefficients:

confint(mod_fish, level=0.95)

part c: what can we conclude from multiple samples?

Suppose we take 200 different samples of fish from the Lumber River. Based on these results, in how many of those samples would you expect to observe mean mercury concentration greater than 1.25ppm?

part d: intuition for constructing & interpreting test statistics

Suppose previous environmental studies have found little evidence of mercury pollution in other rivers in the area, so perhaps our “default” assumption is that fish from the Lumber River should have an expected mercury concentration of 0ppm. How many standard errors is our sample estimate (1.078ppm) away from this expectation? What are three possible conclusions?

part e: do individual observations contradict our conclusions?

Now suppose we sample a single fish from the Lumber River and find it has a mercury concentration of 2.5ppm. Are you surprised by this result? Why or why not? (Hint: create a code chunk that calculates the mean, standard deviation, and maximum of the Concen variable in each river in our original sample)

Exercise 2

Let’s look at the model summary output again:

summary(mod_fish)

part a: interpret model coefficient

Now, let’s interpret the RiverWacamaw coefficient. Based only on the coefficient (don’t think about the standard error yet), what can we say about the difference in mercury concentration among fish in the two rivers?

part b: construct a CI

Using the 68-95-99.7 rule, construct an approximate 95% confidence interval for the RiverWacamaw coefficient, and provide an appropriate interpretation.

part c: interpreting the CI

Do you believe it plausible that the mean mercury concentration of the fish population in the Wacamaw River is approximately the same as that of the fish population in the Lumber River? How would you confirm this? What assumptions are you making?

part d: effect of sample size on our conclusions

Suppose we sample 10 times as many fish from the Wacamaw River, and get a similar coefficient estimate (0.2). Thinking back to the Central Limit Theorem, what should happen to the standard error of the RiverWacamaw coefficient? How small of a standard error would we need to more conclusively say that there is an actual difference in mean mercury concentrations of the Lumber River and Wacamaw River fish populations?

part e: reconciling parameter estimates and uncertainty

Suppose the true population coefficient for the RiverWacamaw parameter is 0.02 (i.e., the average mercury concentration is 0.02ppm higher for the Wacamaw River fish population compared to that of the Lumber River). Is this meaningful?

part f (CHALLENGE)

Using the model summary output, report the mean mercury concentration for our sample of fish from the Wacamaw River:

summary(mod_fish)

Which of the following values do you think is the standard error of the sample mean for the Wacamaw River?

0.11712
0.08866
0.11712 + 0.08866 = 0.20578
0.11712 - 0.08866 = 0.02846
something else

To answer this question, look at the code chunk below, which fits the same model, but uses the Wacamaw River as our reference category instead of the Lumber River:

mod_fish2 <- lm(Concen ~ River, data=fish %>% mutate(River=ifelse(River == "Wacamaw", paste0("_", River), River)))
summary(mod_fish2)

Compare this to the output for mod_fish. What do you notice about the standard errors of the intercepts (i.e., the standard errors of the means for each river) compared to the standard errors of the RiverWacamaw and RiverLumber coefficients (i.e., the standard errors of the differences between the means)?

Reflection

Based on this activity and the inference tools you’ve learned about so far (sampling distributions, standard errors, confidence intervals), can you think of and describe a way to quantify evidence “for” or “against” a coefficient being equal to some particular value? (For example, we have evidence that the average mercury concentration in Lumber River fish is ~1.08ppm, and the standard error of this estimate suggests that a true average of 0ppm is very unlikely. How can we quantify that evidence?)

Solutions

Exercise 1

Research question: Is there evidence that the mercury concentration in fish (Concen) differs according to the River they were sampled from?

part a: fit the model

Fit a simple linear regression model that would address our research question

mod_fish <- lm(Concen ~ River, data=fish)
summary(mod_fish)

Interpret the intercept from this model.

Our model estimates an average mercury concentration of 1.078ppm among fish in the Lumber River.

part b: construct a CI

Using the 68-95-99.7 rule, construct an approximate 95% confidence interval for the intercept term, and provide an appropriate interpretation.

1.078 +/- 2*0.089 –> [0.90, 1.26]

Preferred interpretation: It is plausible that the true mean mercury concentration among fish in the Lumber River is between 0.90ppm and 1.26ppm.

(technical addendum to this interpretation): …specifically, if we take many different samples and obtain a parameter estimate and confidence interval from each one, we expect that 95% of the resulting intervals will contain the true mean mercury concentration of the entire Lumber River fish population. We hope that our interval is one of the lucky 95% and not one of the unlucky 5% that don’t contain the true population parameter.

Not as preferred interpretation: We are 95% confident that the true mean mercury concentration among fish in the Lumber River is between 0.90ppm and 1.26ppm.

Compare your CI to an exact 95% confidence interval for the model coefficients:

confint(mod_fish, level=0.95)
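As a rough check on this comparison, the sketch below rebuilds both intervals by hand from the intercept estimate and standard error in the model summary (1.07808 and 0.08866). It assumes 171 - 2 = 169 residual degrees of freedom (171 fish, two coefficients); the variable names are illustrative and not part of the original activity.

```r
# Intercept estimate and standard error, copied from summary(mod_fish)
est <- 1.07808
se  <- 0.08866

# Approximate 95% CI from the 68-95-99.7 rule: estimate +/- 2 standard errors
approx_ci <- est + c(-2, 2) * se

# t-based 95% CI; assumes 171 - 2 = 169 residual degrees of freedom
t_crit   <- qt(0.975, df = 171 - 2)
exact_ci <- est + c(-1, 1) * t_crit * se

# Show the two intervals side by side
round(rbind(approx = approx_ci, exact = exact_ci), 3)
```

The multiplier returned by qt() is slightly below 2, so the t-based interval should come out a little narrower than the 2-standard-error approximation.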

part c: what can we conclude from multiple samples?

Suppose we take 200 different samples of fish from the Lumber River. In how many of those samples would you expect to observe an estimated mean mercury concentration greater than 1.25ppm?

We don’t/can’t actually know! This depends on the true population parameter and the accuracy of our sampling distribution.

What we can say is that if our sampling distribution model is accurate, then we should expect that about 10 out of 200 samples (5% of them) will produce confidence intervals that don’t contain the population parameter. We should expect that half of these–so 5 samples–are overestimates and the other half are underestimates.
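One way to build intuition for the "about 10 out of 200" figure is a small simulation. The sketch below is only an illustration under stated assumptions: it pretends the population is Normal with a true mean of 1.08ppm and standard deviation of 0.64ppm, and it uses a hypothetical 70 fish per sample. None of these population values are known in practice, and a Normal model can even produce negative concentrations.

```r
set.seed(155)  # arbitrary seed so the sketch is reproducible

# Hypothetical population values, assumed purely for illustration
true_mean <- 1.08   # pretend true Lumber River mean (ppm)
true_sd   <- 0.64   # pretend true standard deviation (ppm)
n_fish    <- 70     # hypothetical number of fish per sample
n_samples <- 200

# For each simulated sample, build estimate +/- 2 SE and check coverage
covers <- replicate(n_samples, {
  x  <- rnorm(n_fish, mean = true_mean, sd = true_sd)
  se <- sd(x) / sqrt(n_fish)
  ci <- mean(x) + c(-2, 2) * se
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Number of the 200 intervals that miss the true mean
n_miss <- sum(!covers)
n_miss
```

The count varies from run to run (change the seed to see this), but it should hover around 10 when the sampling distribution model is reasonable.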

part d: intuition for constructing & interpreting test statistics

If we assume that 0ppm is the “true” mercury concentration, then our estimate of Beta_0 = 1.078ppm with a standard error of 0.08866 means that our estimate is (1.07808-0)/0.08866 = 12.16 standard errors away from what we should expect.
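The same calculation can be sketched in R so that other "default" values are easy to try. The estimate and standard error come from the model summary; the second default value (1ppm) is a hypothetical comparison added here for contrast, not something from the original activity.

```r
est <- 1.07808   # intercept estimate from summary(mod_fish)
se  <- 0.08866   # its standard error

# "Default" values to compare against: 0ppm (from the exercise)
# and 1ppm (a hypothetical alternative, for contrast)
null_values <- c(0, 1)

# Distance between the estimate and each default value, in standard errors
n_se <- (est - null_values) / se
round(n_se, 2)  # about 12.16 and 0.88
```

An estimate about 12 standard errors from the default is very hard to attribute to sampling variability, while one less than 1 standard error away would be unremarkable.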

Possible conclusions:

Our working assumption that 0ppm should be the “true” mercury concentration in the Lumber River fish population was wrong! The confidence interval we constructed above suggests that a true value of 0ppm is extremely implausible.

Perhaps ~0ppm is actually the true average mercury concentration in the population, and we just got extremely, outrageously unlucky with our sample.

Perhaps there was a measurement/data entry error, and the units are actually parts per billion, not million.

part e: do individual observations contradict our conclusions?

fish %>% 
  group_by(River) %>% 
  summarise(mean=mean(Concen), 
            sd=sd(Concen), 
            max=max(Concen))

Observing a single fish with a mercury concentration of 2.5ppm is actually not that surprising! 2.5ppm is a little more than 2 standard deviations above the mean mercury concentration in our sample of fish from the Lumber River (1.08+2*0.64=2.36), but there are certainly fish in the sample with even higher mercury concentrations (max=3.5ppm), so this isn’t outside the bounds of what we’d expect.
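The key distinction here is between the spread of individual fish (standard deviation) and the uncertainty in the estimated mean (standard error). A minimal sketch, using the rounded sample mean and standard deviation quoted above and the intercept's standard error from the model summary; the variable names are illustrative:

```r
fish_value  <- 2.5      # mercury concentration of the single new fish (ppm)
sample_mean <- 1.08     # Lumber River sample mean (ppm)
sample_sd   <- 0.64     # spread of individual fish in the sample (ppm)
se_mean     <- 0.08866  # uncertainty in the estimated mean (ppm)

# How unusual is 2.5ppm for a single fish? Measure in standard deviations.
n_sd <- (fish_value - sample_mean) / sample_sd

# How unusual would 2.5ppm be as a sample mean? Measure in standard errors.
n_se <- (fish_value - sample_mean) / se_mean

round(c(sd_units = n_sd, se_units = n_se), 2)  # about 2.22 and 16.02
```

A single fish about 2 standard deviations above the mean is plausible; a sample mean 16 standard errors above our estimate would not be.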

Exercise 2

Let’s look at the model summary output again:

summary(mod_fish)

part a: interpret model coefficient

The RiverWacamaw coefficient is 0.19835, meaning that in our sample, the mean mercury concentration among fish in the Wacamaw River is about 0.20ppm higher than that of fish in the Lumber River.

part b: construct a CI

Using the 68-95-99.7 rule, construct an approximate 95% confidence interval for the RiverWacamaw coefficient, and provide an appropriate interpretation.

0.20 +/- 2*0.12 –> [-0.04, 0.44]

Preferred interpretation: It is plausible that the true difference in mean mercury concentration between fish in the Wacamaw River and fish in the Lumber River is between -0.04ppm and 0.44ppm.

Not as preferred interpretation: We are 95% confident that the mean mercury concentration among fish in the Wacamaw River is somewhere between 0.04ppm less than and 0.44ppm more than that of fish in the Lumber River.

part c: interpreting the CI

Answers may vary–this is certainly plausible, since our 95% CI contains 0 (i.e., there is no difference in means between the two rivers). However, we might also argue that there is SOME evidence of a difference, since most of the CI is > 0.

part d: effect of sample size on our conclusions

A larger sample should result in a smaller standard error of the RiverWacamaw coefficient. If the standard error is smaller than 0.1 (say 0.098), then a 95% confidence interval would be [0.004, 0.396]. Since this interval doesn’t include 0, we could conclude that fish in the Wacamaw River, on average, have a higher mercury concentration than fish in the Lumber River. More importantly, the lower standard error of the coefficient allows us to say there is evidence that this difference should be observable across new samples.
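To get a rough sense of how much the standard error might shrink, the sketch below makes two assumptions that are not part of the original activity: the spread of mercury concentrations stays the same in the larger sample, and the standard error of the difference is the root sum of squares of the two rivers' standard errors (a rule that reproduces the 0.11712 in the model summary). The function name se_diff is hypothetical.

```r
se_lumber  <- 0.08866  # SE of the Lumber River mean (intercept of mod_fish)
se_wacamaw <- 0.07652  # SE of the Wacamaw River mean (intercept of mod_fish2)

# SE of the difference if the Wacamaw sample were k times as large,
# assuming the SE of a mean shrinks by a factor of sqrt(k)
se_diff <- function(k) sqrt(se_lumber^2 + (se_wacamaw / sqrt(k))^2)

se_diff(1)   # current sample sizes; close to the reported 0.11712
se_diff(10)  # hypothetical: 10 times as many Wacamaw fish

# Approximate 95% CI for a 0.2ppm difference under the hypothetical design
ci_10x <- 0.2 + c(-2, 2) * se_diff(10)
round(ci_10x, 3)
```

Under these assumptions the standard error drops to roughly 0.09, just enough for the interval to exclude 0. Note that it can never fall below the Lumber River standard error (0.08866) unless more Lumber River fish are sampled too.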

part e: reconciling parameter estimates and uncertainty

This will depend on context–a priori, this difference appears to be negligible, and we could potentially chalk it up to uncontrolled confounders (e.g., perhaps fish in one river tend to be older/bigger and therefore have slightly higher mercury concentrations, even if there is no underlying difference in mercury pollution). We also might consider: what is considered a “harmful” mercury concentration, and are fish in either river near that threshold? Has this changed over time, and by how much?

part f (CHALLENGE)

Using the model summary output, report the mean mercury concentration for our sample of fish from the Wacamaw River:

summary(mod_fish)

1.07808 + 0.19835 = 1.27643ppm

Which of the following values do you think is the standard error of the sample mean for the Wacamaw River?

To answer this question, look at the code chunk below, which fits the same model, but uses the Wacamaw River as our reference category instead of the Lumber River:

mod_fish2 <- lm(Concen ~ River, data=fish %>% mutate(River=ifelse(River == "Wacamaw", paste0("_", River), River)))
summary(mod_fish2)

SEs for the means are 0.08866 and 0.07652 for the Lumber and Wacamaw rivers, respectively, so the answer is “something else”: the standard error of the Wacamaw sample mean is 0.07652, the standard error of the intercept in mod_fish2. The SE for the difference is the same in both models (0.11712), which is greater than the SE of either mean. Because standard errors quantify uncertainty in a given parameter estimate, this tells us that the uncertainty of the estimated difference between two means is greater than the uncertainty in our estimate of either mean by itself.
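The three standard errors in the output appear to be linked by the usual rule for the difference of two independent estimates. The notation below (subscripts L and W for the Lumber and Wacamaw sample means) is introduced here for illustration and is offered as a check on the numbers, not as part of the original activity:

```latex
\mathrm{SE}(\bar{y}_W - \bar{y}_L)
  = \sqrt{\mathrm{SE}(\bar{y}_L)^2 + \mathrm{SE}(\bar{y}_W)^2}
  = \sqrt{0.08866^2 + 0.07652^2}
  \approx 0.11712
```

Because the squared standard errors add, the standard error of the difference is always larger than either individual standard error, yet smaller than their simple sum (0.16518).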