Statistical methods for archaeological data analysis I: Basic methods

class: title-slide, center, middle

#  Statistical methods for archaeological data analysis I: Basic methods

##  05 - Nonparametric Tests

###  Martin Hinz

####  Institut für Archäologische Wissenschaften, Universität Bern

01.04.2025

.footnote[
.right[
.tiny[
You can download a [pdf of this presentation](smada05.pdf).
]
]
]
---

## Inductive statistics or statistical inference

**It is used to draw conclusions about (unknown) parameters of the population on the basis of a sample.**

The results are always statistical ;-)

i.e. all statements are true with a certain probability, but could also be false with a certain probability

The basis of statistical inference is probability theory (stochastics)

---

## Population and sample [1]

### Repetition:
**Population**

The set of all items relevant to an analysis.

**Sample**

A selection of items made on the basis of certain criteria (e.g. representativeness), which is analysed instead of the population

The difference should always be kept in mind

In archaeology only sampling is possible! The population can never be investigated!

---

## Population and sample [2]

Features of the population: *parameters*

Parameters always exist and have a certain value, but they are unknown and most of the time cannot be checked directly.

**Example:**
.pull-left[
`$\mu$`: mean of the population

`$\bar{x}$`: mean of the sample
]

.pull-right[
`$\sigma$`: standard deviation of the population

`$s$`: standard deviation of the sample
]

In statistical tests only features of the sample can be checked. The quality of a test's statement therefore depends on the choice of the sample (representativeness)!
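
A minimal sketch of the distinction, assuming a small made-up sample (the counts below are hypothetical, not real data): `$\bar{x}$` and `$s$` can be computed from the sample, while `$\mu$` and `$\sigma$` stay unknown and are only estimated by them.

```python
import statistics

# Hypothetical sample: number of grave goods in six excavated burials.
grave_goods = [2, 4, 4, 5, 7, 8]

# Sample mean (x-bar): an estimate of the unknown population mean mu.
x_bar = statistics.mean(grave_goods)

# Sample standard deviation s (divides by n - 1): an estimate of sigma.
s = statistics.stdev(grave_goods)

print(f"x_bar = {x_bar}, s = {s:.3f}")
```

`statistics.stdev` uses the `n - 1` divisor appropriate for a sample; `statistics.pstdev` would treat the values as a complete population, which is rarely justified in archaeology.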

---
## Statistical hypothesis testing

### Validation of an assumption about the population

An assumption (hypothesis) about the population is made, and then its probability is checked against the sample.

### Usual questions:

**How probable is it that two or more samples come from different populations or from the same population?**

(e.g. Is the custom of grave goods for men and women so different that two different social groups are visible?)

**How probable is it that a given sample comes from a population with certain parameters?**

(e.g. Is the number of grave goods random, or is a pattern visible?)

---

## Null hypothesis [1]

### Validation through falsification

In statistical tests, one usually does not test the statement one expects to be true. Instead, one tries to disprove the statement one expects to be false: the null hypothesis.

This hypothesis mostly states that an association does not exist, or that there is no difference between the samples and the distribution of the observations is due to chance.

Example: Is the composition of grave goods different between male and female deceased?

`$H_0$`: The composition is the same

`$H_1$`: The composition is different

### Reason
1. It is (logically) easier to prove that a statement is wrong (falsify) than to prove that a statement is true (verify).
2. Most of the time the null hypothesis is easier to formulate than the alternative (How exactly is the composition different?). It makes no assumption about the exact character of an association or difference.

---

## Null hypothesis [2]

### "Workflow" of a statistical test

**Construction of an alternative hypothesis:**

The composition of the grave goods is different between male and female deceased.

**Construction of the null hypothesis:**

The composition of the grave goods is the same in male and female burials.

**Test of the null hypothesis**

**If the result of the test is significant:**

Rejection of the null hypothesis, choice of the alternative hypothesis. The composition of the grave goods is different between male and female deceased.

**If the result of the test is not significant:**

The null hypothesis could not be rejected.

We cannot say whether the composition of the grave goods differs between male and female deceased or not!

---

## One-tailed/Two-tailed hypothesis

### one-tailed or two-tailed

Depending on the question, the number of possible alternative hypotheses differs.

**Example:**

*Is the number of grave goods in female burials higher than in male?*

One-tailed hypothesis, possible answers are yes or no.

*Is the number of grave goods in female burials different from male?*

Two-tailed hypothesis, possible answers are smaller, equal, or greater.

That is why statistical tests often report two significance values (one-tailed and two-tailed).
.center[
![:width 25%](data:image/png;base64,#../images/05_session/cm_fig2d.png) ![:width 25%](data:image/png;base64,#../images/05_session/cm_fig2c.png)
]
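
A rough illustration of how the two values relate, using a simple sign test (a binomial test). This test is chosen here only because it can be computed by hand; the data and the helper name `binom_upper_tail` are hypothetical.

```python
from math import comb

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical data: in 8 of 10 paired burials the female grave holds more goods.
n_pairs, female_richer = 10, 8

# One-tailed: "do female burials hold MORE goods?" (only the upper tail counts)
p_one_tailed = binom_upper_tail(female_richer, n_pairs)

# Two-tailed: "do they DIFFER?" With p = 0.5 the distribution is symmetric,
# so both tails together are twice the upper tail (capped at 1).
p_two_tailed = min(1.0, 2 * p_one_tailed)

print(f"one-tailed: {p_one_tailed:.4f}, two-tailed: {p_two_tailed:.4f}")
```

The same observation yields a smaller value for the one-tailed question, which is why the direction of the hypothesis has to be fixed before looking at the data.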

---

## Stat. Significance

### How true is true?

Statistical significance is effectively a measure of how probable an error is.

On the basis of the significance the null hypothesis will be rejected and the alternative hypothesis will be chosen … or not.

There are classic boundary values for significance (significance levels):

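0.05: significant; if the null hypothesis is true, it is wrongly rejected in at most 5% of cases.

0.01: very significant; if the null hypothesis is true, it is wrongly rejected in at most 1% of cases.

0.001: highly significant; if the null hypothesis is true, it is wrongly rejected in at most 0.1% of cases.

The significance level is usually written `$\alpha$`; the p-value computed by a test is compared against it.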

---
## α- and β-error [1]

### If statistics go wrong...

There are two kinds of possible errors:

**The null hypothesis was rejected although it is true** -> *Type I error, false positive, `$\alpha$`-error*

The result of a pregnancy test is false positive if it shows a pregnancy although there is none.

**The null hypothesis was not rejected although it is false** -> *Type II error, false negative, `$\beta$`-error*

The result of a pregnancy test is false negative if it shows no pregnancy although there is one.

---
## α- and β-error [2]

### Tests and errors

**Statistical tests should avoid both types of errors**

balancing act (not too strict/not strict enough)

**In general, Type I errors are more serious than Type II errors**

A Type I error leads to wrong assumptions, because the alternative hypothesis then seems to be proven; in the case of a Type II error, nothing is proven

**Power of a test**

A test has more power if it avoids Type II errors without risking more Type I errors.

A more powerful test helps to clarify issues better

---

## Nonparametric tests

### Parametric vs. nonparametric

**Parametric**: The distribution of the values has to follow a certain form (e.g. normal distribution); assumptions about the distribution of the population are needed

**Nonparametric**: no assumptions about the distribution of the sample and the population are needed

### Nonparametric tests, advantages and disadvantages:

**Advantage**: Also appropriate if no statements about the distribution are possible, or if the distribution does not meet the requirements of parametric tests.

Smaller samples are also possible.

**Disadvantage**: The tests generally have less power.

---

## `$\chi^2$` test
.center[
(![](data:image/png;base64,#../images/06_session/640px-Tai_chi_chuan.jpg) - tai)^2
]

---

## `$\chi^2$` test [1]

### Possible Questions

**Do settlements tend to be situated on rather good soil or is the distribution random?**

Conclusions about settlement behaviour and economy would be possible

**Do older individuals have more shoe-last celts as grave goods than younger ones?**

If shoe-last celts were signs of social rank, then this would allow conclusions about whether social rank was inherited or acquired during a lifetime.

**Tests for nominal scaled variables are possible!**

This makes the test particularly valuable for archaeology, because we often have to deal with such data.

---

## `$\chi^2$` test [2]

### Test for independence of two distributions

**Requirements**: at least 1 nominal-scaled variable (one-sample case) and 1 nominal-scaled grouping variable (two-sample case)

**Procedure with one sample**: observed values are compared with the values expected under a certain distribution; no expected value should be < 5; n should be > 50

**Procedure with two samples**: observed values of both distributions are compared with the values expected if the two samples were distributed in the same proportions; no expected value should be < 5; n should be > 50

**If sample size is too small**: Fisher's exact test

**Test statistic**: `$\chi^2$`

Significance depends on the degrees of freedom (df)
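
The two-sample procedure can be sketched as follows, assuming the usual Pearson statistic `$\chi^2 = \sum (O - E)^2 / E$` over all cells (observed `$O$`, expected `$E$`). The counts are hypothetical and chosen so that n > 50 and no expected value is below 5; no continuity correction is applied, so statistical software may report a slightly different value for a 2×2 table.

```python
from math import erfc, sqrt

# Hypothetical counts: rows = older / younger individuals,
# columns = with / without shoe-last celt.
observed = [[18, 12],
            [10, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over all cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 1 only, the upper-tail probability has this closed form.
p_value = erfc(sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

The closed form for the p-value holds only for df = 1; larger tables need a `$\chi^2$` table or a statistics package.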

---

## Excursus: degrees of freedom