Statistical methods for archaeological data analysis I: Basic methods

class: title-slide, center, middle

#  Statistical methods for archaeological data analysis I: Basic methods

##  03 - Explorative statistics & graphical display

###  Martin Hinz

####  Institut für Archäologische Wissenschaften, Universität Bern

11.03.2025

.footnote[
.right[
.tiny[
You can download a [pdf of this presentation](smada03.pdf).
]
]
]
---

## Loading data for the following steps

### download data
* [muensingen_fib.csv](https://raw.githubusercontent.com/BernCoDALab/smada/refs/heads/main/lectures/03/muensingen_fib.csv)

### Read the Data on Muensingen Fibulae

``` r
muensingen <- read.csv2("muensingen_fib.csv")
head(muensingen)
```

```
##    X Grave Mno FL BH BFA FA CD BRA ED FEL  C   BW  BT FEW Coils Length
## 1  1   121 348 28 17   1 10 10   2  8   6 20  2.5 2.6 2.2     4     53
## 2  2   130 545 29 15   3  8  6   3  6  10 17 11.7 3.9 6.4     6     47
## 3  3   130 549 22 15   3  8  7   3 13   1 17  5.0 4.6 2.5    10     47
## 4  8   157  85 23 13   3  8  6   2 10   7 15  5.2 2.7 5.4    12     41
## 5 11   181 212 94 15   7 10 12   5 11  31 50  4.3 4.3  NA     6    128
## 6 12   193 611 68 18   7  9  9   7  3  50 18  9.3 6.5  NA     4    110
##   fibula_scheme
## 1             B
## 2             B
## 3             B
## 4             B
## 5             C
## 6             C
```

---

## Cross tables (contingency tables)

### For summary of data:

``` r
my_table <- table(muensingen$fibula_scheme, muensingen$Grave) 
my_table
```

```
##    
##     6 23 31 44 48 49 61 68 80 91 121 130 157 181 193
##   A 1  1  1  1  0  0  0  0  0  0   0   0   0   0   0
##   B 0  0  0  0  1  1  2  1  1  1   1   2   1   0   0
##   C 0  0  0  0  0  0  0  0  0  0   0   0   0   1   1
```

``` r
addmargins(my_table)
```

```
##      
##        6 23 31 44 48 49 61 68 80 91 121 130 157 181 193 Sum
##   A    1  1  1  1  0  0  0  0  0  0   0   0   0   0   0   4
##   B    0  0  0  0  1  1  2  1  1  1   1   2   1   0   0  11
##   C    0  0  0  0  0  0  0  0  0  0   0   0   0   1   1   2
##   Sum  1  1  1  1  1  1  2  1  1  1   1   2   1   1   1  17
```

---

## Basics about charts

.pull-left[
### Principles for good charts according to E. Tufte:
(The Visual Display of Quantitative Information. Cheshire/
Connecticut: Graphics Press, 1983)

- „Graphical exellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.”
- Data-ink ratio = „proportion of a graphic’s ink devoted to the non-redundant display of data-information“ (kein chartjunk!)
- „Graphical excellence is often found in simplicity of design and complexity of data.“

\- after Müller-Scheeßel
]

.pull-right[
![](data:image/png;base64,#../images/03_session/plot_elements.png)
]

---

## Plot [1]

.pull-left[
Basic drawing function of R:

``` r
plot(muensingen$Length)
```

options:
- p – points (default)
- l – solid line
- b – line with points for the values
- c – line with gaps for the values
- o – solid line with points for the values
- h – vertical lines up to the values
- s – stepped line from value to value
- n – empty coordinate system
]

.pull-right[
![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-9-1.png)

]

---
## Plot [2]

.pull-left[

``` r
plot(muensingen$Length,type="b")
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-10-1.png)
]

.pull-right[
Intelligent system: automatic determination of variable type, drawing of
the appropriate chart
.tiny[

``` r
plot(as.factor(muensingen$fibula_scheme))
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-11-1.png)
]
]

---

## Plot [3]

Enhancing the plot with optional components & Text
.tiny[

``` r
plot(muensingen$Length, muensingen$FL,
     xlim=c(0, 140), # limits of the x axis
     ylim = c(0, 100), # limits of the y axis
     xlab = "Fibula Length", # label of the y axis
     ylab = "Foot Length", # label of the x axis
     main = "Fibula total length vs. Foot Length", # main title
     sub="example plot" # subtitle
     )
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-12-1.png)
]

---

## Plot [4]

Plot do a lot for you:
- Opens a window for display
- Determines the optimal size of the frame of reference
- Draws the coordinate system
- Draws the values

Gives a „handle“ back for further additions to the plot, e.g.:
- lines – additional lines to an existing plot
- points – additional points to an existing plot
- abline – additional special lines to an existing plot
- text – additional text on choosen position to an existing plot

Additional possiblities for “decorations”: ? par

---

## Plot [5]

Add additional elements:
Drawing lines

``` r
abline(v = mean(muensingen$Length), col = "red")         # draw a red vertical line
abline(h = mean(muensingen$FL), col = "green")           # draw a green vertical line
abline(lm(FL~Length, data = muensingen), col = "blue")   # draw a blue diagonal line
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-13-1.png)

---

## Export the graphics

With the GUI:

Export → Save as...

With the commando line:
As vector file

``` r
dev.copy2eps(file="test.eps")
dev.copy2pdf(file="test.pdf")
```

``` r
savePlot(filename="test.tif", type="tiff")
```

Possible are “png”, “jpeg”, “tiff”, “bmp”
SavePlot can save sometimes also vector files (dependent on operation system and installation)

---

## Pie chart [1]

The classical one – but also with R not much better...

Used to display proportions, suitable for nominal data

$$
a_i = \frac{n_i} {N} * 360°
$$

Disadvantages:
- Color selection can influence the perception (red is seen larger then gray)
- Small differences are not easy visible

**totally No-Go: 3d-pies!!!**

---

## Pie chart [2]

I eat pie...

![](data:image/png;base64,#../images/03_session/3dpie.png)

.caption[source: http://www.lrz-muenchen.de/~wlm]

*The pieces »viel zu wenig«, »etwas zu wenig« und »gerade richtig« have exactly the same size, the piece »viel zu viel« is a bit smaller.*

---

## Pie chart [3]
.tiny[
.pull-left[
Data are a vector of counts

``` r
table(muensingen$fibula_scheme)
```

```
## 
##  A  B  C 
##  4 11  2
```

``` r
pie(table(muensingen$fibula_scheme))
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-16-1.png)
]
.pull-right[
### Color palette:
The standard palette is pastel, if you prefer another:

``` r
pie(table(muensingen$fibula_scheme),
    col=c("red","green","blue"))
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-17-1.png)

]
]

---

## Bar plot [1]

Generally the better alternative...
Bar plots are suitable for display of proportions as well as for absolute data. They can be used for every level of measurement.

.small[
.pull-left[

``` r
barplot(table(muensingen$fibula_scheme))
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-18-1.png)
]

.pull-right[

``` r
barplot(muensingen$Length)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-19-1.png)

]

---

## Bar plot [2]

With names:

``` r
par(las=2)                          # turn labels 90°
barplot(muensingen$Length,          # plot fibulae length
        names.arg=muensingen$Grave) # with names of the graves
title("Fibulae length")             # add title
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-20-1.png)

---

## Bar plot [3]
Horizontal:

``` r
par(las=1)                          # turn labels back again
barplot(table(muensingen$fibula_scheme), # Plot counts fibulae scheme
        horiz=T,                         # horizontal
        cex.names=2)                     # make the labels bigger
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-21-1.png)

---

## Bar plot [4]

Display of counts
.tiny[
.pull-left[

``` r
my_new_table <- table(muensingen$fibula_scheme,
                      muensingen$Coils)
my_new_table
```

```
##    
##     3 4 6 10 12
##   A 1 3 0  0  0
##   B 0 3 6  1  1
##   C 0 1 1  0  0
```

``` r
barplot(my_new_table)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-23-1.png)
]
.pull-right[

``` r
barplot(my_new_table, beside=T, legend.text=T)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-24-1.png)
]
]

---

## Bar Plot [5]

Display of proportions
.tiny[
.pull-left[

``` r
table.prop<-prop.table(my_new_table,2)
table.prop
```

```
##    
##             3         4         6        10        12
##   A 1.0000000 0.4285714 0.0000000 0.0000000 0.0000000
##   B 0.0000000 0.4285714 0.8571429 1.0000000 1.0000000
##   C 0.0000000 0.1428571 0.1428571 0.0000000 0.0000000
```

``` r
barplot(table.prop)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-25-1.png)
]

.pull-right[

``` r
tmp<-barplot(table.prop,
             legend.text=T,  # add a legend
             col=rainbow(3)  # make it more colorful
             )

# add a title
title("ratio of fibulae schemes \n by number of coils",
      outer=TRUE,            # outside the plot area
      line=- 3)              # on line -3 above
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-26-1.png)
]
]

---

## Bar Plot [6]

Problems with bar plots – and also with many other charts

.pull-left[
Percent vs. count: percents often distort the relations
.tiny[

``` r
par(mfrow=c(2,1))
barplot(my_new_table,beside=T)
barplot(table.prop,beside=T)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-27-1.png)
]
]

.pull-right[
Scales: the choosen limits of the axes can distort the relations
.tiny[

``` r
par(mfrow=c(1,2))
barplot(muensingen$Length[1:2],xpd=F,ylim=c(45,55))
barplot(muensingen$Length[1:2],xpd=F)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-28-1.png)

``` r
par(mfrow=c(1,1))
```
]
]

---

## Box-plot (Box-and-Whiskers-Plot)

One of the best (my precious)!
.pull-left[
Used to display the distribution of values in a data vector of metrical (interval, ratio) scale
<pre>
1 2 3 4 5 6 7 8 9
____|___|___|____
</pre>

- thick line: mean
- Box: the inner both quantiles
- Whisker: last value < than 1.5 times the distance of the inner quantile
]

.pull-right[

``` r
boxplot(1:9)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-29-1.png)
]

---

## Box Plot [2]

.pull-left[

``` r
boxplot(muensingen$Length)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-30-1.png)
]

.pull-right[

``` r
boxplot(muensingen$Length ~
          muensingen$fibula_scheme)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-31-1.png)
]

---

## Box Plot [3]

More beautiful:
.pull-left[

``` r
par(las=1)
boxplot(Length ~ fibula_scheme,
        data = muensingen,
        main = "Length by type",
        col="grey",
        xlab="fibulae scheme",
        ylab= "length"
        )
```
]
.pull-right[
![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-33-1.png)
]

---

## Scatterplot [1]
.pull-left[

For 2 variables

Used to display a variable in relation to another one. Generally for all scales suitable, but for nominal and ordinal scale other charts are often better.

``` r
plot(muensingen$Length, muensingen$FL)
abline(
  lm(muensingen$FL~muensingen$Length),
  col="red")
```
]

.pull-right[
![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-35-1.png)
]

---

## Scatterplot [2]

Call additional libraries:
.tiny[
.pull-left[

``` r
library(car) # library for regression analysis
scatterplot(FL ~ Length, data = muensingen)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-36-1.png)
]

.pull-right[

``` r
library(ggplot2) # advanced plots library
b<- ggplot(muensingen,aes(x=Length,y=FL))
graph<-b + geom_point()
show(graph)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-37-1.png)
]
]

---

## Histogramm [1]
Used for classified display of distributions
Data reduction vs. precision: Display of count values of classes of values

.pull-left[

``` r
hist(muensingen$Length)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-38-1.png)

]

.pull-right[

``` r
hist(muensingen$Length, labels = T)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-39-1.png)

]

---

## Histogramm [2]

Custom breaks of classes

.pull-left[

``` r
hist(muensingen$Length,
     labels = T)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-40-1.png)

]

.pull-right[

``` r
hist(muensingen$Length,
     labels = T,
     breaks = 10)
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-41-1.png)

]

---

## Histogramm [3]

.pull-left[
More beautiful

``` r
hist(muensingen$Length,breaks=10,
     labels=T,
     col="red",
     xlab="Length",
     main="Histogram Fibulae Length")
```
]

.pull-right[
![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-43-1.png)
]

Disadvantages:
- Data reduction vs. precision → loss of information
- Actual display depends strongly on the choosen class width

---

## steam-and-leaf chart

An attempt to overcome the disadvantages of a histogram

Is not very often used. Scales like histograms.

``` r
stem(muensingen$Length)
```

```
## 
##   The decimal point is 2 digit(s) to the right of the |
## 
##   0 | 34444
##   0 | 5555566677
##   1 | 13
```

Advantage:
- Information about the distribution inside the classes and the absolute values are (partly) visible.

---

## kernel smoothing (kernel density estimation)

Another attempt to overcome the disadvantages of a histogram

The distribution of the values is considered and a distribution curve is
calculated. Continuous distributions are better displayed, without artificial
breaks. Scales like histograms.

.pull-left[

``` r
plot(density(muensingen$Length))
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-45-1.png)
]

.pull-right[
Histogram and kernel-density-plot together

``` r
hist(muensingen$Length, prob=T)
lines(density(muensingen$Length))
```

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-46-1.png)
]

---

## Style of charts

### Stay honest!

* [dax.csv](https://raw.githubusercontent.com/BernCoDALab/smada/refs/heads/main/lectures/03/dax.csv)

Choice of display has a strong influence on the statement.

![](data:image/png;base64,#smada03_files/figure-html/unnamed-chunk-47-1.png)

---

## Style of charts

### Stay honest!
Choice of display has a strong influence on the statement.

### Clear layout!
Minimise Ratio of ink per shown information!

### Use the suitable chart for the data!
Consider nominal-ordinal-interval-ratio scale

---

## Suggestions for charts