Computer Aided Archaeology

class: title-slide, center, middle

#  Computer Aided Archaeology

##  05 - Visualisation I

###  Martin Hinz

####  Institut für Archäologische Wissenschaften, Universität Bern

18/10/23
---

## Why data visualisation

Converting raw data to a form that is viewable and understandable to humans

.pull-left[
* humans are visual animals -> we evolved to identify patterns visually
* helps to map complex information into a meaningful picture
* makes it possible to "see" more than two dimensions of the data and their interplay
]

.pull-right[
![](data:image/png;base64,#../images/05_session/escher-63-rob-hans.jpg)
]

---

## Data, variables, values

.pull-left[
- variable:
  - What is measured or analysed.
  - e.g. height
- item:
  - The thing on which the variable is measured
  - e.g. me as the "possessor" of a height, graves, persons...
- values:
  - The actual measurement.
  - e.g. my height is 1.81 m.
]

.pull-right[
![](data:image/png;base64,#../images/05_session/variable_item_value.png)
]

---

## Levels of measurement

![](data:image/png;base64,#../images/05_session/categorical-data.jpg)

### nominal or categorical:
  - You can only decide if something belongs to a category
  - The categories have no defined relationship to each other, so only counting is possible (e.g. sex)
  
---

## Levels of measurement

![:width 70%](data:image/png;base64,#../images/05_session/ordinal_chilis.png)

### ordinal:
  - Categories which are comparable and differ from each other in their characteristic [size/power/intensity]
  - their rank is determinable (e.g. preservation conditions – bad < medium < good)
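A minimal Python sketch of what an ordinal scale allows: comparing ranks and taking a median, but not averaging. The observations are made up for illustration; the levels are the preservation example from above.

```python
# Ordered levels of the ordinal variable, from lowest to highest.
LEVELS = ["bad", "medium", "good"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def ordinal_median(observations):
    """Return the (lower) median category of ordinal observations."""
    ordered = sorted(observations, key=RANK.__getitem__)
    return ordered[(len(ordered) - 1) // 2]

# Hypothetical preservation conditions of five finds.
observations = ["good", "bad", "medium", "medium", "good"]

median_level = ordinal_median(observations)
bad_is_worse = RANK["bad"] < RANK["good"]  # rank comparison is meaningful

print(median_level, bad_is_worse)
```

A mean of these categories would need distances between the levels, which an ordinal scale does not define.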

---

## Levels of measurement

![:width 70%](data:image/png;base64,#../images/05_session/Kelvin_og_Celsius_temperaturskalaer.png)

### metric:
  - The variable has a defined system of measurement, so all calculations are possible. Two subtypes are distinguished:

1. interval: The variable has an arbitrarily chosen zero point (°C)
2. ratio: The variable has an absolute zero point (K)

- Sometimes also used: absolute scale
  - counts (number of inhabitants)
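A small Python sketch of the difference between interval and ratio scales, using two made-up temperatures. Differences agree in both units, but "twice as warm" only makes sense on the scale with an absolute zero.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvin."""
    return c + 273.15

# Two hypothetical temperatures in degrees Celsius.
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on both scales and agree.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios depend on the zero point: only the Kelvin ratio is meaningful.
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(diff_c, diff_k, ratio_c, round(ratio_k, 3))
```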

---

## Levels of measurement

![](data:image/png;base64,#../images/05_session/scales_of_measurements.png)
---

## continuous vs. discrete

.pull-left[
### discrete variable:
  - Variable which can take only certain values without intermediate values
  - e.g. income, counts of ceramic objects, sex (?)
  - 'counted'
  
### continuous variable:
  - Variable which can take any value, including all intermediate values
  - e.g. height, temperature, proportion value
  - 'measured'
]

.pull-right[
![](data:image/png;base64,#../images/05_session/quantitative-data.jpg)

.caption[Source: https://statsthewayilikeit.com]
]

---
## Cross tables (contingency tables)

### For summary of data

Cross tabulations summarise (mostly) categorical data by counting the co-occurrence of 2 (or more) categories per unit.

.pull-left[
![:width 75%](data:image/png;base64,#../images/05_session/crosstab_1.png)
]
.pull-right[
![](data:image/png;base64,#../images/05_session/crosstab_2.png)
]

-> "Pivot-Table"
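
As a rough sketch, a cross table can be built in Python by counting pairs of categories. The grave data below are invented for illustration and do not correspond to the tables in the images.

```python
from collections import Counter

# Hypothetical units (graves), each with two categorical variables:
# (sex of the buried person, type of grave good).
graves = [
    ("female", "bead"), ("female", "bead"), ("female", "axe"),
    ("male", "axe"), ("male", "axe"), ("male", "bead"),
    ("male", "axe"),
]

# Count how often each combination of categories occurs.
counts = Counter(graves)
rows = sorted({sex for sex, _ in graves})
cols = sorted({find for _, find in graves})
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print the cross table with one row per sex and one column per find type.
print("sex".ljust(8) + "".join(c.rjust(6) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[r][c]).rjust(6) for c in cols))
```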
---
## Successful visualisation of the past

![:width 70%](data:image/png;base64,#../images/05_session/playfair.jpg)

.caption[William Playfair, 1786. source: wikipedia]
---
## Successful visualisation of the past

![:width 75%](data:image/png;base64,#../images/05_session/Nightingale.jpg)

.caption[Florence Nightingale, 1857. source: wikipedia]

---
## Successful visualisation of the past

![:width 100%](data:image/png;base64,#../images/05_session/minard.jpg)
.caption[Charles Joseph Minard, 1869. source: wikipedia]

---
## Objects of Visualisation

.pull-left[
### Items

Objects you would like to display (entities in DB speak)

![:width 100%](data:image/png;base64,#../images/05_session/visualisation_items.png)
]
.pull-right[
### Links

The relationships between these objects, if there are any (relationships in DB speak, too)

![:width 100%](data:image/png;base64,#../images/05_session/visualisation_links.png)
]
.caption[Machiraju 2020]

---
## Dimensions of Visualisation

.pull-left[
* Position (coordinates, slope, orientation, ...)
* Color (Hue, Saturation, Transparency)
* Texture
* Shape
* Size (length, area, [Volume])

* Proximity/Density

![:width 90%](data:image/png;base64,#../images/05_session/dimensions_visualisation.png)
.caption[Machiraju 2020]
]

.pull-right[
![:width 100%](data:image/png;base64,#../images/05_session/bertin_1967.png)
.caption[Bertin 1967]

]
---
## Combining Dimensions of Visualisation

![:width 100%](data:image/png;base64,#../images/05_session/adding_multiple_channels.png)
.caption[Machiraju 2020]

---
## Dimensions of Visualisation and levels of measurement

* Position (coordinates, slope, orientation, ...)
* Color (Hue, Saturation, Transparency)
* Texture
* Shape
* Size (length, area, [Volume])

* Proximity/Density

![:width 100%](data:image/png;base64,#../images/05_session/channel_mapping.png)
.caption[Machiraju 2020]

---

## Basics about charts

.pull-left[
### Principles for good charts according to E. Tufte:
(The Visual Display of Quantitative Information. Cheshire/
Connecticut: Graphics Press, 1983)

- "Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space."
- Data-ink ratio = "proportion of a graphic's ink devoted to the non-redundant display of data-information" (no chartjunk!)
- „Graphical excellence is often found in simplicity of design and complexity of data.“

\- after Müller-Scheeßel
]

.pull-right[
![](data:image/png;base64,#../images/05_session/plot_elements.png)
]

---
## Pie chart [1]

.pull-left[
The classic one, but it comes with distinct flaws...

Used to display proportions, suitable for nominal data

$$
a_i = \frac{n_i}{N} \cdot 360^\circ
$$

Disadvantages:
- Colour selection can influence perception (red is seen as larger than grey)
- Small differences are not easily visible

**An absolute no-go: 3D pies!**
]

.pull-right[
![](data:image/png;base64,#../images/05_session/pie_example.png)
]
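
The angle formula for pie slices, sketched in Python. The find counts are made up; `pie_angles` is a hypothetical helper name.

```python
def pie_angles(counts):
    """Angle of each slice in degrees: a_i = n_i / N * 360."""
    total = sum(counts.values())
    return {category: n / total * 360 for category, n in counts.items()}

# Hypothetical counts of finds per material (nominal data).
counts = {"pottery": 50, "flint": 30, "bone": 20}
angles = pie_angles(counts)

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```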

---

## Pie chart [2]

I eat pie...

![](data:image/png;base64,#../images/05_session/3dpie.png)

.caption[source: http://www.lrz-muenchen.de/~wlm]

*The slices »viel zu wenig« (far too little), »etwas zu wenig« (a little too little) and »gerade richtig« (just right) are exactly the same size; the slice »viel zu viel« (far too much) is slightly smaller.*

---
## Pie chart [3]
.pull-left[
![](data:image/png;base64,#../images/05_session/comp_size_pie_3x.png)
3x
]

.pull-right[
![](data:image/png;base64,#../images/05_session/comp_size_pie_5x.png)
5x
]

---

## Bar plot [1]
.pull-left[
![](data:image/png;base64,#../images/05_session/comp_size_bar_2x.png)
2x
]

.pull-right[
![](data:image/png;base64,#../images/05_session/comp_size_bar_4x.png)
4x
]

---

## Bar plot [2]

Generally the better alternative...
Bar plots are suitable for display of proportions as well as for absolute data. They can be used for every level of measurement.

.pull-left[
![](data:image/png;base64,#../images/05_session/bar_basic.png)
]

.pull-right[
![](data:image/png;base64,#../images/05_session/bar_stacked.png)

]

---

## Bar plot [3]

Combination of different information and proportional visualisation is possible.

.pull-left[
![](data:image/png;base64,#../images/05_session/bar_side_by_side.png)
]

.pull-right[
![](data:image/png;base64,#../images/05_session/bar_proportional.png)

]
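
For a proportional bar plot, the counts of each group are first converted to shares of that group's total. A minimal Python sketch with invented sites and vessel counts:

```python
# Hypothetical counts of vessel types at two sites.
counts = {
    "Site A": {"bowl": 30, "jar": 10, "cup": 10},
    "Site B": {"bowl": 5, "jar": 10, "cup": 5},
}

def to_proportions(group):
    """Convert absolute counts of one group to shares of its total."""
    total = sum(group.values())
    return {category: n / total for category, n in group.items()}

# One set of shares per site; each set sums to 1 (one full bar).
proportions = {site: to_proportions(group) for site, group in counts.items()}

for site, shares in proportions.items():
    print(site, {k: round(v, 2) for k, v in shares.items()})
```

The absolute numbers differ strongly between the sites, while the proportional view makes their composition comparable.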

---
## Scatterplot

Shows the relationship between two (metric) variables

.pull-left[
You can see:

- values of items (points) on both variables
- relationship between variables
  - positive or negative relationship (or none at all)
- you can compute quantitative values describing the relationship
  - regression analysis
]

.pull-right[
![](data:image/png;base64,#../images/05_session/scatter_miscovice.png)
]
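
A sketch of the quantitative values mentioned above: Pearson's correlation coefficient and a least-squares regression line, written out in plain Python. The measurements are invented and are not the data shown in the plot.

```python
from math import sqrt

def _sums(xs, ys):
    """Means and sums of squares/cross-products around the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy

def pearson_r(xs, ys):
    """Strength and direction of the linear relationship (-1 to 1)."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def least_squares(xs, ys):
    """Slope and intercept of the regression line y = slope * x + intercept."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical measurements of two metric variables.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
slope, intercept = least_squares(xs, ys)
print(round(r, 3), round(slope, 2), round(intercept, 2))
```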

---
## Scatterplot

Shows the relationship between two (metric) variables

.pull-left[
### R

![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-8-1.png)
]

.pull-right[
### Libre Office

![](data:image/png;base64,#../images/05_session/scatter_miscovice.png)
]
---

## Box-plot (Box-and-Whiskers-Plot)

One of the best (my precious)!
.pull-left[
Used to display the distribution of values in a data vector of metrical (interval, ratio) scale
<pre>
1 2 3 4 5 6 7 8 9
____|___|___|____
</pre>

- thick line: median
- Box: the two inner quartiles (25% to 75% of the values)
- Whisker: the last value within 1.5 times the interquartile range from the box

]

.pull-right[
![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-9-1.png)

![](data:image/png;base64,#../images/05_session/boxplot_schema.png)
]
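
The elements of a box plot can be computed by hand. This Python sketch uses the values 1 to 9 plus one made-up extreme value (20). Note that quartile conventions differ between programs, so the box edges may deviate slightly from what R or a spreadsheet draws; `boxplot_stats` is a hypothetical helper.

```python
from statistics import median, quantiles

def boxplot_stats(values):
    """Median, quartiles, whisker ends and outliers of a data vector."""
    data = sorted(values)
    # 'inclusive' treats the data as the full population; other methods exist.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range = height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],    # most extreme value within the fences
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20])
print(stats)
```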

---

## Box Plot [2]

.pull-left[
![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-10-1.png)
]

.pull-right[
![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-11-1.png)
]

---

## Histogram
Used for classified display of distributions
Data reduction vs. precision: Display of count values of classes of values

.pull-left[
![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-12-1.png)

]

.pull-right[
![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-13-1.png)
]

Disadvantages:
- Data reduction vs. precision → loss of information
- The actual display depends strongly on the chosen class width
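
How strongly the picture depends on the class width can be tried out with a few lines of Python. The lengths are invented; `histogram` is a hypothetical helper that counts values per class.

```python
from collections import Counter
from math import floor

def histogram(values, width, origin=0):
    """Count values per class; keys are the lower class limits."""
    counts = Counter(floor((v - origin) / width) for v in values)
    return {origin + i * width: counts[i] for i in sorted(counts)}

# Hypothetical lengths of ten objects (e.g. in cm).
lengths = [12, 14, 15, 18, 21, 22, 23, 29, 31, 38]

wide = histogram(lengths, width=10)   # few classes, strong data reduction
narrow = histogram(lengths, width=5)  # more classes, more detail

print(wide)
print(narrow)
```

With a width of 10 the first two classes look identical; with a width of 5 a peak between 20 and 25 becomes visible.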

---

## kernel smoothing (kernel density estimation)

Another attempt to overcome the disadvantages of a histogram

The distribution of the values is considered and a distribution curve is
calculated. Continuous distributions are better displayed, without artificial
breaks. Scales like histograms.

.pull-left[
![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-14-1.png)
]

.pull-right[
Histogram and kernel-density-plot together

![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-15-1.png)
]
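
A minimal sketch of a kernel density estimate in plain Python, assuming a Gaussian kernel and a hand-picked bandwidth. Both are assumptions for illustration; the plots on this slide may use other settings. The lengths are invented.

```python
from math import exp, pi, sqrt

def kernel_density(values, bandwidth):
    """Return a function x -> estimated density (Gaussian kernel)."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        # Each value contributes a small bell curve centred on itself.
        return norm * sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

# Hypothetical lengths of ten objects (e.g. in cm).
lengths = [12, 14, 15, 18, 21, 22, 23, 29, 31, 38]
density = kernel_density(lengths, bandwidth=3.0)

# Evaluate the smooth curve at a few positions.
for x in (10, 20, 30, 40):
    print(x, round(density(x), 4))
```

Like the class width of a histogram, the bandwidth controls how smooth the curve is and has to be chosen with care.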

---

## Style of charts

### Stay honest!

Choice of display has a strong influence on the statement.

![](data:image/png;base64,#session_05_visualisation_1_files/figure-html/unnamed-chunk-16-1.png)

---

## Style of charts

### Stay honest!
Choice of display has a strong influence on the statement.

### Clear layout!
Minimise the ratio of ink to information shown!

### Use the suitable chart for the data!
Consider nominal-ordinal-interval-ratio scale

---

## Suggestions for charts

---
class: inverse, middle, center
# Any questions?

.footnote[
.right[
.tiny[
You can find the course material (including the presentations) at

https://berncodalab.github.io/caa

You can contact me at

<a href="mailto:martin.hinz@iaw.unibe.ch">martin.hinz@iaw.unibe.ch</a>
]
]
]