class: title-slide, center, middle # Statistical methods for archaeological data analysis I: Basic methods ## 02 - Introduction into R ### Martin Hinz #### Institut für Archäologische Wissenschaften, Universität Bern 25.02.2025 .footnote[ .right[ .tiny[ You can download a [pdf of this presentation](smada02.pdf). ] ] ] --- ## Start R-Studio  --- ## Using R ### Start of the system: After R is started, you end on the prompt. ``` > ``` ### Change the working directory: ``` r getwd() # or something else setwd("U:\R") # or something else ``` Change the path according to your needs --- ## R as calculator ### Simplest way of use: ``` r 2+2 ``` ``` ## [1] 4 ``` ``` r 2^2 ``` ``` ## [1] 4 ``` ### Multiple commands are separated by ; ``` r (1 - 2) * 3; 1 - 2 * 3 ``` ``` ## [1] -3 ``` ``` ## [1] -5 ``` --- ## R as calculator ### Using functions: ``` r sqrt(2) #square root ``` ``` ## [1] 1.414214 ``` ``` r log(10) #logarith base e ``` ``` ## [1] 2.302585 ``` ``` r log(10, 10) #logarith base 10, like log(10, base=10) ``` ``` ## [1] 1 ``` --- ## Getting help Call of the help function: ``` r help(sqrt) ``` Even simpler? ``` r ? sqrt ``` Searching the help: ``` r help.search('logarithm') ``` --- ## Assignment of data to variables Naming variables for Values (Assignment): ``` r x <- 2 # no message will be given back x ``` ``` ## [1] 2 ``` ``` r pi # build in variable ``` ``` ## [1] 3.141593 ``` ### Arrow or equal sign? Classic assignment symbol in R is the arrow. Also possible: ``` r x=2 ``` Both are possible. Matter of tast. <- is clearer, I am using it that way --- ## Working with variables Display of already uses variables: ``` r ls() ``` ``` ## [1] "x" ``` Delete a variable: ``` r rm(x) # no message will be given back ls() ``` ``` ## character(0) ``` --- ## Using variables Calculations with variables: ``` r x <- 2 y <- 2 * x z <- sqrt(x) # no message will be given back ``` ``` r ls() ``` ``` ## [1] "x" "y" "z" ``` ``` r y ``` ``` ## [1] 4 ``` ``` r z ``` ``` ## [1] 1.414214 ``` --- class: inverse ## Exercise variables ### Calculation of a circle: Given is a circle with the radius r=5. Calculate the diameter d (2 \* r), the circumference u (2 \* π \* r) and the area a (π \* r^2). Add area a and circumference u, assign the result to the variable v and delete u and a. --- ## Scalars, vectors, matrices, data frames Data types in R ### Scalar A single number or date ``` r pi ``` ``` ## [1] 3.141593 ``` ### Vector A row of numbers or data ``` r ls() ``` ``` ## [1] "x" "y" "z" ``` --- ## Scalars, vectors, matrices, data frames Data types in R ### Matrix: A table of data of the same kind ``` r euro.cross ``` ``` ## ATS BEF DEM ESP FIM FRF ## ATS 1.000000000 2.93161486 0.142135709 12.0917422 0.432093050 0.476702543 ## BEF 0.341108927 1.00000000 0.048483759 4.1246012 0.147390797 0.162607493 ## DEM 7.035529673 20.62546336 1.000000000 85.0718109 3.040003477 3.353854885 ## ESP 0.082701069 0.24244768 0.011754775 1.0000000 0.035734557 0.039423810 ## FIM 2.314316324 6.78468413 0.328946992 27.9841163 1.000000000 1.103240477 ## FRF 2.097744212 6.14977811 0.298164361 25.3653822 0.906420695 1.000000000 ## IEP 17.471976881 51.22110711 2.483391826 211.2666399 7.549519785 8.328935807 ## ITL 0.007106602 0.02083382 0.001010102 0.0859312 0.003070713 0.003387735 ## LUF 0.341108927 1.00000000 0.048483759 4.1246012 0.147390797 0.162607493 ## NLG 6.244151907 18.30544854 0.887516960 75.5026750 2.698054644 2.976603092 ## PTE 0.068636087 0.20121457 0.009755639 0.8299299 0.029657176 0.032718997 ## IEP ITL LUF NLG PTE ## ATS 0.0572345080 140.714229 2.93161486 0.160149851 14.5695951 ## BEF 0.0195232016 47.998880 1.00000000 0.054628544 4.9698190 ## DEM 0.4026750791 989.999131 20.62546336 1.126739032 102.5048189 ## ESP 0.0047333550 11.637217 0.24244768 0.013244564 1.2049211 ## FIM 0.1324587561 325.657236 6.78468413 0.370637415 33.7186519 ## FRF 0.1200633578 295.182459 6.14977811 0.335953424 30.5632839 ## IEP 1.0000000000 2458.555749 51.22110711 2.798134501 254.5596294 ## ITL 0.0004067429 1.000000 0.02083382 0.001138121 0.1035403 ## LUF 0.0195232016 47.998880 1.00000000 0.054628544 4.9698190 ## NLG 0.3573809621 878.641019 18.30544854 1.000000000 90.9747653 ## PTE 0.0039283527 9.658074 0.20121457 0.010992059 1.0000000 ``` --- ## Scalars, vectors, matrices, data frames Data types in R ### Data frame: A table of data of different kind ``` r mtcars ``` ``` ## mpg cyl disp hp drat wt qsec vs am gear carb ## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 ## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 ## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 ## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 ## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 ## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 ## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 ## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 ## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 ## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 ## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 ## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 ## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 ## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 ## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 ## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 ## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 ## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 ## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 ## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 ## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 ## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 ## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 ## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 ## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 ## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 ``` --- ## Download data for further tasks * [height.RData](https://raw.githubusercontent.com/BernCoDALab/smada/refs/heads/main/lectures/02/height.RData) * [kursmatrix.txt](https://raw.githubusercontent.com/BernCoDALab/smada/refs/heads/main/lectures/02/kursmatrix.txt) * [kursdata.txt](https://raw.githubusercontent.com/BernCoDALab/smada/refs/heads/main/lectures/02/kursdata.txt) * [kursdata.csv](https://raw.githubusercontent.com/BernCoDALab/smada/refs/heads/main/lectures/02/kursdata.csv) --- ## Data import through reading of files remember: ``` r getwd() setwd("my/location/of/my/working/directory") ``` Simple text file: ``` r kursmatrix <- matrix(scan("kursmatrix.txt"),ncol=2) ``` Data frame as simple text file: ``` r kursdata <- read.table("kursdata.txt") ``` Data frame as csv file: ``` r kursdata <- read.csv2("kursdata.csv") ``` Read with rownames ``` r kursdaten <- read.csv2("kursdata.csv",row.names = 1) ``` --- ## Using c() for data entry Assignment of values to a vector: ``` r places <- c("Leubingen", "Melz", "Bruszczewo") ``` ``` r categories <- c("Grab", "Hort", "Siedlung") categories ``` ``` ## [1] "Grab" "Hort" "Siedlung" ``` ``` r c(places, categories) ``` ``` ## [1] "Leubingen" "Melz" "Bruszczewo" "Grab" "Hort" ## [6] "Siedlung" ``` Naming the positions in a vector ``` r names(places)<-categories places ``` ``` ## Grab Hort Siedlung ## "Leubingen" "Melz" "Bruszczewo" ``` --- ## Functions on vectors [1] ### Data: .pull-left[ ``` r load("height.RData") height ``` ``` ## Bilbo Frodo ## 181 170 ## Aragorn Boromir ## 185 163 ## Pippin Gandalf grey ## 175 163 ## Merry Samwise ## 162 172 ## Theoden Eowyn ## 172 180 ## Arwen Gandalf white ## 187 158 ## Gimly Gollum ## 184 156 ``` ] .small[ .pull-right[ ``` r # Sum: sum(height) ``` ``` ## [1] 2408 ``` ``` r # Count: length(height) ``` ``` ## [1] 14 ``` ``` r # Mean: sum(height)/length(height) ``` ``` ## [1] 172 ``` ``` r # Or more convenient: mean(height) ``` ``` ## [1] 172 ``` ] ] --- ## Functions on vectors [2] .small[ ``` r # sort: sort(height) ``` ``` ## Gollum Gandalf white Merry Boromir Gandalf grey ## 156 158 162 163 163 ## Frodo Samwise Theoden Pippin Eowyn ## 170 172 172 175 180 ## Bilbo Gimly Aragorn Arwen ## 181 184 185 187 ``` ``` r # minimum: min(height) ``` ``` ## [1] 156 ``` ``` r # maximum: max(height) ``` ``` ## [1] 187 ``` ``` r # Or more convenient: range(height) ``` ``` ## [1] 156 187 ``` ] --- ## Functions on vectors [3] Change of the values through calculation: ``` r height.in.m <- height/100 height.in.m ``` ``` ## Bilbo Frodo Aragorn Boromir Pippin ## 1.81 1.70 1.85 1.63 1.75 ## Gandalf grey Merry Samwise Theoden Eowyn ## 1.63 1.62 1.72 1.72 1.80 ## Arwen Gandalf white Gimly Gollum ## 1.87 1.58 1.84 1.56 ``` but: ``` r test<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14) height.in.m + test ``` ``` ## Bilbo Frodo Aragorn Boromir Pippin ## 2.81 3.70 4.85 5.63 6.75 ## Gandalf grey Merry Samwise Theoden Eowyn ## 7.63 8.62 9.72 10.72 11.80 ## Arwen Gandalf white Gimly Gollum ## 12.87 13.58 14.84 15.56 ``` --- class: inverse ## Exercise vectors Data collection ceramics: An excavation produced the following numbers of flint artefacts: | flakes | blades | cores | debris | | ------ | ------ | ----- | ------ | | 506 | 104 | 30 | 267 | Assign the values to a named vector, calculate the proportion of the artefacts and sort the vector according to their percentage During the data collection on box with artefacts was missing, the following numbers has to be added to the vector: | flakes | blades | cores | debris | | ------ | ------ | ----- | ------ | | 52 | 24 | 15 | 83 | Moreover were 10 items each artefact type missing. Make a vector for the box, add it and the 10 missing to the original data and repeat the calculations. --- ## Sequences and repeated data Simple sequence: ``` r 1:10 ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 ``` Sequence with start value, end value and step size: ``` r seq(1,10,by=2) ``` ``` ## [1] 1 3 5 7 9 ``` ``` r seq(1,20,length=5) ``` ``` ## [1] 1.00 5.75 10.50 15.25 20.00 ``` Repeated data: ``` r rep(1,10) ``` ``` ## [1] 1 1 1 1 1 1 1 1 1 1 ``` ``` r rep(1:3,3) ``` ``` ## [1] 1 2 3 1 2 3 1 2 3 ``` ``` r rep(c("Anton","Berta","Claudius"),3) ``` ``` ## [1] "Anton" "Berta" "Claudius" "Anton" "Berta" "Claudius" "Anton" ## [8] "Berta" "Claudius" ``` --- ## Data access by index .tiny[ Access by position: ``` r height[1] ``` ``` ## Bilbo ## 181 ``` ``` r height[5] ``` ``` ## Pippin ## 175 ``` ``` r height[1:3] ``` ``` ## Bilbo Frodo Aragorn ## 181 170 185 ``` ``` r height[-(1:3)] ``` ``` ## Boromir Pippin Gandalf grey Merry Samwise ## 163 175 163 162 172 ## Theoden Eowyn Arwen Gandalf white Gimly ## 172 180 187 158 184 ## Gollum ## 156 ``` Access by name: ``` r height["Frodo"] ``` ``` ## Frodo ## 170 ``` ] --- ## Data entry into vectors .small[ Entry by position: ``` r height ``` ``` ## Bilbo Frodo Aragorn Boromir Pippin ## 181 170 185 163 175 ## Gandalf grey Merry Samwise Theoden Eowyn ## 163 162 172 172 180 ## Arwen Gandalf white Gimly Gollum ## 187 158 184 156 ``` ``` r height[1] <- 168 height ``` ``` ## Bilbo Frodo Aragorn Boromir Pippin ## 168 170 185 163 175 ## Gandalf grey Merry Samwise Theoden Eowyn ## 163 162 172 172 180 ## Arwen Gandalf white Gimly Gollum ## 187 158 184 156 ``` Entry by name: ``` r height["Bilbo"] <- 181 height ``` ``` ## Bilbo Frodo Aragorn Boromir Pippin ## 181 170 185 163 175 ## Gandalf grey Merry Samwise Theoden Eowyn ## 163 162 172 172 180 ## Arwen Gandalf white Gimly Gollum ## 187 158 184 156 ``` ] --- ## Logical values true/false-values: ``` r pi>4 ``` ``` ## [1] FALSE ``` ``` r height > 175 ``` ``` ## Bilbo Frodo Aragorn Boromir Pippin ## TRUE FALSE TRUE FALSE FALSE ## Gandalf grey Merry Samwise Theoden Eowyn ## FALSE FALSE FALSE FALSE TRUE ## Arwen Gandalf white Gimly Gollum ## TRUE FALSE TRUE FALSE ``` --- ## Logical values Can be used for selection of values: ``` r height[height>175] ``` ``` ## Bilbo Aragorn Eowyn Arwen Gimly ## 181 185 180 187 184 ``` ``` r which(height>175) ``` ``` ## Bilbo Aragorn Eowyn Arwen Gimly ## 1 3 10 11 13 ``` ``` r sum(height>175)/length(height) ``` ``` ## [1] 0.3571429 ``` --- # Factors For encoding nominal values: ``` r sex <- factor(c("m", "m", "m", "m", "m", "m", "m", "m", "m", "f", "f", "m", "m", "m")) sex ``` ``` ## [1] m m m m m m m m m f f m m m ## Levels: f m ``` --- ## missing (NA) values .small[ Problem: values are missing ``` r height["Arwen"] <- 0 mean(height) ``` ``` ## [1] 158.6429 ``` ``` r sum(height)/13 ``` ``` ## [1] 170.8462 ``` therefore: code as N(ot)A(vailable) ``` r height["Arwen"] <- NA mean(height) ``` ``` ## [1] NA ``` ``` r mean(height, na.rm=T) ``` ``` ## [1] 170.8462 ``` ] --- ## matrices [1] Data of the same kind (numbers, factors...) .pull-left[ ``` r kursmatrix ``` ``` ## [,1] [,2] ## [1,] 39 181 ## [2,] 34 170 ## [3,] 23 185 ## [4,] 38 163 ## [5,] 23 175 ## [6,] 21 163 ## [7,] 23 162 ## [8,] 31 172 ## [9,] 25 172 ## [10,] 31 180 ## [11,] 24 187 ## [12,] 23 158 ## [13,] 23 184 ## [14,] 39 156 ``` ] .pull-right[ ``` r rownames(kursmatrix) <- names(height) colnames(kursmatrix)<-c("height","age") kursmatrix ``` ``` ## height age ## Bilbo 39 181 ## Frodo 34 170 ## Aragorn 23 185 ## Boromir 38 163 ## Pippin 23 175 ## Gandalf grey 21 163 ## Merry 23 162 ## Samwise 31 172 ## Theoden 25 172 ## Eowyn 31 180 ## Arwen 24 187 ## Gandalf white 23 158 ## Gimly 23 184 ## Gollum 39 156 ``` ] --- ## matrices [2] Operations on matrices .pull-left[ ``` r kursmatrix / 100 ``` ``` ## height age ## Bilbo 0.39 1.81 ## Frodo 0.34 1.70 ## Aragorn 0.23 1.85 ## Boromir 0.38 1.63 ## Pippin 0.23 1.75 ## Gandalf grey 0.21 1.63 ## Merry 0.23 1.62 ## Samwise 0.31 1.72 ## Theoden 0.25 1.72 ## Eowyn 0.31 1.80 ## Arwen 0.24 1.87 ## Gandalf white 0.23 1.58 ## Gimly 0.23 1.84 ## Gollum 0.39 1.56 ``` ] .pull-right[ ``` r kursmatrix[, 1] / 100 ``` ``` ## Bilbo Frodo Aragorn Boromir Pippin ## 0.39 0.34 0.23 0.38 0.23 ## Gandalf grey Merry Samwise Theoden Eowyn ## 0.21 0.23 0.31 0.25 0.31 ## Arwen Gandalf white Gimly Gollum ## 0.24 0.23 0.23 0.39 ``` ``` r kursmatrix / c(1:14, rep(2, 14)) ``` ``` ## height age ## Bilbo 39.000000 90.5 ## Frodo 17.000000 85.0 ## Aragorn 7.666667 92.5 ## Boromir 9.500000 81.5 ## Pippin 4.600000 87.5 ## Gandalf grey 3.500000 81.5 ## Merry 3.285714 81.0 ## Samwise 3.875000 86.0 ## Theoden 2.777778 86.0 ## Eowyn 3.100000 90.0 ## Arwen 2.181818 93.5 ## Gandalf white 1.916667 79.0 ## Gimly 1.769231 92.0 ## Gollum 2.785714 78.0 ``` ] --- ## Data frames [1] .pull-left[ ``` r kursdata <- data.frame(age = kursmatrix[,2], height = kursmatrix[,1], sex=sex) kursdata ``` ``` ## age height sex ## Bilbo 181 39 m ## Frodo 170 34 m ## Aragorn 185 23 m ## Boromir 163 38 m ## Pippin 175 23 m ## Gandalf grey 163 21 m ## Merry 162 23 m ## Samwise 172 31 m ## Theoden 172 25 m ## Eowyn 180 31 f ## Arwen 187 24 f ## Gandalf white 158 23 m ## Gimly 184 23 m ## Gollum 156 39 m ``` ] .pull-right[ ``` r kursdata[,"age"] ``` ``` ## [1] 181 170 185 163 175 163 162 172 172 180 187 158 184 156 ``` ``` r kursdata$age ``` ``` ## [1] 181 170 185 163 175 163 162 172 172 180 187 158 184 156 ``` ] --- ## Data frames [2] ### Operation on data frames ``` r kursdata$height / 100 ``` ``` ## [1] 0.39 0.34 0.23 0.38 0.23 0.21 0.23 0.31 0.25 0.31 0.24 0.23 0.23 0.39 ``` ``` r summary(kursdata) ``` ``` ## age height sex ## Min. :156.0 Min. :21.00 f: 2 ## 1st Qu.:163.0 1st Qu.:23.00 m:12 ## Median :172.0 Median :24.50 ## Mean :172.0 Mean :28.36 ## 3rd Qu.:180.8 3rd Qu.:33.25 ## Max. :187.0 Max. :39.00 ``` ``` r tapply(kursdata$height, kursdata$sex, mean, na.rm=T) ``` ``` ## f m ## 27.5 28.5 ``` --- ## Build in datasets ``` r data() ``` ``` r Data sets in package 'datasets': AirPassengers Monthly Airline Passenger Numbers 1949-1960 BJsales Sales Data with Leading Indicator BJsales.lead (BJsales) Sales Data with Leading Indicator BOD Biochemical Oxygen Demand CO2 Carbon Dioxide Uptake in Grass Plants ChickWeight Weight versus age of chicks on different diets DNase Elisa assay of DNase EuStockMarkets Daily Closing Prices of Major European Stock Indices, 1991-1998 Formaldehyde Determination of Formaldehyde HairEyeColor Hair and Eye Color of Statistics Students Harman23.cor Harman Example 2.3 Harman74.cor Harman Example 7.4 Indometh Pharmacokinetics of Indomethacin InsectSprays Effectiveness of Insect Sprays JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share LakeHuron Level of Lake Huron 1875-1972 LifeCycleSavings Intercountry Life-Cycle Savings Data Loblolly Growth of Loblolly pine trees Nile Flow of the River Nile Orange Growth of Orange Trees ``` --- ## Data export through save Simple text file: ``` r write(kursmatrix,"kursmatrix.txt") ``` Data frame as simple text file: ``` r write.table(kursdata,"kursdata.txt") ``` Data frame as csv file: ``` r write.csv2(kursdata,"kursdata.csv") ``` Attention: decimal separator is . not , ``` r kursdata$height <- kursdata$height/100 write.csv(kursdata,"kursdata.csv") ``` problems with importing such csv into e.g. Excel therefore: ``` r write.csv2(kursdata,"kursdata.csv") ``` --- ## R <-> Excel Always save as csv There are packages for R to read and write Excel files but for them additional software (Perl, Python e.a.) is neccessary