DOM from Berlin´s surface waters

Data source: Romero Gonzalez-Quijano 2022. Biogeosciences 19: 2841-2853, https://doi.org/10.5194/bg-19-2841-2022 A range of urban water bodies in Berlin´s extensive ‘aquascape’ were investigated for properties of dissolved organic matter (DOM). There were 4 seasonal campaigns, water bodies were classified into 2 lotic (rivers and streams) and 2 lentic (lakes and ponds) types. DOM can be considered to drive and indicate various ecosystem functions like respiration, primary production. It may also indicate pollution and react to input from the terrestrial surrounding.

We will need the ‘vegan’ package (a lot of ordinations coded by ecologists for ecologists). Some more classical ordinations like PCA are available in the package ‘MASS’.

library(vegan)
library(MASS)

DOM<-read.table(file="data/DOM_Berlin_Romero.txt",header=TRUE)
names(DOM)

The dataset includes DOC indicating pure quantity and several descriptors of DOM quality measured by optical instruments (SR:fresh,C1:C6), chromatography (HMWS.C:humic.N) and mass spectrometry (avgmolmass:O_C). See the Excel sheet or the original paper for detailed variable description. Mass-spectrometric information only exist for three seasons. The dataset also includes variables that should be considered as drivers of DOM composition: nutrient concentrations, Chlorophyll-a, landuse percentages (NH4:URG).

  1. DOM consists of thousands of chemical structures. We try to understand its ‘composition’ by throwing analytics on it to produce many descriptors, which are likely at least partially collinear. Reduce the dimensionality of the DOM dataset to a meaningful number of dimensions by PCA. Leave away incomplete mass-spectrometric information. How much variance do individual PC-axes explain? How many axes do you need to properly represent the dataset? Note that the latter question may be answered based on PCA-results or based on theoretical considerations considering the study design of 4 water body types sampled across 4 seasons.
DOM2<-DOM[,8:27]
boxplot(scale(DOM2))
# also make histograms
# maybe consider transformation of variables to improve linearity of relationships
plot(scale(DOM2))
rda()
summary()
screeplot()
  1. Create a biplot for PC1 and PC2 with the most important variables defining these two axes. Plot the various water bodies and code symbol shape and symbol color to represent the study design targeting 4 water body types and 4 seasons. Note that sites and variables (or subsets thereof) do not have to be plotted onto a single biplot.
levels(DOM$Type)
DOM$col.type<-DOM$Type
levels(DOM$col.type)<-c("darkgreen","lightgreen","darkblue","lightblue")
DOM$col.type<-as.character(DOM$col.type)
  1. Assess differences between water body types with respect to average composition of DOM and with respect to seasonal turnover of DOM composition. This question can be translated to an assessment of location and of seasonal variation along PC1 and PC2. You may add univariate formal hypothesis tests (Anova, t-test, Bartlett-test, F-test) using the PCs as response metavariables. Use various graphical means for plotting of the site scores for illustration. Which water bodies differ markedly among each other in DOM? Which water bodies show more seasonal DOM turnover?
ordispider()
ordiellipse()
ordihull()
  1. Adding incomplete descriptors: Mass-spectrometric information is incomplete (we lack data from one season). It could still be informative to show how these data behave in the ordination space. Compute correlations with PCs and plot these as (maybe differently colored) arrows into ordination space.

  2. Assess potential drivers: The dataset includes a few variables that could be considered drivers of DOM composition (or drivers of processes that influence DOM). Assess these drivers by plotting them into ordination space and running correlations/regressions with PC1 and PC2.

envfit() # take care: behaviour of variables not necessarily monotonous in ordination space
ordisurf() # as contourplot
  1. Assessing drivers of DOM composition by post-hoc analysis of an unconstrained ordination like a PCA may not work well if those drivers exert only weak control or if important drivers were missed. Then the (variance-driven) PCA-dimensions may not be well aligned with the available drivers. RDA looks explicitly for an ordination that can be well aligned with a matrix of constraining variables. Run a RDA and compare its outcomes to PCA. As drivers of DOM composition you can consider nutrient concentrations, Chlorophyll-a, landuse percentages (NH4:URG). What do you learn when PCA and RDA result in either similar or different ordinations?
zDOM2<-scale(DOM2) # must at least be centered even if dimensionally homogeneous!
xmat<- # the constraints (or drivers or predictors)
rda<-rda(zDOM2~...+...+...,data=xmat) # take care: confusing X and Y argument names if you work without formula notation
summary()
RsquareAdj()
anova() # note argument "by"
scores()
plot()
Arrows()