(Dis)similarity coefficients

Two examples of dissimilarity indices here computed between any two sites for \(p\) variables:

Euclidean distance

\[ d_{Euc}=\sqrt{\Delta{x_1}^2+\Delta{x_2}^2+...+\Delta{x_p}^2} \] where \(\Delta{x}\) is the distance between two sites along any \(X\)-variable.

Binary version: \[ d_{Euc}=\sqrt{A+B-2J} \] A and B are numbers of species at compared sites, and J is the number of species that occur on both compared sites.

Bray-Curtis dissimilarity \[ d_{BC}=\frac{\sum_{j=1}^p{|x_{1j}-x_{2j}|}}{\sum_{j=1}^p{(x_{1j}+x_{2j})}} \]
Binary version: \[ d_{BC}=\frac{A+B-2J}{A+B} \]

Double-zero (joint absences for two sites) problem: The fact that a species is missing does not say anything about ecological similarity or difference between both sites.

Double-zero can mean:

Sites are located outside of the species niche on the same side (sites A and B), which means sites are similar, or on opposite sites (A and C), which means sites are different.
Sites are located inside species niche (D and E), but the species does not occur because it didn’t get there (dispersal limitation), or it was present, but overlooked (sampling bias)

Bray-Curtis is well-suited for data on species composition because it ignores double-zeros.

Euclidean distance is useful for environmental data and especially for representing geographic space (from x/y coordinates), but rarely for species data.

(Dis)similarity coefficients

(Dis)similarity coefficients: some properties

symmetrical: consider a double zero as contributing to resemblance just like any other value
asymmetrical: skip double zeros, considering that these are not informative (absence of a species at two sites does not tell anything about how similar these sites are!)
metric: minimum 0, positive for inequality, symmetric (\(D_{(a,b)}=D_{(b,a)}\)), triangle inequality holds

\[ D_{(a,b)}+D_{(b,c)}\geq{D_{(a,c)}} \] * semimetric: do not follow the triangle inequality

\[ D_{(a,b)}+D_{(b,c)}<{D_{(a,c)}} \] … in which case perfect projections into Euclidean space are not possible. Two often used transformations to make semimetric coefficients analyzable in Euclidean space are adding a constant or \(\sqrt(D)\).

Triangle inequality: Distance via an intermediate point must be higher than or equal to distance without such an intermediate point. E.g. consider B as intermediate between A and C as in a triangle: the distance from A to C via B must be larger than the direct distance from A to C.

Using dissimilarity: Non-metric multidimensional scaling (NMDS)

The goal: Collapse information from multiple dimensions into just a few, so they can be visualized and interpreted.

Iterative procedure:

Compute matrix of dissimilarities.
Decide on k, the number of reduced dimensions (typically 2).
Arrange objects (e.g. sites) in a random starting configuration.
Compute a measure of fit that expresses the match between inter-object distances of the configuration and the observed dissimilarities. A Shepard-plot shows residuals as stress which is inversely related to fit. The measure of fit is computed using ranks of observed dissimilarities and configuration distances. (smallest rank corresponding to smallest dissimilarity)
Reiteratively reposition the objects in the low-dimensional space and recompute fit to improve the match between inter-object distances and observed dissimilarities.
A final configuration is achieved when no more repositioning improves the fit (or reduces the stress). Steps 3) to 6) may be repeated with different random starting positions to avoid getting trapped in local minima.

Non-metric multidimensional scaling (NMDS) in R

# We sampled stream biofilm at 42 different streams across the Vjosa river network. Algal species were derived by 18S rRNA gene amplification
vjosa<-read.csv(file="data/data_vjosa.csv", sep=";") #load data
dim(vjosa) #
## [1]  42 662

com<-vjosa[,grep("ASV",names(vjosa))] # choose only ASV columns, these are the species
env<-vjosa[,-grep("ASV",names(vjosa))] # choose everything expect ASV


# We try to find the "best" reproduction of dissimilarities among sites in a 2 dimensional space.
# We use non-metric multidimensional scaling on Bray-Curtis dissimilarity matrix.

mds_vjosa <- metaMDS(comm = com, distance = "bray", k = 2) # to run a NMDS, $points to get scores, $stress to get information about fit
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1742853 
## Run 1 stress 0.1853766 
## Run 2 stress 0.1742853 
## ... New best solution
## ... Procrustes: rmse 2.64764e-05  max resid 0.0001220507 
## ... Similar to previous best
## Run 3 stress 0.1555025 
## ... New best solution
## ... Procrustes: rmse 0.07805432  max resid 0.3845205 
## Run 4 stress 0.1720438 
## Run 5 stress 0.1572333 
## Run 6 stress 0.1555025 
## ... New best solution
## ... Procrustes: rmse 2.876438e-06  max resid 1.029209e-05 
## ... Similar to previous best
## Run 7 stress 0.1555026 
## ... Procrustes: rmse 0.0001253696  max resid 0.0004936047 
## ... Similar to previous best
## Run 8 stress 0.1555025 
## ... New best solution
## ... Procrustes: rmse 2.630782e-05  max resid 9.127394e-05 
## ... Similar to previous best
## Run 9 stress 0.1624049 
## Run 10 stress 0.1637225 
## Run 11 stress 0.1555025 
## ... Procrustes: rmse 5.632574e-05  max resid 0.000213458 
## ... Similar to previous best
## Run 12 stress 0.1624049 
## Run 13 stress 0.1833159 
## Run 14 stress 0.1555026 
## ... Procrustes: rmse 5.176445e-05  max resid 0.0002051233 
## ... Similar to previous best
## Run 15 stress 0.1555025 
## ... New best solution
## ... Procrustes: rmse 9.128518e-06  max resid 3.578814e-05 
## ... Similar to previous best
## Run 16 stress 0.1555025 
## ... Procrustes: rmse 2.029508e-05  max resid 6.963497e-05 
## ... Similar to previous best
## Run 17 stress 0.1555025 
## ... Procrustes: rmse 2.84255e-05  max resid 9.313011e-05 
## ... Similar to previous best
## Run 18 stress 0.1979301 
## Run 19 stress 0.1894487 
## Run 20 stress 0.2024924 
## *** Best solution repeated 3 times
mds_vjosa$stress
## [1] 0.1555025
## % of dissimilarities unrepresented
## Generally, stress < 0.05 provides an excellent represention in reduced 
# dimensions, < 0.1 is great, < 0.2 is good, and stress > 0.3 provides a poor representation
# if high stress is a problem, the argument trymax (number of default iterations) or number of dimensions (e.g. k=3) can be increased 

stressplot(mds_vjosa) # to compare configuration distances with observed dissimilarities

goodness(mds_vjosa) # sample-specific goodness of fit
##  [1] 0.01797343 0.02117174 0.03075435 0.02266354 0.01875799 0.03030840
##  [7] 0.01756087 0.02204496 0.02533707 0.01858164 0.02069291 0.02263214
## [13] 0.02771618 0.02030960 0.01414603 0.01925585 0.03355964 0.01377055
## [19] 0.01551199 0.03023133 0.01754644 0.01956208 0.01845611 0.01855135
## [25] 0.03660484 0.02693934 0.03854815 0.02876442 0.02292746 0.04194374
## [31] 0.02756328 0.01684616 0.02297853 0.03021929 0.02105799 0.02601296
## [37] 0.01712468 0.02016921 0.01700964 0.01543156 0.01517806 0.02517505

#quick plot
plot(mds_vjosa, type="t")

#Do different sizes of streams (approximated by catchment area) have different communities?
ordisurf(mds_vjosa, env$catchment)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
## 
## Estimated degrees of freedom:
## 3.9  total = 4.9 
## 
## REML score: 367.5864

#customized plot
#use catchment_class as point size in plot
cex.catch <- as.factor(env$catchment_class)
levels(cex.catch)<-c(3,2,1)
cex.catch<-as.numeric(as.character(cex.catch))

plot(mds_vjosa$points,asp=1,cex=cex.catch, xlab="MDS dimension 1",ylab="MDS dimension 2")

#connect catchment classes in plot
ordihull(mds_vjosa, env$catchment_class)
#alternative: ordispider(mds_vjosa, env$catchment_class)

legend("topleft", pch=21, pt.cex=c(1,2,3), legend=c("small","medium", "large"))

Hypothesis tests in the distance world

All known study designs with factors or continuous predictors may be transferred to the distance domain.

PERMANOVA: An ANOVA-type of analysis testing effects of factors (also in interaction) on a multivariate matrix, thus MANOVA. Useful to test effects on composition. In the distance domain known as permutational MANOVA. In R implemented as vegan::adonis.
dbRDA (distance-based RDA) and CAP (canonical analysis of principal coordinates): multiple-step analysis to test for effects of continuous predictors on a dissimilarity matrix. In R vegan::capscale.
Mantel-test: An old-school test for correlation between two distance/dissimilarity matrices (e.g. one describing environment or physical distance, the other describing community turnover or genetic differentiation). In R vegan::mantel.

Permutational MANOVA (PERMANOVA)

Uses a test statistic based on distances within groups (to a group centroid or averaged among all pairs) versus distances from group centroids to the overall centroid.

The within-group sum of squares (\(SS_w\)) is the sum of squared distances from individual replicates to their group centroid.
The among-group sum of squares (\(SS_A\)) is the sum of squared distances from group centroids to the overall centroid.
A (pseudo-) F-value is computed using number of groups (a) and the total number of observations (N) as: \[ F=\frac{SS_A/(a-1)}{SS_w/(N-a)} \]
Significance is assessed by recomputing the test-statistic after permutations of group assignment.

Prerequisite similar to ANOVA: homogeneous dispersion (multivariate variance, cloud shape). Tested using within-group distances to centroids.

Permutational MANOVA (PERMANOVA) in R

# PERMANOVA - non-parametric permutational MANOVA #
# a multivariate hypothesis test: 
# 1 factor (catchment class) and 1 multivariate response = "community composition"

# compute a distance matrix:
dmat = vegdist(com, method = "bray") # compute a dissimilarity matrix

# first testing for homogeneity of dispersion (homogeneous distances to group centroids)
catch_fac<-as.factor(env$catchment_class)

disp.check<-betadisper(dmat, catch_fac)
disp.check$distances
##  [1] 0.2662608 0.5158472 0.5197804 0.6105727 0.2552048 0.7176511 0.2110770
##  [8] 0.7107280 0.5253533 0.1690616 0.2339827 0.3725949 0.7270872 0.4702762
## [15] 0.4190021 0.2343220 0.5979947 0.4018676 0.4552224 0.7038676 0.4321580
## [22] 0.5239102 0.5041751 0.6330266 0.7008383 0.7711862 0.7500328 0.6306801
## [29] 0.4179180 0.5134026 0.7289303 0.4089514 0.6384028 0.7877881 0.6125022
## [36] 0.4844181 0.3890526 0.5498081 0.3591060 0.3516448 0.3929632 0.5724630

boxplot(disp.check$distances~catch_fac)

# or directly: boxplot(disp.check)

anova(lm(disp.check$distances~catch_fac))
## Analysis of Variance Table
## 
## Response: disp.check$distances
##           Df  Sum Sq  Mean Sq F value  Pr(>F)   
## catch_fac  2 0.28973 0.144865  6.4718 0.00374 **
## Residuals 39 0.87297 0.022384                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#or directly on anova(disp.check) #also works directly on disp.check

# the actual PERMANOVA would work like this, however, we do not have homogeneous dispersion...
adonis2(dmat~catch_fac)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = dmat ~ catch_fac)
##           Df SumOfSqs      R2    F Pr(>F)  
## catch_fac  2   1.0397 0.08149 1.73  0.026 *
## Residual  39  11.7190 0.91851              
## Total     41  12.7587 1.00000              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#... however, we found another interesting result, beta diversity among small stream is larger then among medium stream and large streams.

Cluster analysis

Hierarchical group-forming (clustering) based on pairwise (dis)similarity.

Three types:

Divisive: start with all, successively split into 2.
Agglomerative: start with individual observations and cluster pairwise, continue grouping clusters.
Non-hierarchical: e.g. K-means clustering (forcing K clusters)

Various “linkage rules” to group clusters in agglomerative clustering:

Single linkage looks for the nearest neighbour of any group member to other observations (tends to chaining).
Average linkage computes dissimilarities between two groups as the average dissimilarity among all pairwise group member dissimilarities (one from each cluster).
UPGMA (unweighted-pair-groups method using arithmetic averages) is a version of an average linkage rule.

lipids<-read.table(file="data/BacterialMembrane.txt",header=TRUE)
names(lipids)
##  [1] "case"        "isolate"     "temperature" "FA1_SAnb"    "FA2_MU"     
##  [6] "FA3_SAnb"    "FA4_SAb"     "FA5_SAb"     "FA6_SAnb"    "FA7_MU"     
## [11] "FA8_SAb"     "FA9_SAnb"    "FA10_MU"     "FA11_SAb"    "FA12_SAnb"  
## [16] "FA13_MUb"    "FA14_MU"     "FA15_SAnb"   "FA16_MU"     "FA17_MU"    
## [21] "sum_MU"      "sum_SA"      "sum_SAbran"  "sum_SAnbran" "SA_branprop"
lip_data<-lipids[,4:20]
# abbreviations: MU = mono-unsaturated, SA = saturated, nb = non-branched, b = branched
# SA_branprop = proportion of branched and saturated FAs
# theory: unsaturated and branched FAs increase fluidity of membrane
# test (i) adaptation vs. (ii) acclimatization to temperature by changing FA composition of membranes

isolate<-factor(lipids$isolate)
temperature<-factor(lipids$temperature)
combifac<-factor(paste(temperature,"_",isolate,sep=""))

## distance matrix: Euclidean distance on arcsine-sqr-data
as_lip_data<-asin(sqrt(lip_data)) # very old-school, better don´t do ;-)
lipids_distE<-vegdist(as_lip_data, method="euclidean") 

## alternatively: Bray-Curtis distance with proportional data
lipids_distBC<-vegdist(lip_data, method="bray")

# cluster analysis #
# various agglomeration methods available and the choice is important, explore method=
# "single": nearest neighbour counts (good for gradients, but makes chains)
# "complete": all group members must be close, farthest group member counts (makes small spheric groups, good to find outliers)
# "average": compromise average strategy, new member joins at mean distance to all group members, actually UPGMA
# "ward.D": aims at minimizing within-group sums of squares of distances
lipids_cluster<-hclust(lipids_distBC, method = "ward.D")
lipids_cluster$height<-sqrt(lipids_cluster$height) # may help

# then compare effect of agglomeration method on dendrogram
plot(lipids_cluster, hang = -1, labels = combifac, ylab = "BC")


# which agglomeration method (and thus dendrogram) is best?
#cophenetic(lipids_cluster) # linkage distances in dendrogram
plot(lipids_distBC,cophenetic(lipids_cluster))

cor(lipids_distBC,cophenetic(lipids_cluster),method="spearman")
## [1] 0.8600969

cutree(lipids_cluster,k=4)
##  [1] 1 1 1 1 2 2 2 3 3 3 3 3 3 2 2 2 1 1 1 1 4 3 3 3 3 2 4 4
cutree(lipids_cluster,h=0.05)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12  9 13 14 15 16 17 18 19 20 21 22 23 24
## [26] 25 26 27

plot(lipids_cluster, hang = -1, labels = combifac, ylab = "BC")
rect.hclust(lipids_cluster,k=4) # or specify height h instead of k

Using dissimilarity: Principal Coordinate Analysis (PCoA)

(aka metric multidimensional scaling)

PCoA = distance-based unconstrained ordination

The goal of PCoA is to project sites onto Cartesian (Euclidean) coordinates from pairwise dissimilarities.

Imagine a road distance matrix between major towns. A 2D-projection of all pairwise distances produces a map (which gets better with less mountains and straighter roads in the area).

Steps in PCoA:

Place the first point (sample site) at the origin of a coordinate system.
Add the second point the correct distance away from the first point along the first axis.
Position the third point the correct distance away from each of the first two points, adding a second axis if necessary.
Continue until all points have been added. The result is a set of no more than n-1 axes.

But how do we get back down to fewer (e.g. 2) dimensions? Well, simply do a PCA on these constructed points.

Conduct a PCA on the constructed points to organize the variation among the points in a series of axes of diminishing importance.

lipids<-read.table(file="data/BacterialMembrane.txt",header=TRUE)
names(lipids)
##  [1] "case"        "isolate"     "temperature" "FA1_SAnb"    "FA2_MU"     
##  [6] "FA3_SAnb"    "FA4_SAb"     "FA5_SAb"     "FA6_SAnb"    "FA7_MU"     
## [11] "FA8_SAb"     "FA9_SAnb"    "FA10_MU"     "FA11_SAb"    "FA12_SAnb"  
## [16] "FA13_MUb"    "FA14_MU"     "FA15_SAnb"   "FA16_MU"     "FA17_MU"    
## [21] "sum_MU"      "sum_SA"      "sum_SAbran"  "sum_SAnbran" "SA_branprop"
lip_data<-lipids[,4:20]
# abbreviations: MU = mono-unsaturated, SA = saturated, nb = non-branched, b = branched
# SA_branprop = proportion of branched and saturated FAs
# theory: unsaturated and branched FAs increase fluidity of membrane
# test (i) adaptation vs. (ii) acclimatization to temperature by changing FA composition of membranes

isolate<-factor(lipids$isolate)
temperature<-factor(lipids$temperature)
combifac<-factor(paste(temperature,"_",isolate,sep=""))

## distance matrix: Bray-Curtis distance with proportional data
lipids_distBC<-vegdist(lip_data, method="bray")

# using the same distance matrix as cluster analysis 
pcoa<-cmdscale(lipids_distBC,k=2,eig=TRUE,add=TRUE)
# argument add=TRUE means a constant is added to distances to avoid negative eigenvalues

cumsum(pcoa$eig/sum(pcoa$eig)) # contributions of various PCoA axes
##  [1] 0.5108797 0.6713554 0.7243661 0.7626502 0.7943804 0.8232193 0.8499307
##  [8] 0.8685434 0.8850589 0.8969014 0.9077931 0.9172892 0.9256833 0.9338634
## [15] 0.9419715 0.9496211 0.9562653 0.9622121 0.9677456 0.9731763 0.9782572
## [22] 0.9829655 0.9875741 0.9920859 0.9960663 1.0000000 1.0000000 1.0000000
# first two axes cover 67% of variation of distances

pcoa$points # the site scores (coordinates in reduced space)
##               [,1]         [,2]
##  [1,] -0.039034416 -0.063039145
##  [2,] -0.052535090 -0.034222468
##  [3,] -0.027399434 -0.087663904
##  [4,]  0.009878077 -0.004018133
##  [5,]  0.212039581  0.030589392
##  [6,]  0.216084698  0.020983290
##  [7,]  0.259894649  0.051819890
##  [8,] -0.165391223  0.116835950
##  [9,] -0.150166208  0.118147706
## [10,] -0.185980907  0.073277130
## [11,] -0.128185960  0.114502815
## [12,] -0.163993524  0.117098404
## [13,] -0.149963125  0.117191012
## [14,]  0.219431512  0.064195917
## [15,]  0.210959910  0.057130086
## [16,]  0.251459219  0.070724865
## [17,] -0.040185127 -0.182894184
## [18,] -0.034791299 -0.174443966
## [19,] -0.036014921 -0.211286639
## [20,] -0.012082138 -0.103177494
## [21,]  0.160218882 -0.026898255
## [22,] -0.202366734 -0.008237486
## [23,] -0.198633241 -0.005774575
## [24,] -0.198199603 -0.068765798
## [25,] -0.194183841  0.040168498
## [26,]  0.233079249  0.023071464
## [27,]  0.078445944 -0.018582519
## [28,]  0.127615070 -0.026731854

col.isolate<-isolate
levels(col.isolate)<-c("white","red")
col.isolate<-as.character(col.isolate)

pch.temperature<-as.numeric(as.character(temperature))
pch.temperature[pch.temperature==6]<-21
pch.temperature[pch.temperature==28]<-23

plot(pcoa$points,pch=pch.temperature,bg=col.isolate)
legend("topleft",pch=c(21,21,23,23),pt.bg=c("white","red","white","red"),
       legend=c("6°C - warm isolate","6°C - cold isolate","28°C - warm isolate","28°C - cold isolate"),cex=0.6)

# how to relate species (=fatty acids) to ordination?
wascores(pcoa$points,lip_data) # as weighted averages of site (=sample) scores
text(wascores(pcoa$points,lip_data),labels=names(lip_data),cex=0.5)

ordisurf(pcoa$points,lip_data$FA7_MU,col="darkgreen",add=TRUE) # as contourplot

plot(envfit(pcoa$points,lip_data)) # take care: behaviour of species not necessarily monotonous in ordination space

Canonical analysis of principal coordinates (CAP)

Following scheme taken from description of db-RDA. Essentially 3 steps:

Computation of a (square) dissimilarity matrix D from (Cartesian) raw data. Choose coefficient well!
PCoA based on D recreates Cartesian coordinates with a dimensionality imposed by n (or p if p<n). If D was semimetric, (minor) axes with negative eigenvalues may occur.
RDA on the PCoA-axes and constraints of choice (factors/dummy variables, continuous predictors). Instead of RDA other constrained methods are possible as well, e.g. discriminant analysis, other classification routines, etc.

Canonical analysis of principal coordinates (CAP)

cap<-capscale(lipids_distBC~isolate*temperature)
summary(cap)
## 
## Call:
## capscale(formula = lipids_distBC ~ isolate * temperature) 
## 
## Partitioning of squared Bray distance:
##               Inertia Proportion
## Total          0.7018     1.0000
## Constrained    0.4670     0.6654
## Unconstrained  0.2348     0.3346
## 
## Eigenvalues, and their contribution to the squared Bray distance 
## 
## Importance of components:
##                         CAP1    CAP2     CAP3   MDS1    MDS2    MDS3    MDS4
## Eigenvalue            0.4027 0.05493 0.009286 0.1233 0.03707 0.02196 0.01708
## Proportion Explained  0.5739 0.07827 0.013231 0.1756 0.05283 0.03129 0.02434
## Cumulative Proportion 0.5739 0.65214 0.665369 0.8410 0.89384 0.92514 0.94947
##                          MDS5     MDS6     MDS7     MDS8     MDS9    MDS10
## Eigenvalue            0.01462 0.007081 0.004262 0.002948 0.001993 0.001796
## Proportion Explained  0.02083 0.010090 0.006073 0.004200 0.002840 0.002558
## Cumulative Proportion 0.97030 0.980392 0.986465 0.990665 0.993505 0.996063
##                          MDS11     MDS12     MDS13     MDS14     MDS15
## Eigenvalue            0.001280 0.0008798 0.0003408 0.0002224 3.548e-05
## Proportion Explained  0.001824 0.0012536 0.0004856 0.0003169 5.055e-05
## Cumulative Proportion 0.997887 0.9991404 0.9996260 0.9999429 1.000e+00
##                           MDS16
## Eigenvalue            4.604e-06
## Proportion Explained  6.561e-06
## Cumulative Proportion 1.000e+00
## 
## Accumulated constrained eigenvalues
## Importance of components:
##                         CAP1    CAP2     CAP3
## Eigenvalue            0.4027 0.05493 0.009286
## Proportion Explained  0.8625 0.11764 0.019885
## Cumulative Proportion 0.8625 0.98011 1.000000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  2.055196 
## 
## 
## Species scores
## 
##       CAP1 CAP2 CAP3 MDS1 MDS2 MDS3
## Dim1                               
## Dim2                               
## Dim3                               
## Dim4                               
## Dim5                               
## Dim6                               
## Dim7                               
## Dim8                               
## Dim9                               
## Dim10                              
## Dim11                              
## Dim12                              
## Dim13                              
## Dim14                              
## Dim15                              
## Dim16                              
## 
## 
## Site scores (weighted sums of species scores)
## 
##         CAP1      CAP2      CAP3      MDS1      MDS2     MDS3
## 1  -0.118345  0.175455 -0.047145 -0.506299 -0.423715  0.09533
## 2  -0.140838  0.003717 -0.302707 -0.387221 -0.320024  0.19578
## 3  -0.103306  0.327975  0.270279 -0.628190 -0.318823 -0.24296
## 4   0.008757 -0.173947  0.898199 -0.589237  0.789919 -0.20068
## 5   0.541369 -0.065379  0.560416 -0.024970  0.049197 -0.32037
## 6   0.543308 -0.004912  0.439088 -0.109438  0.269108 -0.18813
## 7   0.685271 -0.125175  0.720434 -0.172794  0.505845 -0.26832
## 8  -0.368405 -0.840786 -0.581799  0.409126  0.004382 -0.02050
## 9  -0.326275 -0.854491 -0.118938  0.345743  0.065634  0.17900
## 10 -0.447862 -0.569762 -0.479908  0.361270 -0.132091 -0.31818
## 11 -0.271896 -0.855062  0.008803  0.250268  0.264949  0.14661
## 12 -0.363883 -0.842636 -0.568017  0.403435  0.006882 -0.02042
## 13 -0.326080 -0.847913 -0.109894  0.341106  0.062888  0.18602
## 14  0.583087 -0.105603 -0.017661  0.150584 -0.402423  0.38119
## 15  0.552282 -0.099067  0.201860  0.148800 -0.333354  0.02658
## 16  0.682659 -0.146090  0.692094  0.007818 -0.088373  0.36904
## 17 -0.173766  1.053027  0.318616 -0.491137 -0.268441  0.04607
## 18 -0.155959  0.986716  0.683869 -0.490135 -0.299839  0.25781
## 19 -0.184776  1.353849  0.619417 -0.607413 -0.356310 -0.53605
## 20 -0.072462  0.578351  0.794922 -0.392980  0.560962  0.11002
## 21  0.406687  0.358960 -0.891817 -0.118231 -0.441071  1.28428
## 22 -0.526047 -0.062290 -0.245361  0.511732  0.101190  0.06100
## 23 -0.513488 -0.064273 -0.313550  0.508564  0.124313  0.07243
## 24 -0.553768  0.299526  0.216670  0.371002 -0.195805 -0.26911
## 25 -0.482668 -0.375509 -0.390361  0.590368  0.333931  0.25783
## 26  0.614101  0.181112 -0.844084 -0.305500  1.133393  0.20914
## 27  0.194381  0.284271 -0.578667  0.313382 -0.272787 -0.93446
## 28  0.317922  0.429937 -0.934759  0.110348 -0.419536 -0.55896
## 
## 
## Site constraints (linear combinations of constraining variables)
## 
##       CAP1     CAP2    CAP3      MDS1      MDS2     MDS3
## 1  -0.2458 -0.44775 -0.1031 -0.506299 -0.423715  0.09533
## 2  -0.2458 -0.44775 -0.1031 -0.387221 -0.320024  0.19578
## 3  -0.2458 -0.44775 -0.1031 -0.628190 -0.318823 -0.24296
## 4  -0.2458 -0.44775 -0.1031 -0.589237  0.789919 -0.20068
## 5   0.5980 -0.09104  0.4327 -0.024970  0.049197 -0.32037
## 6   0.5980 -0.09104  0.4327 -0.109438  0.269108 -0.18813
## 7   0.5980 -0.09104  0.4327 -0.172794  0.505845 -0.26832
## 8  -0.2458 -0.44775 -0.1031  0.409126  0.004382 -0.02050
## 9  -0.2458 -0.44775 -0.1031  0.345743  0.065634  0.17900
## 10 -0.2458 -0.44775 -0.1031  0.361270 -0.132091 -0.31818
## 11 -0.2458 -0.44775 -0.1031  0.250268  0.264949  0.14661
## 12 -0.2458 -0.44775 -0.1031  0.403435  0.006882 -0.02042
## 13 -0.2458 -0.44775 -0.1031  0.341106  0.062888  0.18602
## 14  0.5980 -0.09104  0.4327  0.150584 -0.402423  0.38119
## 15  0.5980 -0.09104  0.4327  0.148800 -0.333354  0.02658
## 16  0.5980 -0.09104  0.4327  0.007818 -0.088373  0.36904
## 17 -0.3329  0.47117  0.2105 -0.491137 -0.268441  0.04607
## 18 -0.3329  0.47117  0.2105 -0.490135 -0.299839  0.25781
## 19 -0.3329  0.47117  0.2105 -0.607413 -0.356310 -0.53605
## 20 -0.3329  0.47117  0.2105 -0.392980  0.560962  0.11002
## 21  0.3833  0.31357 -0.8123 -0.118231 -0.441071  1.28428
## 22 -0.3329  0.47117  0.2105  0.511732  0.101190  0.06100
## 23 -0.3329  0.47117  0.2105  0.508564  0.124313  0.07243
## 24 -0.3329  0.47117  0.2105  0.371002 -0.195805 -0.26911
## 25 -0.3329  0.47117  0.2105  0.590368  0.333931  0.25783
## 26  0.3833  0.31357 -0.8123 -0.305500  1.133393  0.20914
## 27  0.3833  0.31357 -0.8123  0.313382 -0.272787 -0.93446
## 28  0.3833  0.31357 -0.8123  0.110348 -0.419536 -0.55896
## 
## 
## Biplot scores for constraining variables
## 
##                             CAP1    CAP2    CAP3 MDS1 MDS2 MDS3
## isolatewarm               0.2099 -0.9335  0.2908    0    0    0
## temperature28             0.9828  0.1359 -0.1253    0    0    0
## isolatewarm:temperature28 0.8041 -0.1224  0.5818    0    0    0
## 
## 
## Centroids for factor constraints
## 
##                   CAP1     CAP2     CAP3 MDS1 MDS2 MDS3
## isolatecold   -0.09415  0.41864 -0.13043    0    0    0
## isolatewarm    0.07062 -0.31398  0.09782    0    0    0
## temperature6  -0.28450 -0.03934  0.03628    0    0    0
## temperature28  0.51211  0.07081 -0.06531    0    0    0

anova(cap,by="axis",model="direct",perm.max=9999,step=1000)
## Permutation test for capscale under direct model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
## 
## Model: capscale(formula = lipids_distBC ~ isolate * temperature)
##          Df SumOfSqs       F Pr(>F)    
## CAP1      1  0.40274 41.1581  0.001 ***
## CAP2      1  0.05493  5.6138  0.029 *  
## CAP3      1  0.00929  0.9489  0.384    
## Residual 24  0.23484                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(cap,by="terms",model="direct",perm.max=9999,step=1000)
## Permutation test for capscale under direct model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## Model: capscale(formula = lipids_distBC ~ isolate * temperature)
##                     Df SumOfSqs       F Pr(>F)    
## isolate              1  0.06640  6.7859  0.007 ** 
## temperature          1  0.38445 39.2889  0.001 ***
## isolate:temperature  1  0.01611  1.6460  0.191    
## Residual            24  0.23484                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# all functions more or less taken from RDA
# e.g. to get site scores for plotting
#scores(cap,display="sites")[,1:2]

Introducing ordination: distance-based methods

Methods based on distance/dissimilarity/similarity

Why use dissimilarity-based methods?

(Dis)similarity coefficients

(Dis)similarity coefficients

(Dis)similarity coefficients: some properties

Using dissimilarity: Non-metric multidimensional scaling (NMDS)

Non-metric multidimensional scaling (NMDS) in R

Hypothesis tests in the distance world

Permutational MANOVA (PERMANOVA)

Permutational MANOVA (PERMANOVA) in R

Cluster analysis

Using dissimilarity: Principal Coordinate Analysis (PCoA)

Canonical analysis of principal coordinates (CAP)

Canonical analysis of principal coordinates (CAP)

Choice of the multivariate analysis