Thomas Fuß
WS24/25
Pairs of samples/sites/objects can be more or less similar to each other based on multiple variables.
Two samples, which contain the same species with the same abundances (or exactly same environmental conditions), have the highest similarity (and lowest dissimilarity or distance); the similarity decreases (and dissimilarity/distance increases) with the differences in their species composition (or environmental conditions).
The most known dissimilarity is a (physical) distance: Euclidean distance, in a 2D-coordinate system it is just computed based on Pythagoras:
\[ d_{Euc}=\sqrt{\Delta{x_1}^2+\Delta{x_2}^2} \]
Euclidean distance computed between any two sites for \(p\) variables:
\[ d_{Euc}=\sqrt{\Delta{x_1}^2+\Delta{x_2}^2+...+\Delta{x_p}^2} \]
Two examples of dissimilarity indices here computed between any two sites for \(p\) variables:
Euclidean distance
\[ d_{Euc}=\sqrt{\Delta{x_1}^2+\Delta{x_2}^2+...+\Delta{x_p}^2} \] where \(\Delta{x}\) is the distance between two sites along any \(X\)-variable.
Binary version: \[
d_{Euc}=\sqrt{A+B-2J}
\] A and B are numbers of species at compared sites, and J is the
number of species that occur on both compared sites.
Bray-Curtis dissimilarity \[
d_{BC}=\frac{\sum_{j=1}^p{|x_{1j}-x_{2j}|}}{\sum_{j=1}^p{(x_{1j}+x_{2j})}}
\]
Binary version: \[
d_{BC}=\frac{A+B-2J}{A+B}
\]
Double-zero (joint absences
for two sites) problem: The fact that a species is missing does not say
anything about ecological similarity or difference between both
sites.
Double-zero can mean:
Sites are located outside of the species niche on the same side (sites A and B), which means sites are similar, or on opposite sites (A and C), which means sites are different.
Sites are located inside species niche (D and E), but the species does not occur because it didn’t get there (dispersal limitation), or it was present, but overlooked (sampling bias)
Bray-Curtis is well-suited for data on species composition because it ignores double-zeros.
Euclidean distance is useful for environmental data and especially for representing geographic space (from x/y coordinates), but rarely for species data.
\[ D_{(a,b)}+D_{(b,c)}\geq{D_{(a,c)}} \] * semimetric: do not follow the triangle inequality
\[ D_{(a,b)}+D_{(b,c)}<{D_{(a,c)}} \] … in which case perfect projections into Euclidean space are not possible. Two often used transformations to make semimetric coefficients analyzable in Euclidean space are adding a constant or \(\sqrt(D)\).
Triangle inequality: Distance via an intermediate point must be higher than or equal to distance without such an intermediate point. E.g. consider B as intermediate between A and C as in a triangle: the distance from A to C via B must be larger than the direct distance from A to C.
The goal: Collapse information from multiple dimensions into just a few, so they can be visualized and interpreted.
Iterative procedure:
# We sampled stream biofilm at 42 different streams across the Vjosa river network. Algal species were derived by 18S rRNA gene amplification
vjosa<-read.csv(file="data/data_vjosa.csv", sep=";") #load data
dim(vjosa) #
## [1] 42 662
com<-vjosa[,grep("ASV",names(vjosa))] # choose only ASV columns, these are the species
env<-vjosa[,-grep("ASV",names(vjosa))] # choose everything expect ASV
# We try to find the "best" reproduction of dissimilarities among sites in a 2 dimensional space.
# We use non-metric multidimensional scaling on Bray-Curtis dissimilarity matrix.
mds_vjosa <- metaMDS(comm = com, distance = "bray", k = 2) # to run a NMDS, $points to get scores, $stress to get information about fit
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1742853
## Run 1 stress 0.1853766
## Run 2 stress 0.1742853
## ... New best solution
## ... Procrustes: rmse 2.64764e-05 max resid 0.0001220507
## ... Similar to previous best
## Run 3 stress 0.1555025
## ... New best solution
## ... Procrustes: rmse 0.07805432 max resid 0.3845205
## Run 4 stress 0.1720438
## Run 5 stress 0.1572333
## Run 6 stress 0.1555025
## ... New best solution
## ... Procrustes: rmse 2.876438e-06 max resid 1.029209e-05
## ... Similar to previous best
## Run 7 stress 0.1555026
## ... Procrustes: rmse 0.0001253696 max resid 0.0004936047
## ... Similar to previous best
## Run 8 stress 0.1555025
## ... New best solution
## ... Procrustes: rmse 2.630782e-05 max resid 9.127394e-05
## ... Similar to previous best
## Run 9 stress 0.1624049
## Run 10 stress 0.1637225
## Run 11 stress 0.1555025
## ... Procrustes: rmse 5.632574e-05 max resid 0.000213458
## ... Similar to previous best
## Run 12 stress 0.1624049
## Run 13 stress 0.1833159
## Run 14 stress 0.1555026
## ... Procrustes: rmse 5.176445e-05 max resid 0.0002051233
## ... Similar to previous best
## Run 15 stress 0.1555025
## ... New best solution
## ... Procrustes: rmse 9.128518e-06 max resid 3.578814e-05
## ... Similar to previous best
## Run 16 stress 0.1555025
## ... Procrustes: rmse 2.029508e-05 max resid 6.963497e-05
## ... Similar to previous best
## Run 17 stress 0.1555025
## ... Procrustes: rmse 2.84255e-05 max resid 9.313011e-05
## ... Similar to previous best
## Run 18 stress 0.1979301
## Run 19 stress 0.1894487
## Run 20 stress 0.2024924
## *** Best solution repeated 3 times
mds_vjosa$stress
## [1] 0.1555025
## % of dissimilarities unrepresented
## Generally, stress < 0.05 provides an excellent represention in reduced
# dimensions, < 0.1 is great, < 0.2 is good, and stress > 0.3 provides a poor representation
# if high stress is a problem, the argument trymax (number of default iterations) or number of dimensions (e.g. k=3) can be increased
stressplot(mds_vjosa) # to compare configuration distances with observed dissimilarities
goodness(mds_vjosa) # sample-specific goodness of fit
## [1] 0.01797343 0.02117174 0.03075435 0.02266354 0.01875799 0.03030840
## [7] 0.01756087 0.02204496 0.02533707 0.01858164 0.02069291 0.02263214
## [13] 0.02771618 0.02030960 0.01414603 0.01925585 0.03355964 0.01377055
## [19] 0.01551199 0.03023133 0.01754644 0.01956208 0.01845611 0.01855135
## [25] 0.03660484 0.02693934 0.03854815 0.02876442 0.02292746 0.04194374
## [31] 0.02756328 0.01684616 0.02297853 0.03021929 0.02105799 0.02601296
## [37] 0.01712468 0.02016921 0.01700964 0.01543156 0.01517806 0.02517505
#quick plot
plot(mds_vjosa, type="t")
#Do different sizes of streams (approximated by catchment area) have different communities?
ordisurf(mds_vjosa, env$catchment)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
##
## Estimated degrees of freedom:
## 3.9 total = 4.9
##
## REML score: 367.5864
#customized plot
#use catchment_class as point size in plot
cex.catch <- as.factor(env$catchment_class)
levels(cex.catch)<-c(3,2,1)
cex.catch<-as.numeric(as.character(cex.catch))
plot(mds_vjosa$points,asp=1,cex=cex.catch, xlab="MDS dimension 1",ylab="MDS dimension 2")
#connect catchment classes in plot
ordihull(mds_vjosa, env$catchment_class)
#alternative: ordispider(mds_vjosa, env$catchment_class)
legend("topleft", pch=21, pt.cex=c(1,2,3), legend=c("small","medium", "large"))
All known study designs with factors or continuous predictors may be transferred to the distance domain.
PERMANOVA: An ANOVA-type of analysis testing effects of factors
(also in interaction) on a multivariate matrix, thus MANOVA. Useful to
test effects on composition. In the distance domain known as
permutational MANOVA. In R implemented as
vegan::adonis
.
dbRDA (distance-based RDA) and CAP (canonical analysis of
principal coordinates): multiple-step analysis to test for effects of
continuous predictors on a dissimilarity matrix. In R
vegan::capscale
.
Mantel-test: An old-school test for correlation between two
distance/dissimilarity matrices (e.g. one describing environment or
physical distance, the other describing community turnover or genetic
differentiation). In R vegan::mantel
.
Uses a test statistic based on distances within groups (to a group centroid or averaged among all pairs) versus distances from group centroids to the overall centroid.
Prerequisite similar to ANOVA: homogeneous dispersion (multivariate variance, cloud shape). Tested using within-group distances to centroids.
# PERMANOVA - non-parametric permutational MANOVA #
# a multivariate hypothesis test:
# 1 factor (catchment class) and 1 multivariate response = "community composition"
# compute a distance matrix:
dmat = vegdist(com, method = "bray") # compute a dissimilarity matrix
# first testing for homogeneity of dispersion (homogeneous distances to group centroids)
catch_fac<-as.factor(env$catchment_class)
disp.check<-betadisper(dmat, catch_fac)
disp.check$distances
## [1] 0.2662608 0.5158472 0.5197804 0.6105727 0.2552048 0.7176511 0.2110770
## [8] 0.7107280 0.5253533 0.1690616 0.2339827 0.3725949 0.7270872 0.4702762
## [15] 0.4190021 0.2343220 0.5979947 0.4018676 0.4552224 0.7038676 0.4321580
## [22] 0.5239102 0.5041751 0.6330266 0.7008383 0.7711862 0.7500328 0.6306801
## [29] 0.4179180 0.5134026 0.7289303 0.4089514 0.6384028 0.7877881 0.6125022
## [36] 0.4844181 0.3890526 0.5498081 0.3591060 0.3516448 0.3929632 0.5724630
boxplot(disp.check$distances~catch_fac)
# or directly: boxplot(disp.check)
anova(lm(disp.check$distances~catch_fac))
## Analysis of Variance Table
##
## Response: disp.check$distances
## Df Sum Sq Mean Sq F value Pr(>F)
## catch_fac 2 0.28973 0.144865 6.4718 0.00374 **
## Residuals 39 0.87297 0.022384
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#or directly on anova(disp.check) #also works directly on disp.check
# the actual PERMANOVA would work like this, however, we do not have homogeneous dispersion...
adonis2(dmat~catch_fac)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = dmat ~ catch_fac)
## Df SumOfSqs R2 F Pr(>F)
## catch_fac 2 1.0397 0.08149 1.73 0.026 *
## Residual 39 11.7190 0.91851
## Total 41 12.7587 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#... however, we found another interesting result, beta diversity among small stream is larger then among medium stream and large streams.
Hierarchical group-forming (clustering) based on pairwise (dis)similarity.
Three types:
Various “linkage rules” to group clusters in agglomerative clustering:
lipids<-read.table(file="data/BacterialMembrane.txt",header=TRUE)
names(lipids)
## [1] "case" "isolate" "temperature" "FA1_SAnb" "FA2_MU"
## [6] "FA3_SAnb" "FA4_SAb" "FA5_SAb" "FA6_SAnb" "FA7_MU"
## [11] "FA8_SAb" "FA9_SAnb" "FA10_MU" "FA11_SAb" "FA12_SAnb"
## [16] "FA13_MUb" "FA14_MU" "FA15_SAnb" "FA16_MU" "FA17_MU"
## [21] "sum_MU" "sum_SA" "sum_SAbran" "sum_SAnbran" "SA_branprop"
lip_data<-lipids[,4:20]
# abbreviations: MU = mono-unsaturated, SA = saturated, nb = non-branched, b = branched
# SA_branprop = proportion of branched and saturated FAs
# theory: unsaturated and branched FAs increase fluidity of membrane
# test (i) adaptation vs. (ii) acclimatization to temperature by changing FA composition of membranes
isolate<-factor(lipids$isolate)
temperature<-factor(lipids$temperature)
combifac<-factor(paste(temperature,"_",isolate,sep=""))
## distance matrix: Euclidean distance on arcsine-sqr-data
as_lip_data<-asin(sqrt(lip_data)) # very old-school, better don´t do ;-)
lipids_distE<-vegdist(as_lip_data, method="euclidean")
## alternatively: Bray-Curtis distance with proportional data
lipids_distBC<-vegdist(lip_data, method="bray")
# cluster analysis #
# various agglomeration methods available and the choice is important, explore method=
# "single": nearest neighbour counts (good for gradients, but makes chains)
# "complete": all group members must be close, farthest group member counts (makes small spheric groups, good to find outliers)
# "average": compromise average strategy, new member joins at mean distance to all group members, actually UPGMA
# "ward.D": aims at minimizing within-group sums of squares of distances
lipids_cluster<-hclust(lipids_distBC, method = "ward.D")
lipids_cluster$height<-sqrt(lipids_cluster$height) # may help
# then compare effect of agglomeration method on dendrogram
plot(lipids_cluster, hang = -1, labels = combifac, ylab = "BC")
# which agglomeration method (and thus dendrogram) is best?
#cophenetic(lipids_cluster) # linkage distances in dendrogram
plot(lipids_distBC,cophenetic(lipids_cluster))
cor(lipids_distBC,cophenetic(lipids_cluster),method="spearman")
## [1] 0.8600969
cutree(lipids_cluster,k=4)
## [1] 1 1 1 1 2 2 2 3 3 3 3 3 3 2 2 2 1 1 1 1 4 3 3 3 3 2 4 4
cutree(lipids_cluster,h=0.05)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 9 13 14 15 16 17 18 19 20 21 22 23 24
## [26] 25 26 27
plot(lipids_cluster, hang = -1, labels = combifac, ylab = "BC")
rect.hclust(lipids_cluster,k=4) # or specify height h instead of k
(aka metric multidimensional scaling)
PCoA = distance-based unconstrained ordination
The goal of PCoA is to project sites onto Cartesian (Euclidean) coordinates from pairwise dissimilarities.
Imagine a road distance matrix between major towns. A 2D-projection of all pairwise distances produces a map (which gets better with less mountains and straighter roads in the area).
Steps in PCoA:
Place the first point (sample site) at the origin of a coordinate system.
Add the second point the correct distance away from the first point along the first axis.
Position the third point the correct distance away from each of the first two points, adding a second axis if necessary.
Continue until all points have been added. The result is a set of no more than n-1 axes.
But how do we get back down to fewer (e.g. 2) dimensions? Well, simply do a PCA on these constructed points.
lipids<-read.table(file="data/BacterialMembrane.txt",header=TRUE)
names(lipids)
## [1] "case" "isolate" "temperature" "FA1_SAnb" "FA2_MU"
## [6] "FA3_SAnb" "FA4_SAb" "FA5_SAb" "FA6_SAnb" "FA7_MU"
## [11] "FA8_SAb" "FA9_SAnb" "FA10_MU" "FA11_SAb" "FA12_SAnb"
## [16] "FA13_MUb" "FA14_MU" "FA15_SAnb" "FA16_MU" "FA17_MU"
## [21] "sum_MU" "sum_SA" "sum_SAbran" "sum_SAnbran" "SA_branprop"
lip_data<-lipids[,4:20]
# abbreviations: MU = mono-unsaturated, SA = saturated, nb = non-branched, b = branched
# SA_branprop = proportion of branched and saturated FAs
# theory: unsaturated and branched FAs increase fluidity of membrane
# test (i) adaptation vs. (ii) acclimatization to temperature by changing FA composition of membranes
isolate<-factor(lipids$isolate)
temperature<-factor(lipids$temperature)
combifac<-factor(paste(temperature,"_",isolate,sep=""))
## distance matrix: Bray-Curtis distance with proportional data
lipids_distBC<-vegdist(lip_data, method="bray")
# using the same distance matrix as cluster analysis
pcoa<-cmdscale(lipids_distBC,k=2,eig=TRUE,add=TRUE)
# argument add=TRUE means a constant is added to distances to avoid negative eigenvalues
cumsum(pcoa$eig/sum(pcoa$eig)) # contributions of various PCoA axes
## [1] 0.5108797 0.6713554 0.7243661 0.7626502 0.7943804 0.8232193 0.8499307
## [8] 0.8685434 0.8850589 0.8969014 0.9077931 0.9172892 0.9256833 0.9338634
## [15] 0.9419715 0.9496211 0.9562653 0.9622121 0.9677456 0.9731763 0.9782572
## [22] 0.9829655 0.9875741 0.9920859 0.9960663 1.0000000 1.0000000 1.0000000
# first two axes cover 67% of variation of distances
pcoa$points # the site scores (coordinates in reduced space)
## [,1] [,2]
## [1,] -0.039034416 -0.063039145
## [2,] -0.052535090 -0.034222468
## [3,] -0.027399434 -0.087663904
## [4,] 0.009878077 -0.004018133
## [5,] 0.212039581 0.030589392
## [6,] 0.216084698 0.020983290
## [7,] 0.259894649 0.051819890
## [8,] -0.165391223 0.116835950
## [9,] -0.150166208 0.118147706
## [10,] -0.185980907 0.073277130
## [11,] -0.128185960 0.114502815
## [12,] -0.163993524 0.117098404
## [13,] -0.149963125 0.117191012
## [14,] 0.219431512 0.064195917
## [15,] 0.210959910 0.057130086
## [16,] 0.251459219 0.070724865
## [17,] -0.040185127 -0.182894184
## [18,] -0.034791299 -0.174443966
## [19,] -0.036014921 -0.211286639
## [20,] -0.012082138 -0.103177494
## [21,] 0.160218882 -0.026898255
## [22,] -0.202366734 -0.008237486
## [23,] -0.198633241 -0.005774575
## [24,] -0.198199603 -0.068765798
## [25,] -0.194183841 0.040168498
## [26,] 0.233079249 0.023071464
## [27,] 0.078445944 -0.018582519
## [28,] 0.127615070 -0.026731854
col.isolate<-isolate
levels(col.isolate)<-c("white","red")
col.isolate<-as.character(col.isolate)
pch.temperature<-as.numeric(as.character(temperature))
pch.temperature[pch.temperature==6]<-21
pch.temperature[pch.temperature==28]<-23
plot(pcoa$points,pch=pch.temperature,bg=col.isolate)
legend("topleft",pch=c(21,21,23,23),pt.bg=c("white","red","white","red"),
legend=c("6°C - warm isolate","6°C - cold isolate","28°C - warm isolate","28°C - cold isolate"),cex=0.6)
# how to relate species (=fatty acids) to ordination?
wascores(pcoa$points,lip_data) # as weighted averages of site (=sample) scores
text(wascores(pcoa$points,lip_data),labels=names(lip_data),cex=0.5)
ordisurf(pcoa$points,lip_data$FA7_MU,col="darkgreen",add=TRUE) # as contourplot
plot(envfit(pcoa$points,lip_data)) # take care: behaviour of species not necessarily monotonous in ordination space
Following scheme taken from description of db-RDA. Essentially 3 steps:
cap<-capscale(lipids_distBC~isolate*temperature)
summary(cap)
##
## Call:
## capscale(formula = lipids_distBC ~ isolate * temperature)
##
## Partitioning of squared Bray distance:
## Inertia Proportion
## Total 0.7018 1.0000
## Constrained 0.4670 0.6654
## Unconstrained 0.2348 0.3346
##
## Eigenvalues, and their contribution to the squared Bray distance
##
## Importance of components:
## CAP1 CAP2 CAP3 MDS1 MDS2 MDS3 MDS4
## Eigenvalue 0.4027 0.05493 0.009286 0.1233 0.03707 0.02196 0.01708
## Proportion Explained 0.5739 0.07827 0.013231 0.1756 0.05283 0.03129 0.02434
## Cumulative Proportion 0.5739 0.65214 0.665369 0.8410 0.89384 0.92514 0.94947
## MDS5 MDS6 MDS7 MDS8 MDS9 MDS10
## Eigenvalue 0.01462 0.007081 0.004262 0.002948 0.001993 0.001796
## Proportion Explained 0.02083 0.010090 0.006073 0.004200 0.002840 0.002558
## Cumulative Proportion 0.97030 0.980392 0.986465 0.990665 0.993505 0.996063
## MDS11 MDS12 MDS13 MDS14 MDS15
## Eigenvalue 0.001280 0.0008798 0.0003408 0.0002224 3.548e-05
## Proportion Explained 0.001824 0.0012536 0.0004856 0.0003169 5.055e-05
## Cumulative Proportion 0.997887 0.9991404 0.9996260 0.9999429 1.000e+00
## MDS16
## Eigenvalue 4.604e-06
## Proportion Explained 6.561e-06
## Cumulative Proportion 1.000e+00
##
## Accumulated constrained eigenvalues
## Importance of components:
## CAP1 CAP2 CAP3
## Eigenvalue 0.4027 0.05493 0.009286
## Proportion Explained 0.8625 0.11764 0.019885
## Cumulative Proportion 0.8625 0.98011 1.000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 2.055196
##
##
## Species scores
##
## CAP1 CAP2 CAP3 MDS1 MDS2 MDS3
## Dim1
## Dim2
## Dim3
## Dim4
## Dim5
## Dim6
## Dim7
## Dim8
## Dim9
## Dim10
## Dim11
## Dim12
## Dim13
## Dim14
## Dim15
## Dim16
##
##
## Site scores (weighted sums of species scores)
##
## CAP1 CAP2 CAP3 MDS1 MDS2 MDS3
## 1 -0.118345 0.175455 -0.047145 -0.506299 -0.423715 0.09533
## 2 -0.140838 0.003717 -0.302707 -0.387221 -0.320024 0.19578
## 3 -0.103306 0.327975 0.270279 -0.628190 -0.318823 -0.24296
## 4 0.008757 -0.173947 0.898199 -0.589237 0.789919 -0.20068
## 5 0.541369 -0.065379 0.560416 -0.024970 0.049197 -0.32037
## 6 0.543308 -0.004912 0.439088 -0.109438 0.269108 -0.18813
## 7 0.685271 -0.125175 0.720434 -0.172794 0.505845 -0.26832
## 8 -0.368405 -0.840786 -0.581799 0.409126 0.004382 -0.02050
## 9 -0.326275 -0.854491 -0.118938 0.345743 0.065634 0.17900
## 10 -0.447862 -0.569762 -0.479908 0.361270 -0.132091 -0.31818
## 11 -0.271896 -0.855062 0.008803 0.250268 0.264949 0.14661
## 12 -0.363883 -0.842636 -0.568017 0.403435 0.006882 -0.02042
## 13 -0.326080 -0.847913 -0.109894 0.341106 0.062888 0.18602
## 14 0.583087 -0.105603 -0.017661 0.150584 -0.402423 0.38119
## 15 0.552282 -0.099067 0.201860 0.148800 -0.333354 0.02658
## 16 0.682659 -0.146090 0.692094 0.007818 -0.088373 0.36904
## 17 -0.173766 1.053027 0.318616 -0.491137 -0.268441 0.04607
## 18 -0.155959 0.986716 0.683869 -0.490135 -0.299839 0.25781
## 19 -0.184776 1.353849 0.619417 -0.607413 -0.356310 -0.53605
## 20 -0.072462 0.578351 0.794922 -0.392980 0.560962 0.11002
## 21 0.406687 0.358960 -0.891817 -0.118231 -0.441071 1.28428
## 22 -0.526047 -0.062290 -0.245361 0.511732 0.101190 0.06100
## 23 -0.513488 -0.064273 -0.313550 0.508564 0.124313 0.07243
## 24 -0.553768 0.299526 0.216670 0.371002 -0.195805 -0.26911
## 25 -0.482668 -0.375509 -0.390361 0.590368 0.333931 0.25783
## 26 0.614101 0.181112 -0.844084 -0.305500 1.133393 0.20914
## 27 0.194381 0.284271 -0.578667 0.313382 -0.272787 -0.93446
## 28 0.317922 0.429937 -0.934759 0.110348 -0.419536 -0.55896
##
##
## Site constraints (linear combinations of constraining variables)
##
## CAP1 CAP2 CAP3 MDS1 MDS2 MDS3
## 1 -0.2458 -0.44775 -0.1031 -0.506299 -0.423715 0.09533
## 2 -0.2458 -0.44775 -0.1031 -0.387221 -0.320024 0.19578
## 3 -0.2458 -0.44775 -0.1031 -0.628190 -0.318823 -0.24296
## 4 -0.2458 -0.44775 -0.1031 -0.589237 0.789919 -0.20068
## 5 0.5980 -0.09104 0.4327 -0.024970 0.049197 -0.32037
## 6 0.5980 -0.09104 0.4327 -0.109438 0.269108 -0.18813
## 7 0.5980 -0.09104 0.4327 -0.172794 0.505845 -0.26832
## 8 -0.2458 -0.44775 -0.1031 0.409126 0.004382 -0.02050
## 9 -0.2458 -0.44775 -0.1031 0.345743 0.065634 0.17900
## 10 -0.2458 -0.44775 -0.1031 0.361270 -0.132091 -0.31818
## 11 -0.2458 -0.44775 -0.1031 0.250268 0.264949 0.14661
## 12 -0.2458 -0.44775 -0.1031 0.403435 0.006882 -0.02042
## 13 -0.2458 -0.44775 -0.1031 0.341106 0.062888 0.18602
## 14 0.5980 -0.09104 0.4327 0.150584 -0.402423 0.38119
## 15 0.5980 -0.09104 0.4327 0.148800 -0.333354 0.02658
## 16 0.5980 -0.09104 0.4327 0.007818 -0.088373 0.36904
## 17 -0.3329 0.47117 0.2105 -0.491137 -0.268441 0.04607
## 18 -0.3329 0.47117 0.2105 -0.490135 -0.299839 0.25781
## 19 -0.3329 0.47117 0.2105 -0.607413 -0.356310 -0.53605
## 20 -0.3329 0.47117 0.2105 -0.392980 0.560962 0.11002
## 21 0.3833 0.31357 -0.8123 -0.118231 -0.441071 1.28428
## 22 -0.3329 0.47117 0.2105 0.511732 0.101190 0.06100
## 23 -0.3329 0.47117 0.2105 0.508564 0.124313 0.07243
## 24 -0.3329 0.47117 0.2105 0.371002 -0.195805 -0.26911
## 25 -0.3329 0.47117 0.2105 0.590368 0.333931 0.25783
## 26 0.3833 0.31357 -0.8123 -0.305500 1.133393 0.20914
## 27 0.3833 0.31357 -0.8123 0.313382 -0.272787 -0.93446
## 28 0.3833 0.31357 -0.8123 0.110348 -0.419536 -0.55896
##
##
## Biplot scores for constraining variables
##
## CAP1 CAP2 CAP3 MDS1 MDS2 MDS3
## isolatewarm 0.2099 -0.9335 0.2908 0 0 0
## temperature28 0.9828 0.1359 -0.1253 0 0 0
## isolatewarm:temperature28 0.8041 -0.1224 0.5818 0 0 0
##
##
## Centroids for factor constraints
##
## CAP1 CAP2 CAP3 MDS1 MDS2 MDS3
## isolatecold -0.09415 0.41864 -0.13043 0 0 0
## isolatewarm 0.07062 -0.31398 0.09782 0 0 0
## temperature6 -0.28450 -0.03934 0.03628 0 0 0
## temperature28 0.51211 0.07081 -0.06531 0 0 0
anova(cap,by="axis",model="direct",perm.max=9999,step=1000)
## Permutation test for capscale under direct model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: capscale(formula = lipids_distBC ~ isolate * temperature)
## Df SumOfSqs F Pr(>F)
## CAP1 1 0.40274 41.1581 0.001 ***
## CAP2 1 0.05493 5.6138 0.029 *
## CAP3 1 0.00929 0.9489 0.384
## Residual 24 0.23484
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(cap,by="terms",model="direct",perm.max=9999,step=1000)
## Permutation test for capscale under direct model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## Model: capscale(formula = lipids_distBC ~ isolate * temperature)
## Df SumOfSqs F Pr(>F)
## isolate 1 0.06640 6.7859 0.007 **
## temperature 1 0.38445 39.2889 0.001 ***
## isolate:temperature 1 0.01611 1.6460 0.191
## Residual 24 0.23484
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# all functions more or less taken from RDA
# e.g. to get site scores for plotting
#scores(cap,display="sites")[,1:2]
In general, to analyse:
short gradients > linear methods: PCA and RDA
long gradients > unimodal methods: CA and CCA
similarities among samples based on distance calculations > PCoA, NMDS, db-RDA
Explore main gradients of variation > unconstrained methods: PCA, CA, PCoA, NMDS
Identify gradients of variation in a dataset explained by another dataset > constrained methods: RDA, CCA, db-RDA
Paliy and
Shankar in Molecular Ecology 2016