wildlifeDI: Contact Analysis Workflow

Jed Long

22 March, 2024

Background

This vignette aims to demonstrate the workflows used to perform contact analysis using the wildlifeDI package in R. Specifically, two datasets are used to show how the different functions for contact analysis can be used. The main contact analysis functions in the wildilfeDI package have all been given a ‘con’ prefix (e.g., conProcess) so that they clearly stand apart from the dynamic interaction indices and other functions available in the package.

Set Up

Libraries and Packages

library(wildlifeDI)
library(move2)
library(adehabitatLT)
library(ggplot2)
library(sf)
library(igraph)
library(dplyr)

Contact Analysis - Doe Deer Data

First let’s take a look at the doe deer data.

data(does)
does
## A <move2> with `track_id_column` "id" and `time_column` "date"
## Containing 7 tracks lasting on average 31 days in a
## Simple feature collection with 10364 features and 7 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -97.27711 ymin: 34.00079 xmax: -97.23606 ymax: 34.02512
## Geodetic CRS:  WGS 84
## First 10 features:
##                   date          id Elev    Slope  pForest     dPond dRoads
## 1  2011-04-30 19:02:37 d16241y2011  248 2.781886 3.719008  67.08204    330
## 2  2011-04-30 19:32:41 d16241y2011  248 2.781886 3.719008  67.08204    330
## 3  2011-04-30 20:02:37 d16241y2011  246 5.710593 3.719008  60.00000    330
## 4  2011-04-30 20:32:36 d16241y2011  253 2.134300 4.132231 161.55495    300
## 5  2011-04-30 21:02:37 d16241y2011  253 1.350224 4.545455 189.73666    300
## 6  2011-04-30 21:32:37 d16241y2011  253 2.160789 4.132231 174.92856    330
## 7  2011-04-30 22:02:47 d16241y2011  253 2.160789 4.132231 174.92856    330
## 8  2011-04-30 22:32:36 d16241y2011  253 2.160789 4.132231 174.92856    330
## 9  2011-04-30 23:02:40 d16241y2011  253 2.160789 4.132231 174.92856    330
## 10 2011-04-30 23:32:37 d16241y2011  253 1.391763 4.545455 201.24612    330
##                      geometry
## 1    POINT (-97.261 34.01511)
## 2   POINT (-97.26103 34.0151)
## 3   POINT (-97.26128 34.0149)
## 4  POINT (-97.26148 34.01305)
## 5  POINT (-97.26132 34.01291)
## 6  POINT (-97.26118 34.01297)
## 7  POINT (-97.26117 34.01295)
## 8  POINT (-97.26118 34.01297)
## 9  POINT (-97.26118 34.01297)
## 10    POINT (-97.261 34.0129)
## Track features:
##            id
## 1 d16241y2011
## 2 d16243y2011
## 3 d16244y2011
## 4 d16246y2011
## 5 d16247y2011
## 6 d16250y2011
## 7 d16252y2011

Next, we can make a quick plot of the deer does data to see it on a map. We will color each individual a seperate color.

ggplot(does) + 
  geom_sf(aes(color=mt_track_id(does)))

Processing contacts

We use the conProcess function to identify contacts first. We use a temporal threshold of \(t_c\) = 15 minutes (based on the 30 minute tracking data) to define simultaneous fixes. A distance threshold of \(d_c\) = 50 m (based on biologically relevant signals between deer and previous literature) was used to define contacts. The parameter dc must be specified in the correct units (i.e., those associated with the tracking dataset). The parameter tc needs to be specified in seconds. We can look at the distribution of all paired fixes (based on tc) to further explore whether our choice for dc makes sense.

dcPlot(does,tc=15*60,dmax=1000)

## [1]  55.55483 126.26098 287.87502 398.98468 499.99346 570.69961 671.70839
## [8] 732.31366 833.32244

The red lines in the dcPlot are automatically generated using a natural breaks algorithm to find local minima. They are more for reference, and not to be used for empirical assessment. That being said, it appears that a choice of \(d_c\)=50 corresponds to the first local minima.

doecons <- conProcess(does,dc=50,tc=15*60)
table(doecons$contact)
## 
##    0    1 
## 9871  493

We can see that a total of 493 contacts have been identified in the dataset.

Next we can arrange contacts between does into phases of continuous interaction using the function conPhase. A parameter \(p_c\) is used to group contacts as belonging to the same phases separated by \(p_c\) units in time. The parameter \(p_c\) must be specified in seconds. The function conSummary can be used to summarize contacts and phases within the entire dataset to get some basic statistics. It computes how many contacts are observed, and in how many unique segments these occur in, as well as some other values regarding contact phase duration. Here \(p_c\) = 60 minutes.

doephas <- conPhase(doecons, pc=60*60)

consum <- doephas |>
    st_drop_geometry() |>
    filter(!is.na(contact_pha)) |>
    dplyr::group_by(contact_pha) |>
    dplyr::summarise(
      nfix = n(),
      t1 = min(date),
      t2 = max(date),
      duration = max(date)-min(date),
      avg_d = mean(contact_d,na.rm=T),
      min_d = min(contact_d,na.rm=T),
      max_d = max(contact_d,na.rm=T)
    ) 
consum
## # A tibble: 98 × 8
##    contact_pha  nfix t1                  t2                  duration  avg_d
##          <dbl> <int> <dttm>              <dttm>              <drtn>    <dbl>
##  1           1     2 2011-05-02 04:02:40 2011-05-02 04:32:50 1810 secs  22.0
##  2           2     6 2011-05-03 01:33:01 2011-05-03 04:02:41 8980 secs  23.0
##  3           3     1 2011-05-04 01:02:36 2011-05-04 01:02:36    0 secs  28.9
##  4           4     5 2011-05-05 11:32:40 2011-05-05 13:32:37 7197 secs  32.3
##  5           5     1 2011-05-05 19:02:36 2011-05-05 19:02:36    0 secs  21.2
##  6           6     6 2011-05-06 01:02:36 2011-05-06 03:32:40 9004 secs  27.0
##  7           7     5 2011-05-07 14:02:40 2011-05-07 16:02:37 7197 secs  18.9
##  8           8     5 2011-05-07 20:32:37 2011-05-07 22:32:37 7200 secs  20.6
##  9           9     6 2011-05-08 07:32:39 2011-05-08 10:02:57 9018 secs  31.4
## 10          10     5 2011-05-09 14:02:37 2011-05-09 16:02:40 7203 secs  27.1
## # ℹ 88 more rows
## # ℹ 2 more variables: min_d <dbl>, max_d <dbl>

From the phase analysis we can see that there are 98 contact phases (or periods) in the dataset, some contacts are fleeting lasting only a single fix, some are lengthy bouts of interactive behaviour, the longest lasting for 18.5 hours. These summary stats can be customized and queried to answer specific questions related to contact phases.

For example, we can extract more detailed information about the timing and duration of phases within the dataset. We plot the frequency histogram of the initiation of contact phases (by hour) throughout the day and compare that to the frequency histogram of all contacts.

#contact phase initiaition tod
consum$tod <- as.numeric(as.POSIXlt(consum$t1)$hour + as.POSIXlt(consum$t1)$min / 60)  #convert POSIX to hours

conall <- doecons |>
  subset(contact == 1)
conall$tod <- as.numeric(as.POSIXlt(conall$date)$hour + as.POSIXlt(conall$date)$min / 60)  #convert POSIX to hours


h1 <- ggplot(consum,aes(tod)) + 
  geom_histogram(binwidth=1)
h2 <- ggplot(conall,aes(tod)) + 
  geom_histogram(binwidth=1)
h1

h2

We can see the clear diurnal pattern in when contacts occur, which corresponds to known activity peaks in deer.

Mapping Contacts

Using the built in functionality of ‘move2’ objects we can easily make maps of contacts. First plot the distribution of all contact points on top of the distribution of all GPS fixes.

ggplot() + 
  geom_sf(data=does,aes(color=mt_track_id(does))) +
  geom_sf(data=conall)

We can see that the contacts are clustered around certain locations when compared to all GPS telemetry fixes.

Next, lets map only the initiation of phases (i.e., the first fix in every phase).

pha_fir <- doephas |>
  filter(!is.na(contact_pha)) |>
  group_by(contact_pha) |>
  filter(row_number()==1)


ggplot() + 
  geom_sf(data=does,aes(color=mt_track_id(does))) +
  geom_sf(data=pha_fir)

Here we can see a small difference in the spatial pattern of the initiation of contact phases the original distribution of GPS fixes, and the locations of all contact points.

Finally, lets map the contact phases as lines.

pha_lin <- doephas |>
  filter(!is.na(contact_pha)) |>
  group_by(contact_pha) |>
  summarise(n = dplyr::n(),do_union=FALSE) |>
  filter(n > 1) |>
  st_cast("LINESTRING")

ggplot() + 
  geom_sf(data=does,aes(color=mt_track_id(does))) +
  geom_sf(data=pha_lin)

The map of the phases as lines can be used to provide different insight into the spatial structure of contact phases throughout the study area.

Derive the Contact Network

From a contact list it is relatively straightforward to derive the contact network. We are usually interested in one of two parameters:

The contact matrix can be asymmetric for counts in the case of irregular fixes and/or depending on how the \(t_c\) parameter is chosen, but in many (or most) cases it should be symmetric. The contact matrix for rates is typically assymetric because individuals typically have different numbers of overall fixes in a given tracking dataset.

The input into this analysis is simply the output from the conProcess function which can produce a table of all contacts by setting return = ‘contact’.

cons <- conProcess(does,dc=50,tc=15*60,return='contacts')


tab_cnt <- cons |>
  count(id1,id2)
gr <- graph_from_data_frame(tab_cnt,directed=FALSE)
E(gr)$weight <- tab_cnt$n

plot(gr)

Comparing contacts to non-contact fixes

We can study if the does behaved differently during contacts compared to other times. To do this we can compare contact fixes to non-contact fixes. Here we show two variables: percent forest cover (related to habitat) which was already present in the data and movement step-length (related to behaviour). We will look at fixes immediately before and after contacts and compare to all other fixes. The function conTimelag is a useful tool for calculating the time from any given fix to the nearest contact fix, which can then be used in subsequent analysis as shown below.

doephas$stepLength <- as.numeric(mt_distance(doephas))

# Calculate time to any contact fix
doephas <- conTimelag(doephas,def='all')

#categorize time to contact as immediately before, contact, after, or non contact (NA) 
#Should be tailored to individual dataset
doephas$dt_lev <-  cut(doephas$contact_timelag, breaks = c(-Inf,-45*60,-15*60,15*60,45*60,Inf), labels = c("Other","Before","Contact","After","Other"))
table(doephas$dt_lev)
## 
##   Other  Before Contact   After 
##    8146      98     542      98
ggplot(doephas, aes(x=dt_lev, y=pForest)) + 
  geom_boxplot() +
  labs(x='',y='Forest Cover (%)') 

The boxplots show a visible difference for percent forest cover between contacts and non-contact (other) fixes. Note there is an additional NA boxplot. In this case these refer to an individual that had no contacts with any other individual, and therefore, the function conTimelag returns NA values associated with all fixes for the time to nearest contact. These could be reclassified as ‘other’ but here it is interesting to at least consider them differently, as this individual seems to be more associated with high forest cover than the other individuals which have higher contact levels.

ggplot(doephas, aes(x=dt_lev, y=stepLength)) + 
  geom_boxplot() +
  labs(x='',y='Step-Length (m)') + 
  scale_y_continuous(trans='log10')
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 10 rows containing non-finite values (`stat_boxplot()`).

With step-length it is a little different. Visually it appears that the step lengths before contacts are a little bit higher than at other times (during contacts, after contacts, and during non-contacts).

Summary

The wildlifeDI package can be used to tackle a wide range of problems when performing contact analysis using wildlife tracking data. Specifically, it provides tools to process, manage, and analyze contacts spatially and temporally. It provides output data structures that are useful for integration in R’s well established statistical modelling packages facilitating further statistical analyses.

Session Information

sessionInfo()
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=C                    LC_CTYPE=English_Canada.utf8   
## [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.utf8    
## 
## time zone: America/Toronto
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] dplyr_1.1.3         igraph_1.5.1        sf_1.0-15          
##  [4] ggplot2_3.4.4       adehabitatLT_0.3.27 CircStats_0.2-6    
##  [7] boot_1.3-28.1       MASS_7.3-60         adehabitatMA_0.3.16
## [10] ade4_1.7-22         sp_2.1-1            move2_0.2.6        
## [13] wildlifeDI_1.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.3       xfun_0.39          bslib_0.4.2        lattice_0.21-9    
##  [5] tzdb_0.4.0         vctrs_0.6.4        tools_4.3.2        generics_0.1.3    
##  [9] tibble_3.2.1       proxy_0.4-27       fansi_1.0.4        highr_0.10        
## [13] pkgconfig_2.0.3    KernSmooth_2.23-22 assertthat_0.2.1   lifecycle_1.0.4   
## [17] farver_2.1.1       compiler_4.3.2     munsell_0.5.0      codetools_0.2-19  
## [21] htmltools_0.5.7    class_7.3-22       sass_0.4.5         yaml_2.3.7        
## [25] pillar_1.9.0       crayon_1.5.2       jquerylib_0.1.4    classInt_0.4-9    
## [29] cachem_1.0.7       lwgeom_0.2-13      wk_0.7.2           tidyselect_1.2.1  
## [33] digest_0.6.31      labeling_0.4.2     fastmap_1.1.1      grid_4.3.2        
## [37] colorspace_2.1-0   cli_3.6.1          magrittr_2.0.3     utf8_1.2.3        
## [41] e1071_1.7-13       withr_3.0.0        scales_1.2.1       bit64_4.0.5       
## [45] rmarkdown_2.21     bit_4.0.5          evaluate_0.20      knitr_1.45        
## [49] s2_1.1.2           rlang_1.1.2        Rcpp_1.0.10        glue_1.6.2        
## [53] DBI_1.2.2          rstudioapi_0.14    vroom_1.6.3        jsonlite_1.8.4    
## [57] R6_2.5.1           units_0.8-1