Introduction to wpeR package

Welcome to the wpeR vignette! wpeR is an R package designed for analyzing wild pedigree data. Pedigree reconstruction is a powerful tool for understanding the genetic structure of wild populations, but it can also generate large and complex datasets that can be difficult to analyze. wpeR provides a streamlined solution for exploring and visualizing this type of data, allowing the user to gain insights into the genetic relationships between individuals in wild populations. In this vignette, we will introduce the main features of wpeR and demonstrate how they can be used to analyze and interpret wild pedigrees.

To get started, install the latest development version of the package from GitHub:

devtools::install_github("GR3602/wpeR")

You should now be able to load wpeR.

library(wpeR)

1 Input data

wpeR works with two main input datasets:

  1. Pedigree

    1. COLONY pedigree data. The reconstructed pedigree is an output of the COLONY software and is stored in the COLONY project output folder. The function get_colony() reads the pedigree file automatically, so you do not need to import it into the R session yourself.

    2. Custom pedigree data. Users can work with pedigree data reconstructed by any software as long as it follows the formatting rules specified in the get_ped() function.

  2. Genetic samples metadata.
    This dataset should include information on all genetic samples belonging to the animals included in the pedigree and must include columns that describe:

    • Sample unique identifier code.
    • Date of sample collection in YYYY-MM-DD format.
    • Identifier code of the particular individual that the sample belongs to.
    • Genetic sex coded as M for males, F for females and NA for unknown sex.
    • Geographic location from where the sample was collected, as latitude and longitude in WGS84 coordinate system (EPSG: 4326).
    • Sample type (e.g. scat, urine, tissue).

Correctly formatted genetic samples metadata is crucial for the proper functioning of wpeR functions. To ensure that your metadata conforms to the package’s rules, the package includes the check_sampledata() function. This function runs a series of checks and validations on your input data to verify its integrity and compatibility. If all validations pass, check_sampledata() returns the sample metadata as a data frame that can be used directly in downstream analyses.
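To make the formatting rules listed above more concrete, here is a minimal base R sketch that flags rows breaking them. It is not the implementation of check_sampledata(); the helper name precheck_samples() and the toy data are hypothetical, and the exact checks the package performs may differ.

```r
# Toy sample metadata using the column names from this vignette.
# The third row is deliberately malformed (date format, sex code, latitude).
samples <- data.frame(
  Sample = c("S1", "S2", "S3"),
  Date = c("2017-11-16", "2017-11-22", "22.12.2017"),
  AnimalRef = c("A1", "A1", "A2"),
  GeneticSex = c("M", "M", "X"),
  lat = c(45.70766, 45.71356, 95),
  lng = c(14.12922, 14.10497, 14.07907),
  SType = c("Scat", "Scat", "Tissue"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of wpeR): one logical flag per row and rule.
precheck_samples <- function(x) {
  data.frame(
    Sample = x$Sample,
    # FALSE for the second and later occurrences of a repeated sample code
    unique_id = !duplicated(x$Sample),
    # TRUE when the date parses as YYYY-MM-DD
    date_ok = !is.na(as.Date(x$Date, format = "%Y-%m-%d")),
    # M, F or NA are accepted
    sex_ok = is.na(x$GeneticSex) | x$GeneticSex %in% c("M", "F"),
    # Missing coordinates are tolerated here; out-of-range values are flagged
    coords_ok = (is.na(x$lat) | abs(x$lat) <= 90) &
      (is.na(x$lng) | abs(x$lng) <= 180),
    stringsAsFactors = FALSE
  )
}

checks <- precheck_samples(samples)
checks
```

Rows with any FALSE flag are worth inspecting by hand before the data is passed to check_sampledata().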

An example pedigree and example sample metadata (wolf_samples) are included in the package, and these two datasets are used throughout this vignette.

To check if the genetic sample metadata is formatted correctly you can link the columns in the data frame with the parameters of the check_sampledata() function:

sampledata <- check_sampledata(
  Sample = wolf_samples$Sample,
  Date = wolf_samples$Date,
  AnimalRef = wolf_samples$AnimalRef,
  GeneticSex = wolf_samples$GeneticSex,
  lat = wolf_samples$lat,
  lng = wolf_samples$lng,
  SType = wolf_samples$SType
)

If check_sampledata() runs without errors or warnings, the genetic sample metadata is correctly formatted, and you can use the returned data frame in downstream analyses. Properly formatted sample metadata with all required columns looks like this:

head(sampledata)
#>   Sample       Date AnimalRef GeneticSex      lat      lng SType
#> 1  M10XC 2017-11-16     M10XC          M 45.70766 14.12922  Scat
#> 2  M0PXH 2017-11-22     M10XC          M 45.71356 14.10497  Scat
#> 3  M0PFL 2017-12-22     M10XC          M 45.69898 14.07907  Scat
#> 4  M1J47 2019-08-20     M1J47          M 45.70854 14.09644  Scat
#> 5  M1HF2 2019-08-31     M1J47          M 45.69930 14.05550  Scat
#> 6 MSV163 2020-07-17     M1J47          M 45.71804 14.14319  Scat

2 The workflow

Since many of the functions in wpeR build upon the results of previous functions, it is recommended to follow a specific sequence when using the package. Here, we present the optimal workflow for using wpeR.

Function call order  Function         Description
1a                   get_colony()     Organizes COLONY output
1b                   get_ped()        Organizes Pedigree Data
2                    anim_timespan()  Get dates of individuals first and last sample
3                    org_fams()       Organizes animals into families and expands pedigree data
4                    plot_table()     Prepares pedigree data for plotting and spatial representation
5.1                  ped_satplot()    Temporal plot of pedigree
5.2                  ped_spatial()    Get Files For Spatial Representation Of Pedigree

2.1 Import the pedigree

COLONY PEDIGREE DATA

A pedigree reconstructed by the COLONY software is imported into the R session with the get_colony() function. Apart from reading the COLONY output file, get_colony() also adds missing parents to OffspringID, assigns sex to each animal and adds the probability of paternity and maternity assignment as calculated by COLONY.

path <- paste0(system.file("extdata", package = "wpeR"), "/wpeR_samplePed")
ped_colony <- get_colony(
  colony_project_path = path, 
  sampledata =  wolf_samples
  )

tail(ped_colony)
#>    ClusterIndex     id father mother sex
#> 60            1 MSV0T7  M20AM  M273P   1
#> 61            1 MSV0TJ  M20AM  M273P   2
#> 62            1 MSV0UL  M20AM  M273P   1
#> 63            1 MSV0X4  M20AM  M273P   1
#> 64            1 MSV17F  M20AM  M273P   2
#> 65            1 MSV1MH  M20AM  M273P   2

CUSTOM PEDIGREE DATA

If the pedigree was not reconstructed with the COLONY software, use the get_ped() function instead. Under the hood get_colony() and get_ped() are very similar; the latter has slightly fewer features, because it is primarily designed so that any pedigree data can be used in downstream analysis. When using get_ped(), note that the ped parameter (the reconstructed pedigree) has to be formatted as a basic pedigree with three columns: offspring (named OffspringID), father (named FatherID) and mother (named MotherID). Unknown parents should be represented by NA values.

ped <- data.frame(
  OffspringID = c(
    "M273P", "M20AM", "M2757", "M2ALK", "M2ETE", "M2EUJ", "MSV00E",
    "MSV018", "MSV05L", "MSV0M6", "MSV0T4", "MSV0T7", "MSV0TJ", "MSV0UL"
  ),
  FatherID = c(
    NA, NA, "M20AM", "M20AM", "M20AM", "M20AM", "M20AM",
    "M20AM", "M20AM", "M20AM", "M20AM", "M20AM", "M20AM", "M20AM"
  ),
  MotherID = c(
    NA, NA, "M273P", "M273P", "M273P", "M273P", "M273P",
    "M273P", "M273P", "M273P", "M273P", "M273P", "M273P", "M273P"
  )
)


get_ped(
    ped = ped,
    sampledata = wolf_samples
    )
#>        id father mother sex
#> 1   M273P   <NA>   <NA>   2
#> 2   M20AM   <NA>   <NA>   1
#> 3   M2757  M20AM  M273P   2
#> 4   M2ALK  M20AM  M273P   2
#> 5   M2ETE  M20AM  M273P   2
#> 6   M2EUJ  M20AM  M273P   2
#> 7  MSV00E  M20AM  M273P   1
#> 8  MSV018  M20AM  M273P   1
#> 9  MSV05L  M20AM  M273P   1
#> 10 MSV0M6  M20AM  M273P   1
#> 11 MSV0T4  M20AM  M273P   1
#> 12 MSV0T7  M20AM  M273P   1
#> 13 MSV0TJ  M20AM  M273P   2
#> 14 MSV0UL  M20AM  M273P   1
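Before handing a custom pedigree to get_ped(), it can help to confirm that it follows the column layout described above. The following is a minimal base R sketch on toy data; the object names are hypothetical, and it only reports findings without claiming how get_ped() itself treats them.

```r
# Toy pedigree in the layout described for the ped parameter.
ped_toy <- data.frame(
  OffspringID = c("P1", "P2", "O1", "O2"),
  FatherID = c(NA, NA, "P1", "P1"),
  MotherID = c(NA, NA, "P2", "P2"),
  stringsAsFactors = FALSE
)

# 1. Are the three required column names present?
required <- c("OffspringID", "FatherID", "MotherID")
missing_cols <- setdiff(required, names(ped_toy))

# 2. Which parents do not have a row of their own in OffspringID?
parents <- unique(c(ped_toy$FatherID, ped_toy$MotherID))
parents <- parents[!is.na(parents)]
parents_without_row <- setdiff(parents, ped_toy$OffspringID)

# 3. Does any individual appear more than once as offspring?
duplicated_offspring <- ped_toy$OffspringID[duplicated(ped_toy$OffspringID)]

list(
  missing_cols = missing_cols,
  parents_without_row = parents_without_row,
  duplicated_offspring = duplicated_offspring
)
```

Three empty vectors mean the toy pedigree passes these particular checks.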

The output of the get_colony() and get_ped() functions can be formatted in different ways to facilitate downstream analysis with other R packages for pedigree analysis and visualization. The output format is defined by the out parameter. Both functions support downstream analysis with the kinship2, pedtools and FamAgg packages.

2.1.1 [example] get_colony() & kinship2

library(kinship2)
#> Loading required package: Matrix
#> Loading required package: quadprog
ped_ks2 <- get_colony(path, wolf_samples, out = "kinship2")

ped_ks2 <- ped_ks2[!(ped_ks2$dadid %in% "M2AM8"),]

ped_ks2 <- pedigree(
  ped_ks2$id,
  ped_ks2$dadid,
  ped_ks2$momid,
  ped_ks2$sex
)
plot(ped_ks2, symbolsize = 1.5, cex = 0.4)

#> Did not plot the following people: M2AM8

2.2 Animal timespan

The anim_timespan() function creates ‘first seen’ and ‘last seen’ columns for each animal in the pedigree by examining the dates of all genetic samples associated with that animal. This is an important step in obtaining a temporal perspective of the pedigree, as it allows other functions to work with the time frame over which each animal was observed. In addition, the function determines whether an animal is dead, based on predefined sample types (e.g. tissue).

animal_ts <- anim_timespan(
  individual_id = wolf_samples$AnimalRef,
  sample_date = wolf_samples$Date,
  sample_type = wolf_samples$SType,
  dead = c("Tissue")
)

head(animal_ts)
#>      ID  FirstSeen   LastSeen IsDead
#> 1 M10XC 2017-11-16 2017-12-22  FALSE
#> 2 M1J47 2019-08-20 2021-01-07  FALSE
#> 3 M1YP0 2017-01-25 2017-01-25   TRUE
#> 4 M200F 2015-07-27 2018-08-22  FALSE
#> 5 M20AM 2016-08-29 2020-08-02  FALSE
#> 6 M220J 2017-11-10 2018-02-17  FALSE

As shown above, the anim_timespan() function creates a lookup table with one row per animal that records its detection time frame. To feed this data to subsequent functions, the anim_timespan() output needs to be merged with the sample metadata. This additional step ensures that all relevant information about each animal is included and facilitates downstream analysis.

sampledata <- merge(wolf_samples, animal_ts, by.x = "AnimalRef", by.y = "ID", all.x = TRUE )
head(sampledata)
#>   AnimalRef Sample       Date GeneticSex      lat      lng SType  FirstSeen
#> 1     M10XC  M10XC 2017-11-16          M 45.70766 14.12922  Scat 2017-11-16
#> 2     M10XC  M0PXH 2017-11-22          M 45.71356 14.10497  Scat 2017-11-16
#> 3     M10XC  M0PFL 2017-12-22          M 45.69898 14.07907  Scat 2017-11-16
#> 4     M1J47  M1J47 2019-08-20          M 45.70854 14.09644  Scat 2019-08-20
#> 5     M1J47  M1HF2 2019-08-31          M 45.69930 14.05550  Scat 2019-08-20
#> 6     M1J47 MSV163 2020-07-17          M 45.71804 14.14319  Scat 2019-08-20
#>     LastSeen IsDead
#> 1 2017-12-22  FALSE
#> 2 2017-12-22  FALSE
#> 3 2017-12-22  FALSE
#> 4 2021-01-07  FALSE
#> 5 2021-01-07  FALSE
#> 6 2021-01-07  FALSE
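The logic of this step can be reproduced on a toy dataset with base R alone. The sketch below is an illustration of the idea (first date, last date and a dead flag per animal, then a merge back onto the samples), not the code used inside anim_timespan(); the toy data and object names are hypothetical.

```r
# Toy samples: three animals, one of them sampled as tissue.
samples <- data.frame(
  AnimalRef = c("A1", "A1", "A2", "A2", "A3"),
  Date = as.Date(c("2017-11-16", "2017-12-22", "2019-08-20",
                   "2021-01-07", "2017-01-25")),
  SType = c("Scat", "Scat", "Scat", "Saliva", "Tissue"),
  stringsAsFactors = FALSE
)
dead_types <- c("Tissue")  # sample types taken to indicate mortality

# One data frame per animal, then one summary row per animal.
by_animal <- split(samples, samples$AnimalRef)
ts <- data.frame(
  ID = names(by_animal),
  FirstSeen = do.call(c, lapply(by_animal, function(d) min(d$Date))),
  LastSeen = do.call(c, lapply(by_animal, function(d) max(d$Date))),
  IsDead = vapply(by_animal, function(d) any(d$SType %in% dead_types),
                  logical(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Attach the per-animal summary to every sample of that animal.
merged <- merge(samples, ts, by.x = "AnimalRef", by.y = "ID", all.x = TRUE)
merged
```

Because all.x = TRUE keeps every sample row, the merged table has the same number of rows as the sample table.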

2.3 Organize families

The org_fams() function takes the pedigree data generated by the get_colony() or get_ped() function and groups animals into families. It expands the pedigree by adding information about the family that each individual was born into and the individual’s status as a reproductive animal. Depending on the output parameter, the function returns a single data frame (ped or fams) or a list with two objects (ped and fams). In the examples below we present each of the two data frames separately.

The result of org_fams() function introduces us to two important concepts within the context of this package: family and half-sib group. In the wpeR package, a family is defined as a group of animals where at least one parent and at least one offspring are known. Meanwhile, a half-sib group refers to a group of half-siblings who are either maternally or paternally related. In the function’s output, the DadHSgroup parameter groups paternal half-siblings, while the MomHSgroup parameter groups maternal half-siblings.
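The two concepts can be illustrated on a toy pedigree with base R. This is only a sketch of the definitions given above, not the algorithm used by org_fams(); the toy individuals, the numbering of families and the column names shares_mother and shares_father are assumptions made for the example.

```r
# Toy pedigree: three parent pairs, two of which share the mother M1,
# plus one founder (F0) with unknown parents.
ped_toy <- data.frame(
  id = c("O1", "O2", "O3", "O4", "F0"),
  father = c("D1", "D1", "D2", "D3", NA),
  mother = c("M1", "M1", "M1", "M2", NA),
  stringsAsFactors = FALSE
)

# A family is keyed by its parent pair; individuals without known
# parents get family 0 in this sketch.
has_parent <- !is.na(ped_toy$father) | !is.na(ped_toy$mother)
ped_toy$parents <- ifelse(
  has_parent,
  paste(ped_toy$father, ped_toy$mother, sep = "_"),
  NA
)
ped_toy$FamID <- match(ped_toy$parents, unique(ped_toy$parents[has_parent]))
ped_toy$FamID[!has_parent] <- 0

# One row per family, then flag families that share a mother or a father.
fams <- unique(ped_toy[has_parent, c("parents", "father", "mother", "FamID")])
shared_mothers <- names(which(table(fams$mother) > 1))
shared_fathers <- names(which(table(fams$father) > 1))
fams$shares_mother <- fams$mother %in% shared_mothers
fams$shares_father <- fams$father %in% shared_fathers
fams
```

In this toy pedigree the families D1_M1 and D2_M1 form a maternal half-sib group, because their offspring share the mother M1 but have different fathers.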

2.3.1 Pedigree

ped_org <- org_fams(ped = ped_colony, sampledata = sampledata, output = "ped")

tail(ped_org)
#>    ClusterIndex     id father mother sex     parents FamID  FirstSeen
#> 60            1 MSV0T7  M20AM  M273P   1 M20AM_M273P     5 2019-08-11
#> 61            1 MSV0TJ  M20AM  M273P   2 M20AM_M273P     5 2019-12-28
#> 62            1 MSV0UL  M20AM  M273P   1 M20AM_M273P     5 2020-07-15
#> 63            1 MSV0X4  M20AM  M273P   1 M20AM_M273P     5 2019-09-03
#> 64            1 MSV17F  M20AM  M273P   2 M20AM_M273P     5 2020-11-08
#> 65            1 MSV1MH  M20AM  M273P   2 M20AM_M273P     5 2021-02-25
#>      LastSeen IsDead DadHSgroup MomHSgroup hsGroup
#> 60 2020-02-09   TRUE       <NA>       <NA>       4
#> 61 2019-12-28   TRUE       <NA>       <NA>       4
#> 62 2020-07-15  FALSE       <NA>       <NA>       4
#> 63 2019-10-23   TRUE       <NA>       <NA>       4
#> 64 2020-12-04  FALSE       <NA>       <NA>       4
#> 65 2021-07-15  FALSE       <NA>       <NA>       4

The ped output is an extended version of the pedigree obtained by the get_colony() function. Apart from the common pedigree information (individual, mother, father, sex, family), ped also includes information on:

  • parents: identifier codes of both parents separated with _,
  • FamID: number of family that the individual belongs to (see Families below),
  • FirstSeen: date of first sample of individual,
  • LastSeen: date of last sample of individual,
  • IsDead: logical value (TRUE/FALSE) that identifies if the individual is dead,
  • DadHSgroup: identifier of paternal half-sib group,
  • MomHSgroup: identifier of maternal half-sib group,
  • hsGroup: half-sib group of the individual.

2.3.2 Families

fams_org <- org_fams(ped = ped_colony, sampledata = sampledata, output = "fams")

head(fams_org)
#>         parents   father   mother FamID   FamStart     FamEnd FamDead
#> 7   M228J_M200F    M228J    M200F     1 2017-01-25 2018-08-22    TRUE
#> 15 MSV00E_M28LU   MSV00E    M28LU     2 2020-08-16 2021-03-05   FALSE
#> 24  M2772_M28TU    M2772    M28TU     3 2019-08-28 2021-04-23   FALSE
#> 33  M2AM8_M200F    M2AM8    M200F     4 2018-10-29 2021-03-23   FALSE
#> 51  M20AM_M273P    M20AM    M273P     5 2018-01-05 2020-08-02   FALSE
#> NA      Unknown *Unknown #Unknown     0 2015-07-27 2021-04-23   FALSE
#>    DadHSgroup MomHSgroup hsGroup
#> 7        <NA>     MomP_1       1
#> 15       <NA>       <NA>       2
#> 24       <NA>       <NA>       3
#> 33       <NA>     MomP_1       1
#> 51       <NA>       <NA>       4
#> NA       <NA>       <NA>       0

The fams output contains information about the families to which individuals in the pedigree belong. The families are described by:

  • parents: identifier codes of both parents separated with _,
  • father: identifier code of the father,
  • mother: identifier code of the mother,
  • FamID: numeric value that identifies a particular family,
  • FamStart: date when the first sample of any of the family members was collected (see the note below this list),
  • FamEnd: date when the last sample of any of the family members was collected (see the note below this list),
  • FamDead: logical value (TRUE/FALSE) that identifies if the family does not exist any more,
  • DadHSgroup: Identifier connecting families that share the same father.
  • MomHSgroup: Identifier connecting families that share the same mother.
  • hsGroup: Numeric value connecting families that share one of the parents.

Note on FamStart and FamEnd: these columns estimate a time window for the family based solely on the sample collection dates provided in sampledata. FamStart is the date of the earliest sample collected from any offspring belonging to that family. FamEnd is the date of the latest sample collected from either the mother or the father of that family. Because this method relies on observation (sampling) dates, FamEnd (last parental sample date) can precede FamStart (first offspring sample date), creating a biologically impossible sequence and a negative calculated family timespan. Users should interpret the interval between FamStart and FamEnd with this in mind.
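Families whose estimated window runs backwards are easy to flag once the fams table is available. The sketch below uses a toy table with made-up dates and assumes the two date columns are named as in the fams output printed above; span_days and negative_span are hypothetical helper columns.

```r
# Toy family table with made-up dates; family 2 ends before it starts.
fams_toy <- data.frame(
  FamID = 1:3,
  FamStart = as.Date(c("2017-01-25", "2020-08-16", "2019-08-28")),
  FamEnd = as.Date(c("2018-08-22", "2020-03-05", "2021-04-23"))
)

# Subtracting two Date vectors gives a difference in days.
fams_toy$span_days <- as.numeric(fams_toy$FamEnd - fams_toy$FamStart)
fams_toy$negative_span <- fams_toy$span_days < 0

fams_toy[fams_toy$negative_span, ]
```

A flagged family is not necessarily an error in the data; it only means that the last parental sample was collected before the first offspring sample.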

2.4 Plotting table

To produce a temporal and spatial pedigree representation, the sample metadata needs to be formatted in a specific way, which can be achieved with the plot_table() function. This function combines the outputs of previous functions (fams and ped from the org_fams() function) with sample metadata, with all three data frames serving as inputs.

The function offers flexibility in selecting families for visualization through the plot_fams parameter. To include all families included in the pedigree, plot_fams should be set to “all” (which is the default). For plotting a subset of families, provide a numeric vector of the desired FamIDs, which are the family identification numbers generated by org_fams() and can be seen in the fams output table.
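The numeric vector for plot_fams does not have to be typed by hand; it can be derived from the fams table. The sketch below rebuilds only the FamID and hsGroup columns from the fams output printed earlier as a toy data frame and selects all families of one half-sib group. The object names fams_toy and sel are hypothetical.

```r
# FamID and hsGroup values copied from the fams output shown above.
fams_toy <- data.frame(
  FamID = c(1, 2, 3, 4, 5, 0),
  hsGroup = c(1, 2, 3, 1, 4, 0)
)

# All families that belong to half-sib group 1.
sel <- fams_toy$FamID[fams_toy$hsGroup == 1]
sel
```

The resulting vector could then be supplied as plot_fams = sel.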

For the plot_table() function to work, the sample metadata has to include some specific information. Most of it is already described in the Input data part of this vignette. In addition, the sample metadata must include columns with the dates of the first and last sample of each individual and a logical value identifying whether the individual is dead. All of this additional information can be added with the anim_timespan() function (see Animal timespan). If the sample metadata does not use the default column names (see the documentation, ?plot_table), custom names can be supplied as a vector through the datacolumns parameter.

pt <- plot_table(
  plot_fams = "all",
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue")
)

head(pt)
#>     Sample AnimalRef GeneticSex       Date  SType      lat      lng  FirstSeen
#> 54   M20AP     M228J          M 2016-09-30 Saliva 45.71140 14.01201 2016-09-30
#> 55   M228J     M228J          M 2017-01-26   Scat 45.70406 14.12798 2016-09-30
#> 56   M28ML     M228J          M 2017-08-18 Saliva 45.67397 14.11150 2016-09-30
#> 57   M28MM     M228J          M 2017-08-18 Saliva 45.67397 14.11150 2016-09-30
#> 58   M2C36     M228J          M 2018-02-09 Tissue 45.67033 14.15404 2016-09-30
#> 10 EX.1JH0     M200F          F 2015-07-27 Saliva 45.75250 14.14653 2015-07-27
#>      LastSeen IsDead plottingID FamID hsGroup  rep later_rep isPolygamous  dead
#> 54 2018-02-09   TRUE          1     1       1 TRUE     FALSE        FALSE FALSE
#> 55 2018-02-09   TRUE          1     1       1 TRUE     FALSE        FALSE FALSE
#> 56 2018-02-09   TRUE          1     1       1 TRUE     FALSE        FALSE FALSE
#> 57 2018-02-09   TRUE          1     1       1 TRUE     FALSE        FALSE FALSE
#> 58 2018-02-09   TRUE          1     1       1 TRUE     FALSE        FALSE  TRUE
#> 10 2018-08-22  FALSE          2     1       1 TRUE     FALSE         TRUE FALSE
#>    first_sample last_sample IsReference
#> 54         TRUE       FALSE       FALSE
#> 55        FALSE       FALSE        TRUE
#> 56        FALSE       FALSE       FALSE
#> 57        FALSE       FALSE       FALSE
#> 58        FALSE        TRUE       FALSE
#> 10         TRUE       FALSE       FALSE

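The plot_table() output adds the following columns to the sample metadata, as seen in the printout above: plottingID, FamID, hsGroup, rep, later_rep, isPolygamous, dead, first_sample, last_sample and IsReference.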

Apart from adding information to the sample metadata, plot_table() also duplicates sample entries (rows) for animals that are present in more than one family (e.g. polygamous animals, or animals that were detected as offspring in one family and later as a reproductive animal in another). Users need to be aware of this duplication when using the plot_table() output in analyses outside the scope of this package.

nrow(sampledata) == nrow(pt)
#> [1] FALSE
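For analyses that need one row per sample, the duplicated entries can be identified and collapsed. This is a minimal base R sketch on a toy table that mimics the situation described above (animal A1 appears in families 1 and 4); it assumes the Sample column uniquely identifies a sample, as required for the sample metadata.

```r
# Toy plotting table: both samples of A1 are listed once per family.
pt_toy <- data.frame(
  Sample = c("S1", "S2", "S1", "S2", "S3"),
  AnimalRef = c("A1", "A1", "A1", "A1", "A2"),
  FamID = c(1, 1, 4, 4, 4),
  stringsAsFactors = FALSE
)

# Samples that occur in more than one row.
dup_samples <- unique(pt_toy$Sample[duplicated(pt_toy$Sample)])

# Keep the first row of every sample; family-specific columns of the
# dropped rows (here FamID) are lost in the process.
pt_unique <- pt_toy[!duplicated(pt_toy$Sample), ]

dup_samples
nrow(pt_unique)
```

Whether dropping the later rows is appropriate depends on the analysis, since the family-specific columns differ between the duplicated rows.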

After applying the plot_table() function, the pedigree data is prepared for temporal and spatial visualization, marking the completion of the data preparation phase in this package’s workflow. The data visualization stage involves two functions: ped_satplot() for temporal representation and ped_spatial() for spatial representation.

2.5 Temporal plot

The core of the temporal plot, generated by the ped_satplot() function, is the representation of the occurrence of samples for each individual (y-axis) through time (x-axis). The individuals are first grouped by families and then by half-sib groups. Within each family, the individuals are arranged based on the date of their first sample collection: the animal that was detected first is positioned at the bottom of the family, followed upwards by the other animals in chronological order. This layout enables a visual understanding of the temporal relationships within and between families, with each family forming a distinct cluster in the plot.

Each sample is depicted as a point on the plot, and these points are connected by lines to represent the continuous survival of the individual. This connection remains intact even during periods when no samples of that particular individual were collected. Each sample can be additionally marked to represent characteristics of a particular individual (e.g. reproductive animal, polygamous animal). Certain samples can also be marked to indicate mortality (e.g. tissue samples).

Before we get started with the first plot, it is important to look back at the plot_table() function and the previously mentioned plot_fams parameter. This parameter allows us to select a subset of families that we would like to plot. In the example below just one family (FamID = 4) is selected for plotting.

pt <- plot_table(
  plot_fams = 4,
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue", "Decomposing Tissue", "Blood")
)

sp <- ped_satplot(pt)

sp 

An example of two families that share the same mother (FamID = 1 & 4)

pt <- plot_table(
  plot_fams = c(1,4),
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue", "Decomposing Tissue", "Blood")
)

sp <- ped_satplot(pt)

sp

Technically, there is no limit to the number of families that can be plotted in this manner. However, as the number of families increases, the complexity of the graph intensifies, making it progressively more challenging to comprehend. This can be observed in the example of five families, two of which share a reproductive animal.

pt <- plot_table(
  plot_fams = c(1:5),
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue", "Decomposing Tissue", "Blood")
)

sp <- ped_satplot(pt)

sp

2.6 Spatial files

To incorporate a spatial dimension into the pedigree analysis, the ped_spatial() function comes into play. Acting as a wrapper function, ped_spatial() combines multiple functions that utilize the output of the plot_table() function, transforming it into various sf objects that can be visualized on a map. It’s worth noting that the function automatically removes samples without coordinates, as they cannot be plotted.
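Since samples without coordinates are dropped silently, it can be useful to see beforehand which samples are affected. The following base R sketch on toy data lists them; it assumes that a sample counts as lacking coordinates when either lat or lng is NA, which may not match the package's rule exactly.

```r
# Toy samples: S2 lacks a latitude, S3 lacks a longitude.
samples <- data.frame(
  Sample = c("S1", "S2", "S3"),
  lat = c(45.70766, NA, 45.69898),
  lng = c(14.12922, 14.10497, NA),
  stringsAsFactors = FALSE
)

# TRUE for rows where both coordinates are present.
has_coords <- stats::complete.cases(samples[, c("lat", "lng")])

# Samples that could not be placed on a map.
dropped <- samples$Sample[!has_coords]
dropped
```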

By utilizing the default function parameters, the ped_spatial() function produces a list containing 14 sf objects.

pt <- plot_table(
  plot_fams = 1,
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue", "Decomposing Tissue", "Blood")
)

ps <- ped_spatial(pt)

summary(ps)
#>                       Length Class Mode
#> motherRpoints         16     sf    list
#> fatherRpoints         16     sf    list
#> offspringRpoints      19     sf    list
#> motherMovePoints      16     sf    list
#> fatherMovePoints      16     sf    list
#> offspringMovePoints   19     sf    list
#> maternityLines         8     sf    list
#> paternityLines         8     sf    list
#> motherMoveLines        3     sf    list
#> fatherMoveLines        3     sf    list
#> offspringMoveLines     3     sf    list
#> motherMovePolygons     3     sf    list
#> fatherMovePolygons     3     sf    list
#> offspringMovePolygons  3     sf    list

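Through the integration of POINT, LINESTRING, and POLYGON geometries, the ped_spatial() function generates sf objects that establish connections between parent and offspring samples, as well as between samples of the same individual. This enables users to analyze and interpret the spatial progression of a pedigree. As the object names in the summary above suggest, the created objects fall into five broader categories: reference points (motherRpoints, fatherRpoints, offspringRpoints), movement points (motherMovePoints, fatherMovePoints, offspringMovePoints), parentage lines (maternityLines, paternityLines), movement lines (motherMoveLines, fatherMoveLines, offspringMoveLines) and movement polygons (motherMovePolygons, fatherMovePolygons, offspringMovePolygons).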

By specifying the fullsibdata parameter in the ped_spatial() function, you can include the FullsibLines object in the output list. FullsibLines is a LINESTRING object connecting the reference samples of full siblings.

fullsibdata <- read.csv(paste0(path,".FullSibDyad"))

ps <- ped_spatial(pt, fullsibdata = fullsibdata)

summary(ps)
#>                       Length Class Mode
#> motherRpoints         16     sf    list
#> fatherRpoints         16     sf    list
#> offspringRpoints      19     sf    list
#> motherMovePoints      16     sf    list
#> fatherMovePoints      16     sf    list
#> offspringMovePoints   19     sf    list
#> maternityLines         8     sf    list
#> paternityLines         8     sf    list
#> motherMoveLines        3     sf    list
#> fatherMoveLines        3     sf    list
#> offspringMoveLines     3     sf    list
#> motherMovePolygons     3     sf    list
#> fatherMovePolygons     3     sf    list
#> offspringMovePolygons  3     sf    list
#> FullsibLines           4     sf    list

In the ped_spatial() function, you can define the time window for selecting the samples used to generate the spatial pedigree outputs. The time.limits parameter sets the start and end dates that limit the samples included in the spatial representation; it is defined as a vector of two dates in Date format. Moreover, the function can apply time.limits selectively to specific types of output data:

  • time.limit.rep applies the time limits to the reference and movement points of reproductive animals,
  • time.limit.offspring applies the time limits to offspring reference and movement points,
  • time.limit.moves applies the time limits to the movement lines of all individuals.

ps_tl <- ped_spatial(
  plottable = pt,
  time.limits = c(as.Date("2017-01-01"), as.Date("2018-01-01")),
  time.limit.rep = TRUE,
  time.limit.offspring = TRUE,
  time.limit.moves = TRUE
)
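The effect of a time window on a set of samples can be previewed with plain date comparisons. The sketch below uses toy sample dates and assumes that both ends of the window are inclusive; how ped_spatial() treats samples that fall exactly on a limit is not stated here.

```r
# Toy sample dates.
samples <- data.frame(
  Sample = c("S1", "S2", "S3", "S4"),
  Date = as.Date(c("2016-09-30", "2017-01-26", "2017-08-18", "2018-02-09")),
  stringsAsFactors = FALSE
)

# Same window as in the call above.
time_limits <- c(as.Date("2017-01-01"), as.Date("2018-01-01"))

# Assumption: a sample on either limit date counts as inside the window.
in_window <- samples$Date >= time_limits[1] & samples$Date <= time_limits[2]
samples[in_window, ]
```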

2.6.1 [example] Drawing maps in R

To showcase the capabilities of the ped_spatial() function and to give a better understanding of the data frames it generates, we present a series of example maps produced with the wpeR package. These examples make use of various R packages that enable visual representation of geographic datasets.

2.6.1.1 ggplot2

In a similar fashion to the temporal plots in chapter 2.5, the first set of maps shows just one family. The maps are static and produced with the ggplot2, basemaps and ggsflabel packages. To clearly represent the output of the ped_spatial() function, different spatial files are presented separately on three maps: the first shows the pedigree, the second the movement of reproductive animals and the third the movement of the offspring. Each of the three maps is presented in two variants, one showing all the samples of the family members included in the sample metadata table and the other using the time.limits parameter to subset the samples to a defined time window (shown on the temporal plot as an orange dotted rectangle).

We begin by creating a plotting table (pt) of the family or families we would like to plot (in this example the family with FamID == 1) with the plot_table() function. Subsequently, the ped_spatial() function is applied to generate a list of sf data frames representing the distribution of animal samples and their relationships. Additionally, we generate a second list of sf objects in which all the generated data frames are limited to the period between “2017-01-01” and “2018-01-01”.

It is worth noting that depending on the number of families, individuals and samples the maps generated from these data can appear complex and cluttered, especially if the time.limits parameter is not employed. This parameter is crucial in refining the visualizations, enhancing their clarity and interpretability.

pt <- plot_table(
  plot_fams = 1,
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue", "Decomposing Tissue", "Blood")
)

ps <- ped_spatial(pt)

ps.tl <- ped_spatial(
  plottable = pt,
  time.limits = c(as.Date("2017-01-01"), as.Date("2018-01-01")),
  time.limit.rep = TRUE,
  time.limit.offspring = TRUE,
  time.limit.moves = TRUE
)

[Figure: Temporal plot, showing all individuals and samples of the plotted family. Orange dashed rectangle encompasses samples that fall within defined time limits.]

[Figure: Legend explaining symbols used for spatial pedigree representation.]

[Figure: Spatial pedigree representation. a) all samples, b) time window.]

[Figure: Movement of reproductive animals as inferred from collected samples. a) all samples, b) time window.]

[Figure: Movement of offspring as inferred from collected samples. a) all samples, b) time window.]


2.6.1.2 leaflet

library(leaflet)
library(leaflet.providers)

pt <- plot_table(
  plot_fams = c(1:5),
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue", "Decomposing Tissue", "Blood")
)

ps <- ped_spatial(
  plottable = pt,
  time.limits = c(as.Date("2020-07-01"), as.Date("2021-06-30")),
  time.limit.rep = TRUE,
  time.limit.offspring = TRUE,
  time.limit.moves = TRUE
)

2.6.2 GIS output

As described above, the default output of the ped_spatial() function is a list of sf objects that can be further analyzed and visualized with R packages for geographic data, such as leaflet and mapview.

To extend the possibilities of spatial analysis and visualization outside of R, ped_spatial() can also export the spatial data in formats compatible with Geographic Information System (GIS) software. By setting the output parameter to "gis" and defining a folder path in the path parameter, users can store the georeferenced files in the designated folder. The pedigree data can then be used directly in GIS software, with its wide range of spatial analysis and visualization options.

pt <- plot_table(
  plot_fams = 1,
  all_fams = fams_org,
  ped = ped_org,
  sampledata = sampledata,
  deadSample = c("Tissue", "Decomposing Tissue", "Blood")
)

ps <- ped_spatial(
  plottable = pt,
  output = "gis",
  path = "/folder/where/GIS/files/should/be/saved/"
)

The created GIS files follow the same structure as described above, just the output file names are different:

sf object              file name
motherRpoints          momRef
fatherRpoints          dadRef
offspringRpoints       ofsprRef
motherMovePoints       momMovPt
fatherMovePoints       dadMovPt
offspringMovePoints    offsprMovPt
maternityLines         matLn
paternityLines         patLn
motherMoveLines        momMovLn
fatherMoveLines        dadMovLn
offspringMoveLines     offsprMovLn
motherMovePolygons     momMovPoly
fatherMovePolygons     dadMovPoly
offspringMovePolygons  offsMovPoly
FullsibLines           FsLines

To distinguish between sets of generated files, or to avoid overwriting them, use the filename parameter. The string specified with this parameter acts as a common name for all the generated files. When generating GIS files, all the other function parameters can be used as described at the beginning of this chapter (e.g. fullsibdata, time.limits).

3 Outside of the workflow

Besides the functions that facilitate the visualization and analysis of pedigree data in temporal and spatial dimensions, the wpeR package currently provides additional functions designed to aid in the calculation and representation of detected animals across multiple time periods. These functions work with sample metadata only and can be used independently of the workflow described in the previous chapter.

To calculate the number of captured animals across two or more time periods, use the dyn_matrix() function. The function takes four parameters. The first two, animal_id and capture_date, correspond to the AnimalRef and Date columns of the sample metadata table, respectively. The other two, start_dates and end_dates, are vectors in Date format, one giving the start and the other the end of each time period of interest. Note that the package refers to these time periods as “seasons”.
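When the seasons are regular, the start and end vectors can be generated rather than typed out. The sketch below builds three consecutive calendar-year seasons with base R's seq() method for dates; the object names are hypothetical and the choice of calendar years is only an example.

```r
n_seasons <- 3

# First day of each season: 1 January of three consecutive years.
start_dates <- seq(as.Date("2017-01-01"), by = "year", length.out = n_seasons)

# Last day of each season: the day before the next season starts.
end_dates <- seq(as.Date("2018-01-01"), by = "year", length.out = n_seasons) - 1

seasons_generated <- data.frame(start = start_dates, end = end_dates)
seasons_generated
```

The example that follows spells the same dates out explicitly.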

seasons <- data.frame(
  start = c(
    as.Date("2017-01-01"),
    as.Date("2018-01-01"),
    as.Date("2019-01-01")
  ),
  end = c(
    as.Date("2017-12-31"),
    as.Date("2018-12-31"),
    as.Date("2019-12-31")
  )
)

dyn_mat <- dyn_matrix(
  animal_id = wolf_samples$AnimalRef,
  capture_date = wolf_samples$Date,
  start_dates = seasons$start,
  end_dates = seasons$end
)


dyn_mat
#>                         2017-01-01 - 2017-12-31 2018-01-01 - 2018-12-31
#> 2017-01-01 - 2017-12-31                      13                       8
#> 2018-01-01 - 2018-12-31                       2                       4
#> 2019-01-01 - 2019-12-31                       0                       0
#> Tot. Skipped                                  0                       2
#>                         2019-01-01 - 2019-12-31 Tot. Capts
#> 2017-01-01 - 2017-12-31                       6         13
#> 2018-01-01 - 2018-12-31                       6         12
#> 2019-01-01 - 2019-12-31                      17         25
#> Tot. Skipped                                  0         NA

The function outputs a matrix with (1 + number of time periods) rows and columns that describes the dynamics of animal detection between the included time periods. It conveys information on all detected animals, newly detected animals, recaptured animals and skipped animals. For the purpose of a more detailed explanation, we will drop the row and column names.

unname(dyn_mat)
#>      [,1] [,2] [,3] [,4]
#> [1,]   13    8    6   13
#> [2,]    2    4    6   12
#> [3,]    0    0   17   25
#> [4,]    0    2    0   NA

In the matrix presented above:

JUST TWO TIME PERIODS
To get the animal detection dynamics between just two time periods, call the nbtw_seasons() function. It provides a simple output describing the detection dynamics between the two periods. The function takes six parameters. The first two are the same as described above. The other four are dates (Date format) defining the start and end of each of the two time periods of interest.

nbtw_seasons(
 animal_id = wolf_samples$AnimalRef,
 capture_date = wolf_samples$Date,
 season1_start = as.Date("2017-01-01"),
 season1_end = as.Date("2017-12-31"),
 season2_start = as.Date("2018-01-01"),
 season2_end = as.Date("2018-12-31")
)
#>                   season1                 season2 total_cap new_captures
#> 1 2017-01-01 - 2017-12-31 2018-01-01 - 2018-12-31        12            4
#>   recaptures skipped
#> 1          8       2

The returned data frame first defines the two time periods and then gives four values describing the detection of animals: total_cap is the number of animals detected in season 2, new_captures is the number of new detections in season 2, recaptures is the number of animals detected in both season 1 and season 2, and skipped is the opposite, the number of animals detected in season 1 but not in season 2.
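These four values can be expressed as simple set operations on the animals detected in each season. The following base R sketch shows the idea on toy capture data; it is not the implementation of nbtw_seasons(). It assumes that season limits are inclusive and that "new" means not detected in season 1, and the helper in_season() is hypothetical.

```r
# Toy capture records: animal identifier and capture date.
captures <- data.frame(
  animal_id = c("A1", "A2", "A3", "A1", "A3", "A4", "A4"),
  capture_date = as.Date(c(
    "2017-03-01", "2017-06-15", "2017-09-30",
    "2018-02-10", "2018-05-05", "2018-07-20", "2018-11-02"
  )),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique animals captured within [start, end].
in_season <- function(ids, dates, start, end) {
  unique(ids[dates >= start & dates <= end])
}

s1 <- in_season(captures$animal_id, captures$capture_date,
                as.Date("2017-01-01"), as.Date("2017-12-31"))
s2 <- in_season(captures$animal_id, captures$capture_date,
                as.Date("2018-01-01"), as.Date("2018-12-31"))

dynamics <- c(
  total_cap = length(s2),                 # detected in season 2
  new_captures = length(setdiff(s2, s1)), # in season 2, not in season 1
  recaptures = length(intersect(s1, s2)), # in both seasons
  skipped = length(setdiff(s1, s2))       # in season 1, not in season 2
)
dynamics
```

Under these definitions total_cap always equals new_captures plus recaptures, which also holds for the nbtw_seasons() output shown above (12 = 4 + 8).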