4 - Detecting errors

Simon Garnier

Missing observations and recording errors are fairly common in tracking data. They can be caused by hardware failures, object occlusion, faulty data writing, etc. trackdf provides a few functions to help detect these missing or erroneous data so that you can fix them or omit them from your analysis altogether.

But first, let’s load some “flawed” data provided with trackdf:

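library(trackdf)
# Read the example GPS file shipped with trackdf
raw <- read.csv(system.file("extdata/gps/01.csv", package = "trackdf"))
# Build a track table from longitude/latitude coordinates and date-time strings
tt <- track(x = raw$lon, y = raw$lat, t = paste(raw$date, raw$time), id = 1,
            proj = "+proj=longlat", tz = "Africa/Windhoek")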
## Warning: 1 failed to parse.
print(tt, max = 10 * ncol(tt))
## Track table [3599 observations]
## Number of tracks:  1 
## Dimensions:  2D 
## Geographic:  TRUE 
## Projection:  +proj=longlat 
## Table class:  data frame ('data.frame')
##    id                   t        x         y
## 1   1 2015-09-10 07:00:00 15.76468 -22.37957
## 2   1 2015-09-10 07:00:01 15.76468 -22.37957
## 3   1 2015-09-10 07:00:04 15.76468 -22.37958
## 4   1 2015-09-10 07:00:05 15.76468 -22.37958
## 5   1 2015-09-10 07:00:06       NA -22.37958
## 6   1 2015-09-10 07:00:07 15.76467        NA
## 7   1 2015-09-10 07:00:08 15.76467 -22.37959
## 8   1 2015-09-10 07:00:09 15.76467 -22.37959
## 9   1 2015-09-10 07:00:09 15.76467 -22.37959
## 10  1 2015-09-10 07:00:10 15.76467 -22.37959
##  [ reached 'max' / getOption("max.print") -- omitted 3589 rows ]

4.1 - Missing observations

These are observations that were not recorded, either entirely (no row exists for an expected time stamp) or partially (one or more coordinates are missing). If the data were recorded at regular intervals, these missing observations can easily be detected using the missing_data function as follows:

missing <- missing_data(tt)
missing
## Track table [5 observations]
## Number of tracks:  1 
## Dimensions:  2D 
## Geographic:  TRUE 
## Projection:  +proj=longlat 
## Table class:  data frame ('data.frame')
##   id                   t        x         y
## 1  1 2015-09-10 07:00:02       NA        NA
## 2  1 2015-09-10 07:00:03       NA        NA
## 4  1 2015-09-10 07:00:06       NA -22.37958
## 5  1 2015-09-10 07:00:07 15.76467        NA
## 3  1 2015-09-10 07:00:34       NA        NA

The output is a track table with each row corresponding to a time stamp at which at least one coordinate is missing.

Note that you can specify the start (begin) and end (end) of the time window in which to look for missing data, as well as the expected time difference (step) between successive observations.
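To make the idea concrete, here is a minimal base R sketch of the logic described above, assuming a single track sampled at a regular interval of step seconds: it builds the grid of expected time stamps between begin and end, then reports those with no recorded row as well as those whose row has a missing coordinate. The helper find_missing() and the small data set are hypothetical and for illustration only; this is not how missing_data() is implemented.

```r
# Hypothetical single track sampled every second: the rows for 2 s and 3 s
# were never recorded, and one coordinate is missing at 6 s and at 7 s
t0 <- as.POSIXct("2015-09-10 07:00:00", tz = "UTC")
obs <- data.frame(
  t = t0 + c(0, 1, 4, 5, 6, 7),
  x = c(15.76468, 15.76468, 15.76468, 15.76468, NA, 15.76467),
  y = c(-22.37957, -22.37957, -22.37958, -22.37958, -22.37958, NA)
)

# Hypothetical helper, not part of trackdf: returns the time stamps at which
# an observation is entirely or partially missing (step is in seconds)
find_missing <- function(data, begin = min(data$t, na.rm = TRUE),
                         end = max(data$t, na.rm = TRUE), step = 1) {
  # Time stamps expected under regular sampling
  expected <- seq(begin, end, by = step)
  # Expected time stamps with no recorded row at all
  absent <- expected[!(as.numeric(expected) %in% as.numeric(data$t))]
  # Recorded rows with at least one missing coordinate
  partial <- data$t[is.na(data$x) | is.na(data$y)]
  sort(c(absent, partial))
}

missing_t <- find_missing(obs)
missing_t
```

Here, find_missing(obs) returns the time stamps 2, 3, 6 and 7 seconds after t0. Passing a wider window, e.g. find_missing(obs, end = t0 + 10), also reports the expected time stamps after the last recorded observation.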


4.2 - Duplicated observations

These are observations that appear more than once in the data set (e.g., two observations of the same individual with identical time stamps). They can be detected using the duplicated_data function as follows:

dups <- duplicated_data(tt, type = "t")
dups
## Track table [1 observations]
## Number of tracks:  1 
## Dimensions:  2D 
## Geographic:  TRUE 
## Projection:  +proj=longlat 
## Table class:  data frame ('data.frame')
##   id                   t        x         y duplicate
## 8  1 2015-09-10 07:00:09 15.76467 -22.37959       txy

The output is a track table in which each row corresponds to an observation that was partially or completely duplicated, depending on the type argument. This argument is a character string, or a vector of character strings, indicating the type of duplication to look for. Each string can be any combination of “t” (for time duplications) and “x”, “y”, “z” (for coordinate duplications). For instance, the string “txy” will return observations whose time stamps and x and y coordinates are all duplicated. In the example above, the observation at 07:00:09 appears twice with identical time stamps and coordinates, as indicated by “txy” in the duplicate column.
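As an illustration of how the choice of type changes what counts as a duplicate, the following base R sketch flags rows whose values in the selected columns repeat those of an earlier row. The helper find_duplicates() and the example data are hypothetical; the sketch handles a single type string and a single track only, and it is not necessarily how duplicated_data() works.

```r
# Hypothetical single track: the observation at 9 s was written twice, and
# the time stamp at 12 s was reused for two different positions
t0 <- as.POSIXct("2015-09-10 07:00:00", tz = "UTC")
obs <- data.frame(
  t = t0 + c(8, 9, 9, 10, 12, 12),
  x = c(15.76467, 15.76467, 15.76467, 15.76467, 15.76466, 15.76470),
  y = c(-22.37959, -22.37959, -22.37959, -22.37959, -22.37960, -22.37962)
)

# Hypothetical helper, not part of trackdf: the letters of `type` select the
# columns to compare, e.g. "t" for time stamps only or "txy" for full rows
find_duplicates <- function(data, type = "txy") {
  cols <- strsplit(type, "")[[1]]
  # duplicated() marks every row that repeats an earlier row in these columns
  data[duplicated(data[, cols, drop = FALSE]), , drop = FALSE]
}

find_duplicates(obs, type = "t")    # rows 3 and 6: repeated time stamps
find_duplicates(obs, type = "txy")  # row 3 only: time and position repeated
```

Note that duplicated() flags only the second and later occurrences of a repeated row; the first occurrence is not flagged.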


4.3 - Inconsistent observations

These are observations whose coordinates differ too much from those of neighboring observations in time, for instance because of sporadic errors in GPS recordings. These inconsistent observations can be detected using the inconsistent_data function as follows:

inconsistent <- inconsistent_data(tt, s = 15)
inconsistent
## Track table [1 observations]
## Number of tracks:  1 
## Dimensions:  2D 
## Geographic:  TRUE 
## Projection:  +proj=longlat 
## Table class:  data frame ('data.frame')
##   id                   t        x        y
## 1  1 2015-09-10 07:00:24 15.86467 -22.4796

The output is a track table with each row corresponding to an inconsistent observation.

Note that detecting inconsistencies requires specifying a threshold (s) that separates consistent from inconsistent observations. Higher threshold values result in fewer detected inconsistencies, and lower threshold values in more.
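One simple way to turn the idea of a threshold into code is sketched below in base R. It flags an observation when its distance to both the previous and the next observation exceeds s times the median distance between consecutive observations. This rule, the helper find_inconsistent(), and the example coordinates (assumed to be projected, in metres) are hypothetical; in particular, the meaning given to s here is an assumption and is not necessarily how inconsistent_data() interprets its s argument.

```r
# Hypothetical projected track (coordinates in metres), one fix per second,
# with a single spurious fix at index 6
x <- c(0, 1, 2, 3, 4, 500, 6, 7, 8, 9)
y <- c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)

# Hypothetical helper, not part of trackdf: returns the indices of fixes
# lying more than s typical step lengths away from both of their neighbours
find_inconsistent <- function(x, y, s = 15) {
  # Euclidean length of each step between consecutive fixes
  step_len <- sqrt(diff(x)^2 + diff(y)^2)
  # Median step length: a typical step, barely affected by a few outliers
  typical <- median(step_len, na.rm = TRUE)
  # Distance of each fix to the previous and to the next fix (NA at the ends)
  d_prev <- c(NA, step_len)
  d_next <- c(step_len, NA)
  # which() ignores any NA comparisons at the two ends of the track
  which(d_prev > s * typical & d_next > s * typical)
}

find_inconsistent(x, y, s = 15)   # 6
find_inconsistent(x, y, s = 400)  # integer(0): threshold too high
```

With s = 15, only the spurious fix at index 6 is flagged; raising s to 400 flags nothing, illustrating that higher thresholds detect fewer inconsistencies. Requiring a fix to be far from both neighbours keeps the genuine position that follows a spike (index 7) from being flagged as well. For longitude/latitude data such as tt, a geodesic distance would be needed instead of the Euclidean distance used here.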