Getting Started with pep725

Matthias Templ1

Barbara Templ2

2026-03-04

Vignette 1 of 4 for the pep725 R package. The most current version is available on GitHub and CRAN.

What is Phenology?

Phenology is the study of cyclic and seasonal natural phenomena, especially in relation to climate and plant life cycles. In plant science, phenology focuses on the timing of recurring biological events such as:

These phenological phases are strongly temperature-dependent and respond sensitively to interannual climate variability as well as long-term climate change. As a result, phenological records provide some of the most direct biological evidence of climate impacts on terrestrial ecosystems.

The PEP725 Database

The PEP725 Pan European Phenology Database is one of the world’s most comprehensive collections of plant phenological observations. It brings together long-term records from national phenological monitoring networks across Europe into a harmonized and openly accessible research infrastructure. Key characteristics of PEP725 include (Templ et al. 2018, 2026):

Observations in PEP725 are based on standardized phase definitions (largely aligned with BBCH-type schemes) and consistent observation protocols within national networks. This standardization enables comparative analyses across regions, species, and time periods, thus addresses key data integration and reusability challenges commonly encountered in large, heterogeneous phenological datasets (Templ 2025).

Why Use the pep725 R Package?

Working with phenological data presents several challenges:

  1. Data volume: Millions of observations require efficient data handling
  2. Data quality: Observations may contain errors or outliers
  3. Standardization: Different countries use different formats
  4. Analysis complexity: Calculating trends, normals, and anomalies requires specialized methods

The pep725 package addresses these challenges by providing:

Installation

Before you begin, install the package:

# Install from CRAN (when available)
install.packages("pep725")

# Or install development version from GitHub
# First install devtools if you don't have it:
# install.packages("devtools")
devtools::install_github("matthias-da/pep725")

Loading the Package

Once installed, load the package at the start of your R session:

library(pep725)

When the package loads, you’ll see a startup message with tips on how to get data.

Getting Phenological Data

The pep725 package offers five ways to obtain data. Understanding which option to use is important for your workflow:

Option Best For Data Size Internet Required?
1. Download synthetic data Learning, tutorials, reproducible examples ~7.24M rows First download only
2. Seed dataset Quick offline tests, minimal examples ~1,300 rows No
3. Generate synthetic data Creating shareable versions of your data Variable No
4. Import real PEP725 data Actual research and publications ~7.24M rows For initial download
5. Use your own phenological data Regional networks, field trials, citizen science datasets Variable No

Why Synthetic Data?

You might wonder why we use synthetic data instead of real observations. The original PEP725 database has usage restrictions that prevent redistribution - you must register on the pep725.eu website and accept the data use policy. This creates a problem for tutorials and examples that need to be reproducible.

Synthetic data solves this problem. It is generated to preserve the statistical properties of real data:

This means you can learn all the analysis techniques using synthetic data, then apply them confidently to real data when you’re ready for actual research.

Option 2: Use the Small Seed Dataset

For quick tests when you don’t need the full dataset, use the included seed data:

# Load the included seed dataset (small, for quick tests only)
data(pep_seed)

# This is a small subset (~1,300 rows) useful for:
# - Testing code quickly
# - Working offline
# - Running examples in documentation

When to use this: If you’re developing code and want fast iteration without waiting for the full dataset to load, the seed dataset is perfect. However, it’s too small for meaningful statistical analyses.

Option 3: Generate Your Own Synthetic Data

If you have real PEP725 data and want to create a shareable synthetic version (e.g., for teaching or supplementary materials), use pep_simulate():

# Generate synthetic data based on your real data
# This preserves statistical properties while anonymizing observations
pep_synthetic <- pep_simulate(your_real_pep_data)

# You can then share pep_synthetic freely

Use case: You’ve done an analysis with real data for a paper, and you want to provide reproducible examples in supplementary materials without violating data sharing restrictions.

Option 4: Import Real PEP725 Data (For Research)

For actual research and publications, you’ll want real data. Here’s the process:

  1. Register at pep725.eu (free for research)
  2. Download data for your species/regions of interest
  3. Import using pep_import()
# Import from downloaded PEP725 files
# Point to the folder containing your downloaded CSV files
pep <- pep_import("path/to/pep725_data/")

The pep_import() function automatically cleans and formats the data:

Option 5: Use Your Own Plant Phenological Data

Although pep725 is designed around the structure and conventions of the PEP725 database, it is not limited to PEP725 data.

If your dataset consists of station-based phenological observations with:

you can use most of the functionality in pep725 with minimal adaptation.

The key requirement is a PEP725-like tabular structure — one row per observation. This structure can usually be achieved with straightforward preprocessing (e.g., renaming columns and ensuring consistent DOY formatting).

Once your data has the required columns, convert it to a pep object:

# Prepare your data with the required columns
my_data <- data.frame(
  s_id = c("MY001", "MY001", "MY002"),
  lon = c(10.1, 10.1, 11.3),
  lat = c(47.5, 47.5, 48.1),
  genus = c("Malus", "Malus", "Malus"),
  species = c("Malus domestica", "Malus domestica", "Malus domestica"),
  phase_id = c(60, 65, 60),
  year = c(2020, 2020, 2020),
  day = c(110, 125, 108)
)

# Convert to a pep object — validates required columns automatically
my_pep <- as.pep(my_data)

Important Note for Research

All examples in these vignettes use synthetic data. The patterns you see are realistic but not actual scientific observations.

For publishable research, always use real PEP725 data! After registering at pep725.eu, simply replace pep_download() with pep_import("your/data/path/"). All the analysis techniques work identically.

Understanding the Data Structure

Now that we have data loaded, let’s understand what we’re working with. The pep object is a special type of data frame optimized for large datasets.

# View the data - the print method shows a summary
print(pep)
#> PEP725 Phenological Data
#> --------------------------------------------------
#> Observations: 7,240,883
#> Stations:     10,927
#> Species:      41
#> Phases:       73 (BBCH codes: 0, 1, 3, 7, 9, ...)
#> Years:        1775 - 2025
#> Extent:       lon [-10.24, 48.35], lat [14.44, 68.44]
#> Countries:    34 (Austria, Belgium, Bosnia and Herz., Bulgaria, Croatia, ...)
#> --------------------------------------------------
#> First 5 rows:
#>      s_id     lon     lat  genus        species phase_id  year   day
#>    <fctr>   <num>   <num> <fctr>         <fctr>    <int> <int> <int>
#> 1:  26675 16.3264 48.3036  Vitis Vitis vinifera      100  1775   297
#> 2:  26675 16.3264 48.3036  Vitis Vitis vinifera      100  1776   281
#> 3:  26675 16.3264 48.3036  Vitis Vitis vinifera      100  1777   288
#> 4:  26675 16.3264 48.3036  Vitis Vitis vinifera      100  1778   290
#> 5:  26675 16.3264 48.3036  Vitis Vitis vinifera      100  1779   288

Let’s get a quick visual overview. The plot() method for pep objects provides three built-in views — a station map, a time series, and a DOY histogram:

# Where are the stations?
plot(pep, type = "map")


# How are phenological events distributed across the year?
plot(pep, type = "histogram")

What You See in the Output

The output shows you:

Key Columns Explained

Each row in the dataset represents one phenological observation - a record of when a specific plant at a specific location reached a particular developmental stage.

Column Description Example
s_id Station identifier - A unique code for each monitoring location “AT0001”
lon, lat Geographic coordinates - Longitude and latitude in decimal degrees 15.5, 48.2
alt Altitude - Elevation above sea level in meters 350
genus Plant genus - The genus name (first part of scientific name) “Malus”
species Full species name - Genus + specific epithet “Malus domestica”
phase_id BBCH code - Standardized phenological stage (explained below) 60
year Observation year 2015
day Day of year (DOY) - The day when the phase was observed (1-365) 145
country Country name “Austria”

Understanding Day of Year (DOY)

Phenological data uses Day of Year (DOY) rather than calendar dates. This simplifies calculations and comparisons:

Why DOY? It makes it easy to calculate things like “flowering advanced by 10 days” without dealing with month boundaries.

The pep Class

The package provides a custom class system that extends data.table with phenology-specific methods. Let’s verify what class our data has:

# Check the class hierarchy
class(pep)
#> [1] "pep"        "data.table" "data.frame"

You’ll see three classes:

  1. pep - Our custom class with phenology-specific methods
  2. data.table - Fast data manipulation (from the data.table package)
  3. data.frame - Base R compatibility

This means you can use:

The Summary Method

The summary() function is your first tool for understanding a phenological dataset. It provides different views depending on the by parameter:

# Default summary - shows top species by observation count
summary(pep)
#> PEP725 Phenological Data Summary
#> ==================================================
#> 
#> OVERVIEW
#>   Total observations: 7,240,883
#>   Unique stations:    10,927
#>   Year range:         1775 - 2025
#>   DOY range:          -210 - 578
#> 
#> TOP 10 SPECIES (by observation count)
#>        genus           species  n_obs n_stations year_min year_max median_doy
#>       <fctr>            <fctr>  <int>      <int>    <int>    <int>      <int>
#>  1:  Hordeum   Hordeum vulgare 930924       7996     1870     2025        175
#>  2:    Malus   Malus domestica 899311       8403     1896     2025        140
#>  3: Triticum Triticum aestivum 668330       7504     1870     2024        220
#>  4:    Avena      Avena sativa 642098       7737     1870     2025        150
#>  5:   Secale    Secale cereale 605981       7837     1870     2025        213
#>  6:  Solanum Solanum tuberosum 593638       7866     1870     2025        176
#>  7:   Prunus    Prunus cerasus 483615       7083     1873     2025        130
#>  8:   Prunus  Prunus domestica 429971       7617     1868     2025        134
#>  9:      Zea          Zea mays 417009       5692     1941     2025        193
#> 10:     Beta     Beta vulgaris 403413       6219     1926     2024        130

What this tells you: Which species have the most data, how many stations observe each species, and the year range of observations.

# Summary by phenological phase
summary(pep, by = "phase")
#> PEP725 Phenological Data Summary
#> ==================================================
#> 
#> OVERVIEW
#>   Total observations: 7,240,883
#>   Unique stations:    10,927
#>   Year range:         1775 - 2025
#>   DOY range:          -210 - 578
#> 
#> OBSERVATIONS BY BBCH PHASE
#>     phase_id                                 phase_name  n_obs n_species
#>        <int>                                     <char>  <int>     <int>
#>  1:        0                        Dry seed / Dormancy 751405        21
#>  2:        1                            Seed imbibition  18041         8
#>  3:        3                                      Other   1099         6
#>  4:        7                         Coleoptile emerged  17758         9
#>  5:        9                                  Emergence    230         8
#>  6:       10              First leaf through coleoptile 899415        19
#>  7:       11                        First leaf unfolded  22210        22
#>  8:       12                          2 leaves unfolded     61         5
#>  9:       13                          3 leaves unfolded    328         2
#> 10:       14                                      Other    142         1
#> 11:       15                                True leaves   3382        11
#> 12:       16                                      Other      1         1
#> 13:       17                                      Other      2         1
#> 14:       19                         9+ leaves unfolded     45         6
#> 15:       21                     Beginning of tillering      1         1
#> 16:       31                      First node detectable 414453         9
#> 17:       33                                      Other      1         1
#> 18:       35                                      Other   1947         2
#> 19:       48                                      Other     41         1
#> 20:       49                                  Late boot    120         1
#> 21:       51                       Beginning of heading 417161         8
#> 22:       55                          Middle of heading  10054         6
#> 23:       56                                      Other      3         1
#> 24:       57                                      Other     79         2
#> 25:       59                             End of heading  12005         9
#> 26:       60  Beginning of flowering / Heading complete 834172        33
#> 27:       61                     Beginning of flowering  80103        29
#> 28:       63                                      Other    628        15
#> 29:       65                  Full flowering / Anthesis 475922        33
#> 30:       67                                      Other    207        15
#> 31:       69                           End of flowering 442901        20
#> 32:       75                                Medium milk  29851         5
#> 33:       77                                  Late milk      1         1
#> 34:       80                          Dough development    115         1
#> 35:       81 Beginning of ripening or fruit colouration   9340        20
#> 36:       83                                Early dough    414         2
#> 37:       85                                 Soft dough 299417        14
#> 38:       87                                 Hard dough 894688        22
#> 39:       89                                 Fully ripe  15497        18
#> 40:       91                                      Other    660        10
#> 41:       93                                      Other   1070        12
#> 42:       95                       50% of leaves fallen  93066        20
#> 43:       97                                 Plant dead   2677        13
#> 44:      100                                    Harvest 923474        34
#> 45:      109                                      Other  16802         8
#> 46:      111                       First cut for silage 241233         3
#> 47:      131                          First cut for hay 127007         1
#> 48:      135                                      Other    253         1
#> 49:      139                                      Other    235         1
#> 50:      140                                      Other   7638         1
#> 51:      151  Start of harvest for silage (corn, grass)  60999         1
#> 52:      161                                      Other   4796         1
#> 53:      182                                      Other  35771         1
#> 54:      200                                      Other    103         4
#> 55:      201                                      Other    975        12
#> 56:      202                                      Other      2         1
#> 57:      203                                      Other   2190         6
#> 58:      204                                      Other     35         1
#> 59:      205                     Autumn coloring >= 50%  53709        17
#> 60:      206                                      Other      1         1
#> 61:      207                                      Other      3         1
#> 62:      209                                      Other   2412         7
#> 63:      210                                      Other     27         2
#> 64:      213                                      Other     51         4
#> 65:      223                                      Other   1166         3
#> 66:      250                                      Other   5960         1
#> 67:      380                                      Other   1982        16
#> 68:      381                                      Other    222        11
#> 69:      383                                      Other      1         1
#> 70:      385                                      Other   3119         9
#> 71:      386                                      Other      2         1
#> 72:      387                                      Other      1         1
#> 73:      389                                      Other      1         1
#>     phase_id                                 phase_name  n_obs n_species
#>        <int>                                     <char>  <int>     <int>
#>     median_doy iqr_doy
#>          <int>   <int>
#>  1:        127     170
#>  2:        138      38
#>  3:        115      60
#>  4:        117      18
#>  5:        117      37
#>  6:        141     157
#>  7:        118      22
#>  8:        107      39
#>  9:        165     185
#> 10:        106     178
#> 11:        111      27
#> 12:        123       0
#> 13:        126       3
#> 14:        140      19
#> 15:        130       0
#> 16:        129      29
#> 17:        130       0
#> 18:        163      20
#> 19:        207      28
#> 20:        216      52
#> 21:        161      25
#> 22:        160      21
#> 23:         69       4
#> 24:         96      11
#> 25:        182      29
#> 26:        124      30
#> 27:        157      21
#> 28:        126     105
#> 29:        124      18
#> 30:        113      45
#> 31:        132      17
#> 32:        228      17
#> 33:        109       0
#> 34:        278      20
#> 35:        239      28
#> 36:        214      16
#> 37:        206      27
#> 38:        234      64
#> 39:        217      28
#> 40:        274      38
#> 41:        292      26
#> 42:        301      18
#> 43:        314      16
#> 44:        226      49
#> 45:        242      35
#> 46:        171     101
#> 47:        155      15
#> 48:        169      14
#> 49:        181      17
#> 50:        232      21
#> 51:        269      20
#> 52:        270      22
#> 53:         81      22
#> 54:        263      24
#> 55:        281      26
#> 56:        296       5
#> 57:        286      15
#> 58:        282      24
#> 59:        286      20
#> 60:        301       0
#> 61:        320       6
#> 62:        295      15
#> 63:        273      20
#> 64:        285      19
#> 65:        120      14
#> 66:         85      25
#> 67:        222      84
#> 68:        263     112
#> 69:        301       0
#> 70:        258      69
#> 71:        276      15
#> 72:        309       0
#> 73:        309       0
#>     median_doy iqr_doy
#>          <int>   <int>

What this tells you: Which developmental stages are recorded, how many observations of each, and the typical DOY (useful for understanding the seasonal distribution of your data).

# Summary by country (showing top 5)
summary(pep, by = "country", top = 5)
#> PEP725 Phenological Data Summary
#> ==================================================
#> 
#> OVERVIEW
#>   Total observations: 7,240,883
#>   Unique stations:    10,927
#>   Year range:         1775 - 2025
#>   DOY range:          -210 - 578
#> 
#> TOP 5 COUNTRIES (by observation count)
#>          country   n_obs n_stations n_species year_min year_max
#>           <char>   <int>      <int>     <int>    <int>    <int>
#> 1: Germany-North 4181984       4685        17     1951     2023
#> 2: Germany-South 2333531       1987        18     1930     2023
#> 3:       Austria  366048       1434        22     1775     2025
#> 4:        Sweden   82331        908        22     1870     2024
#> 5:      Slovakia   53052         76        15     1950     2024

What this tells you: Geographic coverage - which countries contribute most data, how many stations they have, and species diversity.

# Temporal coverage summary
summary(pep, by = "year")
#> PEP725 Phenological Data Summary
#> ==================================================
#> 
#> OVERVIEW
#>   Total observations: 7,240,883
#>   Unique stations:    10,927
#>   Year range:         1775 - 2025
#>   DOY range:          -210 - 578
#> 
#> TEMPORAL COVERAGE
#>   Years with data:    247
#>   Median obs/year:    1,243
#>   Peak year:          1968 (151,997 obs)
#> 
#>   Last 10 years:
#>      year n_obs n_stations n_species
#>     <int> <int>      <int>     <int>
#>  1:  2016 47812       1654        32
#>  2:  2017 43068       1571        29
#>  3:  2018 41041       1536        33
#>  4:  2019 42209       1522        29
#>  5:  2020 40181       1500        32
#>  6:  2021 39085       1393        31
#>  7:  2022 36813       1376        28
#>  8:  2023 35133       1351        28
#>  9:  2024  4315        369        27
#> 10:  2025   649         85        19

What this tells you: How observations are distributed across years - are there gaps? Is coverage increasing or decreasing over time?

Subsetting the Data

You’ll often want to focus on specific subsets of the data. The package preserves the pep class when you subset, so all methods continue to work:

# Filter to apple data using data.table syntax
# The syntax is: dataset[row_filter, column_selection]
apple <- pep[species == "Malus domestica"]
print(apple)
#> PEP725 Phenological Data
#> --------------------------------------------------
#> Observations: 899,311
#> Stations:     8,403
#> Species:      1
#> Phases:       31 (BBCH codes: 0, 7, 10, 11, 15, ...)
#> Years:        1896 - 2025
#> Extent:       lon [-8.37, 48.35], lat [14.44, 66.90]
#> Countries:    24 (Austria, Bosnia and Herz., Croatia, Czechia, France, ...)
#> --------------------------------------------------
#> First 5 rows:
#>      s_id   lon   lat  genus         species phase_id  year   day
#>    <fctr> <num> <num> <fctr>          <fctr>    <int> <int> <int>
#> 1:  29529  2.49  48.8  Malus Malus domestica       65  1896   103
#> 2:  29529  2.49  48.8  Malus Malus domestica       65  1897   118
#> 3:  29529  2.49  48.8  Malus Malus domestica       65  1898   118
#> 4:  29529  2.49  48.8  Malus Malus domestica       65  1900   130
#> 5:  29529  2.49  48.8  Malus Malus domestica       65  1901   116

# Filter by year range
recent <- pep[year >= 2000 & year <= 2015]
print(recent)
#> PEP725 Phenological Data
#> --------------------------------------------------
#> Observations: 985,849
#> Stations:     3,576
#> Species:      33
#> Phases:       56 (BBCH codes: 0, 3, 7, 9, 10, ...)
#> Years:        2000 - 2015
#> Extent:       lon [-10.24, 28.40], lat [37.22, 64.55]
#> Countries:    28 (Austria, Belgium, Bosnia and Herz., Bulgaria, Croatia, ...)
#> --------------------------------------------------
#> First 5 rows:
#>      s_id      lon     lat      genus        species phase_id  year   day
#>    <fctr>    <num>   <num>     <fctr>         <fctr>    <int> <int> <int>
#> 1:    136  9.18333 54.2500 perm_grass           <NA>      182  2000     5
#> 2:   6047 12.10000 50.8333 perm_grass           <NA>      182  2000     9
#> 3:   1304  7.10000 51.1000 perm_grass           <NA>      182  2000    10
#> 4:   6532 14.08330 47.4833     Secale Secale cereale        0  2000    15
#> 5:    206  9.66667 54.3167 perm_grass           <NA>      182  2000    17

# The summary method works on subsets too
summary(recent)
#> PEP725 Phenological Data Summary
#> ==================================================
#> 
#> OVERVIEW
#>   Total observations: 985,849
#>   Unique stations:    3,576
#>   Year range:         2000 - 2015
#>   DOY range:          -128 - 523
#> 
#> TOP 10 SPECIES (by observation count)
#>          genus           species  n_obs n_stations year_min year_max median_doy
#>         <fctr>            <fctr>  <int>      <int>    <int>    <int>      <int>
#>  1:      Malus   Malus domestica 179208       2734     2000     2015        133
#>  2:        Zea          Zea mays 107126       1829     2000     2015        198
#>  3:    Hordeum   Hordeum vulgare  95636       1906     2000     2015        181
#>  4:   Triticum Triticum aestivum  90484       1848     2000     2015        207
#>  5:     Prunus      Prunus avium  86023       2820     2000     2015        122
#>  6:     Secale    Secale cereale  68568       1491     2000     2015        202
#>  7:      Avena      Avena sativa  67242       1558     2000     2015        150
#>  8:      Pyrus    Pyrus communis  65482       2063     2000     2015        121
#>  9:     Prunus    Prunus cerasus  61866       1835     2000     2015        123
#> 10: perm_grass              <NA>  50086       2328     2000     2015        135

Tip for beginners: The data.table syntax pep[condition] is similar to base R’s subset() function but much faster for large datasets. You can combine multiple conditions with & (and) or | (or).

Understanding BBCH Codes

The BBCH scale [from Biologische Bundesanstalt, Bundessortenamt und CHemische Industrie; Meier (2018)] is a standardized system for describing plant developmental stages. It’s used worldwide, ensuring that “flowering” means the same thing whether observed in Germany or Spain.

How BBCH Codes Work

BBCH codes are two-digit numbers where:

First Digit Principal Growth Stage
0 Germination / sprouting
1 Leaf development
2 Formation of side shoots / tillering
3 Stem elongation
4 Development of harvestable vegetative parts
5 Inflorescence emergence / heading
6 Flowering
7 Development of fruit
8 Ripening of fruit and seed
9 Senescence / dormancy

Looking Up BBCH Codes

The bbch_description() function translates codes to human-readable descriptions:

# Get descriptions for BBCH codes present in your data
codes <- unique(pep$phase_id)
bbch_description(codes)
#>    phase_id                                description
#> 1         0                        Dry seed / Dormancy
#> 2         1                            Seed imbibition
#> 3         7                         Coleoptile emerged
#> 4         9                                  Emergence
#> 5        10              First leaf through coleoptile
#> 6        11                        First leaf unfolded
#> 7        12                          2 leaves unfolded
#> 8        13                          3 leaves unfolded
#> 9        15                                True leaves
#> 10       19                         9+ leaves unfolded
#> 11       21                     Beginning of tillering
#> 12       31                      First node detectable
#> 13       49                                  Late boot
#> 14       51                       Beginning of heading
#> 15       55                          Middle of heading
#> 16       59                             End of heading
#> 17       60  Beginning of flowering / Heading complete
#> 18       61                     Beginning of flowering
#> 19       65                  Full flowering / Anthesis
#> 20       69                           End of flowering
#> 21       75                                Medium milk
#> 22       77                                  Late milk
#> 23       80                          Dough development
#> 24       81 Beginning of ripening or fruit colouration
#> 25       83                                Early dough
#> 26       85                                 Soft dough
#> 27       87                                 Hard dough
#> 28       89                                 Fully ripe
#> 29       95                       50% of leaves fallen
#> 30       97                                 Plant dead
#> 31      100                                    Harvest
#> 32      111                       First cut for silage
#> 33      131                          First cut for hay
#> 34      151  Start of harvest for silage (corn, grass)
#> 35      205                     Autumn coloring >= 50%

Commonly Used BBCH Codes in Phenological Research

Code Stage What to Look For
10 First leaves Cotyledons or first true leaves visible
11 First leaf unfolded First true leaf fully unfolded
60 First flowers open Beginning of flowering (anthesis)
65 Full flowering 50% of flowers open
69 End of flowering All petals fallen
85 Fruit coloring Softening begins
87 Fruit ripe Ready for harvest
89 Full ripeness Fruit fully mature
91 Shoot growth complete Leaves fully colored
95 Leaf fall 50% of leaves fallen

Filtering by Phenological Phase

When analyzing phenology, you typically focus on specific phenophases. Here’s how:

# Get all flowering observations (BBCH 60-69)
flowering <- pep[phase_id >= 60 & phase_id <= 69]
cat("Flowering observations:", nrow(flowering), "\n")
#> Flowering observations: 1833933

# Get only full flowering (BBCH 65)
full_flowering <- pep[phase_id == 65]
cat("Full flowering observations:", nrow(full_flowering), "\n")
#> Full flowering observations: 475922

# Get harvest-related observations
harvest <- pep[phase_id %in% c(87, 89)]
cat("Harvest observations:", nrow(harvest), "\n")
#> Harvest observations: 910185

Exploring Data Coverage

Before diving into analysis, it’s crucial to understand what your dataset contains. The pep_coverage() function provides a comprehensive overview.

Full Coverage Report

# Get a complete coverage report
pep_coverage(pep)
#> PEP725 Data Coverage Report
#> ==================================================
#> 
#> TEMPORAL COVERAGE
#> ------------------------------
#>   Year range:      1775 - 2025 (247 years)
#>   Missing years:   4 gaps
#>                    (1786, 1805, 1820, 1829)
#>   Median obs/year: 1,243
#>   Peak year:       1968 (151,997 obs)
#> 
#> GEOGRAPHICAL COVERAGE
#> ------------------------------
#>   Countries:       35
#>   Stations:        10,927
#>   Longitude:       -10.24 to 48.35
#>   Latitude:        14.44 to 68.44
#>   Altitude:        0 to 1933 m
#> 
#>   Top countries:
#>           country   n_obs n_stations n_species
#>            <char>   <int>      <int>     <int>
#>  1: Germany-North 4181984       4685        17
#>  2: Germany-South 2333531       1987        18
#>  3:       Austria  366048       1434        22
#>  4:        Sweden   82331        908        22
#>  5:      Slovakia   53052         76        15
#>  6:   Switzerland   51620        186        19
#>  7:          <NA>   32117         98        22
#>  8:        France   20487        706        22
#>  9:       Croatia   20130         10        15
#> 10:       Czechia   20008        146        14
#> 
#> SPECIES COVERAGE
#> ------------------------------
#>   Genera:          30
#>   Species:         41
#>   Subspecies:      129
#> 
#>   Top genera:
#>        genus   n_obs n_species n_stations
#>       <fctr>   <int>     <int>      <int>
#>  1:   Prunus 1357252         7      10012
#>  2:  Hordeum  930924         1       7996
#>  3:    Malus  899311         1       8403
#>  4: Triticum  668330         1       7504
#>  5:    Avena  642098         1       7737
#>  6:   Secale  605981         1       7837
#>  7:  Solanum  593638         1       7866
#>  8:      Zea  417009         1       5692
#>  9:     Beta  403413         1       6219
#> 10:    Pyrus  308320         1       7749
#> 
#>   Top species:
#>        genus           species  n_obs n_stations year_min year_max
#>       <fctr>            <fctr>  <int>      <int>    <int>    <int>
#>  1:  Hordeum   Hordeum vulgare 930924       7996     1870     2025
#>  2:    Malus   Malus domestica 899311       8403     1896     2025
#>  3: Triticum Triticum aestivum 668330       7504     1870     2024
#>  4:    Avena      Avena sativa 642098       7737     1870     2025
#>  5:   Secale    Secale cereale 605981       7837     1870     2025
#>  6:  Solanum Solanum tuberosum 593638       7866     1870     2025
#>  7:   Prunus    Prunus cerasus 483615       7083     1873     2025
#>  8:   Prunus  Prunus domestica 429971       7617     1868     2025
#>  9:      Zea          Zea mays 417009       5692     1941     2025
#> 10:     Beta     Beta vulgaris 403413       6219     1926     2024

This report shows you three dimensions of coverage:

  1. Temporal: What years are covered? Any gaps?
  2. Geographical: Which countries/regions? How many stations?
  3. Species: Which plants are observed? How many observations each?

Focused Coverage Analysis

You can focus on specific aspects and optionally generate plots:

# Temporal coverage with a visualization
pep_coverage(pep, kind = "temporal", plot = TRUE)

#> PEP725 Data Coverage Report
#> ==================================================
#> 
#> TEMPORAL COVERAGE
#> ------------------------------
#>   Year range:      1775 - 2025 (247 years)
#>   Missing years:   4 gaps
#>                    (1786, 1805, 1820, 1829)
#>   Median obs/year: 1,243
#>   Peak year:       1968 (151,997 obs)

The temporal plot shows observation density over time. Look for:

# Geographical coverage - which countries have most data
pep_coverage(pep, kind = "geographical", top = 5)
#> PEP725 Data Coverage Report
#> ==================================================
#> 
#> GEOGRAPHICAL COVERAGE
#> ------------------------------
#>   Countries:       35
#>   Stations:        10,927
#>   Longitude:       -10.24 to 48.35
#>   Latitude:        14.44 to 68.44
#>   Altitude:        0 to 1933 m
#> 
#>   Top countries:
#>          country   n_obs n_stations n_species
#>           <char>   <int>      <int>     <int>
#> 1: Germany-North 4181984       4685        17
#> 2: Germany-South 2333531       1987        18
#> 3:       Austria  366048       1434        22
#> 4:        Sweden   82331        908        22
#> 5:      Slovakia   53052         76        15

Geographic coverage tells you where your data comes from. This is important because:

# Species coverage - which species are best represented
pep_coverage(pep, kind = "species", top = 5)
#> PEP725 Data Coverage Report
#> ==================================================
#> 
#> SPECIES COVERAGE
#> ------------------------------
#>   Genera:          30
#>   Species:         41
#>   Subspecies:      129
#> 
#>   Top genera:
#>       genus   n_obs n_species n_stations
#>      <fctr>   <int>     <int>      <int>
#> 1:   Prunus 1357252         7      10012
#> 2:  Hordeum  930924         1       7996
#> 3:    Malus  899311         1       8403
#> 4: Triticum  668330         1       7504
#> 5:    Avena  642098         1       7737
#> 
#>   Top species:
#>       genus           species  n_obs n_stations year_min year_max
#>      <fctr>            <fctr>  <int>      <int>    <int>    <int>
#> 1:  Hordeum   Hordeum vulgare 930924       7996     1870     2025
#> 2:    Malus   Malus domestica 899311       8403     1896     2025
#> 3: Triticum Triticum aestivum 668330       7504     1870     2024
#> 4:    Avena      Avena sativa 642098       7737     1870     2025
#> 5:   Secale    Secale cereale 605981       7837     1870     2025

Species coverage shows which plants have enough data for robust analyses. As a rule of thumb:

Coverage by Groups

You can break down coverage by grouping variables:

# Temporal coverage broken down by country
cov_by_country <- pep_coverage(pep, kind = "temporal", by = "country")
# Show observations by group
cov_by_country$temporal$obs_by_group
#>              country   n_obs n_stations year_min year_max n_years
#>               <char>   <int>      <int>    <int>    <int>   <int>
#>  1:    Germany-North 4181984       4685     1951     2023      73
#>  2:    Germany-South 2333531       1987     1930     2023      80
#>  3:          Austria  366048       1434     1775     2025     193
#>  4:           Sweden   82331        908     1870     2024     115
#>  5:         Slovakia   53052         76     1950     2024      55
#>  6:      Switzerland   51620        186     1951     2024      74
#>  7:             <NA>   32117         98     1873     2025     121
#>  8:           France   20487        706     1872     2025     151
#>  9:          Croatia   20130         10     1961     2024      63
#> 10:          Czechia   20008        146     1930     2023      76
#> 11:         Slovenia   11507         15     1961     2020      60
#> 12:        Lithuania   11449         28     1960     2004      45
#> 13:           Poland    9461         14     1951     2024      74
#> 14:       Montenegro    9386          7     1951     2021      63
#> 15:            Spain    7446         79     1946     2025      77
#> 16:           Latvia    6595        102     1962     2018      52
#> 17: Bosnia and Herz.    5278          7     1971     2024      42
#> 18:          Hungary    3757          9     1947     2024      67
#> 19:          Romania    3700         70     1982     2005      24
#> 20:      Netherlands    3195        246     1868     2009     109
#> 21:           Russia    2395         33     1873     2019      68
#> 22:          Belgium    1860         64     1944     2020      62
#> 23:            Italy     650         12     1984     2024      40
#> 24:       Luxembourg     639          2     1955     2004      47
#> 25:           Norway     637          5     1927     2005      79
#> 26:          Ireland     453          5     1968     2024      57
#> 27:         Bulgaria     385          8     1973     2005      33
#> 28:    Liechtenstein     309          1     1973     2024      49
#> 29:           Serbia     294          6     1971     2002      29
#> 30:          Denmark     133          4     1972     2000      29
#> 31:   United Kingdom      23          1     1971     1981      11
#> 32:         Portugal      12          1     1973     1981       8
#> 33:           Greece       6          2     1971     1999       2
#> 34:          Finland       4          1     1976     1984       3
#> 35:            Yemen       1          1     2025     2025       1
#>              country   n_obs n_stations year_min year_max n_years
#>               <char>   <int>      <int>    <int>    <int>   <int>

This helps identify which countries have continuous records versus sporadic observations.

Common Pitfalls and Tips

As you start working with phenological data, keep these points in mind:

Data Quality Matters

Species Considerations

Next Steps

Now that you understand the basics, explore the other vignettes for more advanced analyses:

Phenological Analysis Vignette

Learn to calculate phenological normals (long-term averages), detect anomalies (unusual years), and analyze trends using:

vignette("phenological-analysis", package = "pep725")

Spatial Phenological Patterns Vignette

Analyze how phenology varies across landscapes:

vignette("spatial-patterns", package = "pep725")

Data Quality Assessment Vignette

Ensure your data is reliable and investigate unusual observations:

vignette("data-quality", package = "pep725")

Getting Help

If you encounter problems:

  1. Check the function documentation: ?function_name (e.g., ?pheno_normals)
  2. Read the vignettes: They contain worked examples
  3. Report issues: github.com/matthias-da/pep725/issues

Session Info

For reproducibility, here’s the R environment used for this vignette:

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Tahoe 26.0.1
#> 
#> Matrix products: default
#> BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Zurich
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] pep725_1.0.0        ggplot2_4.0.1       data.table_1.18.2.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.10         generics_0.1.4      tidyr_1.3.2        
#>  [4] class_7.3-23        robustbase_0.99-7   KernSmooth_2.23-26 
#>  [7] lattice_0.22-7      digest_0.6.39       magrittr_2.0.4     
#> [10] evaluate_1.0.5      grid_4.5.2          RColorBrewer_1.1-3 
#> [13] rnaturalearth_1.2.0 fastmap_1.2.0       Matrix_1.7-4       
#> [16] jsonlite_2.0.0      e1071_1.7-17        DBI_1.2.3          
#> [19] mgcv_1.9-3          purrr_1.2.1         viridisLite_0.4.2  
#> [22] scales_1.4.0        jquerylib_0.1.4     cli_3.6.5          
#> [25] rlang_1.1.7         units_1.0-0         splines_4.5.2      
#> [28] withr_3.0.2         cachem_1.1.0        yaml_2.3.12        
#> [31] otel_0.2.0          tools_4.5.2         dplyr_1.2.0        
#> [34] vctrs_0.7.1         R6_2.6.1            proxy_0.4-29       
#> [37] lifecycle_1.0.5     classInt_0.4-11     pkgconfig_2.0.3    
#> [40] pillar_1.11.1       bslib_0.10.0        gtable_0.3.6       
#> [43] glue_1.8.0          Rcpp_1.1.1          sf_1.0-24          
#> [46] DEoptimR_1.1-4      xfun_0.56           tibble_3.3.1       
#> [49] tidyselect_1.2.1    rstudioapi_0.18.0   knitr_1.51         
#> [52] farver_2.1.2        nlme_3.1-168        htmltools_0.5.9    
#> [55] patchwork_1.3.2     rmarkdown_2.30      labeling_0.4.3     
#> [58] compiler_4.5.2      S7_0.2.1

References

Meier, Uwe. 2018. Growth Stages of Mono- and Dicotyledonous Plants: BBCH Monograph. Quedlinburg, Germany: Open Agrar Repositorium. https://doi.org/10.5073/20180906-074619.
Templ, Barbara. 2025. “A Roadmap for Advancing Plant Phenological Studies Through Effective Open Research Data Management.” Ecological Informatics 87: 103109. https://doi.org/https://doi.org/10.1016/j.ecoinf.2025.103109.
Templ, Barbara, Elisabeth Koch, Kjell Bolmgren, Markus Ungersböck, Anita Paul, Helfried Scheifinger, This Rutishauser, et al. 2018. “Pan European Phenological Database (PEP725): A Single Point of Access for European Data.” International Journal of Biometeorology 62: 1109–13. https://doi.org/10.1007/s00484-018-1512-8.
Templ, Barbara, Helfried Scheifinger, Isabella Ostovary, Markus Ungersböck, and Hans Ressl. 2026. PEP725: 15 Years of Driving European and Global Phenology Science.” New Phytologist. https://doi.org/10.1111/nph.70869.

  1. University of Applied Sciences Northwestern Switzerland (FHNW)↩︎

  2. Swiss Federal Institute for Forest, Snow and Landscape Research (WSL)↩︎