---
title: "Introducing synopR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introducing synopR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
**2026-03-06**
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


## Standard workflow
`show_synop_data()` requires a character vector or a data frame column where each element is a SYNOP string.
 
```{r extraccion datos estandar}
library(synopR)
data_input_vector <- c("AAXX 04003 87736 32965 00000 10204 20106 39982 40074 5//// 333 10266 20158 555 64169 65090 =",
                       "AAXX 01094 87736 NIL=",
                       "AAXX 03183 87736 32965 12708 10254 20052 30005 40098 5//// 80005 333 56000 81270 =")

my_data <- show_synop_data(data_input_vector, wmo_identifier = '87736')

print(my_data)

```

If a meteorological parameter isn't present in any of the SYNOP messages, you can set `remove_empty_cols = TRUE` to remove the extra columns.

The optional `wmo_identifier` argument offers a significant advantage: it allows for automatic filtering in case the data contains messages from other stations.

While the following example uses a vector with only two messages for simplicity, if you are working with thousands of SYNOP strings from multiple stations, this built-in filtering becomes extremely convenient.


```{r ventaja wmo_identifier}
library(synopR)
# Messages from 87736 and 87016
mixed_synop <- c("AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818=",
                 "AAXX 04033 87016 41460 83208 10200 20194 39712 40114 50003 70292 888// 333 56699 82810 88615="
                 )

colorado_data <- show_synop_data(mixed_synop, wmo_identifier = '87736', remove_empty_cols = TRUE)
knitr::kable(t(colorado_data))
```
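
To see which stations a mixed vector contains before filtering, a small base-R sketch can help. The helper `station_of()` below is hypothetical (it is not part of **synopR**) and assumes the layout used in the examples above, where the station number is the third space-separated group of each message.

```{r sketch-station-ids}
# Hypothetical helper, not part of synopR: return the third group of each
# message, which is the WMO station number in the messages shown here.
station_of <- function(x) {
  groups <- strsplit(trimws(x), "[[:space:]]+")
  vapply(groups,
         function(g) if (length(g) >= 3) g[3] else NA_character_,
         character(1))
}

mixed_example <- c("AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818=",
                   "AAXX 04033 87016 41460 83208 10200 20194 39712 40114 50003 70292 888// 333 56699 82810 88615=")

# Count the messages per station
table(station_of(mixed_example))
```

A quick count like this shows whether `wmo_identifier` is needed at all for a given data set.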

It is good practice to check SYNOP messages for non-standard structures, and `check_synop()` is designed to detect them. It verifies that every message starts with "AAXX" and ends with "=", that it contains no invalid characters (after removing "AAXX" and "=", the valid characters are the digits 0-9, '/' and 'NIL'), and that every group has 5 characters (except the section identifiers '333' and '555').

The `check_synop()` function accepts either a character vector or a single data frame column containing SYNOP strings. A data frame with multiple columns, where the SYNOP column is not explicitly specified, is accepted **if and only if it is the direct output of** `parse_ogimet()`.

```{r check synops2, error = TRUE}
library(synopR)

my_df <- data.frame(syn = c("AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818=",
                            "AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818="),
                    second_column = c(5,7))

check_synop(my_df) # Bad: whole data frame, SYNOP column not specified

check_synop(my_df$syn) # Good: the SYNOP column is passed explicitly

```

So far, our messages have a correct structure (even the NIL ones). Now, let’s see what happens when they don't.

```{r check synops3}
library(synopR)

check_synop(c("AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818=",
              "AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 6000182100 333 56600 82818=",
              "AAXX 01183 87736 12465 20000 10326 2021 39974 40064 5//// 60001 82100 333 56600 82818=",
              "AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818",
              "Not a synop message="))

```

`check_synop()` returns a tibble where the first column indicates whether each SYNOP is valid (TRUE) or not (FALSE), and the second column describes the specific error found.
In our example:

* The first SYNOP is correct.
* In the second, there is a missing space between groups 6 and 8 in Section 1.
* In the third, group 2 of Section 1 contains only 4 digits.
* The fourth message is missing the "=" terminator (remember that SYNOP messages must always start with "AAXX" and end with "=").
* The fifth is simply not a SYNOP string at all.
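
The structural rules can be approximated in a few lines of base R, which helps to see why each message above passes or fails. This is only a sketch: `is_plausible_synop()` is a hypothetical helper, not the implementation of `check_synop()`, and it does not report which rule failed.

```{r sketch-structural-rules}
# Hypothetical helper, not part of synopR: a rough approximation of the
# structural rules described in this section.
is_plausible_synop <- function(x) {
  x <- trimws(x)
  # Rule 1: the message starts with "AAXX" and ends with "="
  has_frame <- startsWith(x, "AAXX") & endsWith(x, "=")
  # Drop the "AAXX" header and the "=" terminator, then split into groups
  body <- trimws(sub("=$", "", sub("^AAXX", "", x)))
  groups <- strsplit(body, "[[:space:]]+")
  # Rule 2: every group is 5 characters of digits or "/", or one of
  # the tokens "333", "555" and "NIL"
  groups_ok <- vapply(groups, function(g) {
    length(g) > 0 && all(grepl("^([0-9/]{5}|333|555|NIL)$", g))
  }, logical(1))
  has_frame & groups_ok
}

is_plausible_synop(c(
  "AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818=",
  "AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 6000182100 333 56600 82818=",
  "AAXX 01183 87736 12465 20000 10326 2021 39974 40064 5//// 60001 82100 333 56600 82818=",
  "AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818",
  "Not a synop message="
))
```

Only the first message satisfies both rules in this sketch, which matches the list above.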

## Workflow with Ogimet
The following SYNOP messages were retrieved from [Ogimet](https://www.ogimet.com/cgi-bin/getsynop?block=87736&begin=202602010300&end=202602012300) for the Rio Colorado station, Argentina (WMO identifier: 87736). Note that these are not "pure" SYNOP strings: each one carries a prefix added by Ogimet that specifies the station ID (87736) along with the date and time of the observation.

However, this is not an issue, as we can use the `parse_ogimet()` function, which is specifically designed to separate this prefix from the raw SYNOP message for processing.

```{r parse_ogimet}
library(synopR)

data_input <- data.frame(synops = c("87736,2026,02,01,03,00,AAXX 01034 87736 NIL=",
                                    "87736,2026,02,01,06,00,AAXX 01064 87736 NIL=",
                                    "87736,2026,02,01,09,00,AAXX 01094 87736 NIL=",
                                    "87736,2026,02,01,12,00,AAXX 01123 87736 12965 31808 10240 20210 39992 40082 5//// 60104 82075 333 10282 20216 56055 82360=",
                                    "87736,2026,02,01,15,00,AAXX 01154 87736 NIL=",
                                    "87736,2026,02,01,18,00,AAXX 01183 87736 12465 20000 10326 20215 39974 40064 5//// 60001 82100 333 56600 82818=",
                                    "87736,2026,02,01,21,00,AAXX 01214 87736 NIL="))

# Pass the column, not the data frame: `parse_ogimet(data_input)` is incorrect
data_from_ogimet <- parse_ogimet(data_input$synops)

print(data_from_ogimet)

# A 'Year' column is included!
parse_ogimet(data_input$synops) |> show_synop_data(wmo_identifier = 87736, remove_empty_cols = TRUE)
```
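
To make the structure of the Ogimet prefix explicit, the following base-R sketch splits one line into its parts. It assumes the layout seen in the example above (station, year, month, day, hour and minute, separated by commas, followed by the SYNOP message). `split_ogimet_line()` is a hypothetical helper and not how `parse_ogimet()` is implemented.

```{r sketch-ogimet-prefix}
# Hypothetical helper, not part of synopR: split an Ogimet line into the six
# prefix fields and the raw SYNOP message.
split_ogimet_line <- function(x) {
  pattern <- "^([0-9]{5}),([0-9]{4}),([0-9]{2}),([0-9]{2}),([0-9]{2}),([0-9]{2}),(.*)$"
  m <- regmatches(x, regexec(pattern, x))
  # Lines that do not match the assumed layout become rows of NA
  m <- lapply(m, function(p) if (length(p) == 8) p[-1] else rep(NA_character_, 7))
  out <- as.data.frame(do.call(rbind, m), stringsAsFactors = FALSE)
  names(out) <- c("station", "year", "month", "day", "hour", "minute", "synop")
  out
}

split_ogimet_line("87736,2026,02,01,03,00,AAXX 01034 87736 NIL=")
```

All fields are returned as character strings, so leading zeros such as the month "02" are preserved.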

## Limitations
### General limitations

* A structurally valid SYNOP string does not guarantee that its content is correct. Quality control of the decoded data is not included, so post-processing is the user's responsibility.

* Section 555 (reserved for national distribution) is currently ignored, as its content varies by country. However, future versions of **synopR** may include functions to extract data from this section based on user requirements.

* There is no support for sections 222 and 444. `show_synop_data()` will incorrectly decode messages that contain them.

### Specific limitations
The following meteorological parameters are not fully decoded, either because they would not produce a strictly numeric vector or because the output would be too long:

* Horizontal visibility `VV`
* Lowest cloud base height `h`
* Cloud cover `N` and `Nh`, **but** they can be interpreted directly as oktas (eighths of the sky), except for 9, which means the sky is not visible due to fog or another meteorological phenomenon
* Present and past weather `ww`, `W1`, `W2`, cloud-related `Cl`, `Cm` and `Ch`, and ground-related `E` and `E'`

However, **code tables are available** in the "Code Tables" section for direct conversions!
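
For `N` and `Nh`, the rule above is simple enough to apply without a lookup table. The sketch below assumes the decoded column holds the raw code figure as a number from 0 to 9. `cloud_cover_oktas()` is a hypothetical helper and not part of **synopR**.

```{r sketch-cloud-cover}
# Hypothetical helper, not part of synopR: codes 0-8 are read as oktas,
# code 9 is flagged as "sky not visible", anything else becomes NA.
cloud_cover_oktas <- function(n) {
  n <- as.integer(n)
  data.frame(
    code = n,
    oktas = ifelse(n %in% 0:8, n, NA_integer_),
    sky_obscured = n %in% 9L
  )
}

cloud_cover_oktas(c(0, 4, 8, 9))
```

Keeping the flag in a separate column avoids mixing the special code 9 into numeric summaries of cloud cover.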

You should also be aware of this:

* Wind direction = 99 means "variable wind direction"
* Wind speeds greater than 99 units (m/s or knots) are not supported: the decoded value will be 99, although this is not expected to break the function
* If group 2 of section 1 reports relative humidity instead of dew point, the value in the Dew_point column will be NA
* For geopotential height, only the 850, 700 and 500 hPa pressure levels are supported (other pressure levels will result in NA)
* Groups 5 and 9 of section 1 are ignored
* Imperceptible precipitation, coded as 990, is decoded as 0.01 (mm) so that it can be distinguished from a 0 value
* A cloud description "/" (clouds not visible) is mapped to 10
* Snow depth `sss` is assumed to be between 1 cm and 996 cm; 997 means "less than 0.5 cm", 998 "snow cover, not continuous" and 999 "measurement impossible or inaccurate"
* Groups 5 (including 55, 56, 57, etc.), 7, 8 and 9 of section 3 are ignored
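
Two of the conventions above usually need handling in post-processing: the 0.01 mm trace marker and the special snow depth codes. The following sketch shows one possible approach. Both helpers are hypothetical (not part of **synopR**), and they assume plain numeric vectors as input rather than any particular column of the `show_synop_data()` output.

```{r sketch-post-processing}
# Hypothetical helper: set the 0.01 mm trace marker to 0 and keep a flag,
# so that precipitation totals are not inflated by trace amounts.
split_trace_precip <- function(precip_mm) {
  is_trace <- !is.na(precip_mm) & abs(precip_mm - 0.01) < 1e-8
  data.frame(precip_mm = ifelse(is_trace, 0, precip_mm), trace = is_trace)
}

# Hypothetical helper: describe a raw sss code using the ranges listed above.
describe_snow_code <- function(sss) {
  out <- rep(NA_character_, length(sss))
  in_range <- sss %in% 1:996
  out[in_range] <- paste(sss[in_range], "cm")
  out[sss %in% 997] <- "less than 0.5 cm"
  out[sss %in% 998] <- "snow cover, not continuous"
  out[sss %in% 999] <- "measurement impossible or inaccurate"
  out
}

split_trace_precip(c(0, 0.01, 2.5, NA))
describe_snow_code(c(12, 997, 998, 999))
```

Whether trace amounts should count as 0 depends on the analysis, so the flag is kept alongside the adjusted value.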