---
title: "Getting Started with sumer"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with sumer}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## 1. Introduction

The `sumer` package provides tools for working with Sumerian cuneiform texts. It supports conversion between different representations, dictionary lookup, and the creation of translation templates.

Sumerian texts today are typically available in **transliterated** form. Transliteration renders the pronunciation of the signs in Latin characters; for example, the Sumerian word for king is transliterated as `lugal`. For translation, however, the transliteration is irrelevant, because the meaning of a sign depends on the sign itself, not on how it was pronounced. The only exception is when there is reason to believe that words with similar pronunciations also have similar meanings. Even then, dictionaries can be based solely on the cuneiform characters.

Each cuneiform sign has three representations:

| Representation    | Example             | Description                                        |
|-------------------|---------------------|----------------------------------------------------|
| Transliteration   | `lugal`             | Phonetic transcription in lowercase letters        |
| Sign name         | `LUGAL`             | Canonical name in uppercase letters                |
| Cuneiform         | &#x12217;           | Unicode character (U+12000 to U+1254F)             |

Internally, the package works primarily with cuneiform characters and sign names. Transliteration serves as a convenient input format from which the other two representations are derived. Many signs have multiple readings (transliterations) but only one sign name and one cuneiform character.
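The many-to-one relation between readings and sign names can be pictured as a plain lookup table. The sketch below uses a tiny hand-made excerpt containing only readings mentioned in this vignette; the object `reading_to_sign` is hypothetical and is not how `sumer` stores its sign data.

```{r}
# Hypothetical excerpt: reading -> sign name (only readings named in this vignette)
reading_to_sign <- c(d = "AN", an = "AN", en = "EN", ru12 = "EN",
                     lil2 = "KID", ig = "IG", jal2 = "IG", lugal = "LUGAL")

# Different readings collapse onto the same sign name
unname(reading_to_sign[c("d", "an")])

# Convert a hyphenated word: look up each syllable, join sign names with dots
paste(reading_to_sign[strsplit("d-en-lil2", "-")[[1]]], collapse = ".")
```

Going in the opposite direction (sign name to reading) is ambiguous, which is why the package treats sign names and cuneiform characters, not readings, as the primary representation.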

```{r}
library(sumer)
```

## 2. Cuneiform Signs

### Retrieving sign information

The function `info()` displays all available information about a sign or sign sequence: reading, sign name, cuneiform character, and alternative readings.

```{r}
info("lugal")
```

The output contains a table with one row per sign as well as the three representations of the text: syllables (transliteration), sign names, and cuneiform.

For compound expressions, all contained signs are analyzed:

```{r}
info("d-en-lil2")
```

Here, each individual sign (d, en, lil2) is shown with its sign name (AN, EN, KID) and its cuneiform character. The `alternatives` column lists all possible readings -- for instance, the sign EN can also be read as `ru12` or `uru16`.

### Conversion

Two functions are available for converting entire texts:

```{r}
# Transliteration -> Cuneiform
as.cuneiform("lugal-e")
as.cuneiform(c("d-en-lil2", "an-ki"))

# Transliteration -> Sign names
as.sign_name("lugal-e")
as.sign_name(c("d-en-lil2", "an-ki"))
```

Both functions accept character vectors and process each element individually. Within a word, hyphens (`-`) separate syllables; dots (`.`) separate sign names; spaces separate words.
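To make the separator rules concrete, the following base-R sketch splits a text into words at spaces and each word into its components at hyphens or dots. It only illustrates the notation; the helper `split_components()` is hypothetical and is not the parser used inside `sumer`.

```{r}
# Hypothetical helper (not part of sumer):
# spaces separate words; hyphens (readings) or dots (sign names) separate signs.
split_components <- function(text) {
  words <- strsplit(trimws(text), " +")[[1]]
  lapply(words, function(w) strsplit(w, "[-.]")[[1]])
}

split_components("d-en-lil2 an-ki")
split_components("AN.EN.KID")
```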

**Note on display:** For cuneiform characters to display correctly, a font covering the Unicode cuneiform blocks (U+12000 to U+1254F) must be installed. In RStudio, the AGG graphics backend should also be enabled (Tools > Global Options > General > Graphics > Backend > AGG).


## 3. Dictionary Lookup

### Loading a dictionary

The package includes a built-in dictionary that is loaded with `read_dictionary()`:

```{r}
dic <- read_dictionary()
```

When the dictionary is loaded, its metadata (author, version, URL for updates) is printed. The dictionary is a data frame with entries for cuneiform characters, readings, and translations.

### Forward lookup: Sumerian -> English

A Sumerian expression can be looked up with `look_up()`:

```{r}
look_up("lugal", dic)
```

The output shows:

- The sign name and cuneiform character
- All translations with frequency (count) and grammatical type
- Entries for contained individual signs and substrings

For compound expressions, all partial combinations are also looked up:

```{r}
look_up("d-suen", dic)
```

Here, translations are shown not only for the complete expression AN.EN.ZU but also for its substrings AN, EN, ZU, AN.EN, and EN.ZU. This shows the meanings of the individual components at a glance.
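The substrings in question are all contiguous runs of signs in the expression; a sequence of *n* signs has *n*(*n* + 1)/2 of them. A minimal sketch of this enumeration, assuming the expression is already given as dot-separated sign names (the helper `sign_substrings()` is hypothetical and not necessarily how `look_up()` works internally):

```{r}
# Hypothetical helper: all contiguous runs of signs, joined with dots,
# ordered by length and then by position.
sign_substrings <- function(sign_names) {
  signs <- strsplit(sign_names, ".", fixed = TRUE)[[1]]
  n <- length(signs)
  out <- character(0)
  for (len in seq_len(n)) {
    for (start in seq_len(n - len + 1)) {
      out <- c(out, paste(signs[start:(start + len - 1)], collapse = "."))
    }
  }
  out
}

sign_substrings("AN.EN.ZU")
```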

### Reverse lookup: English -> Sumerian

If you know the English term and are looking for the Sumerian sign, use the parameter `lang = "en"`:

```{r}
look_up("Enki", dic, "en")
```

The reverse lookup searches all translations in the dictionary and displays matching entries with their sign names and cuneiform characters.
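Conceptually, a reverse lookup is a search over the translation column. The sketch below assumes a case-insensitive substring match on a small toy table; the column names `sign_name` and `meaning` and the helper `reverse_lookup()` are hypothetical, and the data frame returned by `read_dictionary()` has its own structure.

```{r}
# Toy table with hypothetical columns; meanings taken from this vignette
toy_dic <- data.frame(
  sign_name = c("LUGAL", "AN", "KI", "AN.EN.KI"),
  meaning   = c("king", "heaven", "Earth", "god Enki"),
  stringsAsFactors = FALSE
)

# Reverse lookup sketch: case-insensitive substring match on the meaning
reverse_lookup <- function(term, dic_table) {
  hit <- grepl(tolower(term), tolower(dic_table$meaning), fixed = TRUE)
  dic_table[hit, , drop = FALSE]
}

reverse_lookup("enki", toy_dic)
```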


## 4. Grammatical Types

Each dictionary entry has a **grammatical type** in addition to its translation. These types describe what function the sign can have in a sentence. Since the same sign can serve different functions depending on the context, it can have multiple entries with different types in the dictionary.

### Basic types

There are three basic types:

| Type    | Name        | Description                                     |
|---------|-------------|-------------------------------------------------|
| **S**   | Noun        | Noun phrases and substantives                   |
| **V**   | Verb        | Verbs and verbal expressions                    |
| **A**   | Attribute   | Modifying subordinate clauses                   |

For example, the sign LUGAL is a noun (S) with the meaning "king", while the sign SI appears in the dictionary both as a noun (S: "alignment of the order") and with the operator type `Sx->V` ("to put into order S"). Operator types are explained in the next subsection.

### Operator types

In addition to the basic types, there are **operators**. An operator takes an expression of a certain basic type as its argument and produces an expression of a (possibly different) basic type. The notation describes where the argument is located and which type is produced:

| Notation    | Meaning                                                |
|-------------|--------------------------------------------------------|
| `Sx->V`     | Takes an S to the left as argument, produces a V       |
| `xS->S`     | Takes an S to the right as argument, produces an S     |
| `Sx->A`     | Takes an S to the left as argument, produces an A      |
| `Vx->V`     | Takes a V to the left as argument, produces a V        |
| `Vx->A`     | Takes a V to the left as argument, produces an A       |
| `SSx->V`    | Takes two S to the left as arguments, produces a V     |

The `x` marks the position of the operator itself: `Sx` means the argument S is to the left of the operator; `xS` means it is to the right.
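Because the notation is positional, it can be decomposed mechanically. The following sketch splits an operator type into the argument types to the left and right of `x` and the resulting type; the helper `parse_operator()` is hypothetical and not part of `sumer`.

```{r}
# Hypothetical helper: decompose an operator type such as "SSx->V" into the
# argument types left and right of the operator position `x` and the result.
parse_operator <- function(type) {
  chars <- function(s) if (nzchar(s)) strsplit(s, "")[[1]] else character(0)
  parts <- strsplit(type, "->", fixed = TRUE)[[1]]
  lhs <- parts[1]
  pos <- as.integer(regexpr("x", lhs, fixed = TRUE))
  if (pos < 0) stop("no operator position 'x' in ", type)
  list(
    left   = chars(substr(lhs, 1, pos - 1)),           # arguments left of x
    right  = chars(substr(lhs, pos + 1, nchar(lhs))),  # arguments right of x
    result = parts[2]                                  # produced basic type
  )
}

str(parse_operator("SSx->V"))
str(parse_operator("xS->S"))
```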

An example: The sign AN can have the meaning "heaven" as a noun (type S). But it can also appear as an operator `xS->S` with the meaning "divine S" -- in this case, a noun stands to the right of AN and the result is again a noun. In the expression `d-en-lil2` (AN.EN.KID), AN functions as such an operator and produces "the divine EN.KID".

In translations, the placeholder `S` or `V` stands for the argument of the operator. For operators with two arguments (e.g., `SSx->V`), `S1` and `S2` stand for the two arguments.
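Filling in an operator's translation amounts to a textual substitution of the placeholder by the argument's translation. A minimal sketch, assuming placeholders are the standalone tokens `S`/`V` (one argument) or `S1`/`S2` (two arguments); the helper `apply_operator()` is hypothetical, and the two-argument translation in the last call is made up purely for illustration.

```{r}
# Hypothetical helper: replace placeholders with argument translations.
apply_operator <- function(translation, args) {
  if (length(args) == 1) {
    # single argument: standalone S or V
    return(gsub("\\b[SV]\\b", args[[1]], translation, perl = TRUE))
  }
  for (i in seq_along(args)) {
    # several arguments: S1, S2, ...
    translation <- gsub(paste0("\\bS", i, "\\b"), args[[i]], translation, perl = TRUE)
  }
  translation
}

apply_operator("divine S", "EN.KID")
apply_operator("to transform S", "the Earth")
apply_operator("S1 gives S2", c("the king", "bread"))  # made-up translation
```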

You can view the different types of a sign with `look_up()`:

```{r}
look_up("an", dic)
```


## 5. Translation Templates

### Creating an empty template

To translate an entire sentence, use the function `skeleton()`. It generates a hierarchical template in which each word is broken down into its components. Consider the following input as an example:

```{r}
x <- "<d-en-ki> (ki a). jal2 ((e2-kur) ra)."
skeleton(x)
```

The input contains two sentences, each terminated by `.`, and the brackets control how the template is constructed (more on this below). Each line of the generated template follows the pattern `|reading=SIGN_NAME=cuneiform:type:translation`. Indentation indicates the nesting depth: the overall expression is at the top level, followed by words and word groups, and below them the individual signs.
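Since each template line has a fixed layout, it is easy to process outside the package. The sketch below splits one line into its fields, assuming (as in the filled example later in this section) that indentation consists of tab characters after the leading `|` and that spaces after the colons are optional. The helper `parse_template_line()` is hypothetical, and the type is written in the `Sx->V` notation of Section 4.

```{r}
# Hypothetical helper: split one template line into its fields.
# Depth = number of tab characters after the leading "|".
parse_template_line <- function(line) {
  m <- regmatches(line, regexec("^\\|(\t*)([^:]*):([^:]*):(.*)$", line))[[1]]
  if (length(m) == 0) stop("not a template line: ", line)
  head <- strsplit(m[3], "=", fixed = TRUE)[[1]]   # reading=SIGN_NAME=cuneiform
  list(
    depth       = nchar(m[2]),
    reading     = head[1],
    sign_name   = head[2],
    cuneiform   = head[3],
    type        = trimws(m[4]),
    translation = trimws(m[5])
  )
}

# "\U00012000" is the cuneiform sign A
str(parse_template_line("|\ta=A=\U00012000: Sx->V: to transform S"))
```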

The first line is the header; it contains the reading of the entire input text. Below it are the entries for the individual components. Note that the reading `jal2` has been replaced by `ig`, another reading of the same sign IG, so that the template has an unambiguous 1:1 mapping between transliteration and cuneiform signs.

The two fields after the colons are initially empty -- this is where you enter the grammatical type and translation.

### Brackets for controlling the template

The input can contain three types of brackets that control how the template is constructed:

- **Angle brackets `< >`**: The enclosed expression is treated as a fixed term. No individual entries are generated for the signs inside. This is useful for proper names: in our example, `<d-en-ki>` generates only a single entry for AN.EN.KI, without breaking down the three individual signs.

- **Round brackets `( )`**: The enclosed expression receives its own entry in the template, in addition to entries for its individual signs. This is useful when a subsequence of signs forms a compound word. Brackets can be nested: in `((e2-kur) ra)`, both `e2-kur` and `e2-kur-ra` receive their own entry.

- **Curly braces `{ }`**: Ignored during skeleton generation. They can be used in the input text to mark arguments of operators.
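The bracket rules can be mimicked with a few regular expressions. The following sketch drops curly braces, collects angle-bracketed fixed terms, and resolves round brackets from the innermost outwards, joining the enclosed words with hyphens. It only approximates the behaviour described above; the helper `bracket_groups()` is hypothetical and not the implementation used by `skeleton()`.

```{r}
# Hypothetical helper: collect fixed terms and the groups that receive
# their own template entry.
bracket_groups <- function(text) {
  text <- gsub("[{}]", "", text)                       # curly braces: ignored
  fixed_terms <- regmatches(text, gregexpr("<[^<>]*>", text))[[1]]
  fixed_terms <- gsub("[<>]", "", fixed_terms)         # angle brackets: fixed terms
  groups <- character(0)
  pattern <- "\\(([^()]*)\\)"                          # innermost round brackets
  while (grepl(pattern, text)) {
    inner <- regmatches(text, regexec(pattern, text))[[1]][2]
    joined <- gsub(" +", "-", trimws(inner))           # "e2-kur ra" -> "e2-kur-ra"
    groups <- c(groups, joined)
    text <- sub(pattern, joined, text)                 # replace group by its content
  }
  list(fixed = fixed_terms, groups = groups)
}

bracket_groups("<d-en-ki> (ki a). jal2 ((e2-kur) ra).")
```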

### Pre-filling the template automatically

The function `guess_substr_info()` looks up the most frequent translation for each substring from the dictionary. The result can be passed to `skeleton()`:

```{r}
fill <- guess_substr_info(x, dic)
skeleton(x, fill = fill)
```

The template is then pre-filled with the most likely translations and types. Since the automatic assignment is often wrong, the entries must be reviewed and corrected where necessary. After such a revision, the filled template could look like this:

```
|an-en-ki-ki-a-ig-e2-kur-ra: SEN: The god Enki transforms the Earth. The one who establishes sustenance of human existence utilizes a supplier of energy from a distant place (the E-Kur temple). 
|an-en-ki=AN.EN.KI=𒀭𒂗𒆠: S: god Enki
|ki-a=KI.A=𒆠𒀀: V: to transform the Earth
|	ki=KI=𒆠: S: Earth
|	a=A=𒀀: S☒->V: to transform S
|ig=IG=𒅅: S: one who establishes the sustenance of human existence
|e2-kur-ra=E2.KUR.RA=𒂍𒆳𒊏: V: to utilize a supplier of energy from a distant place
|	e2-kur=E2.KUR=𒂍𒆳: S: supplier of energy from a distant place
|		e2=E2=𒂍: ☒S->S: supplier of energy from S
|		kur=KUR=𒆳: S: distant place
|	ra=RA=𒊏: S☒->V: to utilize S
```

This example illustrates the typical structure of Old Sumerian sentences: each sentence consists of a subject, an object, and a verb, in that order.
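The selection rule behind the pre-filling step, taking the most frequent translation for each entry, can be sketched on a toy table. The column names, the helper `most_frequent()`, and all counts below are made up for illustration; they do not come from the `sumer` dictionary, and `guess_substr_info()` may implement the rule differently.

```{r}
# Toy data in a hypothetical format; counts are invented for illustration
toy_counts <- data.frame(
  sign_name   = c("AN", "AN", "KI"),
  type        = c("S", "xS->S", "S"),
  translation = c("heaven", "divine S", "Earth"),
  count       = c(10, 25, 30),
  stringsAsFactors = FALSE
)

# For each sign name, keep the row with the highest count
most_frequent <- function(tab) {
  rows <- lapply(split(tab, tab$sign_name), function(d) d[which.max(d$count), ])
  do.call(rbind, rows)
}

most_frequent(toy_counts)
```

Choosing only the single most frequent entry ignores context, which is one reason why the automatically filled template still needs manual review.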


## 6. Interactive Translation

For interactive translation of individual lines, the package provides the function `translate()`, which opens a Shiny gadget:

```{r, eval = FALSE}
translate("<d-nu-dim2-mud> (ki a). jal2 ((e2-kur) ra).")
```

The gadget displays four sections on a scrollable page:

1. **N-gram patterns**: Frequent sign combinations in the text that appear in the current line. Such recurring patterns point to fixed terms or compound words.

2. **Context**: The neighbouring lines (if a full text was provided). Frequent n-grams are marked with curly braces.

3. **Grammar probabilities**: A bar chart showing the probability of the different grammatical types for each sign. Tall bars indicate a likely grammatical function.

4. **Translation**: The interactive core section. Here you see the skeleton template with input fields for type and translation. Clicking the green arrow next to an entry displays the corresponding dictionary entries. Clicking a dictionary entry adopts its type and translation. In the input field at the top, you can adjust the bracket structure and click "Update Skeleton" to regenerate the template without losing existing translations.
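The idea behind the n-gram section can be illustrated with a small sketch: count sign bigrams over all lines of a text and keep those that occur at least twice and also appear in the current line. The toy text and the helper `ngrams()` are hypothetical, and the gadget's actual criteria (n-gram lengths, thresholds) may differ.

```{r}
# Hypothetical helper: dot-joined sign n-grams of one line
ngrams <- function(signs, n) {
  if (length(signs) < n) return(character(0))
  vapply(seq_len(length(signs) - n + 1),
         function(i) paste(signs[i:(i + n - 1)], collapse = "."), "")
}

# Toy text: each line given as a vector of sign names
toy_text <- list(c("AN", "EN", "KI", "KI", "A"),
                 c("AN", "EN", "KI", "DU3"),
                 c("IG", "E2", "KUR", "RA"))

# Bigrams occurring at least twice in the whole text that also appear in line 1
bigram_counts <- table(unlist(lapply(toy_text, ngrams, n = 2)))
frequent_bigrams <- names(bigram_counts)[bigram_counts >= 2]
intersect(ngrams(toy_text[[1]], 2), frequent_bigrams)
```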

When you click "Done", the function returns a `skeleton` object containing the completed translation:

```{r, eval = FALSE}
result <- translate("<d-en-ki> (ki a). jal2 ((e2-kur) ra).")
print(result)
```

The second vignette ("Translating Sumerian Texts") shows how to use `translate()` together with a full text and how to generate a custom dictionary from the results.