Utilities

How to



A.1   Navigation

The program provides tab-based navigation. Root links, such as the Main Page or the Catalog of Tablets, always remain available in the background.

B.1   Search engine

The Sinleqiunnini search tool was designed to support regular expressions, a concise and flexible means of identifying strings of text such as particular characters, words, or patterns of characters. Both Search for lemmata and Browse Glossary make use of it.

B.2 Searching preferences

Users can choose between a simplified search method [Plain Text] and more advanced pattern matching [Advanced Search (Regexp)]. Only the latter uses the extended functionality of regular expressions to support pattern-matching operations; the former has been deliberately simplified for basic use.

This section summarizes, with examples, some of the main features, special characters and constructs that can be used in the "Advanced Search". More details can be found in the MySQL Reference Manual (11.4.2. Regular Expressions) or in the Wikipedia article "Regular expression".

A regular expression, often called a pattern, is an expression that describes a set of strings. Patterns are usually used to give a concise description of a set without having to list all of its elements. For example, the two strings "it-ta-din-šú" and "it-ta-din-šu" can be matched by the pattern "it-ta-din-š(u|ú)$".
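
The example pattern can be tried outside the program. The sketch below uses Python's `re` module as a stand-in for the search engine's own regular-expression library (an assumption; the two agree on the basic constructs used here). The third form in the list is added only as a counter-example.

```python
import re

# Pattern from the text: "it-ta-din-š" followed by "u" or "ú" at the end of the string.
pattern = re.compile(r"it-ta-din-š(u|ú)$")

# The first two forms come from the text; the third is a counter-example.
forms = ["it-ta-din-šú", "it-ta-din-šu", "it-ta-din-šum"]

matches = [form for form in forms if pattern.search(form)]
print(matches)  # only the two forms ending in šú or šu
```

The third form is rejected because `$` requires the string to end immediately after "u" or "ú".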

   The following operations help to construct regular expressions:

1) Alternation.

A vertical bar separates alternatives. For example, šu|šú can match "šum-ma" or "šúm-ma".

2) Grouping.

Parentheses are used to define the scope and precedence of the operators (among other uses). For example, a-ba-šu-nu|a-ba-šú-nu and a-ba-š(u|ú)-nu are equivalent patterns which both describe the set of "a-ba-šu-nu" and "a-ba-šú-nu".
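
The equivalence of the spelled-out and the grouped pattern can be checked with a short sketch. Python's `re` module is assumed here as a stand-in for the program's regular-expression engine, and "a-ba-ša-nu" is a made-up string used only as a counter-example.

```python
import re

spelled_out = re.compile(r"a-ba-šu-nu|a-ba-šú-nu")  # alternation of two full strings
grouped = re.compile(r"a-ba-š(u|ú)-nu")             # alternation limited to one sign

# The last candidate is a hypothetical string that neither pattern should match.
candidates = ["a-ba-šu-nu", "a-ba-šú-nu", "a-ba-ša-nu"]

# For each candidate, record whether each pattern finds a match.
results = {
    c: (bool(spelled_out.search(c)), bool(grouped.search(c)))
    for c in candidates
}
for candidate, (first, second) in results.items():
    print(candidate, first, second)
```

Both patterns give the same answer for every candidate; the grouped form is simply shorter and easier to extend with further alternatives.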

3) Quantification and Positioning.

A quantifier after a token (such as a character) or group specifies how often that preceding element is allowed to occur. The most common quantifiers are the question mark ?, the asterisk *, and the plus sign +.

?

The question mark indicates there is zero or one of the preceding element.

*

The asterisk indicates there are zero or more of the preceding element.

+

The plus sign indicates that there is one or more of the preceding element.

[  ]

A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].

[^ ]

Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c".

^

Matches the starting position within the string. For example, the pattern ^it matches the lemma "it-ta-din", but it does not match "mi-it-ha-ri-iš".

$

Matches the ending position of the string. For example, the pattern it$ matches only lemmata that end in "it", such as "ṣa-bi-it", but not "mi-it-ha-ri-iš".

All these constructions can be freely combined to form arbitrarily complex expressions, much like one can construct arithmetical expressions from numbers and the operations +, −, ×, and ÷.
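
The quantifiers, bracket expressions and anchors described above can be combined as in the following sketch. It assumes Python's `re` module as a stand-in for the program's engine and a small hand-picked list of lemmata taken from this page; the helper `search` is hypothetical and only filters that list.

```python
import re

# A small sample of lemmata quoted elsewhere on this page.
lemmata = ["it-ta-din", "mi-it-ha-ri-iš", "ṣa-bi-it", "iš-am", "i-ša-am"]

def search(pattern):
    """Return the sample lemmata in which the pattern is found."""
    return [lemma for lemma in lemmata if re.search(pattern, lemma)]

starts_with_it = search(r"^it")       # ^ anchors the match at the start
ends_with_it = search(r"it$")         # $ anchors the match at the end
optional_hyphen = search(r"^i-?š")    # ? makes the hyphen optional
repeated_ta = search(r"(ta-)+din")    # + requires at least one "ta-"
vowel_then_sh = search(r"[aeiu]š")    # bracket expression: any listed vowel
not_starting_i = search(r"^[^i]")     # negated bracket: first sign is not "i"

print(starts_with_it)    # ['it-ta-din']
print(ends_with_it)      # ['ṣa-bi-it']
print(optional_hyphen)   # ['iš-am', 'i-ša-am']
```

Note that a pattern without anchors is found anywhere inside the string, which is why `[aeiu]š` also matches the final "iš" of "mi-it-ha-ri-iš".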

4) Character classes.

The Sinleqiunnini search tool uses the following classes, or categories, of characters:

Class       Equivalent                            Description

[:alnum:]   [A-Za-z0-9]                           Alphanumeric characters
[:alpha:]   [A-Za-z]                              Alphabetic characters
[:blank:]   [ \t]                                 Space and tab
[:digit:]   [0-9]                                 Digits
[:graph:]   [\x21-\x7E]                           Visible characters
[:lower:]   [a-z]                                 Lowercase letters
[:upper:]   [A-Z]                                 Uppercase letters
[:punct:]   [-!"#$%&'()*+,./:;<=>?@[\\\]_`{|}~]   Punctuation characters
[:space:]   [ \t\r\n\v\f]                         Whitespace characters
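
In POSIX-style engines such as MySQL's, a class name is written inside a bracket expression, for example `[[:digit:]]+`. Python's `re` module does not understand these names, so the sketch below, which is only an illustration and not part of the program, rewrites a few of them into the ASCII equivalents listed above. The function name `expand_posix_classes` is hypothetical. Whether the live search treats signs such as š as alphabetic is not stated here and is not covered by these ASCII ranges.

```python
import re

# ASCII equivalents copied from the list of classes above (subset only).
POSIX_CLASSES = {
    "[:alnum:]": "A-Za-z0-9",
    "[:alpha:]": "A-Za-z",
    "[:digit:]": "0-9",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
}

def expand_posix_classes(pattern):
    """Replace POSIX class names with explicit ranges that Python's re accepts."""
    for name, ranges in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ranges)
    return pattern

# "[[:digit:]]+" becomes "[0-9]+": a string made of digits only.
expanded = expand_posix_classes("^[[:digit:]]+$")
print(expanded)                                # ^[0-9]+$
print(bool(re.search(expanded, "20")))         # True
print(bool(re.search(expanded, "20+1/2")))     # False
```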

5) Escape sequence.

As shown above, advanced pattern matching [Advanced Search] changes the ordinary value of some special characters. For example, the ligature sign + is interpreted as "one or more of the preceding sign". To match the sign + literally (e.g. ŠE+GÌR or 20+1/2), the character must be escaped, that is, two backslashes must be placed in front of it (e.g. \\+ => ŠE\\+GÌR).
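
The effect of escaping can be reproduced with Python's `re` module, used here as an assumed stand-in for the program's engine. In MySQL one of the two backslashes is consumed when the string literal is parsed and the other reaches the regular-expression library; an ordinary (non-raw) Python string literal behaves the same way, which makes the parallel easy to see.

```python
import re

sign = "ŠE+GÌR"

unescaped = "ŠE+GÌR"    # "+" is a quantifier here: "Š", one or more "E", then "GÌR"
escaped = "ŠE\\+GÌR"    # ordinary string literal; the engine receives ŠE\+GÌR

unescaped_hits = bool(re.search(unescaped, sign))
escaped_hits = bool(re.search(escaped, sign))
print(unescaped_hits)   # False: the literal "+" in the sign is never matched
print(escaped_hits)     # True

# re.escape adds the backslashes automatically for a literal search.
fraction_hits = bool(re.search(re.escape("20+1/2"), "20+1/2"))
print(fraction_hits)    # True
```

Without the escape, the pattern would instead match strings such as "ŠEGÌR" or "ŠEEGÌR", which is rarely what is intended.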


B.3 Normalized Form*

Sinleqiunnini's search engine has been designed with lemmatisation in mind: the program can group the inflected forms of a word under a single headword. For example, by typing the verb headword into the search box and selecting the "Normalized Form" function, users can look for all the inflected forms of the verb šâmu (a-ša-am, i-ša-am, etc.) or the different series of cuneiform signs in which it may appear (i-ša₁₀-am, i-šaᵪ-am, iš-am, etc.).
The lemmatisation algorithm is still under development and, at this time, it has been fully implemented for legal tablets only.
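
The idea of grouping attested forms under one headword can be pictured with a toy lookup table. This is only an illustration of the concept, not the program's lemmatisation algorithm, which is not described here; the table `FORM_TO_HEADWORD` and the function `forms_for` are hypothetical, and the forms are those quoted above.

```python
# Hypothetical lookup table: attested form -> headword (forms taken from the text).
FORM_TO_HEADWORD = {
    "a-ša-am": "šâmu",
    "i-ša-am": "šâmu",
    "i-ša₁₀-am": "šâmu",
    "i-šaᵪ-am": "šâmu",
    "iš-am": "šâmu",
}

def forms_for(headword):
    """Return every attested form recorded under the given headword."""
    return [form for form, lemma in FORM_TO_HEADWORD.items() if lemma == headword]

headword_forms = forms_for("šâmu")
print(headword_forms)
```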