Danotation: Simple data representation language, primarily for hand-maintained logs.

Corlan, Alexandru Dan

Published: 2012/01/12

Short permanent WebCite link: http://www.webcitation.org/64ewf3UTg

Motivation

Most data representation languages, such as XML, netCDF or CDL, are primarily suited for representing computer-generated datasets. Some programming languages, most notably Fortran, Lisp, Ada and R, support complex syntactic structures for data initialisers. However, these are confined to the universe of those languages.

A general language for data entry by hand does not seem to be available.

Purpose

Danotation aims to be a universal user language for specifying data by hand (with a text editor), especially for log-like information such as records of expenses, personal medical events, lecture notes about some text, comments on files or arbitrary items and the like.

Danotation is meant to be usable by people who are not computer professionals. To this end it has a minimal and plain syntax. However, some syntactic provision is made for avoiding counterproductive repetition.

Syntax

There is a basic syntax that is the simplest form of data representation. Simplicity here means structural simplicity, that is, the fewest possible rules.

The basic syntax is frozen, in the sense that it should never have further versions.

There is also a macro syntax that includes a number of mechanisms for reducing verbosity and adapting to various fashions of data representation.

Finally, there is a dump syntax, an ASCII table form that is even easier to load into programs (such as the R statistical system) than the basic syntax, without needing any API or library.

Conversion from the macro syntax to the basic and dump formats, and from basic to dump, as well as syntax checking, are performed by the danotation utility (a unix command).

Basic (standard form) syntax

1. A danotation file is a sequence of records, each ending with the symbol '_'.

2. Each record consists of a sequence of pairs, each made of a symbol (named an attribute) and a value. Symbols are alphanumerical sequences that must start with a letter and may also contain any number of '_' characters.

3. Values are any strings of graphic characters. If a value starts with a double quote it must also end with a double quote and it may then contain any characters, including double quotes that are escaped with a backslash.

4. Symbols and values are separated by white space (spaces, tabs, line breaks).

This is all. Please see an example below.
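The four rules above are small enough to sketch as a reader in a few lines of Python. This is only an illustration, not the danotation utility itself: the names `parse_basic`, `TOKEN` and `SYMBOL` are invented here, quoted values are kept verbatim (quotes and backslashes included), and a lone '_' in value position is treated as a value, which is an assumption.

```python
import re

# A token is either a double-quoted string (backslash escapes allowed)
# or a run of non-whitespace characters (rules 3 and 4).
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\S+', re.S)
# A symbol starts with a letter and may contain letters, digits and '_' (rule 2).
SYMBOL = re.compile(r'[A-Za-z][A-Za-z0-9_]*')


def parse_basic(text):
    """Return a list of records; each record is a list of (attribute, value)."""
    records, current, pending = [], [], None
    for token in TOKEN.findall(text):
        if pending is None:
            if token == '_':                  # end of record (rule 1)
                records.append(current)
                current = []
            elif SYMBOL.fullmatch(token):     # attribute name
                pending = token
            else:
                raise ValueError(f'expected an attribute name, got {token!r}')
        else:
            current.append((pending, token))  # value, kept verbatim
            pending = None
    if pending is not None or current:
        raise ValueError('input ends inside a record')
    return records


print(parse_basic('defval neuro val 0 semn "excelent" _'))
```

Keeping each record as an ordered list of pairs, rather than a dictionary, preserves repeated attributes and their order.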

Interpretation of records

In a record the same attribute may be repeated any number of times. The first attribute in a record is also named its predicate. The corresponding (first) value is also named its subject.

The order in which attributes appear is usually significant. For example, consider the record:

person "Alexandru Corlan"
phone +9021-343-3333
at    home
phone +9022-334-4444
at    job
comment "the above are imaginary phone numbers"
_

The first 'at' refers to the first 'phone' and the second 'at' to the second 'phone'. This significance, however, results from the behaviour of the processing programs; danotation itself only preserves the order.
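As an illustration of how a processing program might give meaning to this order, the following Python sketch attaches each 'at' to the closest preceding 'phone'. The function `group_qualifiers` and the representation of a record as a list of pairs are assumptions made for this example; they are not part of danotation.

```python
def group_qualifiers(record, main, qualifier):
    """Attach each `qualifier` value to the closest preceding `main` value."""
    groups = []
    for attribute, value in record:
        if attribute == main:
            groups.append([value, None])
        elif attribute == qualifier and groups and groups[-1][1] is None:
            groups[-1][1] = value
    return [tuple(group) for group in groups]


record = [
    ('person', '"Alexandru Corlan"'),
    ('phone', '+9021-343-3333'),
    ('at', 'home'),
    ('phone', '+9022-334-4444'),
    ('at', 'job'),
    ('comment', '"the above are imaginary phone numbers"'),
]
print(group_qualifiers(record, 'phone', 'at'))
# prints [('+9021-343-3333', 'home'), ('+9022-334-4444', 'job')]
```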

The dump syntax

The dump syntax is a table of space separated columns, like this:

 1 1 PREDICATE defval
 1 2 SUBJECT neuro
 1 3 val 0
 1 4 semn "excelent"
 2 1 PREDICATE defval
 2 2 SUBJECT neuro
 2 3 val 1
 2 4 semn "usor ametit/somnolent/etc"

The first column is the record number. The second is the attribute number inside the record. The third is the attribute name (two new explicit attributes are introduced, PREDICATE and SUBJECT). The last is the value. This format is easy to process with tools like awk, sed or R, or to load into spreadsheets. The above example corresponds to:

defval neuro val 0 semn "excelent" _
defval neuro val 1 semn "usor ametit/somnolent/etc" _
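The correspondence between the two forms can be sketched in Python. The functions `to_dump` and `read_dump` are hypothetical names used only here; records are assumed to be lists of (attribute, value) pairs, and the leading padding shown in the dump example above is omitted.

```python
def to_dump(records):
    """Turn records (lists of (attribute, value) pairs) into dump lines."""
    lines = []
    for rec_no, record in enumerate(records, start=1):
        for pair_no, (attribute, value) in enumerate(record, start=1):
            if pair_no == 1:
                # The first pair is split into two explicit attributes.
                lines.append(f'{rec_no} 1 PREDICATE {attribute}')
                lines.append(f'{rec_no} 2 SUBJECT {value}')
            else:
                lines.append(f'{rec_no} {pair_no + 1} {attribute} {value}')
    return lines


def read_dump(lines):
    """Split each dump line into (record number, attribute number, name, value)."""
    rows = []
    for line in lines:
        rec_no, attr_no, name, value = line.strip().split(None, 3)
        rows.append((int(rec_no), int(attr_no), name, value))
    return rows


records = [
    [('defval', 'neuro'), ('val', '0'), ('semn', '"excelent"')],
    [('defval', 'neuro'), ('val', '1'), ('semn', '"usor ametit/somnolent/etc"')],
]
for line in to_dump(records):
    print(line)
```

Splitting on at most three separators keeps a quoted value containing spaces in one piece.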

Macro syntax (also called the language form)

In the language form, since version 0.7, comments may be included. A comment starts with a semicolon (';') that follows whitespace or stands at the beginning of a line (that is, not inside or at the end of a non-whitespace sequence) and continues to the end of the line. The following statements modify the expected sequence of records.
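Before turning to those statements, the comment rule can be sketched in Python with one regular expression. This is an approximation under a stated assumption: the sketch does not look inside double-quoted values, so a semicolon preceded by a space inside a quoted string would also be treated as a comment start, and the real utility may behave differently. The name `strip_comments` is invented for this example.

```python
import re

# A ';' that is not preceded by a non-whitespace character starts a comment,
# which runs to the end of the line ('.' does not match a newline).
COMMENT = re.compile(r'(?<!\S);.*')


def strip_comments(text):
    """Remove comments; assumption: quoted values are not special-cased."""
    return COMMENT.sub('', text)


source = '; expenses for this week\nexpense coffee amount 3 ; too much\nnote a;b _'
print(strip_comments(source))
```

The `a;b` on the last line is left alone because the semicolon sits inside a non-whitespace sequence.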

with

It must be placed outside a record. It must be followed by an attribute name and a value. All records until a matching 'forget' or a new 'with' with the same attribute name will have the attribute and the value inserted immediately after the subject. If more than one 'with' is active, the order of insertion is the order in which the active 'with' statements occurred.

The name of an attribute may be PREDICATE or SUBJECT. This is to be used in conjunction with tables (see below).

forget

It must be placed outside a record. It must be followed by an attribute name matching that of an active with. Following forget, the with is no longer active.
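The effect of 'with' and 'forget' can be sketched in Python on a flat list of tokens. The function `expand_with` is hypothetical and makes several simplifying assumptions: tokens are already split on whitespace, no value is a lone '_', the special names PREDICATE and SUBJECT are not handled, and a new 'with' for an already active attribute is moved to the end of the insertion order.

```python
def expand_with(tokens):
    """Expand 'with' and 'forget'; return records as lists of pairs."""
    active = {}   # attribute -> value; a dict keeps insertion order
    records, i = [], 0
    while i < len(tokens):
        token = tokens[i]
        if token == 'with':
            attribute, value = tokens[i + 1], tokens[i + 2]
            active.pop(attribute, None)   # assumption: the newer 'with' replaces the older
            active[attribute] = value
            i += 3
        elif token == 'forget':
            del active[tokens[i + 1]]     # must match an active 'with'
            i += 2
        else:
            end = tokens.index('_', i)    # the record runs up to its closing '_'
            body = tokens[i:end]
            pairs = list(zip(body[0::2], body[1::2]))
            # Insert the active pairs right after the predicate/subject pair.
            records.append(pairs[:1] + list(active.items()) + pairs[1:])
            i = end + 1
    return records


text = ('with year 2012 with place home '
        'expense coffee amount 3 _ '
        'forget year '
        'expense tea amount 2 _')
for record in expand_with(text.split()):
    print(record)
```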

end_data

It must be placed outside records. Anything in the file following this symbol is ignored by the preprocessor.

table_head

It must appear outside records. It must be followed by any number of symbols and then by '_'. It declares the attribute names for any table that follows, a table being a set of records entered one record per table line.

It is active until the next table_head or the end of the file.

table_data, end_table

It must appear outside records. Sequences of values, each ending in the '_' symbol, must follow. Each sequence must have exactly as many values as there are symbols in the last table_head statement. Each sequence is transformed into a record.
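A possible reading of the table statements is sketched below in Python. It rests on assumptions that the text above does not confirm: that 'table_data' opens a run of rows and 'end_table' closes it, that tokens are already split on whitespace, and that no value is a lone '_'. The function `expand_tables` is a name invented here, and it handles only the table statements, not ordinary records.

```python
def expand_tables(tokens):
    """Turn table_head / table_data ... end_table into records (lists of pairs)."""
    head, records, i = [], [], 0
    while i < len(tokens):
        token = tokens[i]
        if token == 'table_head':
            end = tokens.index('_', i)
            head = tokens[i + 1:end]          # attribute names for later rows
            i = end + 1
        elif token == 'table_data':
            i += 1
            # Assumption: rows continue until 'end_table'.
            while i < len(tokens) and tokens[i] != 'end_table':
                end = tokens.index('_', i)
                row = tokens[i:end]
                if len(row) != len(head):
                    raise ValueError(f'row {row} does not match head {head}')
                records.append(list(zip(head, row)))
                i = end + 1
            i += 1                            # skip 'end_table'
        else:
            raise ValueError(f'this sketch only handles tables, got {token!r}')
    return records


text = ('table_head expense amount day _ '
        'table_data '
        'coffee 3 mon _ '
        'tea 2 tue _ '
        'end_table')
for record in expand_tables(text.split()):
    print(record)
```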

Danotation v0.7, April 4, 2013

This version adds comments to the macro syntax (language form).

Download from here:

Danotation V0.7 source

Danotation V0.7 binary

Installation and use are unchanged from version 0.6 (except that you need to use '7' instead of '6').

Danotation v0.6, May 22, 2012

Download

Danotation V0.6 source

Danotation V0.6 binary

Installation

from source

tar xvzf danotation_0.6.tar.gz
cd danotation_0.6
make
sudo make install

from binary

tar xvzf danotation_0.6_bin.tar.gz
cd danotation_0.6_bin
sudo make install

Uninstall

cd danotation_0.6
sudo make uninstall

Download

Danotation V0.5, January 12, 2012

Installation

You need to have GNAT, the Ada compiler, installed. Then untar danotation in a directory and run make. Copy the danotation binary into a directory that is in your PATH, such as /usr/local/bin.

Usage

man danotation
man 5 danotation
danotation -i|--interpret stdout

danotation [-d|--dump] [-h|--header] {-p|--predicate }*
    {-s|--subject }* {-eq|--equals  }* * stdout

The first form (interpret) reads the macro syntax on the standard input and produces the basic syntax on the standard output.

The second form reads the basic syntax on the standard input and produces either the dump format or a table with attribute values, one record per line, on the standard output. In the latter case, it can be easily combined with xargs and printf to produce reports in a variety of formats.

A selection of the records to dump or tabulate may be made by specifying predicates, subjects and attribute values. Repeated specifications of the same type are combined with 'or', while specifications of different kinds are combined with 'and'.
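The 'or' within a kind and 'and' across kinds rule can be sketched in Python. This is not the utility's code: the function `select`, its parameter names, and the representation of an attribute-value specification as an (attribute, value) pair are assumptions made for illustration.

```python
def select(records, predicates=(), subjects=(), equals=()):
    """Keep records matching every kind of criterion that was given.

    Within one kind any match is enough ('or'); the kinds are combined
    with 'and'. `equals` holds (attribute, value) pairs (an assumed shape).
    """
    def keep(record):
        predicate, subject = record[0]
        if predicates and predicate not in predicates:
            return False
        if subjects and subject not in subjects:
            return False
        if equals and not any(pair in record for pair in equals):
            return False
        return True

    return [record for record in records if keep(record)]


records = [
    [('defval', 'neuro'), ('val', '0'), ('semn', '"excelent"')],
    [('defval', 'neuro'), ('val', '1'), ('semn', '"usor ametit/somnolent/etc"')],
    [('person', '"Alexandru Corlan"'), ('phone', '+9021-343-3333')],
]
print(select(records, predicates=['defval'], equals=[('val', '1')]))
```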