CORLPACK: R-like objects, Extended UUID (extuuid) support, simplified API for Ada-2005

A.D. Corlan

October 2, 2012. Last changed: April 24, 2013.
Archived by WebCite® at http://www.webcitation.org/6DoDWzbLf

Corlpack (Common Objects of the R Language PACKage) is an Ada package with a collection of data types and utility functions for programmers of computational applications. It is currently at version 0.5, an unstable alpha release. The general roadmap for version 1.0 is given below.

The Wonderful World of CORLPACK 1.0

The primary purpose of Corlpack is to simplify computational application programming by reducing the number of APIs that need to be learned, while still maintaining reasonable implementation efficiency.

Application programmers would only need to learn a small set of generic functions and data types. Corlpack aims to separate the systems side of an application (for example: access to datafiles of a variety of formats) from the application itself.

We try to do this by introducing two data types: the ExtUUID, a generic 128-bit object that can represent a broad variety of usual application-level data types, such as identifiers, coordinates or quantities with units, and the table of ExtUUIDs, a generic container.

The choice of fixed-size UUIDs and tables is motivated by the need for reasonable efficiency, which is frequently essential for computational applications.

The Corlpack table

The table is a two-dimensional array the elements of which may be addressed either by numeric indices or by key values, such as names. Each cell of the array is either an extUUID or a pointer to another structure that may be a large string, a table or something else.

Named vectors, lists, matrices, data frames and structures from the R language [2] can easily be represented as Corlpack tables by enforcing suitable restrictions on the content of cells, on the dimensions, or on the content and number of the keys. The keys are restricted to single symbols for the dimensions that have names.

Corlpack tables are a generalisation of R named lists, in that they may be two-dimensional. They are also a generalisation of Common Lisp arrays of objects of arbitrary type, in that they may be indexed with names as well as numbers. Tables also aim to be much faster, since a cell can be addressed by calculating an offset, while still allowing heterogeneous content.

A small number of operators will allow access and modification of tables and the vectors of ExtUUIDs that host them, mostly by using the plain array indexing in Ada and an offset computation function.
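The addressing scheme described above can be sketched in a few lines of Ada. This is only an illustration under stated assumptions, not Corlpack's implementation: the cell type is a plain Integer standing in for an ExtUUID, the keys are fixed four-character names, and the names Store, Offset and Col_Index are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Table_Sketch is

   --  Placeholder cell type; a real cell would be a 128-bit ExtUUID.
   subtype Cell is Integer;

   Rows : constant := 2;
   Cols : constant := 3;

   --  The table body is a flat vector; a cell is found by offset arithmetic.
   Store : array (1 .. Rows * Cols) of Cell := (others => 0);

   --  One fixed-width key (name) per column.
   subtype Key is String (1 .. 4);
   Col_Keys : constant array (1 .. Cols) of Key := ("mass", "time", "temp");

   --  Row-major offset of cell (Row, Col) in Store.
   function Offset (Row, Col : Positive) return Positive is
   begin
      return (Row - 1) * Cols + Col;
   end Offset;

   --  Translate a column name into its numeric index (linear search).
   function Col_Index (Name : Key) return Positive is
   begin
      for C in Col_Keys'Range loop
         if Col_Keys (C) = Name then
            return C;
         end if;
      end loop;
      raise Constraint_Error with "unknown key " & Name;
   end Col_Index;

begin
   Store (Offset (2, 3)) := 42;                   --  numeric addressing
   Store (Offset (1, Col_Index ("time"))) := 7;   --  addressing by key

   --  The same cell can be reached by name or by number.
   Put_Line (Cell'Image (Store (Offset (2, Col_Index ("temp")))));
   Put_Line (Cell'Image (Store (2)));
end Table_Sketch;
```

Because every cell has the same size, a lookup by key costs one name search plus one multiplication and addition, and the heterogeneous content lives entirely in how each cell is interpreted.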

The VUUID

Collections of data structures, for example one master table and all the other tables, strings and other structures that are pointed to by the master table, are stored as Ada variable size arrays of ExtUUIDs, named VUUIDs. They are similar to fixed Strings from the standard Ada library, except they are made of ExtUUIDs rather than Characters.

Typically, a VUUID is loaded from a data file of a specified format (such as an image, a movie, a CSV/XLS table, a NetCDF file, a FITS record, a file system directory structure, etc.) with a simple Load invocation and may be saved in a data file of any suitable format with a Save. Some of the formats are specific to some varieties of structure (for example, images); others are generic and suit any possible VUUID.

A registering mechanism is provided for the `systems' programmer to add new Load and Save methods.

VUUIDs may also be used without structuring elements, as simple sequences of ExtUUIDs, for example to represent short expressions and formulas or just heterogeneous vectors of data.
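The analogy with fixed Strings can be made concrete with a small sketch. It assumes a hypothetical 128-bit element made of two 64-bit words; the type and field names (Ext_UUID, Hi, Lo) are illustrative and are not taken from the Corlpack sources.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Vuuid_Sketch is

   --  Hypothetical 128-bit element, stored as two 64-bit words.
   type Ext_UUID is record
      Hi, Lo : Unsigned_64 := 0;
   end record;
   for Ext_UUID'Size use 128;

   --  Unconstrained array of elements, as String is of Character.
   type VUUID is array (Positive range <>) of Ext_UUID;
   for VUUID'Component_Size use 128;

   --  Bounds are fixed when an object is declared, as for a fixed String.
   V : VUUID (1 .. 4);

begin
   for I in V'Range loop
      V (I).Lo := Unsigned_64 (I) * 10;
   end loop;

   declare
      --  Slices and "&" work as they do for String.
      W : constant VUUID := V (3 .. 4) & V (1 .. 1);
   begin
      Put_Line (Integer'Image (W'Length));
      Put_Line (Unsigned_64'Image (W (W'First).Lo));
      Put_Line (Unsigned_64'Image (W (W'Last).Lo));
   end;
end Vuuid_Sketch;
```

Since the element size is fixed, slicing, concatenation and indexing need no dynamic allocation beyond what Ada already does for arrays of this kind.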

The Extended UUID (ExtUUID)

The extended UUID is the generic `scalar' type of Corlpack 1.0. It always contains exactly 128 bits, so that fast, addressable aggregates such as vectors and tables are feasible without dynamic allocation. The ExtUUIDs are designed to fit in the 128-bit scheme of RFC 4122 [1].

Each ExtUUID consists of a type tag, 13 to 28 bits long, and data. The format of the data bits is established by the value of the type tag. The data bits may contain subtypes that further determine the semantics of the remaining bits.
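A tag-plus-data layout of this sort can be expressed in Ada with a record representation clause. The sketch below is not the Corlpack layout: it assumes a fixed 16-bit tag for simplicity (the real tag has a variable length), and the tag values and the names Kind_Tag, Data_High and Describe are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Extuuid_Sketch is

   --  Assumption: a fixed 16-bit tag, chosen only to keep the layout simple.
   type Kind_Tag is mod 2**16;
   type Data_High is mod 2**48;

   type Ext_UUID is record
      Tag  : Kind_Tag;      --  selects how the data bits are read
      High : Data_High;     --  upper 48 data bits
      Low  : Unsigned_64;   --  lower 64 data bits
   end record;

   --  Pin every field to a bit range so the record is exactly 128 bits.
   for Ext_UUID use record
      Tag  at 0 range  0 .. 15;
      High at 0 range 16 .. 63;
      Low  at 8 range  0 .. 63;
   end record;
   for Ext_UUID'Size use 128;

   --  Hypothetical tag values, not Corlpack's.
   Tag_Integer : constant Kind_Tag := 1;
   Tag_Serial  : constant Kind_Tag := 2;

   --  The tag decides how the data bits are interpreted.
   function Describe (U : Ext_UUID) return String is
   begin
      case U.Tag is
         when Tag_Integer => return "integer" & Unsigned_64'Image (U.Low);
         when Tag_Serial  => return "serial number" & Unsigned_64'Image (U.Low);
         when others      => return "unknown kind";
      end case;
   end Describe;

   U : constant Ext_UUID := (Tag => Tag_Integer, High => 0, Low => 42);

begin
   Put_Line (Integer'Image (Ext_UUID'Size));
   Put_Line (Describe (U));
end Extuuid_Sketch;
```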

From the application programmer's point of view, there is a flat list of user types (named KINDs) of ExtUUIDs. There will be generic operators for creating (Make_UUID), changing (Set), accessing components of (Get), operating with ("+", "-", "/", "*", "<", ">", equality, etc.) and converting between ExtUUIDs of different kinds. There are also two generic operations, Read and Format, that convert from and to human-readable string representations of ExtUUIDs.

Get and Set deal with UUIDs uniformly, as sets of vectors of integer, float, character and time objects.

For the generic operations, relatively fast registering and dispatching mechanisms are provided for specific methods, so both the list of types and their operations may be extended.
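One simple way to obtain fast registration and dispatch is a table of subprogram accesses indexed by kind. The sketch below illustrates that general technique only; it is not claimed to be the mechanism Corlpack uses, and the names Kind, Formatter, Register and Format_Count, as well as the Integer payload, are assumptions made for the example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Dispatch_Sketch is

   --  Hypothetical small space of kinds.
   type Kind is range 0 .. 15;

   --  A method that renders the data of one kind as text.
   type Formatter is access function (Data : Integer) return String;

   --  Dispatch table: one slot per kind, empty until registered.
   Formatters : array (Kind) of Formatter := (others => null);

   procedure Register (K : Kind; F : Formatter) is
   begin
      Formatters (K) := F;
   end Register;

   --  Generic operation: look up the method for the kind and call it.
   function Format (K : Kind; Data : Integer) return String is
   begin
      if Formatters (K) = null then
         return "<no method>";
      end if;
      return Formatters (K).all (Data);
   end Format;

   --  A specific method supplied by an extension.
   function Format_Count (Data : Integer) return String is
   begin
      return "count" & Integer'Image (Data);
   end Format_Count;

begin
   Register (3, Format_Count'Access);
   Put_Line (Format (3, 5));
   Put_Line (Format (4, 5));
end Dispatch_Sketch;
```

Dispatch is a single array lookup, and new kinds or new operations are added by registering further subprograms rather than by editing the generic operation.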

There is also a set of system ExtUUIDs named Konectors that provide structural representation inside a plain array of ExtUUIDs (a VUUID), for example strings, vectors or tables.

The KINDs that should be supported in version 1.0 for UUIDs are listed below. Where an existing release already supports a kind, its version is given in parentheses.

  1. the "Not Available" (missing) value (0.5)
  2. alphanumeric symbols (0.4.2)
  3. time: a specific instant in physical real time (0.4.2)
  4. quant: a floating point number together with SI unit (0.4.2)
  5. fixed: a fixed point number with a unit such as a currency (0.4.2)
  6. integer, floating point, fraction or complex numbers
  7. spherical coordinates, unbound or bound to spheres of interest (earth, planets, sky as seen from earth)
  8. V1-V5 UUID as specified by RFC 4122 (0.4.2)
  9. serial numbers: ISBN, ISSN, EAN, PMID, UNII, LOINC, MAC address, etc (0.4.2)
  10. runs and compilations
  11. subcomponents of a document
  12. formats for formatted output
  13. data formats (of files)
  14. konnectors (pointers, strings, arrays and lists) for structuring a VUUID. (0.4.2)

Other types

For user convenience, other types are provided: Int (64-bit integer), Real (80-bit float), Text (unbounded string) and Time (Ada calendar time), with some of the same generic functions that are also available for UUIDs. Also, thin interfaces to libraries such as Ada.Text_IO, Ada.Calendar, Ada.Exceptions, elementary numerical functions and others are available, reducing the number of 'with' and 'use' statements, as well as generic instantiations, that the user needs to make.

The Hq

The Hq data structure was a previous attempt to design and implement an efficient and versatile object like the table, but it proved to be too complex. The first such attempt was using niliada conses, but they proved too slow for large objects because of the necessity of garbage collection. Hopefully, the VUUID-based tables will not have any of these drawbacks.

While the Hq is relatively complicated to use, and its development is currently frozen, it is still kept for potential use in the future, for very large (of the order of gigabytes) structures that would require even faster and more memory efficient algorithms.

How you can help

Currently, Corlpack development mostly needs:
  1. testing;
  2. development of new ExtUUID user types. There are examples for the ISSN, ISBN, LOINC, PMID and others in the sources. You could write one for any URN or similar encoding scheme that fits easily in about 100 bits.

    Examples are: ResearcherID, ORCID, SICI, ISAN, IETF URNs, URN:LEX, astronomical object catalogs (GSC, USNO, NGC, etc), things like MAC addresses, IP addresses (IPv6 doesn't fit in an obvious way, but an IPv6 network does);

  3. writing data load/save plugins for a variety of data formats such as PNG or FITS.

Any other contribution is also welcome.

Download and Install

Version 0.5.2

This version features an executable tool, named `corl', that provides an interface between shells and some Corlpack functions. It also includes minor improvements and cleanups, such as removal of compilation warnings and better error reporting when loading VUUIDs (line numbers of errors are reported).

corlpack_0.5.2.ada released April 24, 2013.

Version 0.5.1

We added arithmetic operators for quant ExtUUIDs (numbers with units of measurement) and between quants, integers and floats. There is also a new mechanism to add unit names and definitions that are recognised by the reader. For example, you can write: Len: UUID:= 4*foot + 2.5*inch
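An expression of this form relies on Ada operator overloading. The following sketch shows how mixed integer, float and quantity arithmetic can resolve; it is a simplified stand-in, not Corlpack's quant type. It assumes a hypothetical record Quant that tracks only the exponent of the metre, with Foot and Inch defined in metres.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Quant_Sketch is

   --  Hypothetical quantity: a magnitude in SI base units plus the
   --  exponent of the metre (1 = length, 2 = area).
   type Quant is record
      Value : Long_Float;
      Metre : Integer;
   end record;

   Foot : constant Quant := (Value => 0.3048, Metre => 1);
   Inch : constant Quant := (Value => 0.0254, Metre => 1);

   --  Integer literal times a quantity.
   function "*" (N : Integer; Q : Quant) return Quant is
   begin
      return (Value => Long_Float (N) * Q.Value, Metre => Q.Metre);
   end "*";

   --  Real literal times a quantity.
   function "*" (X : Long_Float; Q : Quant) return Quant is
   begin
      return (Value => X * Q.Value, Metre => Q.Metre);
   end "*";

   --  Addition is only defined for quantities of the same dimension.
   function "+" (A, B : Quant) return Quant is
   begin
      if A.Metre /= B.Metre then
         raise Constraint_Error with "incompatible units";
      end if;
      return (Value => A.Value + B.Value, Metre => A.Metre);
   end "+";

   Len : constant Quant := 4 * Foot + 2.5 * Inch;

begin
   --  4 ft + 2.5 in = 1.2192 m + 0.0635 m = 1.2827 m
   if abs (Len.Value - 1.2827) < 1.0E-9 and then Len.Metre = 1 then
      Put_Line ("length is 1.2827 m");
   else
      Put_Line ("unexpected result");
   end if;
end Quant_Sketch;
```

The integer literal selects the first "*" and the real literal the second, so the expression needs no explicit conversions.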

corlpack_0.5.1.ada released January 24, 2013.

Version 0.5

First implementation of the Corlpack tables, which are a generalisation of all R data types. Workaround for the problem of random bits in gaps of packed records and arrays preventing straightforward equality testing of some UUIDs. Simplified error reporting. Load/save drivers for key/value files.

corlpack_0.5.ada released January 15, 2013.

Version 0.4

corlpack_0.4.2.ada released January 8, 2013. Adds matrices, pointers and lists as structuring elements in vectors of UUIDs. Plugin mechanism for loaders and savers of files of various structures into/from ExtUUID vectors.

corlpack_0.4.1.ada released January 1, 2013. Includes Extended UUID support for PubMed ID (pmid), Unique Ingredient Identifier (unii), Logical Observation Identifiers Names and Codes (loinc) and webcitation URIs (wbct).

corlpack_0.4.ada released December 27, 2012.

For installation and usage, see below.

Version 0.3

Implementation of an extended scheme of 128-bit UUIDs (EXTUUID), including a preliminary implementation of I/O for the first five varieties of UUIDs.

corlpack_0.3.ada released October 18, 2012.

To install, change directory into your Ada input path and say:

gnatchop corlpack_0.3.ada

To use, precede your application with:

with Corlpack; use Corlpack;
See the .ads file for details.

Version 0.2

Early, unstable, alpha version.

corlpack_0.2.ada released October 4, 2012.

To install, change directory into your Ada input path and say:

gnatchop corlpack_0.2.ada

To use, precede your application with:

with Corlpack; use Corlpack;
See the .ads file for details.

General introduction

Corlpack currently contains:

Data types

Operators overview

The separate procedures Hqtest, Corlidtest and Testuuid contain regression tests.

References

[1] RFC 4122 A Universally Unique IDentifier (UUID) URN Namespace. P. Leach, M. Mealling, R. Salz.

[2] R Language Definition. www.r-project.org