NaN_payload: storage of tagged objects in the payload bits of NaN IEEE-754 floating point values.

Corlan, Alexandru Dan

A.D. Corlan

June 6, 2014.
Archived by WebCite® at http://www.webcitation.org/

Introduction

Most statistics applications need to store a mixture of floating point and categorical data, if only to mark some values as unavailable and perhaps to record why they are unavailable. This can be done with ordinary tagged types, as in the corlpack package, where 128-bit records are used. Alternatively, one can exploit the fact that IEEE-754 floating point objects, which most modern processors implement, are already tagged types: each one holds either a floating point number, +/- infinity, or a NaN (not-a-number) value with an otherwise unused payload. Thus, one alternative is to store other types in the NaN payload, which offers 51 bits in the 'double' (64-bit) format, as presumably intended by the IEEE-754 designers. The advantages of using the NaN payload are:

NaN payloads are compatible with existing libraries that handle arrays of floating point numbers, as well as with data formats such as netCDF. Even if such libraries need changes, these are likely to be localised, as the general data structures need not change.
Memory/storage/bandwidth is used more efficiently.
They allow for simpler data structures. For example, I can use a vector of floats even if one or two elements need to contain some non-float information. I don't have to design a record structure around the vector just to accommodate these values.

The disadvantages are the limited number of bits, which makes NaN payloads unusable for some applications (for example, I can't store more than about 9 letters from a selected set), and the fact that the behaviour of machines and libraries that process NaN values is poorly documented and may vary, which limits portability.
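To make the bit layout concrete, here is a minimal sketch in Python (not the Ada of the package itself) of how a 51-bit payload can be placed in a quiet NaN and read back. The function and constant names are hypothetical and are not taken from nan_payload_64. The sketch assumes that the platform keeps the payload intact when a double is copied through its byte representation, which is usual for quiet NaNs but, as noted above, not guaranteed everywhere.

```python
import math
import struct

# IEEE-754 binary64: 1 sign bit, 11 exponent bits, 52 fraction bits.
EXP_MASK = 0x7FF0_0000_0000_0000    # all exponent bits set: infinity or NaN
QUIET_BIT = 0x0008_0000_0000_0000   # top fraction bit, set in a quiet NaN
PAYLOAD_BITS = 51                   # the remaining fraction bits
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def float_to_bits(x):
    """Reinterpret the 8 bytes of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(b):
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def make_nan(payload):
    """Build a quiet NaN carrying `payload` (hypothetical helper).

    Payload 0 yields the plain quiet NaN pattern 0x7FF8000000000000,
    so it is best reserved for "ordinary" NaNs.
    """
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    return bits_to_float(EXP_MASK | QUIET_BIT | payload)


def nan_payload(x):
    """Return the 51 payload bits of a NaN (hypothetical helper)."""
    if not math.isnan(x):
        raise ValueError("not a NaN")
    return float_to_bits(x) & PAYLOAD_MASK
```

A value built with make_nan(12345) still satisfies math.isnan, so code that is unaware of the payload simply sees a missing value, while nan_payload recovers 12345 from it.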

nan_payload_64

This is a first, experimental implementation of selectors, predicates and in/out functions for the payload of IEEE-754 64-bit floating point types, written as an Ada package. The 51 bits are treated as a 3-bit tag and a 48-bit data field. Two alternative types are provided so far: symbols of up to 9 characters from a limited alphanumeric set, and calendar dates with millisecond resolution. For more details see the code. Use gnatchop and gnatmake to compile.
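The tag/data split can be illustrated with a short sketch, written in Python rather than Ada and not derived from the package source. Everything specific in it is an assumption made for illustration: the tag values, the 40-character alphabet (chosen because 40**9 < 2**48, so nine characters fit in the data field), the use of trailing spaces as padding, and the 1970 UTC epoch for dates. The actual choices in nan_payload_64 may differ; see its code.

```python
import datetime as dt
import math
import struct

NAN_BASE = 0x7FF8_0000_0000_0000   # exponent all ones plus the quiet bit
DATA_BITS = 48
DATA_MASK = (1 << DATA_BITS) - 1
TAG_SYMBOL = 1                     # hypothetical tag values
TAG_DATE = 2

# Hypothetical alphabet of 40 characters; index 0 (space) doubles as padding.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-."
EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)  # assumed epoch


def pack(tag, data):
    """Combine a 3-bit tag and a 48-bit data field into a quiet NaN."""
    if not 0 <= tag < 8 or not 0 <= data <= DATA_MASK:
        raise ValueError("tag or data out of range")
    bits = NAN_BASE | (tag << DATA_BITS) | data
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def unpack(x):
    """Return (tag, data) from a NaN built by pack()."""
    if not math.isnan(x):
        raise ValueError("not a NaN")
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> DATA_BITS) & 0b111, bits & DATA_MASK


def symbol_to_nan(name):
    """Encode up to 9 characters as a base-40 number in the data field."""
    if len(name) > 9:
        raise ValueError("at most 9 characters")
    data = 0
    for ch in name.ljust(9):
        # str.index raises ValueError for characters outside the alphabet
        data = data * 40 + ALPHABET.index(ch)
    return pack(TAG_SYMBOL, data)


def nan_to_symbol(x):
    tag, data = unpack(x)
    if tag != TAG_SYMBOL:
        raise ValueError("not a symbol")
    chars = []
    for _ in range(9):
        data, digit = divmod(data, 40)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars)).rstrip()


def datetime_to_nan(t):
    """Encode a timezone-aware datetime as whole milliseconds since EPOCH."""
    ms = (t - EPOCH) // dt.timedelta(milliseconds=1)
    return pack(TAG_DATE, ms)   # pack() rejects dates before the epoch


def nan_to_datetime(x):
    tag, data = unpack(x)
    if tag != TAG_DATE:
        raise ValueError("not a date")
    return EPOCH + dt.timedelta(milliseconds=data)
```

With these assumptions, nan_to_symbol(symbol_to_nan("MISSING")) gives back "MISSING", and a datetime with millisecond precision survives the round trip through datetime_to_nan and nan_to_datetime. A 48-bit millisecond counter spans roughly 8,900 years, so the data field is more than wide enough for calendar dates.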

Download nan_payload_64 version 0.1