Overview

YottaDB is a multi-language NoSQL database. The daemonless database engine resides in the address space of the process, and can be accessed from any supported language. Functions in the supported languages can call one another to the extent that such calling is permitted by the Supported language implementations.

As C is the lingua franca of programming, the C API provides access to the YottaDB engine from any language. As YottaDB adds standard APIs for other languages, additional sections will be added to the Programmers Guide.

Quick Start

Local Installation

  1. Install YottaDB.
  • Create a temporary directory and change to it, e.g.: mkdir /tmp/tmp ; cd /tmp/tmp
  • Get the YottaDB installer: wget https://gitlab.com/YottaDB/DB/YDB/raw/master/sr_unix/ydbinstall.sh
  • Make it executable: chmod +x ydbinstall.sh
  • Run it with your choice of directory where you want it installed (omit the --verbose option for less output): sudo ./ydbinstall.sh --utf8 default --verbose. If you do not specify an installation directory with --installdir, the script installs YottaDB in /usr/local/lib/yottadb/r### where r### is the release, e.g., r122.
  1. Set up the environment: source $(pkg-config --variable=prefix yottadb)/ydb_env_set. This defaults to an environment in $HOME/.yottadb; to use another directory, set the environment variable ydb_dir to the desired directory.
  2. Put your C program in the $ydb_dir directory, #include the file libyottadb.h in your C program and compile it. As a sample program, you can download the wordfreq.c program, with a reference input file and corresponding reference output file and compile it with gcc $(pkg-config --libs --cflags yottadb) -o wordfreq wordfreq.c -lyottadb.
  3. Run your program and verify that the output matches the reference output. For example:
$ cd $ydb_dir
$ gcc $(pkg-config --libs --cflags yottadb) -o wordfreq wordfreq.c -lyottadb
$ ./wordfreq <wordfreq_input.txt >wordfreq_output.tmp
$ diff wordfreq_output.tmp wordfreq_output.txt
$

Note that the wordfreq.c program randomly uses local or global variables (see Local and Global Variables).

Docker Container

You must have at least Docker 17.05 installed. Pre-built images are available at Docker Hub. The Docker image is built to provide sane defaults to begin exploring YottaDB. It is not meant for production usage.

To run a pre-built image: docker run --rm -it yottadb/yottadb to run the image but not persist any changes you make, and docker run -it yottadb/yottadb for persistent changes.

Volumes are supported by mounting the /data directory. To mount the local directory ydb-data into the container to save your database and routines locally and use them in the container, add an appropriate command line parameter before the yottadb/yottadb argument, e.g., docker run -it -v \`pwd\`/ydb-data:/data yottadb/yottadb

This creates a ydb-data directory in your current working directory. After the container is shutdown and removed, delete the directory if you want to remove all data created in the YottaDB container (such as your database and routines).

Concepts

Keys, Values, Nodes, Variables, and Subscripts

The fundamental core data structure provided by YottaDB is key-value tuples. For example, the following is a set of key value tuples:

["Capital","Belgium","Brussels"]
["Capital","Thailand","Bangkok"]
["Capital","USA","Washington, DC"]

Note that data in YottaDB is always ordered. [1] Even if you input data out of order, YottaDB always stores them in order. In the discussion below, data is therefore always shown in order. For example, the data below may well have been loaded by country.

[1]The terms “collate”, “order”, and “sort” are equivalent.

Each of the above tuples is called a node. In an n-tuple, the first n-1 items can be thought of as the keys, and the last item is the value associated with the keys.

While YottaDB itself assigns no meaning to the data in each node, by convention, application maintainability is improved by using meaningful keys, for example:

["Capital","Belgium","Brussels"]
["Capital","Thailand","Bangkok"]
["Capital","USA","Washington, DC"]
["Population","Belgium",1367000]
["Population","Thailand",8414000]
["Population","USA",325737000]

As YottaDB assigns no inherent meaning to the keys or values, its key value structure lends itself to implementing Variety. [2] For example, if an application wishes to add historical census results under “Population”, the following is a perfectly valid set of tuples (source: United States Census):

["Capital","Belgium","Brussels"]
["Capital","Thailand","Bangkok"]
["Capital","USA","Washington, DC"]
["Population","Belgium",1367000]
["Population","Thailand",8414000]
["Population","USA",325737000]
["Population","USA",17900802,3929326]
["Population","USA",18000804,5308483]
…
["Population","USA",20100401,308745538]

In the above, 17900802 represents August 2, 1790, and an application would determine from the number of keys whether a node represents the current population or historical census data.

[2]Variety is one of the three “V”s of “big data” — Velocity, Volume, and Variety. YottaDB handles all three very well.

In YottaDB, the first key is called a variable, and the remaining keys are called subscripts allowing for a representation both compact and familiar to a programmer, e.g., Capital("Belgium")="Brussels". The set of all nodes under a variable is called a tree (so in the example, there are two trees, one under Capital and the other under Population). The set of all nodes under a variable and a leading set of its subscripts is called a subtree (e.g., Population("USA") is a subtree of the Population tree). [3]

[3]Of course, the ability to represent the data this way does not in any way detract from the ability to represent the same data another way with which you are comfortable, such as XML or JSON. However, note while any data that can be represented in JSON can be stored in a YottaDB tree not all trees that YottaDB is capable of storing can be represented in JSON, or at least, may require some encoding (for example, see JSON-M) - in order to be represented in JSON.

With this representation, the Population tree can be represented as follows:

Population("Belgium")=1367000
Population("Thailand")=8414000
Population("USA")=325737000
Population("USA",17900802)=3929326
Population("USA",18000804)=5308483
…
Population("USA",20100401)=308745538

YottaDB has functions for applications to traverse trees in both breadth-first and depth-first order.

If the application designers now wish to enhance the application to add historical dates for capitals, the Capital("Thailand") subtree might look like this (source: The Four Capitals of Thailand).

Capital("Thailand")="Bangkok"
Capital("Thailand",1238,1378)="Sukhothai"
Capital("Thailand",1350,1767)="Ayutthaya"
Capital("Thailand",1767,1782)="Thonburi"
Capital("Thailand",1782)="Bangkok"

Variables vs. Subscripts vs. Values

When viewed as ["Capital","Belgium","Brussels"] each component is a string, and in an abstract sense they are all conceptually the same. When viewed as Capital("Belgium")="Brussels" differences become apparent:

  • Variables are ASCII strings from 1 to 31 characters, the first of which is “%”, or a letter from “A” through “Z” and “a” through “z”. Subsequent characters are alphanumeric (“A” through “Z”, “a” through “z”, and “0” through “9”). Variable names are case-sensitive, and variables of a given type are always in ASCII order (i.e., “Capital” always precedes “Population”).
  • Subscripts are sequences of bytes from 0 bytes (the null or empty string, “”) to 1048576 bytes (1MiB). When a subscript is a canonical number, YottaDB internally converts it to, and stores it as, a number. When ordering subscripts:
    • Empty string subscripts precede all numeric subscripts. By default, YottaDB prohibits empty string subscripts for global variables but permits them for local variables (see Local and Global Variables). Note: YottaDB recommends against the practice of using empty string subscripts in applications. [4]
    • Numeric subscripts precede string subscripts. Numeric subscripts are in numeric order.
    • String subscripts follow numeric subscripts and collate in byte order. Where the natural byte order does not result in linguistically and culturally correct ordering of strings, YottaDB has a framework for an application to create and use custom collation routines.
[4]The YottaDB code base includes code for a legacy subscript collation in which empty strings collate after numeric subscripts and before non-empty strings. This is supported only in M code for backward compatibility reasons, and is not supported for use with C or any other language. Any attempt to bypass protections and use this legacy collation with new code will almost certainly result in buggy applications that are hard to debug.

Like subscripts, values are sequences of bytes, except that ordering of values is not meaningful unlike ordering of subscripts. YottaDB automatically converts between numbers and strings, depending on the type of operand required by an operator or argument required by a function (see Numeric Considerations).

This means that if an application were to store the current capital of Thailand as Capital("Thailand","current")="Bangkok" instead of Capital("Thailand")="Bangkok", the above subtree would have the following order:

Capital("Thailand",1238,1378)="Sukhothai"
Capital("Thailand",1350,1767)="Ayutthaya"
Capital("Thailand",1767,1782)="Thonburi"
Capital("Thailand",1782)="Bangkok"
Capital("Thailand","current")="Bangkok"

Local and Global Variables

YottaDB is a database, and data in a database must persist and be shared. The variables discussed above are specific to an application process (i.e., are not shared).

  • Local variables reside in process memory, are specific to an application process, are not shared between processes, and do not persist beyond the lifetime of a process. [5]
  • Global variables reside in databases, are shared between processes, and persist beyond the lifetime of any individual process.
[5]In other words, what YottaDB calls a local variable, the C programming language calls a global variable. There is no C counterpart to a YottaDB global variable.

Syntactically, local and global variables look alike, with global variable names having a caret (“^”) preceding their names. Unlike the local variables above, the global variables below are shared between processes and are persistent.

^Population("Belgium")=1367000
^Population("Thailand")=8414000
^Population("USA")=325737000

Even though they may appear superficially similar, a local variable is distinct from a global variable of the same name. Thus ^X can have the value 1 and X can at the same time have the value "The quick brown fox jumps over the lazy dog." For maintainability YottaDB strongly recommends that applications use different names for local and global variables, except in the special case where a local variable is an in-process cached copy of a corresponding global variable.

Global Directories

To application software, files in a file system provide persistence. This means that global variables must be stored in files for persistence. A global directory file provides a process with a mapping from the name of every possible global variable name to one or more regions. A database is a set of regions, which in turn map to database files. Global directories are created and maintained by a utility program, which is discussed at length in Chapter 4 Global Directory Editor of the YottaDB Administration and Operations Guide and is outside the purview of this document.

The name of the global directory file required to access a global variable such as ^Capital, is provided to the process at startup by the environment variable ydb_gbldir.

In addition to the implicit global directory an application may wish to use alternate global directory names. For example, consider an application that wishes to provide an option to display names in other languages while defaulting to English. This can be accomplished by having different versions of the global variable ^Capital for different languages, and having a global directory for each language. A global variable such as ^Population would be mapped to the same database file for all languages, but a global variable such as ^Capital would be mapped to a database file with language-specific entries. So a default global directory Default.gld mapping a ^Capital to a database file with English names can be specified in the environment variable ydb_gbldir but a different global directory file, e.g., ThaiNames.gld can have the same mapping for a global variable such as ^Population but a different database file for ^Capital. The intrinsic special variable $zgbldir can be set to a global directory name to change the mapping from one global directory to another.

Thus, we can have:

$zgbldir="ThaiNames.gld"
^Capital("Thailand")="กรุ่งเทพฯ"
^Capital("Thailand",1238,1378)="สุโขทัย"
^Capital("Thailand",1350,1767)="อยุธยา"
^Capital("Thailand",1767,1782)="ธนบุรี"
^Capital("Thailand",1782)="กรุ่งเทพฯ"

Client/Server Operation

In common usage, database files reside on the same computer system as that running application code. However, as described in Chapter 13 GT.CM Client/Server of the Administration and Operations Guide, database files can reside on a computer system different from that running application code. This mapping of global variables to regions that map to remote files is also performed using global directories, and is transparent to application code except that YottaDB client/server operation does not support transaction processing.

Furthermore, there are configurations that impliticly invoke transaction processing logic, such as distributing a global variable over multiple database regions, or a trigger invocation (see Chapter 14 Triggers of the YottaDB M Programmers Guide). Operations that invoke implicit transaction processing are not supported for global variables that reside on remote database files.

Intrinsic Special Variables

In addition to local and global variables, YottaDB also has a set of Intrinsic Special Variables. Just as global variables are distinguished by a “^” prefix, intrinsic special variables are distinguished by a “$” prefix. Unlike local and global variable names, intrinsic special variable names are case-insensitive and so $zgbldir and $ZGblDir refer to the same intrinsic special variable. Intrinsic special variables have no subscripts.

While the majority of intrinsic special variables as enumerated in Chapter 8 (Intrinsic Special Variables) of the YottaDB M Programmers Guide are useful to M application code, others are more generally useful and documented here.

$tlevel

Application code can read the intrinsic special variable $tlevel to determine whether it is executing inside a transaction. $tlevel>0 means that it is inside a transaction, and $tlevel>1 means that it is inside a nested transaction. Note that a transaction can be started explicitly, e.g., by calling ydb_tp_s() or ydb_tp_st(),or implicitly by a trigger resulting from a ydb_delete_s(), ydb_delete_st(), ydb_set_s() or ydb_set_st().

$trestart

Application code inside a transaction can read the intrinsic special variable $trestart to determine how many times a transaction has been restarted. Although YottaDB recommends against accessing external resources within a transaction, logic that needs to access an external resource (e.g., to read data in a file), can use $trestart to restrict that access to the first time it executes ($trestart=0).

$zgbldir

$zgbldir is the name of the current global directory file; any global variable reference that does not explicitly specify a global directory uses $zgbldir. For example, an application can set an intrinsic special variable $zgbldir="ThaiNames.gld" to use the ThaiNames.gld mapping. At process startup, YottaDB initializes $zgbldir from the environment variable value $ydb_gbldir.

$zmaxtptime

$zmaxtptime provides a limit in seconds for the time that a transaction can be open (see Transaction Processing). $zmaxtptime is initialized at process startup from the environment variable ydb_maxtptime, with values greater than 60 seconds truncated to 60 seconds. In the unlikely event that an application legitimately needs a timeout greater than 60 seconds, use ydb_set_s() or ydb_set_st() to set it.

$zstatus

$zstatus provides additional details of the last error. Application code can retrieve $zstatus using ydb_get_s() or ydb_get_st(). $zstatus consists of several comma-separated substrings.

  • The first is an error number.
  • The second is always "(SimpleAPI)".
  • The remainder is more detailed information about the error, and may contain commas within.

Note that a race condition exists for a multi-threaded application: after a call that returns an error, it is possible for another call from a different thread to perturb the value of $zstatus. Use the errstr parameter discussed in Threads to get the correct $zstatus in a multi-threaded application.

$zyrelease

$zyrelease identifies the YottaDB release in use. It consists of four space separated pieces:

  1. Always “YottaDB”.
  2. The release number, which starts with “r” and is followed by two numbers separated by a period (“.”), e.g., “r1.24”. The first is a major release number and the second is a minor release number under the major release. Even minor release numbers indicate formally released software. Odd minor release numbers indicate software builds from “in flight” code under development, between releases.
  3. The operating system. e.g., “Linux”.
  4. The CPU architecture, e.g., “x86_64”.

Transaction Processing

YottaDB provides a mechanism for an application to implement ACID (Atomic, Consistent, Isolated, Durable) transactions, ensuring strict serialization of transactions, using optimistic concurrency control.

Here is a simplified view [6] of YottaDB’s implementation of optimistic concurrency control:

  • Each database file header has a field of the next transaction number for updates in that database.
  • The block header of each database block in a database file has the transaction number when that block was last updated.
  • When a process is inside a transaction, it keeps track of every database block it has read, and the transaction number of that block when read. Other processes are free to update the database during this time.
  • The process retains updates in its memory, without committing them to the database, so that its own logic sees the updates, but no other process does. As every block that the process wishes to write must also be read, tracking the transaction numbers of blocks read suffices to track them for blocks to be written.
  • To commit a transaction, a process checks whether any block it has read has been updated since it was read. If none has, the process commits the transaction to the database, incrementing the file header fields of each updated database file for the next transaction.
  • If even one block has been updated, the process discards its work, and starts over. If after three attempts, it is still unable to commit the transaction, it executes the transaction logic on the fourth attempt with updates by all other processes blocked so that the transaction at commit time will not encounter database changes made by other processes.
[6]At the high level at which optimistic concurrency control is described here, a single logical database update (which can span multiple blocks and even multiple regions) is a transaction that contains a single update.

In YottaDB’s API for transaction processing, an application packages the logic for a transaction into a function, passing the function to the ydb_tp_s() or ydb_tp_st() functions. YottaDB then calls that function.

  • If the function returns a YDB_OK, YottaDB attempts to commit the transaction. If it is unable to commit as described above, or if the called function returns a YDB_TP_RESTART return code, it calls the function again.
  • If the function returns a YDB_TP_ROLLBACK, ydb_tp_s() or ydb_tp_st() return to the caller with that return code after discarding the uncommitted database updates and releasing any locks acquired within the transaction.
  • To protect applications against poorly coded transactions, if a transaction takes longer than the number of seconds specified by the intrinsic special variable $zmaxtptime, YottaDB aborts the transaction and the ydb_tp_s() or ydb_tp_st() functions return the YDB_ERR_TPTIMEOUT error.

Note

If the transaction logic receives a YDB_TP_RESTART or YDB_TP_ROLLBACK from a YottaDB function that it calls, it must return that value to ydb_tp_s() or ydb_tp_st(). Failure to do so could result in application level data inconsistencies and hard to debug application code.

Sections Threads and Threads and Transaction Processing provide important information pertinent to transaction processing in a multi-threaded application.

Nested Transactions

YottaDB allows transactions to be nested. In other words, code executing within a transaction may itself call ydb_tp_s() or ydb_tp_st(). Although ACID properties are only meaningful at the outermost level, nested transactions are nevertheless useful. For example:

  • Application logic can be programmed modularly. Logic that requires ACID properties can be coded as a transaction, without the need to determine whether or not the caller of that logic is itself within a transaction.
  • That local variables can be saved, and restored on transaction restarts, provides useful functionality that nested transactions can exploit.

Locks

YottaDB locks are a fast, lightweight tool for multiple processes to coordinate their work. An analogy with the physical world may help to explain the functionality. When it is locked, the lock on a door prevents you from going through it. In contrast, a traffic light does not stop you from driving through a street intersection: it works because drivers by convention stop when their light is red and drive when it is green.

YottaDB locks are more akin to traffic lights than door locks. Each lock has a name: as lock names have the same syntax as local or global variable names, Population, ^Capital, and ^Capital("Thailand",1350,1767) are all valid lock names. Features of YottaDB locks include:

  • Locks are exclusive: one and only one process can acquire a lock with the resource name. For example, if process P1 acquires lock Population("USA"), process P2 cannot simultaneously acquire that lock. However, P2 can acquire lock Population("Canada") at the same time that process P1 acquires Population("USA").
  • Locks are hierarchical: a process that has a lock at a higher level blocks locks at lower levels and vice versa. For example, if a process P0 must wait for processes P1, P2, … to complete, each of P1, P2, … can acquire lock Process(pid). P0’s subsequent attempt to acquire lock Process is blocked till processes P1, P2, … complete.
  • Locks include counters: a process that acquires ^Capital("Belgium") can acquire that lock again, incrementing its count to 2. This simplifies application code logic: for example, a routine in application code that requires ^Capital("Belgium") can simply incrementally acquire that lock without needing to test whether a higher level routine has already acquired it. More importantly, when it completes its work, the routine can decrementally release the lock without concern for whether or not a higher level routine needs that lock. When the count goes from 1 to 0, the lock becomes available for acquisition by another process.
  • Locks are robust: while normal process exit releases locks held by that process, if a process holding a lock exits abnormally without releasing it, another process that needs the lock, and finding it held by a non-existent process will automatically scavenge the lock.

Although YottaDB lock names are the same as local and global variable names, YottaDB imposes no connection between a lock name and the same variable name. By convention, and for application maintainability, it is good practice to use lock names associated with the variables to which application code requires exclusive access, e.g., use a lock called ^Population to protect or restrict access to a global variable called ^Population. [7]

[7]Since a process always has exclusive access to its local variables, access to them never needs protection from a lock. So, it would be reasonable to use a lock Population to restrict access to the global variable ^Population.

Since YottaDB lock acquisitions are always timed for languages other than M, it is not in principle possible for applications to deadlock on YottaDB locks. Consequently defensive application code must always validate the return code of calls to acquire locks. As a practical matter, it is possible to set timeouts that are long enough that users may perceive applications to be hung.

Since YottaDB resources such as locks belong to a process rather than a thread within a process (see discussion under Threads), design rules to avoid deadlocks (such as acquiring locks in a predefined order that all processes must respect) must be respected by all threads in a process (or for a language such as Go, by all Goroutines in a process).

Locks and Transaction Processing

Transaction Processing and Locks solve overlapping though not congruent use cases. For example, consider application code to transfer $100 from a customer’s savings account to that same customer’s savings account, which would likely include the requirement that business transactions on an account must be serializable. This can be implemented by acquiring a lock on that customer (with an application coded so that other accesses to that customer are blocked till the lock is released) or by executing the transfer inside a YottaDB transaction (which provides ACID properties). Unless the application logic or data force pathological transaction restarts that cannot be eliminated or worked around, transaction processing’s optimistic concurrency control typically results in better application throughput than the pessimistic concurrency control that locks imply.

In general, we recommend using either transaction processing or locks, and not mixing them. However, there may be business logic that requires the use of locks for some logic, but otherwise permits the use of transaction processing. If an application must mix them, the following rules apply:

  • A lock that a process acquires prior to starting a transaction cannot be released inside the transaction - it can only be released after the transaction is committed or rolled back. Locks acquired inside a transaction can be released either inside the transaction, or after the transaction is committed or rolled back.

Programming in M

YottaDB includes a complete implementation of the M programming language (also known as MUMPS - see The Heritage and Legacy of M (MUMPS) – and the Future of YottaDB )) that mostly conforms to ISO/IEC 11756:1999. The YottaDB M Programmers Guide documents programming YottaDB in M and is not duplicated here.

YottaDB supports calling between M and C application code, as documented in Chapter 11 (Integrating External Routines) of the M Programmers Guide.