YDB Syslog

Overview

YDBSyslog is a YottaDB plugin to capture syslog data in a YottaDB database, to allow for more sophisticated analytics, forensics, and troubleshooting, for example by using Octo. Furthermore, by consolidating the syslogs of several systems in a single database, queries can run on data that cuts across multiple systems, e.g., to investigate concurrent events.

It operates in two modes to ingest data in the journalctl --output=export format:

  • By running journalctl --follow in a PIPE device, YDBSyslog can continuously ingest syslog entries in real time.

  • By reading a journalctl export from stdin. Piping the output of journalctl --output=export --follow into stdin is effectively the same as reading from a PIPE device using the --follow option.
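
As a rough illustration of the data both modes consume, the following Python sketch parses the journalctl export format as described in the systemd documentation: text fields arrive as NAME=value lines, fields that journalctl treats as binary arrive as the name on its own line followed by a 64-bit little-endian length, the raw data, and a newline, and a blank line ends each entry. The function name read_export_entries is hypothetical, and this is not how %YDBSYSLOG itself is implemented (the plugin is written in M).

```python
import io
import struct


def read_export_entries(stream):
    """Yield one dict per journal entry from a binary stream in export format.

    Simplification: if a field name repeats within an entry, the last value wins.
    """
    entry = {}
    while True:
        line = stream.readline()
        if not line:                      # end of input
            break
        if line == b"\n":                 # a blank line ends the current entry
            if entry:
                yield entry
                entry = {}
            continue
        name, sep, value = line.rstrip(b"\n").partition(b"=")
        if not sep:
            # Binary-safe field: 64-bit little-endian size, data, newline
            (size,) = struct.unpack("<Q", stream.read(8))
            value = stream.read(size)
            stream.read(1)                # consume the trailing newline
        entry[name.decode()] = value
    if entry:                             # input ended without a final blank line
        yield entry


# Made-up sample: one text-only entry and one entry with a binary MESSAGE field
sample = (
    b"__CURSOR=s=abc;i=1\nMESSAGE=hello\n\n"
    b"__CURSOR=s=abc;i=2\nMESSAGE\n" + struct.pack("<Q", 7) + b"two\nlin\n\n"
)
for record in read_export_entries(io.BytesIO(sample)):
    print(record)
```

Values are kept as bytes because binary fields may not be valid UTF-8.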

YDBSyslog can output DDL that Octo accepts, allowing the syslog database to be queried using SQL.

Quickstart

As a YottaDB plugin, YDBSyslog requires YottaDB. You can install YottaDB and YDBSyslog together:

mkdir /tmp/tmp ; wget -P /tmp/tmp https://gitlab.com/YottaDB/DB/YDB/raw/master/sr_unix/ydbinstall.sh
cd /tmp/tmp ; chmod +x ydbinstall.sh
sudo ./ydbinstall.sh --utf8 --syslog

Although you can omit the --utf8 option if you do not want UTF-8 support installed, we recommend installing UTF-8 support as syslogs can include UTF-8 characters. If you already have YottaDB installed, use sudo $ydb_dist/ydbinstall --syslog --plugins-only --overwrite-existing to install or reinstall the YDBSyslog plugin without reinstalling YottaDB.

Installation

If you don't use the Quickstart method, you can install YDBSyslog from source. In addition to YottaDB and its requirements, YDBSyslog requires cmake, git, make, and pkg-config. Clone the YDBSyslog repository, and then install the plugin, using the following commands:

git clone https://gitlab.com/YottaDB/Util/YDBSyslog.git YDBSyslog-master
cd YDBSyslog-master
mkdir build && cd build
cmake ..
make && sudo make install

Usage

The most common usage of YDBSyslog is to run %YDBSYSLOG from the shell.

yottadb -run %YDBSYSLOG op [options]

Where op and [options] are:

  • help - Output options to use this program.

  • ingestjnlctlcmd [options] - Run the journalctl --output=export command in a PIPE. Options are as follows; all options may be omitted.

    • --boot [value] - --boot is mutually exclusive with --follow. There are several cases of value:

      1. If omitted, the --boot parameter is omitted when invoking journalctl. This ingests the syslog from the current boot.

      2. If a hex string prefixed with 0x, the string sans prefix is passed to journalctl --boot.

      3. If a decimal number, it is passed unaltered to journalctl --boot.

      4. If the string all (case-insensitive), that option is passed to journalctl --boot.

    • --follow is mutually exclusive with --boot. The --follow option is used to invoke journalctl --follow, and results in %YDBSYSLOG running as a daemon to continuously ingest the syslog exported by journalctl.

    • --moreopt indicates that the rest of the command line should be passed verbatim to the journalctl command as additional options. See the Linux command man journalctl for details. YDBSyslog does no error checking of these additional options.

  • ingestjnlctlfile - Read journalctl --output=export formatted data from stdin.

  • octoddl - output an Octo DDL to allow analysis of syslog data using SQL. If the database combines syslog data from multiple systems, Octo SQL queries can span systems.
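
The rules for the --boot value listed under ingestjnlctlcmd above can be sketched in Python as follows. This illustrates the documented cases only and is not the plugin's code. The function name normalize_boot_value is hypothetical, and two details are assumptions: that a decimal value may carry a leading sign, and that all is handed to journalctl in lowercase.

```python
import re


def normalize_boot_value(value):
    """Return the string to hand to journalctl --boot, or None if no value was given."""
    if value is None or value == "":
        return None                       # case 1: no value supplied
    if re.fullmatch(r"0x[0-9a-fA-F]+", value):
        return value[2:]                  # case 2: hex string, prefix removed
    if re.fullmatch(r"[+-]?[0-9]+", value):
        return value                      # case 3: decimal number, unaltered (sign is an assumption)
    if value.lower() == "all":
        return "all"                      # case 4: case-insensitive "all" (lowercasing is an assumption)
    raise ValueError(f"unrecognized --boot value: {value!r}")


for candidate in ("0x1f2e3d", "-1", "ALL"):
    print(candidate, "->", normalize_boot_value(candidate))
```

Checking for the 0x prefix before the decimal test keeps a value such as 0x10 from being mistaken for a number.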

The following M entryrefs can be called directly from programs.

  • INGESTJNLCTLCMD^%YDBSYSLOG(boot,follow,moreopt) runs journalctl --output=export in a PIPE device. Parameters are:

    • boot is the parameter for the --boot command line option of journalctl. There are several cases:

      1. If unspecified or the empty string, the --boot option is omitted.

      2. If a hex string prefixed with "0x", the string sans prefix is passed to journalctl as the value.

      3. If a decimal number, it is passed unaltered to journalctl.

      4. If the string "all" (case-insensitive), that option is passed to journalctl.

    • If follow is non-zero, INGESTJNLCTLCMD follows journalctl, continuously logging syslog output in the database. boot and follow are mutually exclusive.

    • moreopt is a string intended to be passed verbatim to the journalctl command. See the Linux command man journalctl for details. INGESTJNLCTLCMD does no error checking of these additional options.

  • INGESTJNLCTLFILE^%YDBSYSLOG reads journalctl --output=export formatted data from stdin.

  • OCTODDL^%YDBSYSLOG([scanflag]) generates the DDL that can be fed to Octo to query the ingested syslog data using SQL. If scanflag evaluates to 1, the routine scans the database for additional fields beyond those identified in the code.

Data are stored in nodes of ^%ydbSYSLOG with the following subscripts, which are reverse engineered from the __CURSOR field of the journalctl export format. While __CURSOR is documented as opaque, reverse engineering provides a more compact database and faster access:

  • Cs - a UUID for a large number of syslog records.

  • Cb - evidently a boot UUID.

  • Ci - evidently the record number in a syslog.

  • Ct - evidently the number of microseconds since the UNIX epoch.

  • Cm - evidently a monotonic timestamp since boot.

  • Cx - a UUID that is unique to each syslog entry.

Fields that journalctl has been found to flag as binary, e.g., "MESSAGE" and "SYSLOG_RAW", have an additional, seventh subscript: the tag for the field.
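
A minimal Python sketch of splitting a cursor into these subscripts follows. It assumes the cursor is a semicolon-separated list of key=value pairs with the keys s, b, i, t, m, and x, as observed in journalctl output; the format is documented as opaque and may change. The function name cursor_subscripts is hypothetical, the sample cursor is made up, and returning the pieces in the order listed above is an assumption about how the subscripts are arranged.

```python
def cursor_subscripts(cursor):
    """Split a __CURSOR value into (Cs, Cb, Ci, Ct, Cm, Cx), keeping each piece as a string."""
    parts = dict(item.split("=", 1) for item in cursor.split(";"))
    # Order follows the subscript list in this document (an assumption)
    return tuple(parts[key] for key in ("s", "b", "i", "t", "m", "x"))


# Made-up cursor for illustration only
example = "s=aaaa;i=2f;b=bbbb;m=1c3;t=5f0;x=dead"
print(cursor_subscripts(example))
```

The sketch leaves the values exactly as they appear in the cursor; whether and how %YDBSYSLOG converts them before using them as subscripts is not covered here.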

Note that since querying syslog entries is content based (e.g., the USER_ID field) and not by the subscripts, if the reverse engineering of __CURSOR is imperfect, or if a future systemd-journald changes the fields, it will not affect the correctness of queries; it will only incrementally increase database size and consequently access time (smaller databases are marginally faster).

The numerous fields exported by journalctl are not well documented. Systemd Journal Export Formats is helpful, as is man systemd.journal-fields. However, outside the source code, there does not appear to be a comprehensive list of all fields. The fields listed in the _YDBSYSLOG.m source code were captured from a couple dozen Linux systems running releases and derivatives of Arch Linux, Debian GNU/Linux, Red Hat Enterprise Linux, SUSE Linux Enterprise, and Ubuntu. Even if journalctl exports additional fields not identified, %YDBSYSLOG captures them, and generates reasonable DDL entries for them.

Should you find additional entries not identified by the _YDBSYSLOG.m source code, please create an Issue or a Merge Request in the YottaDB project.

Syslog from multiple systems

Although there are many ways to script gathering data from multiple systems using %YDBSYSLOG, the program UseYDBSyslog is a sample script you can use. After reading the comments in the file UseYDBSyslog.txt:

  1. Edit the file UseYDBSyslog.txt to replace the sample loghost name, server names, and starting TCP port with the specific values for your environment.

  2. Save the file as UseYDBSyslog.m on the loghost and on each server in a location where YottaDB can execute it.

  3. To use it, first start it on the loghost, and then on each server, and confirm that the two port numbers reported by the loghost for each server match those the server reports.

  4. To collect all syslogs from all servers initially, start it with yottadb -run %XCMD 'do ^UseYDBSyslog(1)'. Subsequently, a simple yottadb -run UseYDBSyslog suffices to capture syslogs from the current boot.

  5. To collect all syslogs from all servers starting at a specific time, pass the time as the fourth parameter, e.g., yottadb -run %XCMD 'do ^UseYDBSyslog(,,,"--since=""2023-08-13 14:04""")'.
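
The doubled quotation marks in the last example are M string-literal quoting: inside an M string, each embedded quotation mark is written twice. If you generate such command lines from a script, a small helper along these lines may be useful. It is written in Python purely for illustration, and the name m_string_literal is hypothetical.

```python
def m_string_literal(text):
    """Quote text as an M string literal by doubling embedded quotation marks."""
    return '"' + text.replace('"', '""') + '"'


# Build the option used in the example above
option = m_string_literal('--since="2023-08-13 14:04"')
print(option)
```

The resulting literal can then be placed in the argument list of the do command, inside the shell's single quotes.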

The default configuration of UseYDBSyslog creates an unjournaled database that uses the MM access method. If you use journaling for recoverability, remember to monitor space used by prior generation journal files, and to delete those old journal files when they are no longer needed.