Appendix B: Monitoring YottaDB

Monitoring YottaDB Messages

This section covers information on monitoring YottaDB messages. There are several types of messages.

  • The YottaDB run-time system sends messages (such as when a database file extends) to the system log. These are not trapped by the application error trap.

  • Compilation errors generated by YottaDB are directed to STDERR. These are not trapped by the application error trap. You can avoid them by compiling application code before deploying it in production, or log them by running yottadb processes with STDERR directed to a file.

  • Errors trapped and logged by the application are outside the purview of this discussion.

A system management tool will help you automate message monitoring.

YottaDB sends messages to the system log at the LOG_INFO level of the LOG_USER facility. YottaDB messages are identified by a signature of the form YDB-s-abcdef where -s- is a severity indicator and abcdef is an identifier.

The severity indicators are:

  • -I- for informational messages

  • -W- for warnings

  • -E- for errors

  • -F- for events that cause a YottaDB process to terminate abnormally
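As an illustration of the signature format, the following Python sketch extracts the severity and identifier from a log line. The regular expression, the function name parse_ydb_message, and the severity labels are assumptions made for illustration; adjust the pattern to the messages your system actually logs.

```python
import re

# Assumed pattern: "YDB-", one severity letter, "-", then an identifier
# made of upper-case letters, digits, or underscores.
YDB_SIGNATURE = re.compile(r"YDB-([IWEF])-([A-Z0-9_]+)")

# Labels chosen for this sketch; they are not YottaDB terminology.
SEVERITY_NAMES = {"I": "informational", "W": "warning", "E": "error", "F": "fatal"}


def parse_ydb_message(line):
    """Return (severity, identifier) for the first signature in line, or None."""
    match = YDB_SIGNATURE.search(line)
    if match is None:
        return None
    return SEVERITY_NAMES[match.group(1)], match.group(2)
```

A monitoring script could call parse_ydb_message on each system log line and route the result by severity, for example alerting immediately on "fatal" and "error" and queuing the rest for periodic review.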

Your monitoring should recognize important events in real time, and warning events within an appropriate time. All messages have diagnostic value. It is important to create a baseline pattern of messages as a signature of normal operation of your system, so that a deviation from this baseline - the presence of unexpected messages, an unusual number of expected messages (such as file extension) or the absence of expected messages - allows you to recognize abnormal behavior when it happens. In addition to responding to important events in real time, you should regularly review information and warning messages and ensure that deviations from the baseline can be explained.
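The three kinds of deviation described above (unexpected messages, an unusual count, and absent messages) can be sketched as a comparison of per-interval message counts against a baseline. The function name, the shape of the baseline dictionary, and the identifiers in the usage note are hypothetical.

```python
from collections import Counter


def baseline_deviations(observed_ids, baseline):
    """Compare message counts for one interval against a baseline.

    observed_ids: iterable of message identifiers seen in the interval.
    baseline: dict mapping identifier -> (min_count, max_count) in normal operation.
    Returns a sorted list of (identifier, count, reason) tuples.
    """
    counts = Counter(observed_ids)
    deviations = []
    # Identifiers that never appear in the baseline are unexpected.
    for ident, count in counts.items():
        if ident not in baseline:
            deviations.append((ident, count, "unexpected"))
    # Identifiers in the baseline may be absent or outside their normal range.
    for ident, (low, high) in baseline.items():
        count = counts.get(ident, 0)
        if count == 0 and low > 0:
            deviations.append((ident, count, "absent"))
        elif not low <= count <= high:
            deviations.append((ident, count, "unusual count"))
    return sorted(deviations)
```

For example, with a baseline of {"MSGA": (1, 3), "MSGB": (1, 2)}, an interval containing five MSGA messages and one MSGC message reports MSGA as an unusual count, MSGB as absent, and MSGC as unexpected.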

Some message identifiers are described in the following table:

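Component        Instance File or Replication Journal Pool  Receiver Pool  Identifier
---------------  -----------------------------------------  -------------  ----------
Source Server    Y                                          N/A            SRCSRVR
Source Server    N                                          N/A            MUPIP
Receiver Server  Y                                          N/A            RCVSRVR
Receiver Server  N                                          N/A            MUPIP
Update Process   Y                                          N/A            UPD
Update Process   N                                          N/A            MUPIP
Reader Helper    N/A                                        Y              UPDREAD
Reader Helper    N/A                                        N              UPDHELP
Writer Helper    N/A                                        Y              UPDWRITE
Writer Helper    N/A                                        N              UPDHELP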

In addition to messages in the system log, and apart from database files and files created by application programs, YottaDB creates several types of files: journal files, replication log files, gtmsecshr log files, inter-process communication socket files, files from recovery/rollback, and output and error files from JOB'd processes. You should develop a review-and-retention policy. Journal files and files from recovery/rollback are likely to contain sensitive information that may require special handling to meet business or legal requirements. Monitor all of these files for growth in file numbers or size that is materially different from the expectations set by the baseline. In particular, monitoring file sizes is computationally inexpensive and regular monitoring - once an hour, for example - is easily accomplished with the system crontab.
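A minimal sketch of such a check, suitable for running from cron, counts the regular files in one directory and totals their sizes, then compares each figure with a baseline value. The function names and the 25% default tolerance are assumptions for illustration; choose a tolerance that reflects your own baseline.

```python
import os


def directory_usage(path):
    """Return (file_count, total_bytes) for regular files directly inside path."""
    file_count = 0
    total_bytes = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Symbolic links and subdirectories are not counted.
            if entry.is_file(follow_symlinks=False):
                file_count += 1
                total_bytes += entry.stat(follow_symlinks=False).st_size
    return file_count, total_bytes


def exceeds_baseline(observed, expected, tolerance=0.25):
    """True when observed differs from expected by more than tolerance (a fraction)."""
    if expected == 0:
        return observed != 0
    return abs(observed - expected) / expected > tolerance
```

Calling directory_usage on each directory that holds journal, log, or JOB output files and passing the results to exceeds_baseline gives a cheap hourly signal for unexpected growth.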

While journal files automatically switch to new files when the limit is reached, log files can grow unchecked. You should periodically check the sizes of log files and switch them when they get large - or simply switch them on a regular schedule.

  1. gtmsecshr log file - gtm_secshr_log in the directory $ydb_log (send a SIGHUP to the gtmsecshr process to create a new log file).

    Note

    In the latest version, YottaDB logs gtmsecshr messages in the system log and ignores the environment variable ydb_log.

  2. Source Server, Receiver Server, and Update Process log files.

Since database health is critical, database growth warrants special attention. Ensure every file system holding a database file has sufficient space to handle the anticipated growth of the files it holds. Remember that with the lazy allocation used by UNIX file systems, all files in a system compete for space. YottaDB issues an informational message each time it extends a database file. When extending a file, it also issues a warning if the remaining space is less than three times the extension size. You can use the $VIEW() function to find out the total number of blocks in a database as well as the number of free blocks.
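The warning condition described above (remaining space below three times the extension size) can be checked ahead of time by your own monitoring. The sketch below assumes you already know the extension size in blocks and the block size in bytes for the database file; the function and parameter names are hypothetical.

```python
def extension_headroom_low(free_bytes, extension_blocks, block_size_bytes, factor=3):
    """True when free space is below factor times the size of one extension.

    free_bytes: free space on the file system holding the database file.
    extension_blocks, block_size_bytes: taken from your database configuration.
    """
    extension_bytes = extension_blocks * block_size_bytes
    return free_bytes < factor * extension_bytes
```

The free_bytes value can come from any source you trust, for example the free field returned by Python's shutil.disk_usage for the directory that holds the database file.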

As journal files grow with every update, they use up disk faster than database files do. YottaDB issues messages when a journal file reaches within three, two or one extension-size number of blocks from the automatic journal file switch limit. YottaDB also issues messages when a journal file reaches its specified maximum size, at which time YottaDB closes the file, renames it, and creates a new journal file. Journal files covering time periods prior to the last database backup (or prior to the backup of replicating secondary instances) are not needed for continuity of business, and can be deleted or archived, depending on your retention policy. Check the amount of free space in file systems at least hourly and perhaps more often, especially file systems used for journaling, and take action if it falls below a threshold.
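The periodic free-space check can be sketched as follows. The 20% default threshold and the function name are assumptions, not recommendations from this appendix; the usage parameter exists only so the check can be exercised without touching a real file system.

```python
import shutil


def low_space_filesystems(paths, min_free_fraction=0.20, usage=shutil.disk_usage):
    """Return [(path, free_fraction)] for paths whose free space is below the threshold."""
    low = []
    for path in paths:
        stats = usage(path)
        if stats.total == 0:
            # Skip pseudo file systems that report no capacity.
            continue
        free_fraction = stats.free / stats.total
        if free_fraction < min_free_fraction:
            low.append((path, free_fraction))
    return low
```

Run it against the directories that hold journal files more often than against the others, since those file systems fill fastest.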

YottaDB uses monotonically increasing relative time stamps called transaction numbers. You can monitor growth in the database transaction number with DSE DUMP -FILEHEADER. Investigate and obtain satisfactory explanations for deviations from the baseline rate of growth.
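One way to track the rate of growth is to record the transaction number at regular intervals and compare the rate between two samples with a baseline rate. The sketch below assumes you have already extracted the transaction number as an integer from the DSE DUMP -FILEHEADER output; the function names and the 50% default tolerance are hypothetical.

```python
def transaction_rate(earlier, later):
    """Transactions per second between two (unix_time, transaction_number) samples."""
    (t0, tn0), (t1, tn1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (tn1 - tn0) / (t1 - t0)


def rate_deviates(rate, baseline_rate, tolerance=0.5):
    """True when rate differs from baseline_rate by more than tolerance (a fraction)."""
    if baseline_rate == 0:
        return rate != 0
    return abs(rate - baseline_rate) / baseline_rate > tolerance
```

If the value in your DSE output is printed in hexadecimal, int(value, 16) converts it to an integer before it is stored as a sample.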

After a MUPIP JOURNAL -RECOVER (non-replicated application configuration) or MUPIP JOURNAL -ROLLBACK -FETCHRESYNC (replicated application configuration), you should review and process or reconcile updates in the broken and unreplicated (lost) transaction files.

In a replicated environment, frequently (at least hourly; more often is suggested, since checking takes virtually no system resources) check the state of replication and the backlog with MUPIP REPLICATE -CHECKHEALTH and -SHOWBACKLOG. Establish a baseline for the backlog, and take action if the backlog exceeds a threshold.

When a YottaDB process terminates abnormally, it attempts to create a YDB_FATAL_ERROR.ZSHOW_DMP_*.txt file containing a dump of the M execution context and a core file containing a dump of the native process execution context. The M execution context dump is created in the current working directory of the process. Your operating system may offer a means to control the naming and placement of core files; by default, they are created in the current working directory of the process with a name of core.*. The process context information may be useful to you in understanding the circumstances under which the problem occurred and/or how to deal with the consequences of the failure on the application state. The core files are likely to be useful primarily to your YottaDB support channel. If you experience process failures but do not find the expected files, check file permissions and quotas. You can simulate an abnormal process termination by sending the process a SIGILL (with kill -ILL or kill -4 on most UNIX/Linux systems).

Note

Dumps of process state files are likely to contain confidential information, including database encryption keys. Please ensure that you have appropriate confidentiality procedures as mandated by applicable law and corporate policy.

YottaDB processes started with the JOB command create .mje and .mjo files for their STDERR and STDOUT respectively. Analyze non-empty .mje files. Design your application and/or operational processes to remove or archive .mjo files once they are no longer needed.
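Finding the .mje files that need analysis is a simple directory scan. The sketch below assumes the files sit directly in one directory; the function name is hypothetical.

```python
from pathlib import Path


def nonempty_mje_files(directory):
    """Return sorted paths of .mje files in directory that contain at least one byte."""
    return sorted(
        path
        for path in Path(directory).glob("*.mje")
        if path.is_file() and path.stat().st_size > 0
    )
```

The returned paths can be passed to whatever review or alerting step your operational process uses.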

Use the environment variable ydb_procstuckexec to trigger monitoring for processes holding a resource for an unexpectedly long time. $ydb_procstuckexec specifies a shell command or a script to execute when any of the following conditions occur:

  • An explicit MUPIP FREEZE or an implicit freeze, such as for a BACKUP or INTEG -ONLINE that lasts longer than one minute.

  • MUPIP actions find kill_in_prog (KILLs in progress) to be non-zero after a one minute wait on a region.

  • BUFOWNERSTUCK, INTERLOCK_FAIL, JNLPROCSTUCK, SHUTDOWN, WRITERSTUCK, MAXJNLQIOLOCKWAIT, MUTEXLCKALERT, SEMWT2LONG, and COMMITWAITPID operator messages are being logged.

The shell script or command pointed to by ydb_procstuckexec can send an alert, take corrective actions, and log information.

Note

Make sure user processes have sufficient space and permissions to run the shell command/script. For example, for the script to invoke the debugger, the process must be of the same group or have a way to elevate privileges.
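A handler pointed to by ydb_procstuckexec could start as small as the sketch below, which turns its command-line arguments into one log line. This section does not state which arguments YottaDB passes to the handler; the order assumed here (condition, waiting process id, blocking process id, invocation count) is an assumption that you should verify against the ydb_procstuckexec documentation for your release. The function name and log format are hypothetical.

```python
import sys
import time


def format_alert(argv, now=None):
    """Build one log line from the arguments passed to the handler.

    Assumption (not stated in this section): arguments arrive in the order
    condition, waiting_pid, blocking_pid, count.
    """
    names = ("condition", "waiting_pid", "blocking_pid", "count")
    fields = dict(zip(names, argv))
    extra = list(argv[len(names):])
    # now=None means the current time; a fixed value makes the output repeatable.
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(now))
    parts = [f"{name}={fields.get(name, '?')}" for name in names]
    if extra:
        parts.append("extra=" + ",".join(extra))
    return f"{stamp} procstuck " + " ".join(parts)


if __name__ == "__main__":
    # Write to STDERR here; a real handler might append to a file or send an alert.
    print(format_alert(sys.argv[1:]), file=sys.stderr)
```

Keeping the handler this small limits the space and permissions it needs; heavier actions such as invoking a debugger can be added once the basic alert is known to work.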

Managing Core Dumps

When an out-of-design situation or a fatal error causes a YottaDB process to terminate abnormally, YottaDB attempts to create a YDB_FATAL_ERROR.ZSHOW_DMP_*.txt file containing a dump of the M execution context. On encountering an unexpected process termination, YottaDB instructs the operating system to generate a core dump on its behalf at the location determined from the core generation settings of the operating system. YDB_FATAL_ERROR*.txt and core dump files may help YottaDB developers diagnose and debug the condition which resulted in an unexpected process termination, and help you get back up and running quickly from an application disruption. In addition to containing information having diagnostic value, a core dump file may also contain non-public information (NPI) such as passwords, local variables and global variables that may hold sensitive customer data, and so on. If you are an organization dealing with non-public information, you should take additional care about managing and sharing YDB_FATAL_ERROR.ZSHOW_DMP_*.txt and core dump files.

As core dump files may contain non-public information, you might choose to disable core dump generation. In the absence of a core dump file, you may be asked to provide detailed information about your hardware, YottaDB version, application state, system state, and possibly a reproducible scenario of the unexpected process termination. Note that unexpected process terminations are not always reproducible. Without a core dump file, you are likely to spend much more time providing post-mortem information during a YottaDB support engagement than you would if one were available.

Core file generation and configuration are functions of your operating system. Ensure that core file generation is configured and enabled on your operating system. On Linux platforms, /proc/sys/kernel/core_pattern determines the naming convention of core files and /proc/sys/kernel/core_uses_pid determines whether the process id of the dumped process should be added to the core dump file name. A core_pattern value of core creates core dump files in the current directory. Check the man page for core (on Linux) for instructions on enabling and configuring core dump file generation according to your requirements.
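The following sketch reads the two Linux settings and summarizes what they imply. It relies on behavior described in the Linux core man page (a pattern starting with "|" pipes the dump to a program, and %p in the pattern expands to the process id); the function names and the returned keys are hypothetical, and escaped sequences such as %%p are not handled.

```python
def describe_core_settings(core_pattern, core_uses_pid):
    """Summarize a core_pattern string and a core_uses_pid value."""
    pattern = core_pattern.strip()
    piped = pattern.startswith("|")
    return {
        # The dump is handed to a program instead of being written to a file.
        "piped": piped,
        # No directory component: the file lands directly in the working directory.
        "in_working_directory": not piped and "/" not in pattern,
        # The process id appears in the file name, so cores get distinct names.
        "pid_in_name": not piped and ("%p" in pattern or core_uses_pid != 0),
    }


def read_core_settings(proc_dir="/proc/sys/kernel"):
    """Read core_pattern and core_uses_pid; returns (pattern, uses_pid)."""
    with open(f"{proc_dir}/core_pattern") as handle:
        pattern = handle.read().strip()
    with open(f"{proc_dir}/core_uses_pid") as handle:
        uses_pid = int(handle.read().strip())
    return pattern, uses_pid
```

On a Linux host, describe_core_settings(*read_core_settings()) shows at a glance whether core files are piped elsewhere, where they land, and whether each one gets a distinct name.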

Note

As maintainers of YottaDB, our goal is to make the product as reliable as it can be, so you should get few, if any, core dump files. Before a public release, YottaDB goes through several rounds of automated testing which provides thorough test coverage for new functionality and possible regressions. While prioritizing fixes for a YottaDB public release, we assign a higher priority to unexpected process terminations that our regression testing cycle and customers may report. As part of any fix, we add new test cases that become an integral part of future regression testing cycles. We have followed this practice for the past several years and therefore it is very unusual for a stable production application to generate core files. YottaDB supplies a wide range of functionality in ways intended to maximize performance. Nonetheless, YottaDB is reasonably complex as the number of possible execution paths is large, and our testing coverage may not include all possible edge cases. If you encounter a core dump because of a YottaDB issue, it is likely that it is not part of our test coverage and we may find it hard to reproduce. Core dump files are a powerful tool in diagnosing and addressing issues that cause process failures. Note also that user actions can directly cause core dump files without any contributing YottaDB issue (see the following example).

The following suggestions may help with configuring core dump files:

  • Always put cores in a directory that has adequate protection and is away from normal processing. For example, the core file directory may grant write-only permission to almost all users.

  • Set up procedures to remove core dumps and YDB_FATAL_ERROR.ZSHOW_DMP_*.txt when they are no longer needed.

  • Always configure core file generation in a way that each core gets a distinct name so that new cores do not overwrite old cores. YottaDB never overwrites an existing core file even when /proc/sys/kernel/core_uses_pid is set to 0 and /proc/sys/kernel/core_pattern is set to core. If there is a file named core in the target core directory, YottaDB renames it to core1 and creates a new core dump file called core. Likewise, if core(n) already exists, YottaDB renames the existing core to core(n+1) and creates a new core dump file called core.

  • Here are the possible steps to check core file generation on Ubuntu_x86 running YottaDB r1.20:

    $ ulimit -c unlimited
    $ /usr/local/lib/yottadb/r1.20/ydb
    YDB>zsystem "kill -SIGSEGV "_$j
    %YDB-F-KILLBYSIGUINFO, YottaDB process 24570 has been killed by a signal 11 from process 24572 with userid number 1000
    $ ls -l core*
    -rw------- 1 ydbnode jdoe 3506176 Aug 18 14:59 core.24570
    
  • In order to test your core generation environment, you can also generate a core dump at the YottaDB prompt with a ZMESSAGE 150377788 command.

  • If you do not find the expected dump files and have already enabled core generation on your operating system, check file permissions and quota settings.

  • As YottaDB core dumps are not configured for use with automated crash reporting systems such as apport, you might want to adjust the core naming conventions settings in such a way that core dumps are preserved safely until the time you engage your YottaDB support channel.

Before sharing a core dump file with anyone, you must determine whether the files contain NPI and whether the recipient is permitted to view the information in the files. YottaDB Support does not accept NPI.