101 of Troubleshooting SQL Server on Linux
Based on our extensive experience with customers using SQL on Linux, we have compiled a guide outlining fundamental troubleshooting steps and available tools to assist in resolving SQL on Linux issues. This guide aims to make it easier for SQL DBAs who have primarily worked on Windows operating systems over the years.
In this article, we will normalize three checks in Linux that we typically perform in Windows to start troubleshooting issues with SQL Server:
System logs
SQL Server logs
Task Manager
Note: The screenshots in the below examples are taken from RHEL and Ubuntu machines, and while the Linux flavors are different, the commands are similar in both environments.
It contains the log messages that the system processes and applications, and these messages are written as plain text log files.
The rsyslog service keeps various log files in the /var/log directory. You can open these files using native commands such as tail,head,more,less,cat, and so forth, depending on what you are looking for.
In RHEL and SLES, they are called messages, while in Ubuntu, you would find them as syslog.
RHEL and SLES: The primary system log file is located at /var/log/messages.
Ubuntu: The main system log file is /var/log/syslog.
Log files and journals are crucial to a system administrator’s work. They reveal a great deal of information about a system and are instrumental during troubleshooting and auditing.
For example, in RHEL, to display boot and other kernel messages, view /var/log/messages:
[server]$ cat /var/log/messages
Use grep and other filtering tools to gather more specific events from a file. You can also use tail to view files as they are updated.
[server]$ tail -f /var/log/messages
Check the /var/log/secure file to view users and their activities:
[server]$ tail -f /var/log/secure
Similarly in Ubuntu, we have/var/log/syslog
What does the syslog or messages file contain? how do we read them?
These log files contain events and messages generated by the kernel, applications, and users that log into the system.
The logs are written in the below format.
The timestamp indicates the time when a log entry was created in the format MMM dd HH:mm:ss. Notice that this format does not include a year.
Hostname is the host or system that originally create the message.
Application is the application that created the message.
Message contains the actual details of an event.
Since the message/syslog could contain thousands/millions of lines as log entries, using ‘cat‘ command alone might not be the right choice at times. I suggest using grep for filtering text through the files.
Using tail -f command allows you to read the current log file in real time. You may combine it with ‘grep’ to filter on desired text.
An alternate method to validate the system events is via the journald
Systemd-journald :
journald is a component of systemd responsible for handling logging. It captures logs, records them, and makes them easy to find. Unlike traditional syslog implementations, journald offers features like structured logging, indexing for fast search, access control, and signed messages.
The systemd-journald service does not keep separate files, as rsyslog does. The idea is to avoid checking different files for issues. Systemd-journald saves the events and messages in a binary format that cannot be read with a text editor. You can query the journal with the journalctl command.
journald stores logs in memory (RAM) without persistent storage (by default), while the Traditional syslog (e.g., /var/log/messages) persists log data to flat files.
To show all event messages, use:
[server]$ journalctl
To view journal entries for today, use:
[server]$ journalctl –since today
To check for messages related to the sql server service for the past hour, you can run:
journalctl –unit mssql-server.service –since “1 hour ago”
SQL Server Error logs:
The error log contains informational messages, warnings, and information about critical events. The error log also contains information about user-generated messages and auditing information such as logon events (success and failure).In Linux the default SQL Errorlog location is /var/opt/mssql/log
Comparing Error and System Logs Output
You can use both the SQL Server error log and the system logs to identify the cause of problems. For example, while monitoring the SQL Server error log, you may encounter error messages that do not contain cause information. By comparing the dates and times for events between these logs, you can narrow the list of probable causes.
Consider the following scenario: We receive an alert indicating that the application is unable to connect to SQL following a maintenance activity over the weekend. We log into the server to verify if SQL Server is operational.
Upon inspection, we find that the SQL process is not running. We then check the status and attempt to start the SQL Server to see if it comes online.
From the logs, we identify a permission issue on a specific folder or file that is preventing SQL Server from starting. Granting the necessary permissions should resolve the issue.
Task Manager:
Where is the Task Manager in Linux? There are command-line utilities that provide similar information to what we see in Windows. We will explore the top and htop utilities
top
The first line of numbers on the dashboard includes the time, how long your computer has been running, the number of people logged in, and what the load average has been for the past one, five, and 15 minutes. The second line shows the number of tasks and their states: running, stopped, sleeping, or zombie.
The next 3 lines describe CPU, Memory utilization, Swap Memory of the server. This article describes the output in detail.
The column details of the process are tabulated as below.
To get into the details of a particular process, (In our case SQL Server) we can use the PID to get the further details of the SQL and the tasks that SQL is currently running.
To get the child PID of SQLServer, use this short command and run the top command on the pid.
We see the utilization of various tasks within SQL Server, that are consuming the resources.
Another utility that provides similar information is htop. The key difference is that htop offers a more user-friendly experience with its use of colors and graphs, compared to the top command.
While top and htop provide valuable information, there are additional command-line tools such as vmstat, System Activity Information (sar), iostat, and others. I suggest running these commands on your test machines and monitoring the output to become familiar with checking performance metrics on a Linux server.
References:
Microsoft Tech Community – Latest Blogs –Read More