Monday, March 6, 2017

Getting Hints Of Problems From The Logs

Most Linux distributions use systemd now. Its journal service can be very helpful when you want to check the health of your system. I am mostly interested in error reports and critical system warnings, the sort of problems that require immediate attention or human intervention.

[donato@archdesktop ~]$ sudo journalctl -f -p 3
[sudo] password for donato: 
-- Logs begin at Wed 2017-02-22 07:29:32 +08. --
Mar 06 20:16:44 archdesktop kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Mar 06 20:16:44 archdesktop kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Mar 06 20:16:44 archdesktop kernel: ata1: SError: { UnrecovData Handshk }
Mar 06 20:16:44 archdesktop kernel: ata1.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 15 pio 16392 in
                                             opcode=0x4a 4a 01 00 00 10 00 00 00 08 00res 50/00:03:00:08:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
Mar 06 20:16:44 archdesktop kernel: ata1.00: status: { DRDY }

This example is a SATA connection problem. I checked the SMART data of my disks with smartctl and found nothing alarming; all disks report healthy. So I am monitoring to see whether the problem recurs. It could be a bad data or power cable, or a bad cable connection. Sometimes merely moving cables around can trigger these errors.
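To watch whether the problem recurs, it can help to count how many kernel error lines each ATA port produces. The following is a minimal sketch, not a polished tool: count_ata_errors is a made-up helper name, and it assumes the default "short" journalctl output format shown above, read from standard input.

```shell
#!/bin/sh
# Count kernel error lines per ATA port from journal text on stdin.
# count_ata_errors is a hypothetical helper; it assumes lines shaped like
# "Mar 06 20:16:44 host kernel: ata1.00: ..." (journalctl's default output).
count_ata_errors() {
    awk '
        # Match "kernel: ata<N>", ignoring any ".00" device suffix
        match($0, /kernel: ata[0-9]+/) {
            # Skip the 8 characters of "kernel: " to keep only "ata<N>"
            port = substr($0, RSTART + 8, RLENGTH - 8)
            count[port]++
        }
        END {
            for (port in count) print port, count[port]
        }
    ' | sort
}
```

A possible way to use it is to pipe kernel messages of the current boot into the function, for example: sudo journalctl -k -p 3 | count_ata_errors. A count that keeps growing on one port would point at that port's cable or connector.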

You can also limit the journal output to a time window, for example:

# journalctl --since "24 hours ago" -p 3 -xb
##the -p flag sets the priority threshold (0=emerg, 1=alert, 2=crit, 3=err); -p 3 shows priority 3 and anything more severe
##the -x flag adds explanatory text from the message catalog, where available
##the -b flag limits output to the current boot only, since no boot ID or offset is given
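The numeric priorities follow the standard syslog levels, and journalctl also accepts the level names (for example -p err) or a range such as -p emerg..err. As a rough sketch, the helpers below map a number to its name and print the equivalent command without running it; priority_name and journal_cmd are hypothetical names of my own, not part of systemd.

```shell
#!/bin/sh
# Map a numeric syslog priority (as accepted by journalctl -p) to its name.
# priority_name is a hypothetical helper, not something systemd ships.
priority_name() {
    case "$1" in
        0) echo emerg ;;
        1) echo alert ;;
        2) echo crit ;;
        3) echo err ;;
        4) echo warning ;;
        5) echo notice ;;
        6) echo info ;;
        7) echo debug ;;
        *) echo "unknown priority: $1" >&2; return 1 ;;
    esac
}

# Print (but do not run) a journalctl command for a time window and priority.
# The defaults simply mirror the example above and are only illustrative.
journal_cmd() {
    since=${1:-"24 hours ago"}
    prio=${2:-3}
    printf 'journalctl --since "%s" -p %s -x -b\n' "$since" "$(priority_name "$prio")"
}
```

Calling journal_cmd "2 hours ago" 2 would print a command that shows critical and more severe messages from the last two hours of the current boot.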

