Rsyslogd 占用高内存

前因

近日告警系统不停收到虚机内存告警,恰好问题发生在周末,并且内存在逐渐持续升高,到周一时才去处理。

排查

检查时发现系统日志rsyslogd服务占用 43.6% 的的内存,内存告警问题基本可以确定因他而起,同时 rsylog 服务有如下日志信息,估测 rsyslog 服务可定存在异常,重启 rsyslog 服务后问题依旧存在。

systemctl status rsyslog
● rsyslog.service - System Logging Service
   Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
   Active: active (running) since 一 2023-05-15 09:07:05 CST; 2h 5min ago
     Docs: man:rsyslogd(8)
           http://www.rsyslog.com/doc/
 Main PID: 30976 (rsyslogd)
   CGroup: /system.slice/rsyslog.service
           └─30976 /usr/sbin/rsyslogd -n

5月 15 10:27:53 localhost rsyslogd[30976]: imjournal: journal reloaded... [v8.24.0-38.el7 try http://www.rsyslog.com/e/0 ]
5月 15 10:27:53 localhost rsyslogd[30976]: imjournal: journal reloaded... [v8.24.0-38.el7 try http://www.rsyslog.com/e/0 ]
5月 15 10:33:20 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 10:37:20 localhost rsyslogd[30976]: imjournal: 13616 messages lost due to rate-limiting
5月 15 10:43:45 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 10:47:23 localhost rsyslogd[30976]: imjournal: 11329 messages lost due to rate-limiting
5月 15 10:54:22 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 10:57:24 localhost rsyslogd[30976]: imjournal: 8370 messages lost due to rate-limiting
5月 15 11:06:37 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 11:07:25 localhost rsyslogd[30976]: imjournal: 1268 messages lost due to rate-limiting

后验证 rsyslog 日志文件,发现 journal 日志文件损坏,估计是问题的根因。

journalctl --verify
2c3c70: Data object references invalid entry at 43c3288                                                                                            
File corruption detected at /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal:43c3118 (of 75497472 bytes, 94%).                     
FAIL: /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal (Bad message)
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb62ae3-0005fbb14f1a98bd.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb3f78d-0005fbb0699d49a6.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb1c46e-0005fbaf8525a3a8.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eaf915a-0005fbae9f826634.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ead5e3c-0005fbadbacf03f4.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eab2b2e-0005fbacd4a449ab.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ea8f83e-0005fbabee830b58.journal

检查虚机内存变化,发现内存一直在持续升高,应该是文件损坏,导致导致无法 flush 所致。

mem-history

备份文件之后再将文件清空,journal 文件验证恢复正常,最后再重启 rsyslog 服务,观察一段时间后服务恢复正常。

cp /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal /tmp
echo '' >/run/log/journal/981dbe06affb41079816f96853d3216f/system.journal
journalctl --verify
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal                                                                             
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb62ae3-0005fbb14f1a98bd.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb3f78d-0005fbb0699d49a6.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb1c46e-0005fbaf8525a3a8.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eaf915a-0005fbae9f826634.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ead5e3c-0005fbadbacf03f4.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eab2b2e-0005fbacd4a449ab.journal          
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ea8f83e-0005fbabee830b58.journal

systemctl status rsyslog
● rsyslog.service - System Logging Service
   Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
   Active: active (running) since 一 2023-05-15 11:31:15 CST; 10s ago
     Docs: man:rsyslogd(8)
           http://www.rsyslog.com/doc/
 Main PID: 25282 (rsyslogd)
   CGroup: /system.slice/rsyslog.service
           └─25282 /usr/sbin/rsyslogd -n

5月 15 11:31:15 localhost systemd[1]: Starting System Logging Service...
5月 15 11:31:15 localhost rsyslogd[25282]:  [origin software="rsyslogd" swVersion="8.24.0-38.el7" x-pid="25282" x-info="http://www.rsyslog.com"] start
5月 15 11:31:15 localhost systemd[1]: Started System Logging Service.

最后

也可以通过修改 rsyslog.service 文件,将如下参数添加进去,保证内存不会占用过大。

  • MemoryAccounting=yes # 开启内存占用统计
  • MemoryHigh=10M # 尽可能限制该单元中的进程最多可以使用多少内存
  • MemoryMax=100M # 绝对刚性的限制该单元中的进程最多可以使用多少内存

最后重载系统服务并重启 rsyslog 服务。

Infee Fang
Infee Fang
互联网二手搬砖工