Rsyslogd 占用高内存
前因
近日告警系统不停收到虚机内存告警,恰好问题发生在周末,并且内存在逐渐持续升高,到周一时才去处理。
排查
检查时发现系统日志rsyslogd
服务占用 43.6%
的的内存,内存告警问题基本可以确定因他而起,同时 rsylog 服务有如下日志信息,估测 rsyslog 服务可定存在异常,重启 rsyslog
服务后问题依旧存在。
systemctl status rsyslog
● rsyslog.service - System Logging Service
Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
Active: active (running) since 一 2023-05-15 09:07:05 CST; 2h 5min ago
Docs: man:rsyslogd(8)
http://www.rsyslog.com/doc/
Main PID: 30976 (rsyslogd)
CGroup: /system.slice/rsyslog.service
└─30976 /usr/sbin/rsyslogd -n
5月 15 10:27:53 localhost rsyslogd[30976]: imjournal: journal reloaded... [v8.24.0-38.el7 try http://www.rsyslog.com/e/0 ]
5月 15 10:27:53 localhost rsyslogd[30976]: imjournal: journal reloaded... [v8.24.0-38.el7 try http://www.rsyslog.com/e/0 ]
5月 15 10:33:20 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 10:37:20 localhost rsyslogd[30976]: imjournal: 13616 messages lost due to rate-limiting
5月 15 10:43:45 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 10:47:23 localhost rsyslogd[30976]: imjournal: 11329 messages lost due to rate-limiting
5月 15 10:54:22 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 10:57:24 localhost rsyslogd[30976]: imjournal: 8370 messages lost due to rate-limiting
5月 15 11:06:37 localhost rsyslogd[30976]: imjournal: begin to drop messages due to rate-limiting
5月 15 11:07:25 localhost rsyslogd[30976]: imjournal: 1268 messages lost due to rate-limiting
后验证 rsyslog
日志文件,发现 journal
日志文件损坏,估计是问题的根因。
journalctl --verify
2c3c70: Data object references invalid entry at 43c3288
File corruption detected at /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal:43c3118 (of 75497472 bytes, 94%).
FAIL: /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal (Bad message)
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb62ae3-0005fbb14f1a98bd.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb3f78d-0005fbb0699d49a6.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb1c46e-0005fbaf8525a3a8.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eaf915a-0005fbae9f826634.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ead5e3c-0005fbadbacf03f4.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eab2b2e-0005fbacd4a449ab.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ea8f83e-0005fbabee830b58.journal
检查虚机内存变化,发现内存一直在持续升高,应该是文件损坏,导致导致无法 flush 所致。
备份文件之后再将文件清空,journal
文件验证恢复正常,最后再重启 rsyslog
服务,观察一段时间后服务恢复正常。
cp /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal /tmp
echo '' >/run/log/journal/981dbe06affb41079816f96853d3216f/system.journal
journalctl --verify
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb62ae3-0005fbb14f1a98bd.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb3f78d-0005fbb0699d49a6.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eb1c46e-0005fbaf8525a3a8.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eaf915a-0005fbae9f826634.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ead5e3c-0005fbadbacf03f4.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003eab2b2e-0005fbacd4a449ab.journal
PASS: /run/log/journal/981dbe06affb41079816f96853d3216f/system@1150002650dc44b5ba4f9eeea354796d-000000003ea8f83e-0005fbabee830b58.journal
systemctl status rsyslog
● rsyslog.service - System Logging Service
Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
Active: active (running) since 一 2023-05-15 11:31:15 CST; 10s ago
Docs: man:rsyslogd(8)
http://www.rsyslog.com/doc/
Main PID: 25282 (rsyslogd)
CGroup: /system.slice/rsyslog.service
└─25282 /usr/sbin/rsyslogd -n
5月 15 11:31:15 localhost systemd[1]: Starting System Logging Service...
5月 15 11:31:15 localhost rsyslogd[25282]: [origin software="rsyslogd" swVersion="8.24.0-38.el7" x-pid="25282" x-info="http://www.rsyslog.com"] start
5月 15 11:31:15 localhost systemd[1]: Started System Logging Service.
最后
也可以通过修改 rsyslog.service
文件,将如下参数添加进去,保证内存不会占用过大。
MemoryAccounting=yes
# 开启内存占用统计MemoryHigh=10M
# 尽可能限制该单元中的进程最多可以使用多少内存MemoryMax=100M
# 绝对刚性的限制该单元中的进程最多可以使用多少内存
最后重载系统服务并重启 rsyslog
服务。