Environment
- pacemaker.x86_64 2.1.0-8.el8
- corosync.x86_64 3.1.5-1.el8
- pgpool-II-pg13.x86_64 4.2.4-1pgdg.rhel8
- PostgreSQL13
Issue
The ping monitor can time out for reasons such as name resolution becoming temporarily unavailable.
crm_mon -rfA
    Migration Summary:
      * Node: db-a001:
        * ping: migration-threshold=1 fail-count=1 last-failure='Thu Oct 6 14:39:33 2020'
    Failed Resource Actions:
      * ping_monitor_10000 on db-a001 'error' (1): call=32, status='Timed Out', exitreason='', last-rc-change='2020-10-06 14:39:33 +09:00', queued=0ms, exec=0ms
syslog
    Oct 6 14:39:33 pacemaker-controld[1996]: error: Result of monitor operation for ping on db-a001: Timed Out
    Oct 6 14:39:33 pacemaker-controld[1996]: notice: Transition 0 action 9 (ping_monitor_10000 on db-a001): expected 'ok' but got 'error'
    Oct 6 14:39:33 pacemaker-controld[1996]: notice: State transition S_IDLE -> S_POLICY_ENGINE
    Oct 6 14:39:33 pacemaker-attrd[1994]: notice: Setting fail-count-ping#monitor_10000[db-a001]: (unset) -> 1
    Oct 6 14:39:33 pacemaker-attrd[1994]: notice: Setting last-failure-ping#monitor_10000[db-a001]: (unset) -> 1665034773
    Oct 6 14:39:33 pacemaker-schedulerd[1995]: warning: Unexpected result (error) was recorded for monitor of ping:0 on db-a001 at Oct 6 14:39:33 2020
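To find other occurrences of the same failure, the syslog can be filtered for timed-out monitor results. The following is a minimal sketch based only on the message format shown above; the function name monitor_timeouts is hypothetical, and /var/log/messages in the usage line is an assumed log location that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print only the
# "Result of monitor operation ... Timed Out" entries for one resource.
# $1 is the resource name (e.g. ping). The message wording is taken from
# the pacemaker-controld line shown in this note.
monitor_timeouts() {
    awk -v rsc="$1" '
        index($0, "Result of monitor operation for " rsc " on ") && /Timed Out/
    '
}
```

Usage, assuming the log lives in /var/log/messages: monitor_timeouts ping < /var/log/messages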
pacemaker log
    Oct 06 14:39:33 pacemaker-schedulerd[1995] (pcmk__native_allocate) info: Resource ping:1 cannot run anywhere
    Oct 06 14:39:33 pacemaker-schedulerd[1995] (RecurringOp) info: Start recurring monitor (10s) for ping:0 on db-a001
    Oct 06 14:39:33 pacemaker-schedulerd[1995] (log_list_item) notice: Actions: Recover ping:0 ( db-a001 )
    Oct 06 14:39:34 pacemaker-schedulerd[1995] (pcmk__log_transition_summary) notice: Calculated transition 24, saving inputs in /var/lib/pacemaker/pengine/pe-input-971.bz2
    Oct 06 14:39:34 pacemaker-controld [1996] (handle_response) info: pe_calc calculation pe_calc-dc-1665034773-121 is obsolete
    Oct 06 14:39:34 pacemaker-schedulerd[1995] (unpack_config) notice: On loss of quorum: Ignore
    Oct 06 14:39:34 pacemaker-schedulerd[1995] (unpack_rsc_op_failure) warning: Unexpected result (error) was recorded for monitor of ping:0 on db-a001 at Oct 6 14:39:33 2020 | rc=1 id=ping_last_failure_0
Recovery
Cleaning up the ping resource is all that is needed.
pcs resource failcount show
    Failcounts for resource 'ping'
      db-a001: 1
pcs resource cleanup ping
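cleanup only targets resources that are in a failed state: it clears the recorded failures and re-detects the current state. pcs resource refresh ping, by contrast, discards the complete operation history, not just the failures, and re-detects the state from scratch. refresh --full also covers resources whose state is unknown.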
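The check-then-cleanup steps above can be sketched as a small shell helper. This is only an illustration: the function names failed_nodes and cleanup_if_failed are hypothetical, and the parsing assumes the failcount output has the same "node: count" layout as the output shown in this note.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs resource failcount show` output on stdin
# and print the nodes that have a non-zero fail count.
# Assumes per-node lines of the form "  <node>: <count>"; header lines
# such as "Failcounts for resource 'ping'" have more fields and are skipped.
failed_nodes() {
    awk 'NF == 2 && $1 ~ /:$/ && $2 != "0" {
        sub(/:$/, "", $1)   # strip the trailing colon from the node name
        print $1
    }'
}

# Hypothetical wrapper: run cleanup for a resource only when at least one
# node reports a failure. $1 is the resource id (e.g. ping).
cleanup_if_failed() {
    resource=$1
    nodes=$(pcs resource failcount show "$resource" | failed_nodes)
    if [ -n "$nodes" ]; then
        echo "fail-count recorded for $resource on: $nodes"
        pcs resource cleanup "$resource"
    else
        echo "no fail-count recorded for $resource; nothing to clean up"
    fi
}
```

Usage: cleanup_if_failed ping. Because cleanup only touches failed operations, running it conditionally like this mainly saves an unnecessary re-probe when nothing has failed.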