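Hello.
While monitoring our production DB, I found that the pg_stop_backup call in the full backup script run on November 16 had not been completing properly. The log is shown below,
and a very large number of .ready files had piled up in /xlog/archive_status (starting from 0000000100018A7300000056.ready).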
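Could it be that excessive WAL generation triggered checkpoints according to max_wal_size, and 0000000100018A7300000056 was overwritten before it could be archived, so the file could no longer be found?
Is there a way to avoid this through archive_command, or would raising max_wal_size and min_wal_size to around 50GB solve it?
The relevant parameters are as follows.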
checkpoint_timeout = 20min
checkpoint_completion_target = 0.9
max_wal_size = 2GB
min_wal_size = 2GB
(The file system that holds xlog is 250GB in size.)
#archive_timeout = 0
archive_mode = on
archive_command = 'dd conv=fdatasync bs=256k if=%p of=/arch/%f && \mv -vf /arch/%f /arch/db'
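For reference, the copy-then-rename logic of the archive_command above could also be placed in a small wrapper script, which makes it easier to log a clear reason when the source segment is missing and to refuse overwriting a file that is already archived. This is only a sketch under assumptions: the function name `archive_wal` and the script path in the comment are hypothetical, the default directories /arch and /arch/db are taken from the archive_command above, and plain `cp` is swapped in for the original `dd conv=fdatasync` for portability (so the explicit flush is not reproduced here).

```shell
#!/bin/sh
# Hypothetical wrapper for archive_command; all names here are illustrative.
# Assumed call: archive_command = '/path/to/archive_wal.sh %p %f'

# archive_wal SRC_PATH FILE_NAME STAGE_DIR FINAL_DIR
# Returns 0 only when the segment has been moved into FINAL_DIR.
archive_wal() {
    src=$1; name=$2; stage=$3; final=$4

    # Source segment missing: report it and fail so the server retries later.
    if [ ! -f "$src" ]; then
        echo "archive_wal: source $src not found" >&2
        return 1
    fi

    # Never overwrite a file that is already in the archive.
    if [ -e "$final/$name" ]; then
        echo "archive_wal: $final/$name already exists" >&2
        return 1
    fi

    # Copy into the staging directory first, then rename into place.
    # (cp replaces the original dd conv=fdatasync; no explicit flush here.)
    cp "$src" "$stage/$name" || return 1
    mv -f "$stage/$name" "$final/$name"
}

# Run only when called with arguments, e.g. from archive_command.
if [ "$#" -ge 2 ]; then
    archive_wal "$1" "$2" "${3:-/arch}" "${4:-/arch/db}"
fi
```

A nonzero exit status tells the server that archiving failed, so the segment stays queued and the attempt is repeated later.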
dd: opening `xlog/0000000100018A7300000056': No such file or directory
2021-01-16 00:00:19 KST @/ (11201) LOG: archive command failed with exit code 1
2021-01-16 00:00:19 KST @/ (11201) DETAIL: The failed archive command was: dd conv=fdatasync bs=256k if=xlog/0000000100018A7300000056 of=/arch/0000000100018A7300000056 && \mv -vf /arch/0000000100018A7300000056 /arch/db
dd: opening `xlog/0000000100018A7300000056': No such file or directory
2021-01-16 00:00:20 KST @/ (11201) LOG: archive command failed with exit code 1
2021-01-16 00:00:20 KST @/ (11201) DETAIL: The failed archive command was: dd conv=fdatasync bs=256k if=xlog/0000000100018A7300000056 of=/arch/0000000100018A7300000056 && \mv -vf /arch/0000000100018A7300000056 /arch/db
dd: opening `xlog/0000000100018A7300000056': No such file or directory
2021-01-16 00:00:21 KST @/ (11201) LOG: archive command failed with exit code 1
2021-01-16 00:00:21 KST @/ (11201) DETAIL: The failed archive command was: dd conv=fdatasync bs=256k if=xlog/0000000100018A7300000056 of=/arch/0000000100018A7300000056 && \mv -vf /arch/0000000100018A7300000056 /arch/db
2021-01-16 00:00:21 KST @/ (11201) WARNING: archiving transaction log file "0000000100018A7300000056" failed too many times, will try again later
...
...
2021-01-16 11:17:07 KST postgres@[local]/postgres (8468) WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
2021-01-16 11:17:07 KST postgres@[local]/postgres (8468) HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
dd: opening `xlog/0000000100018A7300000056': No such file or directory
2021-01-16 11:17:12 KST @/ (11201) LOG: archive command failed with exit code 1
2021-01-16 11:17:12 KST @/ (11201) DETAIL: The failed archive command was: dd conv=fdatasync bs=256k if=xlog/0000000100018A7300000056 of=/arch/0000000100018A7300000056 && \mv -vf /arch/0000000100018A7300000056 /arch/db
dd: opening `xlog/0000000100018A7300000056': No such file or directory
2021-01-16 11:17:13 KST @/ (11201) LOG: archive command failed with exit code 1
2021-01-16 11:17:13 KST @/ (11201) DETAIL: The failed archive command was: dd conv=fdatasync bs=256k if=xlog/0000000100018A7300000056 of=/arch/0000000100018A7300000056 && \mv -vf /arch/0000000100018A7300000056 /arch/db
dd: opening `xlog/0000000100018A7300000056': No such file or directory
2021-01-16 11:17:14 KST @/ (11201) LOG: archive command failed with exit code 1
2021-01-16 11:17:14 KST @/ (11201) DETAIL: The failed archive command was: dd conv=fdatasync bs=256k if=xlog/0000000100018A7300000056 of=/arch/0000000100018A7300000056 && \mv -vf /arch/0000000100018A7300000056 /arch/db
2021-01-16 11:17:14 KST @/ (11201) WARNING: archiving transaction log file "0000000100018A7300000056" failed too many times, will try again later
2021-01-16 11:18:07 KST postgres@[local]/postgres (8468) WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (120 seconds elapsed)
2021-01-16 11:18:07 KST postgres@[local]/postgres (8468) HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
dd: opening `xlog/0000000100018A7300000056': No such file or directory
2021-01-16 11:18:14 KST @/ (11201) LOG: archive command failed with exit code 1
2021-01-16 11:18:14 KST @/ (11201) DETAIL: The failed archive command was: dd conv=fdatasync bs=256k if=xlog/0000000100018A7300000056 of=/arch/0000000100018A7300000056 && \mv -vf /arch/0000000100018A7300000056 /arch/db
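
To check how many segments are in the same state as 0000000100018A7300000056, a sketch like the following could walk the .ready entries in archive_status and print the ones whose WAL segment file no longer exists. The function name `list_missing_segments` is hypothetical, and the layout (segments in the xlog directory, status files in its archive_status subdirectory) is assumed from the paths quoted above.

```shell
#!/bin/sh
# list_missing_segments XLOG_DIR
# Prints the name of each segment that has a .ready status file
# but no corresponding segment file in XLOG_DIR.
list_missing_segments() {
    xlog_dir=$1
    for ready in "$xlog_dir"/archive_status/*.ready; do
        # Skip the unexpanded pattern when there are no .ready files.
        [ -e "$ready" ] || continue
        seg=$(basename "$ready" .ready)
        # Report the segment only if its file is absent.
        [ -f "$xlog_dir/$seg" ] || echo "$seg"
    done
}

# Run only when called with a directory argument.
if [ "$#" -ge 1 ]; then
    list_missing_segments "$1"
fi
```

If only the first segment is printed, the problem is limited to that one file; a long list would suggest something removed segments from the xlog directory more broadly.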