Wednesday, November 24, 2021

Adding standby node to Postgresql replication

 Postgresql database version 11 uses write ahead log (WAL) to ensure data integrity. It contains a log of all activity within the database, and helps with recovery of data. WAL records are saved to a WAL file. The insert position is a log sequence number (LSN) that is a byte offset into the logs, increasing with each new record.

Here, I am provided a Postgresql database that have been configured as a MASTER database. A user has been created on the Linux server and a role within the Postgresql called "replication". Now a new linux server have been provided for purpose of running a standby node, this has the same version of Postgresql database as in the MASTER server.

Installation of Standby

For simplicity the IP of MASTER will be 10.1.1.100, and for STANDBY will be 10.1.1.101.

Following information is provided by the MASTER slave;

Linux username: replication

with password: password

Replication standby server is assigned name: replication

Folder of the installation: /var/lib/pgsql/11/data

Within MASTER, the STANDBY server IP has been added to its pg_hba.conf. 

Lets configure the STANBY server

Step 1: Login as postgres user

sudo su postgres

(enter your password)

cd

Step 2: Replace local data with MASTER data

cd 11

mv data data.original

We will use the option -R to generate the default recovery.conf file.

pg_basebackup -h 10.1.1.100 -U replication -D /var/lib/pgsql/11/data -P --password --slot replication -R

(enter password provided by MASTER)

chmod go-rw data

cd data

Step 3: Start Postgresql server

Ensure firewalld allows the database access.

Review contents of recovery.conf, standby_mode should be 'on'.

sudo systemctl start postgresql-11


Monitoring replication process

STANDBY server.

The latest WAL file, is stored in the folder data/pg_wal. Display current process of startup and which WAL file is being pocessed.

ps -eaf|grep startup

postgres  186454  186452  0 17:51 ?        00:00:02 postgres: startup   recovering 000000010000000C00000024

Display the LSN
ps -eaf|grep receiver

postgres  186460  186452  0 17:51 ?        00:00:06 postgres: walreceiver   streaming C/24C1DA10

Login to psql as postgres user,

SELECT pg_last_wal_receive_lsn();

 pg_last_wal_receive_lsn
-------------------------
 C/24D0B740
(1 row)

SELECT pg_last_wal_replay_lsn();

 pg_last_wal_replay_lsn
------------------------
 C/24D15138
(1 row)

MASTER server

Display the replication connection.
ps -eaf|grep postgres

It should display a line that contains
postgres: walsender replication 10.1.1.101(49840) streaming C/24C90AF8

login the psql,

sudo su postgres
psql

SELECT pg_current_wal_lsn();

 pg_current_wal_lsn
--------------------
 C/24D243A0
(1 row)


SELECT * FROM pg_stat_replication;


-[ RECORD 1 ]----+------------------------------
pid              | 2322396
usesysid         | 98654
usename          | replication
application_name | walreceiver
client_addr      | 10.1.1.101
client_hostname  |
client_port      | 49840
backend_start    | 2021-11-24 17:51:54.164144+08
backend_xmin     |
state            | streaming
sent_lsn         | C/24D88B18
write_lsn        | C/24D88B18
flush_lsn        | C/24D88B18
replay_lsn       | C/24D88B18
write_lag        | 00:00:00.000447
flush_lag        | 00:00:00.001416
replay_lag       | 00:00:00.001419
sync_priority    | 0
sync_state       | async


Determine if STANDBY is lagging.

On the STANDBY server, run these in psql.

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;

It should result in a value of 0 if everything is in sync.

Display the last last records written.
select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------
pg_is_in_recovery             | t
pg_is_wal_replay_paused       | f
pg_last_wal_receive_lsn       | C/24DF6580
pg_last_wal_replay_lsn        | C/24DF6580
pg_last_xact_replay_timestamp | 2021-11-24 17:52:29.764273+08


Troubleshooting

Error Canceling statement due to conflict with recovery

This is due to standby server getting updates/deletes in the WAL stream that will invalidate data currently being accessed by a running query. Usually occurs when queries are time consuming.

Solution:
Edit data/postgresql.conf to increase max_standby_streaming_delay, which allows the standby server to intentionally increase replication lag to allow queries to finish. Case where write of archives are taking a longer time than expected, then increase max_standby_archive_delay.

Note: Posting this ahead of time, for comments on understandability of information.

No comments:

Blog Archive