PostgreSQL replication is a method of copying and maintaining database objects, such as schemas and tables, in multiple database systems distributed across different locations. This article will guide you through the steps to set up and enable data replication for PostgreSQL on Ubuntu.
Table of Contents
Types of PostgreSQL Data Replication
PostgreSQL data replication refers to the process of copying data from one PostgreSQL database to another, and keeping the data in these databases synchronized. This practice is often used in database management to enhance data security, increase availability, and distribute the workload across multiple systems.
At a basic level, data replication is like making a copy of a book, and then ensuring that any changes to the original book are also made to the copy. In the context of databases, replication involves copying the entire database (full replication) or parts of the database (partial replication) to another system. Any changes made to the data in the original database are then mirrored in the replicated database.
There are two main forms of PostgreSQL replication: physical replication and logical replication.
Streaming Replication
Physical replication, often referred to as streaming replication, duplicates the exact physical files of the database from one server to another. It’s a binary copy of the database cluster’s files, providing a hot standby system. Physical replication is straightforward and efficient, as it duplicates data exactly as is, but it lacks flexibility because you can’t replicate a subset of the database or transform data during the replication process.
Logical Replication
Logical replication, on the other hand, is more flexible. It allows you to replicate specific tables and even transform data while replicating. This is achieved by sending the changes of the data (the write-ahead log or WAL) in a logical format that includes the data and the changes, rather than in a binary format.
PostgreSQL data replication is an essential aspect of maintaining and managing databases. It helps in disaster recovery, load balancing, and it can also be a strategic part of migrating data across different systems. Understanding how it works is crucial for anyone managing PostgreSQL databases, whether using physical or logical replication.
How To Set Up PostgreSQL Replication
Prerequisites
- Two Ubuntu 20.04 servers: A primary server and a standby server.
- PostgreSQL installed on both servers.
- Sudo or root access to both servers.
Step 1: Configuring the Primary Server
1.1 Editing the PostgreSQL Configuration File
Open the PostgreSQL configuration file located at /etc/postgresql/{version}/main/postgresql.conf
in a text editor:
sudo nano /etc/postgresql/{version}/main/postgresql.conf
Replace {version}
with your PostgreSQL version.
Modify the following parameters:
wal_level
– set this toreplica
.max_wal_senders
– set this to the number of standby servers that you have.wal_keep_segments
– set this to a higher value like 64. This represents the number of past log file segments kept in the pg_xlog directory.
wal_level = replica
max_wal_senders = 3
wal_keep_segments = 64
1.2 Editing the pg_hba.conf File
In the same directory, edit the pg_hba.conf
file:
sudo nano /etc/postgresql/{version}/main/pg_hba.conf
Add the following line at the end of the file. Replace {standby-ip-address}
with the IP address of the standby server:
host replication all {standby-ip-address}/32 md5
1.3 Restart PostgreSQL
After editing the configuration files, restart the PostgreSQL service:
sudo systemctl restart postgresql
Step 2: Configuring the Standby Server
2.1 Stop the PostgreSQL Service
Before we can configure the standby server, we must stop the PostgreSQL service:
sudo systemctl stop postgresql
2.2 Copying Data from the Primary Server
We will use the pg_basebackup
tool to copy data from the primary server. This tool makes a binary copy of the database files while the system is running.
sudo -u postgres pg_basebackup -h {primary-ip-address} -D /var/lib/postgresql/{version}/main -U replication -v -P --wal-method=stream
Replace {primary-ip-address}
with the IP address of the primary server and {version}
with your PostgreSQL version.
2.3 Creating the Recovery Configuration File
The recovery configuration file tells the standby server to run in standby mode and fetch missing WAL files from the primary server.
Create a file named standby.signal
in the PostgreSQL data directory:
sudo touch /var/lib/postgresql/{version}/main/standby.signal
Then, create a postgresql.auto.conf
file:
sudo nano /var/lib/postgresql/{version}/main/postgresql.auto.conf
Add the following lines:
primary_conninfo = 'host={primary-ip-address} user=replication password={your-password} application_name={standby-hostname}'
Replace {primary-ip-address}
with the IP address of the primary server, {your-password}
with the password you want to use, and {standby-hostname}
with the hostname of the standby server.
2.4 Start the PostgreSQL Service
Finally, start the PostgreSQL service:
sudo systemctl start postgresql
For subsequent configuration updates to configuration, you can reload without restarting PostgreSQL service.