High Availability of databases is one of major Enterprise-grade features required by any Organization. It is important that databases are always accessible, minimizing downtime and maintaining business continuity. In the world of databases like PostgreSQL or Oracle or SQL Server, High Availability is achievable with the introduction of a Standby (or Replica), that continuously replicates data from its Primary (or Master). PostgreSQL, being one of the robust Open Source databases, supports multiple replication methods like Streaming and Logical Replication, that allow us to achieve high availability. In this article, we shall discuss the internals of Streaming Replication for setting up high availability in PostgreSQL along with its architecture. In the next article of this series, we shall see the detailed steps involved in setting up Streaming Replication for High Availability.
Please Note: To maintain the best quality in our articles, we avoid using any LLMs to create content as LLMs can hallucinate. Our high quality content may contain some manageable grammatical errors.
High Availability in PostgreSQL
In order to achieve High Availability in PostgreSQL, there is a need to setup a replica or a standby that is able to continuously stream the changes from a primary database. This will enable the Application to switch to a standby in the event of a failover.
The most reliable and the HexaCluster recommended ways to achieve the same in PostgreSQL is to setup Streaming Replication. This replication technique is implemented by most robust tools like Patroni that supports Automatic failover. We will discuss about Patroni in another article.
Let us now understand what is Streaming Replication in PostgreSQL.
What is Streaming Replication ?
Streaming replication in PostgreSQL is a mechanism that allows a standby server to maintain an up-to-date copy of the primary server’s data by continuously streaming the changes from the primary to the standby in real-time. This setup is commonly used for high availability, load balancing, and disaster recovery.
In order to achieve this, WAL records containing the changes made to the primary database are streamed to the standby database over a streaming replication protocol, ensuring that the standby is as up-to-date as possible.
WAL (Write-Ahead Logging)
If you would have previously worked on databases like Oracle, SQL Server or MySQL, you might have heard about a REDO log or a Transaction Log. Similarly, a Write-Ahead Log (WAL) records all changes made to the database’s data before they are written to the actual data files.
The core principle of WAL is that the changes must be written to the WAL before they are written to the data files. This ensures that in the event of a crash, the system can replay WAL records to recover the database to a consistent state.
Now that we understood the important of WAL segments for the purpose of Streaming Replication, let us take a look at some of the background processes that enable Streaming Replication.
Following are some of the background processes you should know to understand how Streaming Replication works internally.
- WAL sender
- WAL sender is a background process resposible for streaming WAL records from Primary to a Standby server.
- WAL receiver
- WAL receiver is a background process that continuously receives WAL (Write-Ahead Log) records from the primary server over a network connection. This process writes the incoming WAL records to disk as WAL files (usually in the pg_wal directory). This process additionally triggers the startup process that new WAL records are available for replay.
- Startup Process
- This background process is responsible for reading the WAL records from disk and replays them to update the data files with the changes received. This is the same process resposible for replaying the WAL records from the Archived WAL segments, when configured.
How does Streaming Replication work internally ?
To understand the internals, let us take a High Availability PostgreSQL cluster that involves a Primary (Master) and a Standby (Replica or Reader) as shown in the image above. Consider a DML operation involving an INSERT or UPDATE or a DELETE of some records from a table in the primary postgres instance. Following is the cycle involved ensuring that the primary and standby servers are in replication.
Please Note: In Streaming Replication, the writes involving DDL or DML can only be executed on the primary server, as the standby is always read-only.
- For the changes executed on a database, corresponding WAL (Write-Ahead Log) records are generated. These WAL records includes all the information necessary to reproduce the operation.
- WAL records are temporarily stored in WAL buffers located in the main memory of the primary server. Each WAL record begins with a header that includes its LSN (Log Sequence Number) which is unique from other WAL records. LSN is helpful in identifying the position of a particular WAL record across WAL segments. The LSN increments with each WAL record as it is mandatory to follow the order in which the changes have to be recovered on a Standby instance.
- The WAL writer background process writes the buffered WAL records to the WAL Segements on the disk to ensure durability. So, the changes performed through the DML operation are also written to WAL segments on the disk before they are acknowledged as committed.
- When Streaming Replication is configured, the WAL Receiver (wal receiver process) sends a connection request to the primary server.
- Once the Primary server accepts the connection request, the WAL Sender process (wal sender) is started on the primary server for each connection received using the streaming replication protocol. In the case of Streaming Replication, we can only have one WAL sender and only one WAL receiver process per a PostgreSQL Instance.
- The WAL Receiver process on the Standby server now initiates a handshake along with its last known LSN (Log Sequence Number).
- The WAL sender streams the data starting from the last know LSN by Standby, until the latest LSN of the primary server.
- The WAL receiver sends a request to the “startup” process on the standby server to replay the WAL records until the WAL receiver’s last known lsn. The startup process applies these changes to the data files and brings the database to a consistent state.
Once we setup the streaming replication, we may see 5 different states in a Primary server.
hexacluster=# SELECT application_name,state FROM pg_stat_replication;
application_name | state
------------------+-----------
standby1 | startup
standby2 | catchup
standby3 | streaming
pg_basebackup | backup
standby4 | stopping
Following is the explanation of each of the above states.
- startup: The WAL sender is still starting up.
- catchup: The standby instance is catching up with the primary instance.
- streaming: The changes are being successfully streamed from primary to standby server and the standby server is caught up with the primary.
- backup: The WAL sender is sending a backup.
- stopping: This WAL sender is stopping.
Conclusion
Streaming Replication is a much needed replication method for achieving high availability in PostgreSQL. It may be possible that you have setup logical replication to achieve an Active-Active replication method similar to Oracle with Golden Gate. While logical replication works almost good, it cannot always be considered as a robust High availability setup. In our next article, we will discuss the steps involved in setting up Streaming Replication in PostgreSQL and also the types of Streaming replication techniques.
Are you looking to migrate from Oracle to PostgreSQL ? We are here to support with a simple and seamless migration experience within 5 clicks using HexaRocket. Request us for a demo on HexaRocket today: Schedule a demo.
Contact Us Today!