Write Ahead Logging — WAL
In the field of computer science, WAL is an acronym of Write Ahead Logging, which is a protocol or a rule to write both changes and actions into a transaction log, whereas in PostgreSQL, WAL is an acronym of Write Ahead Log. There the term is used as synonym of transaction log, and also used to refer to an implemented mechanism related to writing action to a transaction log (WAL).
PostgreSQL : Write A Head Logs
Reliability is one of the important aspects of database management system. PostgreSQL allows reliability using WAL Files.
WAL is a Write – ahead log file which stores the transactions log information in the WAL files which helps us with durability.
Whenever a change is occurred in the database, the changed transaction is written in the wal buffers and upon subsequent commit, an operating systems wal_sync_method call is made and the contents in the wal buffer is written to the wal file.
What is WALWritelock
- This lock is used to write transaction log buffer data to WAL file.
After taking this lock all the transaction log buffer data upto predecided
point will get flushed.
- Places where it get used are
– Flush of transaction log which can be due to Commit, Flush of data buffers, truncate of commit log etc.
– During Switch of Xlog.
– During get of new Xlog buffer, if all buffers are already occupied and not flushed.
- Get the time of the last xlog segment switch
The WAL File is stored under PG_DATA/pg_wal
The first file that is created is 000000010000000000000001.
here,
00000001 is physical
00000000 is logical
00000001 is timelineid
If the first one has been filled up with the writing of XLOG records, the second one 000000010000000000000002 would be provided.
Files of successor is used in ascending order in succession, after 0000000100000000000000FF has been filled up, next one 000000010000000100000000 will be provided. In this way, whenever the last 2-digit carries over, the middle 8-digit number increases one.
Example.,
A logical xlog file has 255 segments, physical is 1 segment, i.e., after 000000FF log switches, the logical value would be incremented by 1.
WAL Configuration:
In general, the transactions are first written to wal buffers and then to the WAL file, these wal files are needed for recovery purpose on a system crash.
The data is the shared buffers is then written to permanent storage at a later point of time especially during CHECKPOINT.
At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file.
In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Any changes made to data files before that point are guaranteed to be already on disk. Hence, after a checkpoint, log segments preceding the one containing the redo record are no longer needed and can be recycled or removed.
Excerpt from the WAL File
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 86/E5000028, prev 86/E4000108, desc: RUNNING_XACTS nextXid 2091743 latestCompletedXid 2091742 oldestRunningXid 2091743
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 86/E5000060, prev 86/E5000028, desc: RUNNING_XACTS nextXid 2091743 latestCompletedXid 2091742 oldestRunningXid 2091743
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 86/E5000098, prev 86/E5000060, desc: CHECKPOINT_ONLINE redo 86/E5000060; tli 1; prev tli 1; fpw false; xid 0:2091743; oid 190334; multi 1; offset 0; oldest xid 1664 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 2091743; online
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 86/E5000108, prev 86/E5000098, desc: RUNNING_XACTS nextXid 2091743 latestCompletedXid 2091742 oldestRunningXid 2091743
rmgr: Heap len (rec/tot): 71/ 71, tx: 2091743, lsn: 86/E5000140, prev 86/E5000108, desc: HOT_UPDATE off 48 xmax 2091743 ; new off 49 xmax 0, blkref #0: rel 1663/182014/190331 blk 1
The content highlighted in yellow and the previous log files are no longer required for recovery.
When full_page_writes is set and the system is very busy, setting wal_buffers higher will help smooth response times during the period immediately following each checkpoint.
Internals of checkpoint
- After a checkpoint process starts, the REDO point is stored in memory; REDO point is the location to write the XLOG record at the moment when the latest checkpoint is started, and is the starting point of database recovery.
- A XLOG record of this checkpoint (i.e. checkpoint record) is written to the WAL buffer. The data-portion of the record is defined by the structure CheckPoint, which contains several variables such as the REDO point stored with step (1).
- In addition, the location to write checkpoint record is literally called the checkpoint.
- All data in shared memory (e.g. the contents of the clog, etc..) are flushed to the storag
- All dirty pages on the shared buffer pool are written and flushed into the storage, gradually.
- The pg_control file is updated. This file contains the fundamental information such as the location where the checkpoint record has written (a.k.a. checkpoint location). The details of this file later.