PostgreSQL Tutorial for Absolute Beginners [Administration]

The easiest way to perform a base backup is to use the pg_basebackup tool. It can create a base backup either as regular files or as a tar archive. If more flexibility than pg_basebackup can provide is required, you can also make a base backup using the low-level API discussed earlier.

Command

pg_basebackup  -D <location_of_backup_file> -F t -z -U postgres -w
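
Here -D sets the directory the backup is written to, -F t produces a tar-format backup, -z compresses it with gzip, -U is the connection user, and -w means never prompt for a password. A concrete invocation (the target directory is illustrative) might be:

pg_basebackup -D /u01/pgsql/backups -F t -z -U postgres -w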

 

To make use of the backup, you will need to keep all the WAL segment files generated during and after the file system backup. To aid you in doing this, the base backup process creates a backup history file that is immediately stored into the WAL archive area. This file is named after the first WAL segment file that you need for the file system backup.

For example, if the starting WAL location here is 1/C1000028, the backup history file is created with the name 0000000200000001000000C1.00000028.backup.

Once you have safely archived the file system backup and the WAL segment files used during the backup (as specified in the backup history file), all archived WAL segments with numerically lower names are no longer needed to recover the file system backup and can be deleted.
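
Rather than deleting old segments by hand, the pg_archivecleanup tool can use the backup history file as the cut-off point. A minimal sketch, assuming the archive directory is /u01/archives (illustrative) and using the history file from the example above:

pg_archivecleanup /u01/archives 0000000200000001000000C1.00000028.backup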

Case Study

I have a consistent backup whose backup history file is 0000000200000001000000C1.00000028.backup.

  • I then stopped my cluster.
  • Extracted the base backup into /u01/pgsql/backups.
  • Deleted the archived WAL files from the archive location, tried starting the cluster with /u01/pgsql/backups as the data directory, and got the error below.

 

2019-10-26 18:50:48.952 IST [3323] LOG:  database system was interrupted; last known up at 2019-10-26 18:38:10 IST
2019-10-26 18:50:48.972 IST [3323] LOG:  invalid checkpoint record
2019-10-26 18:50:48.972 IST [3323] FATAL:  could not locate required checkpoint record
2019-10-26 18:50:48.972 IST [3323] HINT:  If you are not restoring from a backup, try removing the file "/u01/pgsql/backups/backup_label".

The startup fails because backup_label points to a checkpoint record stored in the WAL generated during the backup; with the archived WAL deleted, that checkpoint record can no longer be found, so the backup cannot be recovered.

Point in Time Recovery

  1. Stop the server, if it’s running.
  2. If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you should at least save the contents of the cluster’s pg_wal subdirectory, as it might contain logs which were not archived before the system went down.
  3. Remove all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
  4. Restore the database files from your file system backup. Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.
  5. Remove any files present in pg_wal/; these came from the file system backup and are therefore probably obsolete rather than current. If you didn’t archive pg_wal/ at all, then recreate it with proper permissions, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.
  6. If you have unarchived WAL segment files that you saved in step 2, copy them into pg_wal/. (It is best to copy them, not move them, so you still have the unmodified files if a problem occurs and you have to start over.)
  7. Create a recovery command file recovery.conf in the cluster data directory.
  8. Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.
  9. Inspect the contents of the database to ensure you have recovered to the desired state. If not, return to step 1. If all is well, allow your users to connect by restoring pg_hba.conf to normal. (A consolidated shell sketch of these steps follows below.)
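
A consolidated sketch of the steps above, assuming the data directory is /u01/pgsql/data, the compressed base backup is /u01/pgsql/backups/base.tar.gz, and archived WAL lives in /u01/archives (all paths are illustrative; run the commands as the database system user, and adjust for tablespaces if you use them):

pg_ctl -D /u01/pgsql/data stop                                    # step 1: stop the server
cp -a /u01/pgsql/data /u01/pgsql/data_old                         # step 2: keep a copy, including pg_wal
rm -rf /u01/pgsql/data/*                                          # step 3: clear the data directory
tar -xzf /u01/pgsql/backups/base.tar.gz -C /u01/pgsql/data        # step 4: restore the base backup
rm -f /u01/pgsql/data/pg_wal/*                                    # step 5: drop WAL that came with the backup
cp /u01/pgsql/data_old/pg_wal/0* /u01/pgsql/data/pg_wal/          # step 6: copy back unarchived WAL segments
echo "restore_command = 'cp /u01/archives/%f %p'" > /u01/pgsql/data/recovery.conf   # step 7
pg_ctl -D /u01/pgsql/data start                                   # step 8: server enters recovery mode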

Example:

Suppose that you made a mistake at 12:05 GMT of 26 October, 2019.

You should remove the current database cluster and restore a new one from the base backup you took before that point.

Then create a recovery.conf file and set the parameter recovery_target_time in this file to the point at which you made the mistake.

When PostgreSQL starts up, it enters PITR mode if both a recovery.conf file and a backup_label file are present in the database cluster.
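
A minimal recovery.conf for this scenario might look like the following (the archive path is illustrative, and the timestamp is the time of the mistake described above):

restore_command = 'cp /u01/archives/%f %p'
recovery_target_time = '2019-10-26 12:05:00'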

Where are WAL segments/Archive logs read from?

  • Normal recovery mode – from the pg_xlog subdirectory (in version 10 or later, the pg_wal subdirectory) under the database cluster directory.
  • PITR mode – from the archival directory (the one populated by archive_command), fetched with the restore_command set in recovery.conf.

Where is the checkpoint location read from?

  • Normal recovery mode – from a pg_control file.
  • PITR mode – from a backup_label file.

The outline of the PITR process is as follows:

(1) In order to find the REDO point, PostgreSQL reads the value of ‘CHECKPOINT LOCATION’ from the backup_label file with the internal function read_backup_label.
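
For reference, the contents of a backup_label file look roughly like this (the values below are illustrative, loosely matching the earlier example):

START WAL LOCATION: 1/C1000028 (file 0000000200000001000000C1)
CHECKPOINT LOCATION: 1/C1000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-10-26 18:38:10 IST
LABEL: pg_basebackup base backup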

(2) PostgreSQL reads some values of parameters from the recovery.conf; in this example, restore_command and recovery_target_time.

(3) PostgreSQL starts replaying WAL data from the REDO point, which can be easily obtained from the value of ‘CHECKPOINT LOCATION’. The WAL data are read from archive logs, which are copied from the archival area to a temporary area by executing the command written in the parameter restore_command. (The copied log files in the temporary area are removed after use.)

In this example, PostgreSQL reads and replays WAL data from the REDO point up to the point just before the timestamp ‘2019-10-26 12:05:00’, because the parameter recovery_target_time is set to this timestamp. If no recovery target is set in recovery.conf, PostgreSQL replays WAL until the end of the archived logs.

(4) When the recovery process completes, a timeline history file, such as ‘00000002.history’, is created in the pg_xlog subdirectory (in version 10 or later, the pg_wal subdirectory); if WAL archiving is enabled, a file with the same name is also created in the archival directory. The contents and role of this file are described in the following sections.

TimelineId and Timeline History File

An individual timelineId is assigned to each database cluster. The timelineId of the original database cluster created by the initdb utility is 1. Whenever the database cluster is recovered, the timelineId is incremented by 1. In the example of the previous section, the timelineId of the cluster recovered from the original one is 2.

Timeline History File

When a PITR process completes, a timeline history file with names like ‘00000002.history’ is created under the archival directory and the pg_xlog subdirectory (in version 10 or later, pg_wal subdirectory). This file records which timeline it branched off from and when.

A timeline history file contains at least one line, and each line is composed of the following three items:

  • timelineId – timelineId of the archive logs used to recover.
  • LSN – the LSN at which the switch to the next timeline happened.
  • reason – human-readable explanation of why the timeline was changed.

Example:

[postgres@postgres02 pg_wal]$ cat 00000006.history
1       1/B8000000      no recovery target specified
2       1/D6000000      no recovery target specified
5       1/E5000000      no recovery target specified

This history file belongs to the cluster whose current timelineId is 6: it is ultimately based on a base backup taken on timelineId 1, and the most recent recovery branched off from timelineId 5 after replaying archive logs up to LSN 1/E5000000.

Point-in-Time Recovery with Timeline History File

Create a recovery.conf file as shown below:

restore_command = 'cp /u01/archives/%f %p'

recovery_target_timeline = 5

and recovery proceeds as follows:

(1) PostgreSQL reads the value of ‘CHECKPOINT LOCATION’ from the backup_label file.

(2) Some values of parameters are read from the recovery.conf; in this example, restore_command, and recovery_target_timeline.

(3) PostgreSQL reads the timeline history file ‘00000005.history’, corresponding to the value of the parameter recovery_target_timeline.

(4) PostgreSQL replays WAL data in the following steps:

  • From the REDO point up to the LSN ‘0/E5000000’ written in the 00000005.history file, PostgreSQL reads and replays the WAL data of the appropriate archive logs belonging to the previous timelineId.
  • From the LSN after ‘0/E5000000’ onward, PostgreSQL reads and replays the WAL data (of the appropriate archive logs) whose timelineId is 5.

(5) When the recovery process completes, the current timelineId advances to a new value greater than any timelineId already in use (7 in this example, since timeline 6 already exists), and a new timeline history file with the corresponding name (e.g. 00000007.history) is created in the pg_xlog subdirectory (the pg_wal subdirectory in version 10 or later) and in the archival directory.

Setting up a new redo log location

Step 1: stop the database.

Step 2: create a pg_xlog directory (pg_wal in version 10 or later) at the new location and copy the existing pg_xlog files from the data directory into it.

Step 3: delete the pg_xlog directory from the data directory and create a symbolic link to the new location in its place.

ln -s /u02/New_PostgreData/pg_xlog/  /u02/New_PostgreData_1/
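
A consolidated shell sketch of these steps, assuming the data directory is /u02/New_PostgreData_1 and the new WAL location is /u02/New_PostgreData/pg_xlog (both paths are illustrative; on version 10 or later the directory is named pg_wal):

pg_ctl -D /u02/New_PostgreData_1 stop                                 # step 1: stop the database
mkdir -p /u02/New_PostgreData/pg_xlog                                 # step 2: create the new directory
cp -a /u02/New_PostgreData_1/pg_xlog/. /u02/New_PostgreData/pg_xlog/  # copy the existing WAL files
rm -rf /u02/New_PostgreData_1/pg_xlog                                 # step 3: remove the old directory
ln -s /u02/New_PostgreData/pg_xlog /u02/New_PostgreData_1/pg_xlog     # and replace it with a symbolic link
pg_ctl -D /u02/New_PostgreData_1 start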

Recovery Target Settings

By default, recovery will recover to the end of the WAL log. The following parameters can be used to specify an earlier stopping point. At most one of recovery_target, recovery_target_lsn, recovery_target_name, recovery_target_time, or recovery_target_xid can be used; if more than one of these is specified in the configuration file, the last entry will be used.

recovery_target = 'immediate'                    # stop as soon as a consistent state has been reached
recovery_target_lsn = '0/3019838'                # stop at a specific LSN (check the current one with select pg_current_wal_lsn();)
recovery_target_time = '2019-10-26 19:34:55'     # stop at a specific timestamp
recovery_target_xid = '634'                      # stop at a specific transaction ID (check the current one with select txid_current();)

FEW MORE CASE STUDIES

Deleted pg_control

To recover from the loss of the pg_control file, restore pg_control from a backup and then re-create the WAL by running the pg_resetxlog command (renamed pg_resetwal in version 10 and later) with the -x option. If the instance was terminated abnormally, also specify the -f option. The instance cannot start if only the pg_control file is restored.
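
Purely as an illustration (the data directory path and transaction ID are made up), the command might look like:

pg_resetxlog -x 634 /u01/pgsql/data

adding -f as well if the instance had been terminated abnormally.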

Deleted WAL file

Regardless of whether the instance was shut down cleanly or terminated abnormally, the instance cannot start if all of the WAL files have been deleted. Execute the pg_resetxlog command to re-create the WAL. Immediately after an abnormal instance termination, specify the -f option to pg_resetxlog to create the WAL files forcibly.
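
A hedged example for this case (the data directory path is illustrative):

pg_resetxlog -f /u01/pgsql/data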

Behavior on data file deletion

The instance starts successfully. An error (ERROR category) occurs when the deleted table is accessed with a SELECT statement; the instance and other sessions are not affected.

No log is output at instance startup; the error is reported only when the SELECT statement is executed.

Behavior of Visibility Map (VM) / Free Space Map (FSM) file deletion

Deleting VM or FSM files does not cause an error, and SQL statements against the target table succeed. These files are re-created by the next VACUUM.

Behavior at the time of pg_filenode.map file deletion

When the pg_filenode.map file is deleted, the mapping between system catalogs and their underlying file names is lost, and the database can no longer be used.

Behavior at the time of PG_VERSION file deletion

When the PG_VERSION file is deleted, the directory is no longer recognized as a PostgreSQL database directory.
