Internals of PostgreSQL: Chapter#1

fatima raza
Jan 27, 2023
2 min read

Updated: Feb 12, 2023

The first chapter covers the logical and physical structure of a PostgreSQL database cluster, how data is read and written, and the layout of the heap table file.

Logical structure of database cluster: A database cluster is a collection of databases managed by a single PostgreSQL server.

Database: a collection of database objects.
Database objects: data structures used either to store or to reference data, e.g. (heap) tables, indexes, sequences, views, functions and so on. In fact, in PostgreSQL, databases themselves are database objects, and they are logically separated from each other.
All database objects are managed by OIDs (object identifiers), which are unsigned 4-byte integers.
The relations between database objects and their OIDs are stored in the system catalogs.

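Physical structure of database cluster: A database cluster is a single directory, called the base directory. Each database is a subdirectory inside it, and each table or index is stored as at least one file under the subdirectory of the database it belongs to. The base directory also holds other subdirectories and files that contain particular data and configuration files.

Layout of Database Cluster: The chapter lists the main files and subdirectories of the cluster together with a description of each.

Layout of Databases: Each database is a subdirectory under the base subdirectory of the cluster, and the name of each database directory is identical to the OID of that database.

Layout of files associated with tables and indexes: Tables and indexes, as database objects, are internally managed by individual OIDs, while their data files are managed by a separate variable, relfilenode.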

The relfilenode values of tables and indexes are changed by the TRUNCATE, REINDEX, and CLUSTER commands.
When a table is truncated, PostgreSQL assigns a new relfilenode to the table, removes the old data file, and creates a new one. When the file size of a table or index exceeds 1 GB, PostgreSQL creates a new file named relfilenode.1 and uses it. Once that file has been filled up, the next file, named relfilenode.2, is created, and so on.
Other important files are the free space map and the visibility map, whose file names carry the suffixes '_fsm' and '_vm' respectively.
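
The naming rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the example OID and relfilenode values are made up, and the 1 GB segment size is the default described above.

```python
from typing import List

SEGMENT_SIZE = 1 << 30  # 1 GB, the default segment size described above


def data_file_names(db_oid: int, relfilenode: int, size_bytes: int) -> List[str]:
    """Paths (relative to the cluster directory) of a relation's data files.

    Hypothetical helper: it only applies the naming convention and does not
    look at a real cluster.
    """
    # Ceiling division; an empty relation still has one (empty) file.
    n_segments = max(1, -(-size_bytes // SEGMENT_SIZE))
    names = []
    for segment in range(n_segments):
        # The first file has no suffix, later ones get .1, .2, ...
        suffix = "" if segment == 0 else f".{segment}"
        names.append(f"base/{db_oid}/{relfilenode}{suffix}")
    return names


def map_file_names(db_oid: int, relfilenode: int) -> List[str]:
    """Free space map and visibility map files of the same relation."""
    return [
        f"base/{db_oid}/{relfilenode}_fsm",
        f"base/{db_oid}/{relfilenode}_vm",
    ]


# Example with made-up values: database OID 16384, relfilenode 18740, 2.5 GB.
print(data_file_names(16384, 18740, int(2.5 * SEGMENT_SIZE)))
print(map_file_names(16384, 18740))
```

Because TRUNCATE, REINDEX, and CLUSTER assign a new relfilenode, the file names of a table can change over time even though its OID stays the same.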

Tablespaces: PostgreSQL also supports tablespaces. A tablespace is an additional data area outside the base directory; it is created under the directory specified when you issue the CREATE TABLESPACE statement.

Internal Layout of a Heap Table File:

Each data file (heap table, index, free space map, and visibility map) is divided into pages (or blocks) of fixed length; the default is 8192 bytes (8 KB).
The pages within each file are numbered sequentially from 0, and these numbers are called block numbers.
When the file has been filled up, PostgreSQL adds a new empty page to the end of the file to increase the file size.
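
As a small worked example of the numbering above, the following sketch maps a block number to the segment file and byte offset where that page would start. It assumes the default 8 KB page size and 1 GB segment size; `locate_block` is a hypothetical helper, not a PostgreSQL function.

```python
BLOCK_SIZE = 8192                                # default page size in bytes
SEGMENT_SIZE = 1 << 30                           # default segment size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 pages per file


def locate_block(block_number: int):
    """Return (segment number, byte offset inside that segment file)."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLOCK_SIZE


print(locate_block(0))       # first page of the first file
print(locate_block(131072))  # first page of the file named relfilenode.1
```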

A page within a table contains three kinds of data:

heap tuple(s): A heap tuple is the record data itself. Heap tuples are stacked in order from the bottom of the page.
line/item pointer(s): A line pointer is 4 bytes long and holds a pointer to a heap tuple.
header data: The header is 24 bytes long and contains general information about the page.
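
To make the three parts of a page concrete, here is a rough Python sketch that builds a synthetic page in memory and then decodes its header and line pointers. Several things are assumptions rather than statements from the chapter summary: the header field order follows PostgreSQL's PageHeaderData struct as I understand it, the byte order is little-endian, the line pointer bit layout (15-bit offset, 2-bit flags, 15-bit length) is what a typical compiler produces on such a machine, and real tuples carry their own headers and alignment padding, which are ignored here.

```python
import struct

PAGE_SIZE = 8192
# lsn (2 x uint32), checksum, flags, lower, upper, special,
# pagesize_version (6 x uint16), prune_xid (uint32): 24 bytes in total.
HEADER_FORMAT = "<IIHHHHHHI"  # assumption: little-endian layout
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
LINE_POINTER_SIZE = 4


def build_demo_page(tuples):
    """Build a toy page: line pointers grow down from the header,
    tuple data is stacked up from the bottom of the page."""
    page = bytearray(PAGE_SIZE)
    lower = HEADER_SIZE   # end of the line pointer array
    upper = PAGE_SIZE     # start of the newest tuple
    for data in tuples:
        upper -= len(data)
        page[upper:upper + len(data)] = data
        # offset in bits 0-14, flags (1 = in use) in bits 15-16, length above
        raw = upper | (1 << 15) | (len(data) << 17)
        struct.pack_into("<I", page, lower, raw)
        lower += LINE_POINTER_SIZE
    struct.pack_into(HEADER_FORMAT, page, 0,
                     0, 0, 0, 0, lower, upper, PAGE_SIZE, PAGE_SIZE | 4, 0)
    return bytes(page)


def parse_page(page):
    """Decode the header and the line pointers of a page."""
    fields = struct.unpack_from(HEADER_FORMAT, page, 0)
    lower, upper = fields[4], fields[5]
    count = (lower - HEADER_SIZE) // LINE_POINTER_SIZE
    pointers = []
    for i in range(count):
        (raw,) = struct.unpack_from("<I", page, HEADER_SIZE + i * LINE_POINTER_SIZE)
        pointers.append({
            "offset": raw & 0x7FFF,
            "flags": (raw >> 15) & 0x3,
            "length": raw >> 17,
        })
    return {"lower": lower, "upper": upper,
            "free_space": upper - lower, "line_pointers": pointers}


page = build_demo_page([b"first", b"second!"])
info = parse_page(page)
for lp in info["line_pointers"]:
    print(lp, page[lp["offset"]:lp["offset"] + lp["length"]])
print("free space:", info["free_space"])
```

The gap between the end of the line pointer array and the start of the newest tuple is the free space of the page. On a real server the pageinspect extension is the safer way to look at these values.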

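The Methods of Writing and Reading Tuples:

Writing: The chapter's diagrams of a page before and after an insertion describe this clearly: the new tuple is stacked next to the existing tuples, and a new line pointer pointing to it is appended to the existing line pointers.

Reading: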

Sequential scan – All tuples in all pages are sequentially read by scanning all line pointers in each page.
B-tree index scan – An index file contains index tuples, each of which is composed of an index key and a TID pointing to the target heap tuple. If the index tuple with the key you are looking for is found, PostgreSQL reads the desired heap tuple using the obtained TID value.
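
The difference between the two read methods can be sketched with a toy model. Everything here is a simplification made for illustration: the heap is a list of pages, each page is a list of (key, payload) tuples, a TID is a (block number, offset) pair with offsets starting at 1, and a sorted list searched with bisect stands in for the B-tree.

```python
import bisect

# Toy heap: two pages, each holding two (key, payload) tuples.
heap = [
    [(3, "c"), (1, "a")],
    [(2, "b"), (5, "e")],
]


def sequential_scan(heap, key):
    """Visit every line pointer of every page and keep matching tuples."""
    matches = []
    for page in heap:
        for tup in page:
            if tup[0] == key:
                matches.append(tup)
    return matches


def build_index(heap):
    """Sorted (key, TID) entries; a stand-in for a B-tree, not a real one."""
    entries = []
    for block, page in enumerate(heap):
        for offset, tup in enumerate(page, start=1):
            entries.append((tup[0], (block, offset)))
    entries.sort()
    return entries


def index_scan(heap, index, key):
    """Find the key in the index, then fetch heap tuples by their TIDs."""
    position = bisect.bisect_left(index, (key,))
    results = []
    while position < len(index) and index[position][0] == key:
        block, offset = index[position][1]
        results.append(heap[block][offset - 1])  # offsets are 1-based
        position += 1
    return results


index = build_index(heap)
print(sequential_scan(heap, 2))
print(index_scan(heap, index, 2))
```

Both calls return the same tuple, but the sequential scan touches every tuple in the heap, while the index scan goes to a single page and line pointer once the TID is known.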

More details: https://age.apache.org/


Urooj Fatima Raza
