What is a PostgreSQL Generated Column?
- A generated column is a special column that is always computed from other columns.
- Support for generated columns was added in PostgreSQL 12.
Virtual And Stored
- PostgreSQL has only implemented support for stored generated columns, and thus the STORED keyword must be provided.
- The other type of generated column that is not supported is virtual.
- A stored generated column is one which is calculated whenever the relevant related data is updated/inserted, and the result is stored and thus takes up storage. On the other hand, a virtual one would simply be calculated at the point of running queries, and would not be stored anywhere.
- These stored generated columns can be indexed, which can dramatically improve query performance.
- However, they can not be used as a partition key for partitioned tables.
- Due to the fact that the value is calculated just once, and stored, this is very efficient for read-heavy workloads, unlike virtual generated columns.
Example
The following example creates a table for rectangles, whose area can be calculated as the width times the height.
CREATE TABLE rectangles ( "id" UUID PRIMARY KEY, "width" DECIMAL(3,2), "height" DECIMAL (3,2), "area" DECIMAL(6,4) GENERATED ALWAYS AS ("width" * "height") STORED );
We use a stored generated column to store the area of the rectangle so that we don't have to perform this mathematical operation in our queries or application layer. By using a generated column rather than a non-generated one, we do not risk having data out-of-sync. E.g. the application layer updates the width or height, but forgets to update the area.
Optionally Add Index
As stated before, one can index these generated columns, which may be useful in situations where you have millions of rows, and want to quickly sort by the area, or just pick the largest/smallest.
CREATE INDEX on rectangles ("area");
Insert Examples
INSERT INTO rectangles ("id", "width", "height") VALUES ('96c90ad7-7fd6-4e7d-87e7-cf9346cd086a', 3.1, 3.2), ('96c90af1-9276-456a-948c-cecdf81fdb9d', 4.5, 2.3), ('96c90af1-9357-4189-820a-ea11adbe4f20', 6.4, 1.2) ;
Now we can select everything from the rectangles table, which will now show the area.
SELECT * FROM rectangles;
id | width | height | area --------------------------------------+-------+--------+--------- 96c90ad7-7fd6-4e7d-87e7-cf9346cd086a | 3.10 | 3.20 | 9.9200 96c90af1-9276-456a-948c-cecdf81fdb9d | 4.50 | 2.30 | 10.3500 96c90af1-9357-4189-820a-ea11adbe4f20 | 6.40 | 1.20 | 7.6800 (3 rows)
Now we can find the largest rectangle by simply executing:
SELECT * FROM rectangles ORDER BY area DESC limit 1;
id | width | height | area --------------------------------------+-------+--------+--------- 96c90af1-9276-456a-948c-cecdf81fdb9d | 4.50 | 2.30 | 10.3500 (1 row)
If we didn't have the generated column, we would have to do something like below, which looks very messy/confusing, especially with all the * symbols.
SELECT *, width * height as area FROM rectangles ORDER BY area DESC LIMIT 1;
This query won't work whilst the table has the generated column called area.