High Performance PostgreSQL for Rails: page 26/27

@andatki

Hi, page 26 says

Data cannot be split across pages.

But two paragraphs later

Data may not fit in a single page, may be split across pages

We are doing a book club on this and came across the same issue

In PostgreSQL, data is laid out in pages. Data cannot be split across pages. Data is stored efficiently when there are minimal gaps and data that is accessed resides within a single page. Data that’s in a single page requires fewer disk operations compared with accessing data that is split across multiple pages.

Do you have additional comments I can share with the club?

1 Like

Hello @pzac! Thanks for reading the book and providing this feedback. These contradictory statements you pointed out are confusing. I’ll work on rewording that section of the Introduction. I’d also like to address the physical storage topic here and add some links.

While PostgreSQL physical storage is beyond the scope of the book, and there are dedicated resources for it[1], this section does try and provide an overview. It is important to me it’s accurate and useful, otherwise it should be cut. Concepts like pages, tuples/row versions, and TOAST, help users build a mental model of what’s happening, and get familiar with terms they’ll encounter in documentation, even if they mostly work with the “abstractions” of tables, columns, and indexes.

I’ll try and briefly communicate those two points more helpfully. There are at least two scenarios where user data doesn’t fit in a page.

One scenario is when a row update would go into a page that’s filled up, or the content would exceed the available space.

If the file is full, PostgreSQL adds a new empty page to the end of the file to increase the file size. [1]

The second scenario is for user data that’s variable in length, or is large sized like JSON text data. PostgreSQL allows up to 1GB of data to be stored in a column,[2] which is a lot!

Since PostgreSQL by default uses 8kb pages,[5] 1GB of data could not fit in that size. How does that work?

PostgreSQL uses a system called “TOAST” (Total Oversized Attribute Storage), which handles storing large sized data beyond the 8kb size limitation. TOAST uses a special toast table behind the scenes, and the data is chunked up and spread out among multiple pages.[3] This is handled transparently for users and adds minimal overhead.

If you’re like me, besides reading docs, you might also want to learn technical topics in other formats. The Postgres.fm podcast covered TOAST[4] in a past episode, and it’s worth a listen.

I’ll revise that section of the book and post revisions back here.

Thank you for this feedback. Please continue to bring up anything that’s contradictory or otherwise confusing, since by putting a spotlight on it, it helps me consider the educational quality of the wording and hopefully improve it.

[1]: The Internals of PostgreSQL : Chapter 1 Database Cluster, Databases, and Tables 1.3. Internal Layout of a Heap Table File
[2]: PostgreSQL: Documentation: 16: Appendix K. PostgreSQL Limits
[3]: PostgreSQL: Documentation: 16: 73.2. TOAST
[4]: Postgres FM | TOAST
[5]: PostgreSQL: Documentation: 16: 73.6. Database Page Layout

Hi @andatki ,
Thanks for the explanation, it is much appreciated. So far I’m quite pleased with your book and I’ll be happy to provide constructive feedback if I’ll spot anything else.
Cheers

1 Like

Hello @ksolo.

That’s so cool to hear that you’re doing a book club. How many members do you have in it? Is there a particular area like query performance or growth management, or something else you’re especially interested in?

I’ve recently answered another similar post about this same section. Can you take a look at the reply linked below. I think it will help you out as well.

I’ll also be revising that section in the book, so you should see those revisions reflected in the next Beta update.

Please keep the feedback coming!

Thank you @andatki

We have 8 members right now. Our goal is to cover 1 chapter a week and we meet every other week. Our next session will cover chapters 3 & 4.

With regards to specific topics, we have some intense queries in our app and would love to find ways to optimize them. Many are written in custom SQL and we are also trying to push them more into ActiveRecord queries. Any tips on leveraging less known ActiveRecord features that can help with performance would be extremely helpful.

1 Like

Wow, that’s great @ksolo! Chapter 6 “Optimizing Active Record” and Chapter 7 “Improving Query Performance” (focused on SQL) will more directly address what you’re interested in. Chapter 8 “Optimized Indexes for Fast Retrieval” helps the team learn to read query execution plans, and various index types and query patterns they’re useful for, along with common issues like missing indexes, or how to check IO/buffers. I can also recommend more books focused specifically on query optimization. I’m interested in those too!

I’m excited to follow along with the book club via the forum. Please continue to post questions or suggestions. Thanks again for sharing this information. It’s really motivational to me to do the best job I can!