Postgresql очистка буферного кэша

иногда я запускаю запрос Postgres, это занимает 30 секунд. Затем я сразу же запускаю тот же запрос, и это занимает 2 секунды. Похоже, что Postgres имеет какое-то кэширование. Могу я как-нибудь увидеть, что в этом тайнике? Могу ли я принудительно очистить все кэши для целей настройки?

Примечание: я в основном ищу версию postgres следующей команды SQL Server:

но я также хотел бы знать, как можно увидеть, что на самом деле содержится в этом буфер.

Спасибо за любую помощь.

вы можете увидеть, что находится в буферном кэше PostgreSQL, используя модуль pg_buffercache. Я сделал презентацию под названием "внутри буферного кэша PostgreSQL

Я не видел никаких команд для очистки кэшей в PostgreSQL. То, что вы видите, скорее всего, просто обычный индекс и кэш данных, считываемые с диска и хранящиеся в памяти. как postgresql, так и кэши в ОС. Чтобы избавиться от всего этого, единственный способ я знаю:

Что вы должны сделать, это:

завершение работы сервера баз данных (pg_ctl, sudo service postgresql stop, так далее.)
echo 3 > /proc/sys/vm / drop_caches Это очистит файл/блок ОС Кэш - очень важно, хотя я не знаю, как это сделать на других ОС.
запустите сервер баз данных

Я использую эту команду в своем окне linux:

Он полностью избавляется от кэша.

ответ Грега Смита о drop_caches был очень полезен. Я счел необходимым остановить и запустить службу postgresql, в дополнение к удалению кэшей. Вот скрипт, который делает трюк. (Моя среда-Ubuntu 14.04 и PostgreSQL 9.3.)

Я тестировал с запросом, который занял 19 секунд в первый раз и менее 2 секунд при последующих попытках. После запуска этого скрипта запрос снова занял 19 секунд.

да, postgresql, безусловно, имеет кэширование. Размер контролируется параметром shared_buffers. Кроме того, как упоминалось в предыдущем ответе, также используется кэш файлов ОС.

Если вы хотите посмотреть, что находится в кэше, есть модуль contrib под названием pg_buffercache доступно (в contrib/ в дереве источника, в contrib RPM или где угодно, где это подходит для того, как вы его установили). Как его использовать, указано в стандарте Документации PostgreSQL.

нет никаких способов очистить кэш буфера, кроме как перезапустить сервер. Вы можете удалить кэш ОС с помощью команды, упомянутой в другом ответе, при условии, что ваша ОС - Linux.

У меня была эта ошибка.

psql:/cygdrive/e / test_insertion.sql: 9: ошибка: тип параметра 53 (t_stat_gardien) не соответствует этому при подготовке плана (t_stat_avant)

Я искал промывку текущего плана и нашел это:

У меня было это между моими вставками, и это решает мою проблему.

здесь pg_buffercache модуль для просмотра shared_buffers кэш. И в какой-то момент мне нужно было удалить кэш, чтобы сделать некоторые тесты производительности на "холодный" кэш, поэтому я написал pg_dropcache расширение, которое делает именно это. Пожалуйста, проверьте это.

да, можно очистить оба общих буфера postgres cache и кэш ОС. Решение bellow для Windows. другие уже дали решение linux.

Как уже говорили многие, чтобы очистить общие буферы, вы можете просто перезапустить Postgres (нет необходимости перезагружать сервер). Но просто это не очистит кэш ОС.

чтобы очистить кэш ОС, используемый Postgres, после остановки службы используйте excelent RamMap (https://technet.microsoft.com/en-us/sysinternals/rammap), из превосходной сюиты Sysinternals. После выполнения RamMap просто нажмите "пустой" - > "пустой список ожидания" в главном меню.

перезапустите Postgres, и вы увидите, что ваш следующий запрос будет медленным из-за отсутствия кэша вообще.

вы также можете выполнить RamMap без закрытия Postgres и, вероятно, будете иметь результаты "без кэша", которые вы хотите, так как, как уже говорили люди, общие буферы обычно дают небольшое влияние по сравнению с кэшем ОС. Но для надежного теста я бы предпочел остановить postgres, как и все, прежде чем очищать кэш ОС, чтобы убедиться.

Примечание: AFAIK, я не рекомендую очищать другие вещи, кроме "списка ожидания" при использовании RamMap, потому что другие данные каким-то образом используются, и вы можете потенциально вызвать проблемы/потерять данные, если вы это сделаете. Помните, что вы очищаете память не только для файлов postgres, но и для любых других приложений и ОС также.

The previous series addressed isolation and multiversion concurrency control, and now we start a new series: on write-ahead logging. To remind you, the material is based on training courses on administration that Pavel pluzanov and I are creating (mostly in Russian, although one course is available in English), but does not repeat them verbatim and is intended for careful reading and self-experimenting.

This series will consist of four parts:

Buffer cache (this article). — how it is structured and used to recover the data. — why we need them and how we set them up. — levels and problems solved, reliability, and performance.

Part of the data that a DBMS works with is stored in RAM and gets written to disk (or other nonvolatile storage) asynchronously, i. e., writes are postponed for some time. The more infrequently this happens the less is the input/output and the faster the system operates.

But what will happen in case of failure, for example, power outage or an error in the code of the DBMS or operating system? All the contents of RAM will be lost, and only data written to disk will survive (disks are not immune to certain failures either, and only a backup copy can help if data on disk are affected). In general, it is possible to organize input/output in such a way that data on disk are always consistent, but this is complicated and not that much efficient (to my knowledge, only Firebird chose this option).

Usually, and specifically in PostgreSQL, data written to disk appear to be inconsistent, and when recovering after failure, special actions are required to restore data consistency. Write-ahead logging (WAL) is just a feature that makes it possible.

Oddly enough, we will start a talk on WAL with discussing the buffer cache. The buffer cache is not the only structure that is stored in RAM, but one of the most critical and complicated of them. Understanding how it works is important in itself; besides we will use it as an example in order to get acquainted with how RAM and disk exchange data.

Caching is used in modern computer systems everywhere; a processor alone has three or four levels of cache. In general cache is needed to alleviate the difference in the performances between two kinds of memory, of which one is relatively fast, but there is not enough of it to go round, and the other one is relatively slow, but there is quite enough of it. And the buffer cache alleviates the difference between the time of access to RAM (nanoseconds) and disk storage (milliseconds).

Note that the operating system also has the disk cache that solves the same problem. Therefore, database management systems usually try to avoid double caching by accessing disks directly rather than through the OS cache. But this is not the case with PostgreSQL: all data are read and written using normal file operations.

Besides, controllers of disk arrays and even disks themselves also have their own cache. This will come in useful when we reach a discussion of reliability.

But let's return to the DBMS buffer cache.

It is called like this because it is represented as an array of buffers. Each buffer consists of space for one data page (block) plus the header. The header, among the rest, contains:

The location of the page in the buffer (the file and block number there).
The indicator of a change to the data on the page, which will sooner or later need to be written to disk (such a buffer is called dirty).
The usage count of the buffer.
The pin count of the buffer.

The cache initially contains empty buffers, and all of them are chained into the list of free buffers. The meaning of the pointer to the «next victim» will be clear a bit later. A hash table in the cache is used to quickly find there a page you need.

When a process needs to read a page, it first attempts to find it in the buffer cache by means of the hash table. The file number and the number of the page in the file are used as the hash key. The process finds the buffer number in the appropriate hash bucket and checks whether it really contains the page needed. Like with any hash table, collisions may arise here, in which case the process will have to check several pages.

Usage of hash tables has long been a source of complaint. A structure like this enables to quickly find a buffer by a page, but a hash table is absolutely useless if, for instance, you need to find all buffers occupied by a certain table. But nobody has suggested a good replacement yet.

If the page needed is found in the cache, the process must «pin» the buffer by incrementing the pin count (several processes can concurrently do this). While a buffer is pinned (the count value is greater than zero), it is considered to be used and to have contents that cannot «drastically» change. For example: a new tuple can appear on the page — this does no harm to anybody because of multiversion concurrency and visibility rules. But a different page cannot be read into the pinned buffer.

It may so happen that the page needed will not be found in the cache. In this case, the page will need to be read from disk into some buffer.

If empty buffers are still available in the cache, the first empty one is chosen. But they will be over sooner or later (the size of a database is usually larger than the memory allocated for the cache), and then we will have to choose one of the occupied buffers, evict the page located there and read the new one into the freed space.

The eviction technique is based on the fact that for each access to a buffer, processes increment the usage count in the buffer header. So the buffers that are used less often than the others have a smaller value of the count and are therefore good candidates for eviction.

The clock-sweep algorithm circularly goes through all buffers (using the pointer to the «next victim») and decreases their usage counts by one. The buffer that is selected for eviction is the first one that:

has a zero usage count
has a zero pin count (i. e. is not pinned)

Once the buffer is found, the following happens to it.

The buffer is pinned to show other processes that it is used. Other locking techniques are used, in addition to pinning, but we will discuss this in more detail later.

If the buffer appears to be dirty, that is, to contain changed data, the page cannot be just dropped — it needs to be saved to disk first. This is hardly a good situation since the process that is going to read the page has to wait until other processes' data are written, but this effect is alleviated by checkpoint and background writer processes, which will be discussed later.

Then the new page is read from disk into the selected buffer. The usage count is set equal to one. Besides, a reference to the loaded page must be written to the hash table in order to enable finding the page in future.

The reference to the «next victim» now points to the next buffer, and the just loaded buffer has time to increase the usage count until the pointer goes circularly through the entire buffer cache and is back again.

As usual, PostgreSQL has an extension that enables us to look inside the buffer cache.

Let's create a table and insert one row there.

What will the buffer cache contain? At a minimum, there must appear the page where the only row is added. Let's check this using the following query, which selects only buffers related to our table (by the relfilenode number) and interprets relforknumber :

Just as we thought: the buffer contains one page. It is dirty ( isdirty ), the usage count ( usagecount ) equals one, and the page is not pinned by any process ( pinning_backends ).

Now let's add one more row and rerun the query. To save keystrokes, we insert the row in another session and rerun the long query using the \g command.

No new buffers were added: the second row fit on the same page. Pay attention to the increased usage count.

The count also increases after reading the page.

But what if we do vacuuming?

VACUUM created the visibility map (one-page) and the free space map (having three pages, which is the minimum size of such a map).

We can set the cache size using the shared_buffers parameter. The default value is ridiculous 128 MB. This is one of the parameters that it makes sense to increase right after installing PostgreSQL.

Note that a change of this parameter requires server restart since all the memory for the cache is allocated when the server starts.

What do we need to consider to choose the appropriate value?

Even the largest database has a limited set of «hot» data, which are intensively processed all the time. Ideally, it's this data set that must fit in the buffer cache (plus some space for one-time data). If the cache size if less, then intensively used pages will be constantly evicting one another, which will cause excessive input/output. But blindly increasing the cache is no good either. When the cache is large, the overhead costs of its maintenance will grow, and besides, RAM is also required for other use.

So, you need to choose the optimal size of the buffer cache for your particular system: it depends on the data, application, and load. Unfortunately, there is no magic, one-size-fits-all value.

It is usually recommended to take 1/4 of RAM for the first approximation (PostgreSQL versions lower than 10 recommended a smaller size for Windows).

And then we should adapt to the situation. It's best to experiment: increase or reduce the cache size and compare the system characteristics. To this end, you, certainly, need a test rig and you should be able to reproduce the workload. — Experiments like these in a production environment are a dubious pleasure.

Be sure to look into Nikolay Samokhvalov's presentation at PostgresConf Silicon Valley 2018: The Art of Database Experiments.

But you can get some information on what's happening directly on your live system by means of the same pg_buffercache extension. The main thing is to look from the right perspective.

For example: you can explore the distribution of buffers by their usage:

In this case, multiple empty values of the count correspond to empty buffers. This is hardly a surprise for a system where nothing is happening.

We can see what share of which tables in our database is cached and how intensively these data are used (by «intensively used», buffers with the usage count greater than 3 are meant in this query):

For example: we can see here that the vac table occupies most space (we used this table in one of the previous topics), but it has not been accessed long and it is not evicted yet only because empty buffers are still available.

You can consider other viewpoints, which will provide you with food for thought. You only need to take into account that:

You need to rerun such queries several times: the numbers will vary in a certain range.
You should not continuously run such queries (as part of monitoring) since the extension momentarily blocks accesses to the buffer cache.

Bulk read and write operations are prone to the risk that useful pages can be fast evicted from the buffer cache by «one-time» data.

To avoid this, so called buffer rings are used: only a small part of the buffer cache is allocated for each operation. The eviction is carried out only within the ring, so the rest of the data in the buffer cache are not affected.

For sequential scans of large tables (whose size is greater than a quarter of the buffer cache), 32 pages are allocated. If during a scan of a table, another process also needs these data, it does not start reading the table from the beginning, but connects to the buffer ring already available. After finishing the scan, the process proceeds to reading the «missed» beginning of the table.

Let's check it. To do this, let's create a table so that one row occupies a whole page — this way it is more convenient to count. The default size of the buffer cache is 128 MB = 16384 pages of 8 KB. It means that we need to insert more than 4096 rows, that is, pages, into the table.

Let's analyze the table.

Now we will have to restart the server to clear the cache of the table data that the analysis has read.

Let's read the whole table after the restart:

And let's make sure that table pages occupy only 32 buffers in the buffer cache:

But if we forbid sequential scans, the table will be read using index scan:

In this case, no buffer ring is used and the entire table will get into the buffer cache (along with almost the entire index):

Buffer rings are used in a similar way for a vacuum process (also 32 pages) and for bulk write operations COPY IN and CREATE TABLE AS SELECT (usually 2048 pages, but not more than 1/8 of the buffer cache).

Temporary tables are an exception from the common rule. Since temporary data are visible to only one process, there's no need for them in the shared buffer cache. Moreover, temporary data exist only within one session and therefore do not need protection against failures.

Temporary data use the cache in the local memory of the process that owns the table. Since such data are available to only one process, they do not need to be protected with locks. The local cache uses the normal eviction algorithm.

Unlike for the shared buffer cache, memory for the local cache is allocated as the need arises since temporary tables are far from being used in many sessions. The maximum memory size for temporary tables in a single session is limited by the temp_buffers parameter.

After server restart, some time must elapse for the cache to «warm up», that is, get filled with live actively used data. It may sometimes appear useful to immediately read the contents of certain tables into the cache, and a specialized extension is available for this:

Earlier, the extension could only read certain tables into the buffer cache (or only into the OS cache). But PostgreSQL 11 enabled it to save the up-to-date state of the cache to disk and restore it after a server restart. To make use of it, you need to add the library to shared_preload_libraries and restart the server.

After the restart, if the value of the pg_prewarm.autoprewarm parameter did not change, the autoprewarm master background process will be launched, which will flush the list of pages stored in the cache once every pg_prewarm.autoprewarm_interval seconds (do not forget to count the new process in when setting the value of max_parallel_processes).

Now the cache does not contain the table big :

If we consider all its contents to be critical, we can read it into the buffer cache by calling the following function:

The list of blocks is flushed into the autoprewarm.blocks file. To see the list, we can just wait until the autoprewarm master process completes for the first time, or we can initiate the flush manually like this:

The number of flushed pages already exceeds 4097; the pages of the system catalog that are already read by the server are counted in. And this is the file:

Now let's restart the server again.

Our table will again be in the cache after the server start.

That same autoprewarm master process provides for this: it reads the file, divides the pages by databases, sorts them (to make reading from disk sequential whenever possible) and passes them to a separate autoprewarm worker process for handling.

Предыдущий цикл был посвящен изоляции и многоверсионности PostgreSQL, а сегодня мы начинаем новый — о механизме журналирования (write-ahead logging). Напомню, что материал основан на учебных курсах по администрированию, которые делаем мы с Павлом pluzanov, но не повторяет их дословно и предназначен для вдумчивого чтения и самостоятельного экспериментирования.

Этот цикл будет состоять из четырех частей:

Буферный кеш (эта статья); — как устроен и как используется при восстановлении; и фоновая запись — зачем нужны и как настраиваются; — уровни и решаемые задачи, надежность и производительность.

, как ее понимают стандарт и PostgreSQL; — что творится на физическом уровне; ; ; ; (vacuum); (autovacuum); .

В процессе работы часть данных, с которыми имеет дело СУБД, хранится в оперативной памяти и записывается на диск (или на другой энергонезависимый носитель) отложенным образом. Чем реже это происходит, тем меньше ввод-вывод и тем быстрее работает система.

Но что произойдет в случае сбоя, например, при выключении электропитания или при ошибке в коде СУБД или операционной системы? Все содержимое оперативной памяти будет потеряно, а останутся лишь данные, записанные на диск (при некоторых видах сбоев может пострадать и диск, но в этом случае поможет лишь резервная копия). В принципе можно организовать ввод-вывод таким образом, чтобы данные на диске всегда поддерживались в согласованном состоянии, но это сложно и не слишком эффективно (насколько я знаю, только Firebird пошел таким путем).

Обычно же — в том числе и в PostgreSQL — данные, записанные на диск, оказываются несогласованными и при восстановлении после сбоя требуются специальные действия, чтобы согласованность восстановить. Журналирование — тот самый механизм, который делает это возможным.

Разговор о журналировании мы, как ни странно, начнем с буферного кеша. Буферный кеш — не единственная структура, которая хранится в оперативной памяти, но одна из самых важных и сложных. Понимание принципа его работы важно само по себе, к тому же на этом примере мы познакомимся с тем, как происходит обмен данными между оперативной памятью и диском.

Кеширование используется в современных вычислительных системах повсеместно, у одного только процессора можно насчитать три-четыре уровня кеша. Вообще любой кеш нужен для того, чтобы сглаживать разницу в производительности двух типов памяти, одна из которых относительно быстрая, но ее на всех не хватает, а другая — относительно медленная, но имеющаяся в достатке. Вот и буферный кеш сглаживает разницу между временем доступа к оперативной памяти (наносекунды) и к дисковой (миллисекунды).

Заметим, что у операционной системы тоже есть дисковый кеш, который решает ту же самую задачу. Поэтому обычно СУБД стараются избегать двойного кеширования, обращаясь к диску напрямую, минуя кеш ОС. Но в случае PostgreSQL это не так: все данные читаются и записываются с помощью обычных файловых операций.

Кроме того, свой кеш бывает также у контроллеров дисковых массивов, и даже у самих дисков. Этот факт нам еще пригодится, когда мы доберемся до вопроса надежности.

Но вернемся к буферному кешу СУБД.

Называется он так потому, что представляет собой массив буферов. Каждый буфер — это место под одну страницу данных (блок), плюс заголовок. Заголовок, в числе прочего, содержит:

расположение на диске страницы, находящейся в буфере (файл и номер блока в нем);
признак того, что данные на странице изменились и рано или поздно должны быть записаны на диск (такой буфер называют грязным);
число обращений к буферу (usage count);
признак закрепления буфера (pin count).

Изначально кеш содержит пустые буферы, и все они связаны в список свободных буферов. Смысл указателя на «следующую жертву» станет ясен чуть позже. Чтобы быстро находить нужную страницу в кеше, используется хеш-таблица.

Когда процессу требуется прочитать страницу, он сначала пытается найти ее в буферном кеше с помощью хеш-таблицы. Ключом хеширования служит номер файла и номер страницы внутри файла. В соответствующей корзине хеш-таблицы процесс находит номер буфера и проверяет, действительно ли он содержат нужную страницу. Как и с любой хеш-таблицей, здесь возможны коллизии; в таком случае процессу придется проверять несколько страниц.

Использование хеш-таблицы давно вызывает нарекания. Такая структура позволяет быстро найти буфер по странице, но совершенно бесполезна, если, например, надо найти все буферы, занятые определенной таблицей. Но хорошую замену пока никто не предложил.

Если нужная страница найдена в кеше, процесс должен «закрепить» буфер, увеличив счетчик pin count (несколько процессов могут сделать это одновременно). Пока буфер закреплен (значение счетчика больше нуля), считается, что буфер используется и его содержимое не должно «радикально» измениться. Например, в странице может появиться новая версия строки — это никому не мешает благодаря многоверсионности и правилам видимости. Но в закрепленный буфер не может быть прочитана другая страница.

Может получиться так, что необходимая страница не будет найдена в кеше. В этом случае ее необходимо считать с диска в какой-либо буфер.

Если в кеше еще остались свободные буферы, то выбирается первый же свободный. Но рано или поздно они закончатся (обычно размер базы данных больше чем память, выделенная под кеш) и тогда придется выбирать один из занятых буферов, вытеснить находящуюся там страницу и на освободившееся место прочитать новую.

Механизм вытеснения основан на том, что при каждом обращении к буферу процессы увеличивают счетчик числа обращений (usage count) в заголовке буфера. Таким образом те буферы, которые используются реже остальных, имеют меньшее значение счетчика и являются хорошими кандидатами на вытеснение.

Алгоритм clock-sweep перебирает по кругу все буферы (используя указатель на «следующую жертву»), уменьшая на единицу их счетчики обращений. Для вытеснения выбирается первый же буфер, который:

имеет нулевой счетчик обращений (usage count),
и не закреплен (нулевой pin count).

После того, как буфер найден, с ним происходит следующее.

Буфер закрепляется, чтобы показать остальным процессам, что он используется. Помимо закрепления используются и другие средства блокировки, но подробнее об этом мы поговорим отдельно.

Если буфер оказался грязным, то есть содержит измененные данные, страницу нельзя просто выбросить — сначала ее требуется сохранить на диск. Это не очень хорошая ситуация, поскольку процессу, который собирается прочитать страницу, приходится ждать записи «чужих» данных, но этот эффект сглаживается процессами контрольной точки и фоновой записи, которые будут рассмотрены позже.

Далее в выбранный буфер читается с диска новая страница. Счетчик числа обращений устанавливается в единицу. Кроме того, ссылку на загруженную страницу необходимо прописать в хеш-таблицу, чтобы в дальнейшем ее можно было найти.

Теперь ссылка на «следующую жертву» указывает на следующий буфер, а у только что загруженного есть время нарастить счетчик обращений, пока указатель не обойдет по кругу весь буферный кеш и не вернется вновь.

Как это принято в PostgreSQL, существует расширение, которое позволяет заглянуть внутрь буферного кеша.

Создадим таблицу и вставим в нее одну строку.

Что окажется в буферном кеше? Как минимум, в нем должна появиться страница, на которую добавлена единственная строка. Проверим это следующим запросом, в котором мы выбираем только буферы, относящиеся к нашей таблице (по номеру файла relfilenode), и расшифровываем номер слоя (relforknumber):

Так и есть — в буфере одна страница. Она грязная (isdirty), счетчик обращений равен единице (usagecount), и она не закреплена ни одним процессом (pinning_backends).

Теперь добавим еще одну строку и повторим запрос. Для экономии букв мы вставляем строку в другом сеансе, а длинный запрос повторяем командой \g .

Новых буферов не прибавилось — вторая строка поместилась на ту же страницу. Обратите внимание, что счетчик использований увеличился.

И после обращения к странице на чтение счетчик тоже увеличивается.

А если выполнить очистку?

Очистка создала карту видимости (одна страница) и карту свободного пространства (три страницы — минимальный размер этой карты).

Размер кеша устанавливается параметром shared_buffers. Значение по умолчанию — смехотворные 128 Мб. Это один из параметров, которые имеет смысл увеличить сразу же после установки PostgreSQL.

Имейте в виду, что изменение параметра требует перезапуска сервера, поскольку вся необходимая под кеш память выделяется при старте сервера.

Из каких соображений выбирать подходящее значение?

Даже самая большая база имеет ограниченный набор «горячих» данных, с которыми ведется активная работа в каждый момент времени. В идеале именно этот набор и должен помещаться в буферный кеш (плюс некоторое место для «одноразовых» данных). Если размер кеша будет меньше, то активно используемые страницы будут постоянно вытеснять друг друга, создавая избыточный ввод-вывод. Но и бездумно увеличивать кеш тоже неправильно. При большом размере будут расти накладные расходы на его поддержание, и кроме того оперативная память требуется и для других нужд.

Таким образом, оптимальный размер буферного кеша будет разным в разных системах: он зависит от данных, от приложения, от нагрузки. К сожалению, нет такого волшебного значения, которое одинаково хорошо подойдет всем.

Стандартная рекомендация — взять в качестве первого приближения 1/4 оперативной памяти (для Windows до версии PostgreSQL 10 рекомендовалось выбирать размер меньше).

А дальше надо смотреть по ситуации. Лучше всего провести эксперимент: увеличить или уменьшить размер кеша и сравнить характеристики системы. Конечно, для этого надо иметь тестовый стенд и уметь воспроизводить типовую нагрузку — на производственной среде такие опыты выглядят сомнительным удовольствием.

Но некоторую информацию о происходящем можно почерпнуть прямо на живой системе с помощью того же расширения pg_buffercache — главное, смотреть под нужным углом.

Например, можно изучить распределение буферов по степени их использования:

В данном случае много пустых значений счетчика — это свободные буферы. Неудивительно для системы, в которой ничего не происходит.

Можно посмотреть, какая доля каких таблиц в нашей базе закеширована и насколько активно используются эти данные (под активным использованием в этом запросе понимаются буферы со счетчиком использования больше 3):

Тут, например, видно, что больше всего место занимает таблица vac (мы использовали ее в одной из прошлых тем), но к ней уже давно никто не обращался и она до сих пор не вытеснена только потому, что еще не закончились свободные буферы.

Можно придумать и другие разрезы, которые дадут полезную информацию для размышлений. Надо только учитывать, что такие запросы:

надо повторять несколько раз: цифры будут меняться в определенных пределах;
не надо выполнять постоянно (как часть мониторинга) из-за того, что расширение кратковременно блокирует работу с буферным кешем.

При операциях, выполняющих массовое чтение или запись данных, есть опасность быстрого вытеснения полезных страниц из буферного кеша «одноразовыми» данными.

Чтобы этого не происходило, для таких операций используются так называемые буферные кольца (buffer ring) — для каждой операции выделяется небольшая часть буферного кеша. Вытеснение действует только в пределах кольца, поэтому остальные данные буферного кеша не страдают.

Для последовательного чтения (sequential scan) больших таблиц (размер которых превышает четверть буферного кеша) выделяется 32 страницы. Если в процессе чтения таблицы другому процессу тоже потребуются эти данные, он не начинает читать таблицу сначала, а подключается к уже имеющемуся буферному кольцу. После окончания сканирования он дочитывает «пропущенное» начало таблицы.

Давайте проверим. Для этого создадим таблицу так, чтобы одна строка занимала целую страницу — так удобнее считать. Размер буферного кеша по умолчанию составляет 128 Мб = 16384 страницы по 8 Кб. Значит, в таблицу надо вставить больше 4096 страниц-строк.

Теперь нам придется перезапустить сервер, чтобы очистить кеш от данных таблицы, которые прочитал анализ.

После перезагрузки прочитаем всю таблицу:

И убедимся, что табличными страницами в буферном кеше занято только 32 буфера:

Если же запретить последовательное сканирование, то таблица будет прочитана по индексу:

В этом случае буферное кольцо не используется и в буферном кеше окажется вся таблица полностью (и почти весь индекс тоже):

Похожим образом буферные кольца используются для процесса очистки (тоже 32 страницы) и для массовых операций записи COPY IN и CREATE TABLE AS SELECT (обычно 2048 страниц, но не больше 1/8 всего буферного кеша).

Исключение из общего правила представляют временные таблицы. Поскольку временные данные видны только одному процессу, им нечего делать в общем буферном кеше. Более того, временные данные существуют только в рамках одного сеанса, так что их не нужно защищать от сбоя.

Для временных данных используется кеш в локальной памяти того процесса, который владеет таблицей. Поскольку такие данные доступны только одному процессу, их не требуется защищать блокировками. В локальном кеше используется обычный алгоритм вытеснения.

В отличие от общего буферного кеша, память под локальный кеш выделяется по мере необходимости, ведь временные таблицы используются далеко не во всех сеансах. Максимальный объем памяти для временных таблиц одного сеанса ограничен параметром temp_buffers.

После перезапуска сервера должно пройти некоторое время, чтобы кеш «прогрелся» — набрал актуальные активно использующиеся данные. Иногда может оказаться полезным сразу прочитать в кеш данные определенных таблиц, и для этого предназначено специальное расширение:

Раньше расширение могло только читать определенные таблицы в буферный кеш (или только в кеш ОС). Но в версии PostgreSQL 11 оно получило возможность сохранять актуальное состояние кеша на диск и восстанавливать его же после перезагрузки сервера. Чтобы этим воспользоваться, надо добавить библиотеку в shared_preload_libraries и перезагрузить сервер.

Поле рестарта, если не менялось значение параметра pg_prewarm.autoprewarm, будет автоматически запущен фоновый процесс autoprewarm master, который раз в pg_prewarm.autoprewarm_interval будет сбрасывать на диск список страниц, находящихся в кеше (не забудьте учесть новый процесс при установке max_parallel_processes).

Сейчас в кеше нет таблицы big:

Если мы предполагаем, что все ее содержимое очень важно, мы можем прочитать ее в буферный кеш с помощью вызова следующей функции:

Список страниц сбрасывается в файл autoprewarm.blocks. Чтобы его увидеть, можно просто подождать, пока процесс autoprewarm master отработает в первый раз, но мы инициируем это вручную:

Число сброшенных страниц больше 4097 — сюда входят и уже прочитанные сервером страницы объектов системного каталога. А вот и файл:

Теперь снова перезапустим сервер.

И сразу после запуска наша таблица снова оказывается в кеше.

Это обеспечивает тот же самый процесс autoprewarm master: он читает файл, разделяет страницы по базам данных, сортирует их (чтобы чтение с диска было по возможности последовательным) и передает отдельному рабочему процессу autoprewarm worker для обработки.

Подолжаю публиковать авторскую переработку Understanding EXPLAIN от Guillaume Lelarge.
Ещё раз обращу внимание, что часть информации для краткости опущено, так что настоятельно рекомендую ознакомиться с оригиналом.

Предыдущие части:

Что происходит на физическом уровне при выполнениии нашего запроса? Разберёмся. Мой сервер поднят на Ubuntu 13.10. Используются дисковые кэши уровня ОС.
Останавливаю PostgreSQL, принудительно фиксирую изменения в файловой системе, очищаю кэши, запускаю PostgreSQL:

Теперь кэши очищены, пробуем выполнить запрос с опцией BUFFERS

QUERY PLAN
— Seq Scan on foo (cost=0.00..18334.10 rows=1000010 width=37) (actual time=0.525..734.754 rows=1000010 loops=1)
Buffers: shared read=8334
Total runtime: 1253.177 ms
(3 rows)

Таблица считывается частями — блоками. Кэш пуст. Таблица полностью считывается с диска. Для этого пришлось считать 8334 блока.
Buffers: shared read — количество блоков, считанное с диска.

Повторим последний запрос

QUERY PLAN
— Seq Scan on foo (cost=0.00..18334.10 rows=1000010 width=37) (actual time=0.173..693.000 rows=1000010 loops=1)
Buffers: shared hit=32 read=8302
Total runtime: 1208.433 ms
(3 rows)

Buffers: shared hit — количество блоков, считанных из кэша PostgreSQL.
Если повторите этот запрос несколько раз, то увидите, как PostgreSQL с каждым разом всё больше данных берёт из кэша. С каждым запросом PostgreSQL наполняет свой кэш.
Операции чтения из кэша быстрее, чем операции чтения с диска. Можете заметить эту тенденцию, отслеживая значение Total runtime .
Объём кэша определяется константой shared_buffers в файле postgresql.conf .

WHERE

Добавим в запрос условие

Индексов у таблицы нет. При выполнении запроса последовательно считывается каждая запись таблицы ( Seq Scan ). Каждая запись сравнивается с условием c1 > 500 . Если условие выполняется, запись вводится в результат. Иначе — отбрасывается. Filter означает именно такое поведение.
Значение cost , что логично, увеличилось.
Ожидаемое количество строк результата — rows — уменьшилось.
В оригинале даются объяснения, почему cost принимает именно такое значение, а также каким образом рассчитывается ожидаемое количество строк.

Пора создать индексы.

Ожидаемое количество строк изменилось. Уточнилось. В остальном ничего нового. Что же с индексом?

QUERY PLAN
— Seq Scan on foo (cost=0.00..20834.12 rows=999519 width=37) (actual time=0.572..848.895 rows=999500 loops=1)
Filter: (c1 > 500)
Rows Removed by Filter: 510
Total runtime: 1330.788 ms
(4 rows)

Отфильтровано только 510 строк из более чем миллиона. Пришлось считать более 99,9% таблицы.

Принудительно заставим использовать индекс, запретив Seq Scan:

QUERY PLAN
— Index Scan using foo_c1_idx on foo (cost=0.42..34623.01 rows=999519 width=37) (actual time=0.178..1018.045 rows=999500 loops=1)
Index Cond: (c1 > 500)
Total runtime: 1434.429 ms
(3 rows)

Index Scan , Index Cond вместо Filter — используется индекс foo_c1_idx .
При выборке практически всей таблицы использование индекса только увеличивает cost и время выполнения запроса. Планировщик не глуп!

Не забываем отменить запрет на использование Seq Scan:

QUERY PLAN
— Index Scan using foo_c1_idx on foo (cost=0.42..25.78 rows=491 width=37)
Index Cond: (c1 < 500)
(2 rows)

Тут планировщик решил использовать индекс.

Усложним условие. Используем текстовое поле.

QUERY PLAN
— Index Scan using foo_c1_idx on foo (cost=0.42..27.00 rows=1 width=37)
Index Cond: (c1 < 500)
Filter: (c2 ~~ 'abcd%'::text)
(3 rows)

Как видим, используется индекс foo_c1_idx для условия c1 < 500 . Для c2 ~~ 'abcd%'::text используется фильтр.
Обратите внимание, что в выводе результатов используется POSIX формат оператора LIKE.

Если в условии только текстовое поле:

QUERY PLAN
— Seq Scan on foo (cost=0.00..20834.12 rows=100 width=37) (actual time=14.497..412.030 rows=10 loops=1)
Filter: (c2 ~~ 'abcd%'::text)
Rows Removed by Filter: 1000000
Total runtime: 412.120 ms
(4 rows)

Ожидаемо, Seq Scan .

Строим индекс по c2 :

QUERY PLAN
— Seq Scan on foo (cost=0.00..20834.12 rows=100 width=37) (actual time=20.992..424.946 rows=10 loops=1)
Filter: (c2 ~~ 'abcd%'::text)
Rows Removed by Filter: 1000000
Total runtime: 425.039 ms
(4 rows)

Опять Seq Scan ? Индекс не используется потому, что база у меня для текстовых полей использует формат UTF-8.
При создании индекса в таких случаях надо использовать класс оператора text_pattern_ops :

QUERY PLAN
— Bitmap Heap Scan on foo (cost=4.58..55.20 rows=100 width=37)
Filter: (c2 ~~ 'abcd%'::text)
-> Bitmap Index Scan on foo_c2_idx1 (cost=0.00..4.55 rows=13 width=0)
Index Cond: ((c2 ~>=~ 'abcd'::text) AND (c2 ~ (4 rows)

Ура! Получилось!
Bitmap Index Scan — используется индекс foo_c2_idx1 для определения нужных нам записей, а затем PostgreSQL лезет в саму таблицу: ( Bitmap Heap Scan ) -, чтобы убедиться, что эти записи на самом деле существуют. Такое поведение связано с версионностью PostgreSQL.

Если выбирать не всю строку, а только поле, по которому построен индекс

QUERY PLAN
— Index Only Scan using foo_c1_idx on foo (cost=0.42..25.78 rows=491 width=4)
Index Cond: (c1 < 500)
(2 rows)

Index Only Scan выполняется быстрее, чем Index Scan за счёт того, что не требуется читать строку таблицы полностью: width=4 .

Sometimes I run a Postgres query and it takes 30 seconds. Then, I immediately run the same query and it takes 2 seconds. It appears that Postgres has some sort of caching. Can I somehow see what that cache is holding? Can I force all caches to be cleared for tuning purposes?

I'm basically looking for a Postgres version of the following SQL Server command:

But I would also like to know how to see what is actually contained in that buffer.

this is a very helpful feature for debugging queries (speed). I don't know why postgres devs aren't providing it.

11 Answers 11

You can see what's in the PostgreSQL buffer cache using the pg_buffercache module. I've done a presentation called "Inside the PostgreSQL Buffer Cache" that explains what you're seeing, and I show some more complicated queries to help interpret that information that go along with that.

It's also possible to look at the operating system cache too on some systems, see [pg_osmem.py] for one somewhat rough example.

There's no way to clear the caches easily. On Linux you can stop the database server and use the drop_caches facility to clear the OS cache; be sure to heed the warning there to run sync first.

Is it possible to simply bypass the caching within a single session? We often need to performance test different queries and this caching makes it very difficult to assess whether one method is better than another (except when comparing the cached performance!)

There is no way to bypass or flush the database's cache. All you can do to clear it is restart the server.

Is it conceivable that this could be made to be possible, for example in future development? Or is this just something that with the current systems (PG and Linux) won't be possible if if one was to try?

When using a managed PostgreSQL installation such as Amazon RDS, one doesn't have access to the OS, and emptying OS caches for testing purposes can be very hard, so this feature would be very beneficial in PostgreSQL.

Can't reproduce a slow query it's a problem, how can I be sure that my query is performing after a tunning? Restarts the server is not an option I am testing the query in prod because just prod has concurrency, locks and records enough to reproduce the issue

I haven't seen any commands to flush the caches in PostgreSQL. What you see is likely just normal index and data caches being read from disk and held in memory. by both postgresql and the caches in the OS. To get rid of all that, the only way I know of:

What you should do is:

Shutdown the database server (pg_ctl, sudo service postgresql stop , sudo systemctl stop postgresql , etc.)
echo 3 > /proc/sys/vm/drop_caches This will clear out the OS file/block caches - very important though I don't know how to do that on other OSs. (In case of permission denied, try sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" as in that question)
Start the database server (e.g. sudo service postgresql start , sudo systemctl start postgresql )

Thought it would be helpful to note: if Postgres' data directory is not on the same volume that '/' is mounted on, you may need to umount before/after the operation above (not sure which, really). In addition (maybe a little voodoo) try running 'sync' before and after the those steps.

Greg Smith's answer about drop_caches was very helpful. I did find it necessary to stop and start the postgresql service, in addition to dropping the caches. Here's a shell script that does the trick. (My environment is Ubuntu 14.04 and PostgreSQL 9.3.)

I tested with a query that took 19 seconds the first time, and less than 2 seconds on subsequent attempts. After running this script, the query once again took 19 seconds.

I use this command on my linux box:

It completely gets rid of the cache.

If Postgresql version is not 9.0: sync; sudo service postgresql stop; echo 1 > /proc/sys/vm/drop_caches; sudo service postgresql start

I think, sync should be done after stopping the server, immediately before drop_caches , as Postgres can write something during stop process again.

I had this error.

psql:/cygdrive/e/test_insertion.sql:9: ERROR: type of parameter 53 (t_stat_gardien) does not match that when preparing the plan (t_stat_avant)

I was looking for flushing the current plan and a found this:

I had this between my inserts and it solves my problem.

The right syntax is DISCARD PLANS; . And, as documentation states: "DISCARD releases internal resources associated with a database session".

Yes, postgresql certainly has caching. The size is controlled by the setting shared_buffers. Other than that, there is as the previous answer mentions, the OS file cache which is also used.

If you want to look at what's in the cache, there is a contrib module called pg_buffercache available (in contrib/ in the source tree, in the contrib RPM, or wherever is appropriate for how you installed it). How to use it is listed in the standard PostgreSQL documentation.

There are no ways to clear out the buffer cache, other than to restart the server. You can drop the OS cache with the command mentioned in the other answer - provided your OS is Linux.

Yes, it is possible to clear both the shared buffers postgres cache AND the OS cache. Solution bellow is for Windows. others have already given the linux solution.

As many people already said, to clear the shared buffers you can just restart Postgres (no need to restart the server). But just doing this won't clear the OS cache.

To clear the OS cache used by Postgres, after stopping the service, use the excelent RamMap (https://technet.microsoft.com/en-us/sysinternals/rammap), from the excelent Sysinternals Suite. Once you execute RamMap, just click "Empty"->"Empty Standby List" in the main menu.

Restart Postgres and you'll see now your next query will be damm slow due to no cache at all.

You can also execute the RamMap without closing Postgres, and probably will have the "no cache" results you want, since as people already said, shared buffers usually gives little impact compared to the OS cache. But for a reliable test, I would rather stop postgres as all before clearing the OS cache to make sure.

Note: AFAIK, I don't recommend clearing the other things besides "Standby list" when using RamMap, because the other data is somehow being used, and you can potentially cause problems/loose data if you do that. Remember that you are clearing memory not only used by postgres files, but any other app and OS as well.

Читайте также: