Index skip scan oracle что это
Официально не поддерживается, начиная с Oracle 10, используется оптимизация по точно оределённым правилам (Rule Based Optimization) без учёта статистики объектов бд. Описание применяемых правил для инилизационного параметра optimizer_mode = rule
В частности, несмотря на использование подсказки RULE Oracle будет использовать Cost-Based Optimization, если:
- кроме RULE в запросе используются другие подсказки
- запрос использует партицированные или таблицы, организованные в виде индекса (IOT), или материализованные представления (mview)
- в запросе используются кляузы SAMPLE, SPREADSHEET, конструкции GROUPING SETS
- в запросе используются ANSI left|full outer join
- запрос выполняется параллельно
- используется Flashback cursor (as of [scn|timestamp])
- …
/*+ ALL_ROWS */
«Подсказка ALL_ROWS определяет целью скорейшее выполнение всего запроса с минимальным расходом ресурсов (best throughput при извлечении всего результирующего набора данных). При одновременном с ALL_ROWS или FIRST_ROWS указании подсказок, определяющих методы доступа к данным (NO_INDEX_SS, INDEX_COMBINE. ) или указывающие методы объединения объектов БД (LEADING, USE_NL_WITH_INDEX. ), оптимизатор отдаёт предпочтение подсказкам методов доступа и объединения»
/*+ FIRST_ROWS */
Из документации Oracle 8: «Подсказка FIRST_ROWS определяет стоимостной подход (cost-based approach) для оптимизации блоков запроса (statement block) с целью лучшего времени отклика (response time, минимального расхода ресурсов для возвращения первых строк запроса). В соответствии с этой подсказкой оптимизатор делает следующие предпочтения [в выборе операций доступа к данным и методов соединения]:
- При наличии оптимизатор использует сканирование по индексу (index scan) вместо полного сканирования таблицы (full table scan)
- Если доступно сканирование по индексу (index scan), оптимизатор выбирает nested loops join вместо sort-merge join в случае, когда сканируемая индексированная таблица может быть использована как ведомая таблица (inner table) для операции nested loops
- Если использование индекса (index scan) может быть использовано для получения отсортированных данных (в порядке, определённом фразой ORDER BY), оптимизатор выбирает индексный доступ во избежание дополнительной сортировки»
Начиная с Oracle 9i: «Подсказка FIRST_ROWS указанная без аргументов, предназначенная для оптимизации плана выполнения с целью скорейшего возвращения первой строки запроса, сохраняется только для обратной совместимости (backward compatibility) и стабильности планов выполнения (plan stability)»
Значение инилизационного параметра OPTIMIZER_MODE=FIRST_ROWS (что равносильно применению подсказки FIRST_ROWS для всех запросов) аннонсируется в документации вплоть до версии Oracle 11.2
/*+ FIRST_ROWS(n) */
Оптимизация, основанная на стоимости (Cost Based Optimization) + использование правил (предпочтений в выборе плана) с целью получения лучшего времени отклика для получения первых n строк. План рассчитывается с учётом значения n, как целевого количества выбранных запросом строк (query cardinality).
См. описание правил для параметра optimizer_mode = first_rows
Не совсем понятная фраза в документации: « Оптимизатор игнорирует эту подсказку в SQL предложениях DELETE и UPDATE и в запросах SELECT, включающих блокирующие операции, такие как сортировки и группировки . Такие SQL предложения не могут быть оптимизированы с целью наименьшего времени отклика (best response time), поскольку Oracle должен обработать все строки запроса до того, как вернуть первую строку результата. При указании этой подсказки запросы указанного типа оптимизируются с целью лучшего времени получения всех строк запроса с минимальным расходом ресурсов (best throughput, как при использовании подсказки ALL_ROWS)»
В Oracle 11.2 ничего не изменилось — независимо от количества обновляемых строк (Rows), устанавливаемого функцией rownum ни планы выполнения (кроме дополнительной операции COUNT STOPKEY), ни стоимость, ни ожидаемое время (Time) не меняются:
— при попытке обновить только одну строку Oracle выбирает тот же план с недешёвой операцией HASH JOIN SEMI (в запроса используется конструкция EXISTS) — более подходящей для получения всех строк обновляемой таблицы (ALL_ROWS mode)
Простой тест для показывает, как по-разному режимы ALL_ROWS, FIRST_ROWS и FIRST_ROWS(n) влияют на поведение оптимизатора: собственно, планы выполнения и методы доступа к данным + Cost + Rows :) на примерах сортировки или группировки при использовании бессмысленного условия object_id > 1 — которому удовлетворяют все строки таблицы T1 — по условию создания таблицы min(object_id)=2)*:
*) статистика для таблицы T1 актуальна и оптимизатор прекрасно «знает» о минимальном и максимальном значении T1.Object_id:
или так, попроще:
Access Path Hints
Подсказки,определяющие конкретные способы доступа к данным, порядок и применяемые методы объединения промежуточных наборов данных (result sets)*
Том Кайт называет эти подсказки плохими (bad hints):
«Плохие подсказки указывают оптимизатору как следует действовать [при выполнении запроса], какой индекс использовать, в каком порядке обрабатывать таблицы, с помощью какой операции (join technique) производить соединение [источников данных]»
*) Перед применение необходимо учитывать вероятность изменения статистики системы и объектов (таблиц и индексов), используемых в запросе со временем и, как следствие, возможную неоптимальность указываемых операций для будущих наборов данных. План выполнения, оптимизированный с помощью подсказок этого типа в среде разработки, может оказаться далеко не оптимальным для боевой бд (production system) из-за отличающихся наборов данных и характеристик системы («железо»)
/*+ LEADING( [@query_block] [tablespec],[tablespec]. ) */
«Подсказка LEADING указывает оптимизатору использовать перечисленный порядок доступа к таблицам при построении плана выполнения запроса… более гибкая, чем ORDERED… Полностью игнорируются при использовании двух или более конфликтующих подсказок LEADING. Для оптимизатора подсказка ORDERED имеет преимущество против LEADING»
В версии 11.2 может игнорироваться оптимизатором при выполнении преобразований, основанных на стоимости (Cost-Based Query Transformation), например, Table Expansion
/*+ ORDERED */
«Подсказка ORDERED указывает Oracle [при выполнении запроса] проводить соединение таблиц в том же порядке, в котором таблицы перечислены в конструкции FROM. Oracle рекомендует вместо ORDERED использовать подсказку LEADING, обладающую большей гибкостью…», т.е. дающей оптимизатору больше возможностей в выборе плана выполнения
/*+ USE_HASH( [@query_block] [tablespec] [tablespec]… ) */
/*+ NO_USE_HASH( [@query_block] [tablespec] [tablespec]… ) */
…указывает оптимизатору использовать / не использовать операцию hash join для соединения каждой указанной таблицы с прочими источниками данных Операции CBO: Hash Join
Не найдя на хабре статьи, объединяющей в удобном для чтения виде информацию о методах доступа к данным, используемых СУБД Oracle, я решил совершить «пробу пера» и написать эту статью.
Общая информация
Не углубляясь в детали, можно утверждать что Oracle хранит данные в таблицах, вместе с которыми могут существовать особые структуры данных – индексы, призванные ускорить запросы к таблицам. При выполнении запросов Oracle по-разному обращается к таблицам и индексам – способы доступа к данным в различных ситуациях и являются предметом этой статьи.
Для примеров мы будем использовать следующую таблицу и данные в ней:
Для анализа плана выполнения запроса будем пользоваться следующими средствами:
После создания индекса и использования его в примерах и перед созданием следующего индекса, он должен быть удален. Это можно сделать с помощью следующего запроса:
TABLE FULL SCAN
Данный метод доступа, как следует из названия, подразумевает перебор всех строк таблицы с исключением тех, которые не удовлетворяют предикату where (если таковой есть). Применяется он либо в случае, когда условия предиката отсутствуют в индексе, либо когда индекса нет в принципе. Примеры:
TABLE ACCESS BY ROWID, он же ROWID
- Мы указали идентификатор строки в предикате where;
- ROWID запрошенной записи был найден в индексе;
INDEX FULL SCAN
Данный метод доступа просматривает все листовые блоки индекса для поиска соответствий условиям предиката. Для того чтобы Oracle мог применить этот метод доступа, хотя бы одно из полей ключа должно иметь ограничение NOT NULL, т.к. только в этом случае соответствующая строка таблицы попадет в индекс. Этот метод обычно быстрее чем TABLE FULL SCAN, но медленнее, чем INDEX RANGE SCAN (см. ниже).
INDEX FAST FULL SCAN
Этот метод доступа применяется, когда выполнены все требования для INDEX FULL SCAN, а также все данные, выбираемые запросом, содержатся в индексе и таким образом доступ к самой таблице не требуется. В отличие от INDEX FULL SCAN этот метод может читать блоки индекса в несколько параллельных потоков и таким образом порядок возвращаемых значений не регламентирован. Oracle также не может использовать этот метод для bitmap-индексов.
INDEX RANGE SCAN
INDEX UNIQUE SCAN
Данный метод доступа применяется когда в силу ограничений UNIQUE/PRIMARY KEY, а также условия предиката, запрос должен вернуть ноль или одно значение.
Пример:
INDEX SKIP SCAN
Этот метод доступа используется в случае, если в предикате where не используется первый столбец индекса.
Для примера использования этого метода доступа нам потребуется другая таблица (обратите внимание, что количество строк, данные и т.д. будут зависеть от того, что есть в используемой схеме, и поэтому данный пример может не воспроизвестись сразу):
DISCLAIMER
Утверждения о том, что при определенных условиях cost-based-optimizer (CBO) выберет тот или иной метод доступа, могут быть не совсем справедливыми в отдельных случаях, так как логика определения оптимального метода оптимизатором очень сложна.
An access path is a technique used by a query to retrieve rows from a row source.
This chapter contains the following topics:
Introduction to Access Paths
A row source is a set of rows returned by a step in an execution plan. A row source can be a table, view, or result of a join or grouping operation.
A unary operation such as an access path , which is a technique used by a query to retrieve rows from a row source, accepts a single row source as input. For example, a full table scan is the retrieval of rows of a single row source. In contrast, a join operation is binary and receives inputs from two row sources
The database uses different access paths for different relational data structures. The following table summarizes common access paths for the major data structures.
Bitmap Index Range Scans
The optimizer considers different possible execution plans, and then assigns each plan a cost . The optimizer chooses the plan with the lowest cost. In general, index access paths are more efficient for statements that retrieve a small subset of table rows, whereas full table scans are more efficient when accessing a large portion of a table.
Oracle Database Concepts for an overview of these structures
Table Access by Rowid
A rowid is an internal representation of the storage location of data.
The rowid of a row specifies the data file and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row because it specifies the exact location of the row in the database.
Rowids can change between versions. Accessing data based on position is not recommended because rows can move.
When the Optimizer Chooses Table Access by Rowid
In most cases, the database accesses a table by rowid after a scan of one or more indexes. However, table access by rowid need not follow every index scan. If the index contains all needed columns, then access by rowid might not occur (see "Index Fast Full Scans" ).
How Table Access by Rowid Works
To access a table by rowid, the database performs the following steps:
Obtains the rowids of the selected rows, either from the statement WHERE clause or through an index scan of one or more indexes
Table access may be needed for columns in the statement not present in the index.
Locates each selected row in the table based on its rowid
Table Access by Rowid: Example
Assume run the following query:
Step 2 of the following plan shows a range scan of the emp_emp_id_pk index on the hr.employees table. The database uses the rowids obtained from the index to find the corresponding rows from the employees table, and then retrieve them. The BATCHED access shown in Step 1 means that the database retrieves a few rowids from the index, and then attempts to access rows in block order to improve the clustering and reduce the number of times that the database must access a block.
8.2 Table Access Paths
A table is the basic unit of data organization in an Oracle database.
Relational tables are the most common table type. Relational tables have with the following organizational characteristics:
A heap-organized table does not store rows in any particular order.
An index-organized table orders rows according to the primary key values.
An external table is a read-only table whose metadata is stored in the database but whose data is stored outside the database.
Oracle Database Concepts for an overview of tables
8.2.1 About Heap-Organized Table Access
By default, a table is organized as a heap, which means that the database places rows where they fit best rather than in a user-specified order.
As users add rows, the database places the rows in the first available free space in the data segment. Rows are not guaranteed to be retrieved in the order in which they were inserted.
8.2.1.1 Row Storage in Data Blocks and Segments: A Primer
The database stores rows in data blocks. In tables, the database can write a row anywhere in the bottom part of the block. Oracle Database uses the block overhead, which contains the row directory and table directory, to manage the block itself.
An extent is made up of logically contiguous data blocks. The blocks may not be physically contiguous on disk. A segment is a set of extents that contains all the data for a logical storage structure within a tablespace. For example, Oracle Database allocates one or more extents to form the data segment for a table. The database also allocates one or more extents to form the index segment for a table.
By default, the database uses automatic segment space management (ASSM) for permanent, locally managed tablespaces. When a session first inserts data into a table, the database formats a bitmap block. The bitmap tracks the blocks in the segment. The database uses the bitmap to find free blocks and then formats each block before writing to it. ASSM spread out inserts among blocks to avoid concurrency issues.
The high water mark (HWM) is the point in a segment beyond which data blocks are unformatted and have never been used. Below the HWM, a block may be formatted and written to, formatted and empty, or unformatted. The low high water mark (low HWM) marks the point below which all blocks are known to be formatted because they either contain data or formerly contained data.
During a full table scan, the database reads all blocks up to the low HWM, which are known to be formatted, and then reads the segment bitmap to determine which blocks between the HWM and low HWM are formatted and safe to read. The database knows not to read past the HWM because these blocks are unformatted.
Oracle Database Concepts to learn about data block storage
8.2.1.2 Importance of Rowids for Row Access
Every row in a heap-organized table has a rowid unique to this table that corresponds to the physical address of a row piece. A rowid is a 10-byte physical address of a row.
The rowid points to a specific file, block, and row number. For example, in the rowid AAAPecAAFAAAABSAAA , the final AAA represents the row number. The row number is an index into a row directory entry. The row directory entry contains a pointer to the location of the row on the block.
The database can sometimes move a row in the bottom part of the block. For example, if row movement is enabled, then the row can move because of partition key updates, Flashback Table operations, shrink table operations, and so on. If the database moves a row within a block, then the database updates the row directory entry to modify the pointer. The rowid stays constant.
Oracle Database uses rowids internally for the construction of indexes. For example, each key in a B-tree index is associated with a rowid that points to the address of the associated row. Physical rowids provide the fastest possible access to a table row, enabling the database to retrieve a row in as little as a single I/O.
8.2.1.3 Direct Path Reads
In a direct path read, the database reads buffers from disk directly into the PGA, bypassing the SGA entirely.
The following figure shows the difference between scattered and sequential reads, which store buffers in the SGA, and direct path reads.
Situations in which Oracle Database may perform direct path reads include:
Execution of a CREATE TABLE AS SELECT statement
Execution of an ALTER REBUILD or ALTER MOVE statement
Reads from a temporary tablespace
Reads from a LOB segment
Oracle Database Performance Tuning Guide to learn about wait events for direct path reads
8.2.2 Full Table Scans
A full table scan reads all rows from a table, and then filters out those rows that do not meet the selection criteria.
8.2.2.1 When the Optimizer Considers a Full Table Scan
In general, the optimizer chooses a full table scan when it cannot use a different access path, or another usable access path is higher cost.
The following table shows typical reasons for choosing a full table scan.
No index exists.
If no index exists, then the optimizer uses a full table scan.
The query predicate applies a function to the indexed column.
Unless the index is a function-based index, the database indexes the values of the column, not the values of the column with the function applied. A typical application-level mistake is to index a character column, such as char_col , and then query the column using syntax such as WHERE char_col=1 . The database implicitly applies a TO_NUMBER function to the constant number 1 , which prevents use of the index.
A SELECT COUNT(*) query is issued, and an index exists, but the indexed column contains nulls.
The optimizer cannot use the index to count the number of table rows because the index cannot contain null entries.
The query predicate does not use the leading edge of a B-tree index.
For example, an index might exist on employees(first_name,last_name) . If a user issues a query with the predicate WHERE last_name='KING' , then the optimizer may not choose an index because column first_name is not in the predicate. However, in this situation the optimizer may choose to use an index skip scan.
If the optimizer determines that the query requires most of the blocks in the table, then it uses a full table scan, even though indexes are available. Full table scans can use larger I/O calls. Making fewer large I/O calls is cheaper than making many smaller calls.
The table statistics are stale.
For example, a table was small, but now has grown large. If the table statistics are stale and do not reflect the current size of the table, then the optimizer does not know that an index is now most efficient than a full table scan.
The table is small.
If a table contains fewer than n blocks under the high water mark, where n equals the setting for the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter, then a full table scan may be cheaper than an index range scan. The scan may be less expensive regardless of the fraction of tables being accessed or indexes present.
The table has a high degree of parallelism.
A high degree of parallelism for a table skews the optimizer toward full table scans over range scans. Query the value in the ALL_TABLES.DEGREE column to determine the degree of parallelism.
The query uses a full table scan hint.
The hint FULL( table alias ) instructs the optimizer to use a full table scan.
8.2.2.2 How a Full Table Scan Works
In a full table scan, the database sequentially reads every formatted block under the high water mark. The database reads each block only once.
The following graphic depicts a scan of a table segment, showing how the scan skips unformatted blocks below the high water mark.
Because the blocks are adjacent, the database can speed up the scan by making I/O calls larger than a single block, known as a multiblock read . The size of a read call ranges from one block to the number of blocks specified by the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter. For example, setting this parameter to 4 instructs the database to read up to 4 blocks in a single call.
The algorithms for caching blocks during full table scans are complex. For example, the database caches blocks differently depending on whether tables are small or large.
Oracle Database Concepts for an overview of the default caching mode
Oracle Database Reference to learn about the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter
8.2.2.3 Full Table Scan: Example
This example scans the hr.employees table.
The following statement queries monthly salaries over $4000:
The following plan was retrieved using the DBMS_XPLAN.DISPLAY_CURSOR function. Because no index exists on the salary column, the optimizer cannot use an index range scan, and so uses a full table scan.
8.2.3 Table Access by Rowid
A rowid is an internal representation of the storage location of data.
The rowid of a row specifies the data file and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row because it specifies the exact location of the row in the database.
Rowids can change between versions. Accessing data based on position is not recommended because rows can move.
8.2.3.1 When the Optimizer Chooses Table Access by Rowid
In most cases, the database accesses a table by rowid after a scan of one or more indexes.
However, table access by rowid need not follow every index scan. If the index contains all needed columns, then access by rowid might not occur.
8.2.3.2 How Table Access by Rowid Works
To access a table by rowid, the database performs multiple steps.
The database does the following:
Obtains the rowids of the selected rows, either from the statement WHERE clause or through an index scan of one or more indexes
Table access may be needed for columns in the statement not present in the index.
Locates each selected row in the table based on its rowid
8.2.3.3 Table Access by Rowid: Example
This example demonstrates rowid access of the hr.employees table.
Assume that you run the following query:
Step 2 of the following plan shows a range scan of the emp_emp_id_pk index on the hr.employees table. The database uses the rowids obtained from the index to find the corresponding rows from the employees table, and then retrieve them. The BATCHED access shown in Step 1 means that the database retrieves a few rowids from the index, and then attempts to access rows in block order to improve the clustering and reduce the number of times that the database must access a block.
8.2.4 Sample Table Scans
A sample table scan retrieves a random sample of data from a simple table or a complex SELECT statement, such as a statement involving joins and views.
8.2.4.1 When the Optimizer Chooses a Sample Table Scan
The database uses a sample table scan when a statement FROM clause includes the SAMPLE keyword.
The SAMPLE clause has the following forms:
The database reads a specified percentage of rows in the table to perform a sample table scan.
SAMPLE BLOCK ( sample_percent )
The database reads a specified percentage of table blocks to perform a sample table scan.
The sample_percent specifies the percentage of the total row or block count to include in the sample. The value must be in the range .000001 up to, but not including, 100 . This percentage indicates the probability of each row, or each cluster of rows in block sampling, being selected for the sample. It does not mean that the database retrieves exactly sample_percent of the rows.
Block sampling is possible only during full table scans or index fast full scans. If a more efficient execution path exists, then the database does not sample blocks. To guarantee block sampling for a specific table or index, use the FULL or INDEX_FFS hint.
Sample Table Scans
A sample table scan retrieves a random sample of data from a simple table or a complex SELECT statement, such as a statement involving joins and views.
When the Optimizer Chooses a Sample Table Scan
The database uses a sample table scan when a statement FROM clause includes the SAMPLE keyword.
The SAMPLE clause has the following forms:
The database reads a specified percentage of rows in the table to perform a sample table scan.
SAMPLE BLOCK ( sample_percent )
The database reads a specified percentage of table blocks to perform a sample table scan.
The sample_percent specifies the percentage of the total row or block count to include in the sample. The value must be in the range .000001 up to, but not including, 100 . This percentage indicates the probability of each row, or each cluster of rows in block sampling, being selected for the sample. It does not mean that the database retrieves exactly sample_percent of the rows.
Block sampling is possible only during full table scans or index fast full scans. If a more efficient execution path exists, then the database does not sample blocks. To guarantee block sampling for a specific table or index, use the FULL or INDEX_FFS hint.
An access path is a technique used by a query to retrieve rows from a row source.
This chapter contains the following topics:
8.1 Introduction to Access Paths
A row source is a set of rows returned by a step in an execution plan. A row source can be a table, view, or result of a join or grouping operation.
A unary operation such as an access path , which is a technique used by a query to retrieve rows from a row source, accepts a single row source as input. For example, a full table scan is the retrieval of rows of a single row source. In contrast, a join operation is binary and receives inputs from two row sources
The database uses different access paths for different relational data structures. The following table summarizes common access paths for the major data structures.
Bitmap Index Range Scans
The optimizer considers different possible execution plans, and then assigns each plan a cost . The optimizer chooses the plan with the lowest cost. In general, index access paths are more efficient for statements that retrieve a small subset of table rows, whereas full table scans are more efficient when accessing a large portion of a table.
Oracle Database Concepts for an overview of these structures
Full Table Scans
A full table scan reads all rows from a table, and then filters out those rows that do not meet the selection criteria.
When the Optimizer Considers a Full Table Scan
In general, the optimizer chooses a full table scan when it cannot use a different access path, or another usable access path is higher cost.
The following table shows typical reasons for choosing a full table scan.
No index exists.
If no index exists, then the optimizer uses a full table scan.
The query predicate applies a function to the indexed column.
Unless the index is a function-based index, the database indexes the values of the column, not the values of the column with the function applied. A typical application-level mistake is to index a character column, such as char_col , and then query the column using syntax such as WHERE char_col=1 . The database implicitly applies a TO_NUMBER function to the constant number 1 , which prevents use of the index.
A SELECT COUNT(*) query is issued, and an index exists, but the indexed column contains nulls.
The optimizer cannot use the index to count the number of table rows because the index cannot contain null entries.
The query predicate does not use the leading edge of a B-tree index.
For example, an index might exist on employees(first_name,last_name) . If a user issues a query with the predicate WHERE last_name='KING' , then the optimizer may not choose an index because column first_name is not in the predicate. However, in this situation the optimizer may choose to use an index skip scan.
If the optimizer determines that the query requires most of the blocks in the table, then it uses a full table scan, even though indexes are available. Full table scans can use larger I/O calls. Making fewer large I/O calls is cheaper than making many smaller calls.
The table statistics are stale.
For example, a table was small, but now has grown large. If the table statistics are stale and do not reflect the current size of the table, then the optimizer does not know that an index is now most efficient than a full table scan.
The table is small.
If a table contains fewer than n blocks under the high water mark, where n equals the setting for the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter, then a full table scan may be cheaper than an index range scan. The scan may be less expensive regardless of the fraction of tables being accessed or indexes present.
The table has a high degree of parallelism.
A high degree of parallelism for a table skews the optimizer toward full table scans over range scans. Query the value in the ALL_TABLES.DEGREE column to determine the degree of parallelism.
The query uses a full table scan hint.
The hint FULL( table alias ) instructs the optimizer to use a full table scan.
How a Full Table Scan Works
In a full table scan, the database sequentially reads every formatted block under the high water mark. The database reads each block only once.
The following graphic depicts a scan of a table segment, showing how the scan skips unformatted blocks below the high water mark.
Because the blocks are adjacent, the database can speed up the scan by making I/O calls larger than a single block, known as a multiblock read . The size of a read call ranges from one block to the number of blocks specified by the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter. For example, setting this parameter to 4 instructs the database to read up to 4 blocks in a single call.
The algorithms for caching blocks during full table scans are complex. For example, the database caches blocks differently depending on whether tables are small or large.
Oracle Database Concepts for an overview of the default caching mode
Oracle Database Reference to learn about the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter
Full Table Scan: Example
The following statement queries salaries over 4000 in the hr.employees table:
The following plan was retrieved using the DBMS_XPLAN.DISPLAY_CURSOR function. Because no index exists on the salary column, the optimizer cannot use an index range scan, and so uses a full table scan.
Table Access Paths
A table is the basic unit of data organization in an Oracle database.
Relational tables are the most common table type. Relational tables have with the following organizational characteristics:
A heap-organized table does not store rows in any particular order.
An index-organized table orders rows according to the primary key values.
An external table is a read-only table whose metadata is stored in the database but whose data is stored outside the database.
This section explains optimizer access paths for heap-organized tables, and contains the following topics:
Oracle Database Concepts for an overview of tables
About Heap-Organized Table Access
By default, a table is organized as a heap, which means that the database places rows where they fit best rather than in a user-specified order.
As users add rows, the database places the rows in the first available free space in the data segment. Rows are not guaranteed to be retrieved in the order in which they were inserted.
Row Storage in Data Blocks and Segments: A Primer
The database stores rows in data blocks. In tables, the database can write a row anywhere in the bottom part of the block. Oracle Database uses the block overhead, which contains the row directory and table directory, to manage the block itself.
An extent is made up of logically contiguous data blocks. The blocks may not be physically contiguous on disk. A segment is a set of extents that contains all the data for a logical storage structure within a tablespace. For example, Oracle Database allocates one or more extents to form the data segment for a table. The database also allocates one or more extents to form the index segment for a table.
By default, the database uses automatic segment space management (ASSM) for permanent, locally managed tablespaces. When a session first inserts data into a table, the database formats a bitmap block. The bitmap tracks the blocks in the segment. The database uses the bitmap to find free blocks and then formats each block before writing to it. ASSM spread out inserts among blocks to avoid concurrency issues.
The high water mark (HWM) is the point in a segment beyond which data blocks are unformatted and have never been used. Below the HWM, a block may be formatted and written to, formatted and empty, or unformatted. The low high water mark (low HWM) marks the point below which all blocks are known to be formatted because they either contain data or formerly contained data.
During a full table scan, the database reads all blocks up to the low HWM, which are known to be formatted, and then reads the segment bitmap to determine which blocks between the HWM and low HWM are formatted and safe to read. The database knows not to read past the HWM because these blocks are unformatted.
Oracle Database Concepts to learn about data block storage
Importance of Rowids for Row Access
Every row in a heap-organized table has a rowid unique to this table that corresponds to the physical address of a row piece. A rowid is a 10-byte physical address of a row.
The rowid points to a specific file, block, and row number. For example, in the rowid AAAPecAAFAAAABSAAA , the final AAA represents the row number. The row number is an index into a row directory entry. The row directory entry contains a pointer to the location of the row on the block.
The database can sometimes move a row in the bottom part of the block. For example, if row movement is enabled, then the row can move because of partition key updates, Flashback Table operations, shrink table operations, and so on. If the database moves a row within a block, then the database updates the row directory entry to modify the pointer. The rowid stays constant.
Oracle Database uses rowids internally for the construction of indexes. For example, each key in a B-tree index is associated with a rowid that points to the address of the associated row. Physical rowids provide the fastest possible access to a table row, enabling the database to retrieve a row in as little as a single I/O.
Direct Path Reads
In a direct path read , the database reads buffers from disk directly into the PGA, bypassing the SGA entirely.
The following figure shows the difference between scattered and sequential reads, which store buffers in the SGA, and direct path reads.
Situations in which Oracle Database may perform direct path reads include:
Execution of a CREATE TABLE AS SELECT statement
Execution of an ALTER REBUILD or ALTER MOVE statement
Reads from a temporary tablespace
Reads from a LOB segment
Oracle Database Performance Tuning Guide to learn about wait events for direct path reads
8.1 Introduction to Access Paths
A row source is a set of rows returned by a step in an execution plan. A row source can be a table, view, or result of a join or grouping operation.
A unary operation such as an access path , which is a technique used by a query to retrieve rows from a row source, accepts a single row source as input. For example, a full table scan is the retrieval of rows of a single row source. In contrast, a join is binary and receives inputs from exactly two row sources
The database uses different access paths for different relational data structures. The following table summarizes common access paths for the major data structures.
Bitmap Index Range Scans
The optimizer considers different possible execution plans, and then assigns each plan a cost . The optimizer chooses the plan with the lowest cost. In general, index access paths are more efficient for statements that retrieve a small subset of table rows, whereas full table scans are more efficient when accessing a large portion of a table.
Oracle Database Concepts for an overview of these structures
8.2 Table Access Paths
A table is the basic unit of data organization in an Oracle database.
Relational tables are the most common table type. Relational tables have with the following organizational characteristics:
A heap-organized table does not store rows in any particular order.
An index-organized table orders rows according to the primary key values.
An external table is a read-only table whose metadata is stored in the database but whose data is stored outside the database.
This section explains optimizer access paths for heap-organized tables, and contains the following topics:
Oracle Database Concepts for an overview of tables
8.2.1 About Heap-Organized Table Access
By default, a table is organized as a heap, which means that the database places rows where they fit best rather than in a user-specified order.
As users add rows, the database places the rows in the first available free space in the data segment. Rows are not guaranteed to be retrieved in the order in which they were inserted.
This section contains the following topics:
8.2.1.1 Row Storage in Data Blocks and Segments: A Primer
The database stores rows in data blocks. In tables, the database can write a row anywhere in the bottom part of the block. Oracle Database uses the block overhead, which contains the row directory and table directory, to manage the block itself.
An extent is made up of logically contiguous data blocks. The blocks may not be physically contiguous on disk. A segment is a set of extents that contains all the data for a logical storage structure within a tablespace. For example, Oracle Database allocates one or more extents to form the data segment for a table. The database also allocates one or more extents to form the index segment for a table.
By default, the database uses automatic segment space management (ASSM) for permanent, locally managed tablespaces. When a session first inserts data into a table, the database formats a bitmap block. The bitmap tracks the blocks in the segment. The database uses the bitmap to find free blocks and then formats each block before writing to it. ASSM spread out inserts among blocks to avoid concurrency issues.
The high water mark (HWM) is the point in a segment beyond which data blocks are unformatted and have never been used. Below the HWM, a block may be formatted and written to, formatted and empty, or unformatted. The low high water mark (low HWM) marks the point below which all blocks are known to be formatted because they either contain data or formerly contained data.
During a full table scan, the database reads all blocks up to the low HWM, which are known to be formatted, and then reads the segment bitmap to determine which blocks between the HWM and low HWM are formatted and safe to read. The database knows not to read past the HWM because these blocks are unformatted.
Oracle Database Concepts to learn about data block storage
8.2.1.2 Importance of Rowids for Row Access
Every row in a heap-organized table has a rowid unique to this table that corresponds to the physical address of a row piece. A rowid is a 10-byte physical address of a row.
The rowid points to a specific file, block, and row number. For example, in the rowid AAAPecAAFAAAABSAAA , the final AAA represents the row number. The row number is an index into a row directory entry. The row directory entry contains a pointer to the location of the row on the block.
The database can sometimes move a row in the bottom part of the block. For example, if row movement is enabled, then the row can move because of partition key updates, Flashback Table operations, shrink table operations, and so on. If the database moves a row within a block, then the database updates the row directory entry to modify the pointer. The rowid stays constant.
Oracle Database uses rowids internally for the construction of indexes. For example, each key in a B-tree index is associated with a rowid that points to the address of the associated row. Physical rowids provide the fastest possible access to a table row, enabling the database to retrieve a row in as little as a single I/O.
8.2.1.3 Direct Path Reads
In a direct path read, the database reads buffers from disk directly into the PGA, bypassing the SGA entirely.
The following figure shows the difference between scattered and sequential reads, which store buffers in the SGA, and direct path reads.
Situations in which Oracle Database may perform direct path reads include:
Execution of a CREATE TABLE AS SELECT statement
Execution of an ALTER REBUILD or ALTER MOVE statement
Reads from a temporary tablespace
Reads from a LOB segment
Oracle Database Performance Tuning Guide to learn about wait events for direct path reads
8.2.2 Full Table Scans
A full table scan reads all rows from a table, and then filters out those rows that do not meet the selection criteria.
This section contains the following topics:
8.2.2.1 When the Optimizer Considers a Full Table Scan
In general, the optimizer chooses a full table scan when it cannot use a different access path, or another usable access path is higher cost.
The following table shows typical reasons for choosing a full table scan.
No index exists.
If no index exists, then the optimizer uses a full table scan.
The query predicate applies a function to the indexed column.
Unless the index is a function-based index, the database indexes the values of the column, not the values of the column with the function applied. A typical application-level mistake is to index a character column, such as char_col , and then query the column using syntax such as WHERE char_col=1 . The database implicitly applies a TO_NUMBER function to the constant number 1 , which prevents use of the index.
A SELECT COUNT(*) query is issued, and an index exists, but the indexed column contains nulls.
The optimizer cannot use the index to count the number of table rows because the index cannot contain null entries.
The query predicate does not use the leading edge of a B-tree index.
For example, an index might exist on employees(first_name,last_name) . If a user issues a query with the predicate WHERE last_name='KING' , then the optimizer may not choose an index because column first_name is not in the predicate. However, in this situation the optimizer may choose to use an index skip scan.
If the optimizer determines that the query requires most of the blocks in the table, then it uses a full table scan, even though indexes are available. Full table scans can use larger I/O calls. Making fewer large I/O calls is cheaper than making many smaller calls.
The table statistics are stale.
For example, a table was small, but now has grown large. If the table statistics are stale and do not reflect the current size of the table, then the optimizer does not know that an index is now most efficient than a full table scan.
The table is small.
If a table contains fewer than n blocks under the high water mark, where n equals the setting for the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter, then a full table scan may be cheaper than an index range scan. The scan may be less expensive regardless of the fraction of tables being accessed or indexes present.
The table has a high degree of parallelism.
A high degree of parallelism for a table skews the optimizer toward full table scans over range scans. Query the value in the ALL_TABLES.DEGREE column to determine the degree of parallelism.
The query uses a full table scan hint.
The hint FULL( table alias ) instructs the optimizer to use a full table scan.
8.2.2.2 How a Full Table Scan Works
In a full table scan, the database sequentially reads every formatted block under the high water mark. The database reads each block only once.
The following graphic depicts a scan of a table segment, showing how the scan skips unformatted blocks below the high water mark.
Because the blocks are adjacent, the database can speed up the scan by making I/O calls larger than a single block, known as a multiblock read . The size of a read call ranges from one block to the number of blocks specified by the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter. For example, setting this parameter to 4 instructs the database to read up to 4 blocks in a single call.
The algorithms for caching blocks during full table scans are complex. For example, the database caches blocks differently depending on whether tables are small or large.
Oracle Database Concepts for an overview of the default caching mode
Oracle Database Reference to learn about the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter
8.2.2.3 Full Table Scan: Example
This example scans the hr.employees table.
The following statement queries monthly salaries over $4000:
The following plan was retrieved using the DBMS_XPLAN.DISPLAY_CURSOR function. Because no index exists on the salary column, the optimizer cannot use an index range scan, and so uses a full table scan.
8.2.3 Table Access by Rowid
A rowid is an internal representation of the storage location of data.
The rowid of a row specifies the data file and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row because it specifies the exact location of the row in the database.
Rowids can change between versions. Accessing data based on position is not recommended because rows can move.
This section contains the following topics:
8.2.3.1 When the Optimizer Chooses Table Access by Rowid
In most cases, the database accesses a table by rowid after a scan of one or more indexes.
However, table access by rowid need not follow every index scan. If the index contains all needed columns, then access by rowid might not occur.
8.2.3.2 How Table Access by Rowid Works
To access a table by rowid, the database performs multiple steps.
The database does the following:
Obtains the rowids of the selected rows, either from the statement WHERE clause or through an index scan of one or more indexes
Table access may be needed for columns in the statement not present in the index.
Locates each selected row in the table based on its rowid
8.2.3.3 Table Access by Rowid: Example
This example demonstrates rowid access of the hr.employees table.
Assume that you run the following query:
Step 2 of the following plan shows a range scan of the emp_emp_id_pk index on the hr.employees table. The database uses the rowids obtained from the index to find the corresponding rows from the employees table, and then retrieve them. The BATCHED access shown in Step 1 means that the database retrieves a few rowids from the index, and then attempts to access rows in block order to improve the clustering and reduce the number of times that the database must access a block.
8.2.4 Sample Table Scans
A sample table scan retrieves a random sample of data from a simple table or a complex SELECT statement, such as a statement involving joins and views.
This section contains the following topics:
8.2.4.1 When the Optimizer Chooses a Sample Table Scan
The database uses a sample table scan when a statement FROM clause includes the SAMPLE keyword.
The SAMPLE clause has the following forms:
The database reads a specified percentage of rows in the table to perform a sample table scan.
SAMPLE BLOCK ( sample_percent )
The database reads a specified percentage of table blocks to perform a sample table scan.
The sample_percent specifies the percentage of the total row or block count to include in the sample. The value must be in the range .000001 up to, but not including, 100 . This percentage indicates the probability of each row, or each cluster of rows in block sampling, being selected for the sample. It does not mean that the database retrieves exactly sample_percent of the rows.
Block sampling is possible only during full table scans or index fast full scans. If a more efficient execution path exists, then the database does not sample blocks. To guarantee block sampling for a specific table or index, use the FULL or INDEX_FFS hint.
An access path is a technique used by a query to retrieve rows from a row source.
Читайте также: