Проверка целостности базы oracle
Oracle Database includes an advanced fault diagnosability infrastructure for collecting and managing diagnostic data, so as to diagnose and resolve database problems. Diagnostic data includes the trace files, dumps, and core files that are also present in previous releases, plus new types of diagnostic data that enable customers and Oracle Support to identify, investigate, track, and resolve problems quickly and effectively.
Oracle Database includes a fault diagnosability infrastructure for preventing, detecting, diagnosing, and resolving database problems.
You can use the Enterprise Manager Support Workbench (Support Workbench) to investigate and report a problem (critical error), and in some cases, resolve the problem. You can use a "roadmap" that summarizes the typical set of tasks that you must perform.
This section describes various methods to diagnose problems in an Oracle database.
Using the Enterprise Manager Support Workbench (Support Workbench), you can create, edit, and upload custom incident packages. With custom incident packages, you have fine control over the diagnostic data that you send to Oracle Support.
This section describes how to resolve database problems using advisor tools, such as SQL Repair Advisor and Data Recovery Advisor, and the resource management tools, such as the Resource Manager and related APIs.
Обнаружение ошибок в носителях
Повреждения в носителях могут возникать по массе причин, начиная от ошибки пользователя и неполадок в программном обеспечении операционной системы и заканчивая дефектными дисками, ошибками диспетчера логических томов ( Logical Volume Manager — LVM) и неисправными микросхемами памяти. Они могут приводить, в свою очередь, к возникновению повреждений в управляющих файлах, журналах повторного выполнения, словаре данных, табличных данных и данных индексов.
Применение утилиты DBVERIFY
При возникновении подозрений в повреждении блоков данных еще можно использовать поставляемую Oracle утилиту DBVERIFY. Эта утилита запускается на уровне операционной системы. Она выполняет проверку на предмет повреждения структурной целостности файлов базы данных.
Для иллюстрации применения утилиты DBVERIFY ниже приведен пример выполнения верификации файла на платформе Windows (на платформах UNIX команда будет работать точно так же). Администратор базы данных может легко писать для выполнения верификации файлов данных специальный сценарий и затем настраивать для него график регулярного выполнения с помощью crontab. В листинге 1 показаны результаты применения утилиты DBVERIFY.
Этот пример иллюстрирует упрощенный вариант применения утилиты DBVERIFY, которая вызывается командой DBV на платформах Windows и UNIX. Ключевое слово FILE указывает, какой файл данных требуется проверить на предмет повреждения. Согласно приведенному здесь выводу, общее количество страниц, помеченных как поврежденные (Total Pages Marked Corrupt), равняется нулю, а это значит, что в базе данных нет никаких проблем со структурной целостностью.
Overview of RMAN Validation
Validation enables you to check the integrity of your backups.
Purpose of RMAN Validation
The main purpose of RMAN validation is to check for corrupt blocks and missing files. You can also use RMAN to determine whether backups can be restored.
You can use the following RMAN commands to perform validation:
Basic Concepts of RMAN Validation
The database prevents operations that result in unusable backup files or corrupted restored data files.
The database automatically does the following:
Blocks access to data files while they are being restored or recovered
Permits only one restore operation for each data file at a time
Ensures that incremental backups are applied in the correct order
Stores information in backup files to allow detection of corruption
Checks a block every time it is read or written in an attempt to report a corruption as soon as it has been detected
About Checksums and Corrupt Blocks
A corrupt block is a block that has been changed so that it differs from what Oracle Database expects to find.
Block corruptions can be caused by several different failures including, but not limited to the following:
Faulty disks and disk controllers
Oracle Database software defects
DB_BLOCK_CHECKSUM is a database initialization parameter that controls the writing of checksums for the blocks in data files and online redo log files in the database (not backups). If DB_BLOCK_CHECKSUM is typical , then the database computes a checksum for each block during normal operations and stores it in the header of the block before writing it to disk. When the database reads the block from disk later, it recomputes the checksum and compares it to the stored value. If the values do not match, then the block is corrupt.
By default, the BACKUP command computes a checksum for each block and stores it in the backup. The BACKUP command ignores the values of DB_BLOCK_CHECKSUM because this initialization parameter applies to data files in the database, not backups.
About Physical and Logical Block Corruption
In a physical corruption, which is also called a media corruption, the database does not recognize the block at all: the checksum is invalid, the block contains all zeros, or the header and footer of the block do not match.
By default, the BACKUP command computes a checksum for each block and stores it in the backup. If you specify the NOCHECKSUM option, then RMAN does not perform a checksum of the blocks when creating the backup.
In a logical corruption , the contents of the block are logically inconsistent. Examples of logical corruption include corruption of a row piece or index entry. If RMAN detects logical corruption, then it logs the block in the alert log and server session trace file.
By default, RMAN does not check for logical corruption. If you specify CHECK LOGICAL on the BACKUP command, however, then RMAN tests data and index blocks for logical corruption, such as corruption of a row piece or index entry, and log them in the alert log located in the Automatic Diagnostic Repository (ADR) . If you use RMAN with the following configuration when backing up or restoring files, then it detects all types of block corruption that are possible to detect:
In the initialization parameter file of a database, set DB_BLOCK_CHECKSUM=typical so that the database calculates data file checksums automatically (not for backups, but for data files in use by the database)
Do not precede the BACKUP command with SET MAXCORRUPT so that RMAN does not tolerate any unmarked block corruptions.
In a BACKUP command, do not specify the NOCHECKSUM option so that RMAN calculates a checksum when writing backups
In BACKUP and RESTORE commands, specify the CHECK LOGICAL option so that RMAN checks for logical and physical corruption
About Limits for Corrupt Blocks in RMAN Backups
You can use the SET MAXCORRUPT command to set the total number of unmarked corruptions permitted in a file for RMAN backups. The default is zero, meaning that RMAN does not tolerate unmarked corrupt blocks of any kind.
If the MAXCORRUPT limit is exceeded when RMAN encounters an unmarked corrupt block during a backup, then RMAN terminates the backup. Otherwise, RMAN writes the newly detected corrupt block to the backup with a special header indicating that the block is marked corrupt. You can use the VALIDATE command to determine which blocks are marked as corrupt and to find any unmarked corrupt blocks.
Because RMAN allows marked corrupt blocks in a backup, and because RMAN can be instructed to allow unmarked corrupt blocks to be marked as corrupt in the backup (when MAXCORRUPT is used), it is possible to restore a data file that has several blocks marked as corrupt. If you backup this restored data file (assuming no new corruptions have happened), even without MAXCORRUPT setting, the backup succeeds. This is because the previously marked corruptions do not stop RMAN from completing the backup.
About Detecting Block Corruption
Oracle Database supports different techniques for detecting, repairing, and monitoring block corruption.
The technique depends on whether the corruption is interblock corruption or intrablock corruption . In intrablock corruption, the corruption occurs within the block itself. This corruption can be either physical or logical. In an interblock corruption, the corruption occurs between blocks and can only be logical.
For example, the V$DATABASE_BLOCK_CORRUPTION view records intrablock corruptions, while the Automatic Diagnostic Repository (ADR) tracks all types of corruptions. Table 16-1 summarizes how the database treats different types of block corruption.
Table 16-1 Detection, Repair, and Monitoring of Block Corruption
All database utilities detect intrablock corruption, including RMAN (for example, the BACKUP command) and the DBVERIFY utility. If a database process can encounter the ORA-1578 error, then it can detect the corruption and monitor it.
Only DBVERIFY and the ANALYZE statement detect interblock corruption.
The V$DATABASE_BLOCK_CORRUPTION view displays blocks marked corrupt by Oracle Database components such as RMAN commands, ANALYZE , SQL queries, and so on. Any process that encounters an intrablock corruption records the block corruption in this view and in ADR.
The database monitors this type of block corruption in ADR.
Repair techniques include block media recovery , restoring data files, recovering with incremental backups, and block newing. Block media recovery can repair physical corruptions, but not logical corruptions.
Any RMAN command that fixes or detects that a block is repaired updates V$DATABASE_BLOCK_CORRUPTION . For example, RMAN updates the repository at end of successful block media recovery. If a BACKUP , RESTORE , or VALIDATE command detects that a block is no longer corrupted, then it removes the repaired block from the view.
You must fix interblock corruption using manual techniques such as dropping an object, rebuilding an index, and so on.
Настройка параметров инициализации
Установив такой параметр инициализации, как DB_BLOCK_CHECKSUM, можно заставить базу данных Oracle вычислять контрольные суммы (check-summing) для каждого блока данных и сохранять их в заголовках блоков. Тогда при чтении данных эти контрольные суммы сравниваются и выявляются поврежденные блоки данных. В Oracle рекомендуют оставить для параметра DB_BLOCK_CHECKSUM принятое для него по умолчанию значение TYPICAL (равнозначное значению TRUE, которое использовалось в предыдущих версиях). Согласно заявлениям Oracle, использование этой функции в режиме TYPICAL приводит к увеличению накладных расходов всего лишь на 1–2%. Применение ее в другом возможном режиме FULL приводит к увеличению накладных расходов уже на 4–5%.
Параметр DB_BLOCK_CHECKING является более сложным и предусматривает выполнение проверки блоков данных и индексов только тогда, когда они действительно изменяются. Он обнаруживает повреждения до присвоения блокам данных статуса поврежденных. По умолчанию для него используется значение OFF. Другие значения, которые он может принимать: LOW, MEDIUM и FULL. Его применение может приводить к увеличению объема накладных расходов на 1–10%; этот объем напрямую зависит от количества выполняемых в базе данных операций обновления и вставки. При наличии возможности справляться с дополнительными накладными расходами, в СУБД Oracle рекомендуют устанавливать для этого параметра значение FULL. Конфигурировать этот параметр можно в файле init.ora, как показано в следующем примере, где для него выбрано значение LOW:
Его также можно конфигурировать и динамически с помощью оператора ALTER SESSION:
Еще одним параметром инициализации, который можно устанавливать, является DB_ULTRA_SAFE. Этот параметр применяется для управления значениями параметров DB_BLOCK_CHECKSUM и DB_BLOCK_CHECKING. В случае если для него оставляется принятое по умолчанию значение (OFF), база данных устанавливает для обоих связанных с выявлением повреждений параметров значение TYPICAL, что означает выполнение минимальных проверок и, следовательно, меньшее потребление ресурсов ЦП. В случае же установки для него значения DATA_ONLY или DATA_AND_INDEX, база данных будет устанавливать для двух связанных с выявлением повреждений параметров значение FULL, что будет приводить к выполнению более интенсивных проверок на предмет повреждений и, следовательно, большему потреблению ресурсов.
Применение команды ANALYZE
Команду ANALYZE удобно применять для перехвата поврежденных блоков данных. Например, выполнение показанной ниже команды ANALYZE приведет к проверке каждого блока данных в таблице customer и, в случае обнаружения любых поврежденных блоков — добавлению всех подозрительных строк в таблицу invalid_rows:
Помимо выполнения проверки на предмет наличия поврежденных блоков, эта команда еще также проверит, соответствуют ли данные индекса данным таблицы.
9.1 About the Oracle Database Fault Diagnosability Infrastructure
Oracle Database includes a fault diagnosability infrastructure for preventing, detecting, diagnosing, and resolving database problems.
The fault diagnosability infrastructure aids in preventing, detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors such as those caused by code bugs, metadata corruption, and customer data corruption.
A problem is a critical error in a database instance, Oracle Automatic Storage Management (Oracle ASM) instance, or other Oracle product or component. An incident is a single occurrence of a problem.
The fault diagnosability infrastructure consists of several components, including the Automatic Diagnostic Repository (ADR), various logs, trace files, the Enterprise Manager Support Workbench, and the ADRCI Command-Line Utility.
The Automatic Diagnostic Repository (ADR) is a directory structure that is stored outside of the database. It is therefore available for problem diagnosis when the database is down.
9.1.1 Fault Diagnosability Infrastructure Overview
The fault diagnosability infrastructure aids in preventing, detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors such as those caused by code bugs, metadata corruption, and customer data corruption.
When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as trace files) are immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR)—a file-based repository outside the database—where it can later be retrieved by incident number and analyzed.
The goals of the fault diagnosability infrastructure are the following:
Limiting damage and interruptions after a problem is detected
Reducing problem diagnostic time
Reducing problem resolution time
Simplifying customer interaction with Oracle Support
The keys to achieving these goals are the following technologies:
Automatic capture of diagnostic data upon first failure —For critical errors, the ability to capture error information at first-failure greatly increases the chance of a quick problem resolution and reduced downtime. An always-on memory-based tracing system proactively collects diagnostic data from many database components, and can help isolate root causes of problems. Such proactive diagnostic data is similar to the data collected by airplane "black box" flight recorders. When a problem is detected, alerts are generated and the fault diagnosability infrastructure is activated to capture and store diagnostic data. The data is stored in a repository that is outside the database (and therefore available when the database is down), and is easily accessible with command line utilities and Oracle Enterprise Manager Cloud Control (Cloud Control).
Standardized trace formats —Standardizing trace formats across all database components enables DBAs and Oracle Support personnel to use a single set of tools for problem analysis. Problems are more easily diagnosed, and downtime is reduced.
Health checks — Upon detecting a critical error, the fault diagnosability infrastructure can run one or more health checks to perform deeper analysis of a critical error. Health check results are then added to the other diagnostic data collected for the error. Individual health checks look for data block corruptions, undo and redo corruption, data dictionary corruption, and more. As a DBA, you can manually invoke these health checks, either on a regular basis or as required.
Incident packaging service (IPS) and incident packages — The IPS enables you to automatically and easily gather the diagnostic data—traces, dumps, health check reports, and more—pertaining to a critical error and package the data into a zip file for transmission to Oracle Support. Because all diagnostic data relating to a critical error are tagged with that error's incident number, you do not have to search through trace files and other files to determine the files that are required for analysis; the incident packaging service identifies the required files automatically and adds them to the zip file. Before creating the zip file, the IPS first collects diagnostic data into an intermediate logical structure called an incident package (package). Packages are stored in the Automatic Diagnostic Repository. If you choose to, you can access this intermediate logical structure, view and modify its contents, add or remove additional diagnostic data at any time, and when you are ready, create the zip file from the package. After these steps are completed, the zip file is ready to be uploaded to Oracle Support.
Data Recovery Advisor — The Data Recovery Advisor integrates with database health checks and RMAN to display data corruption problems, assess the extent of each problem (critical, high priority, low priority), describe the impact of a problem, recommend repair options, conduct a feasibility check of the customer-chosen option, and automate the repair process.
SQL Test Case Builder — For many SQL-related problems, obtaining a reproducible test case is an important factor in problem resolution speed. The SQL Test Case Builder automates the sometimes difficult and time-consuming process of gathering as much information as possible about the problem and the environment in which it occurred. After quickly gathering this information, you can upload it to Oracle Support to enable support personnel to easily and accurately reproduce the problem.
9.1.2 Incidents and Problems
A problem is a critical error in a database instance, Oracle Automatic Storage Management (Oracle ASM) instance, or other Oracle product or component. An incident is a single occurrence of a problem.
To facilitate diagnosis and resolution of critical errors, the fault diagnosability infrastructure introduces two concepts for Oracle Database: problems and incidents.
It is conceivable that a problem could generate dozens or perhaps hundreds of incidents in a short period of time. This would generate too much diagnostic data, which would consume too much space in the ADR and could possibly slow down your efforts to diagnose and resolve the problem. For these reasons, the fault diagnosability infrastructure applies flood control to incident generation after certain thresholds are reached.
For any problem identified in a database instance, the diagnosability framework can identify related problems across the topology of your Oracle Database installation.
9.1.2.1 About Incidents and Problems
To facilitate diagnosis and resolution of critical errors, the fault diagnosability infrastructure introduces two concepts for Oracle Database: problems and incidents.
A problem is a critical error in a database instance, Oracle Automatic Storage Management (Oracle ASM) instance, or other Oracle product or component. Critical errors manifest as internal errors, such as ORA-00600 , or other severe errors, such as ORA-07445 (operating system exception) or ORA-04031 (out of memory in the shared pool). Problems are tracked in the ADR. Each problem has a problem key , which is a text string that describes the problem. It includes an error code (such as ORA 600 ) and in some cases, one or more error parameters.
An incident is a single occurrence of a problem. When a problem (critical error) occurs multiple times, an incident is created for each occurrence. Incidents are timestamped and tracked in the Automatic Diagnostic Repository (ADR). Each incident is identified by a numeric incident ID, which is unique within the ADR. When an incident occurs, the database:
Makes an entry in the alert log.
Sends an incident alert to Cloud Control.
Gathers first-failure diagnostic data about the incident in the form of dump files (incident dumps).
Tags the incident dumps with the incident ID.
Stores the incident dumps in an ADR subdirectory created for that incident.
Diagnosis and resolution of a critical error usually starts with an incident alert. Incident alerts are displayed on the Cloud Control Database Home page or Oracle Automatic Storage Management Home page. The Database Home page also displays in its Related Alerts section any critical alerts in the Oracle ASM instance or other Oracle products or components. After viewing an alert, you can then view the problem and its associated incidents with Cloud Control or with the ADRCI command-line utility.
9.1.2.2 Incident Flood Control
It is conceivable that a problem could generate dozens or perhaps hundreds of incidents in a short period of time. This would generate too much diagnostic data, which would consume too much space in the ADR and could possibly slow down your efforts to diagnose and resolve the problem. For these reasons, the fault diagnosability infrastructure applies flood control to incident generation after certain thresholds are reached.
A flood-controlled incident is an incident that generates an alert log entry, is recorded in the ADR, but does not generate incident dumps. Flood-controlled incidents provide a way of informing you that a critical error is ongoing, without overloading the system with diagnostic data. You can choose to view or hide flood-controlled incidents when viewing incidents with Cloud Control or the ADRCI command-line utility.
Threshold levels for incident flood control are predetermined and cannot be changed. They are defined as follows:
After five incidents occur for the same problem key in one hour, subsequent incidents for this problem key are flood-controlled. Normal (non-flood-controlled) recording of incidents for that problem key begins again in the next hour.
After 25 incidents occur for the same problem key in one day, subsequent incidents for this problem key are flood-controlled. Normal recording of incidents for that problem key begins again on the next day.
In addition, after 50 incidents for the same problem key occur in one hour, or 250 incidents for the same problem key occur in one day, subsequent incidents for this problem key are not recorded at all in the ADR. In these cases, the database writes a message to the alert log indicating that no further incidents will be recorded. As long as incidents continue to be generated for this problem key, this message is added to the alert log every ten minutes until the hour or the day expires. Upon expiration of the hour or day, normal recording of incidents for that problem key begins again.
9.1.2.3 Related Problems Across the Topology
For any problem identified in a database instance, the diagnosability framework can identify related problems across the topology of your Oracle Database installation.
In a single instance environment, a related problem could be identified in the local Oracle ASM instance. In an Oracle RAC environment, a related problem could be identified in any database instance or Oracle ASM instance on any other node. When investigating problems, you are able to view and gather information on any related problems.
A problem is related to the original problem if it occurs within a designated time period or shares the same execution context identifier. An execution context identifier (ECID) is a globally unique identifier used to tag and track a single call through the Oracle software stack, for example, a call to Oracle Fusion Middleware that then calls into Oracle Database to retrieve data. The ECID is typically generated in the middle tier and is passed to the database as an Oracle Call Interface (OCI) attribute. When a single call has failures on multiple tiers of the Oracle software stack, problems that are generated are tagged with the same ECID so that they can be correlated. You can then determine the tier on which the originating problem occurred.
9.1.3 Fault Diagnosability Infrastructure Components
The fault diagnosability infrastructure consists of several components, including the Automatic Diagnostic Repository (ADR), various logs, trace files, the Enterprise Manager Support Workbench, and the ADRCI Command-Line Utility.
The ADR is a file-based repository for database diagnostic data such as traces, dumps, the alert log, health monitor reports, and more. It has a unified directory structure across multiple instances and multiple products.
The alert log is an XML file that is a chronological log of messages and errors.
Trace files, dumps, and core files contain diagnostic data that are used to investigate problems. They are stored in the ADR.
The data definition language (DDL) log is a file that has the same format and basic behavior as the alert log, but it only contains the DDL statements issued by the database.
An Oracle Database component can detect conditions, states, or events that are unusual, but which do not inhibit correct operation of the detecting component. The component can issue a warning about these conditions, states, or events. The debug log is a file that records these warnings.
In addition to files mentioned in the previous sections, the ADR contains health monitor reports, data repair records, SQL test cases, incident packages, and more. These components are described later in the chapter.
The Enterprise Manager Support Workbench (Support Workbench) is a facility that enables you to investigate, report, and in some cases, repair problems (critical errors), all with an easy-to-use graphical interface.
The ADR Command Interpreter (ADRCI) is a utility that enables you to investigate problems, view health check reports, and package first-failure diagnostic data, all within a command-line environment.
9.1.3.1 Automatic Diagnostic Repository (ADR)
The ADR is a file-based repository for database diagnostic data such as traces, dumps, the alert log, health monitor reports, and more. It has a unified directory structure across multiple instances and multiple products.
The database, Oracle Automatic Storage Management (Oracle ASM), the listener, Oracle Clusterware, and other Oracle products or components store all diagnostic data in the ADR. Each instance of each product stores diagnostic data underneath its own home directory within the ADR. For example, in an Oracle Real Application Clusters environment with shared storage and Oracle ASM, each database instance and each Oracle ASM instance has an ADR home directory. ADR's unified directory structure, consistent diagnostic data formats across products and instances, and a unified set of tools enable customers and Oracle Support to correlate and analyze diagnostic data across multiple instances. With Oracle Clusterware, each host node in the cluster has an ADR home directory.
Because all diagnostic data, including the alert log, are stored in the ADR, the initialization parameters BACKGROUND_DUMP_DEST and USER_DUMP_DEST are deprecated. They are replaced by the initialization parameter DIAGNOSTIC_DEST , which identifies the location of the ADR.
Регулярное создание резервных копий производственной базы данных Oracle является обязательным, но эти резервные копии никак не помогут, если по какой-то причине окажутся непригодными для использования. Этап тестирования резервных копий часто игнорируется в процессах резервного копирования и восстановления. К сожалению, многие администраторы осознают его необходимость, будучи уже в тяжелых обстоятельствах.
Создаваемые резервные копии файлов базы данных могут становиться бесполезными во время восстановления по нескольким причинам: из-за повреждения файлов данных и журналов повторного выполнения, из-за случайного перезаписывания файлов, из-за дефектов на ленте и даже из-за несуществующих файлов. Поэтому лучше обзавестись привычкой регулярно тестировать производственные резервные копии в соответствие с графиком. Это поможет перехватывать любые повреждения данных. Под повреждением подразумевается несоответствие данных тому виду, в котором они должны быть. Здесь интерес главным образом представляет так называемое повреждение блоков (ошибки блоков), которое может быть как логическим, так и физическим.
Инициатива HARD
Применение RAID обеспечивает избыточность только на уровне устройств хранения данных, чтобы позволить при потере нескольких дисков не терять данные. А что если используется система с зеркальным отображением дисков, но данные, записываемые на зеркальную пару дисков, повреждены? Тогда на обоих дисках в зеркальной паре, конечно же, будут содержаться поврежденные данные. Поэтому в Oracle недавно объявили о новой инициативе для предотвращения повреждения данных еще до его возникновения, которая получила название Hardware Assisted Resilient Data (Обеспечение устойчивости данных на уровне аппаратных средств), или просто HARD. В рамках этой инициативы Oracle будет встраивать в устройства хранения, продаваемые участвующими в этой инициативе производителями, специальные алгоритмы верификации данных и тем самым предотвращать окончательную запись поврежденных данных на диск. В частности, инициатива HARD направлена на решение проблем следующего рода:
DBVERIFY is an external command-line utility that performs a physical data structure integrity check.
DBVERIFY can be used on offline or online databases, as well on backup files. You use DBVERIFY primarily when you need to ensure that a backup database (or data file) is valid before it is restored, or as a diagnostic aid when you have encountered data corruption problems. Because DBVERIFY can be run against an offline database, integrity checks are significantly faster.
DBVERIFY checks are limited to cache-managed blocks (that is, data blocks). Because DBVERIFY is only for use with data files, it does not work against control files or redo logs.
There are two command-line interfaces to DBVERIFY. With the first interface, you specify disk blocks of a single data file for checking. With the second interface, you specify a segment for checking. Both interfaces are started with the dbv command. The following sections provide descriptions of these interfaces:
Parent topic: Other Utilities
Checking for Block Corruption with the VALIDATE Command
You can use the VALIDATE command to manually check for physical and logical corruptions in database files.
This command performs the same types of checks as BACKUP VALIDATE , but VALIDATE can check a larger selection of objects. For example, you can validate individual blocks with the VALIDATE DATAFILE . BLOCK command.
To specify a copy number for the backup piece being validated, run the VALIDATE FROM COPY NUMBER command.
When validating whole files, RMAN checks every block of the input files. If the backup validation discovers previously unmarked corrupt blocks, then RMAN updates the V$DATABASE_BLOCK_CORRUPTION view with rows describing the corruptions.
Use VALIDATE BACKUPSET when you suspect that one or more backup pieces in a backup set are missing or have been damaged. This command checks every block in a backup set to ensure that the backup can be restored. If RMAN finds block corruption, then it issues an error and terminates the validation. The command VALIDATE BACKUPSET enables you to choose which backups to check, whereas the VALIDATE option of the RESTORE command lets RMAN choose.
To use VALIDATE to check database files and backups:
Start RMAN and connect to a target database.
Execute the VALIDATE command with the desired options.
For example, to validate all data files and control files (and the server parameter file if one is in use), execute the following command at the RMAN prompt:
Alternatively, you can validate a particular backup set by using the form of the command shown in the following example (sample output included).
The following example illustrates how you can check individual data blocks within a data file for corruption.
Make Parallel the Validation of a Data File
If you must validate a large data file, then RMAN can make the work parallel by dividing the file into sections and processing each file section in parallel. If multiple channels are configured or allocated, and if you want the channels to make parallel the validation, then specify the SECTION SIZE parameter of the VALIDATE command.
If you specify a section size that is larger than the size of the file, then RMAN does not create file sections. If you specify a small section size that would produce more than 256 sections, then RMAN increases the section size to a value that results in exactly 256 sections.
To make parallel the validation of a data file:
- Start RMAN and connect to a target database. The target database must be mounted or open.
- Run VALIDATE with the SECTION SIZE parameter.
The following example allocates two channels and validates a large data file. The section size is 1200 MB.
Применение пакета DBMS_REPAIR
Несмотря на то что утилита DBVERIFY очень проста в применении, использовать ее для исправления поврежденных данных нельзя, а это является очень серьезным ограничением. Поэтому еще в версии Oracle8i появился пакет DBMS_REPAIR, позволяющий не только выявлять, но и исправлять поврежденные блоки данных без перевода файлов данных в автономный режим. Прежде чем использовать этот пакет, нужно войти в систему от имени пользователя SYS и создать две специальные таблицы: одну с приставкой repair_ и вторую с именем orphan_key.
После создания таблицы repair_table пакет DBMS_REPAIR можно запускать. В эту таблицу будет заноситься информация обо всех поврежденных данных. Выполнение содержащейся в пакете DBMS_REPAIR процедуры CHECK_OBJECT будет приводить к выявлению поврежденных блоков и отображению рекомендуемых вариантов для их исправления, а выполнение после процедуры CHECK_OBJECT запроса к таким столбцам таблицы repair_table, как OBJECT_NAME и CORRUPT_DESCRIPTION — выяснить, существуют ли повреждения в блоках данных, и если да, то какого типа.
Различные способы исправления поврежденных блоков данных будут рассматриваться в следующей главе, поскольку они подразумевают проведение восстановления базы данных из резервных копий.
Обнаружение ошибок в блоках данных
Ошибки блоков данных происходят при появлении несогласованных данных в таблицах или индексах СУБД Oracle. Обычно невозможность устранить поврежденные блоки приводит к потере приличного количества данных. Хотя можно предпринимать несколько мер для предотвращения повреждения, своевременное обнаружение поврежденных файлов данных будет тоже помогать.
- Своевременное обнаружение позволяет отыскивать быстрые способы спасения всех или насколько возможно большей части пострадавших данных.
- Своевременное обнаружение избавляет от сюрпризов при выполнении восстановления после ошибок на уровне носителей, поскольку будет сводить к минимуму проблему, позволяя переводить файлы в автономный режим и тем самым сокращать потенциальный объем ущерба.
Существует несколько методов, которые можно применять для обнаружения повреждений в блоках данных БД Oracle. Во-первых, можно устанавливать несколько специальных параметров инициализации и тем самым обеспечивать возможность перехвата информации о поврежденных блоках. Во-вторых, можно использовать утилиты наподобие DBVERIFY и DBMS_REPAIR или команду ANALYZE и тем самым обеспечивать возможность выявления повреждений в блоках данных. Эти методы не являются взаимоисключающими; напротив, их следует рассматривать как дополнения друг к другу, поскольку каждый обладает своими собственными привлекательными возможностями. В следующих подразделах более подробно рассказывается о том, как применять каждый из этих приемов.
20.1 Using DBVERIFY to Validate Disk Blocks of a Single Data File
In this mode, DBVERIFY scans one or more disk blocks of a single data file and performs page checks.
If the file you are verifying is an Oracle Automatic Storage Management (Oracle ASM) file, then you must supply a USERID . This is because DBVERIFY needs to connect to an Oracle instance to access Oracle ASM files.
20.1.1 DBVERIFY Syntax When Validating Blocks of a Single File
The syntax for DBVERIFY when you want to validate disk blocks of a single data file is as follows:
20.1.2 DBVERIFY Parameters When Validating Blocks of a Single File
Descriptions of the DBVERIFY parameters used to validate blocks of a single file are as follows:
Specifies your username and password.
This parameter is only necessary when the files being verified are Oracle ASM files.
If you do specify this parameter, both a username and a password must be entered; otherwise a DBV-00112: USERID incorrectly specified error is returned.
The name of the database file to verify.
The starting block address to verify. Specify block addresses in Oracle blocks (as opposed to operating system blocks). If you do not specify START , then DBVERIFY defaults to the first block in the file.
The ending block address to verify. If you do not specify END , then DBVERIFY defaults to the last block in the file.
BLOCKSIZE is required only if the file to be verified does not have a block size of 2 KB. If the file does not have block size of 2 KB and you do not specify BLOCKSIZE , then you will receive the error DBV-00103.
When a value is specified for HIGH_SCN , DBVERIFY writes diagnostic messages for each block whose block-level SCN exceeds the value specified.
This parameter is optional. There is no default.
Specifies the file to which logging information should be written. The default sends output to the terminal display.
Causes DBVERIFY to send a progress display to the terminal in the form of a single period (.) for n number of pages verified during the DBVERIFY run. If n = 0, then there is no progress display.
Provides online help.
Specifies the name of the parameter file to use. You can store various values for DBVERIFY parameters in flat files. This enables you to customize parameter files to handle different types of data files and to perform specific types of integrity checks on data files.
20.1.3 Sample DBVERIFY Output For a Single Data File
The following is a sample verification of the file t_db1.dbf. The feedback parameter has been given the value 100 to display one period (.) for every 100 pages processed. A portion of the resulting output is also shown.
Total Pages Examined = number of blocks in the file
Total Pages Processed = number of blocks that were verified (formatted blocks)
Total Pages Failing (Data) = number of blocks that failed the data block checking routine
Total Pages Failing (Index) = number of blocks that failed the index block checking routine
Total Pages Marked Corrupt = number of blocks for which the cache header is invalid, thereby making it impossible for DBVERIFY to identify the block type
Total Pages Influx = number of blocks that are being read and written to at the same time. If the database is open when DBVERIFY is run, then DBVERIFY reads blocks multiple times to get a consistent image. But because the database is open, there may be blocks that are being read and written to at the same time (INFLUX). DBVERIFY cannot get a consistent image of pages that are in flux.
9.1 About the Oracle Database Fault Diagnosability Infrastructure
Oracle Database includes a fault diagnosability infrastructure for preventing, detecting, diagnosing, and resolving database problems.
The fault diagnosability infrastructure aids in preventing, detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors such as those caused by code bugs, metadata corruption, and customer data corruption.
A problem is a critical error in a database instance, Oracle Automatic Storage Management (Oracle ASM) instance, or other Oracle product or component. An incident is a single occurrence of a problem.
The fault diagnosability infrastructure consists of several components, including the Automatic Diagnostic Repository (ADR), various logs, trace files, the Enterprise Manager Support Workbench, and the ADRCI Command-Line Utility.
The Automatic Diagnostic Repository (ADR) is a directory structure that is stored outside of the database. It is therefore available for problem diagnosis when the database is down.
9.1.1 Fault Diagnosability Infrastructure Overview
The fault diagnosability infrastructure aids in preventing, detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors such as those caused by code bugs, metadata corruption, and customer data corruption.
When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as trace files) are immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR)—a file-based repository outside the database—where it can later be retrieved by incident number and analyzed.
The goals of the fault diagnosability infrastructure are the following:
Limiting damage and interruptions after a problem is detected
Reducing problem diagnostic time
Reducing problem resolution time
Simplifying customer interaction with Oracle Support
The keys to achieving these goals are the following technologies:
Automatic capture of diagnostic data upon first failure —For critical errors, the ability to capture error information at first-failure greatly increases the chance of a quick problem resolution and reduced downtime. An always-on memory-based tracing system proactively collects diagnostic data from many database components, and can help isolate root causes of problems. Such proactive diagnostic data is similar to the data collected by airplane "black box" flight recorders. When a problem is detected, alerts are generated and the fault diagnosability infrastructure is activated to capture and store diagnostic data. The data is stored in a repository that is outside the database (and therefore available when the database is down), and is easily accessible with command line utilities and Oracle Enterprise Manager Cloud Control (Cloud Control).
Standardized trace formats —Standardizing trace formats across all database components enables DBAs and Oracle Support personnel to use a single set of tools for problem analysis. Problems are more easily diagnosed, and downtime is reduced.
Health checks — Upon detecting a critical error, the fault diagnosability infrastructure can run one or more health checks to perform deeper analysis of a critical error. Health check results are then added to the other diagnostic data collected for the error. Individual health checks look for data block corruptions, undo and redo corruption, data dictionary corruption, and more. As a DBA, you can manually invoke these health checks, either on a regular basis or as required.
Incident packaging service (IPS) and incident packages — The IPS enables you to automatically and easily gather the diagnostic data—traces, dumps, health check reports, and more—pertaining to a critical error and package the data into a zip file for transmission to Oracle Support. Because all diagnostic data relating to a critical error are tagged with that error's incident number, you do not have to search through trace files and other files to determine the files that are required for analysis; the incident packaging service identifies the required files automatically and adds them to the zip file. Before creating the zip file, the IPS first collects diagnostic data into an intermediate logical structure called an incident package (package). Packages are stored in the Automatic Diagnostic Repository. If you choose to, you can access this intermediate logical structure, view and modify its contents, add or remove additional diagnostic data at any time, and when you are ready, create the zip file from the package. After these steps are completed, the zip file is ready to be uploaded to Oracle Support.
Data Recovery Advisor — The Data Recovery Advisor integrates with database health checks and RMAN to display data corruption problems, assess the extent of each problem (critical, high priority, low priority), describe the impact of a problem, recommend repair options, conduct a feasibility check of the customer-chosen option, and automate the repair process.
SQL Test Case Builder — For many SQL-related problems, obtaining a reproducible test case is an important factor in problem resolution speed. The SQL Test Case Builder automates the sometimes difficult and time-consuming process of gathering as much information as possible about the problem and the environment in which it occurred. After quickly gathering this information, you can upload it to Oracle Support to enable support personnel to easily and accurately reproduce the problem.
9.1.2 Incidents and Problems
A problem is a critical error in a database instance, Oracle Automatic Storage Management (Oracle ASM) instance, or other Oracle product or component. An incident is a single occurrence of a problem.
To facilitate diagnosis and resolution of critical errors, the fault diagnosability infrastructure introduces two concepts for Oracle Database: problems and incidents.
It is conceivable that a problem could generate dozens or perhaps hundreds of incidents in a short period of time. This would generate too much diagnostic data, which would consume too much space in the ADR and could possibly slow down your efforts to diagnose and resolve the problem. For these reasons, the fault diagnosability infrastructure applies flood control to incident generation after certain thresholds are reached.
For any problem identified in a database instance, the diagnosability framework can identify related problems across the topology of your Oracle Database installation.
9.1.2.1 About Incidents and Problems
To facilitate diagnosis and resolution of critical errors, the fault diagnosability infrastructure introduces two concepts for Oracle Database: problems and incidents.
A problem is a critical error in a database instance, Oracle Automatic Storage Management (Oracle ASM) instance, or other Oracle product or component. Critical errors manifest as internal errors, such as ORA-00600 , or other severe errors, such as ORA-07445 (operating system exception) or ORA-04031 (out of memory in the shared pool). Problems are tracked in the ADR. Each problem has a problem key , which is a text string that describes the problem. It includes an error code (such as ORA 600 ) and in some cases, one or more error parameters.
An incident is a single occurrence of a problem. When a problem (critical error) occurs multiple times, an incident is created for each occurrence. Incidents are timestamped and tracked in the Automatic Diagnostic Repository (ADR). Each incident is identified by a numeric incident ID, which is unique within the ADR. When an incident occurs, the database:
Makes an entry in the alert log.
Sends an incident alert to Cloud Control.
Gathers first-failure diagnostic data about the incident in the form of dump files (incident dumps).
Tags the incident dumps with the incident ID.
Stores the incident dumps in an ADR subdirectory created for that incident.
Diagnosis and resolution of a critical error usually starts with an incident alert. Incident alerts are displayed on the Cloud Control Database Home page or Oracle Automatic Storage Management Home page. The Database Home page also displays in its Related Alerts section any critical alerts in the Oracle ASM instance or other Oracle products or components. After viewing an alert, you can then view the problem and its associated incidents with Cloud Control or with the ADRCI command-line utility.
9.1.2.2 Incident Flood Control
It is conceivable that a problem could generate dozens or perhaps hundreds of incidents in a short period of time. This would generate too much diagnostic data, which would consume too much space in the ADR and could possibly slow down your efforts to diagnose and resolve the problem. For these reasons, the fault diagnosability infrastructure applies flood control to incident generation after certain thresholds are reached.
A flood-controlled incident is an incident that generates an alert log entry, is recorded in the ADR, but does not generate incident dumps. Flood-controlled incidents provide a way of informing you that a critical error is ongoing, without overloading the system with diagnostic data. You can choose to view or hide flood-controlled incidents when viewing incidents with Cloud Control or the ADRCI command-line utility.
Threshold levels for incident flood control are predetermined and cannot be changed. They are defined as follows:
After five incidents occur for the same problem key in one hour, subsequent incidents for this problem key are flood-controlled. Normal (non-flood-controlled) recording of incidents for that problem key begins again in the next hour.
After 25 incidents occur for the same problem key in one day, subsequent incidents for this problem key are flood-controlled. Normal recording of incidents for that problem key begins again on the next day.
In addition, after 50 incidents for the same problem key occur in one hour, or 250 incidents for the same problem key occur in one day, subsequent incidents for this problem key are not recorded at all in the ADR. In these cases, the database writes a message to the alert log indicating that no further incidents will be recorded. As long as incidents continue to be generated for this problem key, this message is added to the alert log every ten minutes until the hour or the day expires. Upon expiration of the hour or day, normal recording of incidents for that problem key begins again.
9.1.2.3 Related Problems Across the Topology
For any problem identified in a database instance, the diagnosability framework can identify related problems across the topology of your Oracle Database installation.
In a single instance environment, a related problem could be identified in the local Oracle ASM instance. In an Oracle RAC environment, a related problem could be identified in any database instance or Oracle ASM instance on any other node. When investigating problems, you are able to view and gather information on any related problems.
A problem is related to the original problem if it occurs within a designated time period or shares the same execution context identifier. An execution context identifier (ECID) is a globally unique identifier used to tag and track a single call through the Oracle software stack, for example, a call to Oracle Fusion Middleware that then calls into Oracle Database to retrieve data. The ECID is typically generated in the middle tier and is passed to the database as an Oracle Call Interface (OCI) attribute. When a single call has failures on multiple tiers of the Oracle software stack, problems that are generated are tagged with the same ECID so that they can be correlated. You can then determine the tier on which the originating problem occurred.
9.1.3 Fault Diagnosability Infrastructure Components
The fault diagnosability infrastructure consists of several components, including the Automatic Diagnostic Repository (ADR), various logs, trace files, the Enterprise Manager Support Workbench, and the ADRCI Command-Line Utility.
The ADR is a file-based repository for database diagnostic data such as traces, dumps, the alert log, health monitor reports, and more. It has a unified directory structure across multiple instances and multiple products.
The alert log is an XML file that is a chronological log of messages and errors.
Trace files, dumps, and core files contain diagnostic data that are used to investigate problems. They are stored in the ADR.
The data definition language (DDL) log is a file that has the same format and basic behavior as the alert log, but it only contains the DDL statements issued by the database.
An Oracle Database component can detect conditions, states, or events that are unusual, but which do not inhibit correct operation of the detecting component. The component can issue a warning about these conditions, states, or events. The debug log is a file that records these warnings.
In addition to files mentioned in the previous sections, the ADR contains health monitor reports, data repair records, SQL test cases, incident packages, and more. These components are described later in the chapter.
The Enterprise Manager Support Workbench (Support Workbench) is a facility that enables you to investigate, report, and in some cases, repair problems (critical errors), all with an easy-to-use graphical interface.
The ADR Command Interpreter (ADRCI) is a utility that enables you to investigate problems, view health check reports, and package first-failure diagnostic data, all within a command-line environment.
9.1.3.1 Automatic Diagnostic Repository (ADR)
The ADR is a file-based repository for database diagnostic data such as traces, dumps, the alert log, health monitor reports, and more. It has a unified directory structure across multiple instances and multiple products.
The database, Oracle Automatic Storage Management (Oracle ASM), the listener, Oracle Clusterware, and other Oracle products or components store all diagnostic data in the ADR. Each instance of each product stores diagnostic data underneath its own home directory within the ADR. For example, in an Oracle Real Application Clusters environment with shared storage and Oracle ASM, each database instance and each Oracle ASM instance has an ADR home directory. ADR's unified directory structure, consistent diagnostic data formats across products and instances, and a unified set of tools enable customers and Oracle Support to correlate and analyze diagnostic data across multiple instances. With Oracle Clusterware, each host node in the cluster has an ADR home directory.
Because all diagnostic data, including the alert log, are stored in the ADR, the initialization parameters BACKGROUND_DUMP_DEST and USER_DUMP_DEST are deprecated. They are replaced by the initialization parameter DIAGNOSTIC_DEST , which identifies the location of the ADR.
Регулярное создание резервных копий производственной базы данных Oracle является обязательным, но эти резервные копии никак не помогут, если по какой-то причине окажутся непригодными для использования. Этап тестирования резервных копий часто игнорируется в процессах резервного копирования и восстановления. К сожалению, многие администраторы осознают его необходимость, будучи уже в тяжелых обстоятельствах.
Создаваемые резервные копии файлов базы данных могут становиться бесполезными во время восстановления по нескольким причинам: из-за повреждения файлов данных и журналов повторного выполнения, из-за случайного перезаписывания файлов, из-за дефектов на ленте и даже из-за несуществующих файлов. Поэтому лучше обзавестись привычкой регулярно тестировать производственные резервные копии в соответствие с графиком. Это поможет перехватывать любые повреждения данных. Под повреждением подразумевается несоответствие данных тому виду, в котором они должны быть. Здесь интерес главным образом представляет так называемое повреждение блоков (ошибки блоков), которое может быть как логическим, так и физическим.
20.2 Using DBVERIFY to Validate a Segment
In this mode, DBVERIFY enables you to specify a table segment or index segment for verification. It checks to ensure that a row chain pointer is within the segment being verified.
This mode requires that you specify a segment (data or index) to be validated. It also requires that you log on to the database with SYSDBA privileges, because information about the segment must be retrieved from the database.
During this mode, the segment is locked. If the specified segment is an index, then the parent table is locked. Note that some indexes, such as IOTs, do not have parent tables.
20.2.1 DBVERIFY Syntax When Validating a Segment
The syntax for DBVERIFY when you want to validate a segment is as follows:
Команда ANALYZE используется для проверки блоков данных в анализируемом объекте. Если обнаружено повреждение, информация о нем помещается в таблицу INVALID_ROWS.
DB_BLOCK_CHECKING
Когда этот параметр установлен в значение TRUE, Oracle проверяет корректность содержимого блока. Но не все так просто с этим параметром. Проверка эта потребляет довольно много ресурсов, следовательно установка этого параметра в значение TRUE должно быть хорошо продумана.
RMAN VALIDATE
Для Oracle 11g с помощью RMAN можно проверить datafiles, tablespaces или всю базу данных следующим образом:
Все поврежденные блоки можно увидеть в представлении V$DATABASE_BLOCK_CORRUPTION.
DBMS_REPAIR
- этот пакет позволяет обнаружить и устранить коррупцию.
Назначение основных процедур в пакете DBMS_REPAIR:
- CHECK_OBJECT – проверка и сбор информации о поврежденных блоках в таблице или индексе
- FIX_CORRUPT_BLOCKS - помечает выявленные с помощью CHECK_OBJECT блоки, как логически поврежденные
- DUMP_ORPHAN_KEYS – выгружает индексные ключи, указывающие на поврежденные блоки.
- REBUILD_FREELISTS - перестраивает списки свободных блоков
- SKIP_CORRUPT_BLOCKS - позволяет пропускать блоки, отмеченные как поврежденные, во время сканирования таблицы или индекса. Если не использовать, то можно получить ошибку ORA-01578.
Сначала нужно создать две таблицы, в которые будет помещаться информация о поврежденных блоках и индексных ключах, указывающих на эти блоки. Таблицы создаются следующим образом:
Проверка таблицы проводится вот так:
После чего получаем перечень битых блоков:
Если коррупция была обнаружена, то колонки CORRUPTION_DESCRIPTION и REPAIR_DESCRIPTION в таблице REPAIR_TABLE содержат полную информацию о поврежденных блоках.
Выявленные блоки теперь нужно пометить как испорченные, чтобы команды DML их пропускали и корректно отрабатывали. Это делается с помощью процедуры FIX_CORRUPT_BLOCKS
Если в таблице были обнаружены поврежденные блоки, то можно попытаться спасти информацию, которая содержится в индексах, ссылающихся на битые блоки.
А теперь можно просмотреть висячие ключи:
Если были обнаружены висячие ключи, то индекс обязательно нужно перестроить.
Для перестройки списка свободных блоков используется процедура DBMS_REPAIR.rebuild_freelists. Эта процедура просканирует таблицу и свободные блоки поместит в freelists. Все блоки, помеченные как битые, при сканировании будут пропущены.
Чтобы DML команды игнорировали блоки, помеченные как битые, используется процедура DBMS_REPAIR.skip_corrupt_blocks
По колонке SKIP_CORRUPT в представлении DBA_TABLES можно судить насколько успешным было выполнение этой процедуры.
После манипуляций, изложенных выше, таблица стала доступной, с ней можно работать. Но не забывайте, что нужно устранить потерю данных, попавших в коррумпированные блоки.
This chapter explains how to check the integrity of database files and backups. This chapter contains the following topics:
Читайте также:
- Определите какие слова из левого столбика соответствуют понятиям из правого фломастер компьютер
- 3 интерфейсы vga dvi hdmi вид разъема нумерация контактов назначение сигналов
- Межсетевой экран брандмауэр это программное или аппаратное обеспечение
- Aiolog что это за файл
- При зарядке планшета мигает экран и не включается