Oracle: deleting duplicates in a table

For you to delete rows from a table, the table must be in your own schema or you must have the DELETE object privilege on the table.

For you to delete rows from an updatable materialized view, the materialized view must be in your own schema or you must have the DELETE object privilege on the materialized view.

For you to delete rows from the base table of a view, the owner of the schema containing the view must have the DELETE object privilege on the base table. Also, if the view is in a schema other than your own, you must have the DELETE object privilege on the view.

The DELETE ANY TABLE system privilege also allows you to delete rows from any table or table partition or from the base table of any view.

You must also have the SELECT object privilege on the object from which you want to delete if:

The object is on a remote database or

The SQL92_SECURITY initialization parameter is set to TRUE and the DELETE operation references table columns, such as the columns in a where_clause

Use a hint (a specially formatted comment) to pass instructions to the optimizer on choosing an execution plan for the statement.

Use the FROM clause to specify the database objects from which you are deleting rows.

The ONLY syntax is relevant only for views. Use the ONLY clause if the view in the FROM clause belongs to a view hierarchy and you do not want to delete rows from any of its subviews.

Use the DML_table_expression_clause to specify the objects from which data is being deleted.

Specify the schema containing the table or view. If you omit schema, then Oracle Database assumes the table or view is in your own schema.

table | view | materialized view | subquery

Specify the name of a table, view, materialized view, or the column or columns resulting from a subquery, from which the rows are to be deleted.

When you delete rows from an updatable view, Oracle Database deletes rows from the base table.

You cannot delete rows from a read-only materialized view. If you delete rows from a writable materialized view, then the database removes the rows from the underlying container table. However, the deletions are overwritten at the next refresh operation. If you delete rows from an updatable materialized view that is part of a materialized view group, then the database also removes the corresponding rows from the master table.

If table or the base table of view or the master table of materialized_view contains one or more domain index columns, then this statement executes the appropriate indextype delete routine.

See Oracle Data Cartridge Developer's Guide for more information on these routines.

Issuing a DELETE statement against a table fires any DELETE triggers defined on the table.

All table or index space released by the deleted rows is retained by the table and index.

Use PARTITION or SUBPARTITION to specify the name of the partition or subpartition targeted for deletes within the object.

You need not specify the partition name when deleting values from a partitioned object. However, in some cases, specifying the partition name is more efficient than a complicated where_clause.

Specify the complete or partial name of a database link to a remote database where the object is located. You can delete rows from a remote object only if you are using Oracle Database distributed functionality.

If you omit dblink, then the database assumes that the object is located on the local database.

The subquery_restriction_clause lets you restrict the subquery in one of the following ways:

WITH READ ONLY Specify WITH READ ONLY to indicate that the table or view cannot be updated.

WITH CHECK OPTION Specify WITH CHECK OPTION to indicate that Oracle Database prohibits any changes to the table or view that would produce rows that are not included in the subquery. When used in the subquery of a DML statement, you can specify this clause in a subquery in the FROM clause but not in a subquery in the WHERE clause.

CONSTRAINT constraint Specify the name of the CHECK OPTION constraint. If you omit this identifier, then Oracle automatically assigns the constraint a name of the form SYS_C n , where n is an integer that makes the constraint name unique within the database.

The table_collection_expression lets you inform Oracle that the value of collection_expression should be treated as a table for purposes of query and DML operations. The collection_expression can be a subquery, a column, a function, or a collection constructor. Regardless of its form, it must return a collection value—that is, a value whose type is nested table or varray. This process of extracting the elements of a collection is called collection unnesting .

The optional plus (+) is relevant if you are joining the TABLE expression with the parent table. The + creates an outer join of the two, so that the query returns rows from the outer table even if the collection expression is null.

In earlier releases of Oracle, when collection_expression was a subquery, table_collection_expression was expressed as THE subquery. That usage is now deprecated.

You can use a table_collection_expression in a correlated subquery to delete rows with values that also exist in another table.

collection_expression Specify a subquery that selects a nested table column from the object from which you are deleting.

Restrictions on the dml_table_expression_clause Clause This clause is subject to the following restrictions:

You cannot execute this statement if table or the base or master table of view or materialized_view contains any domain indexes marked IN_PROGRESS or FAILED.

You cannot insert into a partition if any affected index partitions are marked UNUSABLE .

You cannot specify the ORDER BY clause in the subquery of the DML_table_expression_clause.

You cannot delete from a view except through INSTEAD OF triggers if the defining query of the view contains one of the following constructs:

If you specify an index, index partition, or index subpartition that has been marked UNUSABLE, the DELETE statement will fail unless the SKIP_UNUSABLE_INDEXES initialization parameter has been set to true.

Use the where_clause to delete only rows that satisfy the condition. The condition can reference the object from which you are deleting and can contain a subquery. You can delete rows from a remote object only if you are using Oracle Database distributed functionality. Please refer to Chapter 7, "Conditions" for the syntax of condition .

If this clause contains a subquery that refers to remote objects, then the DELETE operation can run in parallel as long as the reference does not loop back to an object on the local database. However, if the subquery in the DML_table_expression_clause refers to any remote objects, then the DELETE operation will run serially without notification. Please refer to the parallel_clause in the CREATE TABLE documentation for additional information.

If you omit dblink, then the database assumes that the table or view is located on the local database.

If you omit the where_clause, then the database deletes all rows of the object.

t_alias Provide a correlation name for the table, view, materialized view, subquery, or collection value to be referenced elsewhere in the statement. This alias is required if the DML_table_expression_clause references any object type attributes or object type methods. Table aliases are generally used in DELETE statements with correlated queries.

The returning_clause lets you return values from deleted columns, and thereby eliminates the need to issue a SELECT statement following the DELETE statement.

The returning clause retrieves the rows affected by a DML statement. You can specify this clause for tables and materialized views and for views with a single base table.

When operating on a single row, a DML statement with a returning_clause can retrieve column expressions using the affected row, rowid, and REFs to the affected row and store them in host variables or PL/SQL variables.

When operating on multiple rows, a DML statement with the returning_clause stores values from expressions, rowids, and REFs involving the affected rows in bind arrays.

expr Each item in the expr list must be a valid expression syntax.

INTO The INTO clause indicates that the values of the changed rows are to be stored in the variable(s) specified in data_item list.

data_item Each data_item is a host variable or PL/SQL variable that stores the retrieved expr value.

For each expression in the RETURNING list, you must specify a corresponding type-compatible PL/SQL variable or host variable in the INTO list.

Restrictions The following restrictions apply to the RETURNING clause:

The expr is restricted as follows:

For UPDATE and DELETE statements each expr must be a simple expression or a single-set aggregate function expression. You cannot combine simple expressions and single-set aggregate function expressions in the same returning_clause . For INSERT statements, each expr must be a simple expression. Aggregate functions are not supported in an INSERT statement RETURNING clause.

Single-set aggregate function expressions cannot include the DISTINCT keyword.

If the expr list contains a primary key column or other NOT NULL column, then the update statement fails if the table has a BEFORE UPDATE trigger defined on it.

You cannot specify the returning_clause for a multitable insert.

You cannot use this clause with parallel DML or with remote objects.

You cannot retrieve LONG types with this clause.

You cannot specify this clause for a view on which an INSTEAD OF trigger has been defined.

See PL/SQL User's Guide and Reference for information on using the BULK COLLECT clause to return multiple values to collection variables.

The error_logging_clause has the same behavior in a DELETE statement as it does in an INSERT statement. Please refer to the INSERT statement error_logging_clause for more information.

Deleting Rows: Examples The following statement deletes all rows from the sample table oe.product_descriptions where the value of the language_id column is AR :
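The statement itself is not preserved in this text. A minimal sketch consistent with the description, using only the table, column, and value named above, might be:

```sql
-- Sketch: remove every row of oe.product_descriptions whose language_id is 'AR'.
DELETE FROM oe.product_descriptions
 WHERE language_id = 'AR';
```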

The following statement deletes from the sample table hr.employees purchasing clerks whose commission rate is less than 10%:
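The statement is missing here as well. A sketch that matches the prose is given below; the job identifier 'PU_CLERK' and the column names job_id and commission_pct are assumptions about the hr sample schema, not values taken from this text:

```sql
-- Sketch: delete purchasing clerks with a commission rate below 10%.
-- Assumption: purchasing clerks are stored with job_id = 'PU_CLERK'.
DELETE FROM hr.employees
 WHERE job_id = 'PU_CLERK'
   AND commission_pct < 0.1;
```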

The following statement has the same effect as the preceding example, but uses a subquery:
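A sketch of the subquery form, under the same assumptions about the hr sample schema (job_id = 'PU_CLERK', column commission_pct), could look like this:

```sql
-- Sketch: the target of the DELETE is a subquery rather than a table name.
DELETE FROM (SELECT * FROM hr.employees)
 WHERE job_id = 'PU_CLERK'
   AND commission_pct < 0.1;
```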

Deleting Rows from a Remote Database: Example The following statement deletes specified rows from the locations table owned by the user hr on a database accessible by the database link remote :
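The original statement is not shown. A sketch using the names from the description (hr.locations and the database link remote) follows; the filter on location_id is purely illustrative:

```sql
-- Sketch: delete rows through the database link named "remote".
-- Assumption: the WHERE condition is a placeholder for "specified rows".
DELETE FROM hr.locations@remote
 WHERE location_id > 3000;
```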

Deleting Nested Table Rows: Example For an example that deletes nested table rows, please refer to "Table Collections: Examples".

Deleting Rows from a Partition: Example The following example removes rows from partition sales_q1_1998 of the sh.sales table:
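A sketch of a partition-targeted delete, using the partition and table named above, is shown below; the amount_sold condition is an assumed placeholder:

```sql
-- Sketch: restrict the delete to one partition of sh.sales.
-- Assumption: amount_sold > 1000 stands in for whatever filter is needed.
DELETE FROM sh.sales PARTITION (sales_q1_1998)
 WHERE amount_sold > 1000;
```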

Using the RETURNING Clause: Example The following example returns column salary from the deleted rows and stores the result in bind variable :bnd1 . The bind variable must already have been declared.
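A sketch of this usage in SQL*Plus follows. The table hr.employees and the employee_id value are assumptions chosen so that at most one row is affected, since a scalar bind variable can hold only one returned value:

```sql
-- SQL*Plus: declare the bind variable first.
VARIABLE bnd1 NUMBER

-- Sketch: delete a single row and capture its salary.
-- Assumption: employee_id = 206 is a hypothetical key value.
DELETE FROM hr.employees
 WHERE employee_id = 206
RETURNING salary INTO :bnd1;

PRINT bnd1
```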

I don't have a primary key in this table, but I already have the records shown above in it. I want to delete the duplicate records that have the same value in the EmpId and EmpSSN fields.

Can anyone help me write a query to delete these duplicate records?

Add a primary key (code below).

Run a proper delete (code below).

Think about why you don't want to keep that primary key.

Assuming MSSQL or compatible:
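The code for these steps is not preserved. A T-SQL sketch of the first two steps might look like the following; the table name Employee, the surrogate column Id, and the constraint name are hypothetical:

```sql
-- Step 1: add a surrogate primary key.
-- Assumption: the table is called Employee; Id and PK_Employee are made-up names.
ALTER TABLE Employee ADD Id INT IDENTITY(1,1) NOT NULL;
GO  -- batch separator (SSMS/sqlcmd) so the new column is visible below
ALTER TABLE Employee ADD CONSTRAINT PK_Employee PRIMARY KEY (Id);
GO

-- Step 2: keep the lowest Id in each (EmpId, EmpSSN) group and delete the rest.
DELETE FROM Employee
WHERE Id NOT IN (
    SELECT MIN(Id)
    FROM Employee
    GROUP BY EmpId, EmpSSN
);
```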

It's very simple. I tried this in SQL Server 2008.

Use the row number to distinguish the duplicate records. Keep the first row number for each EmpID / EmpSSN pair and delete the rest:

This will update the table and remove all duplicates from it!

and newtablename will not contain any duplicate records.

Just rename the table (newtablename) by pressing F2 in Object Explorer in SQL Server.

As Josh said, even if you know which records are duplicates, deleting them will be impossible, because you cannot refer to a specific record if it is an exact copy of another record.

Code

Explanation

Use an inner query to build a view over the table that includes a field based on Row_Number(), partitioned by the columns you want to be unique.

Delete from the results of that inner query, selecting everything that does not have row number 1; that is, the duplicates, not the original.

The order by clause of the row_number window function is required for valid syntax; you can put any column name here. If you want to change which of the results is treated as the duplicate (for example, to keep the earliest or the most recent), then the columns used here do matter; that is, you want to specify an order in which the record you want to keep comes first in the result.
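The code that the explanation refers to is missing. A T-SQL sketch of the technique described (an inner query numbered by Row_Number(), with a delete on everything past row 1) could be written as follows; the table name Employee is an assumption:

```sql
-- Sketch: number the rows inside each (EmpId, EmpSSN) group, then delete
-- every row whose number is greater than 1.
-- Assumption: the table is called Employee.
DELETE x
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY EmpId, EmpSSN
                              ORDER BY EmpId) AS rn
    FROM Employee
) AS x
WHERE x.rn > 1;
```

Changing the ORDER BY column decides which copy receives rn = 1 and therefore survives.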

If you don't want to create a new primary key, you can use the TOP command in SQL Server:


I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.

How can I delete all duplicate rows and leave only one of them?


Use the rowid pseudocolumn.

Where column1 , column2 , and column3 make up the identifying key for each record. You might list all your columns.
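The query itself is not preserved in this text. A sketch of the rowid technique the answer describes might look like this; your_table is a placeholder name, and column1 to column3 are the key columns mentioned above:

```sql
-- Sketch: keep the row with the smallest rowid in each key group,
-- delete all other copies.
-- Assumption: your_table is a hypothetical table name.
DELETE FROM your_table
 WHERE rowid NOT IN (
         SELECT MIN(rowid)
           FROM your_table
          GROUP BY column1, column2, column3
       );
```

Replacing DELETE with SELECT * gives a way to preview which rows would be removed before running the delete.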

+1 I had to find two duplicate phone numbers buried in 12,000+ records. Changed the DELETE to SELECT and this found them in seconds. Saved me a ton of time, thank you.

This approach did not work for me. I don't know why. When I replaced "DELETE" with "SELECT *", it returned the rows I wanted to delete, but when I executed with "DELETE" it was just hanging indefinitely.

If the select works, but the delete does not, that might be due to the size of the resulting subquery. It might be interesting to first do a create table with the subquery result, build an index on the min(rowid) column, and then run the delete statement.
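The suggestion in this comment can be sketched as three statements. All object names here (your_table, keep_rowids, keep_rid, keep_rowids_ix) are hypothetical:

```sql
-- 1. Materialize the rowids to keep, one per key group.
CREATE TABLE keep_rowids AS
  SELECT MIN(rowid) AS keep_rid
    FROM your_table
   GROUP BY column1, column2, column3;

-- 2. Index the saved rowids so the lookup in step 3 is cheap.
CREATE UNIQUE INDEX keep_rowids_ix ON keep_rowids (keep_rid);

-- 3. Delete every row whose rowid was not saved.
DELETE FROM your_table t
 WHERE NOT EXISTS (
         SELECT 1
           FROM keep_rowids k
          WHERE k.keep_rid = t.rowid
       );
```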

(fixed the missing parenthesis)

Where column1, column2, etc. is the key you want to use.
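The query this remark belongs to is missing. One query of this shape, offered only as a sketch with the placeholder name your_table, is a correlated delete that removes any row for which a copy with a smaller rowid exists:

```sql
-- Sketch: a row is deleted when another row with the same key
-- has a smaller rowid, so the copy with the lowest rowid survives.
-- Assumption: your_table is hypothetical. Rows with NULL in a key column
-- never match under "=" and are therefore left untouched.
DELETE FROM your_table a
 WHERE a.rowid > ANY (
         SELECT b.rowid
           FROM your_table b
          WHERE a.column1 = b.column1
            AND a.column2 = b.column2
       );
```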

create table t2 as select distinct * from t1;

not an answer - distinct * will take every record which differs in at least 1 symbol in 1 column. All you need is to select distinct values only from columns you want to make primary keys - Bill's answer is great example of this approach.

Another disadvantage of this method is that you have to create a copy of your table. For huge tables, this implies providing additional tablespace, and deleting or shrinking the tablespace after the copy. Bill's method has more benefits, and no additional disadvantages.

You should do a small pl/sql block using a cursor for loop and delete the rows you don't want to keep. For instance:
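The block from the answer is not preserved. A PL/SQL sketch of the cursor FOR loop approach could be the following; your_table and the column names are placeholders:

```sql
-- Sketch: loop over each key group that has duplicates and delete
-- every copy except the one with the smallest rowid.
BEGIN
  FOR dup IN (
    SELECT column1, column2, MIN(rowid) AS keep_rid
      FROM your_table
     GROUP BY column1, column2
    HAVING COUNT(*) > 1
  ) LOOP
    DELETE FROM your_table t
     WHERE t.column1 = dup.column1
       AND t.column2 = dup.column2
       AND t.rowid <> dup.keep_rid;
  END LOOP;
  COMMIT;
END;
/
```

Groups whose key columns contain NULL are skipped by the equality tests, so they would need separate handling.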

I believe the downvote is because you are using PL/SQL when you can do it in SQL, in case you are wondering.

Just because you can do it in SQL doesn't mean it's the only solution. I posted this solution after I had seen the SQL-only solution. I thought downvotes were for incorrect answers.

To select the duplicates only the query format can be:
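The query is missing from this text. A common form for listing only the duplicated key values, given as a sketch with placeholder names, is:

```sql
-- Sketch: list each key combination that occurs more than once,
-- together with the number of copies.
SELECT column1, column2, COUNT(*) AS copies
  FROM your_table
 GROUP BY column1, column2
HAVING COUNT(*) > 1;
```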

So the correct query as per other suggestion is:

This query will keep the oldest record in the database for the criteria chosen in the WHERE CLAUSE .

Oracle Certified Associate (2008)

This blog post was really helpful for general cases:

If the rows are fully duplicated (all values in all columns can have copies) there are no columns to use! But to keep one you still need a unique identifier for each row in each group. Fortunately, Oracle already has something you can use: the rowid. All rows in Oracle have a rowid. This is a physical locator. That is, it states where on disk Oracle stores the row. This is unique to each row, so you can use this value to identify and remove copies. To do this, replace min() with min(rowid) in the uncorrelated delete:

The fastest way for really big tables

Create an exceptions table with the structure below: exceptions_table

Try to create a unique constraint or primary key that the duplicates will violate. You will get an error message because you have duplicates. The exceptions table will contain the rowids for the duplicate rows.

Join your table with exceptions_table by rowid and delete the duplicates.

If the number of rows to delete is large, then create a new table (with all grants and indexes) by anti-joining with exceptions_table on rowid, rename the original table to original_dups, and rename new_table_with_no_dups to the original table name.
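The statements for these steps are not preserved. The sketch below assumes the column layout of Oracle's standard exceptions table and uses placeholder names (your_table, your_table_uk). One caveat worth checking on your own data: EXCEPTIONS INTO records every row that violates the constraint, including the copy you want to keep, so the delete below keeps the first rowid per key instead of deleting every logged row.

```sql
-- 1. Exceptions table (layout assumed from Oracle's utlexcpt.sql script).
CREATE TABLE exceptions_table (
  row_id     ROWID,
  owner      VARCHAR2(30),
  table_name VARCHAR2(30),
  constraint VARCHAR2(30)
);

-- 2. This fails because of the duplicates, but logs the offending rowids.
ALTER TABLE your_table
  ADD CONSTRAINT your_table_uk UNIQUE (column1, column2)
  EXCEPTIONS INTO exceptions_table;

-- 3. Among the logged rows only, delete all but the first copy per key.
DELETE FROM your_table
 WHERE rowid IN (
         SELECT rid
           FROM (SELECT t.rowid AS rid,
                        ROW_NUMBER() OVER (PARTITION BY t.column1, t.column2
                                           ORDER BY t.rowid) AS rn
                   FROM your_table t
                  WHERE t.rowid IN (SELECT e.row_id FROM exceptions_table e))
          WHERE rn > 1
       );
```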

Using self join-

DENSE_RANK with PARTITION BY ranks the rows within each group of duplicates. Because every row has a unique rowid, ordering by rowid gives the copies in a group different ranks (for example 1, 2, 3 for three identical rows), and we delete the rowids whose rank does not match the first one.
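The query for this answer is missing. A sketch of the DENSE_RANK approach, with placeholder table and column names, might be:

```sql
-- Sketch: rank the copies inside each key group by rowid and delete
-- everything ranked after the first copy.
DELETE FROM your_table
 WHERE rowid IN (
         SELECT rid
           FROM (SELECT rowid AS rid,
                        DENSE_RANK() OVER (PARTITION BY column1, column2
                                           ORDER BY rowid) AS rn
                   FROM your_table)
          WHERE rn > 1
       );
```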

1. solution

2. solution

3. solution

4. solution

5. solution

and you can also delete duplicate records in another way

For best performance, here is what I wrote:
(see execution plan)

Check the scripts below:

You will see that duplicate records have been deleted.
Hope this solves your query. Thanks :)

I didn't see any answers that use common table expressions and window functions. This is what I find easiest to work with.

Some things to note:

1) We are only checking for duplication on the fields in the partition clause.

2) If you have some reason to pick one duplicate over the others, you can use an order by clause so that that row has row_number() = 1.

3) You can change the number of duplicates preserved by changing the final where clause to "WHERE RN > N" with N >= 1 (I was thinking N = 0 would delete all rows that have duplicates, but it would just delete all rows).

4) I added a Sum partition field to the CTE query, which tags each row with the number of rows in its group. So to select rows with duplicates, including the first item, use "WHERE cnt > 1".
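The query these notes describe is not preserved. An Oracle-flavoured sketch is shown below. Oracle does not allow deleting from a CTE directly, so the CTE is placed inside the subquery and rows are matched by rowid; that arrangement and all names (your_table, cte, rid, rn, cnt) are assumptions, not the answer's original code.

```sql
-- Sketch: rn numbers the copies in each group, cnt counts them.
DELETE FROM your_table
 WHERE rowid IN (
         WITH cte AS (
           SELECT rowid AS rid,
                  ROW_NUMBER() OVER (PARTITION BY column1, column2
                                     ORDER BY rowid) AS rn,
                  SUM(1) OVER (PARTITION BY column1, column2) AS cnt
             FROM your_table
         )
         SELECT rid
           FROM cte
          WHERE rn > 1   -- use "rn > N" to keep N copies per group
       );
```

Selecting from the same CTE with WHERE cnt > 1 instead lists every row that belongs to a duplicated group, including the first copy.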

wikiHow is a wiki, which means that many of our articles are written by multiple authors. To create this article, 13 people, some anonymous, worked to edit and improve it.

Duplicate rows in Oracle can be distinguished only by their rowid (row address).