Oracle чем заменить not in
I understand that the performance of our queries is improved when we use EXISTS and NOT EXISTS in the place of IN and NOT IN, however, is performance improved further when we replace NOT IN with an OUTER JOIN as opposed to NOT EXISTS?
For example, the following query selects all models from a PRODUCT table that are not in another table called PC. For the record, no model values in the PRODUCT or PC tables are null:
The following OUTER JOIN will display the same results:
Seeing as these both return the same values, which option should we use to better improve the performance of our queries?
The two should be the same. Oracle has a good optimizer and both of these are ways of expression an semi anti join.
"I understand that the performance of our queries is improved when we use EXISTS and NOT EXISTS in the place of IN and NOT IN". In some cases. In other cases IN or NOT IN perform better. Performance depends on the relative size of the driving query and the subquery.
Answers
Is there any reason not to use a foreign key on delete cascade (on T2) to do it automatically ?
As for the answer to which I would say it depends on volumes of T1 and T2(except for Minus, which i wouldn't use since there would be the need to scan T1 twice), but the not in option seems to be the most efficient with the change you plan to implement. Do you have any explain plan for your query ?
If you have HASH ANTI JOIN then it's probably the best plan in your case.
Edited by: B.Guillaumin on Jul 7, 2009 5:20 PM
There are always 2 answers that are always true:
- It depends
- Why?
In your case the first answer applies.
You need to TEST the best solution, there are not general rules regarding finding 'the best' one.
It depends on your data, selectivity, your indexes, statistics etc. etc.
Just measure (explain plan, trace, tkprof) and test the different scenario's and you'll find the best solution that works for you.
Ash_ wrote:
Dear All,
I have to run delete for parent and child records on production database which has very high volume of data (millions of records) on Oracle10g. Currently the scripts are using NOT INs to delete child records as below:
delete from T1 where T1.x is not null and T1.x not in (select x from T2);
I was planning to change it to :
delete from T1 where T1.x is not null and T1.x not in (select x from T2 where x is NOT NULL);
Could any one please help me with a better way of writing this ? Will it be faster if I use MINUS or NOT Exists instead on NOT IN?
Sorry to bother you all with the basic question but there doesn't seem to be specific rule for use of any one above (may be because Oracle10g uses CBO) ?.
Any help is appreciated and many thanks in advance.
In a previous project I worked on there was a very slow query of the form: It executed in 1/10 the time after I changed it to: Of course like always YMMV.
Eventually you may try the hint no_unnest, which may work reasonably if you have an index on custno in other_mast:
select custno from cust_mast cm
where not exists in (select /*+ no_unnest */ 1 from other_mast om where cm.custno=om.custno)
Anyway, an index on custno in other_mast will help.
The problem with "not in" is that for some thing to be determined as not being in a set of things, then everything in that set of things has to be checked. It's not like "in" or "exists" where you can stop looking as soon as you find a match.
As BEDE says, having an index on custno will help.
It depends what you're ultimately trying to achieve. MINUS, as user_ suggests is an option, but if you're actual query contains other columns then you could use a left outer join e.g.
It's not possible to an appropriate answer to this question, because you haven't supplied know enough information. It's possible that the query is already running as quickly as it could - without seeing the execution plan, knowing the version, know what indexes exist, whether or not custno is declared NOT NULL on either table, and what uniqueness constraints are in place, the volume of data and what you're doing to get the data into a client program from the server we can't say much about "fastest".
BluShadow has described why the the operation is necessary a relatively expensive one - but maybe Oracle is running extremely efficiently and the result set appears only slowly because you're fetching it one row at a time over a slow networks.
The problem with "not in" is that for some thing to be determined as not being in a set of things, then everything in that set of things has to be checked. It's not like "in" or "exists" where you can stop looking as soon as you find a match.
That makes no sense.
For any given condition, some rows will satisfy it and some won't, and the condition must be evaluated for ALL rows. For an "in" or "exists" conditions, you can stop looking as soon as you find a match - so that will save time for the rows that DO satisfy the condition, but it won't save any time for those that don't. To say with certainty that a row fails the condition, you still must check all the values in the IN list. Similarly for a "not in" or "not exists" condition, **you can stop looking as soon as you find a match** - that will save time for the rows that DO NOT satisfy the condition, but it won't save any time for those that do.
The work required by a "not in" condition is exactly the same as for the "in" condition that has the same "in" ("not in") list.
Apologies Mathguy, I wasn't clear (as usual LOL! 😀 )
If there is an index on the column in question then using IN will be able to use the index to lookup the value pretty efficiently. Whereas using NOT IN cannot use the index as efficiently and usually results in a full scan (depends on uniqueness of the value, so if a primary key it's more likely). At least that's what I was taught at Oracle University. Can't honestly say I've gone ahead and put it to the test; I usually avoid NOT IN for most queries unless I know it's small sets of data. 😉
Deepest regrets first of all
had ahemorage and was in ICu .
have logged in today only.
okay .. the issue is still eatinh me . I suspect some issue with datbase or privilees.
Actually I am trying to execute the query on DB2 written in oracle.
could that be a reason?
Request your attention at a last para here
the query is taking days to parse on DB2.
I know it is not a right forum to ask thsi question. stillll
Sorry to hear that you've been in ICU, and hope you're not being pushed back into work before you're fully recovered.
Since you're asking only about the last part of the UNION ALL, and since the display of the full SQL statement shows the "not in" clause commented out, is it safe to assume that the full query takes a reasonable time when the "not in" is removed, but a very long time when it present, and that you've satisfied yourself that the ONLY thing consuming a lot of time is that subquery clause.
I haven't read carefully through the whole query, but it looks as if you've got
and it's only in the last branch of the union all that you have the NOT IN clause.
We need to see the execution plan - possibly of a simplified statement, but I think the following may be happening (and it's only a guess at this point):
- the two subqueries are materializing (into global temporary tables) in the local database because they are used many times so most of the branches of the query are hash joins with aggregation of the two local "tables"
- the final branch is now between two local tables with a remote correlated subquery, so for every row in the join you are executing a remote select - and it's the repeated execution of the query that's taking the time.
Possibly you've tried testing the branch with the subquery by itself and not been able to find the problem - but with a single use of the factored subqueries they probably won't materialize, so the test query becomes a "fully remote" query that operates efficiently at DB2. (Again this is a guess about what you've done so far and what might have happened.)
If all this guesswork is correct then you could test to see what happens when you put the text of the subquery into a factored subquery with the /*+ materialize */ hint, viz:
then your "not in" predicate is just where (. ) not in (select lnrcode, prdacctid from not_on_subq)
You'll need to include at least one other branch of the UNION ALL when testing this, or you may find that this subquery materializes but the other two don't and again you'll have a distributed query doing something expensive (or you could put the materialize hint into all three factored subqueries.
If the materialization works you may still find you then have to control the execution plan with a couple more hints - for example forcing an UNNEST and HASH ANTIJOIN - and the options there will be affected by the version of Oracle. (You haven't said which it is).
First step, though - check the real execution plan (pulled from memory, NOT from "explain plan").
I have a following query and want to get rid of the "NOT EXISTS' clause without changing the end results.
WHERE NOT EXISTS (SELECT 1
WHERE B.c = A.c AND B.d = A.d AND B.e = A.e);
1 Answer 1
The query plan will tell you. It will depend on the data and tables. In the case of OUTER JOIN and NOT EXISTS they are the same.
However to your opening sentence, NOT IN and NOT EXISTS are not the same if NULL is accepted on model. In this case you say model cannot be null so you might find they all have the same plan anyway. However when making this assumption, the database must be told there cannot be nulls (using NOT NULL) as opposed to there simply not being any. If you don't it will make different plans for each query which may result in different performance depending on your actual data. This is generally true and particularly true for ORACLE which does not index NULLs.
Check out EXPLAIN PLAN
The queries result in the same result set, regardless of whether the columns are NULL . They are likely to have the same execution plan. Hence, they are very likely to be exactly equivalent.
Yes agreed that without nulls in the data the results will be the same regardless. However if there were nulls they would be different results. The DB needs to know that NULL is not an option otherwise the EXISTS and IN will not generate the same plan even though we know the result will be the same.
NOT EXISTS and LEFT JOIN / WHERE IS NULL are exactly the same. You may be confusing this with NOT IN and NOT EXISTS , where NULL handling is different.
If both objectid columns have indexs, which one is better in oracle?
Apart from speed, you should know that they don't produce the same resultset in every situation. You cannot always use them as substitutes in each scenario.
Best Answer
There are always 2 answers that are always true:
- It depends
- Why?
In your case the first answer applies.
You need to TEST the best solution, there are not general rules regarding finding 'the best' one.
It depends on your data, selectivity, your indexes, statistics etc. etc.
Just measure (explain plan, trace, tkprof) and test the different scenario's and you'll find the best solution that works for you.
3 Answers 3
The execution plans may be the same at the moment but if either column is altered in the future to allow NULLs the NOT IN version will need to do more work.
The most important thing to note about NOT EXISTS and NOT IN is that, unlike EXISTS and IN, they are not equivalent in all cases. Specifically, when NULLs are involved they will return different results. To be totally specific, when the subquery returns even one null, NOT IN will not match any rows.
[NOT] IN and [NOT] EXISTS operators are processed differently. [NOT] IN is processed more like a join whereas [NOT] EXISTS is processed more like a loop with IF condition. Choosing one over another, of course, depends on a situation: on volume of data that driven and driving queries return. In the case of [NOT] IN operator inner query ( ..where id in (select id from table )) is a driving query whereas in the case of [NOT] EXISTS outer query is a driving query. So if the sub-query (inner query) returns small amount of data because of either a table in the sub-query contains small number of rows or there is an intensive filtering applied to the sub-query [NOT] IN operator may give better performance. If the sub-query returns large volume of data or the major filtering is happening in outer-query [NOT] EXISTS operator is preferable.
Both of them do a subquery. Faster would be:
The question was for a SELECT not a DELETE so there was no reason my answer should have been a DELETE. For a DELETE I would use a subquery DELETE FROM library_t lib_del WHERE lib_del.objectId in (select lib.objectID from . ) . The difference in using a subquery here is that it is not a co-related subquery. The examples in the question require going through the results of the subquery for what is not in the subquery which the database optimizer might not find an optimal way to do. With the DELETE above, most likely it will find an optimal way to do the DELETE.
Yes you are correct, this question is for a select. I was researching the cleanest way to perform a delete, read several other posts, and was still thinking about my original question when I read your answer. I have upvoted your answer as I think it is the cleanest and best solution (for both select and delete). I did apply the left join in my delete query, and after specifying the table name after the DELETE keyword, it worked well, just as left join has worked for many select queries I have written previously. I have removed my original comment as it is off-topic.
I have to run delete for parent and child records on production database which has very high volume of data (millions of records) on Oracle10g. Currently the scripts are using NOT INs to delete child records as below:
delete from T1 where T1.x is not null and T1.x not in (select x from T2);
I was planning to change it to :
delete from T1 where T1.x is not null and T1.x not in (select x from T2 where x is NOT NULL);
Could any one please help me with a better way of writing this ? Will it be faster if I use MINUS or NOT Exists instead on NOT IN?
Sorry to bother you all with the basic question but there doesn't seem to be specific rule for use of any one above (may be because Oracle10g uses CBO) ?.
Any help is appreciated and many thanks in advance.
Answers
You can usually use IN sub-queries to get the same results as EXISTS sub-queries:
An anti-join is another way, but it would probably be less efficient than either NOT IN or NOT EXISTS.
NOT IN and NOT EXISTS are not the same. Beware of NULL's !!
You need to provide a explanation on why do you want to eliminate the use of NOT EXISTS? If its related to performance how did you narrowed down that the problem is with the NOT EXISTS clause?
This is one example of why posting some sample data and expected results is so important.
Thanks Frank and Karthik for the reply.
I am encountering the performance issue while using the not in and not exists , that is the reason I want to get rid of these.
Is there any other way to convert this into in - line view and do it.
Note: The table A has around 100 million records , however the table B has only 24,000 records
For performance issues, see the Forum FAQ:
use a left join
FROM A left join B on B.c = A.c AND B.d = A.d AND B.e = A.e
where B.C is null and B.d is null and B.e is null
the ( + ) can behave odd with this
You haven't supplied an execution plan so there's no way for us to know how the optimizer has chosen to handle your query, so it's possible that there is a way to improve the performance of the code you have written. However, your requirement is to check FOR EVERY ROW IN A that a match does not exist in B, so your code HAS to access every row in A at some point - have you allowed for that volume of work when considering the performance of the query ?
If it were my problem I would first hope that the optimizer had transformed the query into a hash anti-join with B as the build table, minising the work to little more than a tablescan of A with a little CPU for each probe, so if the query runs with a filter subquery plan I'd want to find out why.
Thanks for the reply. I have attached the execution plan. Apology for not supplying the execution plan earlier.
ROWIDTOCHAR (ril.ROWID) rwid
FROM repl_item_loc ril,
WHERE ril.item = uil.item
TO_DATE (:ls_vdate, 'YYYYMMDD')
TO_DATE (:ls_vdate, 'YYYYMMDD')
AND ffswumt.uda_id = uil.uda_id
AND ffswumt.uda_value = uil.uda_value
AND ril.stock_cat = 'W'
AND ffswumt.store = ril.location
AND ffswumt.source_wh != ril.source_wh
FROM wh_uda_matrix ffwum
WHERE ffwum.uda_id = ffswumt.uda_id
Here the repl_item_loc has - 100 millions record
wh_uda_matrix has only 70 records. Without exists clause it is fetcing records in 5 min but not returning any value with the not exists clause.
I have attached the explain plan of the query.
What happens when you remove the index hint and run the query? Are you sure you need that hint here?
Or is it to avoid a full table scan (which may perform way faster than indexed reads)?
There's no execution plan attached.
Ideally you want to post the results of DBMS_XPLAN.DISPLAY_CURSOR, see below links to articles for step-by-step instructions:
Читайте также: