Distinct vs Group By performance oracle can be changed due to several conditions. Generally, Both are Distinct and Group By clauses doing similar work. But then why do we have two? Yes, that is not related to performance. purposely both clauses work differently. I here try to explain distinct vs group by performance oracle.
Oracle Both DISTINCT
and GROUP BY
clauses return unique values from the result set. Performance can be different because of context of usability is different for each. Anyway, I here try to compare performance in both keywords and explain about intention of keywords.
DISTINCT
: The DISTINCT
a keyword returns unique values on a single or multiple columns without any aggregate functions.
-- Syntax SELECT DISTINCT column_name FROM table; --
Performance Considerations:
- Objectives: Distinct is well suited to simple queries. Most of the time DISTINCT is used for single columns. DISTINCT can be easily used for such cases. But it’s a bit hard to use in complex queries.
- Performance: Performance is nice on simple cases as we described earlier. In complex cases, GROUP BY will perform well.
GROUP BY
: The GROUP BY
a clause is used to take unique values BUT with some aggregate values such as COUNT
, SUM
, AVG
, etc.
-- Syntax SELECT column_name, COUNT(*) FROM able GROUP BY column_name; --
Performance Considerations:
- Aggregation:
GROUP BY
is suited to some complex queries such as taking some values using aggregate functions. This is used to calculate output in each group. - Multiple Columns: This is flexible compared to DISTINCT.
- Performance: Aggregate function calculates efficiently when using GROUP BY clause. This is performed in complex queries compared to DISTINCT.
Performance Comparison:
I would like to do a comparison for a small example. I here take simple queries for distinct vs group by performance oracle. I create an ORDERS table with column product_category and model_type and I hope to retrieve distinct product categories and types.
-- Using DISTINCT SELECT DISTINCT product_category, model_type FROM Orders; Versus: -- Using GROUP BY SELECT product_category, model_type, COUNT(*) FROM Orders GROUP BY product_category, model_type;
In this small case, I like to take distinct values in the first query and take the unique values from the second query. But we need an aggregate function on the second query. I will create a sample table and add a sample record from the below query.
-- create table CREATE TABLE Orders( order_id VARCHAR2(50) , product_category DATE , model_type DATE , quantity NUMBER, PRIMARY KEY(order_id) ); -- Insert sample records declare -- insert 100000 records into database i integer:=0; type namesarray IS VARRAY(10) OF VARCHAR2(50); names namesarray; names1 namesarray; begin names := namesarray('Type_1', 'Type_2', 'Type_3', 'Type_4', 'Type_5','Type_6','Type_7','Type_8','Type_9','Type_10'); names1 := namesarray('Cat_1', 'Cat_2', 'Cat_3', 'Cat_4', 'Cat_5','Cat_6','Cat_7','Cat_8','Cat_9','Cat_10'); LOOP i:= i+1; INSERT INTO Orders (order_id, product_category, model_type,quantity) VALUES ('Po-'||i, names1(round(dbms_random.value(1,10),0)), names(round(dbms_random.value(1,10),0)),round(dbms_random.value(1,1000),0)); EXIT WHEN i > 999999; END LOOP; end; --
I added 1 million records to our table to make our comparison more practical. Now I can compare the queries by using EXPLAIN PLAN and the execution time for each query:
- DISTINCT Execution Plan:
- According to the EXPLAIN PLAN, we can see a full table scan.
- GROUP BY Execution Plan:
- Both explain the plan full table scan available. therefore, hope adding an index also improves the performance in these cases.
Now I can create an index for the table like below. But I hope to create a BITMAP index in this case. Because we have low cardinal data in both columns. You may know B-TREE index performs well on high cardinal columns like sequence numbers.
-- Create index CREATE BITMAP INDEX idx_order ON Orders(product_category, model_type); --
I run the same query for the execution plan now and you can realize the execution plan will improve. As you may know, once you include DISTINCT or GROUP BY in our queries oracle does the sorting then Index helps to do the sorting as well.
Performance Considerations:
- Data Distribution:
- Data load and distribution of data are the first things that change the performance. If there is low cardinality, then both DISTINCT and GROUP BY with aggregate function perform well.
- Indexes:
- If the table contains a huge amount of data indexes improve considerable time of query. When creating an index for a table you have to think about the behavior of data. If there are low cardinal data in a column, then the BITMAP index may perform well compared to the B-TREE index. Because the different type of indexes performs different type of data
- Query Complexity:
- Query complexity is the next consideration. Because as we described earlier small and single-column queries suit with DISTINCT. But for complex queries, a better option may be GROUP BY with aggregate functions.
- Cardinality:
- Low cardinality will perform both well but big cardinalities with multiple columns then performance reduce for both options.
Summary:
Let’s summarize the whole discussion into a few tips. This is not the final and the only thing for DISTINCT vs GROUP BY Performance Oracle. We must stick to a situation but generally, I hope to summarize things that help to write queries.
DISTINCT:
- Simple queries.
- Low cardinality column
- Distinct use with Single-column
GROUP BY:
- More complex queries
- Use with other aggregate functions like AVG, COUNT, or others.
Finally, I would like to suggest this. Performance can be varied with the below conditions. Query complexity, available indexes in the table, data volume, and server conditions. The first thing you can do is analyze both queries with EXPLAIN PLAN or any other tool. Then compare the queries with real data then you can decide the best option according to the case.