How to Change DISTINCT vs GROUP BY Performance Oracle

Asiri Gunasena

Published:

SQL

distinct vs group by performance oracle

Distinct vs Group By performance oracle can be changed due to several conditions. Generally, Both are Distinct and Group By clauses doing similar work. But then why do we have two? Yes, that is not related to performance. purposely both clauses work differently. I here try to explain distinct vs group by performance oracle.

Oracle Both DISTINCT and GROUP BY clauses return unique values from the result set. Performance can be different because of context of usability is different for each. Anyway, I here try to compare performance in both keywords and explain about intention of keywords.

DISTINCT : The DISTINCT a keyword returns unique values on a single or multiple columns without any aggregate functions.

-- Syntax
SELECT DISTINCT column_name FROM table; 
--

Performance Considerations:

  • Objectives: Distinct is well suited to simple queries. Most of the time DISTINCT is used for single columns. DISTINCT can be easily used for such cases. But it’s a bit hard to use in complex queries.
  • Performance: Performance is nice on simple cases as we described earlier. In complex cases, GROUP BY will perform well.

GROUP BY: The GROUP BY a clause is used to take unique values BUT with some aggregate values such as COUNT, SUM, AVG, etc.

-- Syntax
SELECT column_name, COUNT(*) FROM able GROUP BY column_name; 
--

Performance Considerations:

  • Aggregation: GROUP BY is suited to some complex queries such as taking some values using aggregate functions. This is used to calculate output in each group.
  • Multiple Columns: This is flexible compared to DISTINCT.
  • Performance: Aggregate function calculates efficiently when using GROUP BY clause. This is performed in complex queries compared to DISTINCT.

Performance Comparison:

I would like to do a comparison for a small example. I here take simple queries for distinct vs group by performance oracle. I create an ORDERS table with column product_category and model_type and I hope to retrieve distinct product categories and types.

-- Using DISTINCT 
SELECT DISTINCT product_category, model_type FROM Orders; 
Versus:
-- Using GROUP BY 
SELECT product_category, model_type, COUNT(*) FROM Orders GROUP BY product_category, model_type; 

In this small case, I like to take distinct values in the first query and take the unique values from the second query. But we need an aggregate function on the second query. I will create a sample table and add a sample record from the below query.

-- create table
CREATE TABLE Orders(
    order_id VARCHAR2(50) ,
    product_category DATE ,
    model_type DATE ,
    quantity NUMBER,
    PRIMARY KEY(order_id)
);
-- Insert sample records
declare 
  -- insert 100000 records into database
   i integer:=0;
   type namesarray IS VARRAY(10) OF VARCHAR2(50); 
   names namesarray; 
   names1 namesarray; 

begin
      names := namesarray('Type_1', 'Type_2', 'Type_3', 'Type_4', 'Type_5','Type_6','Type_7','Type_8','Type_9','Type_10');
names1 := namesarray('Cat_1', 'Cat_2', 'Cat_3', 'Cat_4', 'Cat_5','Cat_6','Cat_7','Cat_8','Cat_9','Cat_10');
LOOP
   i:= i+1;
   INSERT INTO Orders (order_id, product_category, model_type,quantity) 
   VALUES ('Po-'||i, names1(round(dbms_random.value(1,10),0)), 
   names(round(dbms_random.value(1,10),0)),round(dbms_random.value(1,1000),0));
   EXIT WHEN i > 999999;
END LOOP;
end;
--

I added 1 million records to our table to make our comparison more practical. Now I can compare the queries by using EXPLAIN PLAN and the execution time for each query:

  1. DISTINCT Execution Plan:
  • According to the EXPLAIN PLAN, we can see a full table scan.
distinct vs group by performance oracle
distinct vs group by performance oracle
  1. GROUP BY Execution Plan:
  • Both explain the plan full table scan available. therefore, hope adding an index also improves the performance in these cases.
distinct vs group by performance oracle
distinct vs group by performance oracle

Now I can create an index for the table like below. But I hope to create a BITMAP index in this case. Because we have low cardinal data in both columns. You may know B-TREE index performs well on high cardinal columns like sequence numbers.

-- Create index
CREATE BITMAP INDEX idx_order ON Orders(product_category, model_type);
--

I run the same query for the execution plan now and you can realize the execution plan will improve. As you may know, once you include DISTINCT or GROUP BY in our queries oracle does the sorting then Index helps to do the sorting as well.

distinct vs group by performance oracle

Performance Considerations:

  1. Data Distribution:
  • Data load and distribution of data are the first things that change the performance. If there is low cardinality, then both DISTINCT and GROUP BY with aggregate function perform well.
  1. Indexes:
  • If the table contains a huge amount of data indexes improve considerable time of query. When creating an index for a table you have to think about the behavior of data. If there are low cardinal data in a column, then the BITMAP index may perform well compared to the B-TREE index. Because the different type of indexes performs different type of data
  1. Query Complexity:
  • Query complexity is the next consideration. Because as we described earlier small and single-column queries suit with DISTINCT. But for complex queries, a better option may be GROUP BY with aggregate functions.
  1. Cardinality:
  • Low cardinality will perform both well but big cardinalities with multiple columns then performance reduce for both options.

Summary:

Let’s summarize the whole discussion into a few tips. This is not the final and the only thing for DISTINCT vs GROUP BY Performance Oracle. We must stick to a situation but generally, I hope to summarize things that help to write queries.

DISTINCT:

  • Simple queries.
  • Low cardinality column
  • Distinct use with Single-column

GROUP BY:

  • More complex queries
  • Use with other aggregate functions like AVG, COUNT, or others.

Finally, I would like to suggest this. Performance can be varied with the below conditions. Query complexity, available indexes in the table, data volume, and server conditions. The first thing you can do is analyze both queries with EXPLAIN PLAN or any other tool. Then compare the queries with real data then you can decide the best option according to the case.

Categories SQL
ennicode

Address: 89/1 Rabbegamuwa, Handessa, Kandy, Central, Sri Lanka

Email to: Primary: [email protected]

Services

E-Learning

Company Websites

Support and Configuration work

Banners, Covers, and Post

Web Development & Configurations

Content Writing and Marketing

Contact

Ennicode