Use case 4 – IN or EXISTS

If the query has IN or NOT IN operators, we recommend checking the query plan to confirm that the proper index is being used. If the proper index isn't being used and query performance is taking more time than expected, try to rewrite the query using the EXISTS or NOT EXISTS conditions.

Consider the following example, which uses NOT IN:


EXPLAIN ANALYZE SELECT 
  TEST_VEH.TEST_VEH_ID, 
  TEST_VEH.VEHICLE_ID, 
  TEST_VEH.SERVICEPROGRAM_ID, 
  TEST_VEH.STARTDATE, 
  TEST_VEH.ENDDATE, 
  TEST_VEH.OILTYPE_ID 
FROM PERF_USER.TEST_VEH TEST_VEH 
JOIN  PERF_USER.OILTYPE OT ON  OT.OILTYPE_ID =TEST_VEH.OILTYPE_ID  
JOIN  PERF_USER.SERVICEPROGRAM SP ON  SP.SERVICEPROGRAM_ID = TEST_VEH.SERVICEPROGRAM_ID
WHERE SP.PROGNAME  = '18FCE8FDAF365BB' 
  AND OT.OILTYPE_ID  =3
  AND TEST_VEH.ENDDATE  IS NOT NULL 
  AND TEST_VEH.TEST_VEH_ID NOT IN 
                      (SELECT TEST_VEH_ID
                              FROM PERF_USER.VEHICLESERVICEHISTORY 
                              WHERE  TEST_VEH_ID > 1 
                      );
-----------------------------------------------------------------------
"Nested Loop  (cost=1009.16..1188860356305.01 rows=1 width=76) (actual time=37299.891..37347.853 rows=0 loops=1)"
"  ->  Gather  (cost=1009.16..1188860356303.88 rows=1 width=76) (actual time=37299.890..37347.849 rows=0 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        ->  Hash Join  (cost=9.16..1188860355303.78 rows=1 width=76) (actual time=37286.742..37286.751 rows=0 loops=3)"
"              Hash Cond: ((test_veh.serviceprogram_id)::text = (sp.serviceprogram_id)::text)"
"              ->  Parallel Index Scan using test_veh_oiltype_id_ind on test_veh  (cost=0.56..1188860351273.04 rows=1072570 width=76) (actual time=37276.290..37276.292 rows=1 loops=3)"
"                    Index Cond: (oiltype_id = 3)"
"                    Filter: ((enddate IS NOT NULL) AND (NOT (SubPlan 1)))"
"                    Rows Removed by Filter: 0"
"                    SubPlan 1"
"                      ->  Materialize  (cost=0.00..1025071.31 rows=33333332 width=8) (actual time=0.418..23201.432 rows=25001498 loops=4)"
"                            ->  Seq Scan on vehicleservicehistory  (cost=0.00..728195.65 rows=33333332 width=8) (actual time=0.416..13249.975 rows=25001498 loops=4)"
"                                  Filter: (test_veh_id > 1)"
"              ->  Hash  (cost=8.58..8.58 rows=1 width=11) (actual time=9.045..9.046 rows=0 loops=3)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 8kB"
"                    ->  Index Scan using progname_id_ind on serviceprogram sp  (cost=0.56..8.58 rows=1 width=11) (actual time=9.043..9.044 rows=0 loops=3)"
"                          Index Cond: ((progname)::text = '18FCE8FDAF365BB'::text)"
"  ->  Seq Scan on oiltype ot  (cost=0.00..1.12 rows=1 width=8) (never executed)"
"        Filter: (oiltype_id = 3)"
"Planning Time: 37.696 ms"
"Execution Time: 37366.335 ms"

The query is taking more than 37 seconds 366 milliseconds to retrieve 4 million records.

The query plan states that a sequence scan is performed on the table used in the subquery vehicleservicehistory. The sequence scan is producing a large number of records. For each of those records in the subquery, the query is performing a full table scan, which is causing the performance issue.

To avoid the sequence scan on the subquery, rewrite the subquery to use a correlated subquery with NOT EXISTS. The correlated subquery will use an index scan and a reduced number of table scans:


EXPLAIN ANALYZE SELECT 
  TEST_VEH.TEST_VEH_ID, 
  TEST_VEH.VEHICLE_ID, 
  TEST_VEH.SERVICEPROGRAM_ID, 
  TEST_VEH.STARTDATE, 
  TEST_VEH.ENDDATE, 
  TEST_VEH.OILTYPE_ID 
FROM PERF_USER.TEST_VEH TEST_VEH 
JOIN  PERF_USER.OILTYPE OT ON  OT.OILTYPE_ID =TEST_VEH.OILTYPE_ID  
JOIN  PERF_USER.SERVICEPROGRAM SP ON  SP.SERVICEPROGRAM_ID = TEST_VEH.SERVICEPROGRAM_ID
WHERE SP.PROGNAME  = '18FCE8FDAF365BB' 
       AND OT.OILTYPE_ID  =3
       AND TEST_VEH.ENDDATE  IS NOT NULL  
       AND  NOT EXISTS 
                    (SELECT TEST_VEH_ID 
                            FROM PERF_USER.VEHICLESERVICEHISTORY 
                            WHERE TEST_VEH.TEST_VEH_ID=VEHICLESERVICEHISTORY.TEST_VEH_ID 
                            AND TEST_VEH_ID > 1 
                    );
---------------------------------------------------------------------
"Nested Loop Anti Join  (cost=1009.03..936146.10 rows=1 width=76) (actual time=12.693..12.810 rows=0 loops=1)"
"  ->  Nested Loop  (cost=1008.59..936141.78 rows=1 width=76) (actual time=12.692..12.809 rows=0 loops=1)"
"        ->  Gather  (cost=1008.59..936140.64 rows=1 width=76) (actual time=12.691..12.807 rows=0 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              ->  Hash Join  (cost=8.59..935140.54 rows=1 width=76) (actual time=0.773..0.774 rows=0 loops=3)"
"                    Hash Cond: ((test_veh.serviceprogram_id)::text = (sp.serviceprogram_id)::text)"
"                    ->  Parallel Seq Scan on test_veh  (cost=0.00..927087.67 rows=2145139 width=76) (actual time=0.672..0.672 rows=1 loops=3)"
"                          Filter: ((enddate IS NOT NULL) AND (oiltype_id = 3))"
"                          Rows Removed by Filter: 7"
"                    ->  Hash  (cost=8.58..8.58 rows=1 width=11) (actual time=0.040..0.040 rows=0 loops=3)"
"                          Buckets: 1024  Batches: 1  Memory Usage: 8kB"
"                          ->  Index Scan using progname_id_ind on serviceprogram sp  (cost=0.56..8.58 rows=1 width=11) (actual time=0.039..0.040 rows=0 loops=3)"
"                                Index Cond: ((progname)::text = '18FCE8FDAF365BB'::text)"
"        ->  Seq Scan on oiltype ot  (cost=0.00..1.12 rows=1 width=8) (never executed)"
"              Filter: (oiltype_id = 3)"
"  ->  Index Only Scan using veh_ser_ind on vehicleservicehistory  (cost=0.44..4.32 rows=1 width=8) (never executed)"
"        Index Cond: ((test_veh_id = test_veh.test_veh_id) AND (test_veh_id > 1))"
"        Heap Fetches: 0"
"Planning Time: 11.115 ms"
"Execution Time: 12.871 ms"

After the modification, the query is taking less than 13 ms to process 4 million records

According to the query plan of the modified query, the table vehicleservicehistory can have an index scan. Using an index scan reduces the cost and the number of affected rows. This way, you can reduce the runtime of a query and increase its performance.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Function call in SELECT

Subqueries or CTEs