Teradata – Partitioned Primary Index

Partitioned Primary Index (PPI) is an indexing mechanism that is useful in improving the performance of certain queries. When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. When a table is defined with PPI, the rows are sorted by their partition number. Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined.

Advantages

  • Avoid full table scan for certain queries.
  • Avoid using secondary index that requires additional physical structure and additional I/O maintenance.
  • Access a subset of a large table quickly.
  • Drop the old data quickly and add new data.

Example

Consider the following Orders table with Primary Index on OrderNo.

StoreNoOrderNoOrderDateOrderTotal
10175012015-10-01900
10175022015-10-021,200
10275032015-10-023,000
10275042015-10-032,454
10175052015-10-031201
10375062015-10-042,454
10175072015-10-051201
10175082015-10-051201

Assume that the records are distributed between AMPs as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.AMP 1

RowHashOrderNoOrderDate
175052015-10-03
275042015-10-03
375012015-10-01
475082015-10-05

AMP 2

RowHashOrderNoOrderDate
175072015-10-05
275022015-10-02
375062015-10-04
475032015-10-02

If you run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. To avoid this, you can define the order date as Partitioned Primary Index. When rows are inserted into orders table, they are partitioned by the order date. Within each partition they will be ordered by their row hash.

The following data shows how the records will be stored in AMPs, if they are partitioned by Order Date. If a query is run to access the records by Order Date, then only the partition that contains the records for that particular order will be accessed.AMP 1

PartitionRowHashOrderNoOrderDate
0375012015-10-01
1175052015-10-03
1275042015-10-03
2475082015-10-05

AMP 2

PartitionRowHashOrderNoOrderDate
0275022015-10-02
0475032015-10-02
1375062015-10-04
2175072015-10-05

Following is an example to create a table with partition primary Index. PARTITION BY clause is used to define the partition.

CREATE SET TABLE Orders (
   StoreNo SMALLINT, 
   OrderNo INTEGER, 
   OrderDate DATE FORMAT 'YYYY-MM-DD', 
   OrderTotal INTEGER 
) 
PRIMARY INDEX(OrderNo) 
PARTITION BY RANGE_N  (
   OrderDate BETWEEN DATE '2010-01-01' AND '2016-12-31' EACH INTERVAL '1' DAY
);

In the above example, the table is partitioned by OrderDate column. There will be one separate partition for each day.

Leave a Reply