Teradata – Hashing Algorithm

A row is assigned to a particular AMP based on its primary index value. Teradata uses a hashing algorithm to determine which AMP gets the row.

The following is a high-level diagram of the hashing algorithm.

[Figure: Hashing Algorithm]

The steps to insert data are as follows.

  • The client submits a query.
  • The parser receives the query and passes the primary index (PI) value of the record to the hashing algorithm.
  • The hashing algorithm hashes the primary index value and returns a 32-bit number, called the row hash.
  • The high-order bits of the row hash (the first 16 bits) are used to identify the hash map entry. The hash map is an array of buckets, each of which contains one specific AMP number.
  • BYNET sends the data to the identified AMP.
  • The AMP uses the 32-bit row hash to locate the row on its disk.
  • If a record with the same row hash already exists, the AMP increments the uniqueness ID, which is a 32-bit number. For a new row hash, the uniqueness ID is assigned as 1, and it is incremented each time a record with the same row hash is inserted.
  • The combination of row hash and uniqueness ID is called the Row ID.
  • The Row ID prefixes each record on disk.
  • The table rows in an AMP are logically sorted by their Row IDs.
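
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Teradata's implementation: CRC32 stands in for the real hash function, the hash map is filled round-robin, and the names NUM_AMPS, HASH_MAP, row_hash, amp_for, Amp and insert_row are all hypothetical.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4            # hypothetical system size (assumption)
HASH_BUCKETS = 1 << 16  # the first 16 bits of the row hash select a bucket

# Hypothetical hash map: an array of buckets, each holding one AMP number.
# Round-robin assignment is an assumption made for this sketch.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(HASH_BUCKETS)]


def row_hash(pi_value) -> int:
    """Return a 32-bit row hash. CRC32 is a stand-in, not Teradata's hash."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF


def amp_for(rh: int) -> int:
    """Use the high-order 16 bits of the row hash to look up the AMP number."""
    bucket = rh >> 16
    return HASH_MAP[bucket]


class Amp:
    """Toy AMP that stores rows keyed by Row ID = (row hash, uniqueness ID)."""

    def __init__(self):
        self.rows = {}
        self.last_uniqueness = defaultdict(int)  # row hash -> last ID used

    def insert(self, rh: int, row):
        # First row for a row hash gets uniqueness ID 1, later ones 2, 3, ...
        self.last_uniqueness[rh] += 1
        row_id = (rh, self.last_uniqueness[rh])
        self.rows[row_id] = row
        return row_id

    def sorted_rows(self):
        # Rows are logically ordered by Row ID.
        return sorted(self.rows.items())


amps = [Amp() for _ in range(NUM_AMPS)]


def insert_row(pi_value, row):
    """Hash the PI value, pick the AMP, and store the row under its Row ID."""
    rh = row_hash(pi_value)
    amp_no = amp_for(rh)
    row_id = amps[amp_no].insert(rh, row)
    return amp_no, row_id
```

Calling insert_row twice with the same primary index value sends both rows to the same AMP with the same row hash, and the second row receives uniqueness ID 2.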

How Tables are Stored

Table rows are sorted by their Row ID (row hash + uniqueness ID) and then stored within the AMPs. The Row ID is stored with each data row.

Row Hash     Uniqueness ID   EmployeeNo   FirstName   LastName
2A01 2611    0000 0001       101          Mike        James
2A01 2612    0000 0001       104          Alex        Stuart
2A01 2613    0000 0001       102          Robert      Williams
2A01 2614    0000 0001       105          Robert      James
2A01 2615    0000 0001       103          Peter       Paul
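
As a rough illustration of this ordering, the sketch below sorts the sample rows by Row ID. It assumes the Row ID can be modeled as a 64-bit number with the 32-bit row hash in the high half and the 32-bit uniqueness ID in the low half. The helper names row_id and format_row_id are hypothetical.

```python
# Sample rows in arbitrary arrival order:
# (row hash, uniqueness ID, EmployeeNo, FirstName, LastName)
rows = [
    (0x2A012613, 1, 102, "Robert", "Williams"),
    (0x2A012615, 1, 103, "Peter", "Paul"),
    (0x2A012611, 1, 101, "Mike", "James"),
    (0x2A012614, 1, 105, "Robert", "James"),
    (0x2A012612, 1, 104, "Alex", "Stuart"),
]


def row_id(row) -> int:
    """Combine row hash (high 32 bits) and uniqueness ID (low 32 bits)."""
    rh, uniqueness_id = row[0], row[1]
    return (rh << 32) | uniqueness_id


def format_row_id(row) -> str:
    """Render the Row ID as hexadecimal, row hash first, then uniqueness ID."""
    return f"{row[0]:08X} {row[1]:08X}"


# Rows are kept in Row ID order, not in EmployeeNo order.
stored = sorted(rows, key=row_id)

for row in stored:
    print(format_row_id(row), row[2], row[3], row[4])
```

The sorted order follows the row hash, so the EmployeeNo values come out as 101, 104, 102, 105, 103, matching the table.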
