Hive – Built-in Functions

  • Post author:
  • Post category:Hive
  • Post comments:0 Comments

In this section we will explains the built-in functions available in Hive. The functions look quite similar to SQL functions, except for their usage.

Built-In Functions

Hive supports the following built-in functions:

Return TypeSignatureDescription
BIGINTround(double a)It returns the rounded BIGINT value of the double.
BIGINTfloor(double a)It returns the maximum BIGINT value that is equal or less than the double.
BIGINTceil(double a)It returns the minimum BIGINT value that is equal or greater than the double.
doublerand(), rand(int seed)It returns a random number that changes from row to row.
stringconcat(string A, string B,…)It returns the string resulting from concatenating B after A.
stringsubstr(string A, int start)It returns the substring of A starting from start position till the end of string A.
stringsubstr(string A, int start, int length)It returns the substring of A starting from start position with the given length.
stringupper(string A)It returns the string resulting from converting all characters of A to upper case.
stringucase(string A)Same as above.
stringlower(string A)It returns the string resulting from converting all characters of B to lower case.
stringlcase(string A)Same as above.
stringtrim(string A)It returns the string resulting from trimming spaces from both ends of A.
stringltrim(string A)It returns the string resulting from trimming spaces from the beginning (left hand side) of A.
stringrtrim(string A)rtrim(string A) It returns the string resulting from trimming spaces from the end (right hand side) of A.
stringregexp_replace(string A, string B, string C)It returns the string resulting from replacing all substrings in B that match the Java regular expression syntax with C.
intsize(Map<K.V>)It returns the number of elements in the map type.
intsize(Array<T>)It returns the number of elements in the array type.
value of <type>cast(<expr> as <type>)It converts the results of the expression expr to <type> e.g. cast(‘1’ as BIGINT) converts the string ‘1’ to it integral representation. A NULL is returned if the conversion does not succeed.
stringfrom_unixtime(int unixtime)convert the number of seconds from Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of “1970-01-01 00:00:00”
stringto_date(string timestamp)It returns the date part of a timestamp string: to_date(“1970-01-01 00:00:00”) = “1970-01-01”
intyear(string date)It returns the year part of a date or a timestamp string: year(“1970-01-01 00:00:00”) = 1970, year(“1970-01-01”) = 1970
intmonth(string date)It returns the month part of a date or a timestamp string: month(“1970-11-01 00:00:00”) = 11, month(“1970-11-01”) = 11
intday(string date)It returns the day part of a date or a timestamp string: day(“1970-11-01 00:00:00”) = 1, day(“1970-11-01”) = 1
stringget_json_object(string json_string, string path)It extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It returns NULL if the input json string is invalid.

Example

The following queries demonstrate some built-in functions:

round() function

hive> SELECT round(2.6) from temp;

On successful execution of query, you get to see the following response:

3.0

floor() function

hive> SELECT floor(2.6) from temp;

On successful execution of the query, you get to see the following response:

2.0

ceil() function

hive> SELECT ceil(2.6) from temp;

On successful execution of the query, you get to see the following response:

3.0

Aggregate Functions

Hive supports the following built-in aggregate functions. The usage of these functions is as same as the SQL aggregate functions.

Return TypeSignatureDescription
BIGINTcount(*), count(expr),count(*) – Returns the total number of retrieved rows.
DOUBLEsum(col), sum(DISTINCT col)It returns the sum of the elements in the group or the sum of the distinct values of the column in the group.
DOUBLEavg(col), avg(DISTINCT col)It returns the average of the elements in the group or the average of the distinct values of the column in the group.
DOUBLEmin(col)It returns the minimum value of the column in the group.
DOUBLEmax(col)It returns the maximum value of the column in the group.

Leave a Reply