Apache Presto – Home

Apache Presto is an open source distributed SQL engine. Presto originated at Facebook for data analytics needs and later was open sourced. Now, Teradata joins Presto community and offers support.

Apache Presto is very useful for performing queries even petabytes of data. Extensible architecture and storage plugin interfaces are very easy to interact with other file systems. Most of today’s best industrial companies are adopting Presto for its interactive speeds and low latency performance.

This tutorial explores Presto architecture, configuration, and storage plugins. It discusses the basic and advanced queries and finally concludes with real-time examples.

Audience

This tutorial has been prepared for professionals aspiring to make a career in Big Data Analytics. This tutorial will give you enough understanding on Apache Presto.

Prerequisites

Before proceeding with this tutorial, you must have a good understanding of Core Java, DBMS and any of the Linux operating systems.

Overview

Data analytics is the process of analyzing raw data to gather relevant information for better decision making. It is primarily used in many organizations to make business decisions. Well, big data analytics involves a large amount of data and this process is quite complex, hence companies use different strategies.

For example, Facebook is one of the leading data driven and largest data warehouse company in the world. Facebook warehouse data is stored in Hadoop for large scale computation. Later, when warehouse data grew to petabytes, they decided to develop a new system with low latency. In the year of 2012, Facebook team members designed “Presto” for interactive query analytics that would operate quickly even with petabytes of data.

What is Apache Presto?

Apache Presto is a distributed parallel query execution engine, optimized for low latency and interactive query analysis. Presto runs queries easily and scales without down time even from gigabytes to petabytes.

A single Presto query can process data from multiple sources like HDFS, MySQL, Cassandra, Hive and many more data sources. Presto is built in Java and easy to integrate with other data infrastructure components. Presto is powerful, and leading companies like Airbnb, DropBox, Groupon, Netflix are adopting it.

Presto − Features

Presto contains the following features −

  • Simple and extensible architecture.
  • Pluggable connectors – Presto supports pluggable connector to provide metadata and data for queries.
  • Pipelined executions – Avoids unnecessary I/O latency overhead.
  • User-defined functions – Analysts can create custom user-defined functions to migrate easily.
  • Vectorized columnar processing.

Presto − Benefits

Here is a list of benefits that Apache Presto offers −

  • Specialized SQL operations
  • Easy to install and debug
  • Simple storage abstraction
  • Quickly scales petabytes data with low latency

Presto − Applications

Presto supports most of today’s best industrial applications. Let’s take a look at some of the notable applications.

  • Facebook − Facebook built Presto for data analytics needs. Presto easily scales large velocity of data.
  • Teradata − Teradata provides end-to-end solutions in Big Data analytics and data warehousing. Teradata contribution to Presto makes it easier for more companies to enable all analytical needs.
  • Airbnb − Presto is an integral part of the Airbnb data infrastructure. Well, hundreds of employees are running queries each day with the technology.

Why Presto?

Presto supports standard ANSI SQL which has made it very easy for data analysts and developers. Though it is built in Java, it avoids typical issues of Java code related to memory allocation and garbage collection. Presto has a connector architecture that is Hadoop friendly. It allows to easily plug in file systems.

Presto runs on multiple Hadoop distributions. In addition, Presto can reach out from a Hadoop platform to query Cassandra, relational databases, or other data stores. This cross-platform analytic capability allows Presto users to extract maximum business value from gigabytes to petabytes of data.

Leave a Reply