Saturday, February 20, 2016

Impala Hands On

Comments:
========

-- This line is a comment about a table.
/*
This is a multi-line comment about a query.
*/

Data Types Supported:
=====================

ARRAY Complex Type (CDH 5.5 or higher only)
BIGINT Data Type
BOOLEAN Data Type
CHAR Data Type (CDH 5.2 or higher only)
DECIMAL Data Type (CDH 5.1 or higher only)
DOUBLE Data Type
FLOAT Data Type
INT Data Type
MAP Complex Type (CDH 5.5 or higher only)
REAL Data Type
SMALLINT Data Type
STRING Data Type
STRUCT Complex Type (CDH 5.5 or higher only)
TIMESTAMP Data Type
TINYINT Data Type
VARCHAR Data Type (CDH 5.2 or higher only)
Complex Types (CDH 5.5 and higher only)


ARRAY Complex Type:
===================

CREATE TABLE array_demo
(
  id BIGINT,
  name STRING,
  pets ARRAY <STRING>,
  places_lived ARRAY < STRUCT <    place: STRING,    start_year: INT >>,
  marriages ARRAY < STRUCT <    spouse: STRING,    children: ARRAY <STRING>  >>,
  ancestors MAP < STRING, ARRAY <STRING> >
)
STORED AS PARQUET;

DESCRIBE array_demo;
DESCRIBE array_demo.pets;
DESCRIBE array_demo.marriages;
DESCRIBE array_demo.places_lived;
DESCRIBE array_demo.ancestors;

SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets;
SELECT id, name, places_lived.pos, places_lived.item.place, places_lived.item.start_year FROM array_demo, array_demo.places_lived;
SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year  FROM array_demo, array_demo.places_lived;
SELECT id, name, pets.item FROM array_demo, array_demo.pets  WHERE pets.pos in (0, 1, 3);

SELECT id, name, pets.item FROM array_demo, array_demo.pets  WHERE pets.item LIKE 'Mr. %';

SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year  FROM array_demo, array_demo.places_lived WHERE places_lived.place like '%California%';

Restrictions:
=============

Columns with this data type can only be used in tables or partitions with the Parquet file format.
Columns with this data type cannot be used as partition key columns in a partitioned table.
The COMPUTE STATS statement does not produce any statistics for columns of this data type.



Complete Guide to Use Complex Types in Impala:
==============================================

http://www.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html#complex_types_examples_unique_1

Escaping special characters:
============================

http://www.cloudera.com/documentation/enterprise/latest/topics/impala_literals.html

SQL Operators
============

SQL operators are a class of comparison functions that are widely used within the WHERE clauses of SELECT statements.

Arithmetic Operators
====================
+ and -: Can be used either as unary or binary operators.
* and /: Multiplication and division respectively. Both arguments must be of numeric types.
%: Modulo operator. Returns the remainder of the left-hand argument divided by the right-hand argument. Both arguments must be of one of the integer types.
&, |, ~, and ^: Bitwise operators that return the logical AND, logical OR, NOT, or logical XOR (exclusive OR) of their argument values


Complex_type Example:
======================

select
  r_name,
  count(r_nations.item.n_nationkey) as count,
  sum(r_nations.item.n_nationkey) as sum,
  avg(r_nations.item.n_nationkey) as average,
  min(r_nations.item.n_name) as minimum,
  max(r_nations.item.n_name) as maximum,
  ndv(r_nations.item.n_nationkey) as distinct_values
from
  region, region.r_nations as r_nations
group by r_name
order by r_name;