What is the Hive Metastore URI address? Where is hive-site.xml? What goes under Hive Config Resources?

Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query, and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.

While configuring an Apache NiFi data flow (within Hortonworks DataFlow) I ran into the need to configure the Hive Streaming component to connect to a Hive table. This personal knowledge base article documents the locations of the resources I needed.

What is my Hive Metastore URI?

The metastore service runs on your Hive Metastore host and, by default, listens on port 9083 using the Thrift protocol. An example URI looks like this:

thrift://<host_name>:9083
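
If you are not sure which host runs the metastore, the URI is defined in hive-site.xml under the hive.metastore.uris property. As a quick check (the /etc/hive/conf path is the usual HDP symlink and is an assumption here; see the next section for locating the file on your node):

# print the hive.metastore.uris property and its value from hive-site.xml
# (/etc/hive/conf is the typical HDP location; adjust to your installation)
grep -A 1 'hive.metastore.uris' /etc/hive/conf/hive-site.xml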

Where is my hive-site.xml file located? What should I enter under Hive Config Resources?

When configuring Apache NiFi to connect to a Hive table using Hive Streaming, you will need to enter the location of your hive-site.xml file under Hive Config Resources. Below you can see the location on my Hadoop node; to find the location in your installation, look under the /etc/hive directory. The script below can help with this:

# find the Hive folder
cd /etc/hive

# run a search for the hive-site.xml file, starting at the current location
find . -name hive-site.xml

# in my case, after examining the results of the command, the file is located at:
/etc/hive/2.6.5.0-292/0/hive-site.xml
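
Enter the full path you found under Hive Config Resources in NiFi. The property accepts a comma-separated list of configuration files, and depending on your cluster you may also need to point it at core-site.xml and hdfs-site.xml. A sketch of what the value could look like (the Hadoop config paths below are typical HDP defaults, not taken from my node; verify them on yours):

# example value for NiFi's Hive Config Resources property; the core-site.xml
# and hdfs-site.xml paths are assumptions for a typical HDP installation
/etc/hive/2.6.5.0-292/0/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml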

Hadoop Ecosystem: Hive – the Data Warehouse and SQL interface

Apache Hive

The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
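
For example, HiveQL's TRANSFORM clause is how custom mapper logic can be plugged into an otherwise ordinary query. A minimal sketch, run through beeline (the JDBC URL, table, columns, and script name are placeholders; the script would also need to be shipped to the cluster first, e.g. with ADD FILE):

# stream rows of a (hypothetical) web_logs table through a custom script,
# the HiveQL equivalent of a hand-written mapper; 10000 is the default
# HiveServer2 port
beeline -u "jdbc:hive2://<host_name>:10000" -e "
  SELECT TRANSFORM (user_id, url)
         USING 'python parse_urls.py'
         AS (user_id, domain)
  FROM   web_logs;"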

Hive is both a metadata layer on top of HDFS and a SQL interpreter. This allows companies to store structured or semi-structured data as files on Hadoop without a large initial data modeling effort. Once business requirements align with the need to extract new insights from the stored data, a development team can leverage the “schema on read” paradigm to create metadata about these files.
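
A minimal sketch of what that looks like in practice: a CREATE EXTERNAL TABLE statement projects a schema onto files already sitting in HDFS, without moving or rewriting them (the path, delimiter, and columns below are assumptions for illustration):

# project a tab-delimited schema onto existing HDFS files ("schema on read");
# dropping an EXTERNAL table removes only the metadata, not the files
beeline -u "jdbc:hive2://<host_name>:10000" -e "
  CREATE EXTERNAL TABLE web_logs (
    user_id STRING,
    url     STRING,
    ts      TIMESTAMP
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/raw/web_logs';"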

Having a SQL interpreter gives business analysts and power users access to terabytes or petabytes of information through a familiar query language. This is a dramatic departure from MapReduce, where a very specialized skill set was required to write multiple map and reduce functions to achieve the same results.
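
To make the contrast concrete, the query below expresses a top-ten aggregation, the kind of job that would otherwise require hand-written map and reduce functions, as a single familiar SQL statement (table and column names are the same hypothetical ones as above):

# a familiar SQL aggregation in place of a custom MapReduce job
beeline -u "jdbc:hive2://<host_name>:10000" -e "
  SELECT url, COUNT(*) AS hits
  FROM   web_logs
  GROUP  BY url
  ORDER  BY hits DESC
  LIMIT  10;"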