Hive is a data warehousing and SQL-like query language component of Apache Hadoop. It provides a way to manage and query structured data stored in the Hadoop Distributed File System (HDFS). Hive allows you to perform ad-hoc queries, summarize data, and perform large scale data analysis.
In Hive, data is stored in tables, which can be thought of as similar to tables in a relational database. There are two types of tables in Hive:
- Managed Tables: Managed tables are the regular tables in Hive where the data and the metadata information is stored within the Hive metastore. When a managed table is dropped, both the data and the metadata information will be deleted.
- External Tables: External tables are similar to managed tables, but the data is stored outside of the Hive metastore. When an external table is dropped, only the metadata information will be deleted and the underlying data remains intact. External tables are useful when you want to keep the data in its original location and access it using Hive.
Here is an example of how to create a managed table in Hive:
-- Create a managed table CREATE TABLE sales ( item_id INT, sale_date STRING, sale_amount FLOAT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
In this example, we are creating a managed table called “sales” with three columns: item_id, sale_date, and sale_amount. The ROW FORMAT DELIMITED clause specifies that the data in the table is stored as plain text, with fields separated by commas. The STORED AS TEXTFILE clause specifies that the table data will be stored as text files in HDFS.
Similarly, here is an example of how to create an external table in Hive:
-- Create an external table CREATE EXTERNAL TABLE sales_ext ( item_id INT, sale_date STRING, sale_amount FLOAT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/hive/sales_ext';
In this example, we are creating an external table called “sales_ext” with the same structure as the “sales” table. The LOCATION clause specifies the location of the data in HDFS. The external table “sales_ext” will access the data stored in the specified location and use the Hive metastore to manage the metadata information.