Hive Interview Questions and Answers

Q1: What kind of applications is supported by Apache Hive?
A1: Hive supports all client applications written in Java, PHP, Python, C++, or Ruby by exposing its Thrift server.
Q2: Define the difference between Hive and HBase?
A2: The key differences between Apache Hive and HBase are as follows:
  • Hive is a data warehousing infrastructure, whereas HBase is a NoSQL database that runs on top of Hadoop.
  • Apache Hive queries are executed internally as MapReduce jobs, whereas HBase operations run in real time directly on its database rather than through MapReduce.
Q3: Where does the data of a Hive table get stored?
A3: By default, Hive table data is stored in the HDFS directory /user/hive/warehouse. One can change it by specifying the desired directory in the hive.metastore.warehouse.dir configuration parameter in hive-site.xml.
Q4: What is a metastore in Hive?
A4: The metastore in Hive stores metadata using an RDBMS and an open-source ORM (Object Relational Mapping) layer called DataNucleus, which converts the object representation into a relational schema and vice versa.
Q5: Why Hive does not store metadata information in HDFS?
A5: Hive stores metadata in the metastore using an RDBMS instead of HDFS. The reason for choosing an RDBMS is to achieve low latency, as HDFS read/write operations are time-consuming.
Q6: What is the difference between local and remote metastore?
A6: Local Metastore:
In local metastore configuration, the metastore service runs in the same JVM in which the Hive service is running and connects to a database running in a separate JVM, either on the same machine or on a remote machine.
Remote Metastore:
In the remote metastore configuration, the metastore service runs on its own separate JVM and not in the Hive service JVM. Other processes communicate with the metastore server using Thrift Network APIs. You can have one or more metastore servers in this case to provide more availability.
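A remote metastore is typically configured by pointing Hive clients at the metastore server's Thrift endpoint in hive-site.xml. A minimal sketch (the hostname and port here are placeholders; 9083 is the conventional default):

```xml
<!-- hive-site.xml: point clients at a remote metastore (example host) -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>
```

Multiple comma-separated URIs can be listed in hive.metastore.uris to provide more than one metastore server.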
Q7: What is the default database provided by Apache Hive for metastore?
A7: By default, Hive provides an embedded Derby database instance backed by the local disk for the metastore. This is called the embedded metastore configuration.
Q8: Is it possible to change the default location of a managed table?
A8: Yes, it is possible to change the default location of a managed table by specifying the LOCATION clause while creating the table.
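For illustration, a minimal sketch (the table name and HDFS path are hypothetical):

```sql
-- Managed table stored outside the default warehouse directory
CREATE TABLE employee_demo (
  id   INT,
  name STRING
)
LOCATION '/user/hive/custom_warehouse/employee_demo';
```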
Q9: When should we use SORT BY instead of ORDER BY?
A9: We should use SORT BY instead of ORDER BY when we have to sort huge datasets, because the SORT BY clause sorts the data using multiple reducers, whereas ORDER BY sorts all of the data together using a single reducer. Therefore, using ORDER BY against a large number of inputs will take a lot of time to execute.
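The difference can be sketched on a hypothetical employee table:

```sql
-- SORT BY: each reducer sorts its own share of the rows,
-- so the overall output is only sorted per reducer
SELECT name, salary FROM employee SORT BY salary DESC;

-- ORDER BY: a single reducer produces one totally ordered result,
-- which becomes a bottleneck on large inputs
SELECT name, salary FROM employee ORDER BY salary DESC;
```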
Q10: What is a partition in Hive?
A10: Hive organizes tables into partitions for grouping similar type of data together based on a column or partition key. Each Table can have one or more partition keys to identify a particular partition. Physically, a partition is nothing but a sub-directory in the table directory.
Q11: Why do we perform partitioning in Hive?
A11: Partitioning provides granularity in a Hive table and therefore reduces query latency by scanning only the relevant partitioned data instead of the whole data set.
For example, we can partition the transaction log of an e-commerce website by month: January, February, and so on. Any analytics for a particular month, say January, then has to scan only the January partition (sub-directory) instead of the whole table data.
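The transaction-log example above can be sketched as follows (table and column names are hypothetical):

```sql
-- Partitioned table: each distinct txn_month becomes a sub-directory
CREATE TABLE transaction_log (
  txn_id STRING,
  amount DOUBLE
)
PARTITIONED BY (txn_month STRING);

-- A filter on the partition key scans only that sub-directory
SELECT COUNT(*) FROM transaction_log WHERE txn_month = 'Jan';
```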
Q12: What is dynamic partitioning and when is it used?
A12: In dynamic partitioning, the values of the partition columns are known only at runtime, i.e., they are discovered while loading the data into a Hive table.
One may use dynamic partition in following two cases:
  • Loading data from an existing non-partitioned table to improve the sampling and therefore, decrease the query latency.
  • When one does not know all the partition values beforehand, so finding these partition values manually in a huge dataset would be a tedious task.
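A minimal sketch of loading a partitioned table dynamically (the staging table name is hypothetical):

```sql
-- Enable dynamic partitioning; nonstrict mode allows all
-- partition columns to be determined at runtime
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partition values are taken from the trailing column(s) of the SELECT
INSERT OVERWRITE TABLE transaction_log PARTITION (txn_month)
SELECT txn_id, amount, txn_month FROM staging_transactions;
```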
Q13: How to change the column data type in Hive? Explain RLIKE in Hive.
A13: We can change the column data type by using ALTER and CHANGE.
The syntax is :
ALTER TABLE table_name CHANGE column_name column_name new_datatype;
Example: If we want to change the data type of the salary column from integer to bigint in the employee table.
ALTER TABLE employee CHANGE salary salary BIGINT;
RLIKE: RLIKE is a regular-expression matching operator in Hive (short for "regexp like", not "right like"). A RLIKE B evaluates to true if any substring of string A matches the Java regular expression B.
Example:
'Intellipaat' RLIKE 'tell' → true (the literal substring 'tell' matches)
'Intellipaat' RLIKE '^I.*' → true (this is a regular expression)
Q14: What are the components used in Hive query processor?
A14: The components of the Hive query processor include:
  • Logical Plan Generation
  • Physical Plan Generation
  • Execution Engine
  • Operators
  • UDFs and UDAFs
  • Optimizer
  • Parser
  • Semantic Analyzer
  • Type Checking
Q15: What are buckets in Hive?
A15: Bucketing divides the data of each partition (or of an unpartitioned table) into a fixed number of buckets. Rows are assigned to buckets based on the hash of a specified table column.
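A minimal bucketing sketch (table name and bucket count are hypothetical):

```sql
-- Rows land in one of 4 files per partition, chosen by hash(id) % 4
CREATE TABLE employee_bucketed (
  id     INT,
  name   STRING,
  salary INT
)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- In older Hive versions, bucketing must be enforced on insert
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE employee_bucketed
SELECT id, name, salary FROM employee;
```

Bucketing makes sampling more efficient and enables optimizations such as bucketed map joins.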
Q16: Explain process to access sub directories recursively in Hive queries.
A16: We can access sub-directories recursively in Hive by setting the parameters below:
hive> Set mapred.input.dir.recursive=true;
hive> Set hive.mapred.supports.subdirectories=true;
Hive tables can then be pointed to the higher-level directory, which is suitable for a directory structure like /data/country/state/city/.
Q17: How to skip header rows from a table in Hive?
A17: Suppose a log file starts with header records such as:
System=….
Version=…
Sub-version=….
We do not want to include these three header lines in our Hive queries. To skip them, set a table property that tells Hive how many header lines to ignore:
CREATE EXTERNAL TABLE employee (
name STRING,
job STRING,
dob STRING,
id INT,
salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE
LOCATION '/user/data'
TBLPROPERTIES("skip.header.line.count"="2");
Q18: What is the maximum size of the string data type supported by Hive? Which binary file formats does Hive support?
A18: The maximum size of the string data type supported by Hive is 2 GB.
Hive supports the text file format by default, and it supports binary formats such as SequenceFile, ORC, Avro, and Parquet files.
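The file format is chosen per table with the STORED AS clause. A sketch with hypothetical table names:

```sql
-- Each table below uses one of the binary formats Hive supports
CREATE TABLE logs_orc     (msg STRING) STORED AS ORC;
CREATE TABLE logs_parquet (msg STRING) STORED AS PARQUET;
CREATE TABLE logs_seq     (msg STRING) STORED AS SEQUENCEFILE;
CREATE TABLE logs_avro    (msg STRING) STORED AS AVRO;
```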
