Big Data Hive Project Work
1) Create a database
hive> create database RETAIL;
2) Show the databases
hive> show databases;
OK
default
jan2020
retail
Time taken: 0.023 seconds, Fetched: 3 row(s)
3) Use the RETAIL database
hive> use RETAIL;
4) Show the tables in the RETAIL database
hive> show tables;
OK
5) Create a transactions table
hive> create table TXNRECORDS(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING)
> row format delimited fields terminated by ',' stored as textfile;
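To confirm the schema landed as intended, a quick sanity check (optional; DESCRIBE is standard HiveQL):

```sql
-- Show the column names and types of the new table
describe txnrecords;
-- 'describe formatted' additionally shows the storage location,
-- input/output format, and table statistics:
describe formatted txnrecords;
```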
6) Load data into the table
hive> load data local inpath 'file:///home/cloudera/Desktop/Basha/Basha2019/HIVE_Practicals/txns1.txt' into table TXNRECORDS;
hive> show tables;
OK
txnrecords
Time taken: 0.021 seconds, Fetched: 1 row(s)
hive>
[cloudera@quickstart ~]$ hdfs dfs -ls /user/hive/warehouse/retail.db/txnrecords
Found 1 items
-rwxrwxrwx 1 cloudera supergroup 4418139 2020-02-01 09:51 /user/hive/warehouse/retail.db/txnrecords/txns1.txt
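The HDFS listing above shows that LOAD DATA moved the file into the managed table's warehouse directory, so dropping the table would delete the data too. An EXTERNAL table is the alternative when the data should stay where it is; the sketch below is illustrative and the LOCATION path is an assumption, not taken from this project:

```sql
-- EXTERNAL table sketch: Hive reads files from the given HDFS directory
-- in place, and DROP TABLE removes only the metadata, not the data.
create external table txnrecords_ext(txnno int, txndate string, custno int,
  amount double, category string, product string, city string, state string,
  spendby string)
row format delimited fields terminated by ','
stored as textfile
location '/user/cloudera/txns_ext';  -- hypothetical path for illustration
```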
hive> select * from txnrecords limit 10;
OK
0 06-26-2011 4007024 40.33 Exercise & Fitness Cardio Machine Accessories Clarksville Tennessee credit
1 05-26-2011 4006742 198.44 Exercise & Fitness Weightlifting Gloves Long Beach California credit
2 06-01-2011 4009775 5.58 Exercise & Fitness Weightlifting Machine Accessories Anaheim California credit
3 06-05-2011 4002199 198.19 Gymnastics Gymnastics Rings Milwaukee Wisconsin credit
4 12-17-2011 4002613 98.81 Team Sports Field Hockey Nashville Tennessee credit
5 02-14-2011 4007591 193.63 Outdoor Recreation Camping & Backpacking & Hiking Chicago Illinois credit
6 10-28-2011 4002190 27.89 Puzzles Jigsaw Puzzles Charleston South Carolina credit
7 07-14-2011 4002964 96.01 Outdoor Play Equipment Sandboxes Columbus Ohio credit
8 01-17-2011 4007361 10.44 Winter Sports Snowmobiling Des Moines Iowa credit
9 05-17-2011 4004798 152.46 Jumping Bungee Jumping St. Petersburg Florida credit
Time taken: 0.076 seconds, Fetched: 10 row(s)
7) Count the number of rows in the table.
- Hive launches a MapReduce job and prints the result at the end.
hive> select count(*) from txnrecords;
Query ID = cloudera_20200317115050_d45ae0eb-3662-4392-a4e2-f3e247368e52
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1579493137658_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1579493137658_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1579493137658_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-03-17 11:51:00,725 Stage-1 map = 0%, reduce = 0%
2020-03-17 11:51:15,508 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.99 sec
2020-03-17 11:51:28,713 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.69 sec
MapReduce Total cumulative CPU time: 3 seconds 690 msec
Ended Job = job_1579493137658_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.69 sec HDFS Read: 4426362 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 690 msec
OK
50000
Time taken: 55.465 seconds, Fetched: 1 row(s)
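The job log above prints hints for controlling reducer parallelism. On a query this small one reducer is fine, but the knobs can be applied per session as shown below; the values are illustrative only:

```sql
-- Illustrative reducer tuning for a session (values are examples, not
-- recommendations for this dataset):
set hive.exec.reducers.bytes.per.reducer=67108864;  -- target bytes per reducer
set mapreduce.job.reduces=2;                        -- or force a fixed count
select count(*) from txnrecords;
```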
8) Create a table for Customer Data
hive> create table customer(custno string, firstname string, lastname string, age int, profession string)
> row format delimited fields terminated by ',';
hive> load data local inpath 'file:///home/cloudera/Desktop/Basha/Basha2019/HIVE_Practicals/custs.txt' into table customer;
Loading data to table retail.customer
Table retail.customer stats: [numFiles=1, totalSize=391355]
OK
Time taken: 0.308 seconds
hive> select * from customer limit 10;
OK
4000001 Kristina Chung 55 Pilot
4000002 Paige Chen 74 Teacher
4000003 Sherri Melton 34 Firefighter
4000004 Gretchen Hill 66 Computer hardware engineer
4000005 Karen Puckett 74 Lawyer
4000006 Patrick Song 42 Veterinarian
4000007 Elsie Hamilton 43 Pilot
4000008 Hazel Bender 63 Carpenter
4000009 Malcolm Wagner 39 Artist
4000010 Dolores McLaughlin 60 Writer
Time taken: 0.065 seconds, Fetched: 10 row(s)
hive> select count(*) from customer;
Query ID = cloudera_20200317120808_7ae1124b-a05f-43fa-bbbf-42cee5f14e71
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1579493137658_0002, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1579493137658_0002/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1579493137658_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-03-17 12:08:12,143 Stage-1 map = 0%, reduce = 0%
2020-03-17 12:08:22,200 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.66 sec
2020-03-17 12:08:32,024 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.3 sec
MapReduce Total cumulative CPU time: 3 seconds 300 msec
Ended Job = job_1579493137658_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.3 sec HDFS Read: 398974 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 300 msec
OK
9999
Time taken: 32.407 seconds, Fetched: 1 row(s)
9) Create a join table
hive> create table out1(custno string, firstname string, age int, profession string, amount double, product string)
> row format delimited fields terminated by ',';
OK
Time taken: 0.106 seconds
hive> insert overwrite table out1
> select a.custno,a.firstname,a.age,a.profession,b.amount,b.product
> from customer a JOIN txnrecords b ON a.custno=b.custno;
Query ID = cloudera_20200412100909_d6678e43-b571-4cdc-ad33-0ae36178bdfd
Total jobs = 1
hive> select * from out1 limit 10;
OK
4007024 Cameron 59 Actor 40.33 Cardio Machine Accessories
4006742 Gregory 36 Accountant 198.44 Weightlifting Gloves
4009775 Ruby 44 Designer 5.58 Weightlifting Machine Accessories
4002199 Keith 44 Police officer 198.19 Gymnastics Rings
4002613 Hugh 43 Engineering technician 98.81 Field Hockey
4007591 Jennifer 54 Electrician 193.63 Camping & Backpacking & Hiking
4002190 Sheryl 62 Designer 27.89 Jigsaw Puzzles
4002964 Ken 67 Recreation and fitness worker 96.01 Sandboxes
4007361 Terri 52 Loan officer 10.44 Snowmobiling
4004798 Geoffrey 65 Chemist 152.46 Bungee Jumping
Time taken: 0.08 seconds, Fetched: 10 row(s)
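A side note on the join above: since customer is small relative to txnrecords, Hive can convert the shuffle join into a map-side join by broadcasting the small table to every mapper. This is a sketch; hive.auto.convert.join is already on by default in recent Hive versions:

```sql
-- Let Hive convert the join to a map join when one side is small enough
set hive.auto.convert.join=true;
-- EXPLAIN shows the chosen plan (look for "Map Join Operator")
explain
select a.custno, a.firstname, b.amount, b.product
from customer a join txnrecords b on a.custno = b.custno;
```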
Displaying the column headers with the records:
hive> set hive.cli.print.header=true;
hive> select * from out1 limit 10;
OK
out1.custno out1.firstname out1.age out1.profession out1.amount out1.product
4007024 Cameron 59 Actor 40.33 Cardio Machine Accessories
4006742 Gregory 36 Accountant 198.44 Weightlifting Gloves
4009775 Ruby 44 Designer 5.58 Weightlifting Machine Accessories
4002199 Keith 44 Police officer 198.19 Gymnastics Rings
4002613 Hugh 43 Engineering technician 98.81 Field Hockey
4007591 Jennifer 54 Electrician 193.63 Camping & Backpacking & Hiking
4002190 Sheryl 62 Designer 27.89 Jigsaw Puzzles
4002964 Ken 67 Recreation and fitness worker 96.01 Sandboxes
4007361 Terri 52 Loan officer 10.44 Snowmobiling
4004798 Geoffrey 65 Chemist 152.46 Bungee Jumping
Time taken: 0.085 seconds, Fetched: 10 row(s)
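The set command above lasts only for the current CLI session. To make it persistent, the setting can be placed in ~/.hiverc, which the classic Hive CLI sources at startup (this applies to the Hive CLI; Beeline uses its own mechanisms):

```sql
-- ~/.hiverc: executed by the Hive CLI on every launch
set hive.cli.print.header=true;
```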
10) Create the out2 table
hive> create table out2 (custno int, firstname string, age int, profession string, amount double, product string, level string)
> row format delimited fields terminated by ',';
Categorize based on age:
hive> insert overwrite table out2
> select *, case
> when age < 30 then 'low'
> when age >= 30 and age < 50 then 'middle'
> when age >= 50 then 'old'
> else 'others'
> end
> from out1;
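The same categorization could also be kept as a view instead of a materialized table, so the CASE logic always reflects the latest out1 rows. A sketch (out2_v is a hypothetical name; the expression needs an explicit alias in a view, unlike the positional INSERT above):

```sql
-- View variant: no data is copied; the CASE is evaluated at query time
create view out2_v as
select *, case
  when age < 30 then 'low'
  when age >= 30 and age < 50 then 'middle'
  when age >= 50 then 'old'
  else 'others'
end as level
from out1;
```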
hive> select * from out2 limit 10;
OK
out2.custno out2.firstname out2.age out2.profession out2.amount out2.product out2.level
4007024 Cameron 59 Actor 40.33 Cardio Machine Accessories old
4006742 Gregory 36 Accountant 198.44 Weightlifting Gloves middle
4009775 Ruby 44 Designer 5.58 Weightlifting Machine Accessories middle
4002199 Keith 44 Police officer 198.19 Gymnastics Rings middle
4002613 Hugh 43 Engineering technician 98.81 Field Hockey middle
4007591 Jennifer 54 Electrician 193.63 Camping & Backpacking & Hiking old
4002190 Sheryl 62 Designer 27.89 Jigsaw Puzzles old
4002964 Ken 67 Recreation and fitness worker 96.01 Sandboxes old
4007361 Terri 52 Loan officer 10.44 Snowmobiling old
4004798 Geoffrey 65 Chemist 152.46 Bungee Jumping old
Time taken: 0.112 seconds, Fetched: 10 row(s)
11) Create a final table to display the results.
hive> create table out3 (level string, amount double)
> row format delimited fields terminated by ',';
hive> insert overwrite table out3
> select level,sum(amount) from out2 group by level;
hive> select * from out3;
OK
out3.level out3.amount
low 725221.3399999988
middle 1855861.669999996
old 2529100.310000011
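For reference, the whole pipeline can be collapsed into a single query that produces the same per-level totals without the intermediate out1/out2/out3 tables. A sketch (older Hive does not allow grouping by a select alias, so the CASE expression is repeated in the GROUP BY):

```sql
-- One-shot version of steps 9-11: join, categorize, and aggregate
select case
         when a.age < 30 then 'low'
         when a.age >= 30 and a.age < 50 then 'middle'
         when a.age >= 50 then 'old'
         else 'others'
       end as level,
       sum(b.amount) as amount
from customer a
join txnrecords b on a.custno = b.custno
group by case
           when a.age < 30 then 'low'
           when a.age >= 30 and a.age < 50 then 'middle'
           when a.age >= 50 then 'old'
           else 'others'
         end;
```

Materializing the intermediate tables, as the project does, is still useful when the join output is reused by several downstream queries.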
*********************************************************