Sunday, November 6, 2016

Hive: Auto Increment column & Incrementing existing row_sequence

We already have a existed jar for this, no need to write our own udf.                                                                 Down load (hive-contrib-1.1.0.jar)jar in below link ,                                                                                       http://www.mvnrepository.com/artifact/org.apache.hive/hive-contrib/1.1.0                                                                                                                                                                                      hive> add jar /home/hadoop/Desktop/hive-contrib-1.1.0.jar;

hive> CREATE TEMPORARY FUNCTION row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
OK

CREATE TABLE IF NOT EXISTS users_inc
(
ID string,
name  string
) row format delimited fields terminated by ',' stored as textfile;

id ID value is int ,Then (ID int ...use this when creating table)
hive> insert into table users_inc
select m.max+row_sequence() as inc , ename
from (select ename from emp_csv) e
join
(select max(ID) as max from users_inc) m;

handling nulls when inserting table first time : (max val we get null when table is empty)
================
insert into table users_inc
select m.max+row_sequence() as inc , ename
from (select ename from emp_csv) e
join
(select coalesce(max(ID),0) as max from users_inc) m;

select * from users_inc;


===>for alpha numeric id increment value
insert into table users_inc
select concat("ABC-",m.max+row_sequence()) as inc , ename
from (select ename from emp_csv) e
join
(select coalesce(max(substr(ID,5)),0) as max from users_inc) m;

---->regexp_replace
insert into table users_inc
select regexp_replace(concat("ABC-",m.max+row_sequence()),'\\.0','') as inc, ename from (select ename from emp_csv) e
join
(select coalesce(max(substr(ID,5)),0) as max from users_inc) m;                                                                                                                                                                                                                                coalesce() : COALESCE(T v1, T v2, ...)
Returns the first v that is not NULL in list of values,                                                                             or   NULL if all v's are NULL.

1 comment:

  1. Bro Can you show the steps with Screenshots of hive sessions. It's not working for me.

    ReplyDelete