JAVA WORLD: Hive: Auto Increment column & Incrementing existing row

We already have a existed jar for this, no need to write our own udf. Down load (hive-contrib-1.1.0.jar)jar in below link , http://www.mvnrepository.com/artifact/org.apache.hive/hive-contrib/1.1.0 hive> add jar /home/hadoop/Desktop/hive-contrib-1.1.0.jar;

hive> CREATE TEMPORARY FUNCTION row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
OK

CREATE TABLE IF NOT EXISTS users_inc
(
ID string,
name string
) row format delimited fields terminated by ',' stored as textfile;

id ID value is int ,Then (ID int ...use this when creating table)
hive> insert into table users_inc
select m.max+row_sequence() as inc , ename
from (select ename from emp_csv) e
join
(select max(ID) as max from users_inc) m;

handling nulls when inserting table first time : (max val we get null when table is empty)
================
insert into table users_inc
select m.max+row_sequence() as inc , ename
from (select ename from emp_csv) e
join
(select coalesce(max(ID),0) as max from users_inc) m;

select * from users_inc;

===>for alpha numeric id increment value
insert into table users_inc
select concat("ABC-",m.max+row_sequence()) as inc , ename
from (select ename from emp_csv) e
join
(select coalesce(max(substr(ID,5)),0) as max from users_inc) m;

---->regexp_replace
insert into table users_inc
select regexp_replace(concat("ABC-",m.max+row_sequence()),'\\.0','') as inc, ename from (select ename from emp_csv) e
join
(select coalesce(max(substr(ID,5)),0) as max from users_inc) m; coalesce() : COALESCE(T v1, T v2, ...)
Returns the first v that is not NULL in list of values, or NULL if all v's are NULL.

JAVA WORLD

Sunday, November 6, 2016

Hive: Auto Increment column & Incrementing existing row_sequence

1 comment:

Blog Archive