本文主要是介绍hive输出格式转化,本例以json为例,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
hive默认是以tab为分隔符,分隔各个输出字段,如
-
hive> select get_json_object(json.value,'$.hour'),get_json_object(json.value,"$.channel") from json limit 10; -
Total MapReduce jobs = 1 -
Launching Job 1 out of 1 -
Number of reduce tasks is set to 0 since there's no reduce operator -
Starting Job = job_201304271626_0032, Tracking URL = http://linjianke:50030/jobdetails.jsp?jobid=job_201304271626_0032 -
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=192.168.10.44:8021 -kill job_201304271626_0032 -
2013-05-15 10:07:11,102 Stage-1 map = 0%, reduce = 0% -
2013-05-15 10:07:15,116 Stage-1 map = 100%, reduce = 0% -
2013-05-15 10:07:17,125 Stage-1 map = 100%, reduce = 100% -
Ended Job = job_201304271626_0032 -
OK -
2013-04-07 16 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-07 22 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-08 00 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-08 01 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-09 01 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-09 07 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-09 16 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-09 17 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-09 21 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
2013-04-10 01 f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8 -
Time taken: 9.531 seconds
因为业务需要,蒋输出转为json格式,可以用hive提供的transform函数
hive> select transform(get_json_object(json.value,'$.hour'),get_json_object(json.value,"$.channel")) using '/usr/bin/python transform.py aa bb' as (result string) from json limit 10;
输出为
-
{"aa":"2013-04-07 16","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-07 22","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-08 00","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-08 01","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-09 01","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-09 07","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-09 16","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-09 17","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-09 21","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"} -
{"aa":"2013-04-10 01","bb":"f32d98f8-8ac5-11e2-8a47-bc305bf4bcb8"}
python脚本:对python还不大熟悉,写的可能比较搓
-
#!/usr/bin/python -
#coding:utf8 -
import sys -
for line in sys.stdin: -
if len(line) == 0: -
#print 'end' -
break -
if line.count('\n') == len(line): -
#print 'continue' -
continue -
line=line.strip('\n') -
arr=line.split('\t') -
if len(arr) != len(sys.argv) - 1: -
print 'error' -
break -
for i in range(0,len(arr)): -
arr[i]='"%s":"%s"' % (sys.argv[i+1],arr[i]) -
content = ','.join(arr) -
print '{%s}' % content
这篇关于hive输出格式转化,本例以json为例的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!