Xie Baoming, 2017/03/13
This article loads local Hadoop logs into a Hive data warehouse, then filters out the useful entries and stores them in a MySQL database.
Environment: Hive 0.12.0 + Hadoop 1.2.1
1. Log format
2014-04-17 22:53:30,621 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_4856124673669777083 to 127.0.0.1:50010
2014-04-17 22:53:30,621 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_3952951948255939678 to 127.0.0.1:50010
2014-04-17 22:53:30,629 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_-7659295115271387204 to 127.0.0.1:50010
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_1247712269990837159 to 127.0.0.1:50010
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_3041853621376201540 to 127.0.0.1:50010
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_5165085626508844330 to 127.0.0.1:50010
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_5058515618805863349 to 127.0.0.1:50010
Table definition:
// Create the Hive table that will hold the log records
HiveUtil.createTable("create table if not exists loginfo11 ("
        + " rdate String, time ARRAY<string>, type STRING, relateclass STRING,"
        + " information1 STRING, information2 STRING, information3 STRING)"
        + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '"
        + " COLLECTION ITEMS TERMINATED BY ','"
        + " MAP KEYS TERMINATED BY ':'");
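To see how one log line lands in these columns, the sketch below splits a sample entry by the same delimiters the table declares: fields are separated by spaces, and the comma inside the timestamp makes `time` an ARRAY with two elements, which is why later queries read `time[0]`. This is only an illustration (a hypothetical LogLineDemo class); Hive itself does this parsing at load time.

```java
public class LogLineDemo {
    public static void main(String[] args) {
        String line = "2014-04-17 22:53:30,621 INFO org.apache.hadoop.hdfs.StateChange: "
                + "BLOCK* addToInvalidates: blk_4856124673669777083 to 127.0.0.1:50010";

        // FIELDS TERMINATED BY ' ': space-separated tokens map onto the columns
        String[] f = line.split(" ");
        String rdate = f[0];                // rdate       -> 2014-04-17
        String[] time = f[1].split(",");    // COLLECTION ITEMS TERMINATED BY ','
                                            // time[0]     -> 22:53:30, time[1] -> 621
        String type = f[2];                 // type        -> INFO
        String relateclass = f[3];          // relateclass -> org.apache.hadoop.hdfs.StateChange:
                                            // remaining tokens fill information1..3

        System.out.println(rdate + " | " + time[0] + " | " + type + " | " + relateclass);
    }
}
```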
2. Code
The getConnect.java class opens and closes the connections to Hive and MySQL;
package com.my.hivetest;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class getConnect {

    private static Connection conn = null;
    private static Connection conntomysql = null;

    private getConnect() {
    }

    // Lazily open (and cache) the JDBC connection to the HiveServer.
    public static Connection getHiveConn() throws SQLException {
        if (conn == null) {
            try {
                Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }
            conn = DriverManager.getConnection(
                    "jdbc:hive://localhost:50031/default", "", "");
        }
        return conn;
    }

    // Lazily open (and cache) the JDBC connection to MySQL.
    public static Connection getMysqlConn() throws SQLException {
        if (conntomysql == null) {
            try {
                Class.forName("com.mysql.jdbc.Driver");
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }
            conntomysql = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/hadoop?createDatabaseIfNotExist=true"
                            + "&useUnicode=true&characterEncoding=GBK",
                    "root", "123456");
        }
        return conntomysql;
    }

    public static void closeHive() throws SQLException {
        if (conn != null)
            conn.close();
    }

    public static void closemysql() throws SQLException {
        if (conntomysql != null)
            conntomysql.close();
    }
}
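The code in this article also calls a HiveUtil helper class that is never listed. The sketch below is a plausible reconstruction, not the author's original: the method names (createTable, loadDate, queryHive, hiveTomysql) are taken from the call sites, but the method bodies, the q escaping helper, and the MySQL hadooplog table layout are assumptions. It relies on the getConnect class shown above.

```java
package com.my.hivetest;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical reconstruction of the HiveUtil helper (not shown in the article).
public class HiveUtil {

    // Run a DDL statement (create table ...) on Hive.
    public static void createTable(String hiveql) throws SQLException {
        Statement stmt = getConnect.getHiveConn().createStatement();
        stmt.executeQuery(hiveql);
    }

    // Run a "load data ..." statement on Hive.
    public static void loadDate(String hiveql) throws SQLException {
        Statement stmt = getConnect.getHiveConn().createStatement();
        stmt.executeQuery(hiveql);
    }

    // Run a select and hand the result set back to the caller.
    public static ResultSet queryHive(String hiveql) throws SQLException {
        Statement stmt = getConnect.getHiveConn().createStatement();
        return stmt.executeQuery(hiveql);
    }

    // Quote a value for embedding in a SQL string literal (minimal escaping).
    static String q(String v) {
        return "'" + v.replace("'", "''") + "'";
    }

    // Copy every row of the Hive result into a MySQL table (layout assumed).
    public static void hiveTomysql(ResultSet res) throws SQLException {
        Statement stmt = getConnect.getMysqlConn().createStatement();
        stmt.executeUpdate("create table if not exists hadooplog ("
                + " rdate varchar(50), time varchar(50), type varchar(20),"
                + " relateclass tinytext, information longtext)");
        while (res.next()) {
            String info = res.getString(5) + res.getString(6) + res.getString(7);
            stmt.executeUpdate("insert into hadooplog values ("
                    + q(res.getString(1)) + "," + q(res.getString(2)) + ","
                    + q(res.getString(3)) + "," + q(res.getString(4)) + ","
                    + q(info) + ")");
        }
    }
}
```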
The exeHiveQL.java class is the driver: it implements the main method.
package com.my.hivetest;

import java.sql.ResultSet;
import java.sql.SQLException;

public class exeHiveQL {
    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.out.println("Please supply the query conditions: log level and date");
            System.exit(1);
        }
        String type = args[0];
        String date = args[1];

        // Create the table in Hive
        HiveUtil.createTable("create table if not exists loginfo11 ("
                + " rdate String, time ARRAY<string>, type STRING,"
                + " relateclass STRING, information1 STRING,"
                + " information2 STRING, information3 STRING)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '"
                + " COLLECTION ITEMS TERMINATED BY ','"
                + " MAP KEYS TERMINATED BY ':'");

        // Load the Hadoop logs
        HiveUtil.loadDate("load data local inpath"
                + " '/root/hadoop-1.2.1/logs/*.log.*' overwrite into table loginfo11");

        // Query the useful records
        ResultSet res1 = HiveUtil.queryHive(
                "select rdate,time[0],type,relateclass,"
                        + "information1,information2,information3 from loginfo11"
                        + " where type='" + type + "' and rdate='" + date + "'");

        // Copy the query result into MySQL
        HiveUtil.hiveTomysql(res1);

        // Close the Hive connection
        getConnect.closeHive();
        // Close the MySQL connection
        getConnect.closemysql();
    }
}
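Note that the query above concatenates the two command-line arguments straight into the HiveQL string. As a small defensive sketch (the QueryBuilder class and its quote/buildQuery helpers are hypothetical, not part of the original code), single quotes can at least be escaped before the values are embedded:

```java
public class QueryBuilder {

    // Wrap a value in single quotes, escaping any quote characters inside it.
    static String quote(String v) {
        return "'" + v.replace("'", "\\'") + "'";
    }

    // Build the same select statement as above, but with escaped values.
    static String buildQuery(String type, String date) {
        return "select rdate,time[0],type,relateclass,"
                + "information1,information2,information3 from loginfo11"
                + " where type=" + quote(type) + " and rdate=" + quote(date);
    }

    public static void main(String[] args) {
        // ... where type='ERROR' and rdate='2014-04-14'
        System.out.println(buildQuery("ERROR", "2014-04-14"));
    }
}
```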
Before running, start the hiveserver service; the port number 50031 here must match the one used in the getConnect class:

# bin/hive --service hiveserver -p 50031

Then run the program in Eclipse, setting the program arguments, for example:
ERROR 2014-04-14

Before it will run, you also need to add the required JARs to the project's build path:
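The original article does not list the packages at this point. The following is an indicative set for a Hive 0.12 / Hadoop 1.2.1 JDBC client; the exact names and versions are assumptions, so check your hive/lib and Hadoop directories:

```shell
# Client-side JARs typically needed on the classpath (versions indicative):
hive-jdbc-0.12.0.jar          # Hive JDBC driver
hive-exec-0.12.0.jar
hive-metastore-0.12.0.jar
hive-service-0.12.0.jar
libthrift-0.9.0.jar           # Thrift transport used by the Hive driver
libfb303-0.9.0.jar
commons-logging-1.1.3.jar
slf4j-api-1.7.5.jar
hadoop-core-1.2.1.jar         # from the Hadoop 1.2.1 distribution
mysql-connector-java-5.1.x.jar  # MySQL JDBC driver
```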