[Tutorial] Integrating the Hortonworks Big Data Platform with OBSFileSystem
1 Background
Hortonworks was founded in July 2011 by Yahoo and Benchmark Capital. Born out of Yahoo, Hortonworks employs many Hadoop architects and committers, all of whom previously worked at Yahoo and who have contributed over 80% of the source code to the Apache Hadoop project.
As a pioneer of the Apache Hadoop 2.0 community, Hortonworks built its own Hadoop ecosystem: HDFS for data storage, YARN for resource management, MapReduce and Tez as compute engines, Pig, Hive & HCatalog, and HBase serving the data platform, Flume and Sqoop for importing data into and exporting it out of HDFS, Ambari for cluster monitoring, Falcon for data lifecycle management, Oozie for job scheduling, and so on.
To let the HDP big data platform store, read, and write data on Huawei Cloud Object Storage Service (OBS), Huawei Cloud OBS provides the OBSFileSystem big data connector.
This guide helps Huawei Cloud users quickly set up the OBSFileSystem connector on an HDP cluster and make better use of Huawei Cloud OBS.
2 Deployment View
2.1 Installed Versions
Hardware: 1 master + 3 core nodes (flavor: 8 vCPUs / 32 GB memory; OS: CentOS 7.5)
Software: Ambari 2.7.1.0, HDP 3.0.1.0
2.2 Deployment Diagram
3 Steps to Integrate the Hortonworks Platform with OBS
3.1 Updating OBSFileSystem
3.1.1 Uploading the OBS JAR Packages
1. Download OBSFileSystem from https://bbs.huaweicloud.cn/forum/thread-12142-1-1.html and unzip it. The Package directory contains the JAR packages OBS requires: hadoop-huaweicloud-2.8.3.13.jar, esdk-obs-java-3.1.3.jar, okio-1.14.0.jar, okhttp-3.10.0.jar, and java-xmlbuilder-1.1.jar.
2. Place these JAR packages in /mnt/obsjar, as in the sketch below.
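A minimal sketch of steps 1 and 2 combined, assuming the archive was saved to /mnt as OBSFileSystem.zip (the archive name and download location are assumptions; substitute whatever you actually downloaded):
# Unpack the downloaded archive and stage its JARs in /mnt/obsjar
mkdir -p /mnt/obsjar
unzip /mnt/OBSFileSystem.zip -d /mnt/obsfs    # archive name is an assumption
cp /mnt/obsfs/Package/*.jar /mnt/obsjar/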
3.1.2 Adding the hadoop-huaweicloud JAR Package
1. Copy the hadoop-huaweicloud JAR package into the following directories (a loop version is sketched after the commands).
Commands:
cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/share/hst/activity-explorer/lib/.
cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.
cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/spark2/jars/.
cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/tez/lib/.
cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.
ln -s /usr/hdp/3.0.1.0-187/hadoop-mapreduce/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/hadoop-huaweicloud.jar
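Because the same JAR goes into many directories, a loop over the target list is less error-prone. A minimal sketch using the directories listed above (adjust the HDP version path to your installation):
# Copy the hadoop-huaweicloud JAR into every target directory in one pass
JAR=/mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar
for d in \
  /usr/hdp/share/hst/activity-explorer/lib \
  /usr/hdp/3.0.1.0-187/hadoop-mapreduce \
  /usr/hdp/3.0.1.0-187/spark2/jars \
  /usr/hdp/3.0.1.0-187/tez/lib \
  '/var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib' \
  '/var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib' \
  '/var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib'
do
  cp "$JAR" "$d/"
done
The same pattern works for the esdk-obs-java package in 3.1.3.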
3.1.3 Adding the esdk-obs-java JAR Package
1. Copy the esdk-obs-java JAR package into the following directories.
Commands:
cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/share/hst/activity-explorer/lib/.
cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.
cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/3.0.1.0-187/spark2/jars/.
cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/3.0.1.0-187/tez/lib/.
cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.
3.1.4 Replacing the okio JAR Package
1. Find all okio* JAR packages, note their paths, and back them up.
Commands:
find / -name "okio*"
mkdir -p /mnt/oldjar
cp /usr/hdp/3.0.1.0-187/hadoop/client/okio-1.6.0.jar /mnt/oldjar/.
cp /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okio-1.4.0.jar /mnt/oldjar/.
2. Delete all old okio* JAR packages, then run find again to confirm none remain (see the sketch after the commands).
Commands:
rm -rf /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/okio-1.6.0.jar
rm -rf /usr/hdp/3.0.1.0-187/livy2/jars/okio-1.6.0.jar
rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okio.jar
rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okio-1.6.0.jar
rm -rf /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/okio-1.6.0.jar
rm -rf /usr/hdp/3.0.1.0-187/spark2/jars/okio-1.6.0.jar
rm -rf /usr/hdp/3.0.1.0-187/hbase/lib/okio-1.6.0.jar
rm -rf /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/okio-1.4.0.jar
rm -rf /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/okio-1.4.0.jar
rm -rf /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okio-1.4.0.jar
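A quick check that the deletion is complete, as step 2 asks (the same check, with the name pattern changed, applies to the okhttp and java-xmlbuilder steps below):
# Should print nothing once every old okio JAR has been removed
find / -name "okio*.jar" 2>/dev/null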
3. Copy the new okio JAR package into the directories found in step 1 of 3.1.4, as well as into /usr/hdp/3.0.1.0-187/hadoop-mapreduce.
Commands:
cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/.
cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/livy2/jars/.
cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hadoop/client/.
cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/.
cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/spark2/jars/.
cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hbase/lib/.
cp /mnt/obsjar/okio-1.14.0.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/okio-1.14.0.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/okio-1.14.0.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.
3.1.5 Replacing the okhttp JAR Package
1. Find all okhttp* JAR packages, note their paths, and back them up.
Commands:
find / -name "okhttp*"
cp /usr/hdp/3.0.1.0-187/hadoop/client/okhttp-2.7.5.jar /mnt/oldjar/.
cp /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar /mnt/oldjar/.
2. Delete all old okhttp* JAR packages, then run find again to confirm none remain.
Commands:
rm -rf /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/okhttp-2.7.5.jar
rm -rf /usr/hdp/3.0.1.0-187/livy2/jars/okhttp-2.7.5.jar
rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okhttp-2.7.5.jar
rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okhttp.jar
rm -rf /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/okhttp-2.7.5.jar
rm -rf /usr/hdp/3.0.1.0-187/spark2/jars/okhttp-2.7.5.jar
rm -rf /usr/hdp/3.0.1.0-187/hbase/lib/okhttp-2.7.5.jar
rm -rf /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar
rm -rf /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar
rm -rf /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar
3. Copy the new okhttp JAR package into the directories found in step 1 of 3.1.5, as well as into /usr/hdp/3.0.1.0-187/hadoop-mapreduce.
Commands:
cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/.
cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/livy2/jars/.
cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hadoop/client/.
cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/.
cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/spark2/jars/.
cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hbase/lib/.
cp /mnt/obsjar/okhttp-3.10.0.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/okhttp-3.10.0.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/okhttp-3.10.0.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.
3.1.6 Replacing the java-xmlbuilder JAR Package
1. Find all java-xmlbuilder* JAR packages, note their paths, and back them up.
Commands:
find / -name "java-xmlbuilder*"
cp /usr/lib/ambari-server/java-xmlbuilder-0.4.jar /mnt/oldjar/.
2. Delete all old java-xmlbuilder* JAR packages, then run find again to confirm none remain.
Commands:
rm -rf /usr/lib/ambari-server/java-xmlbuilder-0.4.jar
3. Copy the new java-xmlbuilder JAR package into the directory found in step 1 of 3.1.6, and also into the directories found in step 1 of 3.1.4 plus /usr/hdp/3.0.1.0-187/hadoop-mapreduce.
Commands:
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/lib/ambari-server/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/livy2/jars/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hadoop/client/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/spark2/jars/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hbase/lib/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.
cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.
3.2 Updating the Configuration Files
3.2.1 Adding Configuration Items to the HDFS Cluster
1. In the HDFS service, under CONFIGS > ADVANCED, add the following items to Custom core-site.xml: fs.obs.access.key, fs.obs.secret.key, fs.obs.endpoint, and fs.obs.impl. The first three are your AK, SK, and OBS endpoint; fill them in for your environment. Set fs.obs.impl to org.apache.hadoop.fs.obs.OBSFileSystem. A sketch follows after this list.
2. Restart the HDFS service.
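A minimal sketch of the four key/value pairs as entered in Ambari (the AK, SK, and endpoint values are placeholders; use your own credentials and the OBS endpoint of your region):
fs.obs.access.key = YOUR_AK
fs.obs.secret.key = YOUR_SK
fs.obs.endpoint = obs.example-region.myhuaweicloud.com
fs.obs.impl = org.apache.hadoop.fs.obs.OBSFileSystem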
3.2.2 Adding Configuration Items to the MapReduce2 Cluster
1. In the MapReduce2 service, under CONFIGS > ADVANCED, edit the mapreduce.application.classpath item in mapred-site.xml and append the path /usr/hdp/3.0.1.0-187/hadoop-mapreduce/* (a sketch follows after this list).
2. Restart the MapReduce2 service.
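A minimal sketch of the resulting value (the placeholder stands for whatever entries your cluster already has; keep whichever delimiter your existing value uses, typically a colon in HDP):
mapreduce.application.classpath = <existing classpath entries>:/usr/hdp/3.0.1.0-187/hadoop-mapreduce/*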
3.3 Verifying with an OBS Bucket
1. Verify OBS connectivity with a hadoop command.
Command:
hadoop fs -ls obs://obs-test-tmp0001/
2. Verify OBS connectivity with the MapReduce wordcount example.
Command:
yarn jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount obs://bms-bucket-test01/f0.txt obs://bms-bucket-test01/result10
3. Verify OBS connectivity with Spark (run in spark-shell; the import is needed for max):
import org.apache.spark.sql.functions.max
val df0 = spark.read.option("header", "false").option("delimiter", "|").csv("obs://obs-bucket12345/2019/tmplog.txt")
df0.select(max("_c2")).show()
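For a slightly fuller smoke test that also exercises writes, a simple round trip works too (the bucket name is an example; use your own bucket):
hadoop fs -mkdir -p obs://obs-test-tmp0001/testdir
hadoop fs -put /etc/hosts obs://obs-test-tmp0001/testdir/
hadoop fs -cat obs://obs-test-tmp0001/testdir/hosts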