GaussDB T distributed cluster: install and deploy it this way to avoid the pitfalls
About the author
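Wei Bin is a senior database expert at Xinju Network (新炬网络). He has long served customers in telecom, finance, manufacturing, government and enterprise, and has hands-on experience and his own insights across everything from traditional commercial databases to open-source distributed databases. Having worked on the front line with customers throughout his career, he has extensive experience in emergency incident handling and performance tuning, and is especially skilled at disaster recovery, multi-data-center construction and heterogeneous data migration.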
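In this article we walk through installing a GaussDB T (formerly GaussDB 100) distributed cluster. The example is a cluster with 2 CNs (coordinator nodes) and 2 DNs (data nodes), deployed for single-point disaster tolerance.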
Folks, here comes the main event. Let's fall in line and march in step: one step, two steps...
Environment
OS version: RedHat 7.5 x86_64
Database version: GaussDB100 V1.0.0
Number of nodes: 4
Deployment plan:
IP addresses and hostnames:
192.168.57.21 gaussdb11.localdomain gaussdb11
192.168.57.22 gaussdb12.localdomain gaussdb12
192.168.57.23 gaussdb13.localdomain gaussdb13
192.168.57.24 gaussdb14.localdomain gaussdb14
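These IP-to-hostname mappings are typically placed in /etc/hosts on every node so that the hostnames used in clusterconfig.xml resolve. A minimal sketch, assuming name resolution goes through /etc/hosts (the article does not say how names are resolved); the helper name add_cluster_hosts is hypothetical, and it skips entries that are already present so it can be rerun safely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the cluster host entries to a hosts file once.
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}" entry
  while IFS= read -r entry; do
    # Append only if the exact line is not already in the file
    grep -qxF "$entry" "$hosts_file" || printf '%s\n' "$entry" >> "$hosts_file"
  done <<'EOF'
192.168.57.21 gaussdb11.localdomain gaussdb11
192.168.57.22 gaussdb12.localdomain gaussdb12
192.168.57.23 gaussdb13.localdomain gaussdb13
192.168.57.24 gaussdb14.localdomain gaussdb14
EOF
}
# Usage (as root, on every node): add_cluster_hosts /etc/hosts
```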
I. Enable remote root login and disable SELinux
1. Edit the sshd_config file
vi /etc/ssh/sshd_config
2. Change the PermitRootLogin setting to allow root to log in remotely
Either of the following works:
1) Comment out "PermitRootLogin no"
#PermitRootLogin no
2) Set PermitRootLogin to yes
PermitRootLogin yes
3. Change the Banner setting to remove the welcome message shown when connecting to the system
Comment out the line containing "Banner":
#Banner none
4. Change PasswordAuthentication to allow password authentication at login, then save and exit
Set PasswordAuthentication to yes:
PasswordAuthentication yes
5. Restart the sshd service and log in again as root
#service sshd restart
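On RHEL 7 the command may print "Redirecting to /bin/systemctl restart sshd.service", which means the restart was handed over to systemctl. The equivalent systemctl command is: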
#/bin/systemctl restart sshd.service
6. Disable SELinux (the change takes effect after a reboot)
#vi /etc/selinux/config
SELINUX=disabled
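The sshd_config and SELinux edits in this section can also be scripted so they are applied identically on all nodes. This is a minimal bash sketch, not part of the GaussDB tooling; the function names enable_root_ssh and disable_selinux are hypothetical. It assumes GNU sed (as shipped with RHEL 7) and takes the file path as a parameter, so it can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the sshd_config and SELinux edits (GNU sed assumed).

enable_root_ssh() {
  local cfg="${1:-/etc/ssh/sshd_config}"
  cp -p "$cfg" "$cfg.bak" || return 1   # keep a backup of the original file
  # Set PermitRootLogin / PasswordAuthentication to yes (commented out or not)
  # and comment out any active Banner line.
  sed -i -E \
    -e 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' \
    -e 's/^#?[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication yes/' \
    -e 's/^([[:space:]]*Banner[[:space:]])/#\1/' \
    "$cfg"
}

disable_selinux() {
  local cfg="${1:-/etc/selinux/config}"
  # Only the SELINUX= line is changed; SELINUXTYPE= is left alone
  sed -i -E 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Usage (as root, on every node):
#   enable_root_ssh /etc/ssh/sshd_config && sshd -t && systemctl restart sshd.service
#   disable_selinux /etc/selinux/config   # permanent, applies after a reboot
#   setenforce 0                          # permissive mode for the running system
```

Running sshd -t before the restart validates the edited file, so a typo does not lock you out of a remote session.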
II. Stop the system firewall and disable it at boot
# systemctl stop firewalld.service
# systemctl disable firewalld.service
III. Install system packages
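This installation uses the ISO image as the yum repository for installing the packages the database depends on.
Append the following line to the end of /etc/rc.local: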
mount /dev/cdrom /mnt
so that the contents of the disc are mounted on /mnt every time the system boots.
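One caveat: on RHEL 7, /etc/rc.local is a symlink to /etc/rc.d/rc.local, and that script only runs at boot if it is executable, which it is not by default. A small idempotent sketch (the helper name add_boot_mount is hypothetical) that appends the mount line once and sets the execute bit:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the ISO mount to rc.local once and make it executable.
add_boot_mount() {
  local rc="${1:-/etc/rc.d/rc.local}"
  local line='mount /dev/cdrom /mnt'
  # Append the mount command only if it is not already there
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  # rc.local is only executed at boot when it has the execute bit
  chmod +x "$rc"
}
# Usage (as root): add_boot_mount /etc/rc.d/rc.local
```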
1. Configure the yum repository
Back up the original repository files and create a new one:
cd /etc/yum.repos.d
mkdir bak
mv redhat* ./bak
vi iso.repo
[root@gaussdb11 yum.repos.d]# cat iso.repo
[rhel-iso]
name=Red Hat Enterprise Linux - Source
baseurl=file:///mnt
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
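The same repository setup can be scripted. This sketch assumes the ISO is already mounted at /mnt; the function name write_iso_repo is hypothetical, and it takes the repo directory as a parameter so it can be tried outside /etc first. The usage line relies on the standard yum clean all and yum repolist commands to confirm the new repo is visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up the Red Hat repo files and write iso.repo.
write_iso_repo() {
  local dir="${1:-/etc/yum.repos.d}"
  mkdir -p "$dir/bak" || return 1
  # Move the existing redhat* repo files aside (no error if there are none)
  mv "$dir"/redhat* "$dir/bak/" 2>/dev/null || true
  cat > "$dir/iso.repo" <<'EOF'
[rhel-iso]
name=Red Hat Enterprise Linux - Source
baseurl=file:///mnt
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
EOF
}
# Usage (as root): write_iso_repo /etc/yum.repos.d && yum clean all && yum repolist
```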
2. List the available packages and install the dependencies
#yum list
yum install -y zlib readline gcc
yum install -y python python-devel
yum install -y perl-ExtUtils-Embed
yum install -y readline-devel
yum install -y zlib-devel
yum install -y lsof
3. Verify that the packages are installed
rpm -qa --queryformat "%{NAME}-%{VERSION}-%{RELEASE} (%{ARCH})\n" | grep -E "zlib|readline|gcc\
|python|python-devel|perl-ExtUtils-Embed|readline-devel|zlib-devel"
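Instead of grepping the whole rpm -qa list, each required package can be queried directly, since rpm -q exits with a non-zero status when a package is not installed. A minimal sketch (the helper name missing_pkgs is hypothetical) that prints only the packages that are still missing, including lsof from the install list above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named package that rpm reports as not installed.
missing_pkgs() {
  local pkg
  for pkg in "$@"; do
    # rpm -q fails for packages that are not installed
    rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}
# Usage:
#   missing=$(missing_pkgs zlib readline gcc python python-devel perl-ExtUtils-Embed readline-devel zlib-devel lsof)
#   [ -z "$missing" ] && echo "all packages installed" || echo "missing: $missing"
```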
IV. Preparation and installation
1. Create a directory for the installation package and extract the package (on any one host)
su - root
mkdir -p /opt/software/gaussdb
cd /opt/software/gaussdb
tar -zxvf GaussDB_100_1.0.0-CLUSTER-REDHAT7.5-64bit.tar.gz
vi clusterconfig.xml    # create the cluster configuration file
Its content is as follows:
<?xml version="1.0" encoding="utf-8"?>
<ROOT>
<CLUSTER>
<PARAM name="clusterName" value="gaussdbt_cluster"/>
<PARAM name="nodeNames" value="gaussdb11,gaussdb12,gaussdb13,gaussdb14"/>
<PARAM name="gaussdbAppPath" value="/opt/gaussdb/app"/>
<PARAM name="gaussdbLogPath" value="/opt/gaussdb/log"/>
<PARAM name="tmpMppdbPath" value="/opt/gaussdb/tmp/gaussdb_mppdb"/>
<PARAM name="gaussdbToolPath" value="/opt/gaussdb/huawei/wisequery"/>
<PARAM name="archiveLogPath" value="/opt/gaussdb/arch_log"/>
<PARAM name="redoLogPath" value="/opt/gaussdb/redo_log"/>
<PARAM name="datanodeType" value="DN_ZENITH_ZPAXOS"/>
<PARAM name="coordinatorType" value="CN_ZENITH_ZSHARDING"/>
<PARAM name="clusterType" value="mutil-AZ"/>
</CLUSTER>
<DEVICELIST>
<DEVICE sn="1000001">
<PARAM name="name" value="gaussdb11"/>
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<PARAM name="backIp1" value="192.168.57.21"/>
<PARAM name="sshIp1" value="192.168.57.21"/>
<PARAM name="cooNum" value="1"/>
<PARAM name="cooPortBase" value="8000"/>
<PARAM name="cooListenIp1" value="192.168.57.21"/>
<PARAM name="cooDir1" value="/gaussdb/data/data_cn"/>
<PARAM name="gtsNum" value="1"/>
<PARAM name="gtsPortBase" value="13000"/>
<PARAM name="gtsDir1" value="/gaussdb/data/data_gts,gaussdb12,/gaussdb/data/data_gts"/>
<PARAM name="etcdNum" value="1"/>
<PARAM name="etcdListenPort" value="20300"/>
<PARAM name="etcdHaPort" value="20500"/>
<PARAM name="etcdListenIp1" value="192.168.57.21"/>
<PARAM name="etcdHaIp1" value="192.168.57.21"/>
<PARAM name="etcdDir1" value="/gaussdb/data/data_etcd"/>
</DEVICE>
<DEVICE sn="1000002">
<PARAM name="name" value="gaussdb12"/>
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<PARAM name="backIp1" value="192.168.57.22"/>
<PARAM name="sshIp1" value="192.168.57.22"/>
<PARAM name="cooNum" value="1"/>
<PARAM name="cooPortBase" value="8000"/>
<PARAM name="cooListenIp1" value="192.168.57.22"/>
<PARAM name="cooDir1" value="/gaussdb/data/data_cn"/>
<PARAM name="cmsNum" value="1"/>
<PARAM name="cmServerListenIp1" value="192.168.57.22,192.168.57.21"/>
<PARAM name="cmServerHaIp1" value="192.168.57.22,192.168.57.21"/>
<PARAM name="cmServerlevel" value="1"/>
<PARAM name="cmServerRelation" value="gaussdb12,gaussdb11"/>
<PARAM name="etcdNum" value="1"/>
<PARAM name="etcdListenPort" value="20300"/>
<PARAM name="etcdHaPort" value="20500"/>
<PARAM name="etcdListenIp1" value="192.168.57.22"/>
<PARAM name="etcdHaIp1" value="192.168.57.22"/>
<PARAM name="etcdDir1" value="/gaussdb/data/data_etcd"/>
</DEVICE>
<DEVICE sn="1000003">
<PARAM name="name" value="gaussdb13"/>
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<PARAM name="backIp1" value="192.168.57.23"/>
<PARAM name="sshIp1" value="192.168.57.23"/>
<PARAM name="etcdNum" value="1"/>
<PARAM name="etcdListenPort" value="20300"/>
<PARAM name="etcdHaPort" value="20500"/>
<PARAM name="etcdListenIp1" value="192.168.57.23"/>
<PARAM name="etcdHaIp1" value="192.168.57.23"/>
<PARAM name="etcdDir1" value="/gaussdb/data/data_etcd"/>
<PARAM name="dataNum" value="1"/>
<PARAM name="dataPortBase" value="40000"/>
<PARAM name="dataNode1" value="/gaussdb/data/data_dn,gaussdb11,/gaussdb/data/data_dn"/>
</DEVICE>
<DEVICE sn="1000004">
<PARAM name="name" value="gaussdb14"/>
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<PARAM name="backIp1" value="192.168.57.24"/>
<PARAM name="sshIp1" value="192.168.57.24"/>
<PARAM name="dataNum" value="1"/>
<PARAM name="dataPortBase" value="40000"/>
<PARAM name="dataNode1" value="/gaussdb/data/data_dn,gaussdb12,/gaussdb/data/data_dn"/>
</DEVICE>
</DEVICELIST>
</ROOT>
Grant permissions on the directory:
chmod -R 755 /opt/software
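2. Make sure the root password is the same on all nodes, because the script that sets up SSH trust requires identical passwords. If the passwords cannot be changed, set up SSH trust for root manually beforehand.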
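For the manual route, the standard OpenSSH tools are enough. A sketch, assuming root's key lives in the default RSA location under its home directory; the helper name setup_root_trust is hypothetical. Run it on every node so that each node can reach all the others.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push root's public key to each listed host.
setup_root_trust() {
  local key="$HOME/.ssh/id_rsa" host
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  # Generate an RSA key pair once, without a passphrase
  [ -f "$key" ] || ssh-keygen -t rsa -N '' -f "$key" -q || return 1
  for host in "$@"; do
    # ssh-copy-id asks for root's password on $host once
    ssh-copy-id -i "$key.pub" "root@$host" || return 1
  done
}
# Usage (as root, on every node):
#   setup_root_trust gaussdb11 gaussdb12 gaussdb13 gaussdb14
```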
3. Use gs_preinstall to prepare the installation environment
su - root
cd /opt/software/gaussdb/script
# pre-install: configure the environment
./gs_preinstall -U omm -G dbgrp -X /opt/software/gaussdb/clusterconfig.xml
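4. The pre-install log contains a warning that the clocks of the nodes are not synchronized, so NTP needs to be configured
5. Configure NTP: node 1 acts as the NTP server and the other nodes synchronize with node 1
1) Install ntp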
yum -y install ntp
2) Add the following to /etc/ntp.conf on node 1:
server 127.127.1.0 <<====local clock driver (127.127.1.0), not the loopback address 127.0.0.1
fudge 127.127.1.0 stratum 10
restrict 192.168.57.21 nomodify notrap nopeer noquery <<====IP address of the current node
restrict 192.168.57.255 mask 255.255.255.0 nomodify notrap <<====subnet of the cluster and its netmask
3) Add the following to /etc/ntp.conf on the other nodes:
Node 2:
server 192.168.57.21 <<====IP of the NTP server to synchronize with
fudge 192.168.57.21 stratum 10 <<====IP of the NTP server to synchronize with
restrict 192.168.57.22 nomodify notrap nopeer noquery
restrict 192.168.57.255 mask 255.255.255.0 nomodify notrap
Node 3:
server 192.168.57.21
fudge 192.168.57.21 stratum 10
restrict 192.168.57.23 nomodify notrap nopeer noquery
restrict 192.168.57.255 mask 255.255.255.0 nomodify notrap
Node 4:
server 192.168.57.21
fudge 192.168.57.21 stratum 10
restrict 192.168.57.24 nomodify notrap nopeer noquery
restrict 192.168.57.255 mask 255.255.255.0 nomodify notrap
4) Start the ntp service
service ntpd start
5) Check whether the NTP server is synchronized with its upstream NTP source
ntpstat
6) Check the status of the NTP server and its upstream peers
ntpq -p
7) Enable ntpd at boot
systemctl enable ntpd
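The per-node client entries and the final check can be scripted as well. The sketch below uses hypothetical helper names: ntp_client_conf prints the client server/restrict lines shown above with the node IP as a parameter, and ntp_sys_peer reads ntpq -p output and reports the peer marked with '*', which is the one ntpd has selected for synchronization. It leaves out the fudge line on clients, because ntpd only applies fudge to reference clock drivers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the NTP client entries and the sync check.

ntp_client_conf() {
  # $1 = NTP server (node 1), $2 = this node's IP, $3/$4 = cluster subnet and netmask
  local server_ip="$1" node_ip="$2" subnet="$3" netmask="$4"
  printf 'server %s\n' "$server_ip"
  printf 'restrict %s nomodify notrap nopeer noquery\n' "$node_ip"
  printf 'restrict %s mask %s nomodify notrap\n' "$subnet" "$netmask"
}

ntp_sys_peer() {
  # Reads `ntpq -p` output on stdin; the selected peer is marked with '*' in column 1
  awk 'substr($0, 1, 1) == "*" { print substr($1, 2); found = 1 }
       END { exit !found }'
}

# Usage, e.g. on node 3 (as root):
#   ntp_client_conf 192.168.57.21 192.168.57.23 192.168.57.255 255.255.255.0 >> /etc/ntp.conf
#   systemctl restart ntpd
#   ntpq -p | ntp_sys_peer || echo "not synchronized yet"
```

Right after a restart ntpd usually needs a few poll intervals before any peer gets the '*' mark, so an empty result at first is not necessarily an error.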
6. Use gs_checkos to check whether the environment meets the installation requirements
7. Install the database
su - omm
cd /opt/software/gaussdb/script
./gs_install -X /opt/software/gaussdb/clusterconfig.xml
Appendix:
Uninstall the database cluster with gs_uninstall:
gs_uninstall --delete-data
Or run a local uninstall on each node of the cluster:
gs_uninstall --delete-data -L
If the cluster is in an abnormal state and its information cannot be retrieved, uninstall it with:
gs_uninstall --delete-data -X /opt/software/gaussdb/clusterconfig.xml
Or run a local uninstall on each node of the cluster:
gs_uninstall --delete-data -L -X /opt/software/gaussdb/clusterconfig.xml
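8. Verify that the cluster was installed successfully
Note: because the host machine did not have enough memory, the four VMs were reduced to three, and the Paxos networking mode was changed to HA networking.
Appendix: common gs_om commands
1) Check the cluster status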
gs_om -t status
2) Stop all instances on a given host
gs_om -t stop -h gaussdb13
3) Start all instances on a given host
gs_om -t start -h gaussdb13
4) DN switchover: gaussdb13 is the host of the standby DN, and DB2_3 is the name of the standby DN to be switched to primary
gs_om -t switch -h gaussdb13 -I DB2_3
5) CM switchover: gaussdb12 is the host of the current standby CM, and CM2 is the name of the CM instance on gaussdb12
gs_om -t switch -h gaussdb12 -I CM2
6) Start or stop the cluster
gs_om -t start
gs_om -t stop
7) Start or stop etcd
gs_om -t startetcd
gs_om -t stopetcd
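V. High-availability test
This test simulates node 3 going down.
1. Check the status of the primary and standby DNs: the primary DNs are DB1_1 on node 2 and DB2_3 on node 3
2. Simulate a node 3 failure by stopping all instances on node 3
3. The standby DN DB2_4 on node 2 becomes the primary DN
4. Start all instances on node 3
5. The primary and standby DNs catch up with each other automatically
6. Switch the standby DN DB2_3 back to primary
7. The switchover succeeds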
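The drill above can be driven with the gs_om commands listed earlier (status, stop/start by host, switch by instance). A minimal sketch as a shell function; the name ha_drill is hypothetical. It runs the steps back to back and stops at the first failing command, whereas in a real drill you would read each status output, and give the restarted standby time to catch up, before moving on, so treat it as a checklist in script form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the HA drill as a sequence of gs_om calls (run as omm).
ha_drill() {
  local host="$1" dn="$2"
  gs_om -t status                     || return 1  # 1. DN roles before the test
  gs_om -t stop -h "$host"            || return 1  # 2. simulate the node going down
  gs_om -t status                     || return 1  # 3. a standby elsewhere should now be primary
  gs_om -t start -h "$host"           || return 1  # 4. bring the node's instances back
  gs_om -t status                     || return 1  # 5. primary and standby should catch up
  gs_om -t switch -h "$host" -I "$dn" || return 1  # 6. switch the original DN back to primary
  gs_om -t status                                  # 7. confirm the switchover
}
# Usage (as omm): ha_drill gaussdb13 DB2_3
```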
VI. Summary of installation problems
Problem 1: Pre-install reports that the package type does not match the CPU type
[root@gaussdb11 script]# ./gs_preinstall -U omm -G dbgrp -X /opt/software/gaussdb/clusterconfig.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools
Successfully installed the tools
Are you sure you want to create trust for root (yes/no)? yes
Please enter password for root.
Password:
Creating SSH trust for the root permission user.
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key
Successfully appended authorized_key
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust
Successfully verified SSH trust
Successfully created SSH trust.
Successfully created SSH trust for the root permission user.
[GAUSS-52406] : The package type "" is inconsistent with the Cpu type "X86".
[root@gaussdb11 script]#
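Solution:
1) Check the preinstall log. It is in the directory set by the gaussdbLogPath parameter in clusterconfig.xml; the earlier entries in om/gs_preinstall*.log under that directory show the following error: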
[2019-11-28 22:50:08.335532][gs_preinstall][LOG]:Successfully created SSH trust for the root permission user.
[2019-11-28 22:50:08.992537][gs_preinstall][ERROR]:[GAUSS-52406] : The package type "" is inconsistent with the Cpu type "X86".
Traceback (most recent call last)
File "./gs_preinstall", line 507, in <module>
File "/opt/software/gaussdb/script/impl/preinstall/PreinstallImpl.py", line 1861, in run
2) Edit /opt/software/gaussdb/script/impl/preinstall/PreinstallImpl.py and comment out the following line:
#self.getAllCpu()
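The edit can be made with sed so it is repeatable on a freshly extracted package. A sketch (the helper name skip_cpu_check is hypothetical) that keeps a backup and preserves the line's indentation; afterwards, running python -m py_compile on the file is a quick way to confirm the script still parses. Note that this only bypasses the CPU-type check, so it is worth confirming yourself that the package really matches the platform (x86_64 here).

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the self.getAllCpu() call in PreinstallImpl.py.
skip_cpu_check() {
  local f="$1"
  cp -p "$f" "$f.orig" || return 1   # keep the untouched file
  # Prefix the call with '#' after its indentation, so the Python stays valid
  sed -i -E 's/^([[:space:]]*)(self\.getAllCpu\(\))/\1#\2/' "$f"
}
# Usage (as root):
#   skip_cpu_check /opt/software/gaussdb/script/impl/preinstall/PreinstallImpl.py
```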
Problem 2: Pre-install reports a clock synchronization warning
A12.[ Time consistency status ] : Warning
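Solution: configure NTP synchronization as described in step 5 of Section IV.
Problem 3: Installation fails because the SYSDBA login is rejected with a privilege error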
[omm@gaussdb11 script]$ ./gs_install -X /opt/software/gaussdb/clusterconfig.xml
Parsing the configuration file.
Check preinstall
Successfully checked preinstall
Creating the backup directory.
Successfully created the backup directory.
Check the time difference between hosts in the cluster.
Installing the cluster.
Installing applications
Successfully installed APP.
Distribute etcd communication keys.
Successfully distrbute etcd communication keys.
Initializing cluster instances
.............193s
[FAILURE] gaussdb11:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize GTS1 instance
[GAUSS-51607] : Failed to start zenith instance..Output:
ZS-00001: no privilege is found
ZS-00001: "SYSDBA" login failed, login as sysdba is prohibited or privilege is incorrect
SQL>
ZS-00001: connection is not established
SQL>
[FAILURE] gaussdb12:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize GTS2 instance
Successfully Initialize GTS2 instance.
Initialize cn_402 instance
[GAUSS-51607] : Failed to start zenith instance..Output:
ZS-00001: no privilege is found
ZS-00001: "SYSDBA" login failed, login as sysdba is prohibited or privilege is incorrect
SQL>
ZS-00001: connection is not established
SQL>
[FAILURE] gaussdb13:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize DB1_1 instance
[GAUSS-51607] : Failed to start zenith instance..Output:
ZS-00001: no privilege is found
ZS-00001: "SYSDBA" login failed, login as sysdba is prohibited or privilege is incorrect
SQL>
ZS-00001: connection is not established
SQL>
[FAILURE] gaussdb14:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize DB2_3 instance
[GAUSS-51607] : Failed to start zenith instance..Output:
ZS-00001: no privilege is found
ZS-00001: "SYSDBA" login failed, login as sysdba is prohibited or privilege is incorrect
SQL>
ZS-00001: connection is not established
SQL>
.[omm@gaussdb11 script]$
Analysis and resolution:
1) Check the installation log in:
cd /opt/gaussdb/log/omm/om
[root@gaussdb11 om]# ls -lrt
total 52
-rw-------. 1 omm dbgrp 42006 Dec 1 21:43 gs_local-2019-12-01_213124.log
-rw-------. 1 omm dbgrp 5240 Dec 1 21:44 gs_install-2019-12-01_213118.log
[root@gaussdb11 om]# tail -25 gs_local-2019-12-01_213124.log
ZS-00001: "SYSDBA" login failed, login as sysdba is prohibited or privilege is incorrect
SQL>
ZS-00001: connection is not established
SQL>
[2019-12-01 21:43:26.533606][Install][ERROR]:[GAUSS-51607] : Failed to start zenith instance..Output:
ZS-00001: no privilege is found
ZS-00001: "SYSDBA" login failed, login as sysdba is prohibited or privilege is incorrect
SQL>
ZS-00001: connection is not established
SQL>
Traceback (most recent call last)
File "/opt/software/gaussdb/script/local/Install.py", line 704, in <module>
File "/opt/software/gaussdb/script/local/Install.py", line 625, in initInstance
File "/opt/software/gaussdb/script/local/Install.py", line 614, in __tpInitInstance
File "/opt/software/gaussdb/script/local/../gspylib/component/Kernal/Zenith.py", line 308, in initialize
File "/opt/software/gaussdb/script/local/../gspylib/component/Kernal/CN_OLTP/Zsharding.py", line 62, in initDbInstance
File "/opt/software/gaussdb/script/local/../gspylib/component/Kernal/CN_OLTP/Zsharding.py", line 100, in initZenithInstance
File "/opt/software/gaussdb/script/local/../gspylib/component/Kernal/Zenith.py", line 406, in startInstance
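2) /opt/gaussdb/log/omm/db_log/GTS1/run/zengine.rlog shows that the real cause is insufficient memory: the instance could not allocate 4592381952 bytes (about 4.3 GiB) for the SGA.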
UTC+8 2019-11-29 21:50:03.755|ZENGINE|00000|26307|INFO>[PARAM] LOG_HOME = /opt/gaussdb/log/omm/db_log/GTS1
UTC+8 2019-11-29 21:50:03.755|ZENGINE|00000|206158456515|INFO>starting instance(nomount)
UTC+8 2019-11-29 21:50:03.755|ZENGINE|00000|26307|ERROR>GS-00001 : Failed to allocate 4592381952 bytes for sga [srv_sga.c:170]
UTC+8 2019-11-29 21:50:03.755|ZENGINE|00000|26307|ERROR>failed to create sga
UTC+8 2019-11-29 21:50:03.755|ZENGINE|00000|26307|ERROR>Instance Startup Failed
3) Increase the memory of all the VMs.
Memory configuration of the test VMs, for reference:
gaussdb11: 3.9 GB
gaussdb12: 4.9 GB
gaussdb13: 4.9 GB
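Before resizing the VMs, the failed request in the log can be compared with the memory actually available on the node. A minimal sketch with hypothetical helper names: one extracts N from a "Failed to allocate N bytes for sga" line, the other reads MemAvailable (reported in kB) from /proc/meminfo, assuming the kernel exposes that field. Bear in mind that several instances may share one node, so this only compares against a single instance's SGA.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the SGA request with available memory.

sga_bytes_from_log() {
  # Print N from the last "Failed to allocate N bytes for sga" line in the log
  sed -n -E 's/.*Failed to allocate ([0-9]+) bytes for sga.*/\1/p' "$1" | tail -n 1
}

mem_available_bytes() {
  local kb
  # MemAvailable in /proc/meminfo is reported in kB
  kb=$(awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}")
  echo $(( kb * 1024 ))
}

# Usage on the node that failed:
#   need=$(sga_bytes_from_log /opt/gaussdb/log/omm/db_log/GTS1/run/zengine.rlog)
#   have=$(mem_available_bytes)
#   [ "$have" -ge "$need" ] || echo "short by $(( (need - have) / 1048576 )) MiB"
```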
Problem 4: Installation reports GAUSS-50601
1) Installation progress output:
[omm@gaussdb11 script]$ ./gs_install -X /opt/software/gaussdb/clusterconfig.xml
Parsing the configuration file.
Check preinstall
Successfully checked preinstall
Creating the backup directory.
Successfully created the backup directory.
Check the time difference between hosts in the cluster.
Installing the cluster.
Installing applications
Successfully installed APP.
Distribute etcd communication keys.
Successfully distrbute etcd communication keys.
Initializing cluster instances
390s
[SUCCESS] gaussdb11:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize cn_401 instance
Successfully Initialize cn_401 instance.
Modifying user's environmental variable $GAUSS_ENV.
Successfully modified user's environmental variable $GAUSS_ENV.
[FAILURE] gaussdb12:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize DB1_1 instance
Successfully Initialize DB1_1 instance.
Initialize DB2_4 instance
[GAUSS-50601] : The port [40001] is occupied.
[SUCCESS] gaussdb13:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize DB1_2 instance
Successfully Initialize DB1_2 instance.
Initialize DB2_3 instance
Successfully Initialize DB2_3 instance.
Modifying user's environmental variable $GAUSS_ENV.
Successfully modified user's environmental variable $GAUSS_ENV.
2) The installation log shows that the port is occupied:
[omm@gaussdb11 omm]$ tail -300 om/gs_install-2019-12-09_161757.log
[2019-12-09 16:18:15.998104][gs_install][LOG]:Initializing cluster instances
[2019-12-09 16:18:15.999396][gs_install][DEBUG]:Init instance by cmd: source /etc/profile; source /home/omm/.bashrc;python '/opt/software/gaussdb/script/local/Install.py' -t init_instance -U omm:dbgrp -X /opt/software/gaussdb/clusterconfig.xml -l /opt/gaussdb/log/omm/om/gs_local.log --autostart=yes --alarm=/opt/huawei/snas/bin/snas_cm_cmd
[2019-12-09 16:24:49.689716][gs_install][ERROR]:[SUCCESS] gaussdb11:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize cn_401 instance
Successfully Initialize cn_401 instance.
Modifying user's environmental variable $GAUSS_ENV.
Successfully modified user's environmental variable $GAUSS_ENV.
[FAILURE] gaussdb12:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize DB1_1 instance
Successfully Initialize DB1_1 instance.
Initialize DB2_4 instance
[GAUSS-50601] : The port [40001] is occupied.
[SUCCESS] gaussdb13:
Using omm:dbgrp to install database.
Using installation program path : /home/omm
Initialize DB1_2 instance
Successfully Initialize DB1_2 instance.
Initialize DB2_3 instance
Successfully Initialize DB2_3 instance.
Modifying user's environmental variable $GAUSS_ENV.
Successfully modified user's environmental variable $GAUSS_ENV.
Traceback (most recent call last)
File "./gs_install", line 281, in <module>
File "/opt/software/gaussdb/script/impl/install/InstallImpl.py", line 93, in run
File "/opt/software/gaussdb/script/impl/install/InstallImpl.py", line 193, in doDeploy
File "/opt/software/gaussdb/script/impl/install/InstallImpl.py", line 291, in doInstall
[root@gaussdb12 om]# netstat -na |grep 40001
tcp 0 0 192.168.57.22:40001 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:40001 0.0.0.0:* LISTEN
3) Uninstall, then edit clusterconfig.xml, change the DN port of node 3 to 50000 and install again. Make sure port 50000 is not in use on any node.
su - omm
./gs_uninstall --delete-data -X /opt/software/gaussdb/clusterconfig.xml
vi clusterconfig.xml
<PARAM name="dataNum" value="1"/>
<PARAM name="dataPortBase" value="50000"/> <<=================port changed from 40001 to 50000
<PARAM name="dataNode1" value="/gaussdb/data/data_dn2,gaussdb12,/gaussdb/data/data_dn2"/>
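Checking the planned ports before reinstalling avoids another failed run. A sketch with a hypothetical helper port_listening that reads netstat -na output on stdin (the same command used above) and succeeds if any TCP socket is listening on the given port; the loop in the usage comment assumes passwordless SSH between the nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does `netstat -na` output show a TCP listener on a port?
port_listening() {
  local port="$1"
  # Column 4 is the local address, column 6 the state; match ":PORT" at its end
  awk -v p=":$port" '$1 ~ /^tcp/ && $6 == "LISTEN" {
      n = length($4) - length(p)
      if (n >= 0 && substr($4, n + 1) == p) found = 1
    }
    END { exit !found }'
}
# Usage: check a planned port on every node
#   for h in gaussdb11 gaussdb12 gaussdb13; do
#     ssh "$h" netstat -na | port_listening 50000 && echo "$h: port 50000 is in use"
#   done
```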
Problem 5: Installation fails because the sha256 file on node 1 does not exist
Solution: copy the file over with scp from another node that has it:
su - omm
cd /opt/software/gaussdb
scp *.sha256 gaussdb11:/opt/software/gaussdb
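After copying, it is worth confirming that the digest matches the package. This sketch assumes the .sha256 file holds the hex digest as its first field; that format is not confirmed by the article, and both file names are passed in because the exact names are not given here. The helper name verify_sha256 and the PACKAGE placeholder are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes the .sha256 file holds the hex digest as its first field.
verify_sha256() {
  local pkg="$1" sumfile="$2" expected actual
  expected=$(awk '{ print $1; exit }' "$sumfile") || return 1
  actual=$(sha256sum "$pkg" | awk '{ print $1 }') || return 1
  # Fail on an empty digest file as well as on a mismatch
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
# Usage (as omm, on gaussdb11 after the copy; PACKAGE is a placeholder):
#   cd /opt/software/gaussdb && verify_sha256 PACKAGE PACKAGE.sha256 && echo "checksum OK"
```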
Reposted from "墨天轮" (Modb)