Road to Growth 2019-12-24
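Environment: Red Hat 7.3, CDH 5.15.1, deployed via parcels
Problem: jobs scheduled by Airflow have failed intermittently over the past two weeks. The failures fall into two categories: 1. the cluster configuration cannot be initialized; 2. permission errors when reading the configuration files
Error 1: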
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:143)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:108)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:101)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:477)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:455)
Error 2:
19/12/21 04:06:27 ERROR conf.Configuration: error parsing conf core-site.xml
java.io.FileNotFoundException: /etc/hive/conf.cloudera.hive/core-site.xml (Permission denied)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
Error 3:
setfacl: Permission denied. user=dip is not the owner of inode=.hive-staging_hive_2019-12-22_07-44-25_997_2557429548076828737-1
java.lang.RuntimeException: java.io.FileNotFoundException: /etc/hive/conf.cloudera.hive/hive-site.xml (Permission denied)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2811)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2663)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2559)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1340)
    at org.apache.hadoop.hive.conf.HiveConf.getVar(HiveConf.java:2756)
    at org.apache.hadoop.hive.conf.HiveConf.getVar(HiveConf.java:2777)
    at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:2849)
    at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:2792)
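Checking every machine in the cluster showed that on one host the following configuration directories were being rewritten continuously: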
/etc/hive/conf.cloudera.hive
/etc/hadoop/conf.cloudera.yarn
/etc/hbase/conf.cloudera.hbase
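One way to spot a host in this state is to look for configuration files whose modification time is only a few minutes old. The sketch below is an illustration, not the method used in the original investigation; the function name recently_changed and the 5-minute window are assumptions.

```shell
# List files under the given directories that were modified within the
# last N minutes. recently_changed is a hypothetical helper name.
# Usage: recently_changed MINUTES DIR [DIR...]
recently_changed() {
    minutes="$1"
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this host.
        [ -d "$dir" ] || continue
        # -mmin -N matches files modified less than N minutes ago.
        find "$dir" -type f -mmin "-$minutes"
    done
}

# Example call (not executed here):
#   recently_changed 5 /etc/hive/conf.cloudera.hive \
#       /etc/hadoop/conf.cloudera.yarn /etc/hbase/conf.cloudera.hbase
```

On a healthy host, repeated calls a few minutes apart would normally print nothing. A host that prints the same files on every call is a candidate for the problem described here.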
Check:
On that host, /var/lib/alternatives contained empty files
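The check for empty files can be done with find. This is a minimal sketch; the helper name list_empty_files is an assumption, and the directory defaults to the path mentioned above.

```shell
# Print zero-byte regular files directly under a directory.
# list_empty_files is a hypothetical helper name.
# Usage: list_empty_files [DIR]   (DIR defaults to /var/lib/alternatives)
list_empty_files() {
    dir="${1:-/var/lib/alternatives}"
    # -size 0 matches files with no content; -maxdepth 1 stays in DIR itself.
    find "$dir" -maxdepth 1 -type f -size 0
}
```

Running list_empty_files with no argument on the affected host should list the empty entries, and print nothing on a host whose alternatives database is intact.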
Action:
Delete all files under /var/lib/alternatives,
then restart the agent
cd /var/lib/alternatives
rm -rf *
systemctl restart cloudera-scm-agent.service
Result: the configuration files are no longer rewritten continuously, and the problem is resolved.
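To confirm the result over a longer interval, one option is to create a marker file, wait, and then list any configuration files that changed after the marker. This is a sketch under assumptions: changed_since is a hypothetical helper, and the marker path and the 10-minute wait in the example are arbitrary.

```shell
# Print files under the given directories that were modified after the
# marker file was last touched. changed_since is a hypothetical helper name.
# Usage: changed_since MARKER DIR [DIR...]
changed_since() {
    marker="$1"
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this host.
        [ -d "$dir" ] || continue
        # -newer matches files whose mtime is later than the marker's.
        find "$dir" -type f -newer "$marker"
    done
}

# Example sequence (not executed here):
#   touch /tmp/conf_check.marker
#   sleep 600
#   changed_since /tmp/conf_check.marker /etc/hive/conf.cloudera.hive \
#       /etc/hadoop/conf.cloudera.yarn /etc/hbase/conf.cloudera.hbase
```

If the output is empty after the wait, the directories were not rewritten during that interval.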