Hadoop pseudo-distributed single-node deployment, for practicing Hive

taisenki 2020-05-27

Step 1: prepare the environment

Install the JDK, then create the user and group

useradd  -m hadoop 

passwd hadoop    # set the password

Add the user hadoop to the hadoop group

wget   https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

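tar -xvf  hadoop-3.2.1.tar.gz  -C /data/projects

The archive unpacks to /data/projects/hadoop-3.2.1, while the rest of this walkthrough uses /data/projects/hadoop, so rename the directory (or create a symlink with that name) before continuing.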

sudo chown -R hadoop:hadoop /data/projects 

usermod  -a  -G hadoop hadoop    # the first hadoop is the group name; -a appends the group, so the user keeps its existing supplementary groups instead of being removed from them
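To confirm the group change took effect, a small check along these lines may help. It is only a sketch: in_group is a hypothetical helper name, not part of any tool, and it relies on id -nG printing the user's group names. On many distributions useradd already creates a group with the same name as the user, so the check may pass even before usermod is run.

```shell
# Succeeds if user $1 belongs to group $2 (primary or supplementary).
# in_group is a hypothetical helper, not a standard command.
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example: in_group hadoop hadoop && echo "hadoop is in group hadoop"
```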

Single-node pseudo-distributed mode: set up passwordless SSH

ssh-keygen -t rsa 

cat ~/.ssh/id_rsa.pub  >> ~/.ssh/authorized_keys

chmod  600  ~/.ssh/authorized_keys
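The three commands above can be wrapped so that they are safe to run more than once. This is a minimal sketch, assuming the default RSA key file names and an empty passphrase; authorize_own_key is a made-up function name.

```shell
# Append the public key in directory $1 (default ~/.ssh) to authorized_keys,
# skipping it if it is already there, and set the permissions sshd expects.
# authorize_own_key is a hypothetical helper name.
authorize_own_key() {
  dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  # Generate a key pair with an empty passphrase only if none exists yet.
  [ -f "$dir/id_rsa" ] || ssh-keygen -t rsa -N '' -f "$dir/id_rsa" || return 1
  touch "$dir/authorized_keys"
  grep -qxF -- "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys" ||
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  chmod 600 "$dir/authorized_keys"
}

# Example (as the hadoop user): authorize_own_key
```

Afterwards, ssh -o BatchMode=yes localhost true should return without asking for a password, once the host key has been accepted.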

Change the hostname without rebooting

hostname hadoop 
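The configuration files later refer to this machine as hadoop, so the name also has to resolve to an address. One way, sketched here as an assumption rather than a step from the original notes, is an /etc/hosts entry; the IP 192.168.110.151 is the one that shows up in the logs further down, and add_host_entry is a hypothetical helper name.

```shell
# Add "IP NAME" to hosts file $3 (default /etc/hosts) unless NAME is already mapped.
# add_host_entry is a hypothetical helper name.
add_host_entry() {
  ip="$1"; name="$2"; file="${3:-/etc/hosts}"
  # Look for NAME as a whole field on any non-comment line.
  if awk -v n="$name" '
       !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example (as root): add_host_entry 192.168.110.151 hadoop
```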

Configure the Hadoop environment variables, the same way as for the JDK:

# hadoop home
export HADOOP_HOME=/data/projects/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
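The notes do not say which file these exports go into. A possible approach, assuming a distribution whose login shells read /etc/profile.d/*.sh, is to write them into their own file; write_hadoop_env and the file name hadoop.sh are illustrative choices.

```shell
# Write the Hadoop exports to profile file $1 for the install directory $2.
# write_hadoop_env is a hypothetical helper name.
write_hadoop_env() {
  cat > "$1" <<EOF
# hadoop home
export HADOOP_HOME=$2
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF
}

# Example (as root): write_hadoop_env /etc/profile.d/hadoop.sh /data/projects/hadoop
```

After logging in again (or sourcing the file), hadoop version should run from any directory.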

Edit the Hadoop configuration files under /data/projects/hadoop/etc/hadoop

1. Edit hadoop-env.sh and add the following:

[ hadoop]$ grep JAVA_HOME hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_221

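2. Edit core-site.xml

1. Configure the default file system.
(Because the storage layer and the compute layer are loosely coupled, they have to be told to use HDFS, Hadoop's native distributed file system. The value is a URI of the form hdfs://<address of the master node>:<port>.)

2. Configure Hadoop's common working directory.
(This is where the Hadoop processes store the data they produce while running; NameNode, DataNode and the others create subdirectories under it. In a production system, however, NameNode, DataNode and the other daemons should each get their own directory, and those directories should be disk mount points, so that more disks can be mounted to expand capacity.)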

<configuration>
  <property>
	<name>fs.defaultFS</name>
	<value>hdfs://hadoop:9000</value>
  </property>
  <property>
	<name>hadoop.tmp.dir</name>
	<value>/data/projects/hadoop/tmp</value>
  </property>
</configuration>

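3. Edit hdfs-site.xml to set the replication factor

1. Make the NameNode web UI listen on port 50070. (In Hadoop 3.x the default web UI port is 9870, and dfs.http.address is the deprecated name of dfs.namenode.http-address; the old name still works.)

2. (When a client stores a file in HDFS, it is kept as several replicas. The value is usually 3, but a pseudo-distributed setup has only one machine, so it has to be 1.)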

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>192.168.110.151:50070</value>
  </property>
</configuration>

4. Configure mapred-site.xml

Specify which resource scheduling framework MapReduce programs run on. If this is not set to yarn, MapReduce jobs run only locally instead of on the cluster.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

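5. Configure yarn-site.xml

1. Specify the master of the YARN cluster, the ResourceManager (here, this machine).

2. Configure the worker nodes (NodeManagers) of the YARN cluster: the intermediate results produced by map are passed to reduce through the shuffle mechanism.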

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
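A typo in a property name fails silently, so it can be worth reading the values back before starting anything. Once the PATH is set up, hdfs getconf -confKey fs.defaultFS does this through Hadoop itself. The sketch below is a plain-shell alternative; get_prop is a hypothetical helper, and it assumes each <name> and <value> sits on its own line with the name first, as in the files above.

```shell
# Print the <value> of the property named $2 in the Hadoop XML file $1.
# Assumes <name> and <value> are each on their own line, name first.
# get_prop is a hypothetical helper name.
get_prop() {
  awk -v want="$2" '
    /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                if (n == want) print v }
  ' "$1"
}

# Example:
#   cd /data/projects/hadoop/etc/hadoop
#   get_prop core-site.xml fs.defaultFS
#   get_prop yarn-site.xml yarn.nodemanager.aux-services
```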

6. Turn off the firewall

Format HDFS:

Run hdfs namenode -format

2020-05-27 19:18:49,081 INFO util.GSet: 0.029999999329447746% max memory 839.5 MB = 257.9 KB
2020-05-27 19:18:49,081 INFO util.GSet: capacity = 2^15 = 32768 entries
2020-05-27 19:18:49,112 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1667952246-192.168.110.151-1590578329102
2020-05-27 19:18:49,131 INFO common.Storage: Storage directory /data/projects/hadoop/tmp/dfs/name has been successfully formatted.
2020-05-27 19:18:49,184 INFO namenode.FSImageFormatProtobuf: Saving image file /data/projects/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2020-05-27 19:18:49,367 INFO namenode.FSImageFormatProtobuf: Image file /data/projects/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2020-05-27 19:18:49,399 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-05-27 19:18:49,416 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2020-05-27 19:18:49,416 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.110.151
************************************************************/
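Formatting is meant to happen once. Running it again on an existing installation gives the NameNode a new cluster ID, and a DataNode that still holds data from the old one will typically refuse to start. A guard like the following can prevent an accidental re-format. It is a sketch: is_formatted and format_once are hypothetical names, and treating current/VERSION as the marker of a formatted directory is an assumption about the storage layout.

```shell
# Succeeds if NameNode directory $1 already holds a formatted file system,
# judged by the presence of current/VERSION (an assumption about the layout).
is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Storage directory as reported in the format log above.
NAME_DIR=/data/projects/hadoop/tmp/dfs/name

# Format only when nothing is there yet.
format_once() {
  if is_formatted "$NAME_DIR"; then
    echo "already formatted: $NAME_DIR"
  else
    hdfs namenode -format
  fi
}

# Example: format_once
```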

Start the services: cd  /data/projects/hadoop/sbin and run

[ sbin]$ start-dfs.sh
Starting namenodes on [hadoop]
hadoop: Warning: Permanently added 'hadoop' (ECDSA) to the list of known hosts.
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Starting secondary namenodes [hadoop]
[ sbin]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[ sbin]$ jps
57681 NameNode
58020 SecondaryNameNode
57800 DataNode
58712 Jps
58380 NodeManager
58255 ResourceManager

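If all six entries are listed (the five daemons plus Jps itself), with none missing, the deployment succeeded.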
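Checking the list by eye is easy to get wrong, so the same check can be scripted. A minimal sketch, with missing_daemons as a hypothetical helper name; it reads jps output on standard input and compares the second column against the five daemon names shown above.

```shell
# Read jps output on stdin and print each expected daemon that is missing.
# The exit status is 0 only when all five daemons are present.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
  out=$(cat)
  rc=0
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the whole second field, so that "NameNode" is not
    # satisfied by a "SecondaryNameNode" line.
    if ! printf '%s\n' "$out" |
         awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }'; then
      echo "$d"
      rc=1
    fi
  done
  return $rc
}

# Example: jps | missing_daemons && echo "all daemons are running"
```

If a name is printed, the matching log file under the logs directory of the Hadoop installation is usually the first place to look.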
