changjiang 2015-11-04
| (1) Hadoop 2.7.1 source compilation | http://aperise.iteye.com/blog/2246856 |
| (2) Hadoop 2.7.1 installation preparation | http://aperise.iteye.com/blog/2253544 |
| (3) Cluster installation (applies to both 1.x and 2.x) | http://aperise.iteye.com/blog/2245547 |
| (4) HBase installation preparation | http://aperise.iteye.com/blog/2254451 |
| (5) HBase installation | http://aperise.iteye.com/blog/2254460 |
| (6) Snappy installation | http://aperise.iteye.com/blog/2254487 |
| (7) HBase performance tuning | http://aperise.iteye.com/blog/2282670 |
| (8) Benchmarking HBase with Yahoo YCSB | http://aperise.iteye.com/blog/2248863 |
| (9) spring-hadoop in practice | http://aperise.iteye.com/blog/2254491 |
| (10) ZooKeeper-based Hadoop HA cluster installation | http://aperise.iteye.com/blog/2305809 |
LZO, Snappy, and gzip are three compression codecs supported by Hadoop; Snappy is the one most commonly recommended these days. This post walks through installing Snappy.

The Hadoop build instructions on GitHub (https://github.com/apache/hadoop/blob/trunk/BUILDING.txt) recommend installing Snappy via the system package manager:
sudo apt-get install snappy libsnappy-dev
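On RHEL/CentOS hosts the usual equivalents are the snappy and snappy-devel packages (package names as found in the standard yum repositories, an assumption since BUILDING.txt only covers Ubuntu); either way, you can confirm afterwards that the dynamic linker sees the library:
#CentOS/RHEL equivalent of the apt-get line above
yum install -y snappy snappy-devel
#confirm libsnappy is visible to the dynamic linker
ldconfig -p | grep snappy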
Recent Hadoop releases already bundle the codec classes for all of these compression formats in the hadoop-common module, so there is no need to download separate codec packages.
If you have compiled the Hadoop source before, you can skip this step.
The binary packages from the official download site are built without Snappy support, so you have to recompile the Hadoop source yourself. The compilation requires that the native Snappy library is already installed on the Linux host, which step 1 above took care of.
For details on compiling the Hadoop source, see http://aperise.iteye.com/blog/2246856
1. Download the Hadoop source tarball hadoop-2.7.1-src.tar.gz into /root and extract it:
cd /root
wget http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1-src.tar.gz
tar -zxvf hadoop-2.7.1-src.tar.gz
2. Compile the source with Snappy support (the same build also produces the rebuilt hadoop-common-2.7.1.jar used below):
cd /root/hadoop-2.7.1-src
export MAVEN_OPTS="-Xms256m -Xmx512m"
mvn package -Pdist,native -DskipTests -Dtar -rf :hadoop-common -Drequire.snappy -X
#To build the entire tree and point the build at a specific snappy location instead:
#mvn package -Pdist,native,docs,src -DskipTests -Drequire.snappy -Dsnappy.lib=/usr/local/lib
3. After the build completes, the compiled native libraries end up under /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/lib/native.


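Before copying anything it is worth sanity-checking the build output; note that -Drequire.snappy already makes the build fail outright if the native snappy library was not found:
#the directory should contain libhadoop.so and related native libraries
ls -l /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/lib/native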
#1. Copy the Snappy support files produced by the build in step 2 into Hadoop
#My Hadoop installation is at /home/hadoop/hadoop-2.7.1
cp -r /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/lib/native/* /home/hadoop/hadoop-2.7.1/lib/native/
#also copy the native snappy library itself (here located under /usr/local/lib)
cp /usr/local/lib/* /home/hadoop/hadoop-2.7.1/lib/native/
#2. Copy the hadoop-common-2.7.1.jar rebuilt by the build in step 2 into Hadoop
#My Hadoop installation is at /home/hadoop/hadoop-2.7.1
cp -r /root/hadoop-2.7.1-src/hadoop-dist/target/hadoop-2.7.1/share/hadoop/common/* /home/hadoop/hadoop-2.7.1/share/hadoop/common/
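Hadoop's built-in native-library check is a quick way to confirm the copies worked; assuming the installation path above:
#list the native libraries Hadoop can load; look for "snappy: true .../libsnappy.so.1" in the output
/home/hadoop/hadoop-2.7.1/bin/hadoop checknative -a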
#3. Edit the Hadoop configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml and add the following inside the <configuration> element:
<property>
    <name>io.compression.codecs</name>
    <value>
        org.apache.hadoop.io.compress.DefaultCodec,
        org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.BZip2Codec,
        org.apache.hadoop.io.compress.Lz4Codec,
        org.apache.hadoop.io.compress.SnappyCodec
    </value>
    <description>A comma-separated list of the compression codec classes that can
    be used for compression/decompression. In addition to any classes specified
    with this property (which take precedence), codec classes on the classpath
    are discovered using a Java ServiceLoader.</description>
</property>
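Once the file is saved, you can read the effective value back and catch typos in the codec class names; a quick check with hdfs getconf:
#print the codec list as Hadoop clients will see it
hdfs getconf -confKey io.compression.codecs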
#4. Edit /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml and add the following inside the <configuration> element:
<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
    <description>Should the job outputs be compressed?
    </description>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>RECORD</value>
    <description>If the job outputs are to be compressed as SequenceFiles, how should
    they be compressed? Should be one of NONE, RECORD or BLOCK.
    </description>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    <description>If the job outputs are compressed, how should they be compressed?
    </description>
</property>
<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
    <description>Should the outputs of the maps be compressed before being
    sent across the network. Uses SequenceFile compression.
    </description>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    <description>If the map outputs are compressed, how should they be
    compressed?
    </description>
</property>
At this point Hadoop itself can use the Snappy codec. Copy the updated core-site.xml and mapred-site.xml to every node and restart the cluster services so the settings take effect. HBase still needs its own configuration; read on.
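The HBase side comes down to making the same native libraries visible to the HBase JVM. A minimal sketch, assuming HBase 1.2.1 installed at /home/hadoop/hbase-1.2.1 (the path used in the test below) and the Linux-amd64-64 platform directory that HBase 1.x searches by default:
#expose the native hadoop/snappy libraries to HBase
mkdir -p /home/hadoop/hbase-1.2.1/lib/native/Linux-amd64-64
cp /home/hadoop/hadoop-2.7.1/lib/native/* /home/hadoop/hbase-1.2.1/lib/native/Linux-amd64-64/
#alternatively, point HBase at Hadoop's copy in conf/hbase-env.sh:
#export HBASE_LIBRARY_PATH=/home/hadoop/hadoop-2.7.1/lib/native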
With that, HBase has Snappy support as well. Verify it with HBase's CompressionTest utility:
cd /home/hadoop/hbase-1.2.1/bin/
./hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://hadoop-ha-cluster/hbase/data/default/signal/3a194dcd996fd03c0c26bf3d175caaec/info/0c7f62f10a4c4e548c5ff1c583b0bdfa snappy
The path hdfs://hadoop-ha-cluster/hbase/data/default/signal/3a194dcd996fd03c0c26bf3d175caaec/info/0c7f62f10a4c4e548c5ff1c583b0bdfa above is an HFile that exists on my HDFS.
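You do not need a real HFile for this check; CompressionTest accepts any writable URI and creates a small test file there with the requested codec (the /tmp path below is just an example):
#writes a tiny snappy-compressed test file at the given URI and prints SUCCESS if the codec round-trips
./hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-test snappy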

Finally, read a Snappy-compressed file back from HDFS; hdfs dfs -text decompresses transparently using the codecs configured above (/aaa.snappy is a Snappy file on my HDFS):
hdfs dfs -text /aaa.snappy