Hive学习之WordCount单词统计

ydbjason 2013-04-16

单词统计相当于编程开始的HELLO WORLD。应该都跑过。假设这里有一个文档,里面有两行这样的话:

Hello World Bye World

Hello Hadoop GoodBye Hadoop

最终要显示的结果如下: 

Hive学习之WordCount单词统计

程序如下:

Map:

public class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }
}

Reduce:

public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                      OutputCollector<Text, IntWritable> output,
                      Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
          sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
  }

客户端:

public class WordCount {
  public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");

      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);

      conf.setMapperClass(Map.class);
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);

      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      JobClient.runJob(conf);
  }

以上是传统的MR程序。现在,我们可以利用hive来做这样的事。

相关推荐