Hadoop Streaming Programming in Practice (C++, PHP, Python)



By 冷月无声, 2017/05/19

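The Streaming framework lets programs written in any language run as Hadoop MapReduce jobs. This makes it easy to port existing programs to the Hadoop platform, which greatly extends Hadoop's reach. Any executable that reads its input from standard input and writes key/value lines to standard output can act as a mapper or a reducer. Below, we implement Hadoop WordCount in C++, PHP, and Python in turn.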
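By default, Hadoop Streaming treats each output line as one record: the text before the first tab character is the key and the rest of the line is the value. A line with no tab becomes a key with a null value. The minimal Python sketch below mirrors that default split; the function name split_record is hypothetical and used only for illustration.

```python
def split_record(line):
    """Split one Streaming output line into (key, value).

    Hypothetical helper mirroring Hadoop Streaming's default convention:
    the key is the text before the first tab, the value is everything
    after it. A line with no tab gives an empty value here (Hadoop
    itself uses a null value in that case).
    """
    line = line.rstrip("\n")
    key, _, value = line.partition("\t")
    return key, value


if __name__ == "__main__":
    print(split_record("hadoop\t1\n"))   # ('hadoop', '1')
    print(split_record("no-tab-here"))   # ('no-tab-here', '')
```

Only the first tab separates key from value, so a value may itself contain tabs.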

Exercise 1: WordCount in C++
Implementation:
1) The WordCount mapper in C++, saved as mapper.cpp:
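#include <iostream>
#include <string>
using namespace std;

// WordCount mapper: reads whitespace-separated words from stdin
// and emits one "word<TAB>1" line per word on stdout.
int main(){
    string key;
    string value = "1";
    while(cin >> key){
        cout << key << "\t" << value << "\n";
    }
    return 0;
}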
2) The WordCount reducer in C++, saved as reducer.cpp:
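#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
using namespace std;

// WordCount reducer: reads "word<TAB>count" pairs from stdin,
// sums the counts per word, and prints "word<TAB>total" lines.
int main(){
    string key;
    string value;
    map<string, int> word2count;
    map<string, int>::iterator it;
    while(cin >> key >> value){
        int count = atoi(value.c_str());
        it = word2count.find(key);
        if(it != word2count.end()){
            it->second += count;
        }
        else{
            word2count.insert(make_pair(key, count));
        }
    }

    // std::map iterates in ascending key order.
    for(it = word2count.begin(); it != word2count.end(); ++it){
        cout << it->first << "\t" << it->second << "\n";
    }
    return 0;
}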
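The reducer above keeps every distinct word in a std::map until its input ends. Since Hadoop sorts map output by key before it reaches each reducer, equal words arrive on adjacent lines, so a reducer can instead sum each run and emit it immediately, holding only one running total. The Python sketch below shows that alternative technique (it is not the article's implementation); reduce_sorted is a hypothetical name, and it assumes well-formed, key-sorted "word<TAB>count" lines.

```python
import itertools


def reduce_sorted(lines):
    """Sum counts over key-sorted "word<TAB>count" lines.

    Hypothetical helper. Assumes the input is already sorted by word
    (as Hadoop's shuffle, or `sort` in a local test, guarantees) and
    that every line contains a tab followed by an integer count.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


if __name__ == "__main__":
    sample = ["bar\t1\n", "foo\t1\n", "foo\t1\n"]
    for word, total in reduce_sorted(sample):
        print("%s\t%d" % (word, total))
```

In a real Streaming reducer, sys.stdin would be passed in place of the sample list.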

Steps to test and run the C++ WordCount

1) Install the C++ compiler
If the C++ compiler is not installed on your Linux machine, install it online:

yum -y install gcc-c++

2) Compile the C++ sources into executables
The programs must be compiled into executables with the following commands before they can run:
g++ -o mapper mapper.cpp

g++ -o reducer reducer.cpp

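3) Local test
Before running the C++ WordCount on the cluster, test and debug it locally on Linux to make sure it will behave correctly on the cluster. In this pipeline, sort stands in for Hadoop's shuffle phase, which sorts the mapper output by key before it reaches the reducer: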

cat djt.txt |./mapper |sort|./reducer
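The local test chains three stages: mapper, sort, reducer. The Python sketch below runs the same three stages in memory on a small sample string, which can help when checking what each stage sees. The function names are hypothetical, the mapper splits on whitespace like the C++ one, and ordering follows Python's code-point comparison, which can differ from sort under some locales.

```python
def map_stage(text):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for word in text.split()]


def shuffle_stage(pairs):
    """Sort by key, like `sort` locally or Hadoop's shuffle on a cluster."""
    return sorted(pairs)


def reduce_stage(sorted_pairs):
    """Sum the counts of adjacent equal keys (input must be sorted)."""
    totals = []
    for word, count in sorted_pairs:
        if totals and totals[-1][0] == word:
            totals[-1] = (word, totals[-1][1] + count)
        else:
            totals.append((word, count))
    return totals


if __name__ == "__main__":
    sample = "hello hadoop\nhello streaming\n"
    for word, total in reduce_stage(shuffle_stage(map_stage(sample))):
        print("%s\t%d" % (word, total))
```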

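4) Run on the cluster
Change to the Hadoop installation directory and submit the C++ WordCount job to count the words. The -file options ship the local mapper and reducer executables with the job so that every task node can run them: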
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
-D mapred.reduce.tasks=2 \
-mapper "./mapper" \
-reducer "./reducer" \
-file mapper \
-file reducer \
-input /dajiangtai/djt.txt \
-output /dajiangtai/out

If the expected word counts appear in the output, the C++ WordCount works.
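With -D mapred.reduce.tasks=2 the job runs two reduce tasks, so the map output is split into two partitions and each reducer writes its own part file under /dajiangtai/out. Hadoop's default HashPartitioner chooses a reducer from the key's hash modulo the number of reduce tasks, which is why every occurrence of a word reaches the same reducer. The Python sketch below illustrates the idea using zlib.crc32 as a stand-in hash; Hadoop actually uses the key's Java hashCode, so the concrete assignments in a real job will differ. The function names are hypothetical.

```python
import zlib


def partition(word, num_reducers):
    """Pick a reducer index for a key: hash(key) mod number of reducers.

    zlib.crc32 is a stand-in; Hadoop's HashPartitioner uses the key's
    Java hashCode, so these indices will not match a real job.
    """
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def split_by_reducer(words, num_reducers=2):
    """Group words by the reducer that would receive them."""
    buckets = [[] for _ in range(num_reducers)]
    for word in words:
        buckets[partition(word, num_reducers)].append(word)
    return buckets


if __name__ == "__main__":
    words = "hello hadoop hello streaming".split()
    for index, bucket in enumerate(split_by_reducer(words)):
        print(index, bucket)
```

Whatever the hash, both copies of "hello" land in the same bucket, so a single reducer sees all of its counts.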

Exercise 2: WordCount in PHP
Implementation:
1) The WordCount mapper in PHP, saved as wc_mapper.php:
#!/usr/bin/php
<?php
// WordCount mapper: split each input line on non-word characters
// and emit one "word<TAB>1" line per word.
error_reporting(E_ALL ^ E_NOTICE);
while(($line = fgets(STDIN)) !== false){
    $line = trim($line);
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach($words as $word){
        echo $word, chr(9), "1", PHP_EOL;
    }
}
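Unlike the C++ mapper, which splits on whitespace only, this mapper splits on \W (any character that is not a letter, digit, or underscore), so punctuation is dropped and "hello,world" counts as two words. The Python 3 sketch below compares the two tokenizations; it uses re.ASCII to approximate PCRE's default non-Unicode \W, and the function names are hypothetical. The two regex engines may still differ on non-ASCII input.

```python
import re


def tokenize_like_php(line):
    r"""Approximate preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY)."""
    return [word for word in re.split(r"\W", line.strip(), flags=re.ASCII) if word]


def tokenize_like_cpp(line):
    """Whitespace splitting, as `cin >> key` does in the C++ mapper."""
    return line.split()


if __name__ == "__main__":
    line = "Hello, world! foo_bar 42"
    print(tokenize_like_php(line))  # ['Hello', 'world', 'foo_bar', '42']
    print(tokenize_like_cpp(line))  # ['Hello,', 'world!', 'foo_bar', '42']
```

So the PHP and C++ versions can report different counts for the same djt.txt.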

2) The WordCount reducer in PHP, saved as wc_reducer.php:
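#!/usr/bin/php
<?php
// WordCount reducer: sum the counts of each word from
// "word<TAB>count" lines, then print "word<TAB>total".
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
while(($line = fgets(STDIN)) !== false){
    $line = trim($line);
    if($line === '') continue;
    list($word, $count) = explode(chr(9), $line);
    $count = intval($count);
    if(!isset($word2count[$word])){
        $word2count[$word] = 0;
    }
    $word2count[$word] += $count;
}
foreach($word2count as $word => $count){
    echo $word, chr(9), $count, PHP_EOL;
}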
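Because this reducer adds up the count field rather than counting input lines, it produces the same totals whether it receives raw mapper output or counts that were already partly summed. That property is what lets a summing reducer double as a Streaming combiner (the -combiner option), which pre-aggregates each map task's output before the shuffle. The Python sketch below checks the property on made-up sample data; sum_counts is a hypothetical name.

```python
from collections import Counter


def sum_counts(pairs):
    """Add up (word, count) pairs, like the PHP reducer's $word2count."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


if __name__ == "__main__":
    # Made-up output of two hypothetical map tasks.
    map1 = [("hello", 1), ("hadoop", 1), ("hello", 1)]
    map2 = [("hello", 1), ("streaming", 1)]

    direct = sum_counts(map1 + map2)
    # Combine each map task's output locally first, then reduce.
    combined = sum_counts(sum_counts(map1) + sum_counts(map2))
    print(direct == combined)  # True
    print(direct)  # [('hadoop', 1), ('hello', 3), ('streaming', 1)]
```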

Steps to test and run the PHP WordCount

1) Install PHP
If PHP is not installed on your Linux machine, install it online:

yum -y install php

2) Local test
Before running the PHP WordCount on the cluster, test and debug it locally on Linux to make sure it will behave correctly on the cluster:

cat djt.txt|php wc_mapper.php |sort|php wc_reducer.php

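3) Run on the cluster
Change to the Hadoop installation directory and submit the PHP WordCount job to count the words. Because each task runs the scripts with the php interpreter, PHP must be installed on every node in the cluster: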
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
-D mapred.reduce.tasks=2 \
-mapper "php wc_mapper.php" \
-reducer "php wc_reducer.php" \
-file wc_mapper.php \
-file wc_reducer.php \
-input /dajiangtai/djt.txt \
-output /dajiangtai/out
If the expected word counts appear in the output, the PHP WordCount works.

Exercise 3: WordCount in Python
Implementation:
1) The WordCount mapper in Python, saved as Mapper.py:
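#!/usr/bin/env python
# WordCount mapper: emit one "word<TAB>1" line per
# whitespace-separated word read from stdin.
from __future__ import print_function
import sys

for line in sys.stdin:
    # str.split() with no argument already drops empty strings.
    for word in line.split():
        print('%s\t%s' % (word, 1))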

2) The WordCount reducer in Python, saved as Reducer.py:
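#!/usr/bin/env python
# WordCount reducer: sum the counts for each word from
# "word<TAB>count" lines and print the totals sorted by word.
from __future__ import print_function
from operator import itemgetter
import sys

word2count = {}
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip malformed lines (missing tab or non-integer count).
        continue
    word2count[word] = word2count.get(word, 0) + count

sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word, count in sorted_word2count:
    print('%s\t%s' % (word, count))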

Steps to test and run the Python WordCount

1) Install Python
If Python is not installed on your Linux machine, install it online:

yum -y install python27

2) Local test
Before running the Python WordCount on the cluster, test and debug it locally on Linux to make sure it will behave correctly on the cluster:

cat djt.txt|python Mapper.py |sort|python Reducer.py

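3) Run on the cluster
Change to the Hadoop installation directory and submit the Python WordCount job to count the words. Because each task runs the scripts with the python interpreter, Python must be installed on every node in the cluster: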
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
-D mapred.reduce.tasks=2 \
-mapper "python Mapper.py" \
-reducer "python Reducer.py" \
-file Mapper.py \
-file Reducer.py \
-input /dajiangtai/djt.txt \
-output /dajiangtai/out

If the expected word counts appear in the output, the Python WordCount works.

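Note: if any of the details above are unclear, they are covered in this free course:

Hadoop Streaming 多语言编程实战 (Hadoop Streaming Multi-Language Programming in Practice)

http://www.dajiangtai.com/course/54.do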
