冷月无声 2017/05/19
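The Streaming framework lets programs written in any language run as Hadoop MapReduce jobs, which makes it easy to port existing programs to the Hadoop platform; this is a large part of what makes Hadoop extensible. A Streaming mapper or reducer is simply an executable that reads input lines from standard input and writes tab-separated key/value lines to standard output. Below we implement Hadoop WordCount in C++, PHP, and Python.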
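To make the stdin/stdout contract concrete, here is a minimal in-process sketch of the three stages that "cat | mapper | sort | reducer" performs. The function names are hypothetical and the whitespace tokenization is an assumption; on a real cluster each stage is a separate process and Hadoop performs the sort.

```python
def mapper(lines):
    """Map stage: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def reducer(records):
    """Reduce stage: add up the counts for each word, then emit them sorted."""
    totals = {}
    for record in records:
        word, count = record.split("\t", 1)
        totals[word] = totals.get(word, 0) + int(count)
    return ["%s\t%d" % (word, totals[word]) for word in sorted(totals)]


def run_wordcount(lines):
    # sorted() stands in for the sort step between the map and reduce stages
    return reducer(sorted(mapper(lines)))


print(run_wordcount(["hello world", "hello hadoop"]))
# ['hadoop\t1', 'hello\t2', 'world\t1']
```

The mapper never sees more than one line at a time, while the reducer relies on the sort having brought identical words together.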
Steps to test-run the C++ WordCount
1) Install the compiler: yum -y install gcc-c++
2) Compile: g++ -o mapper mapper.cpp and g++ -o reducer reducer.cpp
3) Test locally: cat djt.txt | ./mapper | sort | ./reducer
Steps to test-run the PHP WordCount
1) Install PHP: yum -y install php
2) Test locally: cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php
Steps to test-run the Python WordCount
1) Install Python: yum -y install python27
2) Test locally: cat djt.txt | python Mapper.py | sort | python Reducer.py
If the expected word counts appear, the implementation works.
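The local tests above pipe the input through mapper, sort, and reducer. If you prefer to drive that test from a script, a hypothetical Python helper (run_pipeline is not part of Hadoop) can run the same pipeline. It sorts lines bytewise, like LC_ALL=C sort, which is closer to the byte-by-byte key comparison Hadoop applies to Text keys than a locale-aware sort.

```python
import subprocess


def run_pipeline(mapper_cmd, reducer_cmd, input_text):
    """Hypothetical helper mimicking: cat input | mapper | sort | reducer."""
    mapped = subprocess.run(mapper_cmd, input=input_text.encode("utf-8"),
                            stdout=subprocess.PIPE, check=True).stdout
    # Sort whole lines bytewise, as LC_ALL=C sort would; a locale-aware
    # sort can order some keys differently from Hadoop's byte comparison.
    lines = sorted(mapped.splitlines())
    shuffled = b"".join(line + b"\n" for line in lines)
    reduced = subprocess.run(reducer_cmd, input=shuffled,
                             stdout=subprocess.PIPE, check=True).stdout
    return reduced.decode("utf-8")


# Example usage (assumes the compiled ./mapper and ./reducer and djt.txt exist):
# with open("djt.txt") as f:
#     print(run_pipeline(["./mapper"], ["./reducer"], f.read()))
```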
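Note: if any of the details above are unclear, they are covered in the free course "Hadoop Streaming 多语言编程实战" (Hadoop Streaming multi-language programming in practice).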
Practice 1: WordCount in C++
Implementation:
1) The Mapper of the C++ WordCount, saved as mapper.cpp:
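#include <iostream>
#include <string>
using namespace std;

// Mapper: read whitespace-separated words from stdin and emit "word<TAB>1" for each one.
int main() {
    string key;
    string value = "1";
    while (cin >> key) {
        cout << key << "\t" << value << "\n";
    }
    return 0;
}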
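Each line the mapper prints is interpreted by Streaming as a key/value pair: by default the text before the first tab is the key and everything after it is the value, and a line with no tab becomes a key with no value. The Python sketch below mimics that default split; the function name is hypothetical, and the separator argument merely stands in for Streaming settings such as stream.map.output.field.separator.

```python
def split_streaming_record(line, separator="\t"):
    """Split one output line the way Streaming does by default:
    key = text before the first separator, value = the rest of the line."""
    # partition() returns (line, "", "") when the separator is absent,
    # so a line without a tab becomes a key with an empty value.
    key, _, value = line.rstrip("\n").partition(separator)
    return key, value


print(split_streaming_record("hello\t1\n"))  # ('hello', '1')
```

Only the first tab matters, so values may themselves contain tabs.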
2) The Reducer of the C++ WordCount, saved as reducer.cpp:
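#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
using namespace std;

// Reducer: read "word count" pairs from stdin, add up the counts per word,
// then print every word with its total. std::map keeps the words sorted.
int main() {
    string key;
    string value;
    map<string, int> word2count;
    map<string, int>::iterator it;
    while (cin >> key >> value) {
        // Add the incoming count instead of assuming it is always 1
        word2count[key] += atoi(value.c_str());
    }
    for (it = word2count.begin(); it != word2count.end(); ++it) {
        cout << it->first << "\t" << it->second << "\n";
    }
    return 0;
}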
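With this mapper every incoming count is 1, so a reducer that counts records and one that adds up the counts produce the same totals. They diverge once partial sums are computed before the shuffle, for example by a combiner (Streaming accepts a -combiner command). The Python sketch below, with hypothetical function names and made-up input splits, compares the two strategies.

```python
from collections import Counter


def sum_counts(pairs):
    """Reducer logic that adds up incoming counts (safe to reuse as a combiner)."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def count_records(pairs):
    """Reducer logic that counts records and ignores the value (NOT combiner-safe)."""
    return sorted(Counter(word for word, _ in pairs).items())


split1 = [("a", 1), ("b", 1), ("a", 1)]
split2 = [("a", 1), ("b", 1)]

# Without a combiner the reducer sees every ("word", 1) record.
direct = sum_counts(split1 + split2)
# With a combiner each split is pre-aggregated before the shuffle.
combined = sum_counts(sum_counts(split1) + sum_counts(split2))
miscounted = count_records(count_records(split1) + count_records(split2))

print(direct, combined, miscounted)
```

Summing the values keeps the reducer correct whether or not a combiner runs first.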
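On Linux, if no C++ compiler is installed, install one first:
yum -y install gcc-c++
Then compile the C++ sources into executables so they can be run:
g++ -o mapper mapper.cpp
g++ -o reducer reducer.cpp
Before running the C++ WordCount on the cluster, test it locally on Linux and debug it until it works, so that it will run correctly on the cluster. The local test command is:
cat djt.txt | ./mapper | sort | ./reducer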
Change to the Hadoop installation directory and submit the C++ WordCount job to count the words:
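hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
  -D mapred.reduce.tasks=2 \
  -mapper "./mapper" \
  -reducer "./reducer" \
  -file mapper \
  -file reducer \
  -input /dajiangtai/djt.txt \
  -output /dajiangtai/out
The job fails if the output directory already exists, so remove it before rerunning (for example with hadoop fs -rm -r /dajiangtai/out). If the expected result appears, the C++ WordCount works.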
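The option -D mapred.reduce.tasks=2 runs two reduce tasks, so the result is split across two output files. A partitioner routes each key to exactly one reducer (Hadoop's default HashPartitioner takes the key's hash modulo the number of reduce tasks), and each reducer sorts only its own keys. The Python sketch below imitates that routing; zlib.crc32 is a stand-in hash and the function names are hypothetical, so the reducer a given word lands on will differ from a real cluster.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Stand-in for Hadoop's HashPartitioner: hash(key) mod number of reducers.
    # zlib.crc32 is NOT the hash Hadoop uses; it only keeps this sketch deterministic.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route every (word, count) pair to one reducer, then sort each reducer's input."""
    parts = defaultdict(list)
    for key, value in pairs:
        parts[partition(key, num_reducers)].append((key, value))
    return {p: sorted(kv) for p, kv in parts.items()}


pairs = [(word, 1) for word in "a b a c b a".split()]
for reducer_id, records in sorted(shuffle(pairs, 2).items()):
    print(reducer_id, records)
```

As a consequence, each output file is sorted on its own, but the files together do not form one globally sorted list.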
Practice 2: WordCount in PHP
Implementation:
1) The Mapper of the PHP WordCount, saved as wc_mapper.php:
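#!/usr/bin/php
<?php
// Mapper: split each input line into words and emit "word<TAB>1" per word.
error_reporting(E_ALL ^ E_NOTICE);
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    // \W splits on any non-word character; PREG_SPLIT_NO_EMPTY drops empty pieces
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        echo $word, chr(9), "1", PHP_EOL;
    }
}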
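Note that this PHP mapper tokenizes differently from the C++ and Python mappers: it splits on non-word characters, while cin >> key and str.split() split only on whitespace, so punctuation attached to a word changes the counts. The Python sketch below contrasts the two rules on a made-up line; the function names are hypothetical, and the regex version matches PHP's behavior only for ASCII text, because Python's \W is Unicode-aware while PHP's /\W/ without the u modifier works on bytes.

```python
import re


def split_on_whitespace(line):
    """How the C++ (cin >> key) and Python (str.split) mappers tokenize."""
    return line.split()


def split_on_non_word(line):
    r"""Rough equivalent of preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY) for ASCII text."""
    return [w for w in re.split(r"\W", line) if w]


line = "Hello, world! it's Hadoop-2.2"
print(split_on_whitespace(line))  # ['Hello,', 'world!', "it's", 'Hadoop-2.2']
print(split_on_non_word(line))    # ['Hello', 'world', 'it', 's', 'Hadoop', '2', '2']
```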
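2) The Reducer of the PHP WordCount, saved as wc_reducer.php:
#!/usr/bin/php
<?php
// Reducer: add up the counts for each word from "word<TAB>count" lines.
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
while (($line = fgets(STDIN)) !== false) {
    $parts = explode(chr(9), trim($line), 2);
    if (count($parts) < 2) {
        continue; // skip blank or malformed lines
    }
    list($word, $count) = $parts;
    if (!isset($word2count[$word])) {
        $word2count[$word] = 0;
    }
    $word2count[$word] += intval($count);
}
foreach ($word2count as $word => $count) {
    echo $word, chr(9), $count, PHP_EOL;
}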
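On Linux, if PHP is not installed, install it first:
yum -y install php
Before running the PHP WordCount on the cluster, test it locally on Linux and debug it until it works. The local test command is:
cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php
Change to the Hadoop installation directory and submit the PHP WordCount job to count the words: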
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
  -D mapred.reduce.tasks=2 \
  -mapper "php wc_mapper.php" \
  -reducer "php wc_reducer.php" \
  -file wc_mapper.php \
  -file wc_reducer.php \
  -input /dajiangtai/djt.txt \
  -output /dajiangtai/out
If the expected result appears, the PHP WordCount works.
Practice 3: WordCount in Python
Implementation:
1) The Mapper of the Python WordCount, saved as Mapper.py:
#!/usr/bin/env python
# Mapper: emit "word<TAB>1" for every whitespace-separated word on stdin.
# Runs under both Python 2.7 and Python 3.
import sys

for line in sys.stdin:
    for word in line.split():
        print('%s\t%s' % (word, 1))
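2) The Reducer of the Python WordCount, saved as Reducer.py:
#!/usr/bin/env python
# Reducer: sum the counts for each word, then print the words in sorted order.
from operator import itemgetter
import sys

word2count = {}
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip blank or malformed lines instead of crashing
        continue
    word2count[word] = word2count.get(word, 0) + count

sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word, count in sorted_word2count:
    print('%s\t%s' % (word, count))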
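The reducer above keeps every word in a dictionary until the input ends. Because Streaming delivers reducer input sorted by key, an alternative (not the author's version) is to sum each run of equal words as it arrives with itertools.groupby, holding only one word at a time. The function names below are hypothetical; in a Reducer.py it would be driven by "for word, total in reduce_sorted(sys.stdin): print('%s\t%d' % (word, total))".

```python
from itertools import groupby


def parse(lines):
    """Yield (word, count) from 'word<TAB>count' lines, skipping malformed ones."""
    for line in lines:
        word, sep, count = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # blank line or no tab
        try:
            count = int(count)
        except ValueError:
            continue  # non-numeric count
        yield word, count


def reduce_sorted(lines):
    """Sum counts per word, relying on the input already being sorted by word."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


print(list(reduce_sorted(["a\t1\n", "a\t2\n", "b\t1\n"])))  # [('a', 3), ('b', 1)]
```

The trade-off is that this version is only correct on sorted input: if the same word appears in two separate runs, it is emitted twice.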
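On Linux, if Python is not installed, install it first:
yum -y install python27
Before running the Python WordCount on the cluster, test it locally on Linux and debug it until it works. The local test command is:
cat djt.txt | python Mapper.py | sort | python Reducer.py
Change to the Hadoop installation directory and submit the Python WordCount job to count the words: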
hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
  -D mapred.reduce.tasks=2 \
  -mapper "python Mapper.py" \
  -reducer "python Reducer.py" \
  -file Mapper.py \
  -file Reducer.py \
  -input /dajiangtai/djt.txt \
  -output /dajiangtai/out
If the expected result appears, the Python WordCount works.