Steps for Execution of WordCount in Hadoop

In the last few articles we studied the theoretical concepts of Hadoop. In this article we learn the steps to write and execute your Hadoop code. The basic program, i.e. the "hello world" program for Hadoop, is WordCount. We will see the steps to write and execute the WordCount program.
Create a sample file, e.g. file.txt:
$ cat > file.txt
Bigdata and crunching techniques.
Mathematics is tough subject.
What is Bigdata?
What is the time now?
When you create a file, it is present on the local file system. Copy this file from the local file system to the Hadoop Distributed File System, i.e. HDFS.
$ hadoop fs -put path1 path2
path1 = local file system path
path2 = HDFS path
e.g. $ hadoop fs -put /home/kb/file.txt /WordCount/file.txt
(The target directory, /WordCount here, must already exist in HDFS; create it first with hadoop fs -mkdir if it does not.)
Writing the code for WordCount requires three classes (a combined single-file sketch is shown right after this list):
1. Driver.java (main class file)
2. Mapper.java
3. Reducer.java
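For reference, here is a minimal sketch of what these classes typically look like, combined into a single WordCount.java with nested Mapper and Reducer classes. It follows the standard Hadoop WordCount pattern; it assumes a Hadoop 2.x or later install (on old 1.x releases, Job.getInstance(conf, ...) would be replaced by new Job(conf, ...)).

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line on whitespace and emits (word, 1) for every token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures and submits the job; args[0] = input path, args[1] = output directory.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // combiner is optional but reduces shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}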
Compile all the .java files using the hadoop-core.jar file:
$ javac -classpath $HADOOP_HOME/hadoop-core.jar *.java
(hadoop-core.jar ships with older Hadoop 1.x releases; on newer releases it is split into several jars, and compiling with -classpath $(hadoop classpath) is a common alternative.)
The Hadoop framework runs jobs only from jar files, so we have to package the compiled classes into a jar:
$ jar cvf WordCount.jar *.class
Run the jar file:
$ hadoop jar input1 input2 input3 input4

input1 = WordCount.jar
input2 = name of the class that contains the main() method
input3 = input file present in HDFS ("file.txt") // may be a file or a directory, and must already exist in HDFS
input4 = output directory name // must be a new directory; Hadoop creates it and the job fails if it already exists

$ hadoop jar WordCount.jar WordCount /WordCount/file.txt WordCountOP
Check the output in the WordCountOP directory in HDFS. It contains three entries:
1) _SUCCESS // an empty marker file that indicates successful execution of the program
2) _logs // contains the job's log information
3) part-r-00000 // contains the actual word counts
$ hadoop fs -cat WordCountOP/part-r-00000
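With the simple whitespace tokenizer sketched above, punctuation stays attached to the words, so for our sample file.txt the tab-separated output in part-r-00000 would look roughly like this:

Bigdata    1
Bigdata?    1
Mathematics    1
What    2
and    1
crunching    1
is    3
now?    1
subject.    1
techniques.    1
the    1
time    1
tough    1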
Or you can browse and download the file directly from the HDFS NameNode web UI (http://localhost:50070).

These are the simple steps to write and execute Hadoop code. Now enjoy programming in Hadoop.
