This article describes how to write and execute a simple MapReduce program for Hadoop in Python. Hadoop itself is written in Java, but its Hadoop Streaming utility lets us write the map and reduce functions in any language. Hadoop Streaming uses Unix standard streams as the interface between Hadoop and the program, so any language that can read from standard input and write to standard output will work.
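To show what "standard streams as the interface" means in practice, here is a minimal single-machine sketch of the Streaming data flow. It is an illustration, not Hadoop code. Each stage takes text lines in and gives text lines out. Map output is sorted by key between the two phases, and the key is the text before the first tab, which is Streaming's default. The names `run_locally`, `demo_mapper` and `demo_reducer` are hypothetical.

```python
"""Single-machine sketch of the Hadoop Streaming data flow: map -> sort -> reduce."""
from itertools import groupby
from typing import Callable, Iterable, Iterator, List

# Each stage behaves like a stdin->stdout filter: text lines in, text lines out.
Stage = Callable[[Iterable[str]], Iterator[str]]


def key_of(record: str) -> str:
    # By default Streaming treats the text before the first tab as the key
    # (the whole line if there is no tab).
    return record.split("\t", 1)[0]


def run_locally(lines: Iterable[str], mapper: Stage, reducer: Stage) -> List[str]:
    mapped = list(mapper(lines))
    # The framework sorts map output by key, so equal keys reach the reducer adjacently.
    mapped.sort(key=key_of)
    return list(reducer(mapped))


def demo_mapper(lines: Iterable[str]) -> Iterator[str]:
    # Hypothetical mapper: one "word<TAB>1" record per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def demo_reducer(records: Iterable[str]) -> Iterator[str]:
    # groupby only works here because the records are already sorted by key.
    for key, group in groupby(records, key=key_of):
        total = sum(int(r.split("\t", 1)[1]) for r in group)
        yield f"{key}\t{total}"


if __name__ == "__main__":
    for out in run_locally(["foo bar foo"], demo_mapper, demo_reducer):
        print(out)
```

On a single machine, the shell equivalent is roughly "cat input | mapper | sort | reducer". Hadoop does the same thing, but it runs many mappers and reducers in parallel across the cluster.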
Python MapReduce Code:
Steps for Mapper Code:
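- Open a terminal, run "gedit mapper.py", paste the mapper code into the editor and save. I kept the mapper.py file on the desktop: "/home/kb/Desktop/mapper.py".
- Make the mapper file executable: "chmod +x /home/kb/Desktop/mapper.py".
- Check that mapper.py runs properly by piping some sample text into it, for example: echo "foo bar foo" | /home/kb/Desktop/mapper.py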
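Below is a minimal sketch of mapper.py. It assumes the job is the classic word count, the usual first Streaming example. It also assumes `python3` is on the PATH, because the shebang line lets the file run directly after "chmod +x". The mapper reads lines from standard input and writes one tab-separated "word, 1" record per word to standard output. The helper name `map_lines` is hypothetical and exists only to keep the logic testable.

```python
#!/usr/bin/env python3
"""mapper.py: word-count mapper for Hadoop Streaming (sketch)."""
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def main():
    # Hadoop Streaming feeds the input split to the mapper on stdin, one line at a time.
    for record in map_lines(sys.stdin):
        print(record)


if __name__ == "__main__":
    main()
```

With this version, echo "foo bar foo" | /home/kb/Desktop/mapper.py prints three lines: foo, bar and foo, each followed by a tab and 1. The mapper does not aggregate anything itself. Counting is left to the reducer, after Hadoop has sorted the records by word.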
Steps for Reducer Code:
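- Open a terminal, run "gedit reducer.py", paste the reducer code into the editor and save. I kept the reducer.py file on the desktop: "/home/kb/Desktop/reducer.py".
- Make the reducer file executable: "chmod +x /home/kb/Desktop/reducer.py".
- Check that reducer.py runs properly by feeding it the mapper's output sorted by key, which is how Hadoop delivers it: echo "foo bar foo" | /home/kb/Desktop/mapper.py | sort -k1,1 | /home/kb/Desktop/reducer.py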
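Here is a matching sketch of reducer.py, again assuming a word-count job and `python3` on the PATH. Hadoop sorts the map output by key before the reduce phase, so all records for one word arrive one after another. The reducer therefore only needs to keep a running total and emit it whenever the word changes. The helper name `reduce_lines` is hypothetical.

```python
#!/usr/bin/env python3
"""reducer.py: word-count reducer for Hadoop Streaming (sketch)."""
import sys


def reduce_lines(lines):
    """Sum the counts per word; assumes input is sorted by word, as Hadoop delivers it."""
    current_word = None
    current_count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, count = line.partition("\t")
        try:
            count = int(count)
        except ValueError:
            # Skip malformed records instead of failing the whole task.
            continue
        if word == current_word:
            current_count += count
        else:
            # A new word starts: flush the total for the previous one.
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word = word
            current_count = count
    # Flush the last word, if any input was seen.
    if current_word is not None:
        yield f"{current_word}\t{current_count}"


def main():
    for record in reduce_lines(sys.stdin):
        print(record)


if __name__ == "__main__":
    main()
```

Combined with the word-count mapper sketch, the local check echo "foo bar foo" | mapper.py | sort -k1,1 | reducer.py prints "bar" with count 1 and "foo" with count 2. The "sort -k1,1" step stands in for Hadoop's shuffle and sort. Without it, the reducer would emit a separate partial count each time a word reappears.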
Steps for executing the program:
- To run the program on Hadoop we need the Hadoop Streaming jar file, "hadoop-streaming-2.7.0.jar", which ships with Hadoop 2.7.0 under "share/hadoop/tools/lib".
- Output:
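The job is submitted with "hadoop jar" and the streaming jar. The sketch below builds that command from Python, which keeps the argument order explicit. It rests on several assumptions:
- HADOOP_HOME points at a Hadoop 2.7.0 install.
- The HDFS paths "/user/kb/input" and "/user/kb/wordcount-output" are hypothetical.
- `python3` is available on every node.

It uses the real Streaming options -files, -mapper, -reducer, -input and -output. Generic options such as -files must come before the streaming options, and the output directory must not already exist. The function names `build_command` and `run_job` are hypothetical.

```python
"""Submit the word-count job through Hadoop Streaming (sketch; paths are assumptions)."""
import os
import shutil
import subprocess
from typing import List

# Assumed jar location in a Hadoop 2.7.0 install; /usr/local/hadoop is a hypothetical default.
HADOOP_HOME = os.environ.get("HADOOP_HOME", "/usr/local/hadoop")
STREAMING_JAR = os.path.join(
    HADOOP_HOME, "share", "hadoop", "tools", "lib", "hadoop-streaming-2.7.0.jar"
)


def build_command(input_dir: str, output_dir: str,
                  mapper: str = "/home/kb/Desktop/mapper.py",
                  reducer: str = "/home/kb/Desktop/reducer.py") -> List[str]:
    """Return the argv for `hadoop jar`; generic options (-files) go before streaming ones."""
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-files", f"{mapper},{reducer}",  # ship both scripts to every task node
        "-mapper", f"python3 {os.path.basename(mapper)}",
        "-reducer", f"python3 {os.path.basename(reducer)}",
        "-input", input_dir,
        "-output", output_dir,  # must not exist yet, or the job fails
    ]


def run_job(input_dir: str, output_dir: str) -> None:
    # check=True raises CalledProcessError if the Hadoop job exits with an error.
    subprocess.run(build_command(input_dir, output_dir), check=True)


if __name__ == "__main__":
    cmd = build_command("/user/kb/input", "/user/kb/wordcount-output")
    print(" ".join(cmd))
    # Only submit when the hadoop CLI is actually on the PATH.
    if shutil.which("hadoop"):
        run_job("/user/kb/input", "/user/kb/wordcount-output")
```

When the job finishes, the reducer output is written to part files inside the output directory. They can be listed with "hdfs dfs -cat /user/kb/wordcount-output/part-*", using the same hypothetical path. Prefixing the scripts with "python3" in -mapper and -reducer is a design choice. The job then does not depend on the execute bit surviving the copy to the task nodes.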