Introduction to Hadoop

Introduction:

Hello readers, this is the first article in a tutorial series on Hadoop. In this article, I suggest a road map for learners who are interested in processing Big Data using Hadoop, Spark, Apex, Kafka, and other big data technologies, at both a conceptual and a hands-on level.

Prerequisites for learning Hadoop:

  1. Programming language - Core Java: The Hadoop framework is written in Core Java, so knowledge of Core Java, especially the Collections framework, is important (see the sketch after this list). Hadoop also provides a utility called Hadoop Streaming, which lets you write MapReduce jobs in other languages such as C++, Python, Scala, and Ruby by reading from standard input and writing to standard output.
  2. Operating system - Linux and Windows: Hadoop is open source, so Linux distributions are the preferred platforms for open source development. Windows can now also support Hadoop installations through the Hortonworks distribution.
  3. For a deep understanding of, and modifications to, the core Hadoop components, knowledge of distributed systems is necessary.
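
As a quick illustration of the Core Java prerequisite, here is a minimal sketch (plain Java, no Hadoop involved) that counts word frequencies with a HashMap. This is the single-machine analogue of what the WordCount MapReduce job will compute later in the series:

    import java.util.HashMap;
    import java.util.Map;

    public class LocalWordCount {
        public static void main(String[] args) {
            String text = "big data needs big tools";
            // Tally word frequencies with a HashMap, the single-machine
            // analogue of the WordCount MapReduce job.
            Map<String, Integer> counts = new HashMap<>();
            for (String word : text.split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
            counts.forEach((word, count) ->
                    System.out.println(word + "\t" + count));
        }
    }

If this kind of Collections code feels comfortable, you are ready for the Hadoop-specific material.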

Agenda for Hadoop Study:

  • Big Data: We will start with the Big Data concepts. We will learn the definition of Big Data, the types of data, its characteristics, sources, and applications, the limitations of traditional databases, and the technologies available to handle Big Data.
  • Single-node installation of Hadoop: This will be our first step into the world of Hadoop, wherein we will learn how to install Hadoop on a Linux system.
  • Understanding Hadoop components: This will be an introductory article about MapReduce and HDFS, the two most crucial Hadoop components.
  • Basic Linux commands: In this article, we will learn some of the basic Linux commands that are essential while working on Hadoop.
  • YARN: This article will be about understanding the YARN architecture. YARN stands for Yet Another Resource Negotiator. YARN provides advantages over previous versions of Hadoop, including better scalability, cluster utilization, and user agility. It will also cover the major changes from Hadoop 1.x to 2.x.
  • Our first Hadoop program: We start writing programs here. In this article, we will write the 'Hello World' program of Hadoop, i.e., WordCount, in Java and Python (a minimal Java sketch appears after this list). We will also study how it works, walk through the internal details of the code, debug the program, and configure Hadoop with the Eclipse IDE.
  • Hadoop Ecosystem: This article will cover data access components like Hive and Pig, storage components like HBase, data ingestion tools such as Sqoop and Flume, and the workflow scheduler Oozie.
  • Multi-node installation of Hadoop: After working with Hadoop on a single node, we will move a step further and discover the real strength of Hadoop by installing it on a multi-node cluster.
  • Examples: We will see some more examples to get more comfortable with Hadoop and the Hadoop Ecosystem.
  • Pros and cons: We will discuss the advantages and limitations of Hadoop for solving real-life problems.
  • We will also cover the Hadoop Ecosystem in depth: Pig, Hive, Sqoop, HBase, ZooKeeper, Oozie, etc., as well as streaming data analysis technologies like Spark and Apache Apex.
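
As a preview of the WordCount article mentioned above, here is a minimal sketch of a mapper and reducer written against Hadoop's org.apache.hadoop.mapreduce API. The class names TokenizerMapper and SumReducer are illustrative; the job driver and cluster configuration are left for that article:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emits (word, 1) for every word in each input line.
    class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the 1s emitted for each word to get its total count.
    class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

Notice the pattern: the mapper breaks the problem into (key, value) pairs, and the reducer aggregates all values for each key. Nearly every MapReduce program in this series follows this shape.
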
That's all for this tutorial. In the next article, we will cover Big Data terminology. Please let me know your views about this article in the comments section below, and stay tuned for more awesome articles. Enjoy learning Hadoop and its ecosystem.
