Thursday, December 8, 2016

Install hadoop-2.7.3 on Ubuntu 16.04

Posted by Kishore Bhosale

Introduction

In this article we install hadoop-2.7.3 on a single node. A single system is sufficient to run all components of Hadoop, although a production Hadoop cluster involves many more machines. Hadoop runs on most Linux distributions.

Install Java:

Hadoop is written in Java, so it requires the Java Development Kit (JDK) on your system.
Check whether Java is available on your system with the following commands:

$ javac
or
$ java -version

If either of these commands gives an error, you have to install Java first.
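The two checks can also be scripted. The sketch below is a minimal helper (missing_cmds is our own hypothetical name, not part of Ubuntu or Hadoop) that uses the shell builtin command -v to report which of the given commands are not on your PATH.

```shell
#!/usr/bin/env bash
# Sketch: print every command from the argument list that is not on PATH
# and return non-zero if any of them is missing.
missing_cmds() {
  local status=0 cmd
  for cmd in "$@"; do
    # command -v succeeds only if the shell can find the command
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}
```

For example: $ missing_cmds java javac || echo "Install the JDK first."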
Steps to install Java:

$ sudo apt-get update
$ sudo apt-get install openjdk-8-jre
$ sudo apt-get install openjdk-8-jdk
$ java -version

Once Java is installed, set the JAVA_HOME environment variable in your .bashrc file.
First, find where Java is installed with the following command:

$ ls -l /etc/alternatives/javac
lrwxrwxrwx 1 root root 36 Nov 14 23:15 /etc/alternatives/javac -> /usr/lib/jvm/java-8-oracle/bin/javac

Here /usr/lib/jvm/java-8-oracle (the link target without the trailing /bin/javac) is the location where Java is installed.

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$JAVA_HOME/bin

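Copy the two lines above to the end of your .bashrc file. Your Java path may be different: with the openjdk-8-jdk package installed above, it is usually /usr/lib/jvm/java-8-openjdk-amd64 on 64-bit Ubuntu.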
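Instead of reading the symlink by hand, you can resolve it in a script. This is a sketch under two assumptions: javac is reached through the usual /etc/alternatives symlinks, and readlink supports -f (GNU coreutils, as on Ubuntu). java_home_from_javac is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: derive JAVA_HOME by following the javac symlink chain
# (/usr/bin/javac -> /etc/alternatives/javac -> <jdk>/bin/javac)
# and dropping the trailing "/bin/javac". Assumes GNU readlink -f.
java_home_from_javac() {
  local javac_path="${1:-}"
  if [ -z "$javac_path" ]; then
    javac_path="$(command -v javac)" || {
      echo "javac not found on PATH; install a JDK first" >&2
      return 1
    }
  fi
  local resolved
  resolved="$(readlink -f "$javac_path")" || return 1
  # <jdk>/bin/javac -> <jdk>/bin -> <jdk>
  dirname "$(dirname "$resolved")"
}
```

Usage, for example in .bashrc:
$ export JAVA_HOME="$(java_home_from_javac)"
$ echo $JAVA_HOME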

Configure ssh

Hadoop's components communicate with each other on one or more machines, and the Hadoop start scripts launch the daemons over SSH. We need to ensure that the user running Hadoop can connect to the required hosts over SSH without a password.
If SSH is not available on your system, install it with the following command:

$ sudo apt-get install ssh

After installing SSH, generate a key pair with an empty passphrase, authorize it for your own account and test the login:

$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
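Running the commands above a second time appends the same key to authorized_keys again, and ssh-keygen asks before overwriting an existing key. A minimal idempotent sketch, assuming the default key path ~/.ssh/id_rsa (authorize_key_once is a hypothetical helper name): it appends the key only if that exact line is missing, and sets the restrictive permissions sshd usually expects (700 on ~/.ssh, 600 on authorized_keys).

```shell
#!/usr/bin/env bash
# Sketch: add a public key to an authorized_keys file only if that exact
# line is not already present, then apply the permissions sshd expects.
# Both paths are parameters, so it can be tried on scratch files first.
authorize_key_once() {
  local pubkey="$1" authfile="$2"
  local sshdir
  sshdir="$(dirname "$authfile")"
  mkdir -p "$sshdir"
  touch "$authfile"
  # -x: whole-line match, -F: fixed string, -f: read the pattern from pubkey
  if ! grep -qxF -f "$pubkey" "$authfile"; then
    cat "$pubkey" >> "$authfile"
  fi
  chmod 700 "$sshdir"       # only the owner may use the directory
  chmod 600 "$authfile"     # only the owner may read/write the key list
}
```

Usage:
$ [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ authorize_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
$ ssh localhost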

Download hadoop

Download the hadoop-2.7.3 binary release (hadoop-2.7.3.tar.gz) from the Apache Hadoop releases page.

Extract the archive and move the hadoop-2.7.3 folder to /usr/local/ (writing to /usr/local requires sudo):

$ tar -xzf hadoop-2.7.3.tar.gz
$ sudo mv hadoop-2.7.3 /usr/local/

Configure .bashrc file

Append the following lines to the end of your ~/.bashrc file:

export HADOOP_HOME=/usr/local/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
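If you script the setup, appending these lines on every run would duplicate them in .bashrc. One way to avoid that, sketched here with a hypothetical add_hadoop_env helper and a marker comment of our own choosing, is to append the block only when the marker is not already present. It writes the same variables as above, with the two PATH lines merged into one.

```shell
#!/usr/bin/env bash
# Sketch: append the Hadoop environment variables to a shell rc file once.
# A marker comment detects a previous run, so re-running the setup does not
# duplicate the exports. HADOOP_HOME defaults to this article's location.
add_hadoop_env() {
  local rcfile="$1" hadoop_home="${2:-/usr/local/hadoop-2.7.3}"
  local marker="# >>> hadoop env >>>"
  if [ -f "$rcfile" ] && grep -qxF "$marker" "$rcfile"; then
    return 0  # already configured
  fi
  # Unquoted heredoc: $hadoop_home expands now, \$ stays literal in the file
  cat >> "$rcfile" <<EOF
$marker
export HADOOP_HOME=$hadoop_home
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=\$HADOOP_HOME/lib"
EOF
}
```

Usage: $ add_hadoop_env ~/.bashrc /usr/local/hadoop-2.7.3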

Reload .bashrc to apply the changes to the current shell:

$ source ~/.bashrc

Configuration

Modify hadoop-env.sh

Open hadoop-env.sh, find the line export JAVA_HOME and set it to your Java path:

$ vim /usr/local/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
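The same edit can be made non-interactively with sed. This is a sketch: set_hadoop_java_home is a hypothetical name, and it assumes hadoop-env.sh contains a line beginning with export JAVA_HOME= (by default it reads export JAVA_HOME=${JAVA_HOME}). Setting an explicit path here matters because the start scripts launch the daemons over ssh, and those non-interactive shells may not pick up the JAVA_HOME exported in your .bashrc.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the "export JAVA_HOME=..." line of hadoop-env.sh.
# Writes through a temp file so it behaves the same with GNU and BSD sed
# (their in-place -i options differ).
set_hadoop_java_home() {
  local envfile="$1" java_home="$2"
  sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$envfile" > "$envfile.tmp" &&
    mv "$envfile.tmp" "$envfile"
}
```

Usage: $ set_hadoop_java_home /usr/local/hadoop-2.7.3/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-oracle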

Modify core-site.xml

$ vim /usr/local/hadoop-2.7.3/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Modify hdfs-site.xml

$ vim /usr/local/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Modify mapred-site.xml

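Hadoop 2.7.3 ships this file only as mapred-site.xml.template, so create it from the template first:

$ cp /usr/local/hadoop-2.7.3/etc/hadoop/mapred-site.xml.template /usr/local/hadoop-2.7.3/etc/hadoop/mapred-site.xml
$ vim /usr/local/hadoop-2.7.3/etc/hadoop/mapred-site.xml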
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Modify yarn-site.xml

$ vim /usr/local/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
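All four files share the same shape, so they can also be generated from a script. The sketch below (write_site_xml is a hypothetical helper) overwrites a file with a single-property configuration; it assumes the values need no XML escaping, which holds for the four settings above.

```shell
#!/usr/bin/env bash
# Sketch: write a Hadoop *-site.xml file containing exactly one property,
# replacing any previous content. Values are inserted without XML escaping.
write_site_xml() {
  local file="$1" name="$2" value="$3"
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>$name</name>
        <value>$value</value>
    </property>
</configuration>
EOF
}
```

Usage for the four files in this article:
$ conf=/usr/local/hadoop-2.7.3/etc/hadoop
$ write_site_xml $conf/core-site.xml fs.defaultFS hdfs://localhost:9000
$ write_site_xml $conf/hdfs-site.xml dfs.replication 1
$ write_site_xml $conf/mapred-site.xml mapreduce.framework.name yarn
$ write_site_xml $conf/yarn-site.xml yarn.nodemanager.aux-services mapreduce_shuffle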

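Format the NameNode (first time only; formatting erases the existing HDFS metadata)

$ hdfs namenode -format

The older form hadoop namenode -format still works, but prints a deprecation warning.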

Start hadoop

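$ start-dfs.sh
$ start-yarn.sh

start-all.sh also still works in Hadoop 2.7.3, but it is deprecated and simply runs these two scripts.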

Check proper installation of hadoop

jps lists the running Java processes:

$ jps
6482 DataNode
5763 ResourceManager
6694 SecondaryNameNode
6937 NodeManager
7182 Jps
6367 NameNode

If all five daemons (NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager) are listed, your Hadoop installation is working properly. :)
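This check can be automated by piping jps into a small filter. A sketch, with check_hadoop_daemons as a hypothetical name: it reads the process names from the second column of the jps output and reports any of the five daemons that is missing.

```shell
#!/usr/bin/env bash
# Sketch: read `jps` output on stdin and report any of the five
# single-node Hadoop daemons that is not running.
# Returns 0 only when all five are present.
check_hadoop_daemons() {
  local running missing=0 d
  running="$(awk '{print $2}')"   # jps prints "<pid> <name>"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x: whole-line match, so NameNode does not match SecondaryNameNode
    if ! printf '%s\n' "$running" | grep -qxF "$d"; then
      echo "not running: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: $ jps | check_hadoop_daemons && echo "all five Hadoop daemons are running"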

Browse the web interfaces of the Hadoop daemons:

NameNode - http://localhost:50070/

SecondaryNameNode - http://localhost:50090/status.html

DataNode - http://localhost:50070/dfshealth.html#tab-datanode

ResourceManager - http://localhost:8088/

Stop Hadoop:
$ stop-all.sh

That's all for this article: we learned how to install and run Hadoop locally.
Now start exploring Hadoop!

Don't forget to like and share this post!

Saturday, December 3, 2016

Install Devstack on Linux

Posted by Kishore Bhosale

Introduction

DevStack lets you interact with OpenStack on a small scale that is representative of a much larger deployment. It deploys, on a single server, the same OpenStack components found in large multi-server environments, and it is quick to set up. Instead of deploying the controller and compute nodes separately on different VMs or systems, DevStack runs all of OpenStack on a single VM or system.


In this article we install DevStack on Ubuntu 16.04.
DevStack can also be installed on other Linux distributions, such as CentOS and RHEL.

Install devstack:
Update local packages:

# sudo apt-get -y update

Install git

# sudo apt-get -y install git

Using git, clone the latest DevStack code into the /opt/devstack directory:

# sudo git clone https://github.com/openstack-dev/devstack.git /opt/devstack/

DON’T STACK AS ROOT. If you run stack.sh as the root user, it fails with an error telling you not to run the script as root.

Enter the devstack directory:

# cd /opt/devstack/

Create the stack user and set ownership of all devStack files to that user:

# sudo chmod u+x tools/create-stack-user.sh
# sudo tools/create-stack-user.sh
Creating a group called stack
Creating a user called stack
Giving stack user passwordless sudo privileges

Make the stack user the owner of all files in the directory:

# sudo chown -R stack:stack /opt/devstack/

Your directory has now been prepared with appropriate permissions, and a new user has been created.

Switch to the stack user:

# sudo -i -u stack

Go into the /opt/devstack directory:

# cd /opt/devstack/

Create local.conf in the /opt/devstack directory. DevStack reads this file to configure your deployment, and the variables below go under its [[local|localrc]] section header:

# vim local.conf

[[local|localrc]]
# Credentials
ADMIN_PASSWORD=test
MYSQL_PASSWORD=test
RABBIT_PASSWORD=test
SERVICE_PASSWORD=test
SERVICE_TOKEN=token

# Output
LOGFILE=/opt/stack/logs/stack.sh.log
VERBOSE=True
LOG_COLOR=False
SCREEN_LOGDIR=/opt/stack/logs
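If you set DevStack up repeatedly, local.conf can be generated from a script. A minimal sketch (write_local_conf is a hypothetical helper): it writes the settings above with one password for every service, under the [[local|localrc]] section header from which stack.sh reads these variables. The password is an argument, so it is not baked into the script.

```shell
#!/usr/bin/env bash
# Sketch: generate the minimal DevStack local.conf used in this article,
# using the same password for all services.
write_local_conf() {
  local conf="$1" password="$2"
  cat > "$conf" <<EOF
[[local|localrc]]
# Credentials
ADMIN_PASSWORD=$password
MYSQL_PASSWORD=$password
RABBIT_PASSWORD=$password
SERVICE_PASSWORD=$password
SERVICE_TOKEN=token

# Output
LOGFILE=/opt/stack/logs/stack.sh.log
VERBOSE=True
LOG_COLOR=False
SCREEN_LOGDIR=/opt/stack/logs
EOF
}
```

Usage (as the stack user): $ write_local_conf /opt/devstack/local.conf test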

Execute stack.sh:

# ./stack.sh

Once stack.sh finishes, DevStack is installed. Access the dashboard by entering the following URL into your browser:

http://<your host_ip>

Enter the following user name and password:

User name : admin or demo
Password : test (the ADMIN_PASSWORD value from local.conf)


Let's start exploring DevStack!

 

Uninstall devstack:

How to remove DevStack from the system:
It is very simple; run the following commands from /opt/devstack:

# ./unstack.sh
# ./clean.sh
# sudo rm -rf /opt/stack
# sudo reboot

That's all in this article. Don't forget to share your views in comment section below.

Wednesday, November 30, 2016

Basic networking commands in Linux

Posted by Kishore Bhosale

Introduction

A Linux system is command-based. To enjoy the real beauty of Linux, you must be familiar with its commands. Most commands are located in /bin and /usr/bin (administration tools such as ifconfig and route are in /sbin), and the man command shows the documentation of each one, e.g. man ping. In this article we learn a few basic networking commands such as hostname, ifconfig and ping. Let's start:
 

HOSTNAME Command
It shows the system's hostname, which is stored in the file /etc/hostname. We can change the hostname permanently by editing this file; the new name takes effect after a reboot.

# hostname
csfunda.com

IFCONFIG command
ifconfig (interface configurator) displays each network interface's IP address, netmask, broadcast address, hardware (MAC) address and other current configuration information.

# ifconfig
enp19s0   Link encap:Ethernet  HWaddr a4:ba:db:xx:xx:xx  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2204 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2204 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:183514 (183.5 KB)  TX bytes:183514 (183.5 KB)

wlp18s0b1 Link encap:Ethernet  HWaddr c4:46:19:xx:xx:xx  
          inet addr:192.168.x.x  Bcast:192.168.x.xx  Mask:255.255.255.0
          inet6 addr: fe80::22f3:f1e1:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18627 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11016 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:20908110 (20.9 MB)  TX bytes:1289043 (1.2 MB)

Running ifconfig with an interface name (for example, ifconfig enp19s0) displays only that interface's details.
We can enable a specific interface with "ifup interface_name" or disable it with "ifdown interface_name" (as root; both use the interface definitions in /etc/network/interfaces):

# ifup enp19s0

# ifdown enp19s0
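To use an interface's address in a script, you can extract it from the ifconfig output. A sketch with a hypothetical ipv4_of helper, assuming the net-tools format shown above (inet addr:...); it also accepts the newer ifconfig format, which prints inet followed directly by the address.

```shell
#!/usr/bin/env bash
# Sketch: print the IPv4 address of one interface from `ifconfig` output
# read on stdin. Handles "inet addr:1.2.3.4" and "inet 1.2.3.4" lines.
ipv4_of() {
  awk -v ifname="$1" '
    /^[^ \t]/ {                      # interface blocks start in column 1
      current = $1
      sub(/:$/, "", current)         # drop the trailing colon of the newer format
    }
    current == ifname && $1 == "inet" {
      addr = $2
      sub(/^addr:/, "", addr)        # strip the addr: prefix of the old format
      print addr
      exit
    }'
}
```

Usage: $ ifconfig | ipv4_of lo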

PING command
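Ping tests whether a host is reachable over the network. It sends ICMP ECHO_REQUEST packets to the host and prints each reply with its round-trip time; it keeps running until you press Ctrl+C (shown as ^C below).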

ping using domain name:
# ping csfunda.com
PING csfunda.com (216.239.32.21) 56(84) bytes of data.
64 bytes from any-in-2015.1e100.net (216.239.32.21): icmp_seq=1 ttl=46 time=66.3 ms
64 bytes from any-in-2015.1e100.net (216.239.32.21): icmp_seq=2 ttl=46 time=72.3 ms
64 bytes from any-in-2015.1e100.net (216.239.32.21): icmp_seq=3 ttl=46 time=63.3 ms
^C
--- csfunda.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 63.313/67.306/72.301/3.743 ms

ping using IP address:
# ping 216.239.34.21
PING 216.239.34.21 (216.239.34.21) 56(84) bytes of data.
64 bytes from 216.239.34.21: icmp_seq=1 ttl=46 time=63.3 ms
64 bytes from 216.239.34.21: icmp_seq=2 ttl=46 time=62.7 ms
64 bytes from 216.239.34.21: icmp_seq=3 ttl=46 time=62.4 ms
^C
--- 216.239.34.21 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 62.416/62.833/63.363/0.394 ms
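In a script you cannot press Ctrl+C, so pass -c to send a fixed number of packets and read the summary line afterwards. A sketch with a hypothetical packet_loss helper that extracts the loss percentage from ping's statistics line:

```shell
#!/usr/bin/env bash
# Sketch: read ping output on stdin and print the number in front of
# "% packet loss" from the summary line, e.g. "0" for
# "3 packets transmitted, 3 received, 0% packet loss, time 2002ms".
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Usage: $ ping -c 3 csfunda.com | packet_loss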

DIG command
dig (domain information groper) is a flexible tool for interrogating DNS name servers. It performs DNS lookups and displays the answers that are returned from the name server(s) that were queried.

# dig www.csfunda.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> www.csfunda.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR
(The rest of this output was garbled in the original post. The recoverable details: the query was answered by the local resolver 127.0.1.1, the ANSWER section resolved www.csfunda.com. through CNAME records (ghs.google.com., ghs.l.google.com.) to the A record 216.58.199.179, and the AUTHORITY section listed ns1.google.com. to ns4.google.com.)

NETSTAT command
This command prints network connections, routing tables, interface statistics, masquerade connections and multicast memberships. It is useful for finding connections to and from the host.
netstat -t : display only TCP connections
netstat -u : display only UDP connections
netstat -i : display all interfaces
netstat -g : display multicast group memberships
netstat -a : display all sockets, including listening ones
netstat -nap | grep port : display the PID and program name using this port (run as root to see all processes)
netstat -l : list only listening sockets

# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 192.168.x.x:xxxxx       ec2-35-xxx-xx-xxx:https ESTABLISHED
tcp        0      0 192.168.x.x:xxxxx       103.243.xxx.xx:https    ESTABLISHED
tcp        0      0 localhost:epmd          localhost:49290         TIME_WAIT  
tcp        0      0 localhost:58746         localhost:epmd          ESTABLISHED
tcp        0      0 localhost:epmd          localhost:58746         ESTABLISHED
tcp        0      0 192.168.x.x:xxxxx       bom05s05-in-f14.1:https ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ]         DGRAM                    21494    /run/user/1000/systemd/notify
unix  3      [ ]         DGRAM                    11869    /run/systemd/notify
unix  7      [ ]         DGRAM                    11880    /run/systemd/journal/socket
unix  2      [ ]         DGRAM                    11888    /run/systemd/journal/syslog
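The netstat -nap | grep port idea from the list above can be made more precise by matching only the port column of listening sockets. A sketch, with who_listens_on as a hypothetical helper; it assumes the column layout of netstat -tlnp (Local Address in column 4, PID/Program name in column 7). Run netstat with sudo to see processes owned by other users; otherwise that column shows "-".

```shell
#!/usr/bin/env bash
# Sketch: read `netstat -tlnp` output on stdin and print the
# "PID/Program name" of TCP sockets listening on the given port.
who_listens_on() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "LISTEN" {
      n = split($4, parts, ":")      # port is after the last colon
      if (parts[n] == port) print $7
    }'
}
```

Usage: $ sudo netstat -tlnp | who_listens_on 9000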

NSLOOKUP command
It finds the IP address of a hostname, or the hostname of an IP address (a reverse lookup).

# nslookup csfunda.com
Server:  127.0.1.1
Address: 127.0.1.1#53

Non-authoritative answer:
Name: csfunda.com
Address: 216.239.38.21
Name: csfunda.com
Address: 216.239.32.21

# nslookup 216.58.220.14
Server:  127.0.1.1
Address: 127.0.1.1#53

Non-authoritative answer:
14.220.58.216.in-addr.arpa name = bom05s05-in-f14.1e100.net.

Authoritative answers can be found from:
220.58.216.in-addr.arpa nameserver = ns4.google.com.
220.58.216.in-addr.arpa nameserver = ns2.google.com.
220.58.216.in-addr.arpa nameserver = ns1.google.com.
220.58.216.in-addr.arpa nameserver = ns3.google.com.

ROUTE command
It shows the kernel IP routing table. It can also be used to modify this table.

# route 
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.2.1     0.0.0.0         UG    600    0        0 wlp18s0b1
link-local      *               255.255.0.0     U     1000   0        0 wlp18s0b1
192.168.2.0     *               255.255.255.0   U     600    0        0 wlp18s0b1
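
We can also add or delete routes (as root). In the commands below, <network>, <netmask> and <gateway> are placeholders for your own values.
Route add

# route add -net <network> netmask <netmask> gw <gateway>

Route Delete

# route del -net <network> netmask <netmask> gw <gateway>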
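The Destination and Genmask columns of the routing table are related by a bitwise AND: the destination network is any host address on that network ANDed with the netmask, octet by octet. A sketch, with network_address as a hypothetical helper, that computes it in plain shell arithmetic:

```shell
#!/usr/bin/env bash
# Sketch: compute the network (Destination) address for a host address and
# a netmask (Genmask) by ANDing them octet by octet. Expects plain decimal
# dotted quads without leading zeros.
network_address() {
  local IFS=.
  # Split both dotted quads on "." into eight positional parameters:
  # $1..$4 are the address octets, $5..$8 the netmask octets.
  set -- $1 $2
  echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}
```

For example, $ network_address 192.168.2.37 255.255.255.0 prints 192.168.2.0, the Destination of the last entry in the table above (192.168.2.37 is a made-up host address on that network).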

TRACEROUTE command
traceroute tracks the route that packets take across an IP network on their way to a given host. It uses the IP protocol's time to live (TTL) field and tries to elicit an ICMP TIME_EXCEEDED response from each gateway along the path to the host; hops that do not reply are shown as * * *.

# traceroute google.com
traceroute to google.com (216.58.220.174), 30 hops max, 60 byte packets
 1  192.168.2.1 (192.168.2.1)  2.113 ms  2.814 ms  3.600 ms
 2  * * *
 3  10.50.50.73 (10.50.50.73)  17.259 ms  18.846 ms  20.375 ms
 4  * * *
 5  * * *
 6  10.240.254.1 (10.240.254.1)  29.267 ms * *
 7  10.241.1.1 (10.241.1.1)  20.147 ms  15.656 ms  14.705 ms
 8  100.100.100.1 (100.100.100.1)  17.392 ms  8.787 ms  11.068 ms
 9  103.243.115.186 (103.243.115.186)  10.239 ms  9.947 ms  8.268 ms
10  * * *
11  * * *
12  * * *
13  bom07s10-in-f14.1e100.net (216.58.220.174)  17.852 ms  19.361 ms  21.089 ms

Start enjoying Linux!

Monday, November 21, 2016

Linux commands for Hadoop

Posted by Kishore Bhosale

A file is simply a store of information. On a Linux or Unix system everything is treated as a file, so a system contains a large number of files. The details of each file (owner, permissions, size and the location of its data) are kept in its inode, in a separate area of the disk managed by the kernel.

Files fall into three main categories:
  1. Ordinary file: contains only data, as a stream of characters, e.g. text files and binary files.
  2. Directory file: contains files and other directories (it stores each file name and its inode number).
  3. Device file: every device and peripheral is represented by a file. Device files are generally found under the /dev directory.
/dev - device files.
/etc - system configuration files.
/bin & /usr/bin - commonly used Linux commands.
/sbin & /usr/sbin - system administration commands, usually run by the administrator (root).
/usr/include - header files used by compilers.
/lib & /usr/lib - library files in binary form.
/tmp - temporary files, which any user can create.
/var - variable data, such as logs and spool files.
/home - users' home directories.
Home directory
kb@kb:$ echo $HOME
/home/kb
kb@kb: ~$

Move from any directory to your home directory: $ cd ~
Move from any directory to the root directory: $ cd /
Check your current directory (pwd):
kb@kb: ~/Desktop$ pwd
/home/kb/Desktop
kb@kb: ~/Desktop$
Changing your current directory (cd):
kb@kb: ~/Desktop$ pwd
/home/kb/Desktop
kb@kb: ~/Desktop$ cd Image/
kb@kb: ~/Desktop/Image$ pwd
/home/kb/Desktop/Image

Here the current directory was Desktop/; after $ cd Image/ the current directory is Image/ (that is, Desktop/Image).
Making directory (mkdir):
Create a new directory inside the Image directory:
$ mkdir Data

kb@kb: ~/Desktop/Image$ mkdir Data
kb@kb: ~/Desktop/Image$

Create several nested directories in one step with -p, which also creates any missing parent directories:
$ mkdir -p Data/Hadoop/LinuxCommands
Removing directories and files (rmdir, rm):
Remove the directory Data from the Image directory (rmdir only removes empty directories):
$ rmdir Data

kb@kb: ~/Desktop/Image$ mkdir Data
kb@kb: ~/Desktop/Image$ ls
Actor Cute Friends    Nature Data God 
kb@kb: ~/Desktop/Image$ rmdir Data
kb@kb: ~/Desktop/Image$ ls
Actor Cute Friends    Nature God

Remove a directory together with everything inside it using rm -r:
$ rm -r Data/Hadoop/LinuxCommands   (-r: recursively)
$ rm * (delete all files in the current directory; use with care)
$ rm file1 file2 file3 (delete file1, file2 and file3)
Absolute pathnames:
The complete path, starting from the root directory /, is given.
For example, run a Linux command directly by giving its full path: /bin/date
Relative pathnames:
The path is given relative to the current directory.
. (single dot) - This represents the current directory
.. (two dots) - This represents the parent directory
e.g. $ pwd
/home/kb/Desktop/Image
$ cd .. (change directory to the parent of the current directory)
$ pwd
/home/kb/Desktop
$ cd ../.. (change directory to the parent of the parent of the current directory)
$ pwd
/home
Listing directory content (ls):
ls     - list the file names in the current directory
ls -x  - multi-column output, sorted across rows
ls -a  - show all file names, including hidden ones beginning with a dot, and . and ..
ls -R  - recursive listing
ls -r  - sort file names in reverse order
ls -l  - long listing with all details about files and directories
         If the line starts with "-", it is a regular file.
         If it starts with "d", it is a directory.
ls -t  - sort file names by last modification time
Copying a File (cp):
cp copies a file or a group of files.
$ cp <source> <destination>
$ cp Demo/* /Demo/HDFS (copy all files from Demo to the HDFS directory)
Renaming Files (mv) :
It renames a file or directory, or moves a group of files to a different directory.
$ mv file1 file2 (rename file1 to file2)
Creating and Displaying file:
$ cat > filename (create a file, or overwrite it if it exists; type the content, then press Ctrl+D)
$ cat >> filename (append content to the file, creating it if it does not exist)
$ cat filename (show the content of the file)
Other common ways to create a file:
          1) vi editor
          2) touch command
          3) gedit editor
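Creating and Extracting Jar files:
$ jar -cvf filename.jar files...   (c: create, v: verbose output, f: write to the named file)
$ jar -cvf wc.jar WordCount.java
(To run a program with hadoop jar, package the compiled .class files rather than the .java sources.)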

Extract jar:
$ jar -xvf filename.jar 
Change File Permission (chmod):
$ chmod 755 file.txt

7 r w x (owner, also called user)
5 r - x (group)
5 r - x (others)
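Each octal digit is the sum r = 4, w = 2, x = 1. A sketch (octal_to_rwx is a hypothetical helper) that expands a three-digit mode into the string ls -l prints, ignoring special bits such as setuid:

```shell
#!/usr/bin/env bash
# Sketch: expand a three-digit octal mode (e.g. 755) into the rwx string
# that `ls -l` prints for owner, group and others, in that order.
octal_to_rwx() {
  case "$1" in
    [0-7][0-7][0-7]) ;;
    *) echo "usage: octal_to_rwx NNN (digits 0-7)" >&2; return 1 ;;
  esac
  local rest="$1" out="" d
  while [ -n "$rest" ]; do
    d="${rest%"${rest#?}"}"   # first remaining digit
    rest="${rest#?}"          # drop it from the rest
    if [ $((d & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((d & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((d & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  printf '%s\n' "$out"
}
```

For example, $ octal_to_rwx 755 prints rwxr-xr-x, which is what ls -l shows for file.txt after the chmod above (following the leading "-").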
There are plenty of commands in Linux, but the few listed above are enough to work with Hadoop. After practicing them, you will know enough Linux commands to start working on Hadoop.

Sunday, November 20, 2016

How to clear cache, buffer in linux (Ubuntu 16.04)

Posted by Kishore Bhosale
Every application needs memory to run, and every operating system has its own memory-management technique. Still, our system often runs low on free memory, and performance degrades. Much of the RAM that looks used is cache: the kernel keeps recently used file data and file-system metadata in RAM so they can be read again quickly, and it normally reclaims this memory when applications need it. The most common way to clear the cache is to reboot the system, but that is not a good solution. In Linux we can clear the cache and buffers manually.


Let's first check how much memory our system is using with the free -m command:
# free -m
----------------------------------------------------------------------------------
              total        used        free      shared  buff/cache   available
Mem:           2873        2240          81         221         550         210
Swap:          1905         178        1727
Check memory usage in detail by reading the /proc/meminfo file:
# cat /proc/meminfo
---------------------------------------------------------------------------------- 
MemTotal:        2942168 kB
MemFree:          274208 kB
MemAvailable:     508524 kB
Buffers:           45664 kB
Cached:           535960 kB
SwapCached:         6044 kB
Active:          1794220 kB
Inactive:         692760 kB
Active(anon):    1651944 kB
Inactive(anon):   492360 kB
Active(file):     142276 kB
Inactive(file):   200400 kB
Unevictable:       10348 kB
Mlocked:           10348 kB
SwapTotal:       1951740 kB
SwapFree:        1769120 kB
Dirty:               248 kB
....
(The file has many more entries.)
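To pick a single value out of /proc/meminfo in a script, a small awk filter is enough. A sketch with a hypothetical meminfo_mb helper that prints one field in MB using integer division; that matches the free -m figures above (MemTotal 2942168 kB gives 2873 MB).

```shell
#!/usr/bin/env bash
# Sketch: read /proc/meminfo-format text on stdin and print one field
# converted from kB to MB (integer division). Fails if the field is absent.
meminfo_mb() {
  awk -v key="$1:" '
    $1 == key { print int($2 / 1024); found = 1 }
    END { exit !found }'
}
```

Usage: $ meminfo_mb MemAvailable < /proc/meminfo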
There are three drop_caches options to clear cache memory (run these as root):
1) free the page cache only
sync; echo 1 > /proc/sys/vm/drop_caches
2) free dentries and inodes
sync; echo 2 > /proc/sys/vm/drop_caches
3) free the page cache, dentries and inodes
sync; echo 3 > /proc/sys/vm/drop_caches
sync flushes the file system buffers, writing any modified data to disk, and drop_caches then discards the clean cached objects. Dropping caches while applications are running can hurt performance, because the dropped data has to be read from disk again, so it is a bad idea to run it when your server is under heavy use.
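The three commands can be wrapped in a small function that checks the level before writing it. A sketch (drop_caches as a function name is our own choice): the target file is a parameter, defaulting to /proc/sys/vm/drop_caches, so you can try the logic on a scratch file without root. Note that sudo echo 3 > /proc/sys/vm/drop_caches fails for a normal user, because the redirection is performed by your own unprivileged shell; echo 3 | sudo tee /proc/sys/vm/drop_caches works instead.

```shell
#!/usr/bin/env bash
# Sketch: validate the level (1, 2 or 3), flush dirty data with sync,
# then write the level to the drop_caches file (root needed for /proc).
drop_caches() {
  local level="$1" target="${2:-/proc/sys/vm/drop_caches}"
  case "$level" in
    1|2|3) ;;
    *) echo "level must be 1, 2 or 3" >&2; return 1 ;;
  esac
  sync
  echo "$level" > "$target"
}
```

Usage, as root: # drop_caches 3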
Now check the memory usage again:
# free -m
----------------------------------------------------------------------------------
              total        used        free      shared  buff/cache   available
Mem:           2873        1917         590         193         365         578
Swap:          1905         178        1727
That's all: this is the simple way to clear the cache.
Powered by Blogger.