Hadoop: Difference between revisions

From 太極
Jump to navigation Jump to search
No edit summary
 
Line 25: Line 25:
= Books =
= Books =
Hadoop: The Definitive Guide
Hadoop: The Definitive Guide
= Distributed File System (DFS) =
* For very large files: TBs, PBs
* Each file is partitioned into chunks, typically 64MB
* Each chunk is replicated several times (>=3), on different racks, for fault tolerance
* Implementations: Google's DFS (GFS, proprietary) and Hadoop's DFS (HDFS, open source)

Latest revision as of 09:56, 16 July 2014

Installation

Download the tarball (hadoop-2.4.1.tar.gz) from Hadoop website.

tar xzvf hadoop-2.4.1.tar.gz

Make sure JAVA_HOME is set to a Java installation. If it is not available, we can include it by editing etc/hadoop/hadoop-env.sh and specify the JAVA_HOME variable. For example

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

It is convenient to put the Hadoop binary directory on your command-line path. For example, I append the following 2 lines to ~/.bashrc file.

export HADOOP_INSTALL=/home/brb/hadoop-2.4.1
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

Check that Hadoop runs by typing

hadoop version

Public datasets

Books

Hadoop: The Definitive Guide

Distributed File System (DFS)

  • For very large files: TBs, PBs
  • Each file is partitioned into chunks, typically 64MB
  • Each chunk is replicated several times (>=3), on different racks, for fault tolerance
  • Implementations: Google's DFS (GFS, proprietary) and Hadoop's DFS (HDFS, open source)