DURGASOFT: HADOOP on 20th Sep 11:00AM at Marthahalli (Bangalore) by Mr. Anil Kumar

Sunday, September 21, 2014

Syllabus:

HADOOP DEVELOPMENT

• Introduction

What is Cloud Computing

What is Grid Computing

What is Virtualization

How the above three are interrelated

What is Big Data

Introduction to Analytics and the need for big data analytics

Hadoop Solutions - Big Picture

Hadoop distributions

Comparing Hadoop Vs. Traditional systems

Volunteer Computing

Data Retrieval - Random Access Vs. Sequential Access

NoSQL Databases

• The Motivation for Hadoop

Problems with traditional large-scale systems

Requirements for a new approach

• Hadoop: Basic Concepts

What is Hadoop?

The Hadoop Distributed File System

How MapReduce Works

Anatomy of a Hadoop Cluster

• Hadoop daemons

NameNode

DataNode

Secondary NameNode

JobTracker

TaskTracker

• HDFS in detail

Blocks and Splits

Replication

Data high availability

Data Integrity

Cluster architecture and block placement

• Programming Practices & Performance Tuning

Developing MapReduce Programs in:

Local Mode

Pseudo-Distributed Mode

Fully Distributed Mode

• Writing a MapReduce Program

Examining a Sample MapReduce Program

Basic API Concepts

The Driver Code

The Mapper

The Reducer

Hadoop's Streaming API

• Setting up Hadoop clusters with Apache, Cloudera and HortonWorks

Install and configure Apache Hadoop

Make a fully distributed Hadoop cluster on a single laptop/desktop

Install and configure Cloudera Hadoop distribution in fully distributed mode

Install and configure HortonWorks Hadoop distribution in fully distributed mode

Monitoring the cluster

Getting used to the management consoles of Cloudera and HortonWorks

• Delving Deeper Into the Hadoop API

Using Combiners

The configure and close Methods

SequenceFiles

Partitioners

Counters

Directly Accessing HDFS

ToolRunner

Using The Distributed Cache

• Common MapReduce Algorithms

Sorting and Searching

Indexing

Classification/Machine Learning

Term Frequency - Inverse Document Frequency

Word Co-Occurrence

Hands-On Exercise: Creating an Inverted Index

• Debugging MapReduce Programs

Testing with MRUnit

Logging

Other Debugging Strategies

• Advanced MapReduce Programming

A Recap of the MapReduce Flow

Custom Writables and WritableComparables

The Secondary Sort

Creating InputFormats and OutputFormats

Pipelining Jobs With Oozie

Map-Side Joins

Reduce-Side Joins

• Monitoring and debugging on a Production Cluster

Counters

Skipping Bad Records

Rerunning failed tasks with Isolation Runner

• Tuning for Performance

Reducing network traffic with a combiner

Reducing the amount of input data

Using Compression

Running with speculative execution

Refactoring code and rewriting algorithms

Parameters affecting Performance

Other Performance Aspects

Hadoop Ecosystem covered as part of Hadoop Developer

• Ecosystem component: Hive

Hive concepts

Install and configure Hive on the cluster

Create a database and access it from the console

Develop and run sample applications in Java/Python to access Hive

• Ecosystem component: Sqoop

Install and configure Sqoop on the cluster

Import data from Oracle/MySQL to Hive

• Ecosystem component: Pig

Install and configure Pig

Write sample Pig Latin scripts

• Ecosystem component: HBase

HBase concepts

Install and configure HBase on the cluster

Create a database; develop and run sample applications

• Ecosystem component: Cassandra

Cassandra concepts

Install and configure Cassandra

Create a database and access it from the console

Develop and run sample applications in Java/Python to access Cassandra data

• Ecosystem component: Oozie

Oozie concepts

Install and configure Oozie on the cluster

Create sample workflows and run them on the cluster

• Overview of other ecosystem components:
o Avro, Thrift, REST, Mahout, Flume, Chukwa, YARN, MR2, etc.

• Analytics Basics

Analytics and big data analytics

Commonly used analytics algorithms

Analytics tools like R and Weka

Mahout
