HADOOP DEVELOPMENT

• Hadoop and HDFS architecture
    Hadoop Architecture and Ecosystem
    Understanding distributed systems & parallel computing
    HDFS daemons: NameNode, Secondary NameNode, and DataNode
    MapReduce daemons: JobTracker and TaskTracker
    Block Replacement, Data Integrity, Re-balancer
    HDFS user/admin commands
    Anatomy of a Hadoop Cluster
• Setting up Hadoop cluster
    Install and configure Apache Hadoop
    Make a pseudo-distributed Hadoop cluster on a single laptop/desktop
    Monitoring the cluster using the UI
• MapReduce Programming
    MapReduce framework and architecture
    Hadoop Data Types
    Developing MapReduce Programs in
        ♠ Local Mode
        ♠ Pseudo-distributed Mode
        ♠ Fully distributed Mode
    Writing MapReduce Programs
    Examining MapReduce Programming
        ♠ ToolRunner
        ♠ Basic API Concepts (Driver code, Mapper, Reducer)
• Delving Deeper Into the Hadoop API
    The configure and close Methods
    Input and Output Formats
        ♠ Text Format
        ♠ KeyValue Format
        ♠ NLine Format
        ♠ SequenceFile Format
    Partitioners
• Tuning for Performance
    Reducing network traffic with a combiner
    Reducing the amount of input data
    Running with speculative execution
• Advanced MapReduce Programming
    A Recap of the MapReduce Flow
    Custom Writables and WritableComparables
    Map-Side Joins
    Reduce-Side Joins
    Using the Distributed Cache
• Monitoring and debugging on a Production Cluster
    Counters
    Skipping Bad Records
    Rerunning failed tasks with IsolationRunner
    Schedulers (FIFO, Capacity, and Fair)
• YARN Introduction & Architecture
• Pig - ETL
    Introduction, Pig vs Hive, Pig vs MapReduce and SQL
    Pig's Data Model
    Pig Architecture
    Pig Latin, Transformations
    Installing and Running Pig in Local & Distributed modes
    Advanced Pig concepts, Debugging
    Hands-on Exercise
• Hive - Data warehousing platform
    Architecture of Hive
    Hive Services, Clients, Metastore
    Hive Data Model and File Formats
    Hive Query Language
    DDL in Hive
    Joins, Unions, Indexing, Views
    Statistics & Archiving with Hive
    Hive Partitions, Buckets
    Hive UDF, UDAF, UDTF
    Hive SerDe properties
    Hive Optimizations and best practices
    Hands-on Exercise
• HBase - NoSQL Database
    HBase Overview & Architecture
    HBase Installation
    Usage Scenarios of HBase, CRUD
    HBase Data Model
        ♠ Table and Row
        ♠ Column Family & Column Qualifier
        ♠ Cell and its Versioning
        ♠ Regions and Region Server
    HBase operations (Get/Scan, Put, Delete, ...)
    HBase Admin - Create database, Develop and run sample applications
    HBase Clients
        ♠ Thrift
        ♠ Java API
        ♠ REST
    MapReduce & Hive Integration with HBase
• Sqoop
    Overview of Sqoop import/export
    Install and configure Sqoop on the cluster
    MySQL Installation and connection
    Sqoop commands
    Various Options to Import Data
        ♠ Table Imports
        ♠ Filtering Imports
        ♠ Hive Imports
• Flume
    Introduction and Architecture
    Install and configure Flume
    Flume Components
    Flume Events
    Hands-on Exercise
    Gathering Twitter data using Flume