HADOOP

Hadoop Training – Course Content

Overview:

Apache Hadoop is open-source data management software that helps organizations analyze huge volumes of structured and unstructured data, and it is a very hot topic across the tech industry. In this course you can quickly learn to take advantage of the MapReduce framework through technical sessions and hands-on labs.

Training Objectives of Hadoop:

The Hadoop course covers the basic concepts of MapReduce applications developed using Hadoop, including a close look at the framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. The course also examines related technologies such as Hive, Pig, and Apache Accumulo.

Target Students / Prerequisites:

Students should have an IT background and be familiar with Java and Linux concepts.

Introduction: The Motivation for Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop Basic Concepts

  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands-On Exercise
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components
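
To make the "How MapReduce Works" topic concrete, here is a minimal sketch of the map, shuffle, and reduce phases as a word count. It is plain Java with no Hadoop classes: the class name MapReduceFlowSketch and its methods are illustrative assumptions, and a real job would run these phases across many machines rather than in one process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce data flow; no Hadoop classes are used.
class MapReduceFlowSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence on a small in-memory input.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {cat=1, dog=1, sat=2, the=2}
        System.out.println(run(List.of("the cat sat", "the dog sat")));
    }
}
```

In Hadoop itself, the map and reduce steps are written by the developer, while the grouping step in the middle is handled by the framework.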

Writing a MapReduce Program

  • Examining a Sample MapReduce Program, with several examples
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API

Delving Deeper Into The Hadoop API

  • More About ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data With Combiners
  • The configure and close methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Hands-On Exercise
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise
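
As an illustration of the partitioner topic above, the following sketch shows how a key can be assigned to a reducer. It is plain Java without Hadoop classes. The first method uses the same formula as Hadoop's default HashPartitioner, applied here to String keys (Hadoop's own key types compute their hash codes differently). The second method, skewAwarePartition, is a hypothetical load-balancing rule invented for this example, not part of any Hadoop API.

```java
import java.util.List;

// Plain-Java illustration of partitioning; class and method names are assumptions.
class PartitionerSketch {

    // Hash partitioning: mask the sign bit so the result is non-negative,
    // then take the remainder by the number of reduce tasks.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical custom rule: one known "hot" key gets the last reducer to
    // itself, and all other keys are hashed across the remaining reducers.
    static int skewAwarePartition(String key, String hotKey, int numReduceTasks) {
        if (numReduceTasks < 2) {
            return 0; // nothing to balance with a single reducer
        }
        if (key.equals(hotKey)) {
            return numReduceTasks - 1;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % (numReduceTasks - 1);
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : List.of("apple", "banana", "the", "zebra")) {
            System.out.println(key
                    + " hash=" + hashPartition(key, reducers)
                    + " skewAware=" + skewAwarePartition(key, "the", reducers));
        }
    }
}
```

Whatever rule is used, it has to send every occurrence of the same key to the same reducer, which is why both methods depend only on the key and the reducer count.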

Performing Several Hadoop Jobs

  • The configure and close Methods
  • Sequence Files
  • Record Reader
  • Record Writer
  • Role of Reporter
  • Output Collector
  • Processing video files and audio files
  • Processing image files
  • Processing XML files
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Classification/Machine Learning
  • Term Frequency – Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise: Creating an Inverted Index
  • Identity Mapper
  • Identity Reducer
  • Exploring well known problems using MapReduce applications
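
The inverted index exercise listed above can be sketched in a few lines. This version is plain Java with no Hadoop classes and runs in one process. The class name InvertedIndexSketch, the document IDs, and the sample text are all assumptions made for the example. The inner loop plays the role of the mapper, emitting (term, document ID) pairs, and the sorted set per term plays the role of the reducer output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java illustration of an inverted index; no Hadoop classes are used.
class InvertedIndexSketch {

    // Builds a map from each term to the sorted set of document IDs containing it.
    static Map<String, TreeSet<String>> build(Map<String, String> docs) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // "Map" step: split the document text into lowercase terms.
            for (String term : doc.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                // "Reduce" step: collect the document IDs seen for this term.
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Hadoop stores data in HDFS");
        docs.put("doc2", "MapReduce reads data from HDFS");
        build(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

With the two sample documents, the terms "data" and "hdfs" map to both document IDs and every other term maps to one.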

Using HBase

  • What is HBase?
  • HBase API
  • Managing large data sets with HBase
  • Using HBase in Hadoop applications
  • Hands-on Exercise
