Big Data Trainings

Big Data has arrived! If you are an IT professional who wishes to move your career into Big Data and become a Big Data expert in a month, you have come to the right place.
We provide personalized Hadoop training with hands-on, real-life use cases. Our mission is to make you a Big Data expert within a month.

MyBigDataCoach provides expert professional coaching in Big Data technologies, data science and analytics.
Training sessions are conducted by senior experts in Big Data technologies.

We also provide corporate training on Big Data technologies in India.

You can register for our online training courses on various Big Data technologies.

Happy learning!

For further queries, contact us at:

Facebook:

Prerequisites to attend

Basic knowledge of Linux commands, core Java and writing SQL queries

Course Contents:

Day 1
 Introduction to Big Data and Hadoop (Common)
 Technology Landscape
 Why Big Data?
 Differences between Big Data and Traditional BI
 Fundamentals of High Scalability
 Distributed Systems and Challenges
 Key Fundamentals for Big Data
 Big Data Use Cases
 End-to-end production use case deployed on Hadoop
 When to use Hadoop and when not to?
Day 2
 HDFS Fundamentals
 Fundamentals behind HDFS Design
 Key Characteristics of HDFS
 HDFS Daemons
 HDFS Commands
 Anatomy of File Read and Write in HDFS
 HDFS File System Metadata
 How replication happens in Hadoop
 How the replication strategy and network topology are defined
 Limitations of HDFS
 When to use HDFS and when not to?
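To make the replication topics above concrete, here is a minimal Python sketch of the default rack-aware placement rule that the Apache HDFS documentation describes for a replication factor of 3, assuming the client writing the block runs on a DataNode. The `place_replicas` helper and the rack and node names are hypothetical, and the choices are made deterministic for readability; a real NameNode picks nodes randomly and also checks node capacity and health.

```python
"""Toy model of the default HDFS replica placement for replication factor 3.

Simplified sketch: node/rack names are hypothetical, and choices are
deterministic (sorted) instead of random as in a real NameNode.
"""


def place_replicas(writer, topology):
    """Pick three DataNodes for a block written by a client on `writer`.

    topology maps a rack name to the list of DataNode names on that rack.
      1st replica: the writer's own DataNode
      2nd replica: a node on a different (remote) rack
      3rd replica: another node on that same remote rack
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]
    # Remote racks that can hold two distinct replicas
    remote_racks = sorted(
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    )
    if not remote_racks:
        raise ValueError("need a remote rack with at least two DataNodes")
    second, third = sorted(topology[remote_racks[0]])[:2]
    return [writer, second, third]


topology = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
}
print(place_replicas("dn2", topology))  # ['dn2', 'dn4', 'dn5']
```

Spreading the three copies over two racks lets a block survive the loss of a whole rack, while the write pipeline crosses racks only once.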
Day 3
 Map Reduce Fundamentals
 What is Map-Reduce?
 Examples of Map-Reduce Programs
 How to think in Map-Reduce
 What is feasible in Map-Reduce and what is not?
 End-to-end flow of Map-Reduce in Hadoop
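The classic way to learn to think in Map-Reduce is word count. The sketch below runs the map, shuffle/sort and reduce phases in a single Python process purely to illustrate the data flow; it does not use the Hadoop API (a real job would implement Mapper and Reducer classes in Java, or run scripts through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort phase: group all values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle(mapped)]


print(word_count(["the quick fox", "the lazy dog"]))
# [('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```

Each reducer call sees every value for its key and nothing else, which is the property that makes a problem a good fit for Map-Reduce.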
Day 4
 Architectural Differences between MRv1 and YARN
 Introduction to Resource Manager
 Node Manager Responsibilities
 Application Manager
 Proxy Server
 Job History Server
 Running map-reduce programs in YARN
Day 5 and Day 6
 Hadoop Administration Part 1
 Hadoop Installation and Configuration
 YARN Installation and Configuration
 Hadoop Name Node
 HDFS Name Node Metadata Structure
 FSImage and Edit Logs
 Viewing Name Node Metadata and Edit Logs
 HDFS Name Node Federation
 Federation and Block Pool ID
 Tracing HDFS Blocks
 Name Node Sizing
 Memory calculations for HDFS Metadata
 Selecting the optimal Block Size
 Secondary Name Node
 Checkpoint Process in Detail
 Hadoop Map-Reduce
 Tracing a Map-Reduce Execution from Admin View
 Logs and History Viewer
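For the Name Node sizing topics above, a widely quoted rule of thumb is that every file, directory and block costs roughly 150 bytes of NameNode heap. The sketch below applies that figure, which is an approximation rather than an exact measurement, to show why file count and block size matter far more than total data volume. The helper names are hypothetical and directories are ignored.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb assumption, not an exact figure
MB = 1024 ** 2


def blocks_per_file(file_size_bytes, block_size_bytes):
    """A file occupies ceil(size / block size) blocks (integer ceiling)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def namenode_heap_estimate(num_files, file_size_bytes, block_size_bytes):
    """Rough heap for file + block objects; directories are ignored."""
    blocks = num_files * blocks_per_file(file_size_bytes, block_size_bytes)
    return (num_files + blocks) * BYTES_PER_OBJECT


# 1 million files of 200 MB with a 128 MB block size -> 2 blocks per file
print(namenode_heap_estimate(1_000_000, 200 * MB, 128 * MB))  # 450000000 (~450 MB)
# The same data as 200 million 1 MB files, one block each
print(namenode_heap_estimate(200_000_000, 1 * MB, 128 * MB))  # 60000000000 (~60 GB)
```

The same 200 TB costs over a hundred times more NameNode memory when stored as small files, which is the usual argument for large blocks and for merging small files.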
Day 7
 Hadoop Administration Part 2
 Hadoop Configurations
 High Availability of Name Node
 Configuring Hadoop Security
 NameNode Safemode and the conditions under which the NameNode enters Safemode
 Name Node High Availability
 DistCp Commands in Hadoop
 File Formats in Hadoop (RCFile, ORC, SequenceFile, Avro, etc.)
Day 8
 Hadoop Ecosystem Components
 Role of Each Ecosystem Component
 How It All Fits Together

 Hive
 Introduction
 Concepts on Meta-store
 Installation
 Configuration
 Basics of Hive
 What Hive cannot do
 When not to use Hive
Day 9
 PIG
 Introduction
 Installation and Configuration
 Basics of PIG
 Hands-on Example
Day 10
 Oozie
 Introduction
 Installation and Configuration
 Running workflows in Oozie with HIVE, Map-Reduce, PIG, Sqoop
Day 11
 Flume
 Introduction
 Installation and Configuration
 Running Flume examples with HIVE, HBase, etc.
Day 12
 HUE
 Introduction
 HUE Installation and Configuration
 Using HUE
 Zookeeper
 Introduction
 Installation and Configuration
 Examples in Zookeeper
 Sqoop
 Introduction to Sqoop
 Installation and Configuration
 Examples for Sqoop
Day 13
 Monitoring
 Monitoring Hadoop Processes
 Hadoop Schedulers
 FIFO Scheduler
 Capacity Scheduler
 Fair Scheduler
 Difference between Fair and Capacity Schedulers
 Hands-on with Scheduler Configuration
 Cluster Planning and Sizing
 Hardware Selection Considerations
 Sizing
 Operating System Considerations
 Kernel Tuning
 Network Topology Design
 Hadoop Filesystem Quotas
 Hands-on with a Few Hadoop Tuning Configurations
 Hands-on: Sizing a 100 TB Cluster
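A back-of-the-envelope version of the 100 TB sizing exercise might look like the sketch below. Every input is an assumption chosen for illustration: replication factor 3, 25% of raw capacity held back for intermediate and temporary data, and DataNodes with twelve 4 TB disks. It ignores compression, data growth, OS disks and reserved space, all of which a real plan has to account for.

```python
import math


def raw_storage_needed_tb(data_tb, replication=3, temp_fraction=0.25):
    """Raw disk needed: replicated data, plus headroom so that
    temp_fraction of the total raw capacity stays free for
    intermediate/temporary data (illustrative assumption)."""
    return data_tb * replication / (1 - temp_fraction)


def datanodes_needed(data_tb, disks_per_node=12, disk_tb=4,
                     replication=3, temp_fraction=0.25):
    """Return (raw TB required, DataNode count) under the assumed hardware."""
    raw_tb = raw_storage_needed_tb(data_tb, replication, temp_fraction)
    per_node_tb = disks_per_node * disk_tb
    return raw_tb, math.ceil(raw_tb / per_node_tb)


raw, nodes = datanodes_needed(100)
print(raw, nodes)  # 400.0 9
```

Changing one assumption at a time (for example a compression ratio or a yearly growth rate applied to `data_tb`) is a simple way to see which factor dominates the node count.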
Day 14
 Hadoop Maintenance
 Logging and Audit Trails
 File system Maintenance
 Backup and Restore
 DistCp
 Balancing
 Failure Handling
 Map-Reduce System Maintenance
 Upgrades
 Performance Benchmarking and Test
 Hadoop Cluster Monitoring
 Installation of Nagios and Ganglia
 Configuring Nagios and Ganglia
 Collecting Hadoop Metrics
 REST interface for metrics collection
 JMX JSON Servlet
 Cluster Health Monitoring
 Configuring Alerts for Clusters
 Overall Cluster Health Monitoring
 Introduction to Cloudera Manager
Day 15
 Advanced Developer for Hadoop
 Java API for HDFS Interactions
 File Read and Write to HDFS
 WebHDFS API and interacting with Hadoop using WebHDFS
 Different protocols used for interacting with HDFS
 Hadoop RPC and security around RPC
 Communication between Client and Data Node
 Hands-on Examples of Writing Different File Formats to HDFS
Day 16
 Hadoop Map-Reduce API
 InputFormat and Record Readers
 Splittable and Non-Splittable Files
 Mappers
 Combiners
 Partitioners
 Sorters
 Reducers
 OutputFormats and Record Writers
 Implementing custom Input Formats for PST and PDF
 MapReduce Execution Framework
 Counters
 Inside MapReduce Daemons
 Failure Handling
 Speculative Execution
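Hadoop's default HashPartitioner sends a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below reproduces that formula using Java's `String.hashCode()` as the key hash; that is an assumption, since Hadoop `Text` keys may compute their hash differently, so real partition numbers can differ. The helper names are hypothetical.

```python
def java_string_hash(s):
    """Emulate Java's String.hashCode(): s[0]*31^(n-1) + ... + s[n-1]
    with 32-bit signed overflow. Assumes each character is a single
    UTF-16 code unit (true for ASCII keys)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key, num_reduce_tasks):
    """HashPartitioner formula: (hash & Integer.MAX_VALUE) % numReduceTasks.
    The mask clears the sign bit so the result is never negative."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


print(java_string_hash("hello"))                # 99162322
print(hash_partition("hello", 4))               # 2
print(java_string_hash("polygenelubricants"))   # -2147483648
print(hash_partition("polygenelubricants", 4))  # 0
```

The mask matters: "polygenelubricants" hashes to `Integer.MIN_VALUE`, and using `Math.abs` instead would still return a negative number for that key.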
Day 17
 Sqoop
 Difference between Sqoop and Sqoop2
 What are the various parameters in Export?
 What are the various parameters in Import?
 Typical challenges with Sqoop operations
 How to tune Sqoop performance
Day 18 and Day 19
 MapReduce Examples and design patterns
 HIVE SerDe
 These days also serve as buffers for any spill-over sessions.
Day 20: Hadoop Design and Architecture
 Security
 Security Design for HDFS
 Kerberos Fundamentals
 Setting up KDC
 Configuring Secured Hadoop Cluster
 Setting up Multi-realm authentication for Production Deployment
 Typical production deployment challenges with respect to Hadoop Security
 Role of HttpFS proxy for corporate firewalls
 Role of Cloudera Sentry and Knox
 Common Failures and Problems
 File system related issues
 Map-Reduce related issues
 Maintenance related issues
 Monitoring related issues
Day 21
 Writing custom UDF
 SerDe and role of SerDe
 Writing SerDe
 Advanced Analytical Functions
 Real Time Query
 Differences between Stinger and Impala
 Key Emerging Trends
 Implementing Updates and Deletes in HIVE
Day 22
 Architecture for PIG
 Advanced PIG Join Types
 Advanced PIG Latin Commands
 PIG Macros and their Limitations
 Typical Issues with PIG
 When to use PIG and When not to?
Day 23
 Oozie
 Architecture and Fundamentals
 Installing and Configuring Oozie
 Oozie Workflows
 Coordinator Jobs
 Bundle Jobs
 Different Patterns in Oozie Scheduling
 How to troubleshoot in Oozie
 How to handle different libraries in Oozie
 Hands-on example with Oozie
 HUE
 Architecture and Fundamentals
 Installing and Configuring HUE
 Executing PIG, HIVE, Map-Reduce through HUE using Oozie
 Various features of HUE
 Integration of HUE users with Enterprise Identity Management systems
Day 24
 Flume
 Flume Architecture
 Complex and Multiplexing Flows in Flume
 Configuring and running flume agents for the various supported sources (NetCat, JMS, Exec, Thrift, AVRO)
 Configuring and running flume agents with various supported sinks (HDFS, Logger, AVRO, HBase, FileRoll, ElasticSearch, etc.)
 Understanding Batch Loads to HDFS
 Examples with Flume in real project scenarios:
 Log Analytics
 Machine data collection with SNMP sources
 Social Media Analytics
 Typical challenges with Flume operations
 Integration with HIVE and Hbase
 Implementing Custom Flume Sources and Sinks
 Flume Security with Kerberos
Day 25
 Zookeeper
 Architecture
 High Scalability with Zookeeper
 Common Recipes with Zookeeper
 Leader Election
 Distributed Transaction Management
 Node Failure Detection and Cluster Membership Management
 Coordination Services
 Cluster Deployment recipe with Zookeeper
 Typical challenges with Zookeeper operations
 YARN Architecture and Advanced Concepts
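The leader-election recipe from the ZooKeeper documentation can be modelled in memory to show its logic: each candidate creates an ephemeral, sequential znode under an election path, the lowest sequence number wins, and every other candidate watches only the znode just before its own. The `ElectionSim` class, the `/election` path and the `n_` prefix are hypothetical, and no ZooKeeper client is involved; a real implementation would use a client library and react to watch and session events.

```python
class ElectionSim:
    """In-memory toy model of the ZooKeeper leader-election recipe.

    ZooKeeper appends a 10-digit, zero-padded counter to sequential
    znodes, so sorting the names as strings also sorts them numerically.
    """

    def __init__(self, path="/election"):
        self.path = path
        self.counter = 0
        self.znodes = {}  # znode path -> candidate name

    def join(self, candidate):
        """Simulate creating an EPHEMERAL_SEQUENTIAL znode for a candidate."""
        znode = f"{self.path}/n_{self.counter:010d}"
        self.counter += 1
        self.znodes[znode] = candidate
        return znode

    def leave(self, znode):
        """Simulate session expiry: the ephemeral znode disappears."""
        del self.znodes[znode]

    def leader(self):
        """The candidate owning the lowest sequence number leads."""
        return self.znodes[min(self.znodes)]

    def watch_target(self, znode):
        """Znode immediately before this one (None for the leader).
        Watching only the predecessor avoids a herd effect on failover."""
        ordered = sorted(self.znodes)
        idx = ordered.index(znode)
        return ordered[idx - 1] if idx > 0 else None


sim = ElectionSim()
a = sim.join("server-a")
b = sim.join("server-b")
c = sim.join("server-c")
print(sim.leader())         # server-a
print(sim.watch_target(c))  # /election/n_0000000001
sim.leave(a)                # the leader's session expires
print(sim.leader())         # server-b
```

When the leader's znode vanishes, only its immediate successor is notified and takes over, instead of every candidate re-reading the election path at once.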
Day 26
 End-to-end POC Design
 Live example of an end-to-end POC that uses all ecosystem components


About amalgjose
I am an Electrical Engineer by qualification and now work as a Software Engineer. I am very much interested in the electrical, electronics, mechanical and now software fields, and I like exploring things in these fields. I like travelling and long drives, and I am very much addicted to music.

