Big Data & Hadoop Course Content
Course Duration: 40 hours (20 days)
Daily live sessions of 2 hours
What You Will Learn
Big data is a field concerned with ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be handled by traditional data-processing application software. Data sets with many cases (rows) offer greater statistical power, while data sets with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source.
Module 1 – INTRODUCTION TO BIG DATA
➢ What is Big Data?
➢ Examples of Big Data
➢ Reasons for Big Data Generation
➢ Why Big Data deserves your attention
➢ Use cases of Big Data
➢ Different options for analyzing Big Data
Module 2 – INTRODUCTION TO HADOOP
➢ What is Hadoop?
➢ History of Hadoop
➢ How Hadoop got its name
➢ Problems with Traditional Large-Scale Systems and Need for Hadoop
➢ Where Hadoop is being used
➢ Understanding distributed systems and Hadoop
➢ RDBMS and Hadoop
Module 3 – STARTING HADOOP
➢ Hadoop Architecture
- Apache Hadoop Installation
- Standalone Mode
- Pseudo Distributed Mode
- Fully Distributed Mode
- Cloudera Installation
➢ Features of Hadoop
➢ Hadoop Components: HDFS, MapReduce
➢ Anatomy of a File Write / Read
➢ Introduction to other components of the Hadoop ecosystem
Module 4 – HDFS
➢ HDFS Commands
➢ Single-node Hadoop cluster
➢ Understanding Hadoop configuration files
➢ Overview of the Hadoop Distributed File System
- NameNodes
- DataNodes
- The Command-Line Interface
➢ The building blocks of Hadoop
➢ Running HDFS Commands
➢ Web-based cluster UI: NameNode UI, MapReduce UI
Module 5 – UNDERSTANDING MAPREDUCE
➢ How MapReduce Works
- Data flow in MapReduce
- Map operation
- Reduce operation
➢ Input and Output Formats
➢ Partitions
➢ Combiners
➢ Schedulers
➢ MapReduce Program in Java using Eclipse
➢ Counting words with Hadoop: running the program
➢ Writing MapReduce Drivers, Mappers and Reducers in Java
➢ Real-world “MapReduce” problems
➢ Writing a MapReduce Program and Running a MapReduce Job
➢ Java WordCount Code Walkthrough
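As a preview of the word-count walkthrough, the map, shuffle and reduce steps listed in this module can be sketched in plain Java. This is only an illustrative sketch: it runs in a single JVM and deliberately does not use the Hadoop API (no Mapper, Reducer or Job classes), and the class name WordCountSketch and its method names are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce data flow for word count.
// Assumption: everything runs in one process; real Hadoop distributes these phases.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each group.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog", "The fox");
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

In an actual Hadoop job the same three roles are split across a Mapper class, the framework's shuffle and sort, and a Reducer class, which the Java WordCount walkthrough covers.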
Module 6 – HADOOP ECOSYSTEM
➢ Hive
➢ Sqoop
➢ Pig
Module 7 – EXTENDED SUBJECTS ON HIVE
➢ Installing Hive
➢ Introduction to Apache Hive
➢ Getting data into Hive
➢ Hive’s architecture
➢ Hive Query Language (HQL)
➢ Query execution
➢ Programming Practices and projects in Hive
➢ Troubleshooting
➢ Hive Programming
Module 8 – EXTENDED SUBJECTS ON PIG
➢ Introduction to Apache Pig
➢ Installing Pig
➢ Pig architecture
➢ Pig Latin – Reading and writing data using Pig
➢ Programming with Pig: loading data and executing data-processing statements