The course provides students with knowledge of, and skills in, the core concepts of the Hadoop framework. Students will learn how the components of the Hadoop ecosystem, such as YARN, MapReduce, HDFS, Pig, Impala, HBase, Flume, and Apache Spark, fit into the Big Data processing lifecycle.
After completing this course, students will be able to:
- Understand the different components of the Hadoop ecosystem, such as YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
- Understand the Hadoop Distributed File System (HDFS) and YARN architectures, and learn how to work with them for storage and resource management
- Understand MapReduce and its characteristics, and master advanced MapReduce concepts
- Ingest data using Sqoop and Flume
- Create databases and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
- Understand different file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
- Understand and work with HBase, including its architecture and data storage model, and learn the differences between HBase and an RDBMS
- Apply functional programming in Spark, and build and implement Spark applications
- Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
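To give a flavor of the MapReduce model covered above, here is a minimal, single-process Python sketch of the word-count pattern (map, then shuffle/group by key, then reduce). The function names are illustrative only; in real Hadoop, these phases run distributed across a cluster.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts["the"] == 2, counts["fox"] == 1
```

The same three-phase structure underlies every MapReduce job; only the map and reduce functions change.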
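The Spark objectives above rest on one key idea: transformations on an RDD (like `map` and `filter`) are lazy, and nothing is computed until an action (like `collect`) runs. A hypothetical `MiniRDD` class, sketched here with plain Python generators, illustrates that model; it is a teaching aid, not the Spark API.

```python
class MiniRDD:
    """Toy illustration of Spark's lazy-transformation / eager-action model."""

    def __init__(self, data):
        self._data = data  # an iterable; transformations stay lazy

    def map(self, f):
        # Transformation: returns a new MiniRDD, computes nothing yet
        return MiniRDD(f(x) for x in self._data)

    def filter(self, pred):
        # Transformation: also lazy
        return MiniRDD(x for x in self._data if pred(x))

    def collect(self):
        # Action: forces evaluation of the whole pipeline
        return list(self._data)

nums = MiniRDD(range(10))
result = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()
# even squares of 0..9: [0, 4, 16, 36, 64]
```

Real Spark adds partitioning, fault tolerance via lineage, and cluster-wide parallel execution on top of this same lazy pipeline idea, which is what the RDD optimization techniques in the course exploit.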