Module 4 - Data Analysis using Hadoop

This module introduces Big Data, the platforms built for it, and how it is analyzed. Hadoop is a widely used platform for storing massive data sets and computing statistics over them. The module covers Big Data, Hadoop, the Hadoop ecosystem, and Hive. The labs use Hive as the main tool for showing how massive data sets can be analyzed. By the end of this module, students should know how to use Hive for Big Data analysis on Azure HDInsight.

| Lesson | Title | Lab | Objectives |
|---|---|---|---|
| 1 | Big Data and Hadoop | Lab | Explain Big Data. Understand why Hadoop is used. Describe the core concepts of Hadoop. Recognize the features of the Hadoop ecosystem. |
| 2 | Getting Started with Hadoop Cluster on Azure | | Explain the purpose of a Hadoop cluster. Describe the YARN architecture and HDFS. Use MapReduce to run a job. Understand the function of HDInsight. |
| 3 | Getting Started with HDInsight | Lab | Understand the features of HDFS. Understand how Big Data is processed with HDInsight. |
| 4 | Hive Overview | | Understand what Hive is. Describe the Hive architecture and components. Explain the Hive data model and how data is stored. |
| 5 | Hive Data Types and File Formats | | Describe Hive data types and how they are used. Explain the supported Hive file formats. |
| 6 | Hive Databases and Tables | Lab | Understand how Hive stores and manages databases. Use basic commands to create and manipulate databases. |
| 7 | Working with Hive Tables | Lab | Create managed tables and add data to them. Create external tables and point them to data. Create and work with partitioned tables. Export data out of tables. |
| 8 | Hive CLI and Hive Query Language Basics | | Use the Hive CLI and Hive shell. Work with the SELECT statement. |
| 9 | Hive Query Language In-Depth | Lab | Use SELECT with the WHERE clause. Understand the different data types used in HiveQL. Recognize floating point comparison issues. Use JOIN statements. |
| 10 | Hive Extensions | | Use Hive set variables to pass parameters to a Hive query. Create external scripts/programs using TRANSFORM … USING. Understand how user-defined functions (UDFs) are created and used in Hive. Access open source UDFs. |
| 11 | Hive Text Processing | | Use string functions for manipulating strings, formatting strings, search and substitute, parsing URLs, regular expressions, data mining, and table generating. |
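
The floating point comparison objective in Lesson 9 can be previewed outside Hive. The following is a minimal Python sketch, not part of the course material, and it assumes the case in question is a 32-bit FLOAT column compared against an unqualified literal such as 0.2, which Hive reads as a 64-bit DOUBLE. The helper name `to_float32` is hypothetical and exists only for this illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Assumption: `stored` stands in for a value held in a 32-bit FLOAT column,
# and `literal` stands in for a 64-bit DOUBLE literal in a WHERE clause.
stored = to_float32(0.2)
literal = 0.2

print(stored == literal)  # False: the two representations of 0.2 differ
print(stored > literal)   # True: the widened 32-bit value is slightly larger
```

Under these assumptions, a filter such as "greater than 0.2" could also match rows whose stored value was entered as exactly 0.2. Values that are exactly representable in binary, such as 0.5, do not show this effect.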