Data Science & Big Data Analytics with R & Tableau

Professional training program designed to empower learners to understand and apply core data science concepts across the Data Lifecycle.

Join Now

In-Class

Course Includes

  • Data Storage with Big Data & Hadoop
  • Data Integration with Apache Spark
  • Data Enhancement & Analysis using R
  • Data Visualization using Tableau

Eligibility

Ideal for non-tech graduates and professionals who want to enter analytics, either in their own industry or elsewhere.

Available Modules

Available in-class on weekdays and weekends. Sessions are also available as live online training.

Course pre-requisites

Good knowledge of high-school-level mathematics and statistics. Knowledge of SQL is preferred.

Course Duration

114 hours of classroom sessions

Data Science & Big Data Analytics with R & Tableau

The course is a mix of science, information technology and industry domain knowledge.
This course enables participating learners to start a career in data science. It is designed for learners who want to become researchers, analysts or visualizers.
This is a project-based, hands-on intensive program; all students gain practical development experience using an integrated data set.
It is a good fit for tech and non-tech graduates and working professionals who want to start a new career as a data analyst in their own industry.

Module 1: Big Data with Hadoop & Spark

Big Data/Hadoop

• Big Data and its Sources
• RDBMS vs. Hadoop
• Hadoop Architecture and Ecosystem
• When to Use and Not use Hadoop
• HDFS Characteristics and Definitions
• HDFS Design and Architecture Overview
• Accessing HDFS
• HDFS Commands
• Basic File System Operations
• HDFS Administration Commands
• HDFS Features and Benefits

Introduction to MapReduce

• MapReduce Architecture
• MapReduce Phases
• MapReduce Framework
• Parallel Processing with MapReduce
• MapReduce Jobs
• MapReduce I/O Formats
• MapReduce Mechanics

Introduction to Spark

• Apache Spark
• Spark: Components for Distributed Execution
• Resilient Distributed Dataset (RDD)
• RDD operations
• Characteristics of RDD
• Directed Acyclic Graph Execution Engine
• Spark Shells
• Spark Configuration
• Cluster Modes
• Spark Architecture
• Spark – Advantages
• Compare Hadoop Ecosystem and Apache Spark

Module 2: R Programming

• Introduction to R
• Math, Variables, and Strings
• Vectors and Factors
• Vector operations

Data structures in R

• Arrays & Matrices
• Lists
• Data frames

R programming fundamentals

• Conditions and loops
• Functions in R
• Objects and Classes
• Debugging

Working with data in R

• Reading CSV and Excel Files
• Reading text files
• Writing and saving data objects to file in R

Strings and Dates in R

• String operations in R
• Regular Expressions
• Dates in R

Module 3: Machine Learning with R

Machine Learning vs. Statistical Modelling & Supervised vs. Unsupervised Learning

• Machine Learning Languages, Types, and Examples
• Machine Learning vs Statistical Modelling
• Supervised vs Unsupervised Learning
• Supervised Learning Classification
• Unsupervised Learning

Supervised Learning

• Understanding Nearest Neighbour Classification
• The KNN Algorithm
• Measuring similarity with distance
• Choosing Appropriate K
• Use Case
• Classification Using Naïve Bayes
• Classification using Decision Trees
• The C5.0 decision tree algorithm
• Understanding Classification Rules
• Understanding Regression
• Support Vector Machines
• Neural Networks
• Black Box Methods

Unsupervised Learning

• Association Rules – Pattern detection
• K-Means Clustering plus Advantages & Disadvantages
• Hierarchical Clustering plus Advantages & Disadvantages
• Measuring the Distances Between Clusters – Single Linkage Clustering
• Measuring the Distances Between Clusters – Algorithms for Hierarchy Clustering
• Density-Based Clustering
• Evaluating Model Performance
• Improving Model Performance

Module 4: Tableau

• Introduction to Tableau Desktop
• Connecting to Data
• Customizing a Data Source
• Filtering Your Data
• Sorting Your Data
• Creating Groups in Your Data
• Creating Hierarchies in Your Data
• Working with Date Fields: Discrete and Continuous Time
• Working with Date Fields: Custom Dates
• Working with Multiple Measures: Dual Axis and Combo Charts
• Working with Multiple Measures: Combined Axis Charts
• Showing Relationships between Numerical Values
• Mapping Data Geographically
• Using Crosstabs: Totals and Aggregation
• Using Crosstabs: Highlight Tables
• Using Crosstabs: Heat Maps
• Using Calculations: Customize Your Data
• Using Calculations: Working with Strings, Dates, and Type Conversion Functions
• Using Calculations: Working with Aggregations
• Using Quick Table Calculations to Analyze Data
• Showing Breakdowns of the Whole
• Highlighting Data with Reference Lines
• Create a Dashboard: Combining Your Views
• Create a Dashboard: Add Actions for Interactivity
• Sharing Your Work
• Working with a Data Extract
• Joining Tables
• Blending Multiple Data Sources
• Blending Data without a Common Field
• Using Split and Custom Split
• Advanced Calculations: Aggregating Dimensions
• Controlling Table Calculations
• Showing the Biggest and Smallest Values
• Using Level of Detail Expressions
• Filtering and LOD Expressions
• Using Parameters to Control Data in the View
• Parameters: Swap Measures
• Using Sets to Highlight Data
• Advanced Mapping: Modifying Locations
• Advanced Mapping: Customizing Tableau’s Geocoding
• Advanced Mapping: Using a Background Image
• Viewing Distributions
• Comparing Measures Against a Goal
• Showing Statistics and Forecasting
• Telling Stories with Data
