Data Mining - TRAINING

Course Objectives

The main parts of the course include exploratory data analysis, frequent pattern mining, clustering, and classification. The course lays the basic foundations of these tasks, and it also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. It integrates concepts from related disciplines such as machine learning and statistics, and it is also well suited to participants interested in data analysis more generally. Most of the prerequisite material, especially linear algebra, probability, and statistics, is covered in the course material.
The course includes many examples to illustrate the main technical concepts. It also has end-of-chapter exercises, which have been used in class. All of the algorithms in the course have been implemented by the authors. We suggest that readers use their favorite data analysis and mining software to work through our examples and to implement the algorithms we describe in the text; we recommend the R software or the Python language with its NumPy package.
Having understood the basic principles and algorithms in data mining and data analysis, readers will be well equipped to develop their own methods or use more advanced techniques.

Course Summary and Details

The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics.
This course, intended for senior undergraduate and graduate students, provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this course offers solid guidance in data mining for participants, researchers, and practitioners alike.
Key Features:
• Covers both core methods and cutting-edge research
• Algorithmic approach with open-source implementations
• Minimal prerequisites, as all key mathematical concepts are presented, as is the intuition behind the formulas
• Short, self-contained chapters with class-tested examples and exercises that allow for flexibility in designing a course and for easy reference
• Supplementary online resource containing lecture slides, videos, project ideas, and more
This course is an outgrowth of data mining courses at Rensselaer Polytechnic Institute (RPI) and Universidade Federal de Minas Gerais (UFMG); the RPI course has been offered every Fall since 1998, and the UFMG course since 2002. Although there are several good courses on data mining and related topics, we felt that many of them are either too high-level or too advanced. Our goal was to provide an introductory treatment that focuses on the fundamental algorithms in data mining and analysis. The course lays the mathematical foundations for the core data mining methods, explaining key concepts when they are first encountered, and it also tries to build the intuition behind the formulas to aid understanding.

Data Mining Training Course - OUTLINE

Data Mining and Analysis

• Data Matrix
• Attributes
• Data: Algebraic and Geometric View
• Data: Probabilistic View
• Data Mining

Numeric Attributes

• Univariate Analysis
• Bivariate Analysis
• Multivariate Analysis
• Data Normalization
• Normal Distribution

Categorical Attributes

• Univariate Analysis
• Bivariate Analysis
• Multivariate Analysis
• Distance and Angle
• Discretization

Graph Data

• Graph Concepts
• Topological Attributes
• Centrality Analysis
• Graph Models

Kernel Methods

• Kernel Matrix
• Vector Kernels
• Basic Kernel Operations in Feature Space
• Kernels for Complex Objects

High-dimensional Data

• High-dimensional Objects
• High-dimensional Volumes
• Hypersphere Inscribed within Hypercube
• Volume of Thin Hypersphere Shell
• Diagonals in Hyperspace
• Density of the Multivariate Normal
• Appendix: Derivation of Hypersphere Volume

Dimensionality Reduction

• Introduction
• Principal Component Analysis
• Kernel Principal Component Analysis
• Singular Value Decomposition

FREQUENT PATTERN MINING - Itemset Mining

• Frequent Itemsets and Association Rules
• Itemset Mining Algorithms
• Generating Association Rules

Summarizing Itemsets

• Maximal and Closed Frequent Itemsets
• Mining Maximal Frequent Itemsets: GenMax Algorithm
• Mining Closed Frequent Itemsets: Charm Algorithm
• Nonderivable Itemsets

Sequence Mining

• Frequent Sequences
• Mining Frequent Sequences
• Substring Mining via Suffix Trees

Graph Pattern Mining

• Isomorphism and Support
• Candidate Generation
• The gSpan Algorithm

Pattern and Rule Assessment

• Rule and Pattern Assessment Measures
• Significance Testing and Confidence Intervals

CLUSTERING - Representative-based Clustering

• K-means Algorithm
• Kernel K-means
• Expectation-Maximization Clustering

Hierarchical Clustering

• Preliminaries
• Agglomerative Hierarchical Clustering

Density-based Clustering

• The DBSCAN Algorithm
• Kernel Density Estimation
• Density-based Clustering: DENCLUE

Spectral and Graph Clustering

• Graphs and Matrices
• Clustering as Graph Cuts
• Markov Clustering

Clustering Validation

• External Measures
• Internal Measures
• Relative Measures

CLASSIFICATION - Probabilistic Classification

• Bayes Classifier
• Naive Bayes Classifier
• K Nearest Neighbors Classifier

Decision Tree Classifier

• Decision Trees
• Decision Tree Algorithm

Linear Discriminant Analysis

• Optimal Linear Discriminant
• Kernel Discriminant Analysis

Support Vector Machines

• Support Vectors and Margins
• SVM: Linear and Separable Case
• Soft Margin SVM: Linear and Nonseparable Case
• Kernel SVM: Nonlinear Case
• SVM Training Algorithms

Classification Assessment

• Classification Performance Measures
• Classifier Evaluation
• Bias-Variance Decomposition

Other Information Technology Courses

Documents and Records Management Compliance
Artificial Intelligence and Applications
Sentiment Analysis and Applications
Information Security and Cryptology
Big Data Management
IT Risk Management
Cloud Computing


Cumhuriyet Cad. No:5
Floor 5 - Taksim
34437 Beyoğlu-Istanbul

+90 (553) 743 83 69
You can also reach us via VoIP (WhatsApp, Viber).

Do not hesitate to send your inquiry