Learning Path (Syllabus)

Machine Learning Bootcamp

Pay Only After Placement at a Minimum of 6 LPA

The learning journey.

Unit 1 - Python.

Numpy & Pandas

NumPy and Pandas are essential libraries for almost any machine learning project: NumPy provides high-performance array and matrix computation, while Pandas offers an intuitive interface for working with labelled, tabular data. Together they form the foundation of most data science work in Python.
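
As a minimal sketch of how the two libraries fit together (the numbers below are invented purely for illustration):

```python
import numpy as np
import pandas as pd

# NumPy: fast, vectorised array maths
heights_cm = np.array([160.0, 172.5, 181.0, 168.2])
print(heights_cm.mean())          # arithmetic mean of the array

# Pandas: labelled, tabular data built on top of NumPy
df = pd.DataFrame({
    "height_cm": heights_cm,
    "weight_kg": [55.0, 68.3, 80.1, 62.4],
})
print(df.describe())              # per-column summary statistics
```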

Scikit-learn

Scikit-learn is a free machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
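
A minimal sketch of scikit-learn's shared fit / predict / score API, using the bundled iris dataset and the gradient boosting classifier mentioned above as one example:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# scikit-learn estimators all share the same fit / predict / score API
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on held-out data
```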

Data Visualisation with Matplotlib

Matplotlib is a low-level Python library used for data visualization; it gives fine-grained control over figures, axes and individual plot elements.
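
A minimal plotting sketch, assuming nothing beyond Matplotlib and NumPy:

```python
import matplotlib.pyplot as plt
import numpy as np

# Plot a simple sine curve to illustrate the basic figure/axes workflow
x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()
plt.show()
```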

Unit 2 - Statistics.

Descriptive Statistics and Inferential Statistics

Descriptive statistics summarise or describe features of a data set, such as central tendency and dispersion, while inferential statistics use a sample to draw conclusions about the wider population it was drawn from.
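
A small sketch of these measures in NumPy, using invented sample values:

```python
import numpy as np

sample = np.array([12, 15, 14, 10, 18, 20, 13])

# Central tendency
print("mean:  ", sample.mean())
print("median:", np.median(sample))

# Dispersion
print("std:   ", sample.std(ddof=1))          # sample standard deviation
print("range: ", sample.max() - sample.min())
```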

Data Normalisation

Normalisation is an essential data pre-processing step in many machine learning applications: it rescales features to a common range so that features measured on large scales do not dominate model fitting.
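
One common choice (shown here purely for illustration, with invented feature values) is min-max scaling with scikit-learn, which rescales each feature to the [0, 1] range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 500.0]])

scaler = MinMaxScaler()            # rescales each column to [0, 1]
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```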

Parametric and Non-parametric methods

Machine learning models can be parametric or non-parametric. Parametric models summarise the data with a fixed number of parameters (for example, the coefficients of a linear regression), while non-parametric models make fewer assumptions about the underlying function and let their complexity grow with the amount of training data (for example, k-nearest neighbours), typically at the cost of needing more data and computation.
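
A rough illustration of the distinction, using invented data: a linear regression is summarised by a handful of coefficients, while a k-nearest-neighbours regressor answers queries from the stored training points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.arange(10).reshape(-1, 1).astype(float)
y = 3.0 * X.ravel() + np.random.default_rng(0).normal(0, 1, 10)

# Parametric: the fitted model is a fixed number of parameters
lin = LinearRegression().fit(X, y)
print(lin.coef_, lin.intercept_)

# Non-parametric: predictions come from the stored training points
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
print(knn.predict([[4.5]]))
```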

Hypothesis Testing

Hypothesis testing draws inferences or conclusions about the overall population by running statistical tests on a sample. When we use sample data to train a model, we make assumptions about the population; hypothesis testing lets us validate those assumptions at a chosen significance level.
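
A minimal sketch using SciPy, comparing the means of two invented samples with a two-sample t-test at the 5% significance level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=30)   # e.g. control group
group_b = rng.normal(loc=53, scale=5, size=30)   # e.g. treatment group

# Null hypothesis: both groups share the same population mean
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis at the 5% level")
else:
    print("Fail to reject the null hypothesis")
```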

Probability theory

Probability theory underpins much of machine learning, particularly the parts of artificial intelligence concerned with quantifying uncertainty, predicting outcomes and making decisions.

Unit 3 - Supervised Learning.

Linear Regression

Linear regression models the relationship between one or more features and a response by fitting a linear equation to the observed data. With two or more features this is called multiple linear regression, and the steps are almost the same as for simple linear regression.
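
A minimal multiple-linear-regression sketch with scikit-learn, using two invented features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: the response depends linearly on two features plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 4.0 + rng.normal(0, 0.5, 100)

model = LinearRegression().fit(X, y)
print(model.coef_)               # estimated coefficients (close to 2.0 and -1.5)
print(model.intercept_)          # estimated intercept (close to 4.0)
print(model.predict([[5.0, 3.0]]))
```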

Decision Tree

A decision tree is a supervised learning technique that can be used for both classification and regression problems, though it is most often used for classification. It is a tree-structured model in which internal nodes represent features of the dataset, branches represent decision rules, and each leaf node represents an outcome.
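
A minimal classification sketch with scikit-learn's decision tree, using the bundled iris dataset for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits how deep the tree grows, which helps avoid overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))   # accuracy on held-out data
```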

Random Forest

Random Forest is a powerful and versatile supervised machine learning algorithm that grows and combines multiple decision trees to create a “forest.” It can be used for both classification and regression problems in R and Python.
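
A minimal sketch of the regression side in Python with scikit-learn, on synthetic data generated purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data, purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of decision trees whose predictions are averaged
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # R^2 on held-out data
```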

Naive Bayes

Naïve Bayes is a probabilistic machine learning algorithm based on the Bayes Theorem, used in a wide variety of classification tasks. Bayes’ Theorem is a simple mathematical formula used for calculating conditional probabilities. Conditional probability is a measure of the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) occurred.
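
In symbols, Bayes' theorem is P(A|B) = P(B|A) · P(A) / P(B). A minimal classification sketch with scikit-learn's Gaussian Naive Bayes, using the bundled iris dataset for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumes features are conditionally independent and Gaussian within each class
nb = GaussianNB()
nb.fit(X_train, y_train)
print(nb.predict_proba(X_test[:3]))   # class probabilities for three samples
print(nb.score(X_test, y_test))
```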

Support Vector Machines

The support vector machine is another algorithm every machine learning practitioner should have in their arsenal, and it is widely favoured because it can deliver high accuracy with relatively little computing power. A support vector machine (SVM) can be used for both regression and classification tasks, but it is most commonly applied to classification problems.
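
A minimal classification sketch with scikit-learn's SVC (the default RBF kernel), using the bundled breast cancer dataset purely for illustration; SVMs are sensitive to feature scale, so the features are standardised first:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardise the features, then fit an RBF-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on held-out data
```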

Unit 4 - Unsupervised Learning.

K-Means Algorithm

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Its objective is straightforward: group similar data points together and discover underlying patterns in the data.
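
A minimal sketch with scikit-learn, on synthetic 2-D data generated purely for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three clusters, purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)   # learned cluster centres
print(labels[:10])               # cluster assignments of the first 10 points
```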

Hierarchical Clustering

Hierarchical clustering, also known as hierarchical cluster analysis or HCA, is another unsupervised machine learning approach for grouping unlabeled datasets into clusters. The hierarchy of clusters is built up in the form of a tree, and this tree-shaped structure is known as a dendrogram.
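
A minimal sketch using SciPy's hierarchy module to build the tree and draw the dendrogram, on invented 2-D data:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Two small, well-separated groups of invented points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),
               rng.normal(5, 0.5, (10, 2))])

# Build the hierarchy bottom-up (Ward linkage) and draw the dendrogram
Z = linkage(X, method="ward")
dendrogram(Z)
plt.title("Dendrogram")
plt.show()
```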

GMMs

Gaussian mixture models (GMMs) are a type of machine learning algorithm that models data as a mixture of several Gaussian distributions and assigns each point to a component based on probability. Because the assignments are probabilistic, GMMs are well suited to soft clustering, and they are applied in many areas, including finance and marketing.
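
A minimal sketch with scikit-learn's GaussianMixture, on synthetic data generated purely for illustration:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data with three overlapping groups, purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=0)

# Fit a mixture of three Gaussians and get soft (probabilistic) assignments
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(X)
print(gmm.means_)                 # estimated component means
print(gmm.predict_proba(X[:3]))   # membership probabilities for three points
```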

Outlier Detection

Anomaly detection is one of the most common use cases of machine learning. Finding and identifying outliers helps to prevent fraud, adversarial attacks, and network intrusions that can compromise your company's future. To ensure that the trained model generalizes well to the valid range of test inputs, it is important to detect and remove outliers.
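
The syllabus does not prescribe a particular method, so the sketch below uses scikit-learn's IsolationForest purely as one common choice, on invented 2-D data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly "normal" points plus a few obvious outliers (invented data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               [[8, 8], [-9, 7], [10, -10]]])

detector = IsolationForest(contamination=0.03, random_state=0)
labels = detector.fit_predict(X)      # -1 = outlier, 1 = inlier
print(np.where(labels == -1)[0])      # indices flagged as outliers
```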

Dimensionality Reduction

Dimensionality reduction is commonly used in data visualization to understand and interpret the data, and in machine learning or deep learning techniques to simplify the task at hand.
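
A minimal sketch using principal component analysis (PCA) in scikit-learn, chosen here purely as one common technique, on the bundled digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Reduce 64-dimensional digit images to 2 components for visualisation
X, y = load_digits(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                       # (1797, 2)
print(pca.explained_variance_ratio_)    # variance captured by each component
```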