Category Archives: Research

Training Set, Validation Set, Test Set

Training Set is a subset of the dataset used to build predictive models.
Validation Set is a subset of the dataset used to assess the performance of the model built in the training phase.
– It provides a test platform for fine-tuning the model’s parameters and selecting the best-performing model.
– Not all modeling algorithms need a validation set.
Test Set (unseen examples) is a subset of the dataset used to assess the likely future performance of a model.
– If a model fits the training set much better than it fits the test set, overfitting is probably the cause.
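A minimal sketch of the three-way split and the overfitting check described above. It assumes a simple random shuffle and a 60/20/20 ratio, which the post does not specify. `split_dataset` and `overfitting_gap` are hypothetical helper names used only for illustration.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation and test sets.

    The 60/20/20 default is an assumption for illustration; whatever
    remains after the training and validation slices becomes the test set.
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # never used for fitting or tuning
    return train, val, test


def overfitting_gap(train_accuracy, test_accuracy):
    """Training accuracy minus test accuracy; a large positive gap
    suggests the model is overfitting the training set."""
    return train_accuracy - test_accuracy


if __name__ == "__main__":
    train, val, test = split_dataset(list(range(100)))
    print(len(train), len(val), len(test))  # 60 20 20
    print(round(overfitting_gap(0.98, 0.71), 2))  # 0.27
```

The test set is carved off once and left alone. Tuning against it would turn it into a second validation set, and it would then stop estimating future performance.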


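Binary Classification (two-class classification)

Typical pairs of class labels: true|false, 1|0, -1|+1, male|female

Multi-class classification problems can be decomposed into several binary classification problems, for example one-vs-rest: one "this class vs. all other classes" problem per class.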
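A small sketch of the one-vs-rest idea. It relabels a multi-class label list into one -1|+1 list per class, then predicts by picking the class whose binary model is most confident. The function names and the example scores are hypothetical. Any real binary classifier could supply the scores.

```python
def one_vs_rest_labels(labels):
    """Turn one multi-class label list into one binary (+1/-1) list per class."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def predict_one_vs_rest(scores_by_class):
    """Pick the class whose own "class vs. rest" model gave the highest score.

    `scores_by_class` maps each class to the score its binary model
    assigned to a single example (higher = more likely that class).
    """
    return max(scores_by_class, key=scores_by_class.get)


if __name__ == "__main__":
    y = ["cat", "dog", "bird", "dog"]
    for cls, binary in one_vs_rest_labels(y).items():
        print(cls, binary)
    # bird [-1, -1, 1, -1]
    # cat [1, -1, -1, -1]
    # dog [-1, 1, -1, 1]
    print(predict_one_vs_rest({"bird": 0.1, "cat": 0.7, "dog": 0.4}))  # cat
```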

Model Evaluation:

Machine Learning the Easy Way

Deep Learning, Deep Neural Network:


Road to Data Scientist and Python Ninja

Here, I will try to share, step by step, some information about mastering data science and the Pythonic attitude :p

These links may help:

Data Podcasts:






Heading Strongly Towards My Journey to Data Science: The Sexiest Job of the 21st Century


Sentiment Analysis, Natural Language Processing:

Data Science MOOC (Will do someday)

Data Science: Discrete vs Continuous

Data Mining Deep Study – Confusion Matrix

A confusion matrix shows the number of correct and incorrect predictions made by a classification model, compared with the actual outcomes (target values) in the data.
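For a binary problem the matrix has four cells: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). Here is a minimal sketch that counts them and derives accuracy, sensitivity and specificity. It assumes labels 1 (positive) and 0 (negative). The sample predictions are made up for illustration.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive:
            fp += 1  # predicted positive, really negative
        elif a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def summarize(cm):
    """Common metrics derived from the four cells
    (assumes at least one actual positive and one actual negative)."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "sensitivity": cm["TP"] / (cm["TP"] + cm["FN"]),  # true positive rate
        "specificity": cm["TN"] / (cm["TN"] + cm["FP"]),  # true negative rate
    }


if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 0, 0, 0]
    cm = confusion_matrix(actual, predicted)
    print(cm)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
    print(summarize(cm))
```

In a medical-test setting, a false negative (a sick person told they are healthy) usually costs far more than a false positive. That is why sensitivity is reported alongside plain accuracy.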


I found a good lecture on the confusion matrix that explains it simply, using HIV/AIDS testing as the example. The video and my own drawing of it are given below:


@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}

@data
sunny, 90, 77, TRUE, no
overcast, 88, 90, FALSE, no
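An ARFF file (the format WEKA reads) declares each attribute either as a set of nominal values in braces or as `numeric`. The rows after `@data` are comma-separated values in the same order. Below is a minimal, standard-library-only sketch of reading the weather relation. It assumes no quoted values, no sparse rows and no string or date attributes. The full format supports all of these, and WEKA itself or `scipy.io.arff.loadarff` handles them. `read_arff` is a hypothetical name.

```python
NUMERIC_TYPES = {"numeric", "real", "integer"}


def read_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Simplifying assumptions: no quoted names or values, no sparse rows,
    no string/date attributes; '%' comment lines are skipped.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # nominal attribute: the allowed values are listed in braces
                values = [v.strip() for v in kind.strip("{}").split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, kind.lower()))
        elif keyword == "@data":
            in_data = True
        elif in_data:
            cells = [c.strip() for c in line.split(",")]
            if len(cells) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values, got {len(cells)}")
            row = {}
            for (name, kind), cell in zip(attributes, cells):
                if isinstance(kind, list):
                    if cell not in kind:
                        raise ValueError(f"{cell!r} is not a declared value of {name}")
                    row[name] = cell
                elif kind in NUMERIC_TYPES:
                    row[name] = float(cell)
                else:
                    raise ValueError(f"unsupported attribute type {kind!r}")
            rows.append(row)
    return relation, attributes, rows


WEATHER = """\
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}

@data
sunny, 90, 77, TRUE, no
overcast, 88, 90, FALSE, no
"""

if __name__ == "__main__":
    relation, attributes, rows = read_arff(WEATHER)
    print(relation)  # weather
    print(rows[0])
```

Because every value is checked against its declared type, a misspelled type such as `numeirc` is reported as an unsupported attribute type instead of being silently accepted.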

Difference between AI, Machine Learning, NLP and Deep Learning

Different terms for:


Artificial Intelligence will lead the future

Google and the Self-Driving Car

Data Science Track

1. Watch all the YouTube videos on data science, including the ones on algorithms.
2. For data mining, complete Rushdi Shams’s example series with WEKA.
3. For basic theory:
Watch and complete the Udacity and Andrew Ng machine learning courses step by step.
I have found the Udacity course much more interesting than Andrew Ng’s, but I will finish both, In Sha Allah (God willing).
4. There is a paid Udemy course on hands-on data science with Python.
5. For Python, learn from Codecademy.
Ubuntu is a better environment than Windows for Python.
6. Subeen vaia’s book is also good for Python.

To be continued

Plagiarism Checker

Top 5 popular data science algorithms


Decision Tree
Random Forest
Association Rule Mining
Linear Regression
K-means Clustering

Data science is nothing but extracting actionable knowledge from data:

A data scientist must know data architecture, machine learning and data analytics.

Machine Learning Algorithms (sample)

Unsupervised
– Clustering and dimensionality reduction: K-means, SVD, PCA
– Association analysis: Apriori, FP-Growth
– Hidden Markov Model

Supervised
– Regression: Linear, Polynomial, Decision Trees, Random Forests
– Classification: KNN, Trees, Logistic Regression, Naive Bayes

Supervised Learning: The categories (labels) of the training data are already known.
Unsupervised Learning: The learning process attempts to find appropriate categories for unlabeled data.
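To make the contrast concrete, here is a sketch on made-up 1-D points. A supervised 1-nearest-neighbour classifier (KNN with k = 1) uses the known labels. K-means (k = 2) receives the same points without labels and has to find the groups itself. Both are toy implementations written for illustration, not library APIs. The data and the starting centroids are assumptions.

```python
def nearest_neighbor(train_points, train_labels, x):
    """Supervised: return the label of the closest labelled training point."""
    best = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[best]


def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised: plain k-means on 1-D data, starting from given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # assignment step: each point joins its nearest centroid
            j = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[j].append(p)
        # update step: move each centroid to the mean of its cluster
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break  # assignments no longer change the centroids: converged
        centroids = new
    return centroids, clusters


if __name__ == "__main__":
    points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
    labels = ["small", "small", "small", "large", "large", "large"]
    print(nearest_neighbor(points, labels, 7.2))  # large
    centroids, clusters = kmeans_1d(points, [0.0, 5.0])
    print(centroids)  # [1.5, 8.5]
    print(clusters)   # [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
```

K-means finds the same two groups, but it returns them only as cluster 0 and cluster 1. Naming them "small" and "large" is left to us, which is exactly the difference between the two definitions above.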

Motivational Researcher

Mission Data Scientist