Category Archives: Data Mining

Training Set, Validation Set, Test Set

The training set is the subset of the dataset used to build predictive models.
The validation set is the subset of the dataset used to assess the performance of the models built in the training phase.
– It provides a test platform for fine-tuning a model’s parameters and selecting the best-performing model.
– Not all modeling algorithms need a validation set.
The test set (unseen examples) is the subset of the dataset used to assess the likely future performance of a model.
– If a model fits the training set much better than it fits the test set, overfitting is probably the cause.
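
The three subsets described above can be sketched in a few lines of plain Python. This is only an illustration: the function name split_dataset, the 60/20/20 proportions and the fixed seed are assumptions made for the example, not something prescribed by WEKA or by the notes.

```python
import random

def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle the rows and cut them into training, validation and test subsets.

    The fractions and the seed are illustrative assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)          # reproducible shuffle
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                     # used to build the model
    valid = shuffled[n_train:n_train + n_valid]    # used to tune and select models
    test = shuffled[n_train + n_valid:]            # held back as unseen examples
    return train, valid, test

train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))
```

Every row lands in exactly one subset, so the test set stays unseen while the model is built and tuned.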

 

Binary Classification (two-class classification)

true|false, 1|0, -1|+1, male|female

Multi-class classification problems can be broken down into several binary classification problems, for example one "this class vs. the rest" problem per class.
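
One common way to view a multi-class problem as binary problems is one-vs-rest relabeling. The sketch below shows only the relabeling step; the helper name one_vs_rest_labels and the sample labels are made up for the example.

```python
def one_vs_rest_labels(labels):
    """Return {class: binary label list}, where 1 marks that class and 0 marks the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

# Hypothetical three-class label column
labels = ["setosa", "versicolor", "virginica", "setosa"]
binary_problems = one_vs_rest_labels(labels)

for cls, binary in binary_problems.items():
    print(cls, binary)
```

Each entry of binary_problems is an ordinary two-class target, so a binary classifier could be trained once per class.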

Model Evaluation:

 

https://en.wikipedia.org/wiki/Confusion_matrix

Data Science: Discrete vs Continuous

Making Predictions with WEKA

How to Save Your Machine Learning Model and Make Predictions in Weka

Decision Tree

Data Mining Deep Study – Confusion Matrix

A confusion matrix shows the number of correct and incorrect predictions made by a classification model, compared with the actual outcomes (target values) in the data.
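
As a rough illustration of that definition, the counts for a two-class problem can be tallied by hand. The function name, the "yes"/"no" labels and the sample predictions below are invented for the example.

```python
def confusion_matrix(actual, predicted, positive="yes"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1   # predicted positive
        else:
            counts["FN" if a == positive else "TN"] += 1   # predicted negative
    return counts

# Made-up actual outcomes and model predictions
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)   # share of correct predictions
print(cm, accuracy)
```

The correct predictions are TP and TN; the incorrect ones are FP and FN.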

 

I found a good lecture that explains the confusion matrix with an easy HIV/AIDS example. The video and my own drawing of it are given below:

WEKA Rushdi Shams Track

The 3rd video explains some details of the different results that appear in the output. It’s important.

In the 4th video, blue is yes and red is no.

The 5th video explains testing and training in detail, so it must be watched.

In the 7th video, K-fold with K = 10 means 10 different models for 10 different folds: each model is trained on 9 folds and tested on the remaining one.
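
To make the "10 models for 10 folds" idea concrete, here is a minimal sketch of how the row indices could be divided. It is not how WEKA does it internally: there is no shuffling or stratification, and the name k_fold_indices and the row count of 150 are assumptions for the example.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                 # the held-out fold
        train_idx = indices[:start] + indices[start + size:]   # the other k-1 folds
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(150, k=10))
for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train on {len(train_idx)} rows, test on {len(test_idx)} rows")
```

Each row is used for testing exactly once and for training in the other nine rounds.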

In the 8th video we tried the IRIS data.

The 9th video covers feature selection methods, where attributes can be selected for different algorithms and the results may vary (wrapper method).
Feature selection means attribute selection.

In the 10th video, ranker algorithms are used for ranking features (attributes). The wrapper method is used for machine learning tasks, whereas the filter method is useful for data mining tasks.

WEKA

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}

@data
sunny, 90, 77, TRUE, no
overcast, 88, 90, FALSE, no
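
To show how the pieces of an ARFF file like the one above fit together (relation name, attribute declarations, then data rows), here is a small hand-rolled reader. It is a sketch only: the function parse_arff is hypothetical and handles just this simple layout, not quoted values, sparse rows or other parts of the full ARFF format.

```python
def parse_arff(text):
    """Read the relation name, the attributes and the data rows of a simple ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):        # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)     # kind is "numeric" or a {..} value list
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

sample = """@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}

@data
sunny, 90, 77, TRUE, no
overcast, 88, 90, FALSE, no
"""

relation, attributes, rows = parse_arff(sample)
print(relation, attributes, rows)
```

The last attribute (play) is the class here, and each data row lists its values in the same order as the attribute declarations.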

Mission Data Mining

With WEKA


For predicting class from model

 

Some data mining tutorials

http://people.sabanciuniv.edu/berrin/cs512/lectures/WEKA/WEKA%20Explorer%20Tutorial-REFERENCE.pdf

How to Run Your First Classifier in Weka

 

https://www.ibm.com/developerworks/library/os-weka2/

Data Mining: Intros and Weka With Java

https://www.ibm.com/developerworks/library/os-weka1/

http://www.programcreek.com/2013/01/a-simple-machine-learning-example-in-java/

http://www.cs.umb.edu/~ding/history/480_697_spring_2013/homework/WekaJavaAPITutorial.pdf