Confusion matrix

https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/

https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
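As a minimal sketch of the terminology in the links above (plain Python, no libraries assumed), the code below counts true/false positives and negatives for a binary classifier and derives the usual rates from those four counts. The label lists are made up for illustration, and 1 is assumed to mean the positive class (e.g. "has the disease").

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted yes, actually yes
        elif p == positive:
            fp += 1   # predicted yes, actually no (Type I error)
        elif a == positive:
            fn += 1   # predicted no, actually yes (Type II error)
        else:
            tn += 1   # predicted no, actually no
    return tp, fp, fn, tn

def summary(tp, fp, fn, tn):
    """Common rates derived from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,        # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,   # true negative rate
    }

# Hypothetical labels: 1 = positive, 0 = negative
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
print("TP, FP, FN, TN =", tp, fp, fn, tn)
print(summary(tp, fp, fn, tn))
```

For these hypothetical lists the counts are TP = 4, FP = 2, FN = 1, TN = 3, so accuracy is 7/10 = 0.7.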

Z-test

Same Rank Issue (tied ranks in Spearman's rank correlation)

https://www.statisticshowto.com/spearman-rank-correlation-definition-calculate/
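When two or more values are equal, they all receive the average of the rank positions they occupy, and with ties present Spearman's coefficient is computed as Pearson's correlation on those ranks (the shortcut formula with the sum of squared rank differences is only exact when there are no ties). A minimal plain-Python sketch, assuming ranks run from 1 for the smallest value; the helper names are illustrative, not from the linked page.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n (smallest = 1); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation (undefined if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman's rho = Pearson's r computed on the (average) ranks
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))  # the two 20s share rank (2 + 3) / 2 = 2.5
print(spearman([1, 2, 2, 3, 4], [2, 1, 3, 3, 5]))
```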

Symmetric and Asymmetric Data

https://elentra.healthsci.queensu.ca/assets/modules/types-of-data/symmetrical_and_asymmetrical_data.html
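A quick way to see the difference numerically: for symmetric data the mean and median coincide and the skewness is about zero, while a long tail on one side usually pulls the mean toward that tail and gives a non-zero skewness. A minimal sketch using the moment coefficient of skewness g1 = m3 / m2^1.5; the two data sets are made up for illustration, and mean > median is a rule of thumb rather than a guarantee.

```python
from statistics import mean, median

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment (variance)
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail to the right

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data),
          "skewness =", round(skewness(data), 3))
```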

Absolute and Relative Frequency

https://www.geeksforgeeks.org/absolute-and-relative-frequency-in-r-programming/
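The linked article works in R; as a language-neutral sketch of the same idea, absolute frequency is the count of each value and relative frequency is that count divided by the total number of observations, so the relative frequencies sum to 1. The grade data below is hypothetical.

```python
from collections import Counter

def frequency_table(data):
    """Return {value: (absolute frequency, relative frequency)}, sorted by value."""
    counts = Counter(data)   # absolute frequency: how many times each value occurs
    n = len(data)
    return {value: (count, count / n) for value, count in sorted(counts.items())}

grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]   # hypothetical data
for value, (abs_f, rel_f) in frequency_table(grades).items():
    print(f"{value}: absolute = {abs_f}, relative = {rel_f:.2f}")
```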

Hypothesis Testing

Outliers in Data Mining

Exercise 11: Data Mining (solved problem)

Code for the exercise:

Variable Rejection

Exercise 10: Worked problem (k-means, k = 2)

K = {1, 1.1, 5, 5.1, 1.5, 5.2, 7.9, 1.2, 8.1, 9}
Total items = 10

Iter 1:
m1 = 5, m2 = 9
K1 = {1, 1.1, 5, 5.1, 1.5, 5.2, 1.2}   K2 = {7.9, 8.1, 9}
m1 = 20.1/7 ≈ 2.87 (≈ 3)   m2 = 25/3 ≈ 8.33 (≈ 8)

Iter 2:
K1 = {1, 1.1, 5, 5.1, 1.5, 5.2, 1.2}   K2 = {7.9, 8.1, 9}
m1 ≈ 3, m2 ≈ 8
The clusters, and therefore the means, are the same as in the previous iteration, so the algorithm has converged and we stop.

Data Mining: Cluster Analysis by Hand (Chapter 10)

K-means clustering algorithm, worked by hand:

Step 1: Choose k initial means (on later iterations, compute the mean of each cluster)

Step 2: Assign each value to the cluster whose mean is nearest

Step 3: Repeat steps 1 and 2 until the means stop changing
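The steps above can be sketched in Python for one-dimensional data. This is a minimal sketch, not a library implementation: it assumes the k initial means are supplied by hand (as in these notes), uses exact means instead of rounded ones, breaks distance ties toward the first mean, and lets an empty cluster keep its previous mean.

```python
def kmeans_1d(values, initial_means, max_iter=100):
    """1-D k-means: assign each value to the nearest mean, recompute the
    cluster means, and stop when the means no longer change."""
    means = list(initial_means)
    history = []
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for v in values:
            # Step 2: index of the nearest mean (min() keeps the first on ties)
            nearest = min(range(len(means)), key=lambda i: abs(v - means[i]))
            clusters[nearest].append(v)
        # Step 1 (repeated): new mean of each cluster; empty cluster keeps its old mean
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        history.append((clusters, new_means))
        if new_means == means:   # Step 3: same means twice, so stop
            break
        means = new_means
    return clusters, means, history

clusters, means, history = kmeans_1d([2, 3, 4, 10, 11, 12, 20, 25, 30], [4, 12])
for i, (c, m) in enumerate(history, start=1):
    print(f"iter {i}: clusters = {c}, means = {m}")
```

Comparing the means with `==` is safe here because an unchanged cluster is summed in the same order and yields exactly the same float; comparing cluster membership would be an equivalent stopping test.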

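K = {2, 3, 4, 10, 11, 12, 20, 25, 30}

k = 2 (we have to create 2 clusters)

Iter 1:

m1 = 4   m2 = 12

k1 = {2, 3, 4}   (values nearest to m1 = 4)

so m1 = 9/3 = 3

k2 = {10, 11, 12, 20, 25, 30}

m2 = 108/6 = 18

Iter 2 (using m1 = 3, m2 = 18; 10 is now nearer to m1, distance 7 vs 8):

k1 = {2, 3, 4, 10}

m1 = 19/4 = 4.75 ≈ 5

k2 = {11, 12, 20, 25, 30}

m2 = 98/5 = 19.6 ≈ 20

Iter 3 (using m1 ≈ 5, m2 ≈ 20; 11 and 12 are now nearer to m1):
k1 = {2, 3, 4, 10, 11, 12}   k2 = {20, 25, 30}
m1 = 42/6 = 7   m2 = 75/3 = 25

Iter 4 (using m1 = 7, m2 = 25):

k1 = {2, 3, 4, 10, 11, 12}   k2 = {20, 25, 30}

m1 = 7, m2 = 25

The means are the same in two consecutive iterations, so the clusters no longer change and we stop.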

Data Engineer Track

https://towardsdatascience.com/who-is-a-data-engineer-how-to-become-a-data-engineer-1167ddc12811

https://medium.com/datadriveninvestor/python-vs-r-choosing-the-best-tool-for-ai-ml-data-science-7e0c2295e243

 

ArrayList in Java

Generics:

 

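LinkedList is faster for manipulation (inserting or removing at the ends or through an iterator is O(1), with no shifting of elements) but slower for retrieval by index (get(i) walks the list, O(n)). ArrayList is slower for manipulation in the middle of the list (later elements must be shifted, O(n)) but faster for retrieval by index (get(i) is O(1)).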

 

Data Mining: Unit 9

Ensemble Models:

Basics
Boosting
Random Forests

Support Vector Machines
Basics
Linear Classification
Nonlinear Classification
Properties of SVMs

Discriminant Analysis
Basics

 

Exercise:

We are going to create some data mining models for classification and compare their performance. The goal with our models is still.

 

Data Mining: Exercise 8

Design of network topology

Determine:

Number of input nodes
Too few nodes => misclassification
Too many nodes => overfitting

 

Meaning of the dollar sign ($) in R:

https://stackoverflow.com/questions/42560090/what-is-the-meaning-of-the-dollar-sign-in-r-function

Use of the tilde (~) in R:

https://stackoverflow.com/questions/14976331/use-of-tilde-in-r-programming-language