Data Preprocessing/Exercise Sheet 2

Data Preprocessing in the Data Mining Process:

The data mining/KDD process
Why data preprocessing?

Issues in Data Preprocessing:

Data Cleaning
Data Transformation
Variable Construction
Data Reduction and Discretization
Data Integration

The data mining/KDD Process:
Understanding customer: 10%-20%
Understanding data: 20%-30%
Prepare data: 40%-70%
Build models: 10%-20%
Evaluate models: 10%-20%
Take action: 10%-20%

Why data preprocessing?

Real-world data is dirty
Low data quality is a huge problem in data mining
Garbage in, garbage out
Different methods, different requirements

Working R code for data mining:

R code is case sensitive.
I am working from the professor's sheet.

dim() gives the dimensions of a data set (number of rows and number of columns).
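A small sketch of both points, case sensitivity and dim(). It assumes the built-in airquality data set that ships with R; the notes do not say which data set the sheet uses, so treat that choice as an assumption.

```r
# Assumption: the built-in 'airquality' data set stands in for the sheet's data
data(airquality)

# dim() returns c(number of rows, number of columns)
d <- dim(airquality)
print(d)

# nrow() and ncol() give the same two numbers separately
n_rows <- nrow(airquality)
n_cols <- ncol(airquality)

# Case sensitivity: 'x' and 'X' are two different variables
x <- 1
X <- 2
same_value <- identical(x, X)
print(same_value)
```

Because of case sensitivity, airquality$ozone would return NULL here, since the column is called Ozone.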


This line did not work when copied with curly quotes; with straight quotes it reads:

hist(Ozone,breaks=25,ylim=c(0,45),main="Original data")
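Two things commonly stop a call like this from running: typographic quotes pasted from a document, and Ozone not being visible as a variable. The sketch below assumes Ozone is the column of the built-in airquality data set (a guess, the notes do not say) and refers to it explicitly instead of relying on attach().

```r
# Assumption: 'Ozone' is the column from the built-in airquality data set
data(airquality)
ozone <- airquality$Ozone

# Straight quotes around the title; hist() ignores the missing values
h <- hist(ozone,
          breaks = 25,
          ylim = c(0, 45),
          main = "Original data",
          xlab = "Ozone")

# The bar heights add up to the number of non-missing observations
n_plotted <- sum(h$counts)
```

Note that breaks = 25 is only a suggestion to hist(); R may pick a slightly different number of bars.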

Another question: how does the imputation work?
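Imputation means filling in the missing values (NA) with a plausible replacement. One simple variant is mean imputation, where every NA is replaced by the mean of the observed values. The sketch below is not necessarily the method used on the professor's sheet; it again assumes the Ozone column of the built-in airquality data set.

```r
# Assumption: mean imputation on airquality$Ozone, as an illustration only
data(airquality)
ozone <- airquality$Ozone

# How many values are missing?
n_missing <- sum(is.na(ozone))

# Mean of the observed (non-missing) values
ozone_mean <- mean(ozone, na.rm = TRUE)

# Replace every NA with that mean, keep observed values as they are
ozone_imputed <- ifelse(is.na(ozone), ozone_mean, ozone)

# Compare with the histogram of the original data
hist(ozone_imputed, breaks = 25, main = "Mean-imputed data", xlab = "Ozone")
```

Mean imputation keeps the mean unchanged but piles all filled-in values at one point, so the spread of the data shrinks. That is why the histogram gets one very tall bar near the mean.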


Exercise 2 (K): I still have to find the answers.


Exercise 3: Answer:

clothing=read.csv(file="F:/desktop and documents/Desktop/dataminingdata/clothing_store.txt")
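After loading a file it helps to check right away that it was read as expected. The sketch below uses a small temporary file with made-up columns (id and amount are hypothetical, not the real columns of clothing_store.txt), so it runs without the original data.

```r
# Toy stand-in file; the columns 'id' and 'amount' are invented for illustration
tmp <- tempfile(fileext = ".txt")
writeLines(c("id,amount", "1,19.99", "2,", "3,5.50"), tmp)

# Same call as for the real file, only with a different path
toy <- read.csv(file = tmp)

# Rows and columns, column types, and missing values per column
print(dim(toy))
str(toy)
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Clean up the temporary file
unlink(tmp)
```

If dim() shows only one column for the real file, the separator is probably not a comma and the sep argument of read.csv has to be set accordingly.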



Author: zakilive
