
Category Archives: Data Mining
Protected: Data Mining Summer 2021, Frankfurt U
Enter your password to view comments.
Posted in Data Mining
Linear Regression: Influential Observations
https://www.youtube.com/watch?v=fJSXS4oVf88
Posted in Data Mining, Uncategorized
Confusion matrix
https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
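As a minimal sketch of the idea (the binary 0/1 labels and the sample data below are my own assumptions, not taken from the linked articles), the four cells of a confusion matrix can be counted directly:

```python
# Count the four confusion-matrix cells for binary labels (1 = positive).
def confusion_matrix(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
print(tp, fp, fn, tn)  # 2 1 1 2
```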
Posted in Data Mining
Exercise 10: Solving a k-means problem
K={1, 1.1, 5, 5.1, 1.5, 5.2, 7.9, 1.2, 8.1, 9}
Total item=10
iter 1:
m1 = 5, m2 = 9
K1 = {1, 1.1, 5, 5.1, 1.5, 5.2, 1.2}, K2 = {7.9, 8.1, 9}
m1 = 2.87 ≈ 3, m2 = 8.33 ≈ 9
iter 2:
K1 = {1, 1.1, 5, 5.1, 1.5, 5.2, 1.2}; K2 = {7.9, 8.1, 9}
m1 ≈ 3, m2 ≈ 9
The means are the same in two consecutive iterations, so we stop.
Posted in Data Mining
Data Mining: Cluster Analysis Done Manually (Chapter 10)
k-means clustering algorithm for finding clusters manually from observations:
Step 1: Take the mean value of each cluster.
Step 2: Assign each number to the cluster with the nearest mean.
Step 3: Repeat steps 1 and 2 until we get the same means.
K={2,3,4,10,11,12,20,25,30}
k=2 [it means we have to create 2 clusters]
iter 1:
m1 = 4, m2 = 12
k1 = {2, 3, 4} [according to nearest distance from 4]
so mean m1 = 3
k2 = {10, 11, 12, 20, 25, 30}
m2 = 108/6 = 18
iter 2:
k1 = {2, 3, 4, 10}
m1 = 4.75 ≈ 5
k2 = {11, 12, 20, 25, 30}
m2 = 19.6 ≈ 20
iter 3:
k1 = {2, 3, 4, 10, 11, 12}, k2 = {20, 25, 30}
m1 = 7, m2 = 25
iter 4:
k1 = {2, 3, 4, 10, 11, 12}, k2 = {20, 25, 30}
m1 = 7, m2 = 25
The means are the same in two consecutive iterations, so we stop.
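The manual steps above can be sketched as a minimal 1-D k-means in Python (assuming the starting means m1 = 4, m2 = 12 from the example):

```python
# Minimal 1-D k-means, following the manual procedure:
# assign each point to the nearest mean, recompute means, repeat.
def kmeans_1d(data, means, max_iter=100):
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for x in data:
            # Step 2: put x in the cluster whose mean is nearest.
            i = min(range(len(means)), key=lambda j: abs(x - means[j]))
            clusters[i].append(x)
        # Steps 1 and 3: recompute means; stop when they no longer change.
        new_means = [sum(c) / len(c) for c in clusters]
        if new_means == means:
            return clusters, means
        means = new_means
    return clusters, means

clusters, means = kmeans_1d([2, 3, 4, 10, 11, 12, 20, 25, 30], [4.0, 12.0])
print(clusters)  # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(means)     # [7.0, 25.0]
```

This reproduces the worked example: the clusters stabilize at {2, 3, 4, 10, 11, 12} and {20, 25, 30} with means 7 and 25.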
Posted in Data Mining
Data Mining: Unit 9
Ensemble Models:
Basics
Boosting
Random Forests
Support Vector Machines
Basics
Linear Classification
Nonlinear Classification
Properties of SVMs
Discriminant Analysis
Basics
Exercise:
We are going to create some data mining models for classification and compare their performance.
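The ensemble idea above can be illustrated with a tiny majority-vote combiner (a hypothetical Python sketch of the basic principle, not the unit's actual models):

```python
from collections import Counter

# Combine the predictions of several classifiers for one sample
# by taking the most common (majority) vote.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical weak classifiers vote on one email:
print(majority_vote(["spam", "ham", "spam"]))  # spam
```

The point of an ensemble is that several weak models voting together can be more accurate than any single one of them.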
Posted in Data Mining
Data Mining: Exercise 8
Design of network topology
Determine:
Number of input nodes
Too few nodes => misclassification
Too many nodes => overfitting
Problems with the dollar sign:
https://stackoverflow.com/questions/42560090/what-is-the-meaning-of-the-dollar-sign-in-r-function
Problems with the tilde sign:
https://stackoverflow.com/questions/14976331/use-of-tilde-in-r-programming-language?noredirect=1&lq=1
Posted in Data Mining
Unit 5 Multiple Linear Regression
It applies when there is more than one possible predictor variable.
Including more than one independent variable in the regression model extends the simple linear regression model to a multiple linear regression model.
Advantages:
Relationship between response variables and several predictors simultaneously.
Disadvantages:
Model building and interpretation difficulties due to complexity.
Multiple linear regression with two predictors:
Y = beta0 + beta1*X1 + beta2*X2 + epsilon
where Y is the dependent variable,
X1, X2, ..., Xk are the predictors (independent variables),
epsilon is the random error,
beta0, beta1, beta2 are unknown regression coefficients.
Example=> oil consumption:
Y=oil consumption(per month)
X1=outdoor temperature
X2=size of house(in meter square)
Model:
Y = beta0 + beta1*X1 + beta2*X2 + epsilon
Now beta1 is the expected change in Y (oil consumption) for a one-unit increase in X1 (outdoor temperature) when all other predictors are kept constant, i.e. in this case the size of the house is unchanged.
beta1 is estimated as 27.2 (per degree C).
Assumptions:
The random error term epsilon is normally distributed with mean zero, i.e. E(epsilon) = 0.
epsilon has (unknown) variance sigma^2, i.e. all random errors have the same variance.
Adjusted R^2
R^2_adj = 1 - [SSE/(n - k - 1)] / [SST/(n - 1)]
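Plugging hypothetical numbers into the adjusted R^2 formula (the SSE/SST values below are made up for illustration):

```python
# Adjusted R^2: R^2_adj = 1 - (SSE / (n - k - 1)) / (SST / (n - 1)),
# where n = number of observations and k = number of predictors.
def adjusted_r2(sse, sst, n, k):
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# Hypothetical values: 30 observations, 2 predictors.
print(round(adjusted_r2(sse=40.0, sst=200.0, n=30, k=2), 4))  # 0.7852
```

Unlike plain R^2, this penalizes adding predictors that do not reduce SSE enough to justify the lost degrees of freedom.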
As for simple linear regression:
plots of residuals against y-hat (the fitted values)
plots of residuals against xi
normal probability plot of residuals
plots of residuals in observation order
Cook’s distance
Studentized residuals
Standardized residuals
Dffits
Collinearity:
Can only occur for multiple regression.
Predictors explaining the same variation of the response variable.
Oil consumption continued:
One predictor measuring house size in cm^2 and another predictor in m^2
Variance inflation factor
VIF_i = 1 / (1 - R_i^2)
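A quick numeric check of the VIF formula (the R_i^2 value below is a made-up example):

```python
# Variance inflation factor: VIF_i = 1 / (1 - R_i^2), where R_i^2 is the
# R^2 from regressing predictor i on all the other predictors.
def vif(r_squared_i):
    return 1.0 / (1.0 - r_squared_i)

# If predictor i is 90% explained by the others, its variance is
# inflated by a factor of 10.
print(round(vif(0.9), 1))  # 10.0
```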
Condition Index for collinearity:
between 10 and 30 => weak collinearity
between 30 and 100 => moderate collinearity
above 100 => strong collinearity
Example of Oil consumption continued:
Assume that we would like to use outdoor temperature X1 and house size X2 as predictors. Additionally, we want to use a third predictor:
X3 = 1 if extra-thick walls, 0 otherwise
Model:
Y = beta0 + beta1*X1 + beta2*X2 + beta3*X3 + epsilon
Model Selection Strategies:
Models are ranked using R^2, adjusted R^2, or Mallow's Cp.
Stepwise selection methods:
Backward, forward, stepwise selection
R^2 Selection
In a data set with 7 possible predictors, there would be 2^7 - 1 = 127 possible regression models.
For every model size (k = 1, 2, ..., p), look at, say, the m best models chosen.
Mallow’s Cp:
Large Cp => biased model
Cp = SSE_p / MSE_full - (n - 2p)
where SSE_p = sum of squared errors for a model with p parameters,
MSE_full = mean squared error for the full model,
n = number of observations.
Posted in Data Mining
Exercise Sheet 5
From 1(d) onward it is not clear to me; I have to clear this up, In Sha Allah.
If needed, I should look at some other tutorial or example.
Posted in Data Mining
Exercise Sheet 4
Data Mining Methods: Unit 4
Correlation and Simple Linear Regression
Interpretation of the correlation coefficient
Possible range: [-1, 1]
-1: perfect negative linear relationship
0: no linear relationship
+1: perfect positive linear relationship
Regression: Objective
To predict one variable from other variables.
To explain the variability of one variable using the other variables.
Predicts scores on one variable from the scores on a second variable.
Response variable: the variable being predicted (Y)
Predictor variable: predictions are based on this variable (X)
Simple regression:
Only one predictor variable; otherwise multiple regression
Linear regression:
The prediction of the response variable (Y) is a linear function of the predictor variable (X).
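The correlation coefficient described above can be computed directly; a minimal Python sketch (the sample data is made up for illustration):

```python
import math

# Pearson correlation coefficient; result lies in [-1, 1].
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# y = 2x is a perfect positive linear relationship, so r = 1.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```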
Posted in Data Mining
Data Preprocessing/Exercise Sheet 2
Theory:
Data Preprocessing in the Data Mining Process:
The data mining/KDD process
Why data preprocessing?
Issues in Data Preprocessing:
Data Cleaning
Data Transformation
Variable Construction
Data Reduction and Discretization
Data Integration
The data mining/KDD Process:
Understanding the customer: 10-20%
Understanding data: 20-30%
Prepare data: 40-70%
Build models: 10-20%
Evaluate models: 10-20%
Take action: 10-20%
Why data preprocessing?
Real-world data is dirty.
Low data quality is in any case a huge problem in data mining.
Garbage in, garbage out.
Different methods, different requirements
Working R code for data mining:
R code is case sensitive.
I am doing it from the professor's sheet.
dim means dimension
I could not make this line work:
hist(Ozone, breaks=25, ylim=c(0,45), main="Original data")
Another question: how does the imputation work?
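One common answer is mean imputation; the Python sketch below is an assumption about the general technique, not the professor's R code:

```python
# Mean imputation: replace missing values (None) in a column with the
# mean of the observed (non-missing) values.
def impute_mean(values):
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

print(impute_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```

This keeps the column mean unchanged, at the cost of shrinking its variance; other schemes (median, regression, k-NN imputation) trade off differently.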
Exercise 2 (K): I have to find the answers.
Exercise 3: Answer:
1)
clothing=read.csv(file="F:/desktop and documents/Desktop/dataminingdata/clothing_store.txt")
Posted in Data Mining
R programming
R:
Manipulation of Vectors and Numbers
Vectors and Assignment
Extraction of Elements from Vectors/Matrices
Basic Manipulations
The Data Frame
Table
Frames
Cumulative Distribution Function
Measures of Central Tendency
Measures of Spread
Correlation [need to look into this a bit]
Posted in Data Mining, R programming