Sample: Classification and Clustering

Written by Emre ÇİNTAŞ on 28 November 2013.

The creators of the flag database collected the data primarily from the "Collins Gem Guide to Flags" (Collins Publishers, 1986); the full database is dated 05/15/1990. I converted the database from flag.data to flagemrec.xls and flagemrec.arrf because the original file would not run in Weka, RapidMiner, etc.

 

This data file contains details of various nations and their flags. With this data I will try things like predicting the religion of a country from its size and the colours in its flag. The data set has 194 instances and 30 attributes: name, landmass, zone, area(km), population, language, religion, bars, stripes, colours, red, green, blue, gold, white, black, orange, mainhue, circles, crosses, saltires, quarters, sunstars, crescent, triangle, icon, animate, text, topleft, botright… (Attributes)
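To make the layout of the data concrete, here is a minimal Python sketch that reads rows in this 30-attribute order into dictionaries. It assumes the file is comma-separated with no header row and that only name, mainhue, topleft and botright hold text; the `read_flags` helper, the shortened attribute name `area`, and the "Exampleland" row are hypothetical and used only for illustration.

```python
import csv
import io

# The 30 attribute names, in the order listed above ("area(km)" shortened to "area").
ATTRIBUTES = [
    "name", "landmass", "zone", "area", "population", "language", "religion",
    "bars", "stripes", "colours", "red", "green", "blue", "gold", "white",
    "black", "orange", "mainhue", "circles", "crosses", "saltires", "quarters",
    "sunstars", "crescent", "triangle", "icon", "animate", "text",
    "topleft", "botright",
]

# Assumption: these attributes hold text; every other attribute is an integer.
TEXT_ATTRIBUTES = {"name", "mainhue", "topleft", "botright"}


def read_flags(handle):
    """Read header-less, comma-separated rows into a list of dicts."""
    records = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != len(ATTRIBUTES):
            raise ValueError(f"expected {len(ATTRIBUTES)} fields, got {len(row)}")
        record = {}
        for key, value in zip(ATTRIBUTES, row):
            record[key] = value if key in TEXT_ATTRIBUTES else int(value)
        records.append(record)
    return records


# Hypothetical row, not taken from the real database.
SAMPLE = (
    "Exampleland,1,4,100,5,1,1,0,3,3,1,0,1,0,1,0,0,"
    "red,0,0,0,0,1,0,0,0,0,0,blue,red\n"
)
flags = read_flags(io.StringIO(SAMPLE))
print(flags[0]["name"], flags[0]["mainhue"], flags[0]["religion"])
```

With the real file, the same function could be called as `read_flags(open("flag.data", newline=""))`, assuming that file matches the layout described here.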

Data Mining on Flag Database

Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items. Data mining often involves the analysis of data stored in a data warehouse. Three of the major data mining techniques are regression, classification and clustering.

I will apply just two of these techniques to the flag database: classification and clustering.

Classification: a data mining (machine learning) technique used to predict group membership for data instances.

Clustering: the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Flag Database Statistics

 

Purpose of the Study and Methodology

The main purpose of this study is to predict the religion of a country from its size and the colours in its flag, taking the most important metrics (variables) into consideration. Classification techniques and cluster analysis with the KMeans algorithm are used as a convenient way to show religion, language and country name according to flag attributes.

Classification Work

I tried four classification algorithms in Weka: Naive Bayes, KStar, RBFNetwork and J48. As you can see in this picture, the most successful algorithm on the 194 instances of the flag database was KStar.
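To illustrate what an instance-based classifier does with colour attributes, here is a small Python sketch. It is not KStar (which uses an entropy-based distance) and not Weka's code; it swaps in a plain nearest-neighbour rule with Hamming distance over the seven colour flags and scores it with leave-one-out evaluation. The rows and the labels "a" and "b" are hypothetical stand-ins for countries and religion classes.

```python
from collections import Counter

# Order of the binary colour attributes in each feature vector.
COLOURS = ("red", "green", "blue", "gold", "white", "black", "orange")


def hamming(a, b):
    """Number of positions where two colour vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(train, query, k=1):
    """Majority label among the k training rows closest to the query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(rows, k=1):
    """Predict each row from all other rows and return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        hits += predict(train, features, k) == label
    return hits / len(rows)


# Hypothetical data: (colour vector, class label).
rows = [
    ((1, 0, 1, 0, 1, 0, 0), "a"),
    ((1, 0, 1, 0, 1, 0, 0), "a"),
    ((1, 0, 1, 1, 1, 0, 0), "a"),
    ((1, 1, 0, 0, 1, 0, 0), "b"),
    ((0, 1, 0, 0, 1, 0, 0), "b"),
    ((1, 1, 0, 0, 1, 1, 0), "b"),
]
print(leave_one_out_accuracy(rows))
```

Leave-one-out is used here only because it is easy to write down; the evaluation mode behind the Weka results in this post is not stated.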

Visualizing Flags on Weka

Using the colours of the flags, I predicted religion with the KStar classification algorithm.

Clustering with KMeans (K=5)
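As a rough illustration of what a KMeans run does, the following Python sketch implements the basic assign-and-update loop (Lloyd's algorithm) on numeric vectors. It is not Weka's implementation: seeding the centroids with the first k points is an assumption made so the run is reproducible, nominal attributes such as mainhue are not handled, and the toy points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(points, k, max_iter=100):
    """Return (assignment, centroids) after running Lloyd's algorithm.

    Assumption: the first k points seed the centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the run has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = mean(members)
    return assignment, centroids


# Hypothetical 2-D points forming two obvious groups.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

For the flag data, the same function would be called with k=5 on numeric feature vectors built from the attributes.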

Result

First, I ran classification on the 194 instances of the flag database; of the four algorithms, the most successful was KStar (99.9%), and I then created the flag visualizations.

Second, the 194 instances (countries) were divided into five clusters with the KMeans algorithm. The resulting clusters are: 1st cluster 51 countries (26%), 2nd cluster 37 (19%), 3rd cluster 31 (16%), 4th cluster 41 (21%) and 5th cluster 34 (18%).
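The cluster shares follow directly from the cluster sizes. A short Python check, using only the counts reported here:

```python
# Cluster sizes as reported for the five KMeans clusters.
cluster_sizes = [51, 37, 31, 41, 34]

total = sum(cluster_sizes)
# Share of each cluster, rounded to the nearest whole percent.
shares = [round(100 * n / total) for n in cluster_sizes]

print(total, shares)  # 194 [26, 19, 16, 21, 18]
```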

Third, according to the data I have:

1st cluster, 51 countries (26%): located in North America; the language is English; the religion is other Christian; the flags use red, blue and white; the main hue (mainhue) is blue; both the top-left (topleft) and bottom-right (botright) colours are blue.

2nd cluster, 37 countries (19%): located in Africa; the language is other languages; the religion is ethnic; the flags use red, green and gold; the main hue is red; the top-left colour is red and the bottom-right colour is green.

3rd cluster, 31 countries (16%): located in Asia; the language is Arabic; the religion is Muslim; the flags use red, green and white; the main hue is green; both the top-left and bottom-right colours are green.

4th cluster, 41 countries (21%): located in Europe; the language is other Indo-European; the religion is Catholic; the flags use red, blue and white; the main hue is red; the top-left colour is blue and the bottom-right colour is red.

5th cluster, 34 countries (18%): located in Asia; the language is other languages; the religion is Muslim and Marxist; the flags use red and white; the main hue is red; both the top-left and bottom-right colours are red.
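Cluster descriptions of this kind can be read as the most frequent value of each attribute within a cluster. The Python sketch below shows one way to compute such a profile; the `cluster_profiles` helper and the toy records are hypothetical, and it assumes each record already carries a cluster number from a previous clustering step.

```python
from collections import Counter, defaultdict


def cluster_profiles(records, assignment, attributes):
    """Return, per cluster, the most common value of each attribute."""
    grouped = defaultdict(list)
    for record, cluster in zip(records, assignment):
        grouped[cluster].append(record)

    profiles = {}
    for cluster, members in sorted(grouped.items()):
        profiles[cluster] = {
            attr: Counter(m[attr] for m in members).most_common(1)[0][0]
            for attr in attributes
        }
    return profiles


# Hypothetical records and cluster assignment, for illustration only.
records = [
    {"mainhue": "blue", "topleft": "blue"},
    {"mainhue": "blue", "topleft": "white"},
    {"mainhue": "blue", "topleft": "blue"},
    {"mainhue": "red", "topleft": "red"},
    {"mainhue": "green", "topleft": "red"},
    {"mainhue": "red", "topleft": "red"},
]
assignment = [0, 0, 0, 1, 1, 1]

profiles = cluster_profiles(records, assignment, ["mainhue", "topleft"])
print(profiles)
```

Taking the mode suits nominal attributes such as mainhue, topleft and botright; for numeric attributes such as area or population a mean would be the more usual summary.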