Classification on Adult dataset

The Adult dataset was used in Ron Kohavi's 1996 paper, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid".

The prediction task is to determine, from census data, whether a person's income exceeds $50K per year. The dataset is also known as the “Census Income” dataset and was extracted by Barry Becker from the 1994 Census database.

The columns in this dataset are:

  • age
  • workclass
  • fnlwgt
  • education
  • education-num
  • marital-status
  • occupation
  • relationship
  • race
  • sex
  • capital-gain
  • capital-loss
  • hours-per-week
  • native-country
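
The raw file has no header row, so the column names listed above have to be supplied when loading it. The following is a minimal sketch of one way to do that with Pandas. The `income` label column name, the `load_adult` helper, the use of "?" as the missing-value marker, and the three made-up rows are assumptions for illustration and are not taken from the original post.

```python
import io

import pandas as pd

COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",  # target label; the name "income" is an assumption
]

# Made-up rows in a comma-separated layout (NOT real census records).
SAMPLE = """\
30, Private, 100000, Bachelors, 13, Never-married, Sales, Not-in-family, White, Female, 0, 0, 40, United-States, <=50K
45, ?, 150000, HS-grad, 9, Married-civ-spouse, ?, Husband, Black, Male, 0, 0, 50, United-States, >50K
52, Self-emp-inc, 200000, Masters, 14, Married-civ-spouse, Exec-managerial, Husband, White, Male, 5000, 0, 60, United-States, >50K
"""


def load_adult(source):
    """Hypothetical helper: read a headerless CSV and attach column names."""
    return pd.read_csv(
        source,
        header=None,            # the file carries no header line
        names=COLUMNS,
        skipinitialspace=True,  # drop the space that follows each comma
        na_values="?",          # assumption: "?" marks a missing value
    )


# A local file path such as "adult.data" could be passed instead of StringIO.
df = load_adult(io.StringIO(SAMPLE))
print(df.head())
```

Calling `df.head()` on the loaded frame gives a quick look at the first few rows.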

The model was generated using the Random Forest approach from scikit-learn (http://scikit-learn.org/stable/), together with Pandas (http://pandas.pydata.org/) and NumPy (http://www.numpy.org/).
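
As a rough sketch of how such a model could be put together, the example below one-hot encodes the categorical columns and fits a `RandomForestClassifier`. The post does not give the actual preprocessing or hyperparameters, so the `train_income_forest` helper, the 75/25 split, `n_estimators=100`, and the synthetic demo data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_income_forest(df, target="income", seed=0):
    """Hypothetical helper: fit a random forest, return (model, held-out accuracy)."""
    # One-hot encode categorical columns; numeric columns pass through unchanged.
    X = pd.get_dummies(df.drop(columns=[target]))
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in data (NOT the real Adult data), used only to run the helper.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "age": rng.integers(17, 90, size=n),
    "hours-per-week": rng.integers(1, 99, size=n),
    "sex": rng.choice(["Male", "Female"], size=n),
})
# Arbitrary rule that produces two classes for the demonstration.
demo["income"] = np.where(demo["age"] + demo["hours-per-week"] > 100, ">50K", "<=50K")

model, accuracy = train_income_forest(demo)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

With the real dataset, the same function would be called on the loaded DataFrame instead of `demo`. The accuracy printed here says nothing about performance on the census data.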

Sample adult data

Summary of numerical fields
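
A summary like this can be produced with Pandas' `describe()`, which by default reports only the numeric columns. This is a small sketch on a made-up frame with a subset of the columns, not the post's actual data.

```python
import pandas as pd

# Made-up rows covering two numeric columns and one categorical column.
df = pd.DataFrame({
    "age": [25, 38, 52, 41],
    "hours-per-week": [40, 50, 45, 20],
    "workclass": ["Private", "Private", "State-gov", "Self-emp-inc"],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column;
# the categorical "workclass" column is left out by default.
summary = df.describe()
print(summary)
```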

Number of examples in each income class
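
Per-class counts of this kind can be obtained with `value_counts()`. The sketch below uses a made-up label column; the name `income` is an assumption.

```python
import pandas as pd

# Made-up labels standing in for the income column.
income = pd.Series(["<=50K", ">50K", "<=50K", "<=50K"], name="income")

counts = income.value_counts()                # absolute number of rows per class
shares = income.value_counts(normalize=True)  # fraction of rows per class
print(counts)
print(shares)
```

Checking the class balance this way is useful before training, since a strongly skewed label distribution makes plain accuracy harder to interpret.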

Missing values per column: True means the column has at least one missing value, False means it has none.
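
A True/False flag per column can be computed with `isnull().any()`. This is a minimal sketch on a made-up frame; whether the post used exactly this call is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in two of the three columns.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", np.nan, "Private"],
    "occupation": ["Adm-clerical", "Exec-managerial", np.nan],
})

has_missing = df.isnull().any()    # one boolean per column
missing_count = df.isnull().sum()  # number of missing cells per column
print(has_missing)
print(missing_count)
```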

Model output
