Coding a Decision Tree

The Data

For our first example, we will be using a set of measurements taken on various red wines.
The data set is from
- P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
The data is located at
- https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
There are 12 measurements, taken on 1599 different red wines.

Attribute Summary

Question: Can we predict the quality of the wine from the attributes?

Coding Decision Trees: General Steps

Load the decision tree packages
Read in the data
Identify the target feature
Divide the data into a training set and a test set.
Fit the decision tree model
Apply the model to the test data
Display the confusion matrix

1. Load Decision Tree Package

from sklearn import tree

2. Read in the data

import pandas as pd
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine = pd.read_csv(data_url, delimiter=';')
print(wine.info())

3. Identify the target feature

#Split the quality column out of the data
wine_target = wine['quality']
wine_data = wine.drop('quality', axis=1)

For the functions that we will be using, the target values (e.g., quality) must be a separate object.

4. Divide the Data

from sklearn import model_selection
test_size = 0.30
seed = 7
train_data, test_data, train_target, test_target = model_selection.train_test_split(wine_data,
wine_target, test_size=test_size,
random_state=seed)

5. Fit the Decision Tree Model

model = tree.DecisionTreeClassifier()
model = model.fit(train_data, train_target)

6. Apply the Model to the Test Data

prediction = model.predict(test_data)

7. Display Confusion Matrix

row_name ="Quality"
cm = pd.crosstab(test_target, prediction,
rownames=[row_name], colnames=[''])
print(' '*(len(row_name)+3),"Predicted ", row_name)
print(cm)

Activity: Decision Tree Program

Make sure that you can run the decisionTree code: 01_Decision_Tree.ipynb

Last updated on Jun 9, 2022