Introduction to AI

Module 7: decision trees

Summary

Topics:
- Decision trees and regression trees, with a short discussion of probability estimation trees
- Ensemble methods, with a specific focus on random forests
Length: This module will take one week to complete
Assigned chapters: Chapter 19 (the parts we did not read for the last module)
Project: Project 5 is assigned in this module and will be due in the next module

Decision trees and random forests

Decision-tree-based classifiers are very popular in machine learning and data science applications. The most popular methods are random forests and gradient boosted classifiers, both of which are ensembles of decision trees. In this module, we will cover the basic decision tree algorithm and then discuss random forests and gradient boosting.

One reason decision trees are popular is that they are inherently human readable, at least as a single tree; a forest is harder to read. A single decision tree is really a flow chart, a type of diagram that humans have been making and reading for many years! The difference is that the splits in the flow chart are created automatically using machine learning rather than by hand.
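
To make the flow-chart analogy concrete, here is a minimal sketch of a single tree written as nested questions. The attributes (`outlook`, `humidity`, `windy`) and the labels are hypothetical, chosen only for illustration and not taken from the course materials. A learned tree would have the same shape, with the splits chosen by the algorithm rather than by hand.

```python
# A decision tree stored as nested dicts: internal nodes ask about one
# attribute, leaves are plain strings holding the predicted label.
# All attribute names and labels here are hypothetical.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
    },
}


def predict(tree, example):
    """Follow the branches that match `example` until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        value = example[node["attribute"]]
        node = node["branches"][value]
    return node


def describe(tree, indent=""):
    """Return the tree as indented if/then lines, one per branch."""
    if not isinstance(tree, dict):
        return [f"{indent}predict {tree!r}"]
    lines = []
    for value, subtree in tree["branches"].items():
        lines.append(f"{indent}if {tree['attribute']} == {value!r}:")
        lines.extend(describe(subtree, indent + "    "))
    return lines


if __name__ == "__main__":
    print("\n".join(describe(TREE)))
    print(predict(TREE, {"outlook": "sunny", "humidity": "normal"}))
```

Printing `describe(TREE)` gives the tree as a set of nested rules, which is the sense in which a single tree is readable. A forest of hundreds of such trees has no comparably compact description.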

Decision trees

Decision tree graphic

Graphic from this article at DataScience foundation (they often have good articles/graphics!)

Assignments for Module 7

Topic 1: decision trees

For the first part of this module, we will learn about the basic decision tree algorithm, covering both classification trees and regression trees. For our reading, we will jump back to the section of Chapter 19 that we skipped in the previous module.

(30 min) Reading
- Read Section 19.3

(30 min) Decision trees
- What types of trees exist? Why do we want to study trees?
  - Copy of my slides
- How do you grow a decision tree?
  - Copy of my slides
  - Wikipedia talks about Gini Impurity score
  - An example of how to use chi-squared as a decision tree score
- Example of choosing the best attribute
  - Copy of my slides
Complete the exercise on decision trees
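
As a rough illustration of the "choosing the best attribute" step listed above, the sketch below scores each candidate split with Gini impurity, $1 - \sum_k p_k^2$, where $p_k$ is the fraction of examples in class $k$, and picks the attribute whose split leaves the lowest weighted impurity. The toy data set and the function names are assumptions made for this example; they are not the data from the slides or the exercise.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def weighted_gini(rows, attribute, target):
    """Size-weighted average impurity of the groups formed by a split."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def best_attribute(rows, attributes, target):
    """Pick the attribute whose split leaves the lowest weighted impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, a, target))


# Hypothetical toy data set, not from the course exercise.
ROWS = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
]

if __name__ == "__main__":
    for name in ("outlook", "windy"):
        print(name, round(weighted_gini(ROWS, name, "play"), 3))
    print("best:", best_attribute(ROWS, ["outlook", "windy"], "play"))
```

On this toy data, splitting on `outlook` leaves two pure groups and one mixed group, while splitting on `windy` leaves both groups mixed, so `outlook` gets the lower score. Growing a full tree repeats this choice recursively on each group until the groups are pure or a stopping rule applies.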

Topic 2: ensemble methods

For the second topic in this module, we will look into ensemble methods. This is where we will learn about random forests and gradient boosted methods.

(30 min) Reading
- Read Section 19.8 (Ensemble Methods)

(30 min) Ensemble methods
- Motivation for ensemble methods: brittleness and the bias-variance tradeoff
  - Copy of my slides
- Random Forests
  - Copy of my slides
- Boosting and Gradient Boosted Forests
  - Copy of my slides
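
To give a feel for how an ensemble of trees is built, here is a minimal sketch of bagged decision stumps: each one-split tree is trained on a bootstrap sample of the data, and the ensemble predicts by majority vote. This is only the bootstrap-and-vote core of a random forest. A real random forest also considers a random subset of features at each split, which is omitted here because the hypothetical toy data has a single feature. The function names and the data are assumptions for illustration, and boosting is not shown.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(points):
    """Fit a one-split tree to (x, label) pairs by minimising training errors."""
    xs = sorted({x for x, _ in points})
    default = majority([y for _, y in points])
    best = (float("inf"), None)  # (errors, stump)
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = majority([y for x, y in points if x <= threshold])
        right = majority([y for x, y in points if x > threshold])
        errors = sum(
            y != (left if x <= threshold else right) for x, y in points
        )
        if errors < best[0]:
            best = (errors, (threshold, left, right))
    # A sample with a single distinct x value cannot be split.
    return best[1] or (xs[0], default, default)


def predict_stump(stump, x):
    """Apply a fitted stump: one threshold test, then a leaf label."""
    threshold, left, right = stump
    return left if x <= threshold else right


def fit_forest(points, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]  # with replacement
        forest.append(fit_stump(sample))
    return forest


def predict_forest(forest, x):
    """Each tree votes and the most common label wins."""
    return majority([predict_stump(stump, x) for stump in forest])


# Hypothetical one-feature data set: small x is class 0, large x is class 1.
DATA = [(1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1)]

if __name__ == "__main__":
    forest = fit_forest(DATA)
    print([predict_forest(forest, x) for x in (1, 12)])
```

Because every stump sees a slightly different sample, individual stumps can disagree, and the vote averages out their individual quirks. That is the variance-reduction idea behind the motivation for ensembles listed above.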

Project for Module 7

Project 4

Remember that Project 4 is due at the end of this week!

Project 5

Although you will spend most of your project time this week on Project 4, Project 5 is assigned at the end of this module because it is associated with this module. It will be due in 2 weeks, on Nov 14. For this module, please read Project 5; you will begin working on it during the next module.

Suggested schedule for Module 7

Week 1 (Oct 24-31)

Complete Topic 1 by Tuesday
Complete Topic 2 by Thursday
Finish Project 4 by Sunday
Read Project 5 by Sunday