FOURTH INTERNATIONAL SYMPOSIUM ON
IMPRECISE PROBABILITIES AND THEIR APPLICATIONS
Carnegie Mellon University
Pittsburgh, PA, USA
July 20-23, 2005

ISIPTA'05 ELECTRONIC PROCEEDINGS

Carolin Strobl

Variable Selection in Classification Trees Based on Imprecise Probabilities

Abstract

Classification trees based on imprecise probabilities are an advancement over classical classification trees. The Gini index is the default splitting criterion in classical classification trees, while in classification trees based on imprecise probabilities an extension of the Shannon entropy has been introduced as the splitting criterion. However, the use of such empirical entropy measures as split selection criteria can induce a bias in variable selection, so that variables are preferred for characteristics other than their information content. This bias is not eliminated by the imprecise probability approach. The source of the variable selection bias for the estimated Shannon entropy, as well as possible corrections, are outlined. The variable selection performance of the biased and corrected estimators is evaluated in a simulation study. Additional results from research on variable selection bias in classical classification trees are incorporated, suggesting further investigation of alternative split selection criteria in classification trees based on imprecise probabilities.
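As a rough illustration of the quantities involved (not taken from the paper), the following Python sketch computes the empirical Gini index and the plug-in Shannon entropy estimate for a sample of class labels. The optional Miller-Madow term is one commonly used small-sample bias correction for the plug-in entropy; it is shown here only as an example of a corrected estimator, not as the correction proposed in the paper.

    # Illustrative sketch: plug-in estimators of the Gini index and the
    # Shannon entropy for an observed class distribution. The Miller-Madow
    # term is one standard small-sample bias correction, shown as an example.
    from collections import Counter
    from math import log, log2

    def gini_index(labels):
        """Empirical Gini index: 1 minus the sum of squared class proportions."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def shannon_entropy(labels, corrected=False):
        """Plug-in Shannon entropy estimate in bits.

        With corrected=True, the Miller-Madow term (m - 1) / (2 n ln 2) is
        added, where m is the number of observed classes and n the sample size.
        """
        n = len(labels)
        counts = Counter(labels).values()
        h = -sum((c / n) * log2(c / n) for c in counts)
        if corrected:
            h += (len(counts) - 1) / (2 * n * log(2))
        return h

    # Example: four observations from three classes.
    labels = ["a", "a", "b", "c"]
    print(gini_index(labels))                       # 0.625
    print(shannon_entropy(labels))                  # 1.5 bits (plug-in)
    print(shannon_entropy(labels, corrected=True))  # ~1.86 bits (Miller-Madow)

In a tree-growing context, such estimates would be computed for the class distributions within the nodes induced by each candidate split; the plug-in estimates are biased downward, and the bias depends on the number of distinct values of the splitting variable, which is the mechanism behind variable selection bias.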

Keywords. Classification trees, credal classification, variable selection bias, attribute selection error, Shannon entropy, entropy estimation.

Paper Download

The paper is available in the following formats:

Author's address:

Carolin Strobl, M.Sc. Statistics
Department of Statistics
Ludwig-Maximilians-University Munich
Ludwigstr.33/310
80539 Munich, Germany
Tel.: +49-89-2180-3196

E-mail addresses:

Carolin Strobl carolin.strobl@stat.uni-muenchen.de
