On Binary Classification in Extreme Regions

Hamid Jalalzai; Stéphan Clémençon; Anne Sabourin

Conference proceedings. Year: 2018

Hamid Jalalzai

Role: Author
PersonId : 736865
IdHAL : hamid-jalalzai

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Stéphan Clémençon

Role: Author
PersonId : 174491
IdHAL : stephan-clemencon
ORCID : 0000-0002-5879-9500
IdRef : 08905203X

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Anne Sabourin

Role: Author
PersonId : 1077014

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Abstract

In pattern recognition, a random label Y is to be predicted from a random vector X taking values in R^d, with d ≥ 1, by means of a classification rule with minimum probability of error. In a wide variety of applications, ranging from finance and insurance to environmental sciences and teletraffic data analysis, extreme (i.e. very large) observations X are of crucial importance, yet they contribute negligibly to the (empirical) error simply because they are rare. As a consequence, empirical risk minimizers generally perform very poorly in extreme regions. The purpose of this paper is to develop a general framework for classification in the extremes. Precisely, under non-parametric heavy-tail assumptions on the class distributions, we prove that a natural, asymptotic notion of risk accounting for predictive performance in extreme regions of the input space can be defined. We then show, by means of maximal deviation inequalities in low-probability regions, that minimizers of an empirical version of a non-asymptotic approximant of this dedicated risk, based on a fraction of the largest observations, lead to classification rules with good generalization capacity. Beyond theoretical results, numerical experiments are presented to illustrate the relevance of the approach developed.
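
As a rough illustration of the idea of minimizing an empirical risk computed only on a fraction of the largest observations, the sketch below may help. It is not the authors' algorithm: the two-dimensional toy data, the Pareto-distributed radii, the family of angular threshold rules, and all function names (`select_extremes`, `empirical_risk`, `fit_on_extremes`) are assumptions made purely for this example.

```python
import math
import random


def select_extremes(X, y, fraction):
    """Keep the `fraction` of samples with the largest Euclidean norm."""
    k = max(1, int(len(X) * fraction))
    order = sorted(range(len(X)), key=lambda i: math.hypot(*X[i]), reverse=True)
    top = order[:k]
    return [X[i] for i in top], [y[i] for i in top]


def angle(x):
    """Angle of a 2D point in the positive quadrant, in [0, pi/2]."""
    return math.atan2(x[1], x[0])


def empirical_risk(threshold, X, y):
    """Error rate of the rule: predict +1 if angle(x) > threshold, else -1."""
    errors = sum(
        1 for x, label in zip(X, y)
        if (1 if angle(x) > threshold else -1) != label
    )
    return errors / len(X)


def fit_on_extremes(X, y, fraction, n_candidates=50):
    """Empirical risk minimization over a grid of angular thresholds,
    using only the samples retained by select_extremes."""
    X_ext, y_ext = select_extremes(X, y, fraction)
    grid = [j * (math.pi / 2) / n_candidates for j in range(n_candidates + 1)]
    best = min(grid, key=lambda t: empirical_risk(t, X_ext, y_ext))
    return best, X_ext, y_ext


# Toy data (assumption): heavy-tailed radius, label determined by the angle.
rng = random.Random(0)
X, y = [], []
for _ in range(2000):
    label = rng.choice([-1, 1])
    theta = rng.uniform(math.pi / 4, math.pi / 2) if label == 1 else rng.uniform(0, math.pi / 4)
    r = rng.paretovariate(2.0)  # Pareto radius, tail index 2 (illustrative choice)
    X.append((r * math.cos(theta), r * math.sin(theta)))
    y.append(label)

best, X_ext, y_ext = fit_on_extremes(X, y, fraction=0.1)
print("selected threshold:", best)
print("risk on extremes:", empirical_risk(best, X_ext, y_ext))
```

The rule here depends on a point only through its angle, one simple way to make a classifier insensitive to how far out in the tail an observation lies; the fraction 0.1 is an arbitrary illustrative value.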

Domains

Statistics [stat] / Machine Learning [stat.ML]

Main file

inproceedings-2018-18398-4.pdf (778.84 KB)

Origin: Files produced by the author(s)

Contributor: Stephan Clémençon

https://imt.hal.science/hal-01932813

Submitted on: Friday, 23 November 2018, 13:35:58

Last modified on: Friday, 17 November 2023, 14:36:15

Dates and versions

hal-01932813, version 1 (23-11-2018)

Identifiers

HAL Id: hal-01932813, version 1

Cite

Hamid Jalalzai, Stéphan Clémençon, Anne Sabourin. On Binary Classification in Extreme Regions. 2018. ⟨hal-01932813⟩

Collections

INSTITUT-TELECOM PARISTECH LTCI IDS S2A
