The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

Preprint / working paper, 2020

Abstract

This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of the observations in a transformer architecture. The keys, queries, values, and attention vectors of the network are treated as the unobserved stochastic states of its hidden structure. This generative model is such that, at each time step, the received observation is a random function of its past states within a given attention window. In this general state-space setting, we use Sequential Monte Carlo methods to approximate the posterior distributions of the states given the observations, and to estimate the gradient of the log-likelihood. We hence propose a generative model producing a full predictive distribution, instead of a single-point estimate.
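To make the recursion described in the abstract concrete, below is a minimal, self-contained Python/NumPy sketch of a bootstrap particle filter on a toy version of such a model. It is not the authors' implementation: the Gaussian noise on the latent states, the scalar Gaussian observation model, the fixed random projections W_k, W_q, W_v, w_out, and all sizes (d, N, T, window, noise_std, obs_std) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and noise levels; none of these values come from the paper.
d, N, T, window = 4, 100, 30, 5
noise_std, obs_std = 0.1, 0.2

# Fixed random projections stand in for learned transformer weights.
W_k, W_q, W_v = (rng.normal(size=(d, d)) for _ in range(3))
w_out = rng.normal(size=d)

x = rng.normal(size=(T, d))                        # input sequence
y = x @ w_out + rng.normal(scale=obs_std, size=T)  # toy scalar observations

keys = np.zeros((N, T, d))   # per-particle stochastic key states
vals = np.zeros((N, T, d))   # per-particle stochastic value states
log_lik = 0.0

for t in range(T):
    # 1. Propagate: keys, queries and values are latent states, sampled here
    #    from a Gaussian centred on the deterministic projections.
    keys[:, t] = x[t] @ W_k + noise_std * rng.normal(size=(N, d))
    vals[:, t] = x[t] @ W_v + noise_std * rng.normal(size=(N, d))
    q = x[t] @ W_q + noise_std * rng.normal(size=(N, d))

    # 2. Self-attention restricted to the last `window` positions.
    lo = max(0, t - window + 1)
    scores = np.einsum('nd,ntd->nt', q, keys[:, lo:t + 1]) / np.sqrt(d)
    att = np.exp(scores - scores.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)
    z = np.einsum('nt,ntd->nd', att, vals[:, lo:t + 1])  # attention output

    # 3. Weight each particle by the Gaussian likelihood of the observation.
    pred = z @ w_out
    logw = (-0.5 * ((y[t] - pred) / obs_std) ** 2
            - 0.5 * np.log(2 * np.pi * obs_std ** 2))
    m = logw.max()
    log_lik += m + np.log(np.mean(np.exp(logw - m)))  # log-likelihood increment
    w = np.exp(logw - m)
    w /= w.sum()

    # 4. Multinomial resampling of whole particle histories.
    idx = rng.choice(N, size=N, p=w)
    keys, vals = keys[idx], vals[idx]

print(f"SMC log-likelihood estimate: {log_lik:.2f}")

The accumulated log_lik is the standard SMC estimate of the log-likelihood, i.e. the quantity whose gradient the abstract proposes to estimate for training; resampling whole particle histories at every step is the simplest possible scheme, chosen here only to keep the sketch short.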
Main file: smc_transformer_2020.pdf (561.13 KB)
Origin: files produced by the author(s)

Dates and versions

hal-02896961, version 1 (11 July 2020)
hal-02896961, version 2 (12 December 2020)

Identifiers

HAL Id: hal-02896961

Cite

Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin. The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction. 2020. ⟨hal-02896961v2⟩
221 views
777 downloads
