A. Mesaros, T. Heittola, and T. Virtanen, TUT database for acoustic scene classification and sound event detection, 2016 24th European Signal Processing Conference (EUSIPCO), 2016.
DOI : 10.1109/EUSIPCO.2016.7760424

J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence et al., Audio Set: An ontology and human-labeled dataset for audio events, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952261

J. Salamon and J. P. Bello, Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification, IEEE Signal Processing Letters, vol.24, issue.3, pp.279-283, 2017.
DOI : 10.1109/LSP.2017.2657381

G. Parascandolo, H. Huttunen, and T. Virtanen, Recurrent neural networks for polyphonic sound event detection in real life recordings, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7472917

T. Komatsu, T. Toizumi, R. Kondo, and Y. Senda, Acoustic event detection method using semi-supervised non-negative matrix factorization with a mixture of local dictionaries, Tech. Rep., 2016.

D. Barchiesi, D. Giannoulis, D. Stowell, and M. D. Plumbley, Acoustic Scene Classification: Classifying environments from the sounds they produce, IEEE Signal Processing Magazine, vol.32, issue.3, pp.16-34, 2015.
DOI : 10.1109/MSP.2014.2326181

S. Chu, S. Narayanan, C.-C. J. Kuo, and M. J. Mataric, Where am I? Scene Recognition for Mobile Robots using Audio Features, 2006 IEEE International Conference on Multimedia and Expo, pp.885-888, 2006.
DOI : 10.1109/ICME.2006.262661

R. Serizel, V. Bisot, S. Essid, and G. Richard, Machine listening techniques as a complement to video image analysis in forensics, 2016 IEEE International Conference on Image Processing (ICIP), pp.5470-5474, 2016.
DOI : 10.1109/ICIP.2016.7532497
URL : https://hal.archives-ouvertes.fr/hal-01393959

A. J. Eronen, V. T. Peltonen, J. T. Tuomi, A. P. Klapuri, S. Fagerlund et al., Audio-based context recognition, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.1, pp.321-329, 2006.
DOI : 10.1109/TSA.2005.854103

A. Rakotomamonjy and G. Gasso, Histogram of gradients of time-frequency representations for audio scene classification.

E. Benetos, M. Lagrange, and S. Dixon, Characterisation of acoustic scenes using a temporally constrained shift-invariant model, Proc. of Digital Audio Effects, 2012.

V. Bisot, R. Serizel, S. Essid, and G. Richard, Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1216-1229, 2017.
DOI : 10.1109/TASLP.2017.2690570
URL : https://hal.archives-ouvertes.fr/hal-01362864

A. Rakotomamonjy, Supervised Representation Learning for Audio Scene Classification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.6, pp.1253-1265, 2017.
DOI : 10.1109/TASLP.2017.2690561
URL : https://hal.archives-ouvertes.fr/hal-01354115

J. Li, W. Dai, F. Metze, S. Qu, and S. Das, A comparison of Deep Learning methods for environmental sound detection, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952131

S. Park, S. Mun, Y. Lee, and H. Ko, Score fusion of classification systems for acoustic scene classification, DCASE2016 Challenge, 2016.

H. Phan, P. Koch, F. Katzberg, M. Maass, R. Mazur et al., Audio Scene Classification with Deep Recurrent Neural Networks, Interspeech 2017, 2017.
DOI : 10.21437/Interspeech.2017-101

W. Dai, C. Dai, S. Qu, J. Li, and S. Das, Very deep convolutional neural networks for raw waveforms, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952190

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

A. Mesaros, T. Heittola, O. Dikmen, and T. Virtanen, Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.151-155, 2015.
DOI : 10.1109/ICASSP.2015.7177950

J. Mairal, F. Bach, and J. Ponce, Task-Driven Dictionary Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.4, pp.791-804, 2012.
DOI : 10.1109/TPAMI.2011.156
URL : https://hal.archives-ouvertes.fr/inria-00521534

H. Eghbal-zadeh, B. Lehner, M. Dorfer, and G. Widmer, CP-JKU submissions for DCASE-2016: a hybrid approach using binaural i-vectors and deep convolutional neural networks, 2016.

Y. Petetin, C. Laroche, and A. Mayoue, Deep neural networks for audio scene recognition, 2015 23rd European Signal Processing Conference (EUSIPCO), pp.125-129, 2015.
DOI : 10.1109/EUSIPCO.2015.7362358

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Advances in Neural Information Processing Systems, p.153, 2007.

G. E. Hinton, S. Osindero, and Y.-W. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527

J. Le Roux, J. R. Hershey, and F. Weninger, Deep NMF for speech separation, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.66-70, 2015.
DOI : 10.1109/ICASSP.2015.7177933

V. Bisot, R. Serizel, S. Essid, and G. Richard, Supervised nonnegative matrix factorization for acoustic scene classification, Tech. Rep., DCASE2016 Challenge, 2016.

B. Mathieu, S. Essid, T. Fillon, J. Prado, and G. Richard, Yaafe, an easy to use and efficient audio feature extraction software, Proc. of the International Society for Music Information Retrieval Conference (ISMIR), pp.441-446, 2010.

R. Serizel, S. Essid, and G. Richard, Mini-batch stochastic approaches for accelerated multiplicative updates in nonnegative matrix factorisation with beta-divergence, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 2016.
DOI : 10.1109/MLSP.2016.7738818
URL : https://hal.archives-ouvertes.fr/hal-01393964

K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, What is the best multi-stage architecture for object recognition?, 2009 IEEE 12th International Conference on Computer Vision, pp.2146-2153, 2009.
DOI : 10.1109/ICCV.2009.5459469

G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, arXiv preprint arXiv:1207.0580, 2012.