Ondřej Dušek

Charles University

Prague, Czechia

odusek@ufal.mff.cuni.cz


Me

I am an Assistant Professor at the Institute of Formal and Applied Linguistics, Charles University in Prague, working on the NaMuDDiS project (multi-domain dialogue systems). My main research focus is machine learning and deep learning for natural language processing, especially human-computer dialogue (dialogue systems) and natural language generation (NLG). I previously worked in the Interaction Lab at Heriot-Watt University in Edinburgh, where I also co-advised the university's team in the Amazon Alexa Prize chatbot competition.

I did my PhD on language generation for dialogue systems at the Institute of Formal and Applied Linguistics, Charles University in Prague, where I also worked on multiple grant projects related to dialogue systems and machine translation.

Projects

Research projects I work on and have worked on:

Current

  • NaMuDDiS – multi-domain dialogue systems

Past

  • MaDrIgAL – spoken dialogue systems and natural language generation (2016–2018)
  • Alexa Prize Challenge – chatbots (2017–2018)
  • DILiGENt – natural language generation (2016–2018)
  • AdaNLG – adaptive natural language generator (2014–2016)
  • Vystadial – spoken dialogue systems (2012–2016)
  • QTLeap – semantic machine translation (2013–2016)
  • Khresmoi – medical information retrieval (working on machine translation, 2013–2014)
  • FAUST – improving machine translation fluency (2011–2013)

Tools

Open-source software I (co-)built:

  • RatPred – trainable NLG evaluation tool
  • TGen – a statistical natural language generator
  • Treex – a modular NLP toolkit
  • Alex – spoken dialogue system framework
  • MTMonkey – machine translation web services infrastructure
  • Flect – statistical morphology generation

Students

Students I co-supervise or have co-supervised:

  • Xinnuo Xu (Ph.D., CDT robotics, with Verena Rieser & Ioannis Konstas, since 2016)
  • Shubham Agarwal (Ph.D. at Heriot-Watt, with Verena Rieser & Ioannis Konstas, since 2017)
  • Vojtěch Hudeček (Ph.D. at Charles Uni, with Zdeněk Žabokrtský, since 2018)
  • Patrícia Schmidtová (B.Sc. at Charles Uni, with Vojtěch Hudeček, since 2018)

If you're interested in doing a bachelor's or master's thesis or a PhD with me, please email me.

Publications

Papers

2019

  • Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge, In: Computer Speech and Language. ScienceDirect arXiv Web
  • Simon Keizer, Ondřej Dušek, Xingkun Liu, and Verena Rieser. User Evaluation of a Multi-dimensional Statistical Dialogue System, In: SIGDIAL, Stockholm. arXiv Poster

2018

  • Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. Findings of the E2E NLG Challenge, In: INLG, Tilburg. arXiv Web Slides
  • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, and Verena Rieser. Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity, In: EMNLP, Brussels. arXiv Github Poster
  • Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. RankME: Reliable Human Ratings for Natural Language Generation, In: NAACL, New Orleans. arXiv Poster Github
  • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, and Verena Rieser. Improving Context Modelling in Multimodal Dialogue Generation, In: INLG, Tilburg. arXiv Github Poster
  • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, and Verena Rieser. A Knowledge-Grounded Multimodal Search-Based Conversational Agent, In: SCAI EMNLP workshop, Brussels. arXiv Github Poster
  • Igor Shalyminov, Ondřej Dušek, and Oliver Lemon. Neural Response Ranking for Social Conversation: A Data-Efficient Approach, In: SCAI EMNLP workshop, Brussels. arXiv Github Slides

2017

  • Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. Referenceless Quality Estimation for Natural Language Generation, In: ICML Workshop on Learning to Generate Natural Language, Sydney. arXiv Poster Slides Github
  • Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, and Verena Rieser. Why We Need New Evaluation Metrics for NLG, In: EMNLP, Copenhagen. arXiv Github
  • Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. The E2E Dataset: New Challenges For End-to-End Generation, In: SIGDIAL, Saarbrücken. arXiv Poster Slides Video Web
  • Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. Data-driven Natural Language Generation: Paving the Road to Success, In: WiNLP, Vancouver. arXiv
  • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, and Oliver Lemon. An Ensemble Model with Ranking for Social Dialogue, In: NIPS 2017 Workshop on Conversational AI, Long Beach. arXiv
  • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, and Oliver Lemon. Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback, In: Alexa Prize, Las Vegas. PDF

2016

  • Ondřej Dušek and Filip Jurčíček. A Context-aware Natural Language Generator for Dialogue Systems, In: SIGDIAL, Los Angeles. PDF arXiv Github
  • Ondřej Dušek and Filip Jurčíček. Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings, In: ACL, Berlin. PDF arXiv Github
  • Ondřej Bojar, Ondřej Dušek, Tom Kocmi, Jindřich Libovický, Michal Novák, Martin Popel, Roman Sudarikov, and Dušan Variš. CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered, In: TSD, Brno. SpringerLink
  • Rudolf Rosa, Martin Popel, Ondřej Bojar, David Mareček, and Ondřej Dušek. Moses & Treex Hybrid MT Systems Bestiary, In: Deep MT Workshop, Lisbon. PDF
  • Roman Sudarikov, Ondřej Bojar, Ondřej Dušek, Martin Holub, and Vincent Kríž. Verb Sense Disambiguation in Machine Translation, In: HyTra-6, Osaka. PDF
  • Ondřej Dušek and Filip Jurčíček. A Context-aware Natural Language Generation Dataset for Dialogue Systems, In: RE-WOCHAT, Portorož. PDF Slides

2015

  • Rudolf Rosa, Ondřej Dušek, Michal Novák, and Martin Popel. Translation Model Interpolation for Domain Adaptation in TectoMT, In: Deep MT Workshop, Prague. PDF Slides
  • Ondřej Dušek, Luís Gomes, Michal Novák, Martin Popel, and Rudolf Rosa. New Language Pairs in TectoMT, In: WMT, Lisbon. PDF Poster
  • Ondřej Dušek and Filip Jurčíček. Training a Natural Language Generator from Unaligned Data, In: ACL-IJCNLP, Beijing. PDF Slides Poster Video Github
  • Ondřej Dušek, Eva Fučíková, Jan Hajič, Martin Popel, Jana Šindlerová, and Zdeňka Urešová. Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation, In: Depling, Uppsala. PDF Slides
  • Zdeňka Urešová, Ondřej Dušek, Eva Fučíková, Jan Hajič, and Jana Šindlerová. Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus, In: LAW IX, Denver. PDF

2014

  • Daniela Majchráková, Ondřej Dušek, Jan Hajič, Agáta Karčová, and Radovan Garabík. Semi-automatic Detection of Multiword Expressions in the Slovak Dependency Treebank, In: Computational Linguistics in Bulgaria, Sofia. PDF
  • Daniel Zeman, Ondřej Dušek, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, and Jan Hajič. HamleDT: Harmonized multi-language dependency treebank, In: Language Resources and Evaluation 48(4), December 2014. SpringerLink
  • Ondřej Dušek, Ondřej Plátek, Lukáš Žilka, and Filip Jurčíček. Alex: Bootstrapping a Spoken Dialogue System for a New Domain by Real Users, In: SIGDIAL, Philadelphia. PDF Poster
  • Ondřej Dušek, Jan Hajič, Jaroslava Hlaváčová, Michal Novák, Pavel Pecina, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová, and Daniel Zeman. Machine Translation of Medical Texts in the Khresmoi Project, In: WMT, Baltimore. PDF
  • Ondřej Dušek, Jan Hajič, and Zdeňka Urešová. Verbal Valency Frame Detection and Selection in Czech and English, In: EVENTS, Baltimore. PDF Poster
  • Pavel Pecina, Ondřej Dušek, Lorraine Goeuriot, Jan Hajič, Jaroslava Hlaváčová, Gareth Jones, Liadh Kelly, Johannes Leveling, David Mareček, Michal Novák, Martin Popel, Rudolf Rosa, Aleš Tamchyna, and Zdeňka Urešová. Adaptation of Machine Translation for Multilingual Information Retrieval in the Medical Domain, In: Artificial Intelligence in Medicine 61(3). ScienceDirect
  • Matěj Korvas, Ondřej Plátek, Ondřej Dušek, Lukáš Žilka, and Filip Jurčíček. Free English and Czech Telephone Speech Corpus Shared Under the CC-BY-SA 3.0 License, In: LREC, Reykjavík. PDF Slides
  • Zdeňka Urešová, Ondřej Dušek, Jan Hajič, and Pavel Pecina. Multilingual Test Sets for Machine Translation of Search Queries for Cross-lingual Information Retrieval in the Medical Domain, In: LREC, Reykjavík. PDF Poster

2013

  • Ondřej Dušek and Filip Jurčíček. Robust Multilingual Statistical Morphological Generation Models, In: ACL Student Research Workshop, Sofia. PDF Slides Video Github
  • Ondřej Dušek. Towards a Truly Statistical Natural Language Generator for Spoken Dialogues, In: Week of Doctoral Students, Prague. PDF Slides
  • Aleš Tamchyna, Ondřej Dušek, Rudolf Rosa, and Pavel Pecina. MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service, In: The Prague Bulletin of Mathematical Linguistics 100, 31–40. PDF Poster Github

2012

  • Ondřej Dušek, Zdeněk Žabokrtský, Martin Popel, Martin Majliš, Michal Novák, and David Mareček. Formemes in English-Czech Deep Syntactic MT, In: WMT, Montréal. PDF
  • Rudolf Rosa, David Mareček, and Ondřej Dušek. DEPFIX: A System for Automatic Correction of Czech MT Outputs, In: WMT, Montréal. PDF
  • Rudolf Rosa, Ondřej Dušek, David Mareček, and Martin Popel. Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors, In: SSST-6, Jeju. PDF
  • Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, and Aleš Tamchyna. The Joy of Parallelism with CzEng 1.0, In: LREC, Istanbul. PDF

Theses

  • Novel Methods for Natural Language Generation in Spoken Dialogue Systems. Ph.D. thesis, Faculty of Mathematics and Physics, Charles University in Prague, 2017. PDF Summary PDF slides
  • Confrontation of Czech and German valency lexicons. Master's thesis, Faculty of Arts, Charles University in Prague, 2013. (in German) PDF
  • Deep automatic analysis of English. Master's thesis, Faculty of Mathematics and Physics, Charles University in Prague, 2010. PDF
  • BashCommander. Bachelor's thesis, Faculty of Mathematics and Physics, Charles University in Prague, 2007. PDF

Talks

  • Challenges in Response Generation and Conversational AI. ILCC/HCRC Seminar, University of Edinburgh. Sep 14, 2018. PPTX Slides (24MB)
  • Can You Be Friends with a Smart Speaker Device? Pint of Science Festival, Edinburgh. May 15, 2018. PPTX Slides (63MB)
  • Sequence-to-sequence Natural Language Generation. University of Sheffield. Jun 1, 2017. Slides
  • Home Intelligent? Assistants. Edinburgh Science Festival. Apr 8, 2017. PPTX Slides (63MB)
  • Sequence-to-sequence Natural Language Generation for Spoken Dialogue Systems. ÚFAL Monday seminar. Mar 28, 2017. Slides Video
  • Sequence-to-sequence Natural Language Generation. HWU Interaction Lab meeting. Nov 16, 2016. Slides
  • Sequence-to-sequence Natural Language Generation. DILiGENt project meeting. Nov 10, 2016. Slides
  • Natural Language Generation (Mostly) for Spoken Dialogue Systems. Lecture in Filip Jurčíček's Statistical Dialogue Systems Course. May 11, 2016. Slides
  • Natural Language Generation for Spoken Dialogue Systems. Lecture in Filip Jurčíček's Statistical Dialogue Systems Course. May 14, 2015. Slides
  • A Two-stage Syntax-based Natural Language Generator. ÚFAL Monday seminar. Mar 9, 2015. Slides Video
  • Tecto to AMR and Translation (with Tim O'Gorman and others). JHU/CLSP Fred Jelinek Memorial PIRE Workshop. Aug 1, 2014. Slides Video
  • Ein Vergleich der deutschen und tschechischen Valenzwörterbücher durch Korpusanalyse und Befragung unter Linguisten (A Comparison of German and Czech Valency Lexicons through Corpus Analysis and a Survey among Linguists). The 4th PRAGESTT Students' German Philology Conference. Mar 21, 2014. (in German) Slides Handout
  • Learning Morphology from the Corpus. ÚFAL Monday seminar. Nov 11, 2013. Slides Video
  • Natural Language Generation (Not Only) in Dialogue Systems. Lecture in Filip Jurčíček's Statistical Dialogue Systems Course. May 22, 2013. Slides