Ondřej Dušek

Charles University

Prague, Czechia

odusek@ufal.mff.cuni.cz


Me

I am an Assistant Professor at the Institute of Formal and Applied Linguistics, Charles University in Prague, working on the NG-NLG ERC project (new methods for natural language generation). My main research focus is machine learning and deep learning for natural language processing, especially neural language models for natural language generation (NLG) and human-computer dialogue (dialogue systems, chatbots). I previously worked in the Interaction Lab at Heriot-Watt University in Edinburgh, where I also co-advised the Heriot-Watt University team in the Amazon Alexa Prize chatbot competition.

I did my PhD on language generation for dialogue systems at the Institute of Formal and Applied Linguistics, Charles University in Prague, where I also worked on multiple grant projects related to dialogue systems and machine translation.

Projects

Research projects I work on and have worked on:

Current

  • NG-NLG – Language generation with neural & symbolic methods (ERC StG, 2022–2027)
  • THEaiTRE – Automatically generating a theatre play (Czech Technical Agency, 2020–2022)
  • EDU-AI – Education chatbot assistant (Czech Technical Agency, 2021–2023)

Past

  • NaMuDDiS – multi-domain dialogue systems (2019–2021)
  • METOD – dialogue management (industry cooperation with Agnostix, co-funded by the City of Prague, 2020)
  • MaDrIgAL – spoken dialogue systems and natural language generation (2016–2018)
  • Alexa Prize Challenge – chatbots (2017–2018)
  • DILiGENt – natural language generation (2016–2018)
  • AdaNLG – adaptive natural language generator (2014–2016)
  • Vystadial – spoken dialogue systems (2012–2016)
  • QTLeap – semantic machine translation (2013–2016)
  • Khresmoi – medical information retrieval (working on machine translation, 2013–2014)
  • FAUST – improving machine translation fluency (2011–2013)

Tools

Open-source software I (co-)built:

  • RatPred – trainable NLG evaluation tool
  • TGen – a statistical natural language generator
  • Treex – a modular NLP toolkit
  • Alex – spoken dialogue system framework
  • MTMonkey – machine translation web services infrastructure
  • Flect – statistical morphology generation

Students

Students I supervise:

  • Vojtěch Hudeček (Ph.D. with Zdeněk Žabokrtský, since 2018)
  • Zdeněk Kasner (Ph.D. since 2019)
  • Sourabrata Mukherjee (Ph.D. since 2019)
  • Daniel Štancl (Ph.D. since 2020)
  • Ondřej Plátek (Ph.D. since 2021)
  • Patrícia Schmidtová (completed BSc. with Vojtěch Hudeček, 2018–2019; MSc. since 2020)
  • Jaroslav Šafář (MSc. since 2021)
  • Ondřej Motlíček (MSc. since 2021)
  • František Trebuňa (MSc. since 2021)
  • Jiří Balhar (MSc. since 2022)
  • Peter Grajcar (MSc. since 2022)
  • Nalin Kumar (MSc. since 2022)
  • Kristína Szabová (MSc. since 2022)
  • Saad Obaid (MSc., LCT with Uni Saarbrücken, with Vera Demberg & Iza Škrjanec, since 2022)
  • Jakub Růžička (BSc. with Jan Cuřín & Martin Čmejrek, since 2022)

My former students:

  • Shubham Agarwal (Ph.D. at Heriot-Watt, with Verena Rieser & Ioannis Konstas, 2017–2019)
  • Vojtěch John (completed BSc. 2021–2022)
  • Jonáš Kulhánek (completed MSc. 2020–2021)
  • Tomáš Nekvinda (completed MSc. 2019–2020; Ph.D. 2020–2022)
  • Borek Požár (completed BSc. with Martin Čmejrek & Jan Cuřín, 2020–2021)
  • Jan Vainer (completed MSc. 2019–2020)
  • Xinnuo Xu (completed Ph.D., CDT Robotics Edinburgh, with Verena Rieser & Ioannis Konstas, 2016–2021)

If you're interested in doing a bachelor's/master's thesis or a PhD with me, please email me.

Publications

Papers

2021

  • Jonáš Kulhánek, Vojtěch Hudeček, Tomáš Nekvinda, Ondřej Dušek. AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models, In: NLP4ConvAI Workshop. arXiv
  • Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas. MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization, In: EMNLP Findings. Anthology
  • Emiel van Miltenburg, Miruna Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen. Underreporting of errors in NLG output, and what to do about it, In: INLG (Commendation for an outstanding position paper). Anthology
  • Zdeněk Kasner, Simon Mille and Ondřej Dušek. Text-in-Context: Token-Level Error Detection for Table-to-Text Generation, In: INLG. Anthology Poster
  • Vojtěch Hudeček, Ondřej Dušek and Zhou Yu. Discovering Dialogue Slots with Weak Supervision, In: ACL. Anthology
  • Xinnuo Xu, Ondřej Dušek, Verena Rieser and Ioannis Konstas. AggGen: Ordering and Aggregating while Generating, In: ACL. Anthology
  • Tomáš Nekvinda and Ondřej Dušek. Shades of BLEU, Flavours of Success: The Case of MultiWOZ, In: GEM Workshop. Anthology
  • Sebastian Gehrmann et al. (50+ authors). The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics, In: GEM Workshop. Anthology
  • Léon-Paul Schaub, Vojtěch Hudeček, Daniel Štancl, Ondřej Dušek and Patrick Paroubek. Defining And Detecting Inconsistent System Behavior in Task-oriented Dialogues, In: TALN-RECITAL. Anthology

2020

  • Ondřej Dušek and Zdeněk Kasner. Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference, In: INLG (Best Paper Award). ACL Anthology Video Code
  • Zdeněk Kasner and Ondřej Dušek. Data-to-Text Generation with Iterative Text Editing, In: INLG. ACL Anthology
  • Zdeněk Kasner and Ondřej Dušek. Train Hard, Finetune Easy: Multilingual Denoising for RDF-to-Text Generation, In: WebNLG+ Workshop. PDF
  • Jindřich Libovický, Zdeněk Kasner, Jindřich Helcl, and Ondřej Dušek. Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task, In: WNGT Workshop. ACL Anthology
  • Tomáš Nekvinda and Ondřej Dušek. One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech, In: Interspeech. ISCA Archive Code, samples and demo
  • Jan Vainer and Ondřej Dušek. SpeedySpeech: Efficient Neural Speech Synthesis, In: Interspeech. ISCA Archive Code and samples
  • Xinnuo Xu, Ondřej Dušek, Jingyi Li, Verena Rieser, and Ioannis Konstas. Fact-based Content Weighting for Evaluating Abstractive Summarisation, In: ACL. ACL Anthology Video Code

2019

  • Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge, In: Computer Speech and Language. ScienceDirect arXiv Web
  • Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, and Verena Rieser. Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking), In: INLG, Tokyo. arXiv slides Github
  • Ondřej Dušek, David M. Howcroft, and Verena Rieser. Semantic Noise Matters for Neural Natural Language Generation, In: INLG, Tokyo. PDF poster Github
  • Ondřej Dušek and Filip Jurčíček. Neural Generation for Czech: Data and Baselines, In: INLG, Tokyo. arXiv slides Github (data) Github (code)
  • Simon Keizer, Ondřej Dušek, Xingkun Liu, and Verena Rieser. User Evaluation of a Multi-dimensional Statistical Dialogue System, In: SIGDIAL, Stockholm. ACL arXiv Poster code

2018

  • Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. Findings of the E2E NLG Challenge, In: INLG, Tilburg. arXiv Web Slides
  • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, and Verena Rieser. Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity, In: EMNLP, Brussels. arXiv Github Poster
  • Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. RankME: Reliable Human Ratings for Natural Language Generation, In: NAACL, New Orleans. arXiv Poster Github
  • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, and Verena Rieser. Improving Context Modelling in Multimodal Dialogue Generation, In: INLG, Tilburg. arXiv Github Poster
  • Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, and Verena Rieser. A Knowledge-Grounded Multimodal Search-Based Conversational Agent, In: SCAI EMNLP workshop, Brussels. arXiv Github Poster
  • Igor Shalyminov, Ondřej Dušek, and Oliver Lemon. Neural Response Ranking for Social Conversation: A Data-Efficient Approach, In: SCAI EMNLP workshop, Brussels. arXiv Github Slides

2017

  • Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. Referenceless Quality Estimation for Natural Language Generation, In: ICML Workshop on Learning to Generate Natural Language, Sydney. arXiv Poster Slides Github
  • Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, and Verena Rieser. Why We Need New Evaluation Metrics for NLG, In: EMNLP, Copenhagen. arXiv Github
  • Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. The E2E Dataset: New Challenges For End-to-End Generation, In: SIGDIAL, Saarbrücken. arXiv Poster Slides Video Web
  • Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. Data-driven Natural Language Generation: Paving the Road to Success, In: WiNLP, Vancouver. arXiv
  • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, and Oliver Lemon. An Ensemble Model with Ranking for Social Dialogue, In: NIPS 2017 Workshop on Conversational AI, Long Beach. arXiv
  • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, and Oliver Lemon. Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback, In: Alexa Prize, Las Vegas. PDF

2016

  • Ondřej Dušek and Filip Jurčíček. A Context-aware Natural Language Generator for Dialogue Systems, In: SIGDIAL, Los Angeles. PDF arXiv Github
  • Ondřej Dušek and Filip Jurčíček. Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings, In: ACL, Berlin. PDF arXiv Github
  • Ondřej Bojar, Ondřej Dušek, Tom Kocmi, Jindřich Libovický, Michal Novák, Martin Popel, Roman Sudarikov, and Dušan Variš. CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered, In: TSD, Brno. SpringerLink
  • Rudolf Rosa, Martin Popel, Ondřej Bojar, David Mareček, and Ondřej Dušek. Moses & Treex Hybrid MT Systems Bestiary, In: Deep MT Workshop, Lisbon. PDF
  • Roman Sudarikov, Ondřej Bojar, Ondřej Dušek, Martin Holub, and Vincent Kríž. Verb Sense Disambiguation in Machine Translation, In: HyTra-6, Osaka. PDF
  • Ondřej Dušek and Filip Jurčíček. A Context-aware Natural Language Generation Dataset for Dialogue Systems, In: RE-WOCHAT, Portorož. PDF Slides

2015

  • Rudolf Rosa, Ondřej Dušek, Michal Novák, and Martin Popel. Translation Model Interpolation for Domain Adaptation in TectoMT, In: Deep MT Workshop, Prague. PDF Slides
  • Ondřej Dušek, Luís Gomes, Michal Novák, Martin Popel, and Rudolf Rosa. New Language Pairs in TectoMT, In: WMT, Lisbon. PDF Poster
  • Ondřej Dušek and Filip Jurčíček. Training a Natural Language Generator from Unaligned Data, In: ACL-IJCNLP, Beijing. PDF Slides Poster Video Github
  • Ondřej Dušek, Eva Fučíková, Jan Hajič, Martin Popel, Jana Šindlerová, and Zdeňka Urešová. Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation, In: Depling, Uppsala. PDF Slides
  • Zdeňka Urešová, Ondřej Dušek, Eva Fučíková, Jan Hajič, and Jana Šindlerová. Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus, In: LAW IX, Denver. PDF

2014

  • Daniela Majchráková, Ondřej Dušek, Jan Hajič, Agáta Karčová, and Radovan Garabík. Semi-automatic Detection of Multiword Expressions in the Slovak Dependency Treebank, In: Computational Linguistics in Bulgaria, Sofia. PDF
  • Daniel Zeman, Ondřej Dušek, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský and Jan Hajič. HamleDT: Harmonized multi-language dependency treebank, In: Language Resources and Evaluation 48(4), December 2014. SpringerLink
  • Ondřej Dušek, Ondřej Plátek, Lukáš Žilka, and Filip Jurčíček. Alex: Bootstrapping a Spoken Dialogue System for a New Domain by Real Users, In: SIGDIAL, Philadelphia. PDF Poster
  • Ondřej Dušek, Jan Hajič, Jaroslava Hlaváčová, Michal Novák, Pavel Pecina, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová and Daniel Zeman. Machine Translation of Medical Texts in the Khresmoi Project, In: WMT, Baltimore. PDF
  • Ondřej Dušek, Jan Hajič, and Zdeňka Urešová. Verbal Valency Frame Detection and Selection in Czech and English, In: EVENTS, Baltimore. PDF Poster
  • Pavel Pecina, Ondřej Dušek, Lorraine Goeuriot, Jan Hajič, Jaroslava Hlaváčová, Gareth Jones, Liadh Kelly, Johannes Leveling, David Mareček, Michal Novák, Martin Popel, Rudolf Rosa, Aleš Tamchyna, and Zdeňka Urešová. Adaptation of Machine Translation for Multilingual Information Retrieval in the Medical Domain, In: Artificial Intelligence in Medicine 61(3). ScienceDirect
  • Matěj Korvas, Ondřej Plátek, Ondřej Dušek, Lukáš Žilka, and Filip Jurčíček. Free English and Czech Telephone Speech Corpus Shared Under the CC-BY-SA 3.0 License, In: LREC, Reykjavík. PDF Slides
  • Zdeňka Urešová, Ondřej Dušek, Jan Hajič, and Pavel Pecina. Multilingual Test Sets for Machine Translation of Search Queries for Cross-lingual Information Retrieval in the Medical Domain, In: LREC, Reykjavík. PDF Poster

2013

  • Ondřej Dušek and Filip Jurčíček. Robust Multilingual Statistical Morphological Generation Models, In: ACL Student Research Workshop, Sofia. PDF Slides Video Github
  • Ondřej Dušek. Towards a Truly Statistical Natural Language Generator for Spoken Dialogues, In: Week of Doctoral Students, Prague. PDF Slides
  • Aleš Tamchyna, Ondřej Dušek, Rudolf Rosa, and Pavel Pecina. MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service, In: The Prague Bulletin of Mathematical Linguistics 100, 31–40. PDF Poster Github

2012

  • Ondřej Dušek, Zdeněk Žabokrtský, Martin Popel, Martin Majliš, Michal Novák, and David Mareček. Formemes in English-Czech Deep Syntactic MT, In: WMT, Montréal. PDF
  • Rudolf Rosa, David Mareček, and Ondřej Dušek. DEPFIX: A System for Automatic Correction of Czech MT Outputs, In: WMT, Montréal. PDF
  • Rudolf Rosa, Ondřej Dušek, David Mareček, and Martin Popel. Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors, In: SSST-6, Jeju. PDF
  • Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, and Aleš Tamchyna. The Joy of Parallelism with CzEng 1.0, In: LREC, Istanbul. PDF

Theses

  • Novel Methods for Natural Language Generation in Spoken Dialogue Systems. Ph.D. Thesis, Faculty of Mathematics and Physics, Charles University, Prague, 2017. PDF Summary PDF slides
  • Confrontation of Czech and German valency lexicons. Master's thesis, Faculty of Arts, Charles University in Prague, 2013. (in German) PDF
  • Deep automatic analysis of English. Master's thesis, Faculty of Mathematics and Physics, Charles University in Prague, 2010. PDF
  • BashCommander. Bachelor's thesis, Faculty of Mathematics and Physics, Charles University in Prague, 2007. PDF

Talks

  • Large Neural Language Models for Data-to-text Generation. AICZECHIA Seminar, Online. Mar 22, 2022 PDF slides
  • Better Supervision for End-to-end Neural Dialogue Systems. VSG Invited Talks @ FIT, Brno University of Technology. Dec 1, 2021 Web PDF slides Video
  • Accuracy in Neural Text Generation. Heinrich-Heine University of Düsseldorf seminar on Selected Topic in Machine Learning and Natural Language Processing. Jul 23, 2021 PDF slides
  • Dialogue Systems at Charles University. Czechbots conference. Mar 3, 2020. PDF slides
  • Challenges in Neural NLG. ÚFAL Monday seminar. Dec 2, 2019. PDF slides
  • Challenges in Neural NLG. Apple Cambridge. Oct 16, 2019. PDF slides
  • Challenges in Response Generation and Conversational AI. ILCC/HCRC Seminar, University of Edinburgh. Sep 14, 2018. PPTX Slides (24MB)
  • Can You Be Friends with a Smart Speaker Device? Pint of Science Festival, Edinburgh. May 15, 2018. PPTX Slides (63MB)
  • Sequence-to-sequence Natural Language Generation. University of Sheffield. Jun 1, 2017. Slides
  • Home Intelligent? Assistants. Edinburgh Science Festival. Apr 8, 2017. PPTX slides (63MB)
  • Sequence-to-sequence Natural Language Generation for Spoken Dialogue Systems. ÚFAL Monday seminar. Mar 28, 2017. Slides Video
  • Sequence-to-sequence Natural Language Generation. HWU Interaction Lab meeting. Nov 16, 2016. Slides
  • Sequence-to-sequence Natural Language Generation. DILiGENt project meeting. Nov 10, 2016. Slides
  • Natural Language Generation (Mostly) for Spoken Dialogue Systems. Lecture in Filip Jurčíček's Statistical Dialogue Systems Course. May 11, 2016. Slides
  • Natural Language Generation for Spoken Dialogue Systems. Lecture in Filip Jurčíček's Statistical Dialogue Systems Course. May 14, 2015. Slides
  • A Two-stage Syntax-based Natural Language Generator. ÚFAL Monday seminar. Mar 9, 2015. Slides Video
  • Tecto to AMR and Translation (with Tim O'Gorman and others). JHU/CLSP Fred Jelinek Memorial PIRE Workshop. Aug 1, 2014. Slides Video
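  • Ein Vergleich der deutschen und tschechischen Valenzwörterbücher durch Korpusanalyse und Befragung unter Linguisten [A Comparison of German and Czech Valency Dictionaries through Corpus Analysis and a Survey among Linguists]. The 4th PRAGESTT Students' German Philology Conference. Mar 21, 2014. (in German) Slides Handout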
  • Learning Morphology from the Corpus. ÚFAL Monday seminar. Nov 11, 2013. Slides Video
  • Natural Language Generation (Not Only) in Dialogue Systems. Lecture in Filip Jurčíček's Statistical Dialogue Systems Course. May 22, 2013. Slides