Computational Linguistics at Manitoba

About CLAM

The Computational Linguistics at Manitoba (CLAM) Lab advances research at the confluence of artificial intelligence and linguistics:

  • We develop digital tools, methods, and resources that help linguists formulate and test linguistic theories.
  • We develop natural language processing systems that help digital humanists and computational social scientists make sense of large, unstructured text collections.
  • We develop interactive language technology that supports the work of writers, translators, and other professional knowledge workers.

Team

CLAM is headed by Dr. Tristan Miller at the University of Manitoba's Department of Computer Science.

The Lab maintains close working relationships with the Austrian Research Institute for Artificial Intelligence (OFAI) and the Semantic Artificial Intelligence and Creativity Laboratory (SAICL) at Texas A&M University–Commerce.

Interested in joining our team? We are currently offering funded PhD positions for research topics in computational humour, historical born-digital corpora, and Indigenous language technology.

Featured publications

Liana Ermakova, Anne-Gwenn Bosser, Tristan Miller, Victor Manuel Palma Preciado, Grigori Sidorov, and Adam Jatowt.
Overview of the CLEF 2024 JOKER track: Automatic humour analysis.
In Lorraine Goeuriot, Philippe Mulhem, Georges Quénot, Didier Schwab, Laure Soulier, Giorgio Maria Di Nunzio, Petra Galuščáková, Alba García Seco de Herrera, Guglielmo Faggioli, and Nicola Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), volume 14959 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 165–182, Cham, September 2024. Springer. ISBN 978-3-031-71907-3. DOI: 10.1007/978-3-031-71908-0_8.
The JOKER Lab series at the Conference and Labs of the Evaluation Forum (CLEF) was established in 2022 to promote collaborative, interdisciplinary research on the automated processing of wordplay and verbal humour. This paper provides an overview of the setup and results of the Lab's 2024 edition. We describe the data and evaluation metrics used for the Lab's three shared tasks (on humour-aware information retrieval, humour classification according to genre and technique, and translation of puns from English to French) and introduce and compare the systems that participated in each task, with particular attention to their approaches and performance.
@inproceedings{ermakova2024overview,
author       = {Liana Ermakova and Anne-Gwenn Bosser and Tristan Miller and Victor Manuel {Palma Preciado} and Grigori Sidorov and Adam Jatowt},
editor       = {Lorraine Goeuriot and Philippe Mulhem and Georges Quénot and Didier Schwab and Laure Soulier and Giorgio Maria Di Nunzio and Petra Galuščáková and Alba García Seco de Herrera and Guglielmo Faggioli and Nicola Ferro},
title        = {Overview of the {CLEF} 2024 {JOKER} Track: Automatic Humour Analysis},
booktitle    = {Experimental {IR} Meets Multilinguality, Multimodality, and Interaction: Proceedings of the {Fifteenth} {International} {Conference} of the {CLEF} {Association} ({CLEF} 2024)},
volume       = {14959},
pages        = {165--182},
series       = {Lecture Notes in Computer Science},
month        = sep,
year         = {2024},
publisher    = {Springer},
address      = {Cham},
isbn         = {978-3-031-71907-3},
issn         = {0302-9743},
doi          = {10.1007/978-3-031-71908-0_8},
}

Clara Swaboda and Tristan Miller.
On the use of scale distortion for visual humour: A preliminary analysis.
European Journal of Humour Research, 12(2):206–211, June 2024. ISSN 2307-700X. DOI: 10.7592/EJHR.2024.12.2.904.
In contrast to verbal humour, visual humour remains a relatively underdeveloped area of research. In this exploratory study, we investigate whether scale incongruity – i.e., discrepancy between the expected and actual experience of the size of an object – can serve as a source of humour in the visual modality. We adapt a pre-existing visual data set of mundane scenes by altering the size of an individual object in each scene and collecting humorousness ratings from human annotators on the original and scale-distorted versions. Our analysis of these annotations reveals that scenes with distorted objects are perceived to be significantly funnier than the original images.
@article{swaboda2024use,
author       = {Clara Swaboda and Tristan Miller},
title        = {On the Use of Scale Distortion for Visual Humour: {A} Preliminary Analysis},
journal      = {European Journal of Humour Research},
volume       = {12},
number       = {2},
pages        = {206--211},
month        = jun,
year         = {2024},
issn         = {2307-700X},
doi          = {10.7592/EJHR.2024.12.2.904},
}

Tristan Miller.
Preparing Horizon Europe proposals in LaTeX with heria.
TUGboat: The Communications of the TeX Users Group, 45(1):59–64, April 2024. ISSN 0896-3207. DOI: 10.47397/tb/45-1/tb139miller-horizon.
This article introduces heria, a LaTeX class to format funding proposals for the European Commission's Horizon Europe program. It provides a basic summary of the class's use; compares it to existing packages for funding proposals; discusses its motivations, design decisions, and limitations; and reports on its real-world use and plans for future development. Besides providing prospective Horizon Europe applicants with an overview of the class, this article may give prospective developers and users of classes for other proposal types some idea of the work involved and the potential pitfalls.
@article{miller2024preparing,
author       = {Tristan Miller},
title        = {Preparing {Horizon} {Europe} Proposals in {\LaTeX}{} with {heria}},
journal      = {{TUGboat}: The Communications of the {\TeX}{} {Users} {Group}},
volume       = {45},
number       = {1},
pages        = {59--64},
month        = apr,
year         = {2024},
issn         = {0896-3207},
doi          = {10.47397/tb/45-1/tb139miller-horizon},
}

Liana Ermakova, Anne-Gwenn Bosser, Adam Jatowt, and Tristan Miller.
The JOKER Corpus: English–French parallel data for multilingual wordplay recognition.
In SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2796–2806, New York, NY, July 2023. Association for Computing Machinery. ISBN 978-1-4503-9408-6. DOI: 10.1145/3539618.3591885.
Despite recent advances in information retrieval and natural language processing, rhetorical devices that exploit ambiguity or subvert linguistic rules remain a challenge for such systems. However, corpus-based analysis of wordplay has been a perennial topic of scholarship in the humanities, including literary criticism, language education, and translation studies. The immense data-gathering effort required for these studies points to the need for specialized text retrieval and classification technology, and consequently for appropriate test collections. In this paper, we introduce and analyze a new dataset for research and applications in the retrieval and processing of wordplay. Developed for the JOKER track at CLEF 2023, our annotated corpus extends and improves upon past English wordplay detection datasets in several ways. First, we introduce hundreds of additional positive examples; second, we provide French translations for the examples; and third, we provide negative examples with characteristics closely matching those of the positive examples. This last feature helps ensure that AI models learn to effectively distinguish wordplay from non-wordplay, and not simply texts differing in length, style, or vocabulary. Our test collection represents then a step towards wordplay-aware multilingual information retrieval.
@inproceedings{ermakova2023joker,
author       = {Liana Ermakova and Anne-Gwenn Bosser and Adam Jatowt and Tristan Miller},
title        = {The {JOKER} {Corpus}: {English}--{French} Parallel Data for Multilingual Wordplay Recognition},
booktitle    = {{SIGIR} '23: Proceedings of the 46th {International} {ACM} {SIGIR} {Conference} on {Research} and {Development} in {Information} {Retrieval}},
pages        = {2796--2806},
month        = jul,
year         = {2023},
publisher    = {Association for Computing Machinery},
address      = {New York, NY},
isbn         = {978-1-4503-9408-6},
doi          = {10.1145/3539618.3591885},
}