Computational Linguistics at Manitoba

About CLAM

The Computational Linguistics at Manitoba (CLAM) Lab advances research at the confluence of artificial intelligence and linguistics:

  • We develop digital tools, methods, and resources that help linguists develop and test linguistic theories.
  • We develop natural language processing systems that help digital humanists and computational social scientists make sense of large, unstructured text collections.
  • We develop interactive language technology that supports the work of writers, translators, and other professional knowledge workers.

Team

CLAM is headed by Dr. Tristan Miller at the University of Manitoba's Department of Computer Science.

The Lab maintains close working relationships with the Austrian Research Institute for Artificial Intelligence (OFAI) and the Semantic Artificial Intelligence and Creativity Laboratory (SAICL) at East Texas A&M University.

Interested in joining our team? We are currently offering funded PhD positions for research topics in computational humour, historical born-digital corpora, and Indigenous language technology.

Featured publications

Liana Ermakova, Ricardo Campos, Anne-Gwenn Bosser, and Tristan Miller.
Overview of JOKER: Humour in the machine.
In Jorge Carrillo de Albornoz, Julio Gonzalo, Laura Plaza, Alba García Seco de Herrera, Josiane Mothe, Florina Piroi, Paolo Rosso, Damiano Spina, Guglielmo Faggioli, and Nicola Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction: 16th International Conference of the CLEF Association, CLEF 2025, Madrid, Spain, September 9–12, 2025, Proceedings, volume 16089 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 315–337, Cham, 2025. Springer. ISBN 978-3-032-04353-5. DOI: 10.1007/978-3-032-04354-2_18.
Humour poses a unique challenge for artificial intelligence, as it often relies on non-literal language, cultural references, and linguistic creativity. The JOKER Lab, now in its fourth year, aims to advance computational humour research through shared tasks on curated, multilingual datasets, with applications in education, computer-mediated communication and translation, and conversational AI. This paper provides an overview of the JOKER Lab held at CLEF 2025, detailing the setup and results of its three main tasks: (1) humour-aware information retrieval, which involves searching a document collection for humorous texts relevant to user queries in either English or Portuguese; (2) pun translation, focussed on humour-preserving translation of paronomastic jokes from English into French; and (3) onomastic wordplay translation, a task addressing the translation of name-based wordplay from English into French. The 2025 edition builds upon previous iterations by expanding datasets and emphasising nuanced, manual evaluation methods. The Task 1 results show a marked improvement this year, apparently due to participants' judicious combination of retrieval and filtering techniques. Tasks 2 and 3 remain challenging, not only in terms of system performance but also in terms of defining meaningful and reliable evaluation metrics.
@inproceedings{ermakova2025overview,
author       = {Liana Ermakova and Ricardo Campos and Anne-Gwenn Bosser and Tristan Miller},
editor       = {Jorge Carrillo-de-Albornoz and Julio Gonzalo and Laura Plaza and Alba García Seco de Herrera and Josiane Mothe and Florina Piroi and Paolo Rosso and Damiano Spina and Guglielmo Faggioli and Nicola Ferro},
title        = {Overview of {JOKER}: Humour in the Machine},
booktitle    = {Experimental {IR} Meets Multilinguality, Multimodality, and Interaction: 16th {International} {Conference} of the {CLEF} {Association}, {CLEF}~2025, {Madrid}, {Spain}, {September}~9--12, 2025, Proceedings},
volume       = {16089},
pages        = {315--337},
series       = {Lecture Notes in Computer Science},
year         = {2025},
publisher    = {Springer},
address      = {Cham},
isbn         = {978-3-032-04353-5},
issn         = {0302-9743},
doi          = {10.1007/978-3-032-04354-2_18},
}

Anna Palmann and Tristan Miller.
What's in a pun? Assessing the relationship between phonological distance and perceived funniness of punning jokes.
Humor: International Journal of Humor Research, 38(4), 2025. ISSN 0933-1719. DOI: 10.1515/humor-2024-0060. To appear.
Punning is a form of humorous wordplay based on semantic ambiguity between two phonologically similar words – the pun and the target – in a context where both meanings are more or less acceptable. While the pun is expressed explicitly, the target is invoked implicitly in the text. Previous work has attempted to quantify and compare phonological features of puns and their targets, looking at correlations with the understandability of the jokes in which they occur. Our study quantifies the phonological distance between pun and target words and assesses possible correlations with funniness ratings of the corresponding jokes. Our statistical analyses on a large dataset of puns reveal a significant negative correlation between phonological distance and perceived funniness for two of the four phonological distance measures we applied. This finding supports the hypothesis, often (implicitly) made in previous research but never verified at this scale, that lower phonological distance between a pun and its target is associated with higher funniness ratings. The parameters of our study suggest that future work should examine the semantic features of pun and target in order to create a more holistic understanding of what contributes to the perceived funniness of punning jokes.
@article{palmann2025whats,
author       = {Anna Palmann and Tristan Miller},
title        = {What's in a Pun? Assessing the Relationship Between Phonological Distance and Perceived Funniness of Punning Jokes},
journal      = {Humor: International Journal of Humor Research},
volume       = {38},
number       = {4},
year         = {2025},
issn         = {0933-1719},
doi          = {10.1515/humor-2024-0060},
note         = {To appear},
}

Steffen Eger, Yong Cao, Jennifer D'Souza, Andreas Geiger, Christian Greisinger, Stephanie Gross, Yufang Hou, Brigitte Krenn, Anne Lauscher, Yizhi Li, Chenghua Lin, Nafise Sadat Moosavi, Wei Zhao, and Tristan Miller.
Transforming science with large language models: a survey on AI-assisted scientific discovery, experimentation, content generation, and evaluation.
ArXiv e-prints, 2502.05151, February 2025. DOI: 10.48550/arXiv.2502.05151.
With the advent of large multimodal language models, science is now at a threshold of an AI-based technological transformation. Recently, a plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. This includes all aspects of the research cycle, especially (1) searching for relevant literature; (2) generating research ideas and conducting experimentation; generating (3) text-based and (4) multimodal content (e.g., scientific figures and diagrams); and (5) AI-based automatic peer review. In this survey, we provide an in-depth overview over these exciting recent developments, which promise to fundamentally alter the scientific research process for good. Our survey covers the five aspects outlined above, indicating relevant datasets, methods and results (including evaluation) as well as limitations and scope for future research. Ethical concerns regarding shortcomings of these tools and potential for misuse (fake science, plagiarism, harms to research integrity) take a particularly prominent place in our discussion. We hope that our survey will not only become a reference guide for newcomers to the field but also a catalyst for new AI-based initiatives in the area of “AI4Science”.
@article{eger2025transforming,
author       = {Steffen Eger and Yong Cao and Jennifer D'Souza and Andreas Geiger and Christian Greisinger and Stephanie Gross and Yufang Hou and Brigitte Krenn and Anne Lauscher and Yizhi Li and Chenghua Lin and Nafise Sadat Moosavi and Wei Zhao and Tristan Miller},
title        = {Transforming Science with Large Language Models: a Survey on {AI}-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation},
journal      = {{ArXiv} e-prints},
volume       = {2502.05151},
month        = feb,
year         = {2025},
doi          = {10.48550/arXiv.2502.05151},
}

Christian F. Hempelmann, Julia Rayz, Tiansi Dong, and Tristan Miller, editors.
Proceedings of the 1st Workshop on Computational Humor (CHum).
Association for Computational Linguistics, Kerrville, TX, January 2025. ISBN 979-8-89176-204-6.
@book{hempelmann2025first,
editor       = {Christian F. Hempelmann and Julia Rayz and Tiansi Dong and Tristan Miller},
title        = {Proceedings of the 1st Workshop on Computational Humor ({CHum})},
month        = jan,
year         = {2025},
publisher    = {Association for Computational Linguistics},
address      = {Kerrville, TX},
isbn         = {979-8-89176-204-6},
}