Experiments in Euphemism Detection
Anna Feldman, Montclair State University
November 30, 2022 · 4:30–6:00 pm · 1-S-5 Green Hall
Program in Linguistics
To fully understand human language, machines need to be able to recognize and interpret expressions that carry hidden meanings. This project concentrates on euphemisms: mild or indirect phrases used in place of harsher or more offensive ones. Euphemisms are often used to mask profanity or to refer politely to sensitive topics such as death, sex, religion, disability, or personal relationships. People use euphemisms all the time, e.g., ‘negative patient outcome’, ‘between jobs’, ‘financially fortunate’, ‘correctional facility’, ‘friendly fire’, or ‘sunshine unit’. Different cultures and languages use different euphemisms, and euphemisms change over time. Machines that process human language do not yet understand euphemisms.
In this talk, I will present our work in progress: a linguistically driven proof of concept for automatically detecting euphemisms. We are using linguistic insights to build an algorithm that detects new euphemisms, ones not previously recorded in dictionaries, without human intervention.
The main observations are: 1) euphemistic expressions and their paraphrased counterparts differ in the strength of the sentiment they convey; 2) whether an expression receives a euphemistic or a non-euphemistic interpretation depends on context; and 3) euphemisms are vaguer than the taboo expressions they substitute. I will describe the corpus collection and annotation process, present a number of pilot experiments along with their results and analysis, and suggest future directions.
Anna Feldman is a professor of Linguistics and Computer Science at Montclair State University. She received her Ph.D. in computational linguistics from The Ohio State University, and her B.A. in English and East Asian Studies and M.A. in theoretical linguistics from The Hebrew University in Jerusalem. She is the author of A Resource-light Approach to Morpho-syntactic Tagging (Brill). Her most recent projects deal with the computational processing of figurative language and with Internet censorship. She has received nine NSF awards, and her work has also been supported by the Department of Defense and the Army Research Lab. She is a co-organizer of the annual workshop NLP4IF: Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda (http://www.netcopia.net/nlp4if/) and of a series of workshops on figurative language processing (https://sites.google.com/view/figlang2022). At Montclair, she directs the MA in Applied Linguistics and MS in Computational Linguistics programs and chairs the Linguistics Department. To learn more about her lab, visit https://sites.google.com/view/montclairnlplab/.