R&D – Katarzyna Klessa

Annotation Pro

In 2013, I started work on the design and co-development of Annotation Pro, a software tool for the annotation of linguistic and paralinguistic features. The programme offers a multilayer annotation interface, a spectrogram display, graphical feature representation for annotation using continuous rating scales, and perception test options. Annotation Pro is freely available for research and education purposes, and it is continuously evolving: we are working on both its interface and its functionality on an ongoing basis. You can download the current version here: annotationpro.org/downloads. A number of plugins and external modules have been developed so far. One of them is ANNPRO, an automatic transcription and segmentation module available from the CLARIN-PL repository. Any feedback is very welcome!

Multimodal communication studies and Dariah-PL research infrastructures

I am a member of the MultiCo group at Adam Mickiewicz University in Poznan, Poland. The group is involved in projects aimed at developing research infrastructure and resources for the study of multimodal communication. The outcomes of the Dariah-Lab and Dariah-Hub projects include on-site laboratories for data collection and exploration, as well as IT resources and tools for studying a wide range of human-human and human-machine communication. Read more about the AMU modules for Dariah-Lab.

DELAD Initiative and CLARIN-PL programme board

I am an active member of the steering committee of DELAD. DELAD stands for Database Enterprise for Language And speech Disorders, and is also Swedish for SHARED. The initiative is currently supported by the European CLARIN ERIC, and it aims to share corpora of the speech of individuals with communication disorders (CSD) among researchers. My involvement in the initiative is a natural consequence of my keen interest in corpus-based research and in collaborative efforts towards scientific data interoperability and reusability. I am also a member of the programme board of CLARIN-PL, a Polish scientific consortium that builds scientific and technological infrastructure providing language resources and electronic tools for natural language processing.

Resources for the analysis of linguistic and paralinguistic features in speech

I have been involved in Borderland, a project aimed at the documentation and interdisciplinary analysis of phenomena related to interpersonal communication in the region of Słubice (Poland) and Frankfurt (Oder) (Germany), on the border of languages and cultures (http://borderland.amu.edu.pl/).

One of the corpora I co-developed in 2013 is the Paralingua corpus for the study of linguistic and paralinguistic features (cf. Klessa et al., 2013, published in the proceedings of CILC 2013). To obtain the corpus, it is sufficient to send a confirmation that you have read and accepted the licence as an e-mail attachment. Please specify whether you are interested in the EMO or the DIAL subcorpus.

Using electromagnetic articulography to investigate speaking styles, speech disorders and primary functions

Infant-directed speech (IDS) is a speaking style reported to involve certain facilitatory features that help infants to segment speech, distinguish between speech sounds, learn new phonological categories in their native language, become more successful in word and language learning, and develop better intersensory integration. Together with a team of researchers including Anita Lorenc and Łukasz Mik, we use a Carstens AG501 articulograph (at the Applied Phonetics Lab in Warsaw) to collect speech production data for IDS and ADS (adult-directed speech), as well as pilot datasets of disordered speech.

Endangered Languages – Corpora and Education Resources

Following my interests in various kinds of speech and language corpora, I have become involved in cooperation with a team of colleagues working on endangered languages within two projects: Dziedzictwo językowe Rzeczypospolitej. Baza dokumentacji zagrożonych języków – Poland’s Linguistic Heritage. Development of a Documentation Database for Endangered Languages (www.inne-jezyki.amu.edu.pl), and INNET – European Project for Endangered Languages Archive Network Management and Reinforcement. The products of the INNET project include the website languagesindanger.eu. As a member of the COLING Marie Skłodowska-Curie RISE (HORIZON 2020) project, I visited The University of Texas at Austin twice. There we developed interactive education tools and materials and documented the speech of the fifth generation of Polish Americans from the Chappell Hill area, whose community originated in Greater Poland. Read more on the Colingua platform.

Text and Speech Corpora for Speech Technology

In the years 2006–2010, I worked on research projects aimed at creating very large text and speech corpora for automatic speech synthesis and recognition for Polish. The resulting corpora include the Jurisdic acoustic database (approximately 2000 voices delivering read and semi-spontaneous speech, currently deposited at the Speech and Language Data Repository, SLDR/ORTOLANG) and the Speechlabs ASR lexical database (over 3 million phonetically transcribed vocabulary items, accompanied by inflection information).

Speech Prosody

I am interested in various types of prosodic phenomena, especially the temporal organization of spoken language, the functions of melody in speech, and the interactions between speech prosody and other aspects of human communication such as gesture or mimicry. In my MA project (2001), I focused on the acoustic-phonetic properties of vowels in the speech of hearing-impaired children, primarily the variability of vowel formant frequencies. My doctoral dissertation (2006) was dedicated to modelling Polish segmental duration for the purposes of speech synthesis. During my PhD studies, I participated in a research project on Polish intonation, which produced the first digitally recorded Polish corpus of (semi-)spontaneous task-oriented dialogues (PoInt: Polish Intonation Database).