In 2013, I started work on the design and co-development of Annotation Pro, a software tool for the annotation of linguistic and paralinguistic features. The programme offers a multilayer annotation interface, a spectrogram display, graphical feature representation for annotation using continuous rating scales, and perception test options. Annotation Pro is freely available for research and education purposes. It is continuously evolving: we are working on both its interface and its functionality. You can download the current version here: annotationpro.org/downloads. A number of plugins and external modules have been developed so far. One of them is ANNPRO, the automatic transcription and segmentation module, available from the CLARIN-PL repository. Any feedback is very welcome!
Resources for the analysis of linguistic and paralinguistic features in speech
Since 2014, I have been involved in Borderland, a project aimed at the documentation and interdisciplinary analysis of phenomena related to interpersonal communication in the region of Słubice (Poland) and Frankfurt (Oder) (Germany), on the border of languages and cultures (http://borderland.amu.edu.pl/).
One of the corpora I co-developed earlier, in 2013, is the Paralingua corpus for the study of linguistic and paralinguistic features (cf. Klessa et al., 2013, published in the proceedings of CILC 2013). If you are interested in using the Paralingua corpus for your research, please contact me at firstname.lastname@example.org. The corpus is freely available for non-commercial research purposes once you have confirmed that you have read and accepted the user’s licence. It is sufficient to send this confirmation as an e-mail attachment. Please specify whether you are interested in the EMO or the DIAL subcorpus.
Text and Speech Corpora for Speech Technology
From 2006 to 2010, I worked on research projects aimed at creating very large text and speech corpora for automatic speech synthesis and recognition for Polish. The resulting corpora include the Jurisdic acoustic database (approximately 2000 voices delivering read and semi-spontaneous speech, currently deposited at the Speech and Language Data Repository, SLDR/ORTOLANG) and the Speechlabs ASR lexical database (over 3 million phonetically transcribed vocabulary items, accompanied by inflection information).
Endangered Languages – Corpora and Education Resources
Following my interest in various kinds of speech and language corpora, I have also become involved in cooperation with a team of colleagues working on endangered languages within two projects: Dziedzictwo językowe Rzeczypospolitej. Baza dokumentacji zagrożonych języków (Poland’s Linguistic Heritage. Development of a Documentation Database for Endangered Languages): www.inne-jezyki.amu.edu.pl, and INNET, the European Project for Endangered Languages Archive Network Management and Reinforcement. One of the products of the INNET project is the website languagesindanger.eu.
I am interested in various types of prosodic phenomena, the temporal organization of spoken language, the functions of melody in speech, and the interactions between speech prosody and other aspects of human communication such as gesture or mimicry. I wrote my doctoral dissertation on Polish segmental duration modelling for the purposes of speech synthesis (2006, Adam Mickiewicz University, The Institute of Linguistics). My adventure with speech prosody started even before I obtained my PhD, when I participated in a research project focusing on Polish intonation. Its result was the first digitally recorded Polish corpus of (semi)spontaneous task-oriented dialogues (Polish Intonation Database).