Upcoming events
Event Information:
Thu 13 Jun 2024, 2:00 pm, Lokaal 3.30 - Camelot, Blandijn, Campus Boekentoren
Giuseppe Magistro (UGent) - "Creating a corpus of web-data with Pyrlato. A demonstration"
The use of corpora in acoustic analyses has become standard practice in phonetic and phonological research, offering high ecological validity (see e.g. Beckman, 1997; Warner, 2012; Tucker & Mukai, 2023 for a discussion on validity). However, compiling corpora and searching them for specific phenomena can be time- and resource-consuming. In response to this challenge, we developed a program named Pyrlato, which we aim to demonstrate. Pyrlato is a novel tool designed for creating corpora of real-world spoken data from the web. The tool extracts audio from YouTube videos and cuts out the desired segments, such as specific phonemes, syllables, or words. This enables the creation of corpora with tens of thousands of tokens within a few computational hours. Pyrlato works across Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Ukrainian, and Vietnamese, i.e. those languages for which YouTube provides automatic subtitles. The software searches for the desired string in the subtitles and, upon finding a match, extracts the audio segment containing the string in .mp3 format (other formats are also possible).
The demonstration will showcase Pyrlato's online version and its application to some case studies.
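The subtitle-matching step described above can be illustrated with a minimal Python sketch. This is not Pyrlato's actual code: the `Cue` class, the `find_segments` function, the `padding` parameter, and the sample cues are all hypothetical names and data introduced here for illustration. The sketch only computes the time windows in which the target string occurs; cutting the audio itself would be left to a separate audio tool and is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical structure): timing in seconds plus text."""
    start: float
    end: float
    text: str


def find_segments(cues, target, padding=0.25):
    """Return (start, end) windows for every cue whose text contains `target`.

    Matching is case-insensitive and on whole words. `padding` (an assumed
    parameter) widens each window slightly so the token is not clipped.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    segments = []
    for cue in cues:
        if pattern.search(cue.text):
            start = max(0.0, cue.start - padding)  # never before the start of the audio
            segments.append((start, cue.end + padding))
    return segments


# Made-up subtitle cues standing in for automatic subtitles.
cues = [
    Cue(0.0, 2.5, "welcome back to the channel"),
    Cue(2.5, 5.0, "today we talk about Ghent"),
    Cue(5.0, 8.0, "Ghent is a city in Belgium"),
]

segments = find_segments(cues, "ghent")
for start, end in segments:
    print(f"cut audio from {start:.2f}s to {end:.2f}s")
```

Each returned window could then be passed to an audio-cutting tool to produce one file per token. Automatic subtitles are timed per cue rather than per word, so a real pipeline would likely need a further alignment step to isolate individual phonemes or syllables.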
• Beckman, M.E. (1997). A typology of spontaneous speech. In Y. Sagisaka, N. Campbell, & N. Higuchi (Eds.), Computing Prosody: Computational Models for Processing Spontaneous Speech (pp. 7–26). Springer. http://dx.doi.org/10.1007/978-1-4612-2258-3_2.
• Tucker, B.V., & Mukai, Y. (2023). Spontaneous speech. Cambridge University Press. http://doi.org/10.1017/9781108943024.
• Warner, N. (2012). Methods for studying spontaneous speech. In A. Cohn, C. Fougeron, & M. Huffman (Eds.), The Oxford Handbook of Laboratory Phonology (pp. 621–633). Oxford University Press.
Past events
Event Information:
Thu 06 Jun 2019, 2:30 pm, room 110.037 (Blandijnberg 2, ground floor)
Chris De Wulf (Zürich): "DoDO – Development of Dutch Orthography 1250-1400"
Abstract
In my talk, I will discuss my planned research on the Development of Dutch Orthography, and I hope to exchange ideas on Data Enrichment for the first stage of the project. This first stage will take place during a research visit to Ghent University.
The main aim of the project proposed here is to describe the unguided (not-steered) development of writing systems for West Germanic dialects based on the Latin alphabet. It will derive this description from diatopic and diachronic grapheme research on Middle Dutch local charters.
Diachronic Dutch orthography has been a focus of research over the last decade, though mostly for Early Modern Dutch and later stages, and usually in the context of standardisation. That means it is limited to how the orthographic development of a language operates within a society that is aware of, and pays lip service to, a supra-regional variety that is consciously and unconsciously superimposed or pursued. In my proposed research, I will focus on the period before Early Modern Dutch and the standardisation processes, and ask the question: “How do scribes cope in writing with the Latin alphabet in their dialects when there is no prescribed standard?” To answer this, the writings of scribes who operate in local writing systems, i.e. written dialect, need to be considered, and this should be done with manuscripts, e.g. handwritten administrative texts of local importance only, such as local charters.
Preliminary research suggests that, in the case of vowel grapheme systems, the aptness of individual graphemes is gradable and can be described in terms of the distinctive phonological features they can convey accurately (De Wulf 2019, in preparation). This stems from the fact that some graphemes are used to convey many more historical phonemes (i.e. West Germanic allophones) than others, and which graphemes these are also varies from dialect to dialect. There is a clear indication that vowel grapheme systems in the Eastern dialects contain less accurate graphemes, since more of the historical vowel phonemes have in fact evolved into separate phonemes. My working hypothesis is that an implicational scale of phonological features can be established (per dialect or, maybe more generally, per dialect region), which means that certain features are prioritised in writing systems. This should be investigated for vowel as well as consonant graphemes.
The project proposed here will have to clarify whether this holds true for all types of graphemes, and whether this variety is maintained throughout medieval writing in the period 1250-1400.
As the main deliverable, I will provide an open-access, electronically published diachronic grapheme atlas with commentary.