Events – ΔiaLing

Upcoming events

Event Information:

Thu 13 Jun 2024

Giuseppe Magistro (UGent) - "Creating a corpus of web-data with Pyrlato. A demonstration"
2:00 pm, Lokaal 3.30 - Camelot, Blandijn, Campus Boekentoren
The use of corpora in acoustic analyses has become standard practice in phonetic and phonological research, as corpora offer high ecological validity (see, e.g., Beckman, 1997; Warner, 2012; Tucker & Mukai, 2023 for a discussion of validity). However, compiling corpora and searching them for specific phenomena can be time- and resource-consuming. In response to this challenge, we developed a program named Pyrlato, which we will demonstrate. Pyrlato is a novel tool for creating corpora of real-world spoken data from the web. It extracts audio from YouTube videos, cutting out the desired segments, such as specific phonemes, syllables, or words. This makes it possible to build corpora of tens of thousands of tokens within a few hours of computation. Pyrlato works for Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Ukrainian, and Vietnamese, i.e., the languages for which YouTube provides automatic subtitles. The software searches the subtitles for the desired string and, when it finds a match, extracts the audio excerpt containing that string in .mp3 format (other formats are also available).

The demonstration will showcase Pyrlato's online version and its application in several case studies.

• Beckman, M.E. (1997). A typology of spontaneous speech. In Y. Sagisaka, N. Campbell, & N. Higuchi (Eds.), Computing Prosody: Computational Models for Processing Spontaneous Speech (pp. 7–26). Springer. http://dx.doi.org/10.1007/978-1-4612-2258-3_2.
• Tucker, B.V., & Mukai, Y. (2023). Spontaneous speech. Cambridge University Press. http://doi.org/10.1017/9781108943024.
• Warner, N. (2012). Methods for studying spontaneous speech. In A. Cohn, C. Fougeron, & M. Huffman (Eds.), The Oxford Handbook of Laboratory Phonology (pp. 621–633). Oxford University Press.

Past events

Event Information:

Mon 29 Apr 2019

IV Cambridge-Ghent Colloquium on the Histories of the Ibero-Romance Languages
2:30 pm, Grote Vergaderzaal (Blandijnberg 2, 3rd floor)
Programme:
- 14:30-15:30 Plenary talk: Manuel Leonetti (Complutense University of Madrid), 'Orden de palabras y estructura informativa en la evolución del español' [Word order and information structure in the evolution of Spanish]
- 15:30-16:00 Miriam Bouzouita & Antoine Primerano (Ghent University), 'La influencia oriental en la gramaticalización del futuro y condicional en español' [The eastern influence on the grammaticalization of the future and conditional in Spanish]
- 16:00-16:30 Rocío Díaz Bravo & Miriam Bouzouita (University of Granada & Ghent University), 'Usos innovadores de los clíticos de OD y OI en el Retrato de la Loçana andaluza' [Innovative uses of direct- and indirect-object clitics in the Retrato de la Loçana andaluza]
- 16:30-17:00 Coffee break
- 17:00-17:30 Plenary talk: Javier Rodríguez Molina (University of Granada / Ghent University), 'Alomorfia IE - IA en los pluscuamperfectos de indicativo medievales' [IE - IA allomorphy in medieval indicative pluperfects]
- 17:30-18:00 Montserrat Batllori & Ioanna Sitaridou (University of Girona & University of Cambridge), 'Fronting in the history of Spanish'
- 18:00-18:30 Álvaro Octavio de Toledo y Huerta (Autonomous University of Madrid / Ghent University), 'Dislocaciones y doblados: elementos para un álgebra de los objetos clíticos en el primer español moderno' [Dislocations and doubling: elements for an algebra of clitic objects in early modern Spanish]