Events

Upcoming events

Event Information:

  • Thu 13 Jun 2024

    Giuseppe Magistro (UGent) - "Creating a corpus of web-data with Pyrlato. A demonstration"

    2:00 pm, Lokaal 3.30 - Camelot, Blandijn, Campus Boekentoren

    The use of corpora in acoustic analyses has become standard practice in phonetic and phonological research, as corpora offer high ecological validity (see e.g. Beckman, 1997; Warner, 2012; Tucker & Mukai, 2023 for a discussion of validity). However, compiling corpora and searching them for specific phenomena can be time- and resource-consuming. In response to this challenge, we developed Pyrlato, a program that we aim to demonstrate.

    Pyrlato is a novel tool for creating corpora of real-world spoken data from the web. It extracts audio from YouTube videos and cuts out the desired segments, such as specific phonemes, syllables, or words. This enables the creation of corpora with tens of thousands of tokens within a few hours of computation. Pyrlato works for Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Ukrainian, and Vietnamese, i.e. the languages for which YouTube provides automatic subtitles. The software searches the subtitles for the desired string and, upon finding a match, extracts the audio segment containing that string in .mp3 format (other formats are also possible).
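    The subtitle-matching step described above can be illustrated with a minimal Python sketch. It is not Pyrlato's code: the `Cue` structure, the `find_segments` function and the `padding` parameter are hypothetical, and the sketch assumes the subtitles have already been downloaded and parsed into timed cues. It also assumes whole-word matching, which suits space-delimited languages but not, for example, Japanese.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One subtitle cue (hypothetical structure): start/end in seconds plus its text."""
    start: float
    end: float
    text: str


def find_segments(cues, target, padding=0.25):
    """Return (start, end) intervals in seconds for cues whose text contains
    `target` as a whole word (case-insensitive), widened by `padding` on each side."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    segments = []
    for cue in cues:
        if pattern.search(cue.text):
            start = max(0.0, cue.start - padding)  # never cut before the start of the audio
            end = cue.end + padding
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical cues, not taken from any real video.
    cues = [
        Cue(0.0, 2.1, "welcome back to the channel"),
        Cue(2.1, 4.8, "today we talk about water"),
        Cue(4.8, 7.0, "waterfalls are next week"),
    ]
    print(find_segments(cues, "water"))  # only the second cue matches the whole word
```

    Cue-level timestamps only approximate where a word is spoken; isolating a single phoneme or syllable would need a finer alignment step that this sketch does not attempt. The resulting intervals could then be handed to any audio tool to cut and export the clips.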

    The demonstration will showcase Pyrlato's online version and apply it to several case studies.

    • Beckman, M.E. (1997). A typology of spontaneous speech. In Y. Sagisaka, N. Campbell, & N. Higuchi (Eds.), Computing Prosody: Computational Models for Processing Spontaneous Speech (pp. 7–26). Springer. http://dx.doi.org/10.1007/978-1-4612-2258-3_2.
    • Tucker, B.V., & Mukai, Y. (2023). Spontaneous speech. Cambridge University Press. http://doi.org/10.1017/9781108943024.
    • Warner, N. (2012). Methods for studying spontaneous speech. In A. Cohn, C. Fougeron, & M. Huffman (Eds.), The Oxford Handbook of Laboratory Phonology (pp. 621–633). Oxford University Press.


Past events

Event Information:

  • Thu 27 Apr 2017

    The reconstruction of proto-Burmish: a case study in the computational implementation of the comparative method

    3:00 pm, Grote Vergaderzaal (Blandijn, 3rd floor)

    Prof. dr. Nathan Hill & Johann-Mattis List (School of Oriental and African Studies/CNRS)

    The use of computational methods in comparative linguistics is ever increasing in popularity. Nonetheless, the fruits of such methods have so far been meagre compared to the results of the traditional comparative method. This paper explores a dataset of Burmish languages as a case study in improving the methodology of computational reconstruction. In particular, our aim is not to replace or modify the comparative method, but rather to implement the traditional method using computational tools.

    Our database comprises 400 concepts and their translational counterparts in a dozen Burmish languages. Concepts are linked to the Concepticon (List et al. 2016), and languages are linked to Glottolog. The primary data come from Huáng et al. (1992), as digitized by STEDT (Matisoff 2011), but we supplement them with other sources. We employ an iterative workflow that combines the rigor of a computer with the insightful intuitions of trained historical linguists. After all of the data have been given unambiguous phonetic interpretations, including the explicit encoding of underdetermined segments, the computer provides a preliminary alignment and reconstruction. These reconstructions are then adjusted in light of the relevant literature on proto-Burmish. The adjustments are made within the workflow system, so that the algorithm and the general methodology are enhanced and made more robust.
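    The "preliminary alignment" step can be illustrated with a textbook global alignment (Needleman-Wunsch) of two words given as lists of phonetic segments. This is a minimal Python sketch, not the speakers' software: the function name `align`, the flat match/mismatch/gap scores and the example segments are assumptions chosen for illustration, whereas alignment tools used in historical linguistics typically score segments by sound class.

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align segment lists `a` and `b` (Needleman-Wunsch).

    Returns (aligned_a, aligned_b, score); '-' marks a gap.
    """

    def sub(x, y):
        # Flat substitution score (an assumption): identical segments rewarded, others penalized.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two segments
                score[i - 1][j] + gap,  # gap opposite a[i - 1]
                score[i][j - 1] + gap,  # gap opposite b[j - 1]
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


if __name__ == "__main__":
    # Hypothetical segmented forms, not data from the Burmish database.
    print(align(["k", "a", "m"], ["k", "a"]))
```

    For example, `align(["k", "a", "m"], ["k", "a"])` matches the first two segments and places a gap opposite `m`. A flat scoring scheme treats every mismatch alike, which is why, in a workflow like the one described above, such preliminary alignments would still be reviewed and corrected by a linguist.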

    References
    Hammarström, Harald, Robert Forkel, Martin Haspelmath & Sebastian Bank (2015). Glottolog. Leipzig: Max Planck Institute for Evolutionary Anthropology. Available online at http://glottolog.org (accessed 2016-03-15).
    Huáng Bùfán 黄布凡 et al. (eds.) (1992). Zàng-Miǎn yǔzú yǔyán cíhuì 藏缅语族语言词汇. Běijīng: Zhōngyāng mínzú xuéyuàn chūbǎnshè 中央民族学院出版社.
    List, Johann-Mattis, Michael Cysouw & Robert Forkel (eds.) (2015). Concepticon. Leipzig: Max Planck Institute for Evolutionary Anthropology. Available online at http://concepticon.clld.org (accessed 2016-03-15).
    Matisoff, James (2011). STEDT: The Sino-Tibetan Etymological Dictionary and Thesaurus. Berkeley: University of California. Available online at http://stedt.berkeley.edu.