Zadarski lingvistički forum: Workshops

Workshops | Courses

Eugenio Goria, Ph.D. (University of Bologna)

The use of ELAN in language documentation: transcribing and annotating spoken corpora

The course is intended as a three-step approach to the use of ELAN for language documentation and, more generally, for the processing of oral data. By the end of the course, participants will be able to use ELAN for different purposes, with a particular focus on three distinct and partly independent tasks: transcribing spoken data, creating sets of annotations, and performing various types of queries.

In each lesson, an introduction to one specific function of ELAN will be followed by a training session of variable length in which students will practice transcribing and annotating real data to become better acquainted with the software. For this reason, participants are strongly encouraged to bring their own computers to get the most out of the course. See the specific requirements below:

Software requirements:
The latest version of ELAN can be downloaded from https://tla.mpi.nl/tools/tla-tools/elan/. Some minor parts of the course will also use Praat (http://www.fon.hum.uva.nl/praat/) and Audacity (https://www.audacityteam.org/download/).

Hardware requirements:
Participants must bring their laptops as well as their own headphones. ELAN works best with keyboard shortcuts, so a mouse is not essential for the training sessions; however, beginners usually find it easier to use a mouse, especially while transcribing.

Course materials:
All audio materials and transcription samples needed for the training sessions will be provided at the beginning of the course.

The last session will be organized as a question time in which everyone can submit specific questions about their own datasets and get practical help. It is addressed both to beginners who already have some recordings and want to figure out how to process them with ELAN, and to intermediate users who have already worked with ELAN and want to improve specific parts of their work. For this reason, participants are strongly encouraged to bring their own data and share their experiences in transcribing and annotating data.

Topics covered in each lesson:

22/03 15:00 – 17:00
Introduction to ELAN: overview of the basic functions and interface
How to start a transcription (Tiers and annotations; Segmentation mode or annotation mode?; What if my audio quality is poor?; What conventions should I use?; Common mistakes)

23/03 09:00 – 11:00
Annotating with ELAN (Different annotations for different purposes; Tiers, types and stereotypes; Controlled vocabularies; Manual POS tagging and morphological glosses; The annotation of multilingual interactions)
Going in and out of ELAN (The Export function; Multiple exports; The Import function)

23/03 11:00 – 13:00
How do I search my corpus? (Searching single files; Regular expressions; N-grams; Searching multiple files; Complex searches)
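ELAN provides its own search functions for regular-expression and N-gram queries. Purely to illustrate what such queries do to the data, here is a minimal Python sketch. It assumes the usual structure of an ELAN `.eaf` file (XML with `TIME_SLOT` elements inside `TIME_ORDER`, times in milliseconds, and `TIER` elements containing `ALIGNABLE_ANNOTATION` elements with an `ANNOTATION_VALUE`), ignores symbolic (`REF_ANNOTATION`) tiers, and uses a small made-up transcript. The function names `read_tier`, `regex_search` and `ngram_counts` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Made-up EAF content (hypothetical); real .eaf files also carry a HEADER,
# LINGUISTIC_TYPE definitions and usually several tiers.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker_A" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>we went to the market</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>then we went home</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

Row = Tuple[Optional[int], Optional[int], str]  # (start ms, end ms, text)


def read_tier(eaf_xml: str, tier_id: str) -> List[Row]:
    """Return the time-aligned annotations of one tier (REF_ANNOTATIONs are ignored)."""
    root = ET.fromstring(eaf_xml)
    # Time slots without TIME_VALUE are unaligned; they map to None below.
    times: Dict[str, int] = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    rows: List[Row] = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((times.get(ann.get("TIME_SLOT_REF1")),
                         times.get(ann.get("TIME_SLOT_REF2")), text))
    return rows


def regex_search(rows: List[Row], pattern: str) -> List[Row]:
    """Keep only annotations whose text matches a regular expression."""
    rx = re.compile(pattern)
    return [row for row in rows if rx.search(row[2])]


def ngram_counts(rows: List[Row], n: int) -> Counter:
    """Count word n-grams within each annotation (n-grams never cross annotations)."""
    counts: Counter = Counter()
    for _, _, text in rows:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    rows = read_tier(SAMPLE_EAF, "speaker_A")
    print(regex_search(rows, r"^then\b"))        # [(1500, 3200, 'then we went home')]
    print(ngram_counts(rows, 2).most_common(1))  # [(('we', 'went'), 2)]
```

Counting N-grams only within single annotations is a choice made for this sketch; whether sequences should be allowed to cross annotation boundaries depends on how the transcription was segmented.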

Vincenzo Galatà, Ph.D. (ISTC-CNR, Padua)

Best practices in language documentation: designing, building and managing your corpus of spoken data

Corpora are, in general, powerful resources for documenting various aspects of language diversity and change. Yet, contrary to a common belief, building a corpus for spoken language documentation and research is a time-consuming and costly task that goes far beyond the “simple” collection of a certain number of recordings. Indeed, the construction of a corpus of spoken data opens up a myriad of options from which the researcher can choose, and many variables are at play throughout the whole process of corpus design, collection and management.

In this course, we will examine the different aspects of this process and learn how to deal with them, since each of them may ultimately shape (positively or negatively) your corpus as a resource.

The factors addressed will range from stimulus selection, elicitation techniques and participant sampling, through recording conditions, recording equipment and software, and recording characteristics, to data coding and management.

The course will also consider the use of different tools for specific pre- and post-processing operations on the collected data. Taking advantage of the interoperability of some of the best-known tools, the course will further provide:

1) practical “tips and tricks”: (often) simple solutions that drastically reduce possible human errors while ensuring the overall quality of the final resource (be it an audio recording or a time-aligned transcription or annotation);

2) solutions and procedures that may help you significantly speed up repetitive operations on hundreds of files, thus reducing the time-consuming workload involved in preparing and processing resources of recorded speech.
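The course recommends dedicated renaming tools (listed below) for this kind of job. As an illustration of the repetitive operations mentioned in point 2, here is a minimal Python sketch of a scripted batch rename. The recorder-style names (e.g. `ZOOM0007.WAV`), the target pattern `<prefix>_<number>.wav`, the prefix `projectA_spk01` and the function names `plan_renames` and `apply_renames` are all hypothetical choices, not a convention taught in the course.

```python
import re
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical recorder naming such as "ZOOM0007.WAV"; adjust to your device.
RECORDER_NAME = re.compile(r"^[A-Za-z]+(\d+)\.wav$", re.IGNORECASE)


def plan_renames(names: Iterable[str], prefix: str) -> Dict[str, str]:
    """Map matching file names to '<prefix>_<4-digit number>.wav'; others are skipped."""
    plan: Dict[str, str] = {}
    for name in sorted(names):
        match = RECORDER_NAME.match(name)
        if match is None:
            continue  # leave non-matching files (e.g. notes.txt) untouched
        new_name = f"{prefix}_{int(match.group(1)):04d}.wav"
        if new_name in plan.values():
            raise ValueError(f"two files would be renamed to {new_name}")
        plan[name] = new_name
    return plan


def apply_renames(folder: str, plan: Dict[str, str], dry_run: bool = True) -> None:
    """Print the plan (dry run) or rename the files, never overwriting existing ones."""
    pairs = [(Path(folder) / old, Path(folder) / new) for old, new in plan.items()]
    # Check every target first so a clash does not leave the folder half-renamed.
    for _, dst in pairs:
        if dst.exists():
            raise FileExistsError(f"{dst} already exists")
    for src, dst in pairs:
        if dry_run:
            print(f"{src.name} -> {dst.name}")
        else:
            src.rename(dst)


if __name__ == "__main__":
    demo = ["ZOOM0007.WAV", "ZOOM0012.wav", "notes.txt"]
    for old, new in plan_renames(demo, "projectA_spk01").items():
        print(old, "->", new)
```

Separating the plan from its application, defaulting to a dry run and refusing to overwrite files are meant to address the human errors mentioned in point 1: the full list of changes can be checked before any file on disk is touched.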

By the end of the course, participants will have acquired key skills and best practices that will help them create and manage their own resources, while ensuring that the data gathered is valuable and usable, even in research domains not originally considered.

Where appropriate, short hands-on sessions will help attendees grasp the key concepts and the direct consequences of the choices made. To this end, please bring your own recording device (if it is portable) and copies of at least five of your own recordings (files), on which we will test some of the techniques discussed during the lessons (time permitting).

Bring your own laptop and headphones. Please also make sure you install the following tools on your computer before the school begins (follow the instructions on the download pages and select the appropriate version for your computer’s operating system):

- Praat v. 6.0.37: http://www.fon.hum.uva.nl/praat/

- Audacity: https://www.audacityteam.org/

- An advanced text editor, e.g. Notepad++ (Windows): https://notepad-plus-plus.org/download/v7.5.4.html; BBEdit (Mac): http://www.barebones.com/products/bbedit/

- Meld (Windows and Mac: check the download page for details): http://meldmerge.org/

- A file renaming tool, e.g. Bulk Rename Utility (Windows): http://www.bulkrenameutility.co.uk/Download.php; NameChanger (Mac): https://mrrsoftware.com/namechanger/

Assistant Professor Zvjezdana Vrzić (University of Rijeka)

Introduction to FieldWorks Language Explorer (FLEx)

FLEx is a software tool for language documentation and description, used, among other things, to create dictionaries and glossed text collections. Participants will become familiar with the main FLEx work areas: Lexicon (used to create a dictionary), Texts and Words (used to create glossed texts), Grammar and Lists. They will learn how to create, open and back up a project; how to create lexical entries and add examples, variants and cross-references to them; how to create and edit texts and add translations to them; how to analyze texts grammatically and add and edit grammatical categories; how to insert lexical entries and examples from the texts into the dictionary; and how to format, export and print dictionaries and interlinearized texts.

The course will include hands-on practice with the FLEx software. Under the guidance of the instructor, participants will work on sample texts from Vlashki/Zheyanski (Istro-Romanian) and Gurung. They are encouraged to bring their own texts in any language of interest to them to practice on.

Course plan:

Session 1: Friday, March 23, 2018, 15:00-17:00
What is FLEx and what is it used for? Example projects. Getting familiar with the FLEx window and its areas; how to create, open and back up a project.

Session 2: Saturday, March 24, 2018, 9:00-11:00
How to create lexical entries and add examples to them; how to add variants and cross-references; how to use and modify grammatical category and dialect labels; how to create and edit texts and add translations to them; how to analyze texts grammatically and add and edit grammatical categories; how to insert lexical entries and examples from the texts into the dictionary.

Session 3: Saturday, March 24, 2018, 11:00-13:00
How to format, export and print dictionaries and interlinearized texts. Additional hands-on practice with the FLEx software on assigned or own data under the guidance of the instructor.

Before the workshop, participants should download and install on their laptops the latest stable release of FLEx (8.3.10 or higher, FieldWorks SE Full): http://software.sil.org/fieldworks/download/fw-8311/.

Participants should also download (and optionally print out) the FLEx workbook, developed by the CoLang team, available at: http://www.uta.edu/faculty/cmfitz/swnal/projects/CoLang/courses/FLEx_1/CoLang_FLEx1_Handout.pdf.

Additional materials for download (such as data files) will be placed in the course drive to be shared with registered workshop participants.

For further sources of help provided by SIL, the developer of the software, visit:
https://software.sil.org/fieldworks/support/getting-help-and-training/