The Auslan Corpus
The Auslan Corpus consists of 300 hours of digital videos that record 100 deaf native and near-native signers across Australia using Auslan in conversations, interviews, elicitation tasks and storytelling. The first version of the Auslan Corpus was deposited in 2008 at the Endangered Languages Archive (ELAR). The aims were: (i) to create and secure a reference archive of Auslan because of language endangerment (the number of new deaf native sign language users declined in the second half of the twentieth century and the trend was projected to continue this century); and (ii) to create a modern linguistic corpus of Auslan which could be used for language research and as a resource for students learning Auslan. The original deposit at ELAR included annotation files for only a small subset of the total recordings in the collection.
Definition: a corpus is (i) a collection of samples of face-to-face signed or spoken language that have been recorded on audio or video tape and/or written down by someone; or (ii) a collection of samples of language which was originally written, e.g., inscriptions, public notices, official documents, scholarly and scientific works, and literature. In a modern linguistic corpus both types of language samples are copied and digitized so that they can be processed and analysed using computers. For example, researchers add written translations and annotations to recordings. (Annotations are symbols or codes added to a digital recording of speech or signing, or to a digital copy of an example of writing, that give information about the words or signs being used.) The translations and annotations are like captions added to movies. Using special computer programs or applications, researchers can instantly find every occurrence of a specific translation (e.g., an English word) or a specific annotation in the video.
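To make the caption analogy concrete, here is a minimal sketch in Python of how time-aligned annotations can be searched. It is an illustration only, not the software or file format used for the Auslan Corpus: the `Annotation` class, the tier names "gloss" and "translation", and the sample values are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One time-aligned label on a recording (hypothetical structure)."""
    tier: str      # which layer the label belongs to, e.g. "gloss"
    start_ms: int  # where the label starts in the video, in milliseconds
    end_ms: int    # where the label ends
    value: str     # the label text itself


def find_annotations(annotations, query, tier=None):
    """Return annotations whose text contains `query`, ignoring case.

    If `tier` is given, only annotations on that tier are returned.
    """
    q = query.lower()
    return [
        a for a in annotations
        if q in a.value.lower() and (tier is None or a.tier == tier)
    ]


# Invented sample data: three sign glosses and one English translation.
sample = [
    Annotation("gloss", 0, 400, "BOY"),
    Annotation("gloss", 450, 800, "SEE"),
    Annotation("gloss", 900, 1300, "DOG"),
    Annotation("translation", 0, 1300, "The boy saw a dog."),
]

# Each hit carries the time span needed to jump to that point in the video.
for hit in find_annotations(sample, "dog"):
    print(hit.tier, hit.start_ms, hit.end_ms, hit.value)
```

Because every annotation stores its start and end time, a search result can point straight to the matching moment in the recording, which is what makes an annotated recording searchable in a way that raw video is not.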
Since 2008 language researchers have created more detailed annotations of the original subset, and have created annotation files for many more of the previously unannotated videos. During 2022-23 a new deposit of these much more extensive corpus annotation files (and the digital videos they are based on) will be made at Monash University. It will be linked to Auslan Signbank so that users can view real instances of a sign being used by different people in different sentences and contexts in the corpus.
The Auslan Corpus annotation files
At present, 357 movies in the Auslan Corpus have annotation files containing annotations at various levels of detail. Annotations are being added to the corpus all the time. The current annotation files have one or more of the following types of annotations:
- identification and IDglossing of nouns and verbs only
- sign tokenization and IDglossing for all signs
- tagging for sign grammatical class ("part of speech")
- identification of gaze direction during points
- identification of palm orientation during points
- identification of clause boundaries
- identification of verb arguments
- tagging of verb arguments for macro-roles and semantic roles
- tagging for the presence or absence of spatial modification
- identification of periods of constructed action ('role shift')
- free translation
- literal translation.
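Several of these annotation types describe the same stretch of video at different levels, for example sign glosses inside a clause. The following Python sketch shows one way such layers could be related by time overlap. It is a hedged illustration under stated assumptions: the `Span` type, the tier names and the sample values are invented, and nothing here reflects the actual structure of the Auslan Corpus annotation files.

```python
from collections import namedtuple

# Hypothetical time-aligned label: start and end in milliseconds, plus text.
Span = namedtuple("Span", "start_ms end_ms value")


def overlaps(a, b):
    """True if the two spans share any stretch of time."""
    return a.start_ms < b.end_ms and b.start_ms < a.end_ms


def signs_in_clause(clause, gloss_tier):
    """Return the gloss values whose time span overlaps the given clause."""
    return [g.value for g in gloss_tier if overlaps(clause, g)]


# Invented sample data: one tier of sign glosses, one tier of clause spans.
tiers = {
    "IDgloss": [
        Span(0, 400, "BOY"),
        Span(450, 800, "SEE"),
        Span(900, 1300, "DOG"),
        Span(1500, 1900, "DOG"),
        Span(1950, 2300, "RUN"),
    ],
    "clause": [
        Span(0, 1300, "clause 1"),
        Span(1500, 2300, "clause 2"),
    ],
}

for clause in tiers["clause"]:
    print(clause.value, signs_in_clause(clause, tiers["IDgloss"]))
# prints:
# clause 1 ['BOY', 'SEE', 'DOG']
# clause 2 ['DOG', 'RUN']
```

Keeping each annotation type on its own layer means a file can hold only some of the types listed above and still be useful, and later layers can be added without changing the earlier ones.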
The amount of time required for the annotation of signed language texts is enormous and it is anticipated that it will take many years before the Auslan archive becomes sufficiently richly annotated (and hence machine-readable) and qualifies as a true linguistic corpus.
Adding value to the movies in the archive through annotation is time-consuming and expensive. These annotation files are not publicly available but will be made available to fellow researchers on request on a data-sharing and data-enrichment basis (i.e., access to existing annotation files will be granted on condition that enriched annotation files are returned to the corpus). Research collaboration is also encouraged.
A copy of the guidelines used to create the annotations for the Auslan Corpus as it now exists is available. (Last updated 2019.)
A selection of research publications using the Auslan Corpus:
- Johnston, T. (2019). Clause constituents, arguments and the question of grammatical relations in Auslan (Australian sign language): a corpus-based study. Studies in Language, 43(4), 941-996. doi:10.1075/sl.18035.joh
- Hodge, G., Sekine, K., Schembri, A., & Johnston, T. (2019). Comparing signers and speakers: Building a directly comparable corpus of Auslan and Australian English. Corpora, 14(1), 63-76.
- Hodge, G., Ferrara, L., & Anible, B. (2019). The semiotic diversity of doing reference in a deaf signed language. Journal of Pragmatics, 143, 33-53.
- Schembri, A., Fenlon, J., Cormier, K., & Johnston, T. (2018). Sociolinguistic Typology and Sign Languages. Frontiers in Psychology, 9(Feb). doi:10.3389/fpsyg.2018.00200
- Johnston, T. (2018). A corpus-based study of the role of headshaking in negation in Auslan (Australian Sign Language): implications for signed language typology. Linguistic Typology, 22(2), 185-231. doi:10.1515/lingty-2018-0008
- Ferrara, L., & Hodge, G. (2018). Language as Description, Indication, and Depiction. Frontiers in Psychology, 9(Article 716). doi:10.3389/fpsyg.2018.00716
- Johnston, T., van Roekel, J., & Schembri, A. (2016). On the conventionalization of mouth actions in Auslan (Australian Sign Language). Language and Speech, 59(1), 3-42. doi:10.1177/0023830915569334
- Johnston, T., Cresdee, D., Schembri, A., & Woll, B. (2015). FINISH variation and grammaticalization in a signed language: How far down this well-trodden pathway is Auslan (Australian Sign Language)? Language Variation and Change, 27, 117-155. doi:10.1017/S0954394514000209
- Johnston, T. (2014). The reluctant oracle: using strategic annotations to add value to, and extract value from, a signed language corpus. Corpora, 9(2), 155–189.
- Hodge, G., & Johnston, T. (2014). Points, depictions, gestures and enactment: Partly lexical and non-lexical signs as core elements of single clause-like units in Auslan (Australian sign language). Australian Journal of Linguistics, 34(2), 262-291.
- Ferrara, L., & Johnston, T. (2014). Elaborating Who's What: A Study of Constructed Action and Clause Structure in Auslan (Australian Sign Language). Australian Journal of Linguistics, 34(2), 193-215.
- Cresdee, D., & Johnston, T. (2014). Using corpus-based research to inform the teaching of Auslan as a second language. In D. McKee, R. Rosen, & R. McKee (Eds.), Teaching and Learning of Signed Languages: International Perspectives and Practices (pp. 85-110). Basingstoke, UK: Palgrave Macmillan.
- Johnston, T., & Schembri, A. (2013). Corpus Analysis of Sign Languages. In C. A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics. doi:10.1002/9781405198431.wbeal0252
- Johnston, T. (2013). Towards a comparative semiotics of pointing actions in signed and spoken languages. Gesture, 13(2), 109-142. doi:10.1075/gest.13.2.01joh
- Johnston, T. (2013). Formational and functional characteristics of pointing signs in a corpus of Auslan (Australian sign language): are the data sufficient to posit a grammatical class of ‘pronouns’ in Auslan? Corpus Linguistics and Linguistic Theory, 9(1), 109-159.
- Hodge, G. (2013). Patterns from a signed language corpus: Clause-like units in Auslan (Australian sign language). (Doctoral dissertation). Department of Linguistics, Macquarie University, Sydney.
- Gray, M. (2013). Aspect marking in Auslan: A system of gestural verb modification. (Doctoral dissertation). Macquarie University, Sydney.
- Schembri, A., & Johnston, T. (2012). Sociolinguistic aspects of variation and change. In R. Pfau, M. Steinbach, & B. Woll (Eds.), Sign Languages: An International Handbook (pp. 788-816). Berlin: Mouton de Gruyter.
- Johnston, T., & Ferrara, L. (2012). Lexicalization in signed languages: when is an idiom not an idiom? Proceedings of the 3rd UK Cognitive Linguistics Conference, University of Hertfordshire, 6-8 July 2010, 1. Retrieved from http://www.uk-cla.org.uk/proceedings
- Johnston, T. (2012). Lexical Frequency in Sign Languages. Journal of Deaf Studies and Deaf Education, 17(2), 163-193. doi:10.1093/deafed/enr036
- Ferrara, L. (2012). The grammar of depiction: Exploring gesture and language in Australian Sign Language (Auslan). (Doctoral dissertation). Macquarie University, Sydney.
- Schembri, A., Cormier, K., Johnston, T., McKee, D., McKee, R., & Woll, B. (2010). Sociolinguistic variation in British, Australian and New Zealand Sign Languages. In D. Brentari (Ed.), Sign Languages (pp. 476-498). Cambridge: Cambridge University Press.
- Johnston, T., & Schembri, A. (2010). Variation, lexicalization and grammaticalization in signed languages. In B. Garcia & M. Derycke (Eds.), Sourds et langue des signes: Normes et variation. Langage et société. (Vol. 131, pp. 5-15). Paris: Editions de la Maison des sciences de l'homme.
- Johnston, T. (2010). Adding value to, and extracting of value from, a signed language corpus through secondary processing: implications for annotation schemas and corpus creation. In P. Dreuw, E. Efthimiou, T. Hanke, T. Johnston, G. Martinez-Ruiz, & A. Schembri (Eds.), Proceedings of the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies. Language Resources and Evaluation Conference (LREC) Valletta, Malta, May 2010 (pp. 137-142).
- Johnston, T. (2010). From archive to corpus: transcription and annotation in the creation of signed language corpora. International Journal of Corpus Linguistics, 15(1), 104-129. doi:10.1075/ijcl.15.1.05joh
- Schembri, A., McKee, D., McKee, R. L., Johnston, T., Goswell, D., & Pivac, S. (2009). Phonological variation and change in Australian and New Zealand Sign Languages: The location variable. Language Variation and Change, 21(2), 193-231.
- de Beuzeville, L., Johnston, T., & Schembri, A. (2009). The Use of Space with Indicating Verbs in Auslan: A corpus based investigation. Sign Language & Linguistics, 12(1), 53-82. doi:10.1075/sll.12.1.03deb
- Cassidy, S., & Johnston, T. (2009). Ingesting the Auslan Corpus into the DADA Annotation Store. In M. Stede & C.-R. Huang (Eds.), LAW III: 3rd Linguistic Annotation Workshop: Proceedings of the Workshop (pp. 154-157). Suntec, Singapore (6-7 August 2009): ACL & AFNLP, USA (ISBN 978-1-932432-52-7).
- Johnston, T. (2008). Corpus linguistics and signed languages: no lemmata, no corpus. In O. Crasborn, E. Efthimiou, T. Hanke, E. D. Thoutenhoofd, & I. Zwitserlood (Eds.), Proceedings of the Sixth International Language Resources and Evaluation Conference (3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Signed Language Corpora) (pp. 82-87). Marrakech, Morocco (May 26-June 1).
- Johnston, T. (2008). The Auslan Archive and Corpus. In D. Nathan (Ed.), The Endangered Languages Archive—http://elar.soas.ac.uk/languages. London: Hans Rausing Endangered Languages Documentation Project, School of Oriental and African Studies, University of London.
- Schembri, A., & Johnston, T. (2007). Sociolinguistic Variation in the Use of Fingerspelling in Australian Sign Language (Auslan): A Pilot Study. Sign Language Studies, 7(3), 319-347. doi:10.1353/sls.2007.0019
- Johnston, T., Vermeerbergen, M., Schembri, A., & Leeson, L. (2007). “Real data are messy”: Considering cross-linguistic analysis of constituent ordering in Auslan, VGT, and ISL. In P. Perniss, R. Pfau, & M. Steinbach (Eds.), Visible Variation: Comparative Studies on Sign Language Structure (pp. 163-205). Berlin: Mouton de Gruyter.
- Johnston, T., & Schembri, A. (2007). Testing language description through language documentation, archiving and corpus creation: the case of indicating verbs in the Auslan Archive Corpus. In P. K. Austin, O. Bond, & D. Nathan (Eds.), Proceedings of Conference on Language Documentation and Linguistic Theory (pp. 145-154). London: SOAS.
- Johnston, T., & Schembri, A. (2007). Australian Sign Language (Auslan): An introduction to sign language linguistics. Cambridge: Cambridge University Press.
- Schembri, A., Johnston, T., & Goswell, D. (2006). NAME dropping: Location Variation in Australian Sign Language. In C. Lucas (Ed.), Multilingualism and sign languages: From the great plains to Australia (Vol. 12, pp. 121-156). Washington, DC: Gallaudet University Press.
- Johnston, T., & Schembri, A. (2006). Issues in the creation of a digital archive of a signed language. In L. Barwick & N. Thieberger (Eds.), Sustainable data from digital fieldwork: Proceedings of the conference held at the University of Sydney, 4-6 December 2006 (pp. 7-16). Sydney: Sydney University Press.