A Grid of Regional Language Archives
- Paul Trilsbeek; Daan Broeder; Tobias Van Valkenhoef; Peter Wittenburg
About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places around the world, particularly in places where a large number of endangered languages exist and are being documented. These digital archives make use of the LAT archiving framework that the MPI has developed over the past nine years. This framework consists of a number of web-based tools for depositing, organizing and utilizing linguistic resources in a digital archive. The regional archives are in principle autonomous archives, but they can decide to share metadata descriptions...
Advanced Transaction Processing in Multilevel Secure File Stores
- Elisa Bertino; Sushil Jajodia; Luigi Mancini; Indrajit Ray
Abstract—The concurrency control requirements for transaction processing in a multilevel secure file system are different from those in conventional transaction processing systems. In particular, there is the need to coordinate transactions at different security levels avoiding both potential timing covert channels and the starvation of transactions at higher security levels. Suppose a transaction at a lower security level attempts to write a data item that is being read by a transaction at a higher security level. On the one hand, a timing covert channel arises if the transaction at the lower security level is either delayed or aborted by the...
The Danish Dependency Treebank: Linguistic Principles and Semi-automatic Tagging Tools. Paper presented at the Swedish Treebank Symposium
- Matthias T. Kromann
dependency treebank on top of the morphologically tagged Danish PAROLE corpus (291,000 words). This includes work on: (1) a tagging manual with our recommended dependency analyses of Danish and their underlying linguistic motivation; (2) software that can be used to tag a large corpus (either manually, semi-automatically, or automatically), given a dependency-based inheritance lexicon for Danish; and (3) software that allows linguists to search for examples of any particular grammatical construction within the tagged corpus. All three are work in progress (we expect to complete the software and tagging manual in November and the entire corpus in March). We will...
Unsupervised Relation Extraction for Automatic Generation of Multiple-Choice Questions
- Naveed Afzal; Viktor Pekar
In this paper, we investigate an unsupervised approach to Relation Extraction to be applied in the context of automatic generation of multiple-choice questions (MCQs). The approach aims to identify the most important semantic relations in a document without assigning explicit labels to them in order to ensure broad coverage, unrestricted to predefined types of relations. The paper examines three different surface pattern types, each implementing different assumptions about linguistic expression of semantic relations between named entities. Our main findings indicate that the approach is capable of achieving high precision rates and its enhancement with linguistic knowledge helps to produce significantly...
Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey
- Illhoi Yoo; Min Song
In this survey paper, we discuss biomedical ontologies and major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are currently being adopted in text mining approaches because they provide domain knowledge for text mining approaches. In addition, biomedical ontologies enable us to resolve many linguistic problems when text mining approaches handle biomedical literature. As the first example of text mining, document clustering is surveyed. Because a document set is normally multiple-topic, text mining approaches use document clustering as a preprocessing step to group similar documents. Additionally, document clustering is able to inform the biomedical literature...
Aiding in the Treatment of Low Back Pain by a Fuzzy Linguistic Web System
- Bernabé Esteban; Carlos Porcel; José Antonio Moral-Muñoz; Enrique Herrera-Viedma
Abstract. Low back pain affects a large proportion of the adult population at some point in their lives and has a major economic and social impact. To soften this impact, one possible solution is to make use of recommender systems, which have already been introduced in several health fields. In this paper, we present TPLUFIB-WEB, a novel fuzzy linguistic Web system that uses a recommender system to provide personalized exercises to patients with low back pain problems and to offer recommendations for their prevention. This system may be useful to reduce the economic impact of low back pain, help professionals...
Keyphrase Extraction for Summarization Purposes: The
- Bernardo Magnini; Alessandro Vallin
We report on ITC-irst participation at Task 1 (very short document summaries) at DUC-2004. We propose to exploit a keyphrase extraction methodology in order to identify relevant terms in the document. The LAKE algorithm first considers a number of linguistic features to extract a list of well motivated candidate keyphrases, then uses a machine learning framework to select significant keyphrases for a document. With respect to other approaches to keyphrase extraction, LAKE makes use of linguistic processors such as multiword and named entities recognition, which are not usually exploited.
MIRACLE Retrieval Experiments with East Asian Languages
- Julio Villena-Román; José Miguel Goñi-Menoyo; José C. González-Cristóbal; José Luis Martínez-Fernández
This paper describes the participation of MIRACLE in the NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages and using Latin and Cyrillic alphabets, this was our first attempt at East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, especially focusing on the similarities and differences with European languages, and carry out research on CLIR tasks which include those languages. The basic idea behind our participation in NTCIR is to test if the same familiar linguistic-based...
Influence of brain lesion and educational background on language tests in aphasic subjects
- Ellen Cristina Siqueira Soares; Karin Zazo Ortiz
Abstract – In language assessment, several socio-demographic variables must be taken into account. Objectives: To characterize the performance of aphasic patients with different educational backgrounds on language tasks and to compare their performance to that of individuals with no language disorders. Methods: Thirty aphasic patients and 30 healthy individuals were selected. Patients were divided into two groups according to educational level: A (1–4 years) n=15 and B (5–11 years) n=15. Age ranged from 27 to 78 years. All subjects were submitted to the Montreal-Toulouse language assessment protocol. The pertinent statistical tests were applied. Results: Educational level interfered in the...
Academic degree sought
- Mag. Peter Fröhlich
People are increasingly using information technology with their auditory sense: they listen to their iPod playlists, talk to car navigation systems, and check accounts over telephone-banking systems. Nevertheless, the capabilities of the human auditory modality for interacting with computers are still insufficiently exploited. This dissertation thesis advocates an integrated perspective on the multifaceted research and application fields of auditory human-computer interaction. The first part of the thesis conceptualizes the many differing ways auditory information is exchanged and processed between humans and computers. A taxonomy of 8 auditory user interface representations is proposed, consisting of linguistic representations (language-specific code, prosody, pragmatics),...
DGfS 2009: Form und Funktion
- Dr. Sabine Bartsch; Monica Holtz
numerous studies addressing a wide spectrum of features deemed to be characteristic of specific registers (cf. e.g. on the expression of stance in scientific registers (Hunston, Thompson 2000), the self-construal of the scientist (Hyland 1998, 2002), changes in the course of the historical development of scientific registers (e.g. Halliday 1988); register profiling (Biber 1995)). Many register studies focus on single or relatively small sets of texts instead of being based on studies of larger corpora. This limits the possibilities of systematically evaluating register features in larger and more diverse sets of registers and, thus, the wider applicability of the results....
AN ANALYSIS OF ARGUMENTATIVE TEXTS FOR CONTRASTIVE RHETORIC
- Fernando Trujillo Sáez
Contrastive Rhetoric has been an outstanding line of research of Writing across Cultures in the United States for over thirty years. Despite the criticism it has received, it is enjoying a revival favoured by a new approach to writing, text analysis and culture itself. In this paper a model of analysis for argumentative texts in Spanish and English is introduced in accordance with recent tendencies in Contrastive Rhetoric. Apart from a traditional quantitative text analysis, we propose an analysis of discourse markers and Rhetorical Structure Theory. These linguistic data are correlated with an evaluation of the texts to study the...
On the place of linguistic resources in the organization of talk-in-interaction: A co-investigation of English and Japanese grammatical practices
- Gene H. Lerner
Specific parts of grammatical structure can be employed by speakers to accomplish specifiable actions in talk-in-interaction. In this article, I describe the interactional use of “parts of speech” ordinarily used by individual speakers to connect elements within single turn-constructional units. The items employed for these held-in-common grammatical practices can also be deployed as stand-alone contributions that by their very incompleteness prompt a prior speaker to add another increment to their turn. As such, this constitutes a recipient-administered practice for expanding a turn at talk. I show that this usage constitutes another (previously undescribed) form of other-initiated repair that is...
- Isabel Moskowich; Javier Parapar
has been designed as a tool for the study of language change in English scientific writing in general as well as within the different scientific disciplines. Its purpose is to facilitate investigation at all linguistic levels, though, in principle, phonology is not included among our intended research topics. A rough definition of our corpus would say it contains English scientific texts other than medical produced between 1600 and 1900. Medical texts have been disregarded since they are being compiled by Taavitsainen, Pahta and their team in Helsinki. Two of the ideas that triggered the whole project are the growing interest...
Variety, style-shifting, and ideology
Judith Irvine (this volume) has examined the role of ideology in the relation between social group language differences and the representation of those differences in each speaker’s style contrasts. Her focus is on the social meanings signified by styles, which are primarily contrastive. She is interested, then, not in what Labov (1966) called indicators, which she calls ‘empirical distributions,’ but in ideas about language categories that represent social contrasts to participants. Her paper makes some powerful and important integrative claims. The principle of iconization is a claim that the social contrasts that are imputed to groups or to situations...
Temporal relations in learner varieties: Grammaticalization and discourse construction
- Colette Noyau
Works on temporality in the field of language acquisition have progressed considerably over the past few years, moving on from descriptions of the linguistic structure of learner varieties towards a more general concern for their dynamic nature, i.e. their developmental structure and factors relating to their restructuring. As far as the study of
REWRITING THE PAST: BARE VERBS IN THE OTTAWA REPOSITORY OF EARLY AFRICAN AMERICAN CORRESPONDENCE
- Gerard Van Herk; Shana Poplack
This paper describes the construction of the Ottawa Repository of Early African American Correspondence (OREAAC), a corpus of over 400 letters written by antebellum African American settlers in Liberia. Identifying the most speech-like letters by the least literate authors, we constituted perhaps the largest linguistically useful corpus of diachronic African American English primary data currently available. We demonstrate the utility of the OREAAC through analysis of factors conditioning the variable expression of past temporal reference in nearly 2400 verbs. In these letters, zero-marking is favoured in weak verbs by a preceding consonant (a universal), and in strong verbs by lexical...
Evaluating and Improving an Automatic Sentiment Analysis System
- Viktoriya Kotelevskaya; Mats Dahllöf; Bengt Dahlqvist; Beáta Megyesi; Ra Stålnacke; Find Agent Ab
The purpose of this thesis is to improve OpenAmplify (OA), a system for automatic sentiment analysis, by analysing OA’s weaknesses and problematic areas and then modifying the resource files and lexicons by adding new linguistic items and improving the system’s rules. The performance of OA was first evaluated and the collected data was compared to the opinions of human judges, making it possible to identify and analyse the problematic areas and shortcomings of the system. The analysis led to modifications of OA: idioms and missing expressions were added to the lexicons and the resource files were extended by some linguistic rules. The results of...
Sink positive: Linguistic experience with th substitutions influences nonnative word recognition
- Adriana Hanulíková; Andrea Weber
and production tasks to examine the influences of perceptual similarity and linguistic experience on word recognition in nonnative (L2) speech. Eye movements to printed words were tracked while German and Dutch learners of English heard words containing one of three pronunciation variants (/t/, /s/, or /f/) of the interdental fricative /θ/. Irrespective of whether the speaker was Dutch or German, looking preferences for target words with /θ/ matched the preferences for producing /s/ variants in German speakers and /t/ variants in Dutch speakers (as determined via the production task), while a control group of English participants showed...
THE POETICS OF EVERYDAY LANGUAGE
- Dr. Geoff Hall
There is a growing recognition on the part of linguists that everyday 'ordinary' language is shot through with supposed poeticisms: metaphor, idiom and other varieties of non-literal language and language use. Gibbs (1994) has even questioned the usefulness of a literal/non-literal language divide, while Carter and Nash (1990) propose a more modest cline of literariness, from technical writing through ordinary conversations to advertising and on to literary text. In this view, possibly the only linguistic or formal feature differentiating literary language from more everyday uses is the tolerance of literature for almost all varieties and registers where non-literary texts...