  1. Linguistic Summarization Using IF-THEN Rules

    Dongrui Wu; Jerry M. Mendel; Jhiin Joo
    Abstract—Linguistic summarization (LS) is a data mining or knowledge discovery approach to extract patterns from databases. It has been studied by many researchers; however, none of them has used it to generate IF-THEN rules, which can be added to a knowledge base for better understanding of the data, or be used in Perceptual Reasoning to infer the outputs for new scenarios. In this paper LS using IF-THEN rules is proposed. Five quality measures for such summaries are defined. Among them, the degree of usefulness is especially valuable for finding the most reliable and representative rules, and the degree of outlier...

  2. Fuzzy to SQL Conversion using Gefred Model with the help of MATLAB

    For many years, achieving unambiguous knowledge has been a serious challenge for human beings. The aim of this paper is to emphasize situations in which classical {true, false} logic is not adequate for data selection and data classification. Linguistic expressions such as high salary, young, etc. are very often used in everyday life and in statistics. The goal of this paper is a brief study of fuzzy logic and fuzzy sets and of how to make them suitable for database queries and classification tasks. A fuzzy approach is introduced alongside the usual relational database model to handle linguistic queries. The proposed fuzzy approach provides flexibility when...

  3. Properties of the word set for estimating similarities between prokaryotic genomes in linguistic approach

    Keishin Hanya; Satoshi Mizuta
    Recently, as completely sequenced genomes have been rapidly increasing in number, comparison between whole genome sequences is becoming more important. The linguistic approach is one of the available methods to estimate the similarities between long sequences such as whole genomes [1]. In the method, a word set W is constructed, in which a word is defined as a sequence piece of four letters

  4. Assessing Direct and Indirect Evidence in Linguistic Research

    Christina Behme; Springer Science+Business Media Dordrecht
    Abstract: This paper focuses on the linguistic evidence base provided by proponents of conceptualism (e.g., Chomsky) and rational realism (e.g., Katz) and challenges some of the arguments alleging that the evidence allowed by conceptualists is superior to that of rational realists. Three points support this challenge. First, neither conceptualists nor realists are in a position to offer direct evidence. This challenges the conceptualists’ claim that their evidence is inherently superior. Differences between the kinds of available indirect evidence will be discussed. Second, at least some of the empirical evidence provided by the conceptualist is flawed. It is not obtained...

  5. Perceptual Advantage from Generalized Linguistic Knowledge

    We address the question of how previously acquired linguistic knowledge facilitates perception and learning of a new language. We report results from two experiments showing evidence that participants better discriminate a segmental duration contrast in a novel language if they had some previous exposure to a language that uses duration contrastively. Crucially, the perceptual advantage occurs even when the novel language employs the contrast in entirely different conditions: in novel segmental contexts and for novel segments, including a change from application to vowels to application to consonants. We take these results to suggest that language learners use their knowledge of...


    Do Thi-ngoc-diep; Michaud Alexis; Castelli Eric
    Automatic speech processing technologies hold great potential to facilitate the urgent task of documenting the world’s languages. The present research aims to explore the application of speech recognition tools to a little-documented language, with a view to facilitating processes of annotation, transcription and linguistic analysis. The target language is Yongning Na (a.k.a. Mosuo), an unwritten Sino-Tibetan language with fewer than 50,000 speakers. An acoustic model of Na was built using CMU Sphinx. In addition to this ‘light’ model, trained on a small data set (only 4 hours of speech from 1 speaker), ‘heavyweight’ models from five national languages...

  7. Disquotationalism and the Compositional Principles

    Richard G Heck
    ... [S]emantics... is a sober and modest discipline which has no pretensions of being a universal patent-medicine for all the diseases of mankind, whether imaginary or real. You will not find in semantics any remedy for decayed teeth or illusions of grandeur or class conflicts. Nor is semantics a device for establishing that everyone except the speaker and his friends is speaking nonsense. (Tarski, 1944, p. 345) In their paper “The Use of Force Against Deflationism”, Bar-On and Simmons (2007, p. 61) helpfully distinguish three sorts of deflationary theses about truth. Metaphysical deflationism is a thesis about the property of...

  8. Gert de Cooman

    Vakgroep Elektrische Energietechniek; Etienne E. Kerre; Vakgroep Toegepaste; Wiskunde Informatica
    Possibility theory can be briefly described as the formalism that allows us to mathematically represent and manipulate linguistic information. This is the information contained in and conveyed by affirmative and conditional sentences in natural language. Affirmative sentences are of the type `(subject) is (predicate)', where the predicate involved

  9. Comparative Study of Text Summarization Methods

    Nikita Munot; Sharvari S. Govilkar
    Text summarization is one application of natural language processing and is becoming more popular for information condensation. Text summarization is the process of reducing the size of an original document and producing a summary that retains its important information. This paper gives a comparative study of various text summarization methods based on different types of application. The paper discusses in detail the two main categories of text summarization methods: extractive and abstractive summarization. The paper also presents a taxonomy of summarization systems and statistical and linguistic approaches for summarization.

  10. Topicality, Predicate Prototypes, and Conceptual Space

    Patrick Murphy
    This paper has the goal of investigating the nature of membership within the category ‘passive’ and cross-linguistic comparison of constructions, ‘passive’ and otherwise. Topicality measures were collected from the Uppsala Corpus of Russian for passive and active uses of the Russian verbs pisat’/napisat’ ‘to write,’ davat’/dat’ ‘to give,’ and zabyvat’/zabyt’ ‘to forget.’ Croft’s (2001) notion of plotting constructions in ‘conceptual space’ is exploited as a means of cross-linguistic comparison using these topicality measures. Examining the conceptual space of various voice constructions with these Russian verbs, Croft’s generalizations are upheld, their position...

  11. A Fuzzy Approach to Image Analysis in HLA Typing using Oligonucleotide Microarrays

    G. B. Ferrara; S. Rovetta; R. Sensi
    The Human Leukocyte Antigen (HLA) region is a part of the genome which spans over 4 Mbases of DNA. The HLA system is strongly connected to immunological response and its compatibility between tissues is critical in transplantation. We have developed an application of oligonucleotide microarrays to HLA typing. In this paper we present a method based on a fuzzy system which interactively supports the user in analyzing the hybridization results, speeding up the decision process in moving from raw array data obtained from the scanner to their interpretation (genotyping). The two-level procedure starts with evaluation of spot activity, then it estimates probe hybridization...

  12. Detecting Non-modal Phonation in Telephone Speech

    Tae-jin Yoon; Jennifer Cole; Mark Hasegawa-Johnson; Chilin Shih
    Non-modal phonation conveys both linguistic and paralinguistic information, and is distinguished by acoustic source and filter features. Detecting non-modal phonation in speech requires reliable F0 analysis, a problem for telephone-band speech, where F0 analysis frequently fails. We demonstrate an approach to the detection of creaky phonation in telephone speech based on robust F0 and spectral analysis. Our F0 analysis relies on an autocorrelation algorithm applied to the intensity-boosted and inverse-filtered speech signal and succeeds in regions of non-modal phonation where the non-filtered F0 analysis typically fails. In addition to the extracted F0 values, spectral amplitude is measured at the first...

  13. A General Feature Space for Automatic Verb Classification

    We develop a general feature space for automatic classification of verbs into lexical semantic classes. Previous work was limited in scope by the need for manual selection of discriminating features, through a linguistic analysis of the target verb classes (Merlo and Stevenson, 2001). We instead analyze the classification structure at a higher level, using the possible defining characteristics of classes as the basis for our feature space. The general feature space achieves reductions in error rates of 42–69%, on a wider range of classes than investigated previously, with comparable performance to feature sets manually selected for the particular classification...

  14. Don’t Believe in Underspecified Semantics: Neg Raising in Lexical Resource Semantics

    O. Bonami; P. Cabredo Hofherr (eds.); Manfred Sailer
    Neg raising is a construction that has been widely studied from different theoretical perspectives, going back to the classic philosophers (cf. Horn (1989)). Yet even the most central properties have not received a satisfactory integration into a linguistic framework. In this paper I will try to approach the phenomenon from a new angle:

  15. Validation and Evaluation of Automatically Acquired Multiword Expressions for Grammar Engineering

    Aline Villavicencio; Valia Kordoni; Yi Zhang; Marco Idiart; Carlos Ramisch
    This paper focuses on the evaluation of methods for the automatic acquisition of Multiword Expressions (MWEs) for robust grammar engineering. First we investigate the hypothesis that MWEs can be detected by the distinct statistical properties of their component words, regardless of their type, comparing 3 statistical measures: mutual information (MI), χ² and permutation entropy (PE). Our overall conclusion is that at least two measures, MI and PE, seem to differentiate MWEs from non-MWEs. We then investigate the influence of the size and quality of different corpora, using the BNC and the Web search engines Google and Yahoo. We conclude...

  16. Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features

    Nizar Habash; Ryan Gabbard; Owen Rambow; Seth Kulick; Mitch Marcus
    This paper discusses automatic determination of case in Arabic. This task is an important part and major source of errors in full diacritization of Arabic. We use a gold-standard syntactic tree, and obtain an error rate of about 4.2%, with a machine learning-based system outperforming a system using hand-written rules. A careful error analysis suggests that when we account for annotation errors in the gold standard, the error rate drops to 0.9%, with the hand-written rules outperforming the machine learning-based system.

  17. Generating Adjectives to Express the Speaker's Argumentative Intent

    Michael Elhadad
    Abstract: We address the problem of generating adjectives in a text generation system. We distinguish between usages of adjectives informing the hearer of a property of an object and usages expressing an intention of the speaker, or an argumentative orientation. For such argumentative usages, we claim that a generator cannot simply map from information in the knowledge base to adjectives. Instead, we identify various knowledge sources necessary to decide whether to use an adjective, what adjective should be selected and what syntactic function it should have. We show how these decisions interact...

  18. Corpus Variations for Translation Lexicon Induction

    Lexical mappings (word translations) between languages are an invaluable resource for multilingual processing. While the problem of extracting lexical mappings from parallel corpora is well-studied, the task is more challenging when the language samples are from nonparallel corpora. The goal of this work is to investigate one such scenario: finding lexical mappings between dialects of a diglossic language, in which people conduct their written communications in a prestigious formal dialect, but they communicate verbally in a colloquial dialect. Because the two dialects serve different socio-linguistic functions, parallel corpora do not naturally exist between them. An example of a diglossic dialect...

  19. How to Learn / What to Learn

    Roger C. Schank; Mallory Selfridge
    Abstract: This paper discusses the kind of information that must be present in a computer program that models the linguistic development of a child. A three stage model is presented that characterizes the development of a natural language parser in a child of ages one, one and a half, and two. Some data from children of these ages is presented. General problems with respect to computer learning are also discussed.

  20. Storage and Retrieval]: Content Analysis and Indexing – Linguistic processing.

    Xin-jing Wang
    In this paper, we propose a novel Chinese word segmentation method which leverages the huge deposit of Web documents and search technology. It simultaneously solves ambiguous phrase boundary resolution and unknown word identification problems. Evaluations prove its effectiveness.

