
181.
Low Size-Complexity Inductive Logic Programming: The East-West Challenge Considered as a Problem in Cost-Sensitive Classification
- Peter Turney
The Inductive Logic Programming community has considered proof-complexity and
model-complexity, but, until recently, size-complexity has received little attention.
Recently a challenge was issued "to the international computing community" to discover
low size-complexity Prolog programs for classifying trains. The challenge was based on
a problem first proposed by Ryszard Michalski, 20 years ago. We interpreted the challenge
as a problem in cost-sensitive classification and we applied a recently developed
cost-sensitive classifier to the competition. Our algorithm was relatively successful (we
won a prize). This paper presents our algorithm and analyzes the results of the competition.
Keywords: Machine Learning, Classification, Learning from Examples, Cost-Sensitive
Classification, Inductive Logic Programming, Size-Complexity.
1. Introduction
Inductive Logic...

182.
Low Size-Complexity Inductive Logic Programming: The East-West Challenge Considered as a Problem in Cost-Sensitive Classification
- Peter Turney
The Inductive Logic Programming community has considered proof-complexity and
model-complexity, but, until recently, size-complexity has received little attention.
Recently a challenge was issued "to the international computing community" to discover
low size-complexity Prolog programs for classifying trains. The challenge was based on
a problem first proposed by Ryszard Michalski, 20 years ago. We interpreted the challenge
as a problem in cost-sensitive classification and we applied a recently developed
cost-sensitive classifier to the competition. Our algorithm was relatively successful (we
won a prize). This paper presents our algorithm and analyzes the results of the competition.
Keywords: Machine Learning, Classification, Learning from Examples, Cost-Sensitive
Classification, Inductive Logic Programming, Size-Complexity.
1. Introduction
Inductive Logic...

183.
Using Inductive Logic Programming to Automate the Construction of Natural Language Parsers
- John M. Zelle,Dave Moriarty,Cindi Thompson,Ulf Hermjakob,Joshua Konvisser
vii
Chapter 1 Introduction 1
1.1 Empirical NLP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 2
1.2 CHILL: An Empirical Parser Acquisition System : : : : : : : : : : : 4
1.3 Organization of Dissertation : : : : : : : : : : : : : : : : : : : : : : : 7
Chapter 2 Background 8
2.1 Control-Rule Learning : : : : : : : : : : : : :...

184.
A Comparison of Two Methods Employing Inductive Logic Programming for Corpus-based Parser Construction
- John M. Zelle,Raymond J. Mooney
This paper presents results from recent experiments
with Chill, a corpus-based parser acquisition
system. Chill treats grammar acquisition
as the learning of search-control rules within a
logic program. Unlike many current corpus-based
approaches that use propositional or probabilistic
learning algorithms, Chill uses techniques from
inductive logic programming (ILP) to learn relational
representations. The reported experiments
compare Chill's performance to that of a more
naive application of ILP to parser acquisition. The
results show that ILP techniques, as employed in
Chill, are a viable alternative to propositional
methods and that the control-rule framework is
fundamental to Chill's success.
Introduction
Empirical or corpus-based methods for constructing
natural language systems has been an area of
growing research interest in the last...

185.
Advantages of Decision Lists and Implicit Negatives in Inductive Logic Programming
- Mary Elaine Califf,Raymond J. Mooney
This paper demonstrates the capabilities of Foidl, an inductive logic programming
(ILP) system whose distinguishing characteristics are the ability to produce first-order
decision lists, the use of an output completeness assumption to provide implicit negative
examples, and the use of intensional background knowledge. The development of Foidl
was originally motivated by the problem of learning to generate the past tense of English
verbs; however, this paper demonstrates its superior performance on two different
sets of benchmark ILP problems. Tests on the finite element mesh design problem
show that Foidl's decision lists enable it to produce better results than all other ILP
systems whose results on this problem have...

186.
Tidying up the Mess around the Subsumption Theorem in Inductive Logic Programming
- Shan-hwei Nienhuys-cheng,Ronald Wolf
The subsumption theorem is an important theorem concerning resolution. Essentially,
it says that if a set of clauses Sigma logically implies a clause C, then
either C is a tautology, or a clause D which subsumes C can be derived from Sigma
with resolution. It was originally proved in 1967 by Lee in [Lee67]. In Inductive
Logic Programming, interest in this theorem is increasing since its rediscovery
by Bain and Muggleton [BM92]. It provides a quite natural "bridge" between
subsumption and logical implication. Unfortunately, a correct formulation and
proof of the subsumption theorem are not available. It is not clear which forms
of resolution are allowed. In fact,...

187.
Learning the Past Tense of English Verbs Using Inductive Logic Programming
- Raymond J. Mooney,Mary Elaine Califf
. This paper presents results on using a new inductive logic
programming method called Foidl to learn the past tense of English
verbs. The past tense task has been widely studied in the context of
the symbolic/connectionist debate. Previous papers have presented results
using various neural-network and decision-tree learning methods.
We have developed a technique for learning a special type of Prolog
program called a first-order decision list, defined as an ordered list of
clauses each ending in a cut. Foidl is based on Foil [19] but employs
intensional background knowledge and avoids the need for explicit negative
examples. It is particularly useful for problems that involve rules
with specific...

188.
Advantages of Decision Lists and Implicit Negatives in Inductive Logic Programming
- Mary Elaine Califf,Raymond J. Mooney
This paper demonstrates the capabilities of Foidl, an inductive logic programming (ILP) system whose distinguishing characteristics are the ability to produce first-order decision lists, the use of an output completeness assumption as a substitute for negative examples, and the use of intensional background knowledge. The development of Foidl was originally motivated by the problem of learning to generate the past tense of English verbs; however, this paper demonstrates its superior performance on two different sets of benchmark ILP problems. Tests on the finite element mesh design problem show that Foidl's decision lists enable it to produce generally more accurate results...

189.
Restricting the hypothesis space, guiding the search, and handling the redundant information in Inductive Logic Programming
- Uros Pompe
We developed and implemented an inductive logic programming system and the first order
classifier, called ILP-R. It is capable of efficiently inducing theories in a restricted subset
of function free normal programs. It is able to use the hypothesis to classify new instances,
by using the naive Bayesian reasoning. The contributions of the system are threefold.
First, we devised a weak syntactic declarative bias restricting the set of possible hypotheses.
We discuss and prove its properties. The latter are then exploited to encode
the current proof of a partially induced clause in such a way, that the encoding grows at
most linearly with respect to the clause...

190.
Automatically Exploring Hypotheses about Fault Prediction: a Comparative Study of Inductive Logic Programming Methods
- William W. Cohen,Premkumar T. Devanbu
We evaluate a class of learning algorithms known as inductive logic programming
(ILP) methods on the task of predicting fault occurrence in C++ classes. Using these
methods, a large space of possible hypotheses is searched in an automated fashion;
further, the hypotheses are based directly on an abstract logical representation of the
software, rather than on manually proposed numerical metrics that predict fault density.
We compare two ILP systems, FOIL and FLIPPER, and conclude that FLIPPER
generally outperforms FOIL on this problem. We analyze the reasons for the di#ering
performance of these two systems, and based on the analysis, propose two extensions
to FLIPPER: a user-directed bias towards...

191.
Unsupervised Learning of Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming
- Dimitar Kazakov,Suresh Manandhar
. This article presents a combination of unsupervised and supervised learning techniques for the generation of word segmentation rules from a raw list of words. First, a language bias for word segmentation is introduced and a simple genetic algorithm is used in the search for a segmentation that corresponds to the best bias value. In the second phase, the words segmented by the genetic algorithm are used as an input for the first order decision list learner Clog. The result is a set of first order rules which can be used for segmentation of unseen words. When applied on either...

192.
Use of Inductive Logic Programming to learn Principles of Protein Structure
- Marcel Turcotte,Stephen H. Muggleton,Michael J. E. Sternberg
Inductive Logic Programming (ILP) has been applied to learn rules which characterise protein
folds. Several representations for the background set have been explored and the results have been
interpreted in their biological context. In this paper, we present new results obtained with a background
set containing information about protein topology. The new rules are more descriptive than
the previous ones, i.e. where previous rules represented local motifs, often associated with functional
regions, the new rules represent more complete descriptions, often similar to the descriptions
found in SCOP. Cross-validation experiments were conducted for the 20 most populated folds. The
overall cross-validated accuracy was found to be 75.1
1.6 %...

193.
Combining Top-down and Bottom-up Techniques in Inductive Logic Programming
- John M. Zelle,Raymond J. Mooney,Joshua B. Konvisser
This paper describes a new method for inducing
logic programs from examples which attempts
to integrate the best aspects of existing
ILP methods into a single coherent framework.
In particular, it combines a bottomup
method similar to Golem with a topdown
method similar to Foil. It also includes
a method for predicate invention similar
to Champ and an elegant solution to
the "noisy oracle" problem which allows the
system to learn recursive programs without
requiring a complete set of positive examples.
Systematic experimental comparisons
to both Golem and Foil on a range of problems
are used to clearly demonstrate the advantages
of the approach.
1 INTRODUCTION
Inductive Logic Programming (ILP) research investigates
the construction of first-order, definite-clause
logic programs...

194.
Comparative Results on Using Inductive Logic Programming for Corpus-based Parser Construction
- John M. Zelle,Raymond J. Mooney
. This paper presents results from recent experiments with
Chill, a corpus-based parser acquisition system. Chill treats language
acquisition as the learning of search-control rules within a logic program.
Unlike many current corpus-based approaches that use statistical learning
algorithms, Chill uses techniques from inductive logic programming
(ILP) to learn relational representations. Chill is a very flexible system
and has been used to learn parsers that produce syntactic parse trees,
case-role analyses, and executable database queries. The reported experiments
compare Chill's performance to that of a more naive application
of ILP to parser acquisition. The results show that ILP techniques, as
employed in Chill, are a viable alternative to statistical methods...

195.
An Alternative to Clause-at-a-Time Hypothesis Construction in Inductive Logic Programming
- Simon Anthony,Alan M. Frisch
Hypotheses constructed by inductive logic programming
(ILP) systems are finite sets of definite clauses. Top-down ILP systems
usually adopt the following greedy clause-at-a-time strategy to construct
such a hypothesis: start with the empty set of clauses and repeatedly add
the clause that most improves the quality of the set.
This paper formulates and analyses an alternative method for constructing
hypotheses. The method, called cautious induction, consists of
a first stage, which finds a finite set of candidate clauses, and a second
stage, which selects a finite subset of these clauses to form a hypothesis.
By using a less greedy method in the second stage, cautious induction
can find hypotheses of...

196.
Cautious Induction: An Alternative to Clause-at-a-Time Hypothesis Construction in Inductive Logic Programming
- Simon Anthony,Alan M. Frisch
The hypotheses constructed by Inductive Logic Programming (ILP) systems are
logic programs that comprise a finite set of definite clauses. Much of the recent
work in ILP has examined the question of how to find a single, high quality clause---
that is, one which accurately explains the classification of a set of given examples.
Significantly less attention has been paid to the question of how to select a set of such
clauses to form a high quality hypothesis. The usual method is to select clauses in
a greedy manner: start with the empty set of clauses and repeatedly add the clause
that most improves the quality of...

197.
Advantages of Decision Lists and Implicit Negatives in Inductive Logic Programming
- Mary Elaine Califf,Raymond J. Mooney
This paper demonstrates the capabilities of Foidl, an inductive logic programming
(ILP) system whose distinguishing characteristics are the ability to produce first-order
decision lists, the use of an output completeness assumption as a substitute for negative
examples, and the use of intensional background knowledge. The development of Foidl
was originally motivated by the problem of learning to generate the past tense of English
verbs; however, this paper demonstrates its superior performance on two different sets
of benchmark ILP problems. Tests on the finite element mesh design problem show that
Foidl's decision lists enable it to produce generally more accurate results than a range
of methods previously applied to...

198.
Secondary structure prediction by Inductive Logic Programming
residues we rst obtained
the PHD server [4, 5] secondary structure assignments with reliability index
R (0-9). We formed attributes with values of PHD predictions for H if R
3, L if R 4, E if R 5, and unassigned otherwise. We took the window
size of 6 residues, and additionally formed new attributes in terms of the
distance to the nearest H, L, or E residue. In order to prevent overtting the
data, the reduced-error pruning [1] with condence level CF = 1% was used
for learning the rules. 7-fold cross validation on the set of 887 chains yielded
accuracy Q3 = 72.1 9.7% and segment overlap...

199.
An empirical study of the use of relevance information in Inductive Logic Programming
- Ashwin Srinivasan,Ross D. King,Michael E. Bain
Inductive Logic Programming (ILP) systems construct models for data
using domain-speci
c background information. When using these systems, it is typically
assumed that sucient human expertise is at hand to rule out irrelevant
background information. Such irrelevant information can, and typically does, hinder
an ILP system's search for good models. Here, we provide evidence that if expertise
is available that can provide a partial-ordering on sets of background predicates in
terms of relevance to the analysis task, then this can be used to good eect by an
ILP system. In particular, using data from biochemical domains, we investigate an
incremental strategy of including sets of predicates in decreasing order...

200.
An empirical study of the use of relevance information in Inductive Logic Programming
- Ashwin Srinivasan,Ross D. King
Inductive Logic Programming (ILP) systems construct models for data using domain-specific background information. When using these systems, it is typically assumed that sufficient human expertise is at hand to rule out irrelevant background information. Such irrelevant information can, and typically does, hinder an ILP system's search for good models. Here, we provide evidence that if additional expertise is available that can provide a partial-ordering on sets of background predicates in terms of relevance to the analysis task, then this can be used to good effect by an ILP system. In particular, using data from biochemical domains, we investigate an incremental...