Research Interests
- Computational linguistics and statistical natural language processing
- Intelligent Computer-Assisted Language Learning Systems
- Constraint-based theories and gradient effects
Applied Interests
At present, I am involved in two ongoing projects. Both use machine learning and text categorization techniques for linguistic purposes. The first project is the automatic classification of language learners' writing into proficiency levels. It involves statistical analysis of a number of textual features, identification of reliable and automatically measurable linguistic features that are indicative of proficiency level, and manual annotation and analysis of data. The goals of this project are twofold: (i) to develop reliable automatic evaluation software, and (ii) to provide comparative data on second language development based on writers' first languages.
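As a rough illustration of what "automatically measurable" features can look like, the sketch below computes three surface measures from a learner text. The function name and the particular features (sentence length, type-token ratio, word length) are assumptions chosen for the example; they are not the feature set used in the project, and the tokenization is deliberately naive.

```python
import re


def proficiency_features(text):
    """Return a few surface features of a text (illustrative choices only)."""
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        raise ValueError("text must contain at least one word")
    return {
        "mean_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words),
    }


if __name__ == "__main__":
    print(proficiency_features("I like cats. I like dogs."))
```

Feature vectors of this kind could then be passed to any standard text categorization method, with the manually annotated proficiency levels serving as training labels.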
The second project, which is part of the Study for the Termination of Online Predators (STOP), involves automatic detection of child/pedophile communication in online text chats. The goals of this project are also twofold: (i) to develop a software application that can flag a text chat as suspicious (for law enforcement officials or for parents/guardians), and (ii) to provide a better understanding of child/predator communication. The other co-investigators on this project are Chad Harms (Greenlee School of Journalism and Communication/HCI) and Brian Monahan (Sociology).
Theoretical Interests
Generally, my theoretical interests lie in middle-ground approaches to linguistics, where symbolic and statistical approaches to the study of language blend. Each of these approaches has its strengths and weaknesses. Symbolic approaches explain linguistic patterns intuitively and parsimoniously, but they stop short of making precise predictions about usage and speaker/hearer preferences. Purely statistical approaches, on the other hand, rely heavily on frequencies and often overlook the rich structure underlying language, which results in linguistically naive models. Recently, computational linguistics has witnessed a surge in middle-ground approaches that try to bring the best of the two worlds together in one coherent theory. Noteworthy examples of such work include Abney (1996, 1997), Carroll and Rooth (1998), Keller (2000, 2003), and Manning (2003), among many others. The goal of these approaches is to enrich symbolic theories with information about language users’ preferences, which are usually represented in the form of probabilities or graded constraints and are arrived at through corpus analysis or psycholinguistic tests.
References
- Abney, S. (1996). Statistical methods and linguistics. In J. Klavans and P. Resnik (Eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. Cambridge, MA: The MIT Press.
- Abney, S. (1997). Stochastic attribute-value grammars. Computational Linguistics 23(4), 597–618.
- Carroll, G. and M. Rooth (1998). Valence induction with a head-lexicalized PCFG. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Granada, pp. 36–45.
- Keller, F. (2000). Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality. Ph.D. thesis, University of Edinburgh.
- Keller, F. (2003). A probabilistic parser as a model of global processing difficulty. In R. Alterman and D. Kirsh (Eds.), Proceedings of the 25th Annual Conference of the Cognitive Science Society, Boston, pp. 646–651.
- Manning, C. (2003). Probabilistic approaches to syntax. In R. Bod, J. Hay, and S. Jannedy (Eds.), Probabilistic Linguistics. Cambridge, MA: The MIT Press.
Thesis Abstract
My thesis advocates a modular and parallel grammar architecture with declarative constraints on the syntactic, semantic, prosodic, and pragmatic structures, which are derived in parallel while mutually constraining one another, as proposed by Jackendoff (1997, 2002). The main claim of this thesis is that, because of the many conflicting requirements among modules, the interfaces cannot employ crisp constraints. Instead, a soft constraint satisfaction approach is required. We also argue that simply violable constraints are insufficient to account for certain linguistic phenomena; there is a need for graded constraints that allow for degrees of violation.
The dissertation first provides a review of different conceptions of gradience in linguistics, followed by a review of the concept of modularity in cognitive science and linguistics. The problem of conflicting requirements in the field of Constraint Logic Programming (CLP) has led to various soft constraint satisfaction approaches, and the dissertation presents a generalized theory of soft constraint satisfaction (Bistarelli, 2001) from the CLP literature. It then presents a case study of graded constraints showing that such constraints exist at interfaces and that they can exhibit degrees of violation. Another case study shows that the modular parallel architecture allows for simpler modules and is able to capture generalizations better. We then conclude by showing how the generalized theory of soft constraint satisfaction can be incorporated within grammar without disrupting the existing explanatory power of constraint-based theories such as LOT (Keller, 2000) and HPSG (Pollard and Sag, 1994).
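The general idea of soft constraints with degrees of violation can be sketched in the semiring style commonly used in the CLP literature: each constraint maps a candidate to a value, one operation combines the values of several constraints, and another selects the better of two combined values. The code below is only an illustration under stated assumptions and is not the formalization used in the thesis. The candidate names, violation counts, and constraint weights are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # selects the better of two values
    times: Callable[[Any, Any], Any]  # combines values from several constraints
    zero: Any                         # worst possible value
    one: Any                          # best possible value


# Weighted semiring: values are costs, combined by addition, lower is better.
WEIGHTED = Semiring(plus=min, times=lambda a, b: a + b, zero=float("inf"), one=0.0)
# Fuzzy semiring: values are satisfaction degrees in [0, 1], higher is better.
FUZZY = Semiring(plus=max, times=min, zero=0.0, one=1.0)


def evaluate(semiring, constraints, candidate):
    """Combine the values that all constraints assign to one candidate."""
    return reduce(semiring.times, (c(candidate) for c in constraints), semiring.one)


def best_candidates(semiring, constraints, candidates):
    """Return every candidate whose combined value is optimal."""
    scored = [(cand, evaluate(semiring, constraints, cand)) for cand in candidates]
    best = reduce(semiring.plus, (value for _, value in scored), semiring.zero)
    return [cand for cand, value in scored if value == best]


# Hypothetical data: (syntactic violations, prosodic violations) per candidate.
VIOLATIONS = {"order_a": (0, 3), "order_b": (1, 0), "order_c": (2, 2)}
CANDIDATES = list(VIOLATIONS)


def syntax_cost(candidate):
    # Graded constraint: the cost grows with the degree of violation.
    return 2.0 * VIOLATIONS[candidate][0]


def prosody_cost(candidate):
    return 1.0 * VIOLATIONS[candidate][1]


CONSTRAINTS = [syntax_cost, prosody_cost]

if __name__ == "__main__":
    print(best_candidates(WEIGHTED, CONSTRAINTS, CANDIDATES))  # prints ['order_b']
```

Swapping `WEIGHTED` for `FUZZY` (with constraints that return satisfaction degrees rather than costs) changes how conflicts are resolved without changing `evaluate` or `best_candidates`, which is the sense in which such a framework is general.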
This thesis was co-supervised by Elizabeth Cowper and Gerald Penn. Other committee members were Elan Dresher, Jean-Pierre Koenig, and Frank Keller.
References
- Bistarelli, S. (2001). Soft Constraint Solving and Programming: A General Framework. Ph.D. thesis, Università di Pisa.
- Jackendoff, R. (1997). The Architecture of the Language Faculty. Linguistic Inquiry Monograph Twenty-Eight. Cambridge, MA: The MIT Press.
- Jackendoff, R. (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution. New York, NY: Oxford University Press.
- Keller, F. (2000). Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality. Ph.D. thesis, University of Edinburgh.
- Pollard, C. and I. Sag (1994). Head-Driven Phrase Structure Grammar. Studies in Contemporary Linguistics. Chicago: CSLI.
Previous Work
The title of my first generals paper is "Plural and Sequential Events: Some Theoretical and Computational Implications." In this paper, I take a computational-semantic approach to plural and sequential events with special emphasis on Czech and Russian because of their rich aspectual systems. This research was supervised by Graeme Hirst and Elizabeth Cowper.
The title of my second generals paper is "A Constraint-Based Approach to Information Structure and Prosody Correspondence." This paper proposes a parallel architecture for constraint-based theories of grammar, HPSG in particular, in favour of modular, readable, and maintainable theories/grammars. In this architecture, syntax/semantics and information structure constrain prosodic structure, which is generated in parallel with the other two structures. This paper was supervised by Elizabeth Cowper and Gerald Penn.
I also worked as a research assistant to Gerald Penn. We were part of Module A4 of the MiLCA consortium. I was involved in developing a grammar formalism and computational system for parsing freer word order languages.