This project implements a rule-based natural language command recognizer using a feature-based context-free grammar (FCFG) in Python with the NLTK library.
The goal is to parse simple Spanish commands directed to a robot and extract their semantic representation, which can then be used for execution.
The system takes textual commands in Spanish as input and produces:
- A syntactic parse tree
- A semantic structure describing the intended action
The core of the project is the design and implementation of the grammar gramatica_base.fcfg.
The grammar was designed after manually analyzing a set of written commands from a training text. These commands were first analyzed syntactically, and the resulting structures were then enriched with semantic information encoded as feature attributes. This analysis made it possible to define a feature structure capable of representing actions and their parameters.
The resulting feature structure is shown in the following figure:
The syntactic and semantic representations were implemented as a set of feature-based grammar rules compatible with NLTK’s load_parser function. These rules combine syntactic structure with semantic attributes through feature unification, allowing the parser to build a structured representation of each command.
Each rule propagates and constrains features such as ACCION and DISTANCIA so that, when combined, they yield a semantic interpretation of the input sentence. The full set of these rules constitutes the grammar.
The following input:
avanzar diez metros
is parsed into the semantic representation:
SEM = [ACCION='avanzar',
DISTANCIA=[CANTIDAD='diez', UNIDAD='metro'],
VELOCIDAD=?v]
This interpretation is produced by combining grammar rules such as:
- A sentence rule mapping the verb phrase to the global semantic structure:
  S[SEM=[ACCION=?a, DISTANCIA=?d, VELOCIDAD=?v]] -> SV[ACCION=?a, DISTANCIA=?d, VELOCIDAD=?v]
- A verb phrase rule combining the verb and the distance phrase:
  SV[ACCION='avanzar', DISTANCIA=?d, VELOCIDAD=?v] -> V[ACCION='avanzar'] SN[OBJETO='distancia', CANTIDAD=?c, UNIDAD=?u]
- A noun phrase rule for distance expressions:
  SN[CANTIDAD=?c, UNIDAD=?u, TIPO='distancia'] -> Det[CANTIDAD=?c] N[TIPO='distancia', UNIDAD=?u]
- Lexical entries such as:
  V[ACCION='avanzar'] -> 'avanzar'
  Det[CANTIDAD='diez'] -> 'diez'
  N[TIPO='distancia', UNIDAD='metro'] -> 'metros'
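To make the interaction of these rules concrete, here is a minimal sketch that parses the example command with NLTK. It does not load gramatica_base.fcfg; the inline grammar is a simplified stand-in written for this illustration. It is an assumption of this sketch that the verb phrase rule builds DISTANCIA directly from the noun phrase's CANTIDAD and UNIDAD, and the VELOCIDAD, OBJETO and TIPO features are left out.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Simplified stand-in for gramatica_base.fcfg (assumption: not the real grammar).
TOY_GRAMMAR = """
% start S
S[SEM=[ACCION=?a, DISTANCIA=?d]] -> SV[ACCION=?a, DISTANCIA=?d]
SV[ACCION=?a, DISTANCIA=[CANTIDAD=?c, UNIDAD=?u]] -> V[ACCION=?a] SN[CANTIDAD=?c, UNIDAD=?u]
SN[CANTIDAD=?c, UNIDAD=?u] -> Det[CANTIDAD=?c] N[UNIDAD=?u]
V[ACCION='avanzar'] -> 'avanzar'
Det[CANTIDAD='diez'] -> 'diez'
N[UNIDAD='metro'] -> 'metros'
"""

grammar = FeatureGrammar.fromstring(TOY_GRAMMAR)
parser = FeatureChartParser(grammar)

# Tokenize by whitespace and collect every parse tree the grammar licenses.
tokens = "avanzar diez metros".split()
trees = list(parser.parse(tokens))

# The root label is a feature structure; SEM holds the semantic representation.
sem = trees[0].label()["SEM"]
print(trees[0])
print(sem)
```

With the real grammar file, the parser would more likely be obtained through nltk.load_parser, as the project description states; the inline string is used here only so the sketch runs without the repository files.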
Although the grammar rules are written top-down, the resulting parse can be read bottom-up: the parser first matches the input words against lexical entries and then combines them into progressively larger constituents through feature unification. The trace below also shows features and categories (COMP, OBJETO, Nominal) that the abbreviated rules above leave out:
1. Lexical matching
avanzar -> V[ACCION='avanzar', COMP='si']
diez -> Det[CANTIDAD='diez']
metros -> N[TIPO='distancia', UNIDAD='metro']
2. Building the nominal constituent
N[TIPO='distancia', UNIDAD='metro']
→ Nominal[OBJETO='distancia', ..., TIPO='distancia', UNIDAD='metro']
Det[CANTIDAD='diez'] + Nominal[...]
→ SN[CANTIDAD='diez', OBJETO='distancia', ..., TIPO='distancia', UNIDAD='metro']
3. Building the verb phrase
V[ACCION='avanzar', COMP='si'] + SN[...]
→ SV[ACCION='avanzar', COMP='si', DISTANCIA=[CANTIDAD='diez', UNIDAD='metro']]
4. Building the sentence
SV[ACCION='avanzar', COMP='si', DISTANCIA=[CANTIDAD='diez', UNIDAD='metro']]
→ S[SEM=[ACCION='avanzar', DISTANCIA=[CANTIDAD='diez', UNIDAD='metro'], VELOCIDAD=?v]]
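Each step in this trace relies on feature unification. The following sketch shows the operation in isolation using NLTK's FeatStruct class. The feature structures are hand-written for illustration rather than taken from the grammar, and the verb 'girar' is a hypothetical value used only to provoke a clash.

```python
from nltk.featstruct import FeatStruct
from nltk.sem.logic import Variable

# Compatible structures merge into one that carries all features.
verb = FeatStruct("[ACCION='avanzar']")
distance = FeatStruct("[DISTANCIA=[CANTIDAD='diez', UNIDAD='metro']]")
merged = verb.unify(distance)

# A variable unifies with a concrete value and takes that value;
# a variable that nothing binds (here ?v) stays a variable in the result.
template = FeatStruct("[ACCION=?a, VELOCIDAD=?v]")
partial = template.unify(verb)

# Conflicting atomic values cannot be unified: unify() returns None.
clash = verb.unify(FeatStruct("[ACCION='girar']"))

print(merged)
print(partial)
print(clash)
```

The second case mirrors the VELOCIDAD=?v entry in the example output: the command mentions no speed, so nothing binds that variable.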
├── gramatica_base.fcfg # Feature-based grammar defining the accepted command structures
├── programa_base.py # Base Python script that loads the grammar and parses the input sentences
├── text-input-train.txt # Training sentences used during grammar development
├── text-input-test.txt # Test sentences used to evaluate the grammar on unseen examples
├── text-output-train.txt # Parser output obtained on the training set
├── text-output-test.txt # Parser output obtained on the test set
└── Procedimiento y discusión.pdf # Report describing the methodology, analysis, and results
When applied to the test set (text-input-test.txt), the system correctly analyzes seven of the nine input commands, partially analyzes one, and fails to recognize the remaining one. See Procedimiento y discusión.pdf for a more detailed discussion.
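An evaluation of this kind can be sketched as a loop that classifies each command by what the parser returns. This is not the logic of programa_base.py: the toy grammar, the classify helper and the three status labels are assumptions made for this illustration, and the sketch does not attempt to detect partial analyses. It relies on NLTK's chart parsers raising ValueError when an input word is not covered by the grammar.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Toy grammar standing in for gramatica_base.fcfg (assumption).
TOY_GRAMMAR = """
% start S
S[SEM=[ACCION=?a]] -> V[ACCION=?a]
V[ACCION='avanzar'] -> 'avanzar'
V[ACCION='parar'] -> 'parar'
"""


def classify(parser, sentence):
    """Return a coarse status for one command (hypothetical helper)."""
    tokens = sentence.lower().split()
    try:
        trees = list(parser.parse(tokens))
    except ValueError:
        # Raised when a token is missing from the grammar's lexicon.
        return "unknown word"
    return "parsed" if trees else "no parse"


parser = FeatureChartParser(FeatureGrammar.fromstring(TOY_GRAMMAR))

# Illustrative inputs, not the project's test sentences.
commands = ["avanzar", "avanzar parar", "retroceder"]
report = {command: classify(parser, command) for command in commands}

for command, status in report.items():
    print(f"{command}: {status}")
```

Separating "unknown word" from "no parse" helps distinguish gaps in the lexicon from gaps in the phrase structure rules, the two kinds of refinement the grammar may need.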
- Python 3.6
- NLTK (Natural Language Toolkit)
- Feature-based grammars (FCFG)
- Unification-based parsing
This project shows that feature-based grammars with unification are an effective approach for mapping simple natural language commands into structured semantic representations.
However:
- Grammar rules can become complex and repetitive
- Handling implicit meaning and lexical variation requires further refinement