Grew-TSE

Python Package for Targeted Syntactic Evaluation of LLMs Cross-Linguistically via Treebank Querying.

Grew-TSE, or Grew for Targeted Syntactic Evaluation, is a Python package that generates minimal-pair datasets from Universal Dependencies treebanks using user-defined queries.

These datasets can be used to evaluate a language model's performance on syntactic tasks, in either a masked or a prompt-based setting. The package's key advantage is its use of the [Grew query language](https://grew.fr/), which lets users specify the exact syntactic construction and target feature they want to evaluate in a relatively easy-to-read format. This means that you, your friends, and your family can all specify the syntactic constructions you want to evaluate LLMs on. See an example below.

% A Grew query that finds subject-verb-object constructions with an accusative object
pattern {
  V [upos=VERB];
  OBJ [Case="Acc"];
  V -[nsubj]-> SUBJ;
  V -[obl]-> OBJ;
  SUBJ << V;
  OBJ >> V;
}
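
To make the constraints concrete, here is a minimal pure-Python sketch of what the pattern checks on a single sentence. It does not use Grew or Grew-TSE: the token dictionaries (with fields named after CoNLL-U columns), the `find_matches` helper, and the hand-written German example sentence are all assumptions made for illustration.

```python
# Toy check of the pattern's constraints on one sentence; this is not Grew.
# Token fields are named after CoNLL-U columns; the sentence is hand-written.
SENTENCE = [
    {"id": 1, "form": "Der", "upos": "DET", "feats": {"Case": "Nom"}, "head": 2, "deprel": "det"},
    {"id": 2, "form": "Hund", "upos": "NOUN", "feats": {"Case": "Nom"}, "head": 3, "deprel": "nsubj"},
    {"id": 3, "form": "rennt", "upos": "VERB", "feats": {}, "head": 0, "deprel": "root"},
    {"id": 4, "form": "durch", "upos": "ADP", "feats": {}, "head": 6, "deprel": "case"},
    {"id": 5, "form": "den", "upos": "DET", "feats": {"Case": "Acc"}, "head": 6, "deprel": "det"},
    {"id": 6, "form": "Park", "upos": "NOUN", "feats": {"Case": "Acc"}, "head": 3, "deprel": "obl"},
]


def find_matches(tokens):
    """Return (V, SUBJ, OBJ) word forms for every match of the pattern."""
    matches = []
    for v in tokens:
        if v["upos"] != "VERB":  # V [upos=VERB]
            continue
        dependents = [t for t in tokens if t["head"] == v["id"]]
        # V -[nsubj]-> SUBJ; SUBJ << V (the subject precedes the verb)
        subjects = [t for t in dependents
                    if t["deprel"] == "nsubj" and t["id"] < v["id"]]
        # V -[obl]-> OBJ; OBJ [Case="Acc"]; OBJ >> V (OBJ follows the verb)
        objects = [t for t in dependents
                   if t["deprel"] == "obl"
                   and t["feats"].get("Case") == "Acc"
                   and t["id"] > v["id"]]
        for subj in subjects:
            for obj in objects:
                matches.append((v["form"], subj["form"], obj["form"]))
    return matches


print(find_matches(SENTENCE))
```

Changing the `Case` feature of the last token to `Dat` makes the sentence stop matching, which is the kind of contrast a minimal pair is built around.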

If we want to generate a minimal-pair dataset with the above query, we can use the package like so:


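from grewtse.pipeline import GrewTSEPipe

# Load and parse the CoNLL-U treebank
grewtse = GrewTSEPipe()
grewtse.parse_treebank("./treebanks/example.conllu")

# Grew query describing the construction to evaluate
grew_query = """
    pattern {
      V [upos=VERB];
      OBJ [Case="Acc"];
      V -[nsubj]-> SUBJ;
      V -[obl]-> OBJ;
      SUBJ << V;
      OBJ >> V;
    }
"""
# Node in the query that serves as the evaluation target
target_node = "OBJ"

# Build the masked dataset, then derive minimal pairs using the dative case
grewtse.generate_masked_dataset(grew_query, target_node)
final_dataset = grewtse.generate_minimal_pairs({"case": "Dat"}, {})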
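
Once a minimal-pair dataset exists, a typical masked evaluation checks whether the model scores the grammatical form above the ungrammatical one in the masked slot. The sketch below illustrates that comparison in plain Python. The record keys (`masked_text`, `form_grammatical`, `form_ungrammatical`), the `minimal_pair_accuracy` helper, and the lookup-table scorer are assumptions made for illustration, not Grew-TSE's actual output schema or API; in practice the scorer would wrap a real language model.

```python
from typing import Callable, Iterable, Mapping

# A scorer maps (masked sentence, candidate form) to a score such as a log-probability.
Scorer = Callable[[str, str], float]


def minimal_pair_accuracy(pairs: Iterable[Mapping[str, str]], score: Scorer) -> float:
    """Fraction of pairs where the grammatical form outscores the ungrammatical one."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no minimal pairs to evaluate")
    correct = sum(
        score(p["masked_text"], p["form_grammatical"])
        > score(p["masked_text"], p["form_ungrammatical"])
        for p in pairs
    )
    return correct / len(pairs)


# Hypothetical records: accusative plural (grammatical) vs. dative plural.
PAIRS = [
    {"masked_text": "Die Kinder rennen durch die [MASK].",
     "form_grammatical": "Felder", "form_ungrammatical": "Feldern"},
    {"masked_text": "Sie laufen durch die [MASK].",
     "form_grammatical": "Berge", "form_ungrammatical": "Bergen"},
    {"masked_text": "Wir gehen ohne die [MASK].",
     "form_grammatical": "Kinder", "form_ungrammatical": "Kindern"},
]

# Stand-in for a model: made-up scores, chosen so that one pair is judged wrongly.
TOY_SCORES = {"Felder": -1.0, "Feldern": -3.0, "Berge": -2.5,
              "Bergen": -2.0, "Kinder": -0.5, "Kindern": -4.0}


def toy_scorer(masked_text: str, form: str) -> float:
    """Ignore the context and look the form up in the made-up table."""
    return TOY_SCORES[form]


accuracy = minimal_pair_accuracy(PAIRS, toy_scorer)
print(f"accuracy: {accuracy:.2f}")
```

Keeping the scorer as a plain callable means the same accuracy function could be reused for masked or prompt-based models, as long as each exposes a comparable score for a candidate form.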

You can find more information on the project's GitHub page.