Hi PROSE team! I've been working on a DSL for web scraping that supports various tree and list operations for finding data in a DOM tree. To select and filter elements in my "node lists" I'm currently using the built-in functions Kth and Filter. My current DSL is:
```
@input ProseHtmlNode tree;
@start IReadOnlyList<ProseHtmlNode> program := rule;

IReadOnlyList<ProseHtmlNode> rule
    := Concat(rule, rule)
     | MatchNodes(match, nodes) = Filter(\x: ProseHtmlNode => match, nodes)
     | nodes;

IReadOnlyList<ProseHtmlNode> nodes
    := Children(subTree)
     | Descendants(subTree)
     | Single(subTree);

ProseHtmlNode subTree
    := tree
     | SelectChild(rule, k) = Kth(rule, k);

bool match
    := MatchTag(x, tag)
     | MatchAttribute(x, attr)
     | True();
```
This grammar can generate programs like `Single(SelectChild(Descendants(tree), 1))` and `MatchNodes(MatchTag(x, "div"), Descendants(tree))`, but synthesis fails when the program needs to select the Kth element of a filtered list. For a task like "give the first node with the tag 'div'", I'm hoping to generate something like:
```
Single(SelectChild(MatchNodes(MatchTag(x, "div"), Descendants(tree)), 0))
```
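To pin down what that target program should evaluate to, here is a minimal Python model of the DSL's semantics. It is a sketch under assumptions: `Node` is a hypothetical stand-in for `ProseHtmlNode`, `filter_nodes` and `kth` only mimic the built-ins Filter and Kth, and `descendants` is assumed to return a pre-order list that excludes the starting node (the actual implementation in the repo may differ).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for ProseHtmlNode: a tag name plus child nodes."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def children(node: Node) -> List[Node]:
    return list(node.children)


def descendants(node: Node) -> List[Node]:
    """Pre-order list of all nodes below `node` (assumed to exclude `node`)."""
    out: List[Node] = []
    for child in node.children:
        out.append(child)
        out.extend(descendants(child))
    return out


def single(node: Node) -> List[Node]:
    return [node]


def filter_nodes(pred: Callable[[Node], bool], nodes: List[Node]) -> List[Node]:
    """Plays the role of the built-in Filter."""
    return [x for x in nodes if pred(x)]


def kth(nodes: List[Node], k: int) -> Node:
    """Plays the role of the built-in Kth."""
    return nodes[k]


def match_tag(x: Node, tag: str) -> bool:
    return x.tag == tag


def first_div(tree: Node) -> List[Node]:
    # Single(SelectChild(MatchNodes(MatchTag(x, "div"), Descendants(tree)), 0))
    divs = filter_nodes(lambda x: match_tag(x, "div"), descendants(tree))
    return single(kth(divs, 0))


# A small example tree: html > [exclude, body > [div > [div], div]]
tree = Node("html", [
    Node("exclude"),
    Node("body", [Node("div", [Node("div")]), Node("div")]),
])
result = first_div(tree)
print([n.tag for n in descendants(tree)])  # ['exclude', 'body', 'div', 'div', 'div']
print(result[0] is tree.children[1].children[0])  # True
```

The point of the model is that the Filter output contains every "div" node, and only Kth narrows it down to the first one.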
I don't currently have witness functions for either SelectChild or MatchNodes, because I wasn't sure whether I needed to write them for operators backed by built-in functions (there was some mention that the framework can optimize the built-ins more aggressively, and I didn't want to interfere with that). Synthesis does get close: the witness function for MatchTag is called, but the examples it receives can never be satisfied by the predicate. For example, when I search for the first "div" node, Filter is clearly filtering the correct list (the one that contains that "div" node), but the MatchTag witness receives examples like:
```
("div", true),
("exclude", false),
("div", false),
("div", false)
```
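The mismatch can be reproduced outside PROSE. The Python sketch below uses hypothetical helper names, models only the node tags, and does not use the PROSE API. It assumes that the spec reaching Filter requires its output to be exactly the one target node. Under that assumption, every other "div" becomes a negative example, so no tag-equality predicate fits. If the spec instead requires all the "div" nodes to be kept, "div" becomes learnable.

```python
from typing import List, Optional, Tuple

Example = Tuple[str, bool]  # (node tag, should the predicate accept it?)


def exact_filter_examples(tags: List[str], kept: List[int]) -> List[Example]:
    """Predicate examples implied by requiring Filter to return exactly the
    nodes at indices `kept` of its input list."""
    return [(tag, i in kept) for i, tag in enumerate(tags)]


def learn_tag(examples: List[Example]) -> Optional[str]:
    """Return a tag t such that (tag == t) agrees with every example, else None."""
    candidates = {tag for tag, want in examples if want}
    for t in sorted(candidates):
        if all((tag == t) == want for tag, want in examples):
            return t
    return None


tags = ["div", "exclude", "div", "div"]
# Filter's output must be exactly [first div]:
print(exact_filter_examples(tags, [0]))
# [('div', True), ('exclude', False), ('div', False), ('div', False)]
print(learn_tag(exact_filter_examples(tags, [0])))  # None
# Filter's output must be exactly all the divs:
print(learn_tag(exact_filter_examples(tags, [0, 2, 3])))  # div
```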
Given how MatchTag is implemented, no single tag satisfies all of these examples. For this to work, it seems MatchTag would need to receive the examples:
```
("div", true),
("exclude", false),
("div", true),
("div", true)
```
and SelectChild would then select only the first element of the filtered list. I'm not sure what I need to change to make that happen.
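One possible direction is to pass a weaker constraint down through SelectChild than an exact output list. For `Kth(Filter(p, L), k)` to return `L[target]`, `p` only has to accept the target and exactly `k` of the elements before it. Elements after the target are unconstrained. The Python sketch below models this with "don't care" entries. All names are hypothetical: it illustrates the deduction only, not PROSE's witness-function API or its spec classes. For `k = 0`, the later "div" nodes become don't-cares instead of negatives, and that is enough to learn "div".

```python
from itertools import combinations
from typing import List, Optional, Tuple

# (node tag, desired predicate result); None means "don't care".
Example = Tuple[str, Optional[bool]]


def kth_of_filter_examples(tags: List[str], target: int, k: int) -> List[List[Example]]:
    """Alternative predicate specs implied by Kth(Filter(p, L), k) == L[target].

    p must accept L[target] and exactly k earlier elements; elements after the
    target are unconstrained. Each alternative picks which k earlier elements
    p accepts.
    """
    alternatives: List[List[Example]] = []
    for chosen in combinations(range(target), k):
        spec: List[Example] = []
        for i, tag in enumerate(tags):
            if i < target:
                spec.append((tag, i in chosen))
            elif i == target:
                spec.append((tag, True))
            else:
                spec.append((tag, None))
        alternatives.append(spec)
    return alternatives


def learn_tag_partial(spec: List[Example]) -> Optional[str]:
    """Find t such that (tag == t) agrees with every non-None example."""
    candidates = {tag for tag, want in spec if want}
    for t in sorted(candidates):
        if all(want is None or (tag == t) == want for tag, want in spec):
            return t
    return None


def learn_select_kth(tags: List[str], target: int, k: int) -> Optional[str]:
    """Tag t such that Kth(Filter(MatchTag(x, t), L), k) is L[target], if any."""
    for spec in kth_of_filter_examples(tags, target, k):
        t = learn_tag_partial(spec)
        if t is not None:
            return t
    return None


tags = ["div", "exclude", "div", "div"]
print(kth_of_filter_examples(tags, 0, 0))
# [[('div', True), ('exclude', None), ('div', None), ('div', None)]]
print(learn_select_kth(tags, 0, 0))  # div
print(learn_select_kth(tags, 2, 1))  # div
print(learn_select_kth(tags, 2, 0))  # None
```

This constraint is weaker than the all-true examples above. It also admits predicates that drop later "div" nodes, which is harmless because SelectChild only looks at position `k`. The number of alternatives grows as C(target, k), so a practical witness would probably need a more compact encoding when `k` is large.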
The tree synthesis portion of the repo can be found here if you'd like to take a look: https://github.com/zachwood0s/dom_program_synthesis/tree/main/ProseTutorial/tree_synthesis
Any help or suggestions on how to get this interaction to work would be greatly appreciated!