Using Filter with Kth #61

@zachwood0s

Hi PROSE team! I've been working on creating a DSL for web scraping and it currently supports various tree/list operations to find data in a DOM tree. For selecting/filtering elements in my "node lists" I'm currently using the built-in functions Kth and Filter. My current DSL is shown below:

@input ProseHtmlNode tree;

@start IReadOnlyList<ProseHtmlNode> program := rule;

IReadOnlyList<ProseHtmlNode> rule
	:= Concat(rule, rule)
	 | MatchNodes(match, nodes) = Filter(\x: ProseHtmlNode => match, nodes)
	 | nodes;

IReadOnlyList<ProseHtmlNode> nodes
	:= Children(subTree)
	 | Descendants(subTree)
	 | Single(subTree);

ProseHtmlNode subTree
	:= tree
	 | SelectChild(rule, k) = Kth(rule, k);

bool match
	:= MatchTag(x, tag)
	 | MatchAttribute(x, attr)
	 | True();

This can generate programs like Single(SelectChild(Descendants(tree), 1)) and MatchNodes(MatchTag(x, "div"), Descendants(tree)), but synthesis fails when it needs to select the Kth element of a filtered list. For a task like "give the first node with the tag 'div'", I'm hoping to generate something like:
Single(SelectChild(MatchNodes(MatchTag(x, "div"), Descendants(tree)), 0))
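
To make the intended semantics of that program concrete, here is a minimal Python model of the operators. This is only a sketch and not PROSE code: the Node class and the lowercase function names are hypothetical stand-ins for the DSL operators, and it assumes Descendants returns the nodes strictly below the tree in document order.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for ProseHtmlNode: a tag plus child nodes."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def descendants(tree: Node) -> List[Node]:
    """Nodes strictly below `tree`, in pre-order (assumed semantics)."""
    out: List[Node] = []
    for child in tree.children:
        out.append(child)
        out.extend(descendants(child))
    return out


def match_tag(tag: str) -> Callable[[Node], bool]:
    """Model of MatchTag(x, tag) as a predicate over a node x."""
    return lambda x: x.tag == tag


def match_nodes(pred: Callable[[Node], bool], nodes: List[Node]) -> List[Node]:
    """Model of MatchNodes, i.e. Filter(\\x => match, nodes)."""
    return [n for n in nodes if pred(n)]


def select_child(nodes: List[Node], k: int) -> Node:
    """Model of SelectChild, i.e. Kth(rule, k), for k >= 0."""
    return nodes[k]


def single(node: Node) -> List[Node]:
    """Model of Single: wrap one node in a list."""
    return [node]


first_div = Node("div")
tree = Node("html", [Node("body", [first_div, Node("exclude"), Node("div"), Node("div")])])

# Single(SelectChild(MatchNodes(MatchTag(x, "div"), Descendants(tree)), 0))
result = single(select_child(match_nodes(match_tag("div"), descendants(tree)), 0))
```

Here the filtered list contains all three "div" nodes, and it is the index 0 passed to select_child, not the filter, that narrows the result to the first one.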

I currently don't have witness functions for either SelectChild or MatchNodes, as I wasn't sure whether I needed to write them for built-in functions (there was some mention that the framework can optimize the built-ins more aggressively, so I didn't want to interfere with that). I have noticed that synthesis gets close when it calls the witness function for MatchTag, but the incoming examples are not satisfiable by the predicate. For example, when searching for the first "div" node, Filter is clearly attempting to filter the correct list, the one that contains that "div" node, but MatchTag gets examples like:

("div", true),
("exclude", false),
("div", false),
("div", false)

With the way I have MatchTag implemented, no single tag satisfies all of these examples. It seems like for this to work, MatchTag would need to receive the examples:

("div", true),
("exclude", false),
("div", true),
("div", true)

and have SelectChild select only the first of the filtered list, but I'm not sure what I need to change to make that happen.
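
One way to reason about which predicate examples are actually required is to work backwards from Kth over Filter. The sketch below is plain Python, not the PROSE witness function API, and predicate_labelings is a hypothetical helper. It assumes k >= 0 and that Filter preserves order: the target element must map to true, exactly k of the elements before it must map to true, and the elements after it are unconstrained (None).

```python
from itertools import combinations
from typing import Iterator, List, Optional


def predicate_labelings(
    n_items: int, target: int, k: int
) -> Iterator[List[Optional[bool]]]:
    """Yield every labeling of an n_items-long input list under which
    Kth(Filter(pred, items), k) returns items[target].

    True/False is a required predicate output; None means "don't care".
    """
    if k < 0 or not 0 <= target < n_items:
        raise ValueError("expected k >= 0 and 0 <= target < n_items")
    # Choose which k of the elements before the target survive the filter.
    for kept in combinations(range(target), k):
        labels: List[Optional[bool]] = [None] * n_items
        for i in range(target):
            labels[i] = i in kept
        labels[target] = True
        yield labels


tags = ["div", "exclude", "div", "div"]

# Target is the first element and k = 0: only the target is constrained.
first = list(predicate_labelings(len(tags), target=0, k=0))
# first == [[True, None, None, None]]
```

Under these assumptions, the all-true labeling for the "div" entries is one of the admissible choices, while forcing the later "div" entries to false is stricter than selecting the first element requires.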

The tree synthesis portion of the repo can be found here if you'd like to take a look: https://github.com/zachwood0s/dom_program_synthesis/tree/main/ProseTutorial/tree_synthesis

Any help or suggestions on how to get this interaction to work would be greatly appreciated!
