I ran into this while viewing a VNC session rendered inside a canvas element.
Because the entire VNC display is drawn into a single canvas element, this MCP cannot click the individual UI elements that are visible inside the canvas. The same problem occurs when an image tag has <area> tags (an image map) defined for it: there is no way to click at a specific location within the image.
Since browser_click is element-based, could we trigger a mouse click at specific coordinates within an element?
I tried VNC-mcp, which does support coordinate-based clicks, but it did not handle scaling of the displayed image correctly.
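For reference, here is a minimal sketch of the scaling step a coordinate-based click would need. It assumes the target point is given in the canvas's intrinsic pixel space (for example VNC framebuffer coordinates, matching canvas.width/canvas.height) and that the canvas bitmap is stretched to fill its on-screen box with no border, padding, or letterboxing. The names toViewportPoint, Rect, and IntrinsicSize are hypothetical, not part of any existing MCP tool.

```typescript
// Hypothetical sketch: map a point in the canvas's intrinsic pixel space
// (e.g. VNC framebuffer coordinates) to viewport coordinates.
// Assumption: the bitmap fills the element's box exactly (no border,
// padding, or object-fit letterboxing), so scaling is a plain ratio.

interface Rect {
  x: number;      // left edge in viewport CSS pixels
  y: number;      // top edge in viewport CSS pixels
  width: number;  // rendered width in CSS pixels
  height: number; // rendered height in CSS pixels
}

/** Unscaled bitmap size, e.g. canvas.width / canvas.height. */
interface IntrinsicSize {
  width: number;
  height: number;
}

function toViewportPoint(
  box: Rect,
  intrinsic: IntrinsicSize,
  px: number, // x in intrinsic pixels
  py: number, // y in intrinsic pixels
): { x: number; y: number } {
  if (intrinsic.width <= 0 || intrinsic.height <= 0) {
    throw new Error("intrinsic size must be positive");
  }
  // Ratio between rendered size and bitmap size on each axis.
  const scaleX = box.width / intrinsic.width;
  const scaleY = box.height / intrinsic.height;
  // Offset from the element's top-left corner, then shift into viewport space.
  return { x: box.x + px * scaleX, y: box.y + py * scaleY };
}
```

In a Playwright-based server, the box could come from locator.boundingBox() and the result could be passed to page.mouse.click(x, y). Alternatively, the scaled offset alone (without adding box.x and box.y) could be passed as the position option of locator.click, which is relative to the element's top-left corner.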
Similarly, text cannot be typed into the canvas element directly, so the AI agent falls back to individual press_key operations, one per character, which is very slow.
I am not sure whether this can be optimized, but perhaps the MCP server could accept the whole string in a single call and perform the press_key operations internally.
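A rough sketch of that idea, assuming a hypothetical server-side helper: the agent sends the full string once, and the server expands it into ordered key presses. KeyPresser, toKeySequence, and typeAsKeyPresses are illustrative names; the only assumption about the underlying driver is that it exposes an async press(key) method, as Playwright's Keyboard does.

```typescript
// Hypothetical sketch: one tool call carries the whole string, and the
// server issues the individual key presses itself instead of the agent
// making one press_key call per character.

/** Minimal driver interface; Playwright's Keyboard has a compatible press(). */
interface KeyPresser {
  press(key: string): Promise<void>;
}

// Control characters that should be sent as named keys.
const SPECIAL_KEYS: Record<string, string> = {
  "\n": "Enter",
  "\t": "Tab",
};

/** Split text into key names, iterating by code point (keeps emoji intact). */
function toKeySequence(text: string): string[] {
  return Array.from(text, (ch) => SPECIAL_KEYS[ch] ?? ch);
}

/** Press each key in order; awaiting each press preserves event ordering. */
async function typeAsKeyPresses(keyboard: KeyPresser, text: string): Promise<void> {
  for (const key of toKeySequence(text)) {
    await keyboard.press(key);
  }
}
```

The main saving here comes from removing the agent round-trip per character, not from the presses themselves. If the server is built on Playwright, its existing keyboard.type(text), which sends keydown, keypress/input, and keyup events for each character, may already cover this case.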