Feature Request
Currently markgrab does not check robots.txt before fetching. Add an optional respect_robots=True parameter that:
- Fetches and parses robots.txt for the target domain
- Checks if the URL path is allowed for the configured user agent
- Raises RobotsDisallowed or silently skips if disallowed
This should be opt-in (default False) to maintain backward compatibility.
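As a rough illustration of the behavior requested above, here is a minimal sketch built on the standard library's urllib.robotparser. It is not markgrab's actual API: the names check_robots, fetch_robots_txt, raise_on_disallow, and fetcher are hypothetical, and only respect_robots and RobotsDisallowed come from this request.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


class RobotsDisallowed(Exception):
    """Raised when robots.txt disallows the URL (name taken from this request)."""


def fetch_robots_txt(url: str) -> str:
    """Hypothetical helper: download robots.txt for the URL's scheme and host."""
    parts = urlsplit(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    with urlopen(robots_url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_robots(url, user_agent, respect_robots=False,
                 raise_on_disallow=True, fetcher=fetch_robots_txt):
    """Return True if the URL may be fetched, False if it should be skipped.

    With respect_robots=False (the default) nothing is fetched or checked,
    which keeps the existing behavior unchanged.
    """
    if not respect_robots:
        return True

    # Parse the robots.txt body and ask whether this agent may fetch the URL.
    parser = RobotFileParser()
    parser.parse(fetcher(url).splitlines())
    if parser.can_fetch(user_agent, url):
        return True

    # Disallowed: either raise or signal a silent skip, as the request describes.
    if raise_on_disallow:
        raise RobotsDisallowed(f"{url} is disallowed for {user_agent!r}")
    return False
```

The fetcher argument exists only so the check can be exercised without network access. Error handling for a missing or unreachable robots.txt and per-domain caching are left out of this sketch and would need a decision in a real implementation.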
Motivation
Legal compliance for production deployments. Respecting robots.txt is currently documented in the Disclaimer but not enforced.