Add TechRadar publisher by maelrx · Pull Request #925 · flairNLP/fundus

maelrx · 2026-05-21T07:40:22Z

Summary

Add a UK TechRadar publisher parser.
Register TechRadar with sitemap and news sitemap sources plus a URL filter for noisy paths.
Extract article body, title, authors, publishing date, topics, and images.
Add generated parser test data and update the supported publishers table.

Validation

python -m pytest -q -W ignore::DeprecationWarning -> 795 passed
ruff format --check src -> 231 files already formatted
ruff check src -> All checks passed!
mypy src/fundus/publishers/uk/techradar.py src/fundus/publishers/uk/__init__.py -> Success: no issues found in 2 source files

Note: full local mypy src was also checked on Python 3.12 and reports three existing errors outside this patch (src/fundus/parser/utility.py and src/fundus/publishers/kr/hankook_ilbo.py). I left those unrelated files untouched.

addie9800

Thank you so much for contributing to Fundus! 🚀 This PR already looks really good. I only have a couple of minor changes to request, and then we can go ahead and merge this PR.

addie9800 · 2026-05-21T13:25:15Z

+        domain="https://www.techradar.com/",
+        parser=TechRadarParser,
+        sources=[
+            Sitemap("https://www.techradar.com/sitemap.xml", reverse=True),


I would add a filter so that only sitemaps of the format https://www.techradar.com/sitemap-yyyy-mm.xml are crawled. (You can use the sitemap_filter attribute and the regex_filter class.) This reduces the load on the website, since the remaining sitemaps only contain pages that Fundus cannot parse anyway.
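To make the suggested filter concrete, here is a minimal, self-contained sketch of the matching logic using only Python's re module. The pattern and the function name should_skip_sitemap are illustrative assumptions, not code from this PR. The sketch also assumes that a filter returning True means "skip this sitemap"; if that holds for Fundus, the regex_filter would need to be inverted before being passed as sitemap_filter.

```python
import re

# Hypothetical pattern: accepts only monthly sitemaps such as sitemap-2026-05.xml.
MONTHLY_SITEMAP = re.compile(r"^https://www\.techradar\.com/sitemap-\d{4}-\d{2}\.xml$")


def should_skip_sitemap(url: str) -> bool:
    """Return True when the sitemap URL should be skipped (assumed filter semantics)."""
    return MONTHLY_SITEMAP.match(url) is None


if __name__ == "__main__":
    for candidate in (
        "https://www.techradar.com/sitemap-2026-05.xml",
        "https://www.techradar.com/sitemap.xml",
    ):
        print(candidate, "skip" if should_skip_sitemap(candidate) else "crawl")
```

Anchoring the pattern at both ends keeps look-alike paths (for example a longer file name that merely contains the date) from slipping through.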

addie9800 · 2026-05-21T13:29:14Z

+            "Those results, though, will likely be below the AI Overviews that already sit atop those classic results. If anything, Overviews may be even richer and more accurate thanks to the intelligent query guidance you received in the search box. Scrolling down below them might be pointless.",
+            "It doesn't take much imagination to envision a future in which the AI Overviews are your Google Search results, and there is nothing below because it's not as useful, or at least it doesn't \"speak\" to you in the same way the overviews do. They seem to get you because they're designed to respond to your intention in a way that traditional search results could never do.",
+            "For some, this is progress. For me? The jury's still out.",
+            "What about you? Share your thoughts on Google's new Intelligent Search Box in the comments below."


I would suggest also adding lines like this to the bloat regex.
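As a rough illustration of what such a bloat pattern could look like, the sketch below drops paragraphs that end in a reader call-to-action. It assumes the comment refers to the closing "Share your thoughts ... in the comments below." line; the pattern and the helper strip_bloat are hypothetical and stand apart from however the parser in this PR wires its bloat regex into the body extraction.

```python
import re
from typing import List

# Hypothetical bloat pattern: matches call-to-action sentences that close a paragraph.
BLOAT_PATTERN = re.compile(r"(?i)\bshare your thoughts\b.*\bin the comments below\.?\s*$")


def strip_bloat(paragraphs: List[str]) -> List[str]:
    """Keep only the paragraphs that do not match the bloat pattern."""
    return [p for p in paragraphs if not BLOAT_PATTERN.search(p)]


if __name__ == "__main__":
    body = [
        "For some, this is progress. For me? The jury's still out.",
        "What about you? Share your thoughts on Google's new Intelligent Search Box in the comments below.",
    ]
    print(strip_bloat(body))
```

Matching on the generic phrasing rather than the article-specific wording lets the same pattern cover similar sign-offs in other articles.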

addie9800 · 2026-05-21T13:33:31Z

+                upper_boundary_selector=XPath("//article"),
+                image_selector=XPath("//article//figure//img"),
+                caption_selector=XPath("./ancestor::figure//figcaption"),
+                author_selector=re.compile(r"(?i)image credit[s]?: (?P<credits>.*)"),


The credits sometimes seem to end in a /. Perhaps you can add an optional slash to the end of the non-capturing part of the regex so that it gets filtered out, e.g. here.
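One possible way to implement this suggestion is to make the credits group lazy and follow it with an optional trailing slash anchored at the end of the string. The sketch below is an assumption about how the pattern could be adjusted, not the change made in this PR; the sample captions are made up for illustration.

```python
import re
from typing import Optional

# The named group is lazy so that an optional trailing "/" (plus surrounding
# whitespace) is consumed outside the group instead of ending up in the credits.
CREDIT_PATTERN = re.compile(r"(?i)image credit[s]?: (?P<credits>.*?)\s*/?\s*$")


def extract_credits(caption: str) -> Optional[str]:
    """Return the credit text without a trailing slash, or None if there is no credit."""
    match = CREDIT_PATTERN.search(caption)
    return match.group("credits") if match else None


if __name__ == "__main__":
    # Hypothetical captions, not taken from TechRadar.
    print(extract_credits("Image credit: Future /"))
    print(extract_credits("Image credits: Getty Images / Future"))
```

Slashes inside the credit text are preserved; only a slash at the very end is stripped.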

addie9800 · 2026-05-21T13:36:05Z

+            return self.precomputed.ld.bf_search("headline")
+
+        @attribute
+        def topics(self) -> List[str]:


The list of topics at the end of an article sometimes seems to be more comprehensive than the ones used in the metadata, e.g. here. If you have a selector for the elements, you can pass the selected nodes into generic_nodes_to_text.
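The idea behind this suggestion, selecting the topic elements and converting them to a list of strings, can be sketched with the standard library alone. Everything specific here is an assumption: the markup, the ul/li/a structure, the "topics" class, and the helper topics_from_nodes are hypothetical stand-ins, and the real parser would use its own selector together with generic_nodes_to_text.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical markup: the real TechRadar topic list may use different tags and classes.
SNIPPET = """
<article>
  <ul class="topics">
    <li><a href="/tag/a">Google</a></li>
    <li><a href="/tag/b"> Search </a></li>
    <li><a href="/tag/c">AI</a></li>
  </ul>
</article>
"""


def topics_from_nodes(root: ET.Element) -> List[str]:
    """Select the topic links and return their stripped, non-empty text."""
    nodes = root.findall(".//ul[@class='topics']/li/a")
    texts = ("".join(node.itertext()).strip() for node in nodes)
    return [text for text in texts if text]


if __name__ == "__main__":
    print(topics_from_nodes(ET.fromstring(SNIPPET)))
```

If the on-page list is missing for some articles, falling back to the metadata topics would keep the attribute populated.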

Add TechRadar publisher

ad9c063

addie9800 self-assigned this May 21, 2026

addie9800 requested changes May 21, 2026

maelrx wants to merge 1 commit into flairNLP:master from maelrx:add-techradar-publisher


2 participants
