Commit 720a2a4

Daily scrape data update
1 parent db99dfb commit 720a2a4

3 files changed

Lines changed: 200 additions & 0 deletions

data/hn_nontech_2026-04-27.json

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
{
"scraped_date": "2026-04-27",
"source": "hacker_news",
"total_scraped": 30,
"nontech_count": 1,
"posts": [
{
"id": "47899844",
"title": "Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git)",
"link": "https://github.com/nex-crm/wuphf",
"domain": "github.com",
"author": "najmuzzaman",
"score": 252,
"comment_count": 112,
"created_ts": 1777107233,
"is_internal": false,
"post_text": "I shipped a wiki layer for AI agents that uses markdown + git as the source of truth, with a bleve (BM25) + SQLite index on top. No vector or graph db yet.<p>It runs locally in ~&#x2F;.wuphf&#x2F;wiki&#x2F; and you can git clone it out if you want to take your knowledge with you.<p>The shape is the one Karpathy has been circling for a while: an LLM-native knowledge substrate that agents both read from and write into, so context compounds across sessions rather than getting re-pasted every morning. Most implementations of that idea land on Postgres, pgvector, Neo4j, Kafka, and a dashboard.<p>I wanted to go back to the basics and see how far markdown + git could go before I added anything heavier.<p>What it does:\n-&gt; Each agent gets a private notebook at agents&#x2F;{slug}&#x2F;notebook&#x2F;.md, plus access to a shared team wiki at team&#x2F;.<p>-&gt; Draft-to-wiki promotion flow. Notebook entries are reviewed (agent or human) and promoted to the canonical wiki with a back-link. A small state machine drives expiry and auto-archive.<p>-&gt; Per-entity fact log: append-only JSONL at team&#x2F;entities&#x2F;{kind}-{slug}.facts.jsonl. A synthesis worker rebuilds the entity brief every N facts. Commits land under a distinct &quot;Pam the Archivist&quot; git identity so provenance is visible in git log.<p>-&gt; [[Wikilinks]] with broken-link detection rendered in red.<p>-&gt; Daily lint cron for contradictions, stale entries, and broken wikilinks.<p>-&gt; &#x2F;lookup slash command plus an MCP tool for cited retrieval. A heuristic classifier routes short lookups to BM25 and narrative queries to a cited-answer loop.<p>Substrate choices:\nMarkdown for durability. The wiki outlives the runtime, and a user can walk away with every byte. Bleve for BM25. SQLite for structured metadata (facts, entities, edges, redirects, and supersedes). No vectors yet. 
The current benchmark (500 artifacts, 50 queries) clears 85% recall@20 on BM25 alone, which is the internal ship gate. sqlite-vec is the pre-committed fallback if a query class drops below that.<p>Canonical IDs are first-class. Fact IDs are deterministic and include sentence offset. Canonical slugs are assigned once, merged via redirect stubs, and never renamed. A rebuild is logically identical, not byte-identical.<p>Known limits:\n-&gt; Recall tuning is ongoing. 85% on the benchmark is not a universal guarantee.<p>-&gt; Synthesis quality is bounded by agent observation quality. Garbage facts in, garbage briefs out. The lint pass helps. It is not a judgment engine.<p>-&gt; Single-office scope today. No cross-office federation.<p>Demo. 5-minute terminal walkthrough that records five facts, fires synthesis, shells out to the user&#x27;s LLM CLI, and commits the result under Pam&#x27;s identity: <a href=\"https:&#x2F;&#x2F;asciinema.org&#x2F;a&#x2F;vUvjJsB5vtUQQ4Eb\" rel=\"nofollow\">https:&#x2F;&#x2F;asciinema.org&#x2F;a&#x2F;vUvjJsB5vtUQQ4Eb</a><p>Script lives at .&#x2F;scripts&#x2F;demo-entity-synthesis.sh.<p>Context. The wiki ships as part of WUPHF, an open source collaborative office for AI agents like Claude Code, Codex, OpenClaw, and local LLMs via OpenCode. MIT, self-hosted, bring-your-own keys. You do not have to use the full office to use the wiki layer. If you already have an agent setup, point WUPHF at it and the wiki attaches.<p>Source: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;nex-crm&#x2F;wuphf\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;nex-crm&#x2F;wuphf</a><p>Install: npx wuphf@latest<p>Happy to go deep on the substrate tradeoffs, the promotion-flow state machine, the BM25-first retrieval bet, or the canonical-ID stability rules. Also happy to take &quot;why not an Obsidian vault with a plugin&quot; as a fair question.",
"is_ask_hn": false,
"matched_keywords": [
"promotion",
"team"
],
"comments": []
}
]
}
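
The `post_text` field above stores the post body with HTML entities (`&#x2F;`, `&gt;`, `&#x27;`, `&quot;`) and `<p>` paragraph markers still in place. A minimal sketch of turning such a string into plain text follows, assuming only those constructs and simple `<a>` links appear; the function name `render_post_text` is hypothetical and is not part of the scraper in this commit.

```python
import html
import re


def render_post_text(raw: str) -> str:
    """Convert a scraped post_text string into plain text (illustrative helper)."""
    # Treat each <p> marker as a paragraph break.
    text = raw.replace("<p>", "\n\n")
    # Drop opening and closing <a> tags but keep the visible link text.
    text = re.sub(r"</?a\b[^>]*>", "", text)
    # Decode entities such as &#x2F; (/), &gt; (>), &#x27; (') and &quot; (").
    return html.unescape(text)


if __name__ == "__main__":
    sample = "It runs locally in ~&#x2F;.wuphf&#x2F;wiki&#x2F;<p>-&gt; Daily lint cron"
    print(render_post_text(sample))
```

Tags are stripped before entities are decoded so that a decoded `>` in the text cannot be mistaken for part of a tag.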

data/newsletters_2026-04-27.json

Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,163 @@
1+
{
2+
"scraped_date": "2026-04-27",
3+
"source": "newsletters",
4+
"feeds_checked": [
5+
"simon_willison",
6+
"sean_goedecke",
7+
"rachel_by_the_bay",
8+
"mitchell_hashimoto",
9+
"matklad",
10+
"hillel_wayne",
11+
"paul_graham",
12+
"experimental_history",
13+
"anil_dash",
14+
"pragmatic_engineer",
15+
"leaddev",
16+
"staffeng",
17+
"engineering_managers",
18+
"software_lead_weekly",
19+
"steve_blank"
20+
],
21+
"total_articles": 255,
22+
"relevant_count": 8,
23+
"articles": [
24+
{
25+
"feed_id": "sean_goedecke",
26+
"feed_name": "Sean Goedecke",
27+
"title": "Luddites and burning down AI datacenters",
28+
"summary": "Is it time to start burning down datacenters? Some people think so. An Indianapolis city council member had his house recently shot up for supporting datacenters, and Sam Altman’s home was firebombed (and then shot) shortly afterwards. People from all sides of the argument are sounding the alarm about imminent violence. The obvious historical comparison is Luddism, the 19th-century phenomenon where English weavers and knitters destroyed the machines that were automating their work, and (in some ",
"link": "https://seangoedecke.com/luddites-and-ai-datacenters/",
"published": "2026-04-22T00:00:00+00:00",
"matched_keywords": [
"conflict",
"leadership"
],
"relevance_score": 4,
"feed_focus": [
"large tech orgs",
"career growth",
"engineering culture"
]
},
{
"feed_id": "leaddev",
"feed_name": "LeadDev",
"title": "The reality of being a staff engineer",
"summary": "It’s often like herding cats. The post The reality of being a staff engineer appeared first on LeadDev.",
"link": "https://leaddev.com/career-development/the-reality-of-being-a-staff-engineer?utm_source=leaddev&utm_medium=RSS",
"published": "2026-04-24T07:51:21+00:00",
"matched_keywords": [
"staff engineer"
],
"relevance_score": 3,
"feed_focus": [
"engineering management",
"org maturity",
"leadership"
]
},
{
"feed_id": "simon_willison",
"feed_name": "Simon Willison's Weblog",
"title": "Serving the For You feed",
"summary": "Serving the For You feed One of Bluesky's most interesting features is that anyone can run their own custom \"feed\" implementation and make it available to other users - effectively enabling custom algorithms that can use any mechanism they like to recommend posts. spacecowboy runs the For You Feed, used by around 72,000 people. This guest post on the AT Protocol blog explains how it works. The architecture is fascinating. The feed is served by a single Go process using SQLite on a \"gaming\" PC in",
"link": "https://simonwillison.net/2026/Apr/24/serving-the-for-you-feed/#atom-everything",
"published": "2026-04-24T01:08:17+00:00",
"matched_keywords": [
"architecture",
"platform"
],
"relevance_score": 2,
"feed_focus": [
"ai leverage",
"engineering strategy",
"platform thinking"
]
},
{
"feed_id": "simon_willison",
"feed_name": "Simon Willison's Weblog",
"title": "An update on recent Claude Code quality reports",
"summary": "An update on recent Claude Code quality reports It turns out the high volume of complaints that Claude Code was providing worse quality results over the past two months was grounded in real problems. The models themselves were not to blame, but three separate issues in the Claude Code harness caused complex but material problems which directly affected users. Anthropic's postmortem describes these in detail. This one in particular stood out to me: On March 26, we shipped a change to clear Claude",
"link": "https://simonwillison.net/2026/Apr/24/recent-claude-code-quality-reports/#atom-everything",
"published": "2026-04-24T01:31:25+00:00",
"matched_keywords": [
"postmortem"
],
"relevance_score": 1,
"feed_focus": [
"ai leverage",
"engineering strategy",
"platform thinking"
]
},
{
"feed_id": "simon_willison",
"feed_name": "Simon Willison's Weblog",
"title": "Is Claude Code going to cost $100/month? Probably not - it's all very confusing",
"summary": "Anthropic today quietly (as in silently, no announcement anywhere at all) updated their claude.com/pricing page (but not their Choosing a Claude plan page, which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, and it's already reverted): The Internet Archive copy from yesterday shows a checkbox there. Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max pl",
"link": "https://simonwillison.net/2026/Apr/22/claude-code-confusion/#atom-everything",
"published": "2026-04-22T02:07:34+00:00",
"matched_keywords": [
"feedback"
],
"relevance_score": 1,
"feed_focus": [
"ai leverage",
"engineering strategy",
"platform thinking"
]
},
{
"feed_id": "sean_goedecke",
"feed_name": "Sean Goedecke",
"title": "Software engineering may no longer be a lifetime career",
"summary": "I don’t think there’s compelling evidence that using AI makes you less intelligent overall1. However, it seems pretty obvious that using AI to perform a task means you don’t learn as much about performing that task. Some software engineers think this is a decisive argument against the use of AI. Their argument goes something like this: Using AI means you don’t learn as much from your work AI-users thus become less effective engineers over time, as their technical skills atrophy Therefore we shou",
"link": "https://seangoedecke.com/software-engineering-may-no-longer-be-a-lifetime-career/",
"published": "2026-04-24T00:00:00+00:00",
"matched_keywords": [
"impact"
],
"relevance_score": 1,
"feed_focus": [
"large tech orgs",
"career growth",
"engineering culture"
]
},
{
"feed_id": "pragmatic_engineer",
"feed_name": "The Pragmatic Engineer",
"title": "Learnings from conducting ~1,000 interviews at Amazon",
"summary": "Steve Huynh, formerly Principal Engineer at Amazon, shares observations from 10+ years of interviewing software engineers, and an excerpt from his new book, Technical Behavioral Interview",
"link": "https://newsletter.pragmaticengineer.com/p/learnings-from-conducting-1000-interviews",
"published": "2026-04-21T12:49:16+00:00",
"matched_keywords": [
"principal engineer"
],
"relevance_score": 1,
"feed_focus": [
"engineering leadership",
"career growth",
"org strategy"
]
},
{
"feed_id": "leaddev",
"feed_name": "LeadDev",
"title": "The end of the non-technical engineering manager",
"summary": "The rising bar for engineering managers. The post The end of the non-technical engineering manager appeared first on LeadDev.",
"link": "https://leaddev.com/career-development/the-end-of-the-non-technical-engineering-manager?utm_source=leaddev&utm_medium=RSS",
"published": "2026-04-20T11:44:21+00:00",
"matched_keywords": [
"engineering manager"
],
"relevance_score": 1,
"feed_focus": [
"engineering management",
"org maturity",
"leadership"
]
}
]
}
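
Each scrape file pairs a summary counter with a list (`relevant_count` with `articles` here, `nontech_count` with `posts` in the Hacker News file), and the articles above happen to be listed in descending `relevance_score` order. A small consistency check along those lines could look like the sketch below. The function `check_scrape` and the rule that the ordering must hold are assumptions made for illustration; nothing in this commit states that the scraper guarantees either property.

```python
import json


def check_scrape(doc: dict, items_key: str, count_key: str) -> list[str]:
    """Return a list of human-readable problems found in one scrape document."""
    problems: list[str] = []
    items = doc.get(items_key, [])

    # The summary counter should match the number of listed items.
    if doc.get(count_key) != len(items):
        problems.append(
            f"{count_key}={doc.get(count_key)} but {items_key} has {len(items)} entries"
        )

    # Assumed convention: items carrying a relevance_score are sorted high to low.
    scores = [item["relevance_score"] for item in items if "relevance_score" in item]
    if scores != sorted(scores, reverse=True):
        problems.append("relevance_score values are not in descending order")

    return problems


if __name__ == "__main__":
    # Tiny inline document shaped like the newsletters file, for demonstration only.
    doc = json.loads(
        '{"relevant_count": 2, "articles": '
        '[{"relevance_score": 4}, {"relevance_score": 1}]}'
    )
    print(check_scrape(doc, "articles", "relevant_count"))
```

For the Hacker News file the same function would be called with `"posts"` and `"nontech_count"`; the ordering check is skipped automatically there because those entries carry no `relevance_score`.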

data/reddit_2026-04-27.json

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
{
2+
"scraped_date": "2026-04-27",
3+
"source": "reddit",
4+
"subreddits": [
5+
"experienceddevs",
6+
"cscareerquestions",
7+
"managers"
8+
],
9+
"total_posts": 0,
10+
"posts": []
11+
}
