fix: word_count, paths, and __init__.py for importability by ljluestc · Pull Request #35 · wdndev/llm_interview_note

ljluestc · 2026-03-08T04:57:36Z

Summary

Bug fixes for RAG ingestion and path handling.

Changes

scripts/convert_md_to_rag.py: Fix word_count to use len(content.split()) (words, not chars)
scripts/ingest_devops_interview.py: Replace hardcoded path with Path(__file__).resolve().parent and _PROJECT_ROOT fallback
Add rag_system/__init__.py and scripts/__init__.py for importability
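
For illustration, here is a minimal sketch of why the `word_count` change matters. The function body mirrors the one-line fix named above; the sample string and the standalone function are assumptions for the example, not code taken from `scripts/convert_md_to_rag.py`.

```python
def word_count(content: str) -> int:
    """Count whitespace-separated tokens rather than characters."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings.
    return len(content.split())


# Hypothetical sample text, including a double space and a newline.
sample = "RAG ingestion  splits text\ninto chunks"

char_total = len(sample)          # what the old code effectively reported
word_total = word_count(sample)   # what the fixed code reports

print(f"characters={char_total} words={word_total}")
```

Note that `str.split()` tokenizes on whitespace only, so unsegmented Chinese text yields far fewer tokens than it has words. The PR does not say whether that matters for this dataset.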

Made with Cursor

Update llama系列模型.md

fix the mistake of latex grammer

- Add RAG dataset schema documentation (data/RAG_SCHEMA.md)
- Convert 82 markdown documents to RAG-ready JSONL format
- Generate 10 Q&A pairs from interview content
- Implement complete RAG engine with:
  * Multilingual embedding generation (sentence-transformers)
  * Vector similarity search (FAISS + numpy fallback)
  * Semantic reranking for improved relevance
  * Answer generation with source attribution
- Add conversion script (scripts/convert_md_to_rag.py)
- Add comprehensive documentation (data/README.md)
- Add requirements.txt for dependencies

Dataset Statistics:
- 82 documents across 11 categories
- 10 Q&A pairs extracted
- Full metadata with keywords, difficulty, URLs

Usage:
python scripts/convert_md_to_rag.py  # Convert markdown
python rag_system/rag_engine.py      # Run RAG demo

Co-Authored-By: Oz <oz-agent@warp.dev>
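
The commit message above mentions vector similarity search with a non-FAISS fallback. The engine's code is not shown on this page, so the following is only a rough sketch of brute-force cosine-similarity retrieval. It uses plain Python lists instead of numpy to stay dependency-free, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    doc_vectors: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar vectors."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real sentence-transformers vectors have hundreds of dims.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([1.0, 0.0], docs, k=2)
print(results)
```

An exhaustive scan like this is linear in the number of documents, which is reasonable for a corpus of a few hundred entries. An index such as FAISS becomes worthwhile at larger scale.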


- Add scripts/ingest_arxiv_papers.py for ingesting arXiv papers
- Update all_documents.jsonl and all_qa_pairs.jsonl with new content

Made-with: Cursor

Sources:
- devops_rag_answered_full_405.json (405 zh-CN, 24 categories)
- devops_quiz_10_en.json (10 en, merged into zh records)
- jenkins_beginner_24_33_en.json (10 en)
- jenkins_advanced_34_40_en.json (7 en)

RAG totals: 517 documents, 451 QA pairs

Co-Authored-By: Oz <oz-agent@warp.dev>

Made-with: Cursor

…init__.py

- convert_md_to_rag.py: word_count uses len(content.split()) instead of len(content)
- ingest_devops_interview.py: replace hardcoded /home path with relative path + env var
- Add __init__.py for rag_system/ and scripts/ to support module imports

Co-Authored-By: Oz <oz-agent@warp.dev>

Made-with: Cursor

lerogo and others added 30 commits April 22, 2024 14:29

Update llama系列模型.md

c257c73

Merge pull request wdndev#3 from lerogo/patch-1

5171252

Update llama系列模型.md

online

c1793ca

test

dd345a8

online llm

9fbd0b0

update

0ce6a11

readme

7376004

ch7

8ed25b8

content

8559a37

index

f7f9d81

index

5cfdc9e

sub sidebar

4381052

sub max level

4fca0e9

update

f9658e8

navbar

12dbc41

online llm

994c953

llm course

0f61f34

llm course

39f9e5f

llm

48134fc

crossChapter

d962ae3

index

53e9e88

update

b7ce85c

prefix lm

c756721

1.2

bdd2ff4

1.2

167c72c

Update README.md

72c7714

readme

b206869

补充训练数据集关于数据增强的方法

4aad111

ai note

2bc44c5

readme

407b1c0

ljluestc and others added 30 commits March 7, 2026 10:26

llm

eb068bd

crossChapter

50249d4

index

c253952

update

2ccc556

prefix lm

e9cfe56

1.2

423a14f

1.2

e5d48aa

Update README.md

d64aa55

readme

a7852e5

补充训练数据集关于数据增强的方法

dfd518c

ai note

e04e8b1

readme

8d4ff91

readme

9b01907

Update 2.layer_normalization.md

5602f66

fix the mistake of latex grammer

Update 2.layer_normalization.md

b13e0cd

fix grammer mistake in 2.layer_normalization.md

e2c929b

fix 2.layer_normalization.md

6b01ffb

Update 2.layer_normalization.md

065651b

Update 2.layer_normalization.md

a1757c6

fix grammer mistake in 2.layer_normalization.md

48df69d

Update llm推理优化技术.md

73cc876

docs: Add RAG implementation summary

bd443d2

Co-Authored-By: Oz <oz-agent@warp.dev>

Merge branch 'add-rag-system'

aa67e08

Merge branch 'main' of github.com:ljluestc/llm_interview_note

db1027a

feat: add arXiv papers ingestion script and update processed data

1ecac23

- Add scripts/ingest_arxiv_papers.py for ingesting arXiv papers
- Update all_documents.jsonl and all_qa_pairs.jsonl with new content

Made-with: Cursor

docs: add fork note and RAG/ljluestc tiny-mcp links

62d6e66

Made-with: Cursor

fix: use Path for RAG_DIR and robust DEVOPS_DATA fallback

aa2b3c7

Made-with: Cursor

ljluestc wants to merge 127 commits into wdndev:main from ljluestc:main


6 participants
