fix: word_count, paths, and __init__.py for importability#35

Open
ljluestc wants to merge 127 commits into wdndev:main from ljluestc:main

Conversation

@ljluestc ljluestc commented Mar 8, 2026

Summary

Bug fixes for RAG ingestion and path handling.

Changes

  • scripts/convert_md_to_rag.py: Fix word_count to use len(content.split()) (words, not chars)
  • scripts/ingest_devops_interview.py: Replace hardcoded path with Path(__file__).resolve().parent and _PROJECT_ROOT fallback
  • Add rag_system/__init__.py and scripts/__init__.py for importability
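
To illustrate the word_count change, here is a minimal sketch. The helper name `word_count` and the sample string are illustrative only and are not taken from scripts/convert_md_to_rag.py; only the expression `len(content.split())` comes from this PR.

```python
def word_count(content: str) -> int:
    """Count whitespace-separated tokens, i.e. len(content.split())."""
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading/trailing whitespace.
    return len(content.split())


if __name__ == "__main__":
    sample = "RAG pipelines retrieve, then generate."  # illustrative text
    print("chars:", len(sample))         # what len(content) measured before
    print("words:", word_count(sample))  # what the fix measures
```

One caveat with this approach: `split()` only breaks on whitespace, so text written without spaces between words (for example Chinese) is counted as one token per unbroken run.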

Made with Cursor

ljluestc and others added 30 commits March 7, 2026 10:26
fix the mistake of latex grammar
- Add RAG dataset schema documentation (data/RAG_SCHEMA.md)
- Convert 82 markdown documents to RAG-ready JSONL format
- Generate 10 Q&A pairs from interview content
- Implement complete RAG engine with:
  * Multilingual embedding generation (sentence-transformers)
  * Vector similarity search (FAISS + numpy fallback)
  * Semantic reranking for improved relevance
  * Answer generation with source attribution
- Add conversion script (scripts/convert_md_to_rag.py)
- Add comprehensive documentation (data/README.md)
- Add requirements.txt for dependencies

Dataset Statistics:
- 82 documents across 11 categories
- 10 Q&A pairs extracted
- Full metadata with keywords, difficulty, URLs

Usage:
  python scripts/convert_md_to_rag.py  # Convert markdown
  python rag_system/rag_engine.py     # Run RAG demo

Co-Authored-By: Oz <oz-agent@warp.dev>
- Add scripts/ingest_arxiv_papers.py for ingesting arXiv papers
- Update all_documents.jsonl and all_qa_pairs.jsonl with new content

Made-with: Cursor
Sources:
- devops_rag_answered_full_405.json (405 zh-CN, 24 categories)
- devops_quiz_10_en.json (10 en, merged into zh records)
- jenkins_beginner_24_33_en.json (10 en)
- jenkins_advanced_34_40_en.json (7 en)

RAG totals: 517 documents, 451 QA pairs

Co-Authored-By: Oz <oz-agent@warp.dev>
…init__.py

- convert_md_to_rag.py: word_count uses len(content.split()) instead of len(content)
- ingest_devops_interview.py: replace hardcoded /home path with relative path + env var
- Add __init__.py for rag_system/ and scripts/ to support module imports

Co-Authored-By: Oz <oz-agent@warp.dev>
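
The path change described in this commit could look roughly like the sketch below. It is not the actual code from scripts/ingest_devops_interview.py: the function `resolve_project_root` and the environment variable name `RAG_PROJECT_ROOT` are hypothetical, and the assumption that the project root is the parent of the scripts/ directory is inferred from the repository layout named in this PR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_root(
    script_file: str,
    env: Optional[Mapping[str, str]] = None,
    env_var: str = "RAG_PROJECT_ROOT",  # hypothetical variable name
) -> Path:
    """Return the project root without relying on a hardcoded /home path.

    Order of precedence:
      1. the environment variable, if it is set and non-empty;
      2. the parent of the directory containing the script
         (assumes the script lives in <root>/scripts/).
    """
    env = os.environ if env is None else env
    override = env.get(env_var)
    if override:
        return Path(override).resolve()
    script_dir = Path(script_file).resolve().parent
    return script_dir.parent


# Inside a script this would typically be called as:
#   _PROJECT_ROOT = resolve_project_root(__file__)
```

Passing the environment as a parameter keeps the function easy to test without modifying `os.environ`.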

6 participants