DataArcTech

Welcome to DataArc Tech Inc.

⚡DataArcTech⚡

👉 Data-Driven, Intelligently Synthesized

🔥 We specialize in intelligent synthetic data generation and knowledge-augmented LLM reasoning technologies.

🌟 With a focus on context graphs and multi-agent systems, we build more efficient and trustworthy next-generation data and model infrastructure.

🚀 Through open-source projects and in-depth research, we explore the full technical cycle from data synthesis and continual pre-training to model evaluation.

👋 Join us in contributing high-quality algorithms, data, and insights to the open-source community.

         

Popular repositories

  1. DataArc-SynData-Toolkit Public

    Synthetic Data Generation Platform By DataArcTech

    Python · 943 stars · 18 forks

  2. ToG Public

    This is the official GitHub repo of Think-on-Graph (ICLR 2024). If you are interested in our work or would like to join our research team in Shenzhen, please feel free to contact us by email (xuchengj…

    Python · 620 stars · 69 forks

  3. LLM-as-a-Judge Public

    168 stars · 5 forks

  4. SQL-R1 Public

    [NeurIPS'25] Official Repository for the Paper "SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning"

    Python · 123 stars · 15 forks

  5. ToG-2 Public

    Python · 103 stars · 17 forks

  6. ChartMoE Public

    [ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding

    Jupyter Notebook · 95 stars · 8 forks

Repositories

Showing 10 of 31 repositories
  • RAG-ARC Public

    A modular, high-performance Retrieval-Augmented Generation framework with multi-path retrieval, graph extraction, and fusion ranking

    Python · 29 stars · MIT license · 10 forks · 1 open issue · 1 open pull request · Updated Jan 24, 2026
  • DataArc-SynData-Toolkit Public

    Synthetic Data Generation Platform By DataArcTech

    Python · 943 stars · 18 forks · 1 open issue · 0 open pull requests · Updated Jan 23, 2026
  • DataSet-Gen Public

    Synthesizes multi-hop QA pairs from multiple PDFs

    Python · 0 stars · 0 forks · 1 open issue · 0 open pull requests · Updated Jan 21, 2026
  • AIPracticePartner
    Go · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jan 13, 2026
  • SoE Public

    [EACL'26] Repository for the Paper "Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs"

    Python · 3 stars · Apache-2.0 license · 1 fork · 1 open issue · 0 open pull requests · Updated Jan 12, 2026
  • .github Public
    0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jan 9, 2026
  • Awesome-LLMs-for-Mathematical-Modeling Public

    🥇 A curated list of awesome Large Language Models/Agents for Mathematical Modeling tasks, including papers, models, datasets, and codebases. LLMs and agents dedicated to mathematical modeling tasks.

    4 stars · Apache-2.0 license · 0 forks · 0 open issues · 0 open pull requests · Updated Jan 8, 2026
  • RAGExplorer Public (forked from Thymezzz/RAGExplorer)

    This is the code repository for the paper "RAGExplorer: A Visual Analytics System for the Comparative Diagnosis of RAG Systems".

    TypeScript · 0 stars · 3 forks · 0 open issues · 0 open pull requests · Updated Dec 11, 2025
  • ToG-3 Public

    Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval

    Python · 71 stars · MIT license · 9 forks · 5 open issues · 0 open pull requests · Updated Dec 5, 2025
  • SQL-R1 Public

    [NeurIPS'25] Official Repository for the Paper "SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning"

    Python · 123 stars · Apache-2.0 license · 15 forks · 1 open issue · 0 open pull requests · Updated Nov 20, 2025
