DeepExperience

Welcome to DeepExperience

Introduction

The DeepExperience team is dedicated to a dual mission: crafting exceptional AI-powered product experiences and pioneering self-evolving AI systems. For us, "experience" is both the end goal and the essential fuel. We aim to create intuitive products that users love, while designing AI that treats every interaction as a learning opportunity, accumulating "experience" to become more capable and intelligent over time.

Research

  • MMSkills is a framework for representing, loading, and using reusable multimodal procedural knowledge for visual agents. Each skill combines textual procedure guidance, compact state-card metadata, and optional visual references. At inference time, the agent keeps only lightweight skill hints in the main context, then opens a temporary skill branch when task state suggests that a skill may help.
  • HyperEyes is a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action, enabling concurrent search across multiple entities while treating inference efficiency as a first-class training objective.
  • MuSEAgent enhances multimodal agent reasoning by leveraging fine-grained stateful experiences. It works in two phases: (1) Experience Abstraction, which extracts state-level experiences via hindsight evaluation and builds multi-viewpoint embeddings for each experience; and (2) Experience Exploitation, in which the agent performs a deep-and-wide search over the experience bank at inference time to decide its next action.
  • REAL is a framework for Reinforcement Learning with Verifiable Rewards (RLVR) that reformulates policy optimization as a classification problem. By treating verifiable rewards as categorical labels rather than scalar weights, REAL addresses fundamental gradient mismatches in existing GRPO-style methods and improves on them in training stability and performance on mathematical reasoning tasks.
  • OmniGAIA is a comprehensive benchmark designed to evaluate the capabilities of omni-modal general AI assistants. Unlike existing benchmarks that focus on a single modality, OmniGAIA requires agents to jointly reason over video, audio, and image inputs while leveraging external tools such as web search and code execution. We also introduce OmniAtlas, an agentic reasoning system that extends a base LLM with active perception tools, enabling the model to request and examine additional media segments during multi-step reasoning.
  • Video-Thinker is an end-to-end video reasoning framework that empowers MLLMs to autonomously leverage intrinsic "grounding" and "captioning" capabilities during inference. This paradigm extends "Thinking with Images" to video understanding, enabling dynamic temporal navigation and visual cue extraction without relying on external tools or pre-designed prompts.
  • Agent2World is a tool-augmented multi-agent framework for generating executable symbolic world models (e.g., PDDL domains and runnable simulators) from natural language specs. It grounds generation in execution-based feedback to catch behavior-level errors missed by static validation.
  • DeepAgent is an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution within a single, coherent reasoning process. This paradigm shifts away from traditional, predefined workflows (e.g., ReAct's "Reason-Act-Observe" cycle), allowing the agent to maintain a global perspective on the entire task and dynamically discover tools on an as-needed basis.
  • LoopTool is a fully automated, model-aware data evolution framework that closes the data–training loop for robust LLM tool calling. It tightly integrates data synthesis and model training through synergistic modules such as Capability Probing and Error-Driven Data Expansion. Rather than relying on a static synthetic data pipeline, LoopTool adaptively diagnoses model weaknesses and progressively purifies the dataset, substantially improving tool-use capabilities.
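To make the MMSkills idea of "lightweight hints in the main context, full skill in a temporary branch" concrete, here is a minimal illustrative sketch. It is not the MMSkills implementation: the `Skill` class, the `summary` and `trigger` state-card keys, and the exact-match trigger rule are all hypothetical assumptions chosen for clarity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    """Hypothetical skill record with the three parts described for MMSkills."""
    name: str
    procedure: str                # textual procedure guidance
    state_card: dict              # compact state-card metadata (keys are assumptions)
    visual_refs: list = field(default_factory=list)  # optional visual references

    def hint(self) -> str:
        # Only this one-line hint is placed in the agent's main context.
        return f"[skill:{self.name}] {self.state_card.get('summary', '')}"


def main_context_hints(skills: list) -> list:
    """Collect the lightweight hints that stay in the main context."""
    return [s.hint() for s in skills]


def maybe_open_branch(skills: list, task_state: dict) -> Optional[dict]:
    """Open a temporary branch with the full skill if its (assumed) trigger matches the task state."""
    for s in skills:
        trigger = s.state_card.get("trigger", {})
        if trigger and all(task_state.get(k) == v for k, v in trigger.items()):
            # The branch is a separate, disposable context; the main context is left unchanged.
            return {"skill": s.name, "context": [s.procedure, *s.visual_refs]}
    return None
```

For example, a skill whose hypothetical trigger is `{"app": "editor", "goal": "export"}` contributes only its one-line hint until the task state matches, at which point its procedure text and visual references are loaded into a separate branch.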
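The MuSEAgent description mentions multi-viewpoint embeddings for each stored experience. One simple way to use such embeddings for lookup is to score each experience by its best-matching view and keep the top-k. The sketch below assumes exactly that scoring rule (max cosine similarity over views). The `Experience` fields and view names are hypothetical, and the sketch does not reproduce MuSEAgent's deep-and-wide search.

```python
import math
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical experience entry: a suggested action plus several view embeddings."""
    action_hint: str
    views: dict  # view name -> embedding (list of floats); view names are assumptions


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(bank: list, query: list, k: int = 2) -> list:
    """Rank experiences by their best-matching view and return the top-k."""
    scored = [(max(cosine(query, v) for v in e.views.values()), e) for e in bank]
    scored.sort(key=lambda t: t[0], reverse=True)  # sort on the score only
    return [e for _, e in scored[:k]]
```

Taking the maximum over views lets a query match an experience through whichever viewpoint (for example, visual state or goal) is most relevant, instead of averaging the views together.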
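For context on the "scalar weights" that REAL moves away from, the sketch below computes the group-relative advantage used in GRPO-style methods: each sampled response's reward is normalized by the mean and standard deviation of its group. This shows only the GRPO baseline, not REAL's classification objective. The population standard deviation is an assumption here, since implementations vary.

```python
import statistics


def grpo_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Group-relative advantages (r_i - mean) / (std + eps) over one prompt's sampled responses."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With binary verifiable rewards, the same "correct" label can receive very different scalar weights depending on the rest of the group. For `[1, 0, 0, 0]` the correct response gets about √3 ≈ 1.73, while for `[1, 1, 1, 0]` each correct response gets about 1/√3 ≈ 0.58. A group in which every response receives the same reward yields zero advantage for all responses.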
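The HyperEyes description combines two ideas: grounding and retrieval fused into one atomic action, and concurrent search across several entities. The sketch below illustrates both with `asyncio`. It is not the HyperEyes code. `ground` and `search` are hypothetical stand-ins that simply return strings after a short delay.

```python
import asyncio


async def ground(image_id: str, entity: str) -> str:
    # Hypothetical stand-in for a visual grounding call; returns a region reference.
    await asyncio.sleep(0.01)
    return f"{image_id}#{entity}"


async def search(region: str) -> str:
    # Hypothetical stand-in for a retrieval call keyed on the grounded region.
    await asyncio.sleep(0.01)
    return f"results for {region}"


async def ground_and_search(image_id: str, entity: str) -> str:
    # One atomic action: grounding immediately followed by retrieval.
    return await search(await ground(image_id, entity))


async def parallel_search(image_id: str, entities: list) -> list:
    # Issue one fused action per entity and run them concurrently;
    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(ground_and_search(image_id, e) for e in entities))


if __name__ == "__main__":
    print(asyncio.run(parallel_search("img0", ["logo", "car"])))
```

Because each entity's grounding and retrieval run as a single awaitable, the total latency is roughly that of one fused action rather than the sum over all entities.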

Pinned

  1. DeepAgent (Public)

    Forked from RUC-NLPIR/DeepAgent

    🛠️ DeepAgent: A General Reasoning Agent with Scalable Toolsets

    Python · 53 stars · 3 forks

  2. Video-Thinker (Public)

    Forked from shijian2001/Video-Thinker

    Sparking "Thinking with Videos" via Reinforcement Learning

    Python · 8 stars · 2 forks

  3. LoopTool (Public)

    Python · 65 stars · 4 forks

  4. agent2world (Public)

    🪐 Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback

    Python · 22 stars · 4 forks

Repositories

Showing 10 of 10 repositories

People

This organization has no public members. You must be a member to see who’s a part of this organization.
