
Building Intelligent User Interfaces for Human-AI Alignment

A curated list of user interface research for human-AI alignment.


Overview


RL Agent Alignment

User interfaces for reward learning and preference elicitation to align RL agents.

Classic Pairwise Comparison, "Deep Reinforcement Learning from Human Preferences". arXiv

Christiano et al., NeurIPS 2017. Introduces pairwise comparison of trajectory segments as a human feedback interface for RL, enabling non-experts to train complex behaviors in Atari and simulated robotics with approximately one hour of human oversight.
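The pairwise-comparison feedback described above is typically turned into a training signal with a Bradley-Terry style preference loss: the probability that a human prefers segment 1 is modeled from the difference of the segments' predicted total rewards. A minimal sketch (an illustration of the general technique, not the paper's implementation; the function name and signature are hypothetical):

```python
import math

def pairwise_preference_loss(r1: float, r2: float, pref: float) -> float:
    """Bradley-Terry loss for one human comparison of two trajectory segments.

    r1, r2: predicted total reward of segment 1 and segment 2.
    pref:   1.0 if the human preferred segment 1, 0.0 if segment 2.
    """
    # P(segment 1 preferred) from the reward difference (logistic model).
    p1 = 1.0 / (1.0 + math.exp(r2 - r1))
    eps = 1e-12  # guard against log(0)
    return -(pref * math.log(p1 + eps) + (1.0 - pref) * math.log(1.0 - p1 + eps))
```

Minimizing this loss over many labeled comparisons fits a reward model whose reward differences agree with the human's choices; the RL agent is then trained against that learned reward.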

Cluster Ranking, "Time-Efficient Reward Learning via Visually Assisted Cluster Ranking". arXiv

Zhang et al., NeurIPS 2022 HiLL Workshop. Improves efficiency of human feedback by batching comparisons together through an interactive visualization interface rather than labeling each comparison separately, greatly increasing agent performance given the same labeling time.

Groupwise Comparison, "Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback". arXiv

Kompatscher et al., Computer Graphics Forum, 2025. Enables groupwise evaluation of RL agent trajectories through an exploratory overview with hierarchical clustering and a detailed comparison interface, increasing final rewards by 69.34% over conventional pairwise approaches.


LLM Alignment

User interfaces for collecting and improving human feedback to align language models.

HH-RLHF, "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". arXiv

Bai et al., 2022. Applies preference modeling and RLHF to train helpful and harmless language assistants, with a comparison interface for crowdworkers to choose between model responses and iterative online training with weekly-updated preference models.

InstructGPT, "Training Language Models to Follow Instructions with Human Feedback". arXiv

Ouyang et al., NeurIPS 2022. Fine-tunes language models with human feedback via a labeling interface for demonstrations and rankings, producing InstructGPT models that outperform much larger GPT-3 in human evaluations.

DxHF, "Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition". arXiv

Shi et al., 2025. Addresses cognitive challenges in comparing lengthy text responses by decomposing text into individual claims with visual encoding of relevance and linked similar statements, improving feedback accuracy by ~5%.


Contributing

Contributions are welcome! If you know of a paper on user interfaces for AI alignment that should be included, please submit a pull request or open an issue. You can also email me directly.

To add a paper, follow this format:

**Short Name**, "Full Paper Title". [![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg)](https://arxiv.org/abs/XXXX.XXXXX)
> *Authors, Venue, Year.* Brief description of the UI contribution.

Citation

If you find this repository useful, please consider citing it:

@article{shi2026building,
  title={Building Intelligent User Interfaces for Human-AI Alignment},
  author={Shi, Danqing},
  journal={arXiv preprint arXiv:2602.11753},
  year={2026}
}
