corrigibility

Here are 8 public repositories matching this topic...

leenathomas01 / Stability-Before-Alignment

Structural stability architecture for self-modifying optimisation systems. Defines structural, dynamic, and perceptual control constraints that preserve coherence and stability before value alignment.

complex-systems control-theory ai-safety system-design robustness autonomous-systems adaptive-systems ai-alignment systems-thinking ai-governance system-stability corrigibility self-modifying-systems

Updated May 15, 2026
Python

Aliipou / authgate-kernel

Capability-security kernel for autonomous agents — seccomp/SELinux for agentic AI. Formal, auditable, language-agnostic, cryptographically verifiable.

rust openai formal-verification tla-plus object-capabilities pyo3 capability-security ai-governance agi-safety langchain anthropic corrigibility

Updated Jun 20, 2026
Python

JamesCrasher / supplied-not-discovered

Sixteen small, fully-reproducible (CPU, numpy-only) experiments showing the normative anchor of AI alignment is supplied, not discovered — across verification, optimization, social emergence, and value learning. Includes a preregistered experiment with an honest negative. A synthesis, not a novelty claim.

python reproducible-research ai-safety interpretability ai-alignment corrigibility moral-uncertainty value-learning

Updated Jun 18, 2026
Python

tretoef-estrella / THE-COHERENCE-BASIN-HYPOTHESIS

A structural account of why honesty may be the path of least resistance for superintelligence. Research hypothesis with formal proof, experimental design, and four-AI collaborative analysis.

machine-learning artificial-intelligence research-paper ai-safety deception ai-alignment recursive-self-improvement corrigibility alignment-research

Updated Feb 1, 2026

MaxwellCalkin / alignment-evals

Rigorous framework for evaluating AI alignment properties — sycophancy, corrigibility, deception, goal stability, and power-seeking — with statistical confidence intervals

machine-learning evaluation alignment ai-safety ai-alignment llm sycophancy corrigibility

Updated Mar 2, 2026
Python

tretoef-estrella / THE-ANT-AND-THE-ASI

On the infantile expectation of controlling what we cannot comprehend. A philosophical critique of the ASI control paradigm, developed through four-AI adversarial debate. Extension of the Coherence Basin Hypothesis.

philosophy asi ai-safety ai-alignment control-problem superintelligence corrigibility proyecto-estrella epistemic-asymmetry coherence-basin-hypothesis four-ai-debate

Updated Feb 2, 2026

bethediamond / ai-alignment-landscape

Toy 7. An elimination-filter landscape applying two structural constraints simultaneously to map which objective classes can persist under sustained optimization pressure — and which cannot. Includes a four-stage scenario engine and open-question frontier. Companion simulation for The Shape of What Does Not End — Series 2, Part 4.

Updated May 28, 2026
HTML

Kirill-Kruglov / ascesis

Research trail of honest bridges in AI alignment: pre-registered toy experiments and field ownership. Current: a type-blind arbiter holding population equilibrium against reward-hacking under hard optimization.

research reinforcement-learning multi-agent ai-safety ai-alignment value-alignment reward-hacking replicator-dynamics corrigibility alignment-research goodharts-law

Updated Jun 21, 2026
Python
