Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 39 additions & 14 deletions codebasin/util.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (C) 2019-2024 Intel Corporation
# Copyright (C) 2019-2025 Intel Corporation
# SPDX-License-Identifier: BSD-3-Clause
"""
Contains utility functions for common operations, including:
Expand All @@ -21,7 +21,7 @@
log = logging.getLogger(__name__)


def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]):
def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to this, but I'm curious. Do conventions say to specify -> None in the case of no return types? I'm not sure which style guides to look at here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought there was that Python functions always return None, even if they don't have a return statement. For the developer it seems to be better to be explicit, since then you remove the ambiguity of "did we leave out the return signature, or does it really return None"?

Seems like mypy recommends this as well, even if we don't use mypy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's good enough for me!

"""
Ensure that a path has one of the specified extensions.

Expand All @@ -33,11 +33,6 @@ def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]):
extensions: Iterable[str]
The valid extensions to test against.

Returns
-------
bool
True if `path` is a file with one of the specified extensions.

Raises
------
TypeError
Expand All @@ -56,10 +51,10 @@ def ensure_ext(path: os.PathLike[str], extensions: Iterable[str]):
extension = "".join(path.suffixes)
if extension not in extensions:
exts = ", ".join([f"'{ext}'" for ext in extensions])
raise ValueError(f"{path} does not have a valid extension: f{exts}")
raise ValueError(f"{path} does not have a valid extension: {exts}")


def safe_open_write_binary(fname):
def safe_open_write_binary(fname: os.PathLike[str]) -> typing.BinaryIO:
"""Open fname for (binary) writing. Truncate if not a symlink."""
fpid = os.open(
fname,
Expand All @@ -69,8 +64,34 @@ def safe_open_write_binary(fname):
return os.fdopen(fpid, "wb")


def valid_path(path):
"""Return true if the path passed in is valid"""
def valid_path(path: os.PathLike[str]) -> bool:
"""
Check if a given file path is valid.

This function ensures that the file path does not contain
potentially dangerous characters such as null bytes (`\x00`)
or carriage returns/line feeds (`\n`, `\r`).

Parameters
----------
path : os.PathLike[str]
The file path to be validated.

Returns
-------
bool
A boolean value indicating whether the path is valid
(`True`) or invalid (`False`).

Examples
--------
>>> valid_path("/home/user/file.txt")
True
>>> valid_path("/home/user/\x00file.txt")
False
>>> valid_path("/home/user/file\n.txt")
False
"""
valid = True

# Check for null byte character(s)
Expand Down Expand Up @@ -201,7 +222,10 @@ def _load_json(file_object: typing.TextIO, schema_name: str) -> object:
return json_object


def _load_toml(file_object: typing.TextIO, schema_name: str) -> object:
def _load_toml(
file_object: typing.TextIO,
schema_name: str,
) -> dict[str, typing.Any]:
"""
Load TOML from file and validate it against a schema.

Expand All @@ -215,8 +239,9 @@ def _load_toml(file_object: typing.TextIO, schema_name: str) -> object:

Returns
-------
Object
The loaded TOML.
dict[str, Any]
The loaded TOML object, represented as a Python
dict with str key/value mappings.
Comment on lines +242 to +244
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought about this before, so want to double-check you agree.

I think this was set up as -> object before because when we were loading JSON, the result was not guaranteed to be a dict. For example, a compilation database is an array of objects.

It does seem like TOML must be key-value pairs, and the documentation says that "TOML is designed to map unambiguously to a hash table". So I think this is right...?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's why I left the JSON signature as object, but updated it for the TOML. tomllib.load's signature only returns dict[str, Any], because I think by design it can't return anything else (i.e. keys have to be strings).


Raises
------
Expand Down