Skip to content

QDB-18722 - [python] - remove deprecated API code - fix bugs#115

Open
vikonix wants to merge 13 commits intomasterfrom
sc-18722/-python-remove-deprecated-api-2
Open

QDB-18722 - [python] - remove deprecated API code - fix bugs#115
vikonix wants to merge 13 commits intomasterfrom
sc-18722/-python-remove-deprecated-api-2

Conversation

@vikonix
Copy link
Copy Markdown
Contributor

@vikonix vikonix commented Apr 7, 2026

No description provided.

@vikonix vikonix requested a review from solatis April 7, 2026 15:28
Copy link
Copy Markdown
Contributor

@solatis solatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly concerned about starting to support raw object conversions in our c++ code, I don't think that's a great approach.

What exactly is the problem with binary data that needs to be fixed?

Doing this special-case conversion directly inside the masked array is the wrong place, if you implement this in convert/value.hpp (but.. i think you already had added a py::object based conversion there?), all array, masked array, etc conversions will automatically make use of that.

Comment on lines +189 to +199
auto xform = [](delegate_value_type const & x) -> value_type {
if (Delegate::is_null(x))
{
return Delegate::null_value();
return To::null_value();
}
else
{
return static_cast<delegate_value_type>(x);
return static_cast<value_type>(x);
}
};
return ranges::views::transform(xform) | delegate();
return delegate() | ranges::views::transform(xform);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is being fixed here? This appears to change 'A -> B' conversion into 'B -> A', is that intentional? Why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the qdb -> numpy conversion path for delegated dtypes such as int16, int32, and float32.

For delegated dtypes, conversion must go through the delegate type first. For example, int16 is handled via int64: the QDB value is first converted to int64, then the intermediate int64 value is checked against the delegate null sentinel, and only non-null values are narrowed to int16.

Previously, this pipeline was composed in the wrong order. The narrowing transform was applied too early, so QDB null sentinels could be converted into ordinary narrowed values such as 0 instead of being preserved as nulls. This caused masked values to disappear in roundtrip tests.

The pipeline now works in the correct order:

convert from the QDB primitive to the delegate dtype
detect null at the delegate stage
emit the destination null sentinel for null values
otherwise cast to the final dtype

The fix is therefore not just the null check itself, but the placement of that check in the conversion pipeline. Null handling now happens at the correct stage, which preserves masks for narrower numpy dtypes.

Comment on lines +576 to +590
{
using dtype = traits::pyobject_dtype;

value_converter<py::bytes, qdb_blob_t> delegate_{};

inline qdb_blob_t operator()(py::object const & x) const
{
if (dtype::is_null(x))
{
return traits::null_value<qdb_blob_t>();
}

return delegate_(py::cast<py::bytes>(x));
}
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is being fixed here? Why isn't the is-null check of py::bytes enough?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change adds an explicit traits::pyobject_dtype -> qdb_blob_t converter to correctly handle NumPy object arrays when writing blob data.

Previously, object-typed values were passed directly through py::castpy::bytes without a null check. As a result, Python nulls such as None were not recognized and were incorrectly sent through the bytes conversion path.

With this implementation, nulls are handled explicitly: values are first checked via dtype::is_null(x), and None is converted to traits::null_value<qdb_blob_t>(). Non-null values are delegated to the existing py::bytes -> qdb_blob_t converter, so the byte conversion logic remains unchanged.

This fixes blob writes from dtype=object arrays and other object-based inputs by preserving null semantics instead of forcing all values through the bytes conversion path.

"Forced dtype provided for column '%s' = %s, but that column is not found in the table. Skipping...",
k,
)
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good

Comment thread quasardb/masked_array.hpp
Comment on lines +506 to +521
if constexpr (std::is_same_v<Dtype, traits::pyobject_dtype>)
{
auto begin = static_cast<PyObject * const *>(xs.data());
auto end = begin + xs.size();

// NumPy object arrays store raw PyObject * values, not py::object wrappers.
for (auto cur = begin; cur != end; ++cur, ++p_ret)
{
*p_ret = (*cur == nullptr) || (*cur == Py_None);
}

return mask::of_array(py::cast<py::array>(ret));
}

using value_type = typename Dtype::value_type;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:o this does a huge array copy, right? Why can't PyObject * be interpreted as py::object -- py::object is PyObject.

This seems like a very expensive conversion. Why is it not integrated into the convert facilities, and why isn't it explained better why we need to do this conversion here?

In the current shape, this requires a much better approach. From what I understand you're sending a number array of objects, which we don't really support / STRONGLY DISCOURAGE, to prevent people from doing extremely performance-terrible stuff, and handle all the sanitization and conversion in numpy/__init__.py, such that we don't have to do these kind of crazy things: we support unicode and blobs both as "raw", continuous allocations. This is a conscious performance vs memory consumption tradeoff.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special-case is needed because NumPy object arrays store raw PyObject * values, not py::object wrappers. The generic masked_null() path reads elements as Dtype::value_type, which is valid for regular fixed-width dtypes but not for traits::pyobject_dtype.

For dtype('O'), we therefore read the buffer as PyObject * const * and mark entries as null when the slot is nullptr or Py_None.

This does not introduce an extra object-array copy. masked_null() already allocates a boolean mask and scans the array; this only changes how object-array elements are read during that scan. The fix belongs here because the bug was in mask reconstruction, not in the conversion step that produced the object array.

Comment thread quasardb/reader.cpp Outdated
xs = convert::masked_array<double, traits::float64_dtype>(
ranges::views::counted(column.data.doubles, data.row_count));
break;
case qdb_ts_column_symbol:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the reader emitting "symbol" as a value type? symbol is a column type, not a value type, and the reader only "sees" value types.

In the same way the query api cannot return a "symbol" type for its results, because "symbol" is not a value

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment thread tests/test_convert.cpp
auto tmp = convert::point_array<DTYPE, VALUE_TYPE>(input); \
auto tmp2 = convert::point_array<VALUE_TYPE, DTYPE>(tmp); \
output = input; \
output = std::move(tmp2); \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

np.datetime64("2017-01-01T00:00:03", "ns"),
]
)
return m.test_array_recode(quasardb.ColumnType.Blob, np.dtype("O"), (idx, values))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of providing blobs as discrete python objects.

I believe binary / blob data can be supported using:

  • 'S' (zero-terminated blobs, which is... iffy)
  • 'b'
  • 'i8' (i.e. char)

What 'O' means is that all of these are encoded as separate Python objects, and that means that when we access them in the C++ side, each of the value lookups is super expensive as we cannot just access numpy's native arrays, which defeats the whole purpose of using numpy -- we may as well just use a regular list. It's easily 100x slower (and use a lot more memory), so I prefer to just make it impossible to do, and do the necessary conversion in numpy/init.py

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used np.dtype("O") here only to check the exact regression path we were fixing: pyobject_dtype -> qdb_blob_t plus readback mask reconstruction.

I am not proposing object arrays as a good blob representation. I agree they are much slower.

If we want to forbid blob writes from dtype('O'), I will turn this into a rejection test instead.
As long as that path is still accepted by the code, though, I think it is worth having a regression test for it.

@vikonix vikonix requested a review from solatis April 8, 2026 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants