QDB-18722 - [python] - remove deprecated API code - fix bugs #115
Conversation
solatis
left a comment
Mostly concerned about starting to support raw object conversions in our C++ code; I don't think that's a great approach.
What exactly is the problem with binary data that needs to be fixed?
Doing this special-case conversion directly inside the masked array is the wrong place. If you implement this in convert/value.hpp (but... I think you already added a py::object-based conversion there?), all array, masked array, etc. conversions will automatically make use of it.
    auto xform = [](delegate_value_type const & x) -> value_type {
        if (Delegate::is_null(x))
        {
    -       return Delegate::null_value();
    +       return To::null_value();
        }
        else
        {
    -       return static_cast<delegate_value_type>(x);
    +       return static_cast<value_type>(x);
        }
    };
    - return ranges::views::transform(xform) | delegate();
    + return delegate() | ranges::views::transform(xform);
What is being fixed here? This appears to change 'A -> B' conversion into 'B -> A', is that intentional? Why?
I fixed the qdb -> numpy conversion path for delegated dtypes such as int16, int32, and float32.
For delegated dtypes, conversion must go through the delegate type first. For example, int16 is handled via int64: the QDB value is first converted to int64, then the intermediate int64 value is checked against the delegate null sentinel, and only non-null values are narrowed to int16.
Previously, this pipeline was composed in the wrong order. The narrowing transform was applied too early, so QDB null sentinels could be converted into ordinary narrowed values such as 0 instead of being preserved as nulls. This caused masked values to disappear in roundtrip tests.
The pipeline now works in the correct order:
- convert from the QDB primitive to the delegate dtype
- detect null at the delegate stage
- emit the destination null sentinel for null values
- otherwise cast to the final dtype
The fix is therefore not just the null check itself, but the placement of that check in the conversion pipeline. Null handling now happens at the correct stage, which preserves masks for narrower numpy dtypes.
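For illustration only, here is a small self-contained sketch of that ordering; the sentinel values and names are placeholders, not the library's actual trait definitions. The null check runs on the wide int64 delegate value, and narrowing to int16 happens only afterwards.

    #include <cstdint>
    #include <limits>
    #include <vector>
    #include <range/v3/range/conversion.hpp>
    #include <range/v3/view/transform.hpp>

    int main()
    {
        // Placeholder sentinels standing in for the delegate / destination null values.
        constexpr std::int64_t delegate_null = std::numeric_limits<std::int64_t>::min();
        constexpr std::int16_t dest_null     = std::numeric_limits<std::int16_t>::min();

        // Values as they come out of the delegate (int64) stage; one of them is null.
        std::vector<std::int64_t> delegate_values{42, delegate_null, 7};

        auto xform = [=](std::int64_t x) -> std::int16_t {
            if (x == delegate_null)
            {
                return dest_null;                   // null stays null
            }
            return static_cast<std::int16_t>(x);    // narrow only non-null values
        };

        // The transform runs after the delegate stage. If the narrowing ran first,
        // the sentinel would collapse into an ordinary value (e.g. 0) and the mask
        // entry would be lost, which is exactly the regression described above.
        auto narrowed = delegate_values
                      | ranges::views::transform(xform)
                      | ranges::to<std::vector<std::int16_t>>();
        // narrowed == {42, dest_null, 7}
    }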
    {
        using dtype = traits::pyobject_dtype;

        value_converter<py::bytes, qdb_blob_t> delegate_{};

        inline qdb_blob_t operator()(py::object const & x) const
        {
            if (dtype::is_null(x))
            {
                return traits::null_value<qdb_blob_t>();
            }

            return delegate_(py::cast<py::bytes>(x));
        }
    };
What is being fixed here? Why isn't the is-null check of py::bytes enough?
This change adds an explicit traits::pyobject_dtype -> qdb_blob_t converter to correctly handle NumPy object arrays when writing blob data.
Previously, object-typed values were passed directly through py::cast<py::bytes> without a null check. As a result, Python nulls such as None were not recognized and were incorrectly sent through the bytes conversion path.
With this implementation, nulls are handled explicitly: values are first checked via dtype::is_null(x), and None is converted to traits::null_value<qdb_blob_t>(). Non-null values are delegated to the existing py::bytes -> qdb_blob_t converter, so the byte conversion logic remains unchanged.
This fixes blob writes from dtype=object arrays and other object-based inputs by preserving null semantics instead of forcing all values through the bytes conversion path.
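For what it's worth, a minimal sketch of that shape; to_blob and the std::optional<std::string> stand-in for qdb_blob_t are illustrative, not the actual converter in the codebase. None is recognized up front and never reaches the bytes path.

    #include <optional>
    #include <string>
    #include <pybind11/pybind11.h>

    namespace py = pybind11;

    // Hypothetical converter mirroring the approach above; std::optional<std::string>
    // stands in for qdb_blob_t and its null sentinel.
    inline std::optional<std::string> to_blob(py::object const & x)
    {
        if (x.is_none())
        {
            return std::nullopt;                  // None maps to the null blob
        }

        // Only real values are delegated to the bytes conversion path.
        auto b = py::cast<py::bytes>(x);
        return static_cast<std::string>(b);
    }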
| "Forced dtype provided for column '%s' = %s, but that column is not found in the table. Skipping...", | ||
| k, | ||
| ) | ||
| continue |
    if constexpr (std::is_same_v<Dtype, traits::pyobject_dtype>)
    {
        auto begin = static_cast<PyObject * const *>(xs.data());
        auto end = begin + xs.size();

        // NumPy object arrays store raw PyObject * values, not py::object wrappers.
        for (auto cur = begin; cur != end; ++cur, ++p_ret)
        {
            *p_ret = (*cur == nullptr) || (*cur == Py_None);
        }

        return mask::of_array(py::cast<py::array>(ret));
    }

    using value_type = typename Dtype::value_type;
:o this does a huge array copy, right? Why can't PyObject * be interpreted as py::object -- py::object is PyObject.
This seems like a very expensive conversion. Why is it not integrated into the convert facilities, and why isn't it explained better why we need to do this conversion here?
In the current shape, this requires a much better approach. From what I understand, you're sending a numpy array of objects, which we don't really support / STRONGLY DISCOURAGE: to prevent people from doing extremely performance-terrible stuff, we handle all the sanitization and conversion in numpy/__init__.py, so that we don't have to do these kinds of crazy things. We support unicode and blobs both as "raw", contiguous allocations; this is a conscious performance vs memory consumption tradeoff.
This special-case is needed because NumPy object arrays store raw PyObject * values, not py::object wrappers. The generic masked_null() path reads elements as Dtype::value_type, which is valid for regular fixed-width dtypes but not for traits::pyobject_dtype.
For dtype('O'), we therefore read the buffer as PyObject * const * and mark entries as null when the slot is nullptr or Py_None.
This does not introduce an extra object-array copy. masked_null() already allocates a boolean mask and scans the array; this only changes how object-array elements are read during that scan. The fix belongs here because the bug was in mask reconstruction, not in the conversion step that produced the object array.
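To make the "no extra copy" point concrete, here is a hedged standalone sketch; null_mask_of_object_array and the std::vector<bool> result are illustrative, not the actual mask::of_array machinery. The scan only reads the raw PyObject * slots and never converts or copies the objects themselves.

    #include <vector>
    #include <pybind11/numpy.h>
    #include <pybind11/pybind11.h>

    namespace py = pybind11;

    // Illustrative only: build a null mask for a contiguous dtype('O') array by
    // reading the buffer as raw PyObject * slots, the same way the change above does.
    // The slots are borrowed pointers, so no Python object is copied or converted;
    // only the boolean mask is allocated.
    inline std::vector<bool> null_mask_of_object_array(py::array const & xs)
    {
        auto begin = static_cast<PyObject * const *>(xs.data());
        auto end   = begin + xs.size();

        std::vector<bool> mask;
        mask.reserve(static_cast<std::size_t>(xs.size()));

        for (auto cur = begin; cur != end; ++cur)
        {
            // A slot counts as null when it was never filled (nullptr) or holds None.
            mask.push_back(*cur == nullptr || *cur == Py_None);
        }
        return mask;
    }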
        xs = convert::masked_array<double, traits::float64_dtype>(
            ranges::views::counted(column.data.doubles, data.row_count));
        break;
    case qdb_ts_column_symbol:
Why is the reader emitting "symbol" as a value type? symbol is a column type, not a value type, and the reader only "sees" value types.
In the same way, the query API cannot return a "symbol" type for its results, because "symbol" is not a value type.
    auto tmp = convert::point_array<DTYPE, VALUE_TYPE>(input); \
    auto tmp2 = convert::point_array<VALUE_TYPE, DTYPE>(tmp); \
    - output = input; \
    + output = std::move(tmp2); \
            np.datetime64("2017-01-01T00:00:03", "ns"),
        ]
    )
    return m.test_array_recode(quasardb.ColumnType.Blob, np.dtype("O"), (idx, values))
I'm not a big fan of providing blobs as discrete python objects.
I believe binary / blob data can be supported using:
- 'S' (zero-terminated blobs, which is... iffy)
- 'b'
- 'i8' (i.e. char)
What 'O' means is that all of these are encoded as separate Python objects, and that means that when we access them on the C++ side, each of the value lookups is super expensive as we cannot just access numpy's native arrays, which defeats the whole purpose of using numpy -- we may as well just use a regular list. It's easily 100x slower (and uses a lot more memory), so I prefer to just make it impossible to do, and do the necessary conversion in numpy/__init__.py.
I used np.dtype("O") here only to check the exact regression path we were fixing: pyobject_dtype -> qdb_blob_t plus readback mask reconstruction.
I am not proposing object arrays as a good blob representation. I agree they are much slower.
If we want to forbid blob writes from dtype('O'), I will turn this into a rejection test instead.
As long as that path is still accepted by the code, though, I think it is worth having a regression test for it.