Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .beads/issues.jsonl
Original file line number Diff line number Diff line change
Expand Up @@ -33,12 +33,12 @@
{"id":"vgi-python-g7i","title":"Add validation for contiguous positional indices","description":"Neither argument_specs_to_schema() nor schema_to_argument_specs() validates that positional argument indices are contiguous (0, 1, 2...). Gaps like (0, 2, 3) would serialize fine but might indicate a bug. Consider adding validation that positional indices form a contiguous sequence starting from 0.","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-05T11:51:19.868862-05:00","created_by":"rusty","updated_at":"2026-01-05T11:56:01.878179-05:00","closed_at":"2026-01-05T11:56:01.878179-05:00","close_reason":"Closed"}
{"id":"vgi-python-ivf","title":"Add required_settings to function Meta class","description":"Update function metadata to support declaring required DuckDB settings.\n\nChanges needed:\n- Add 'required_settings: list[str]' to FunctionMeta in vgi/metadata.py\n- Update Meta class resolution in vgi/function.py\n- Add validation that required_settings is a list of strings\n- Make it available via get_metadata() for introspection\n\nExample usage:\nclass MyFunction(TableInOutFunction):\n class Meta:\n required_settings = ['timezone', 'threads']","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T13:05:47.903747-05:00","created_by":"rusty","updated_at":"2026-01-04T13:20:41.169516-05:00","closed_at":"2026-01-04T13:20:41.169516-05:00","close_reason":"Implementation complete, all tests pass","dependencies":[{"issue_id":"vgi-python-ivf","depends_on_id":"vgi-python-aad","type":"blocks","created_at":"2026-01-04T13:06:13.690253-05:00","created_by":"rusty"}]}
{"id":"vgi-python-j4t","title":"Update client to pass DuckDB settings in Invocation","description":"Update vgi/client/client.py to support passing DuckDB settings.\n\nChanges needed:\n- Add 'duckdb_settings: dict[str, str] | None = None' parameter to relevant methods\n- Include settings in Invocation creation\n- Add helper to query function's required_settings from metadata\n\nThe client needs to know what settings to pass. Options:\n1. Client queries worker for function metadata first\n2. Settings passed explicitly by caller\n3. Client introspects function class if available locally","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T13:05:48.358656-05:00","created_by":"rusty","updated_at":"2026-01-04T13:20:41.173178-05:00","closed_at":"2026-01-04T13:20:41.173178-05:00","close_reason":"Implementation complete, all tests pass","dependencies":[{"issue_id":"vgi-python-j4t","depends_on_id":"vgi-python-aad","type":"blocks","created_at":"2026-01-04T13:06:13.761572-05:00","created_by":"rusty"}]}
{"id":"vgi-python-j8a","title":"Investigate named argument field name ambiguity","description":"For named arguments, position is set to field.name (the SQL key). But there's potential ambiguity between the Python attribute name and the named argument key if they could differ. Currently they're the same, but if ArgumentSpec.name (attribute) ever differs from ArgumentSpec.position (key), the schema only preserves one. Investigate whether this is a real concern or document the assumption that they're always equal.","status":"open","priority":4,"issue_type":"task","created_at":"2026-01-05T11:51:20.257539-05:00","created_by":"rusty","updated_at":"2026-01-05T11:51:20.257539-05:00"}
{"id":"vgi-python-j8a","title":"Investigate named argument field name ambiguity","description":"For named arguments, position is set to field.name (the SQL key). But there's potential ambiguity between the Python attribute name and the named argument key if they could differ. Currently they're the same, but if ArgumentSpec.name (attribute) ever differs from ArgumentSpec.position (key), the schema only preserves one. Investigate whether this is a real concern or document the assumption that they're always equal.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-05T11:51:20.257539-05:00","created_by":"rusty","updated_at":"2026-01-05T12:16:59.737914-05:00","closed_at":"2026-01-05T12:16:59.737914-05:00","close_reason":"Closed"}
{"id":"vgi-python-j9k","title":"Add protocol types for IPC stream writers in cli.py","notes":"Line 53: self._writer: Any = None\n\nCould define a Protocol type for the IPC stream writer interface:\n```python\nclass IPCWriter(Protocol):\n def write_batch(self, batch: pa.RecordBatch) -\u003e None: ...\n def close(self) -\u003e None: ...\n```\n\nPart of 14.17% imprecision in cli.py (34 Anys total).","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-04T22:19:50.31711-05:00","created_by":"rusty","updated_at":"2026-01-04T22:37:01.488788-05:00","closed_at":"2026-01-04T22:37:01.488788-05:00","close_reason":"Replaced _writer: Any with _writer: pq.ParquetWriter | None. Removes 1 Any and provides proper type information."}
{"id":"vgi-python-jrf","title":"Add varargs parameter to Arg descriptor","description":"In vgi/arguments.py:\n- Add varargs: bool = False to Arg.__init__ and __slots__\n- Update _resolve() to collect positional[position:] when varargs=True\n- Validate at least 1 value provided\n- Update _validate() to validate each element in tuple\n- Add Arguments.get_varargs(start, type=None) method\n- Update __repr__ to show varargs flag","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-05T10:49:20.012964-05:00","created_by":"rusty","updated_at":"2026-01-05T10:55:22.479344-05:00","closed_at":"2026-01-05T10:55:22.479344-05:00","close_reason":"Implemented varargs parameter in Arg descriptor with get_varargs() method and _validate_single()"}
{"id":"vgi-python-k7x","title":"Use Mapping instead of dict in extract_argument_specs signature","description":"The arg_types parameter in extract_argument_specs() is typed as dict[str, pa.DataType]. Using Mapping[str, pa.DataType] from collections.abc would be more flexible, accepting any mapping type.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-05T11:51:21.021496-05:00","created_by":"rusty","updated_at":"2026-01-05T12:03:51.771301-05:00","closed_at":"2026-01-05T12:03:51.771301-05:00","close_reason":"Closed"}
{"id":"vgi-python-kz4","title":"Rename TableInOutGeneratorFunction to TableInOutGenerator for consistency","description":"Naming inconsistency: TableFunctionGenerator uses *Generator suffix, but TableInOutGeneratorFunction uses *GeneratorFunction suffix. Rename TableInOutGeneratorFunction to TableInOutGenerator for consistency. Also consider renaming ScalarFunctionGenerator if needed.","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-04T20:06:41.581028-05:00","created_by":"rusty","updated_at":"2026-01-04T21:43:58.141038-05:00","closed_at":"2026-01-04T21:43:58.141038-05:00","close_reason":"PR #7 created: https://github.com/Query-farm/vgi-python/pull/7"}
{"id":"vgi-python-l1u","title":"Consider custom __repr__ for ArgumentSpec","description":"The default dataclass __repr__ includes the full Arrow type repr which can be verbose. Consider a custom __repr__ that's more concise for debugging, e.g., 'ArgumentSpec(name=\"count\", pos=0, type=int64)' instead of showing the full pa.DataType object.","status":"open","priority":4,"issue_type":"task","created_at":"2026-01-05T11:51:21.415976-05:00","created_by":"rusty","updated_at":"2026-01-05T11:51:21.415976-05:00"}
{"id":"vgi-python-l1u","title":"Consider custom __repr__ for ArgumentSpec","description":"The default dataclass __repr__ includes the full Arrow type repr which can be verbose. Consider a custom __repr__ that's more concise for debugging, e.g., 'ArgumentSpec(name=\"count\", pos=0, type=int64)' instead of showing the full pa.DataType object.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-05T11:51:21.415976-05:00","created_by":"rusty","updated_at":"2026-01-05T12:15:02.029743-05:00","closed_at":"2026-01-05T12:15:02.029743-05:00","close_reason":"Closed"}
{"id":"vgi-python-lec","title":"Add test coverage for testing.py helper edge cases","notes":"Coverage: 89% in vgi/testing.py. Missing tests for:\n- Lines 421-422, 450-451: StopIteration handling in _process_batch\n- Lines 468-472: FINISHED status during data phase\n- Lines 485-486, 502-503: _finalize edge cases\n\nLow priority since these are test helpers.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-04T22:15:34.006563-05:00","created_by":"rusty","updated_at":"2026-01-05T12:07:48.026786-05:00","closed_at":"2026-01-05T12:07:48.026786-05:00","close_reason":"Closed"}
{"id":"vgi-python-lzc","title":"Extract duplicated sort_key function in argument_spec","description":"The sort_key function is duplicated at lines 139-142 and 309-312 in argument_spec.py. Extract it to a module-level function to follow DRY principles.","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-05T11:51:19.141041-05:00","created_by":"rusty","updated_at":"2026-01-05T11:55:22.535816-05:00","closed_at":"2026-01-05T11:55:22.535816-05:00","close_reason":"Closed"}
{"id":"vgi-python-m45","title":"Create tests/test_argument_spec.py","description":"## Overview\n\nCreate comprehensive tests for the argument specification serialization module.\n\n## File Location\n\n`tests/test_argument_spec.py`\n\n## Test Classes and Cases\n\n### TestArgumentSpecToSchema\n\nTest converting ArgumentSpec objects to Arrow schema.\n\n#### test_positional_arguments_preserve_order\n- Create specs with positions 0, 1, 2\n- Convert to schema\n- Verify field order matches position order\n- Verify field types are preserved\n\n#### test_named_arguments_have_metadata\n- Create spec with position='key' (named)\n- Convert to schema\n- Verify field has `vgi_arg=named` metadata\n\n#### test_mixed_positional_and_named\n- Create mix of positional (0, 1) and named ('format', 'verbose') specs\n- Convert to schema\n- Verify positional come first, then named\n- Verify named have correct metadata\n\n#### test_table_input_uses_null_type\n- Create spec with is_table_input=True\n- Convert to schema\n- Verify field type is pa.null()\n- Verify field has `vgi_type=table` metadata\n\n#### test_any_type_uses_null_type\n- Create spec with is_any_type=True\n- Convert to schema\n- Verify field type is pa.null()\n- Verify field has `vgi_type=any` metadata\n\n#### test_varargs_has_metadata\n- Create spec with is_varargs=True and arrow_type=pa.int64()\n- Convert to schema\n- Verify field type is pa.int64() (element type preserved)\n- Verify field has `vgi_varargs=true` metadata\n\n### TestSchemaToArgumentSpecs\n\nTest converting Arrow schema back to ArgumentSpec objects.\n\n#### test_positional_arguments_from_schema\n- Create schema with 3 fields (no metadata)\n- Convert to specs\n- Verify positions are 0, 1, 2\n\n#### test_named_arguments_from_metadata\n- Create schema with `vgi_arg=named` metadata on fields\n- Convert to specs\n- Verify position is field name string\n\n#### test_table_input_detected\n- Create schema with `vgi_type=table` metadata\n- Convert to specs\n- Verify is_table_input=True\n\n#### test_any_type_detected\n- Create schema with `vgi_type=any` metadata\n- Convert to specs\n- Verify is_any_type=True\n\n#### test_varargs_detected\n- Create schema with `vgi_varargs=true` metadata\n- Convert to specs\n- Verify is_varargs=True\n\n### TestRoundTrip\n\nTest that specs survive serialization round-trip.\n\n#### test_complex_arrow_types_preserved\nTest each of these types round-trips correctly:\n- pa.int64(), pa.float32(), pa.utf8()\n- pa.list_(pa.float64())\n- pa.struct([pa.field('a', pa.int32()), pa.field('b', pa.string())])\n- pa.map_(pa.string(), pa.int64())\n- pa.decimal128(10, 2)\n- pa.timestamp('us', tz='UTC')\n\n#### test_full_function_signature_roundtrip\n- Create specs matching a realistic function:\n - count: int, position 0\n - data: TableInput, position 1\n - extra: float varargs, position 2\n - format: str, named 'format'\n- Convert to schema, serialize to bytes, deserialize, convert back to specs\n- Verify all specs match original\n\n### TestExtractArgumentSpecs\n\nTest extracting specs from function classes.\n\n#### test_extract_from_simple_function\n- Define function class with Arg descriptors\n- Call extract_argument_specs with arg_types dict\n- Verify specs match descriptors\n\n#### test_extract_table_input\n- Define function with Arg[TableInput]\n- Extract specs\n- Verify is_table_input=True\n\n#### test_extract_any_arrow\n- Define function with Arg[AnyArrow]\n- Extract specs\n- Verify is_any_type=True\n\n#### test_extract_varargs\n- Define function with Arg[int](2, varargs=True)\n- Extract specs\n- Verify is_varargs=True\n\n### TestEdgeCases\n\n#### test_empty_schema\n- Convert empty list of specs to schema\n- Verify empty schema works\n- Convert back, verify empty list\n\n#### test_only_named_arguments\n- Create specs with only named arguments (no positional)\n- Round-trip and verify\n\n#### test_only_positional_arguments\n- Create specs with only positional arguments (no named)\n- Round-trip and verify\n\n## Test Utilities\n\nConsider creating fixtures for common patterns:\n- `make_spec()` helper for creating ArgumentSpec\n- Sample function classes for extraction tests","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-05T11:18:53.312911-05:00","created_by":"rusty","updated_at":"2026-01-05T11:32:35.580879-05:00","closed_at":"2026-01-05T11:32:35.580879-05:00","close_reason":"Created comprehensive tests with 43 passing test cases","dependencies":[{"issue_id":"vgi-python-m45","depends_on_id":"vgi-python-cd0","type":"blocks","created_at":"2026-01-05T11:19:30.779207-05:00","created_by":"rusty"}]}
Expand Down
11 changes: 11 additions & 0 deletions vgi/argument_spec.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,17 @@ class ArgumentSpec:
is_varargs: True if this argument collects all remaining positional
arguments (varargs=True).

Note:
For named arguments, the Python attribute name (``name``) and the SQL
key (``position``) are assumed to be identical. This is the standard
convention::

format = Arg[str]("format") # name="format", position="format"

If they differ, the ``position`` value will be lost during schema
round-trip serialization, as only ``name`` is stored in the Arrow
schema field name.

"""

name: str
Expand Down
13 changes: 13 additions & 0 deletions vgi/arguments.py
Original file line number Diff line number Diff line change
Expand Up @@ -520,6 +520,19 @@ def transform(self, batch):
# Validation happens automatically on first access
...

Note:
For named arguments (string position), the Python attribute name should
match the SQL key. This is the standard convention::

format = Arg[str]("format") # Recommended: attribute == key

Avoid using different names::

output_format = Arg[str]("format") # Not recommended

While this works at runtime, it can cause issues with metadata
serialization where only one name is preserved.

"""

__slots__ = (
Expand Down
Loading