[SYNPY-1749]Allow quote, apostrophe and ellipsis in store_row_async#1316
Conversation
@danlu1 Is this still WIP, or are you looking for reviews?
@andrewelamb sorry, I should have marked this as a draft.
…ctly when upload data from a dataframe
…ger output json string
The integration test failures are in the recordset and submission modules and do not appear to be related to my changes.
linglp
left a comment
I think overall it looks good. The tests can be consolidated a bit to cover all the edge cases in fewer integration tests, which should improve performance, and the docstring can be updated to reflect the new state of the code since json.dumps() was removed. There is also some logic that can be simplified: sample_values is created in the redundant checks but never actually used. The function name could also be more descriptive of what it actually does now (sanitizing special values rather than just converting dtypes).
BryanFauble
left a comment
This is looking great, once we get the last few items handled (and develop merged in), I can approve!
BryanFauble
left a comment
Nice work on this fix -- the backslash-escaping approach for embedded quotes makes sense, and the Ellipsis/pd.NA handling is solid. I flagged one issue that I think needs to be addressed before merge, plus a few cleanup nits.
Note: This comment was drafted with AI assistance and reviewed by me for accuracy.
Hi @danlu1 ! Thanks for your hard work. I think I found two bugs after a few rounds of testing myself:
Bug 1: Nested np.nan not handled
df = pd.DataFrame({"val": [[1.0, np.nan], [2.0, 3.0]]})
The nested np.nan would currently pass through to JSON serialization, which could cause issues since np.nan is not valid JSON.
This happened because convert_dtypes() only converts top-level np.nan to pd.NA; nested np.nan would remain unchanged. And even though in your code, you have:
if obj is pd.NA:
But this won't handle np.nan. The fix would just be:
def _reformat_special_values(obj):
    if pd.isna(obj):  # Catches pd.NA, np.nan, and None
        return None
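A quick standalone check (not from the PR) illustrating why pd.isna is the right predicate here and why raw np.nan breaks JSON:

```python
import json

import numpy as np
import pandas as pd

# `is pd.NA` misses the other missing-value scalars; pd.isna catches them all.
assert np.nan is not pd.NA
assert pd.isna(pd.NA) and pd.isna(np.nan) and pd.isna(None)

# Left unconverted, np.nan makes json.dumps emit the non-standard NaN token,
# which strict JSON parsers reject.
assert json.dumps([1.0, float("nan")]) == "[1.0, NaN]"
```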
Bug 2: Top-level missing values not handled
def _serialize_json_value(x):
    if isinstance(x, (list, dict)):
        # pd.NA handling is only here, inside _reformat_special_values
        ...
    if x is ...:
        return "..."
    return x  # <-- Top-level pd.NA, np.nan, None just pass through
Top-level pd.NA (or np.nan) isn't converted to None. It only works now because .replace({pd.NA: None}) is called later.
BryanFauble
left a comment
This is great progress on a real user-reported issue, nice work on the recursive handling of Ellipsis and pd.NA in nested structures! There's one item that needs to be fixed before we can merge.
Note: This comment was drafted with AI assistance and reviewed by me for accuracy.
…uote-apostrophe-in-store-rows
…args
Replace the mutable dict default with None sentinel and merge user
overrides on top of {"escapechar": "\\"} so callers who pass their own
to_csv_kwargs still get the escapechar fix. Update the docstring to
describe the merge behavior.
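The merge described above can be sketched like this (a standalone illustration, not the PR's exact code; `DEFAULT_TO_CSV_KWARGS` and `merge_to_csv_kwargs` are made-up names):

```python
DEFAULT_TO_CSV_KWARGS = {"escapechar": "\\"}


def merge_to_csv_kwargs(user_kwargs=None):
    """Merge caller overrides on top of the escapechar default.

    A None sentinel avoids the mutable-default-argument pitfall, and
    caller-supplied keys win over the defaults.
    """
    return {**DEFAULT_TO_CSV_KWARGS, **(user_kwargs or {})}
```

Callers who pass their own to_csv_kwargs still get the escapechar fix unless they explicitly override that key.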
Pull request overview
This PR addresses JSON/CSV serialization failures when uploading DataFrames via store_rows_async, specifically when cells contain JSON-like dict/list values with embedded quotes/apostrophes and when Ellipsis/pd.NA values appear in nested structures.
Changes:
- Adds a default `escapechar` (`"\\"`) to `to_csv_kwargs` in `store_rows_async`.
- Threads `to_csv_kwargs` through the DataFrame streaming/upload path so the same CSV settings are used downstream.
- Introduces `convert_dtypes_to_json_serializable` and adds unit + integration coverage around quotes/apostrophes/ellipsis handling.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| synapseclient/models/mixins/table_components.py | Adds default escapechar, threads to_csv_kwargs, and adds convert_dtypes_to_json_serializable used before chunked DF upload. |
| synapseclient/core/upload/multipart_upload_async.py | Updates docstring to describe to_csv_kwargs usage for DataFrame multipart uploads. |
| tests/unit/synapseclient/mixins/unit_test_table_components.py | Adds unit tests for convert_dtypes_to_json_serializable. |
| tests/integration/synapseclient/models/async/test_table_async.py | Adds an integration test covering quotes/apostrophes/ellipsis round-tripping through table storage/query. |
…ializable
- Use pd.isna() instead of `is pd.NA` to catch nested np.nan (not valid JSON)
- Handle top-level pd.NA/np.nan/None in _serialize_json_value
- Remove redundant dropna()+len() guard so all-NA columns still get converted to object dtype with None values
- Update convert_dtypes_to_json_serializable docstring to describe actual behavior (recursive Ellipsis/pd.NA/np.nan/None cleanup, dtype cast for all non-ROW_* columns, in-place mutation)
- Use DEFAULT_ESCAPSE_CHAR constant instead of hardcoded "\\" in store_rows_async to_csv_kwargs default
- Rename test_no_conversion_when_no_na_in_column -> test_int_and_float_columns_converted_to_object and fix docstring to match the assertion (columns are always cast to object)
- Rename test_none_in_list_serialized_to_empty_list -> test_none_in_list_column_remains_none and fix docstring to match the actual None (not []) behavior
- In test_mixed_column_types_no_conversion_needed, snapshot df with copy(deep=True) before the in-place call and assert against the snapshot so the test can detect unintended mutations
BryanFauble
left a comment
The last round of changes with my final patches looked good to me.
@danlu1 is going to give this a final review as well before we merge in.
…e-rows merge upstream changes
Problem:
A JSON serialization issue occurs when a DataFrame passed to store_row_async contains a list or dictionary with strings that include both double quotes and apostrophes.
Solution:
"escapechar": "\\"toto_csv_kwargsinstore_rows_asyncto_csv_kwargsto_stream_and_update_from_dfso it can take the passedto_csv_kwargsvalues for downstream data processing.Testing:
Unit test and integration test have been added.