Feature/chamap#4

Open
Watthana65022128 wants to merge 3 commits into main from feature/chamap

Conversation

@Watthana65022128
Collaborator

  • REPLACE SQL Query
  • Connection String
  • charmap error


Copilot AI left a comment


Pull request overview

This PR addresses charset encoding issues (specifically "charmap error") in the migration tool by adding support for different character encodings and cleaning problematic characters at the source for SQL Server databases.

Changes:

  • Modified SQL query generation to apply REPLACE operations for non-breaking spaces (CHAR(160)) and null bytes (CHAR(0)) when reading from SQL Server
  • Added dynamic charset selection UI based on source database type (MySQL: tis620, SQL Server: cp874, PostgreSQL: utf8)
  • Updated SQL Server connection string to support legacy Thai databases with TDS version 7.0 for better compatibility

Reviewed changes

Copilot reviewed 2 out of 8 changed files in this pull request and generated 8 comments.

File | Description
views/migration_engine.py | Added REPLACE operations in SELECT queries for SQL Server and charset selection UI with database-specific options
services/db_connector.py | Added TDS version parameter for SQL Server legacy charset support
views/__pycache__/migration_engine.cpython-313.pyc | Compiled Python bytecode (should not be in version control)
services/__pycache__/transformers.cpython-313.pyc | Compiled Python bytecode (should not be in version control)
services/__pycache__/ml_mapper.cpython-313.pyc | Compiled Python bytecode (should not be in version control)
services/__pycache__/db_connector.cpython-313.pyc | Compiled Python bytecode (should not be in version control)
__pycache__/database.cpython-313.pyc | Compiled Python bytecode (should not be in version control)
__pycache__/config.cpython-313.pyc | Compiled Python bytecode (should not be in version control)
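The compiled .pyc files flagged above are normally kept out of the repository with an ignore rule; a minimal sketch of the relevant `.gitignore` entries:

```gitignore
# Ignore compiled Python bytecode anywhere in the tree
__pycache__/
*.pyc
```

Files already committed would also need `git rm --cached` before the rule takes effect.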


# REPLACE(col, CHAR(0), '') -> remove null bytes
col_expr = f'REPLACE(REPLACE({col_expr}, CHAR(160), \' \'), CHAR(0), \'\')'

selected_cols.append(f'{col_expr} AS "{source_col}"')

Copilot AI Feb 4, 2026


SQL injection vulnerability in the AS clause. The source column name is used directly in quotes without sanitization. If a malicious column name contains double quotes or other SQL special characters, this could be exploited. Consider using SQLAlchemy's identifier quoting mechanism or validating/sanitizing column names.
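One way to close this gap is a small validation helper in front of identifier interpolation. This is a minimal sketch — the helper name and the allowed-character policy are assumptions, not part of the PR; SQLAlchemy users could instead use the dialect's `identifier_preparer.quote()`:

```python
import re

# Hypothetical helper, not part of this PR: permit word characters
# (including Thai letters, which \w matches in Python 3) and spaces,
# and reject quotes, semicolons, and other SQL metacharacters.
_SAFE_IDENT = re.compile(r'^[\w ]+$')

def quote_identifier(name: str) -> str:
    """Wrap a column name in double quotes, or raise if it looks unsafe."""
    if not _SAFE_IDENT.fullmatch(name):
        raise ValueError(f"Unsafe column name: {name!r}")
    return f'"{name}"'
```

A name such as `x" OR "1"="1` would then be rejected before it ever reaches the generated SELECT.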

Copilot uses AI. Check for mistakes.
query_params = {"charset": mssql_charset}

# For legacy Thai databases, add TDS version for better compatibility
if charset in ['tis620', 'cp874', 'latin1']:

Copilot AI Feb 4, 2026


The condition checks if 'charset' is in the list, but 'charset' could be None when default charsets are used (line 59). This means the TDS version will never be set when using the default charset, even though it might be beneficial for older SQL Server versions. Consider checking the actual value of 'mssql_charset' instead of 'charset', or explicitly handling the None case.

Suggested change:
- if charset in ['tis620', 'cp874', 'latin1']:
+ if mssql_charset in ['tis620', 'cp874', 'latin1']:
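The None issue can be avoided entirely by deriving the TDS setting from the resolved charset when the connection URL is built. A hedged sketch — the function and parameter names here are illustrative, not the PR's actual code, and it assumes a pymssql-style driver that accepts `charset` and `tds_version` query parameters:

```python
from urllib.parse import quote_plus, urlencode

def build_mssql_url(user, password, host, database, mssql_charset=None):
    """Build a pymssql-style SQLAlchemy URL, adding tds_version=7.0
    only when the *resolved* charset is a legacy single-byte one."""
    query = {}
    if mssql_charset:
        query["charset"] = mssql_charset
        # Legacy Thai/Latin charsets tend to need the older TDS protocol.
        if mssql_charset in ("tis620", "cp874", "latin1"):
            query["tds_version"] = "7.0"
    base = f"mssql+pymssql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"
    return base + ("?" + urlencode(query) if query else "")
```

Because the check runs on the value actually used for the connection, a None default simply produces a URL without charset or TDS overrides.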

"""
Generate a SELECT query based on configuration.
Applies TRIM at source for MSSQL CHAR columns to prevent padding.
For SQL Server: Also cleans non-breaking spaces and control characters at source.

Copilot AI Feb 4, 2026


The comment mentions cleaning control characters at source, but the implementation only removes CHAR(0) (null bytes) and CHAR(160) (non-breaking space). Other control characters (CHAR(1)-CHAR(31) excluding newlines/tabs) are not handled. Consider if additional control character cleaning is needed or if the comment should be updated to accurately reflect what is cleaned.

Suggested change:
- For SQL Server: Also cleans non-breaking spaces and control characters at source.
+ For SQL Server: Also replaces non-breaking spaces (CHAR(160)) with regular spaces and removes null bytes (CHAR(0)) at source.
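If broader control-character cleaning is ever wanted, the nested REPLACE expression can be generated instead of hand-written. A sketch — this helper is hypothetical and not in the PR, but its defaults reproduce the PR's exact expression:

```python
def build_clean_expr(col_expr, replacements=None):
    """Wrap a T-SQL column expression in nested REPLACE calls.

    Defaults match the PR: CHAR(160) -> regular space, CHAR(0) -> removed.
    Pass extra (code, replacement) pairs to strip other control characters.
    """
    if replacements is None:
        replacements = [(160, " "), (0, "")]
    for code, repl in replacements:
        col_expr = f"REPLACE({col_expr}, CHAR({code}), '{repl}')"
    return col_expr
```

For example, `build_clean_expr('"name"')` yields `REPLACE(REPLACE("name", CHAR(160), ' '), CHAR(0), '')`, and extending `replacements` with codes 1-8 would also remove those control characters at source.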

Comment on lines +334 to +339
else:
charset_options = [
"utf8 (Default)",
"latin1 (Raw Bytes)"
]
help_text = "เลือก charset ตามฐานข้อมูลต้นทาง"
(Thai help text: "Select the charset matching the source database.")

Copilot AI Feb 4, 2026


When no source profile is selected (src_db_type is None), the charset selection defaults to "utf8 (Default)" and "latin1 (Raw Bytes)" options. However, the charset_map on line 352 maps "utf8 (Default)" to None, which might not be appropriate for all database types. When a user later selects a source profile (e.g., MySQL), they won't see MySQL-specific options unless they change the selection again. Consider disabling charset selection until a source profile is selected or updating the options dynamically.
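The reviewer's point can be addressed by centralizing the per-database options and treating "no profile selected" explicitly. A sketch with assumed names — the option labels follow the PR description (MySQL: tis620, SQL Server: cp874, PostgreSQL: utf8), while the dict and function names are illustrative:

```python
# Assumed structure, not the PR's actual code: map each source database
# type to its charset options, with the Thai legacy charset listed first.
DB_CHARSET_OPTIONS = {
    "MySQL": ["tis620 (Thai)", "utf8 (Default)", "latin1 (Raw Bytes)"],
    "Microsoft SQL Server": ["cp874 (Thai)", "utf8 (Default)", "latin1 (Raw Bytes)"],
    "PostgreSQL": ["utf8 (Default)", "latin1 (Raw Bytes)"],
}

FALLBACK_OPTIONS = ["utf8 (Default)", "latin1 (Raw Bytes)"]

def charset_options_for(src_db_type):
    """Return db-specific charset options, or None when no source
    profile is selected so the UI can disable the selectbox."""
    if src_db_type is None:
        return None
    return DB_CHARSET_OPTIONS.get(src_db_type, FALLBACK_OPTIONS)
```

Returning None (rather than a generic list) lets the caller render a disabled control until a source profile exists, so the options always match the chosen database type.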

Comment on lines +73 to +81

# Apply TRIM if specified in transformers
if 'TRIM' in mapping.get('transformers', []):
col_expr = f'TRIM({col_expr})'

# Clean non-breaking spaces and problematic characters for VARCHAR/NVARCHAR/TEXT columns
# REPLACE(col, CHAR(160), ' ') -> replace nbsp with regular space
# REPLACE(col, CHAR(0), '') -> remove null bytes
col_expr = f'REPLACE(REPLACE({col_expr}, CHAR(160), \' \'), CHAR(0), \'\')'

Copilot AI Feb 4, 2026


The REPLACE operations are unconditionally applied to all columns for SQL Server, regardless of the column's data type. This could cause issues with numeric, date, or binary columns where REPLACE operations on character data are not appropriate. Consider checking the column data type or restricting this transformation to text/varchar column types only.

Suggested change:
-            # Apply TRIM if specified in transformers
-            if 'TRIM' in mapping.get('transformers', []):
-                col_expr = f'TRIM({col_expr})'
-            # Clean non-breaking spaces and problematic characters for VARCHAR/NVARCHAR/TEXT columns
-            # REPLACE(col, CHAR(160), ' ') -> replace nbsp with regular space
-            # REPLACE(col, CHAR(0), '') -> remove null bytes
-            col_expr = f'REPLACE(REPLACE({col_expr}, CHAR(160), \' \'), CHAR(0), \'\')'
+            # Determine if the source column is a text-like type where string REPLACE is appropriate.
+            # This relies on an optional 'source_type' field in the mapping (e.g. 'VARCHAR(50)', 'NVARCHAR', 'TEXT').
+            source_type_raw = str(mapping.get('source_type', '') or '')
+            source_type_upper = source_type_raw.upper()
+            text_type_prefixes = ('CHAR', 'NCHAR', 'VARCHAR', 'NVARCHAR')
+            is_text_column = (
+                any(source_type_upper.startswith(prefix) for prefix in text_type_prefixes)
+                or source_type_upper in ('TEXT', 'NTEXT')
+            )
+            # Apply TRIM if specified in transformers
+            if 'TRIM' in mapping.get('transformers', []):
+                col_expr = f'TRIM({col_expr})'
+            # Clean non-breaking spaces and problematic characters only for text-like columns.
+            # REPLACE(col, CHAR(160), ' ') -> replace nbsp with regular space
+            # REPLACE(col, CHAR(0), '') -> remove null bytes
+            if is_text_column:
+                col_expr = f'REPLACE(REPLACE({col_expr}, CHAR(160), \' \'), CHAR(0), \'\')'


# Special handling for SQL Server text columns
if db_type == 'Microsoft SQL Server':
col_expr = f'"{source_col}"'

Copilot AI Feb 4, 2026


The column identifier uses double quotes directly from the source column name without sanitization. If the source column name contains malicious content (e.g., special characters like double quotes), this could lead to SQL injection. Consider using SQLAlchemy's identifier quoting mechanism or validating/sanitizing column names before using them in queries.

# Other databases: just apply TRIM if needed
selected_cols.append(f'TRIM("{source_col}") AS "{source_col}"')
else:
selected_cols.append(f'"{source_col}"')

Copilot AI Feb 4, 2026


Similar to line 72, this column identifier uses double quotes directly from the source column name without sanitization, which could lead to SQL injection if the source column name contains malicious content. Consider using SQLAlchemy's identifier quoting mechanism or validating/sanitizing column names.

selected_cols.append(f'{col_expr} AS "{source_col}"')
elif 'TRIM' in mapping.get('transformers', []):
# Other databases: just apply TRIM if needed
selected_cols.append(f'TRIM("{source_col}") AS "{source_col}"')

Copilot AI Feb 4, 2026


Similar SQL injection vulnerability as line 72. The column identifier uses double quotes directly from the source column name in both the TRIM function and the AS clause. Consider using SQLAlchemy's identifier quoting mechanism or validating/sanitizing column names.
