Skip to content

Fix inconsistent index scan behavior due to char signedness on different architectures.#29

Open
MasahikoSawada wants to merge 1 commit intopgbigm:masterfrom
MasahikoSawada:signedness
Open

Fix inconsistent index scan behavior due to char signedness on different architectures.#29
MasahikoSawada wants to merge 1 commit intopgbigm:masterfrom
MasahikoSawada:signedness

Conversation

@MasahikoSawada
Copy link
Contributor

GIN indexes utilizing pg_bigm's opclasses store sorted bigrams within index tuples. When comparing and sorting each bigram, pg_bigm treats each character as a 'char[3]' type in C. However, the char type in C can be interpreted as either signed char or unsigned char, depending on the platform, if the signedness is not explicitly specified. Consequently, as reported in issue #15, during replication between different CPU architectures, there was an issue where index scans on standby servers could not locate matching index tuples due to the differing treatment of character signedness. This problem can also happen in cases where the database cluster was initialized on one platform and later migrated to a different platform.

This change introduces comparison functions for bigm that explicitly handle signed char and unsigned char. The appropriate comparison function will be dynamically selected based on the character signedness stored in the control file. Therefore, upgraded clusters can utilize the indexes without rebuilding, provided the cluster upgrade occurs on platforms with the same character signedness as the original cluster initialization.

This commit is inspired by pg_trgm's change: dfd8e6c.

Issue #15

…ent architectures.

GIN indexes utilizing pg_bigm's opclasses store sorted bigrams within
index tuples. When comparing and sorting each bigram, pg_bigm treats
each character as a 'char[3]' type in C. However, the char type in C
can be interpreted as either signed char or unsigned char, depending
on the platform, if the signedness is not explicitly
specified. Consequently, as reported in issue pgbigm#15, during replication
between different CPU architectures, there was an issue where index
scans on standby servers could not locate matching index tuples due to
the differing treatment of character signedness. This problem can also
happen in cases where the database cluster was initialized on one
platform and later migrated to a different platform.

This change introduces comparison functions for bigm that explicitly
handle signed char and unsigned char. The appropriate comparison
function will be dynamically selected based on the character
signedness stored in the control file. Therefore, upgraded clusters
can utilize the indexes without rebuilding, provided the cluster
upgrade occurs on platforms with the same character signedness as the
original cluster initialization.

This commit is inspired by pg_trgm's change: dfd8e6c.

Issue pgbigm#15

Reviewed-by:
@MasahikoSawada
Copy link
Contributor Author

One concern about this approach is that we are no longer able to inline bigmstrcmp(), leading performance regressions, but I don't have better ideas how to fix this issue. If we see performance regressions we can provide a macro, say DISABLE_CMPBIGM_RUNTIME_SELECTION, to use the current behavior for users who are sure that databases are never migrated or replicated to another architecture platform.

@MasaoFujii
Copy link
Member

@MasahikoSawada

One concern about this approach is that we are no longer able to inline bigmstrcmp(), leading performance regressions, but I don't have better ideas how to fix this issue.

Just idea would be to keep the current pg_bigm opclass as-is and introduce a new opclass that uses CMPBIGM selected based on the platform. Users who need pg_bigm indexes to work across different architectures could use the new opclass, while others could continue using the existing one. However, this approach may be more complicated and potentially confusing for users.

If we see performance regressions we can provide a macro

Are you planning to measure the performance impact with and without this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants