It is necessary to decide on a number of parameters before running the Graph Matching Attack. The choice of parameter values can significantly impact attack performance, both in terms of attack duration and success rate.
We have tried to come up with reasonable defaults that proved useful in our experiments. However, your specific experiment might work better with other values.
The tables below describe the individual parameters, along with their default values and references to further information.
The run-Method in main.py expects four dictionaries as arguments that specify
the parameters for different stages of the attack.
main.py (Line 667 onwards) as well as the benchmarking scripts already contain the required
dictionaries, which are filled with default values. You may edit the values freely.
Argument Name: GLOBAL_CONFIG
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| Data | Dataset to run the attack on. | "./data/titanic_full.tsv" | |
| Overlap | The share of overlapping records between the attacker's and the victim's data. Must be >0 and <=1. | 1 | Chapter 5 |
| DropFrom | Which dataset should records be dropped from to achieve the desired overlap? One of "Eve" (Attacker), "Alice" (Victim) or "Both". | "Alice" | Chapter 5 |
| DevMode | If True, similarity graphs (edgelists) and serialized embedding models are stored in the ./dev/ directory. |
False | |
| BenchMode | If True, performance metrics and parameter details are stored as tab-separated values in ./data/benchmark.tsv |
False | |
| Verbose | If True, prints detailed status messages | True | |
| MatchingMetric | Similarity metric to be computed on aligned embeddings during bipartite graph matching. Must be available in scikit learn. | "cosine" | Chapter 4.3 |
| Matching | Matching algorithm for bipartite graph matching. Must be "MinWeight", "Stable", "Symmetric" or "NearestNeighbor". | "MinWeight" | Chapter 4.3 |
| Workers | Number of cores used in multiprocessing. If >= 1 defines the number of cores. If -1, all available cores are used. If < -1, all available cores except (value-1) ones are used. | -1 | |
| StoreAliceEncs | Stores a pickled dictionary containing UIDs as keys and encodings as values in ./data/encoded/ for Alice's (victim) dataset. |
False | |
| StoreEveEncs | Stores a pickled dictionary containing UIDs as keys and encodings as values in ./data/encoded/ for Eve's (attacker) dataset. |
False |
Argument Name: ENC_CONFIG
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| AliceAlgo | Algorithm used for encoding Alice's data. One of "BloomFilter", "TabMinHash", "TwoStepHash" or None (No Encoding). | "TwoStepHash" | Appendix A |
| AliceSecret | Secret (seed for hash function selection/salt) used when encoding Alice's data. Can be String or Integer. | "SuperSecretSalt1337" | Appendix A |
| AliceN | Size of N-grams used for encoding Alice's data. | 2 | |
| AliceMetric | Similarity metric to be computed during similarity graph generation on Alice's data | "dice" | Chapter 3.1 |
| EveAlgo | Algorithm used for encoding Eve's data. One of "BloomFilter", "TabMinHash", "TwoStepHash" or None (No Encoding). | None | Appendix A |
| EveSecret | Secret (seed for hash function selection/salt) used when encoding Eve's data. Can be String or Integer. | "ATotallyDifferentString42" | Appendix A |
| EveN | Size of N-grams used for encoding Eve's data. | 2 | |
| EveMetric | Similarity metric to be computed during similarity graph generation on Eve's data | "dice" | Chapter 3.1 |
Additional Parameters for Bloom Filter Encoding
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| AliceBFLength | Length of the Bloom Filters created for Alice's data. Must be a power of 2. | 1024 | Appendix A |
| AliceBits | Number of hash functions to populate the Bloom Filter, i.e. bits per N-Gram | 10 | Appendix A |
| AliceDiffuse | If True, adds diffusion layer to Bloom Filter encoding | False | Armknecht et al. |
| AliceT | Diffusion parameter t, i.e. number of bit positions in Alice's encodings to be XORed when creating ELDs. | 10 | Armknecht et al. |
| AliceEldLength | Length of the ELD, i.e. BF with applied diffusion, for Alice's encodings. | 1024 | Armknecht et al. |
| EveBFLength | Length of the Bloom Filters created for Eve's data. Must be a power of 2. | 1024 | Appendix A |
| EveBits | Number of hash functions to populate the Bloom Filter, i.e. bits per N-Gram | 10 | Appendix A |
| EveDiffuse | If True, adds diffusion layer to Bloom Filter encoding | False | Armknecht et al. |
| EveT | Diffusion parameter t, i.e. number of bit positions in Eve's encodings to be XORed when creating ELDs. | 10 | Armknecht et al. |
| EveEldLength | Length of the ELD, i.e. BF with applied diffusion, for Eve's encodings. | 1024 | Armknecht et al. |
Additional Parameters Tabulation MinHash Encoding
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| AliceNHash | Number of (tabulation-based) hash functions to use during MinHashing of Alice's data. | 1024 | Smith |
| AliceNHashBits | Number of bits to be generated per hash function during MinHashing of Alice's data. Must be 8, 16, 32 or 64. | 64 | Smith |
| AliceNSubKeys | Number of sub-keys to be generated from the initial 64-bit hash during MinHashing of Alice's data. Must be a divisor of 64. | 8 | Smith |
| Alice1BitHash | If True, applies LSB hashing, i.e. returns only the least significant bit of the MinHash results. | True | Smith |
| EveNHash | Number of (tabulation-based) hash functions to use during MinHashing of Eve's data. | 1024 | Smith |
| EveNHashBits | Number of bits to be generated per hash function during MinHashing of Eve's data. Must be 8, 16, 32 or 64. | 64 | Smith |
| EveNSubKeys | Number of sub-keys to be generated from the initial 64-bit hash during MinHashing of Eve's data. Must be a divisor of 64. | 8 | Smith |
| Eve1BitHash | If True, applies LSB hashing, i.e. returns only the least significant bit of the MinHash results. | True | Smith |
Additional Parameters for Two-Step-Hash Encoding
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| AliceNHashFunc | Number of hash function, i.e. number of bits per N-gram, to use when populating the intermediate BFs on Alice's data | 10 | Ranbaduge et al. |
| AliceNHashCol | Number of columns (length) of intermediate BFs computed on Alice's data | 1000 | Ranbaduge et al. |
| AliceRandMode | Algorithm to be used for column-wise hashing of Alice's intermediate encodings. Either "PNG" for a pseudo-random number generator or "SHA" for SHA-256. | "PNG" | Ranbaduge et al. |
| EveNHashFunc | Number of hash function, i.e. number of bits per N-gram, to use when populating the intermediate BFs on Eve's data | 10 | Ranbaduge et al. |
| EveNHashCol | Number of columns (length) of intermediate BFs computed on Eve's data | 1000 | Ranbaduge et al. |
| EveRandMode | Algorithm to be used for column-wise hashing of Eve's intermediate encodings. Either "PNG" for a pseudo-random number generator or "SHA" for SHA-256. | "PNG" | Ranbaduge et al. |
Argument Name: EMB_CONFIG
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| AliceAlgo | Algorithm to use for embedding Alice's data. Must be "Node2Vec" or "NetMF". | "Node2Vec" | |
| AliceQuantile | Drop edges with AliceQuantile lowest edgeweights in Alice's similarity graph. Must be >=0 (keep all) and < 1. | 0.9 | Chapter 5 |
| AliceDiscretize | If True, sets all edgeweights in Alice's similarity graph to 1. | False | |
| AliceDim | Dimensionality of Alice's embeddings. | 128 | Grover and Leskovec (Node2Vec) Qiu et al. (NetMF) |
| AliceContext | Context size to use when calculating Alice's embeddings | 10 | Grover and Leskovec (Node2Vec) Qiu et al. (NetMF) |
| AliceNegative | Number of negative samples during training of Alice's embeddings (NetMF only). | 1 | Grover and Leskovec (Node2Vec) Qiu et al. (NetMF) |
| Alice Normalize | If True, normalize Alice's embeddings (NetMF only). | False | Qiu et al. (NetMF) |
| EveAlgo | Algorithm to use for embedding Eve's data. Must be "Node2Vec" or "NetMF". | "Node2Vec" | |
| EveQuantile | Drop edges with EveQuantile lowest edgeweights in Eve's similarity graph. Must be >=0 (keep all) and < 1. | 0.9 | Chapter 5 |
| EveDiscretize | If True, sets all edgeweights in Eve's similarity graph to 1. | False | |
| EveDim | Dimensionality of Eve's embeddings. | 128 | Grover and Leskovec (Node2Vec) Qiu et al. (NetMF) |
| EveContext | Context size to use when calculating Eve's embeddings | 10 | Grover and Leskovec (Node2Vec) Qiu et al. (NetMF) |
| EveNegative | Number of negative samples during training of Eve's embeddings (NetMF only). | 1 | Grover and Leskovec (Node2Vec) Qiu et al. (NetMF) |
| Eve Normalize | If True, normalize Eve's embeddings (NetMF only). | False | Qiu et al. |
Additional Parameters for Node2Vec Embedding
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| AliceWalkLen | Length of the random walks performed on Alice's similarity graph | 100 | Grover and Leskovec |
| AliceNWalks | Number of random walks performed per node in Alice's similarity graph | 20 | Grover and Leskovec |
| AliceP | "Return Parameter" for governing random walks performed on Alice's similarity graph | 250 | Grover and Leskovec |
| AliceQ | "In-Out-Parameter" for governing random walks performed on Alice's similarity graph | 300 | Grover and Leskovec |
| AliceEpochs | Number of epochs Alice's embeddings are trained for | 5 | Grover and Leskovec |
| AliceSeed | Seed to initialize the (pseudo) random number generators used for calculating Alice's embeddings | 42 | |
| EveWalkLen | Length of the random walks performed on Eve's similarity graph | 100 | Grover and Leskovec |
| EveNWalks | Number of random walks performed per node in Eve's similarity graph | 20 | Grover and Leskovec |
| EveP | "Return Parameter" for governing random walks performed on Eve's similarity graph | 250 | Grover and Leskovec |
| EveQ | "In-Out-Parameter" for governing random walks performed on Eve's similarity graph | 300 | Grover and Leskovec |
| EveEpochs | Number of epochs Eve's embeddings are trained for | 5 | Grover and Leskovec |
| EveSeed | Seed to initialize the (pseudo) random number generators used for calculating Eve's embeddings | 42 |
Argument Name: ALIGN_CONFIG
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| RegWS | Regularization Parameter for Sinkhorn solver | max(0.1, Overlap/2) | Grave et al. |
| RegInit | Regularization Parameter for convex initialization | 1 | Grave et al. |
| Batchsize | Batchsize for Wasserstein Procrustes. If <=1 interpreted as share of overall data, if >1 interpreted as number of examples | 1 | Grave et al. |
| LR | Learning rate for optimization | 200 | Grave et al. |
| LRDecay | Learning rate decay. LR is multiplied by this factor after every epoch. | 1 | |
| NIterInit | Number of iterations during convex initialization | 5 | Grave et al. |
| NIterWS | Number of iterations per epoch of optimization | 100 | Grave et al. |
| NEpochWS | Number of optimization epochs | 100 | Grave et al. |
| Sqrt | If True, compute alignment on the square root of embeddings | True | Grave et al. |
| EarlyStopping | Terminate optimization if loss hasn't improved for this many epochs | 10 | Chapter 5 |
| Selection | Algorithm to select the records used for alignment. If "GroundTruth", then matrices are constructed using ground truth, i.e. embeddings representing the same records are stored in the same rows. If "Random", random subsampling is performed. If None, a random permutation of all available records are used. | None | |
| MaxLoad | Specifies the number of elements to use for alignment if Selection is not None. Must be smaller than the number of records in the smaller dataset. | None | |
| Wasserstein | If True, uses unsupervised Wasserstein Procrustes for alignment. If False, uses supervised closed-form Procrustes. | True |
Argument Name: BLOCKING_CONFIG
Blocking is only performed in the re-implementation of Vidanage et al.'s attack.
Thus, the BLOCKING_CONFIG parameter is not present otherwise
| Parameter Name | Description | Default | Reference |
|---|---|---|---|
| Disable | If true, then the blocking step is skipped | False | |
| PlainSampleSize | The length of the min-hash bands. | 4 | Broder |
| PlainNumSamples | The number of LSH bands. | 50 | Broder |
| AliceRandomSeed | Random Seed used for blocking Alice's data | 42 | |
| EveRandomSeed | Random Seed used for blocking Eve's data | 17 |