Skip to content

Latest commit

 

History

History
162 lines (139 loc) · 27.5 KB

File metadata and controls

162 lines (139 loc) · 27.5 KB

Parameters for Graph Matching Attack


It is necessary to decide on a number of parameters before running the Graph Matching Attack. The choice of parameter values can significantly impact attack performance, both in terms of attack duration and success rate.

We have tried to come up with reasonable defaults that proved useful in our experiments. However, your specific experiment might work better with other values.

The tables below describe the individual parameters, along with their default values and references to further information.

The run-Method in main.py expects four dictionaries as arguments that specify the parameters for different stages of the attack. main.py (Line 667 onwards) as well as the benchmarking scripts already contain the required dictionaries, which are filled with default values. You may edit the values freely.


Global Configuration

Argument Name: GLOBAL_CONFIG

Parameter Name Description Default Reference
Data Dataset to run the attack on. "./data/titanic_full.tsv"
Overlap The share of overlapping records between the attacker's and the victim's data. Must be >0 and <=1. 1 Chapter 5
DropFrom Which dataset should records be dropped from to achieve the desired overlap? One of "Eve" (Attacker), "Alice" (Victim) or "Both". "Alice" Chapter 5
DevMode If True, similarity graphs (edgelists) and serialized embedding models are stored in the ./dev/ directory. False
BenchMode If True, performance metrics and parameter details are stored as tab-separated values in ./data/benchmark.tsv False
Verbose If True, prints detailed status messages True
MatchingMetric Similarity metric to be computed on aligned embeddings during bipartite graph matching. Must be available in scikit learn. "cosine" Chapter 4.3
Matching Matching algorithm for bipartite graph matching. Must be "MinWeight", "Stable", "Symmetric" or "NearestNeighbor". "MinWeight" Chapter 4.3
Workers Number of cores used in multiprocessing. If >= 1 defines the number of cores. If -1, all available cores are used. If < -1, all available cores except (value-1) ones are used. -1
StoreAliceEncs Stores a pickled dictionary containing UIDs as keys and encodings as values in ./data/encoded/ for Alice's (victim) dataset. False
StoreEveEncs Stores a pickled dictionary containing UIDs as keys and encodings as values in ./data/encoded/ for Eve's (attacker) dataset. False

Encoding Configuration

Argument Name: ENC_CONFIG

Parameter Name Description Default Reference
AliceAlgo Algorithm used for encoding Alice's data. One of "BloomFilter", "TabMinHash", "TwoStepHash" or None (No Encoding). "TwoStepHash" Appendix A
AliceSecret Secret (seed for hash function selection/salt) used when encoding Alice's data. Can be String or Integer. "SuperSecretSalt1337" Appendix A
AliceN Size of N-grams used for encoding Alice's data. 2
AliceMetric Similarity metric to be computed during similarity graph generation on Alice's data "dice" Chapter 3.1
EveAlgo Algorithm used for encoding Eve's data. One of "BloomFilter", "TabMinHash", "TwoStepHash" or None (No Encoding). None Appendix A
EveSecret Secret (seed for hash function selection/salt) used when encoding Eve's data. Can be String or Integer. "ATotallyDifferentString42" Appendix A
EveN Size of N-grams used for encoding Eve's data. 2
EveMetric Similarity metric to be computed during similarity graph generation on Eve's data "dice" Chapter 3.1

Additional Parameters for Bloom Filter Encoding

Parameter Name Description Default Reference
AliceBFLength Length of the Bloom Filters created for Alice's data. Must be a power of 2. 1024 Appendix A
AliceBits Number of hash functions to populate the Bloom Filter, i.e. bits per N-Gram 10 Appendix A
AliceDiffuse If True, adds diffusion layer to Bloom Filter encoding False Armknecht et al.
AliceT Diffusion parameter t, i.e. number of bit positions in Alice's encodings to be XORed when creating ELDs. 10 Armknecht et al.
AliceEldLength Length of the ELD, i.e. BF with applied diffusion, for Alice's encodings. 1024 Armknecht et al.
EveBFLength Length of the Bloom Filters created for Eve's data. Must be a power of 2. 1024 Appendix A
EveBits Number of hash functions to populate the Bloom Filter, i.e. bits per N-Gram 10 Appendix A
EveDiffuse If True, adds diffusion layer to Bloom Filter encoding False Armknecht et al.
EveT Diffusion parameter t, i.e. number of bit positions in Eve's encodings to be XORed when creating ELDs. 10 Armknecht et al.
EveEldLength Length of the ELD, i.e. BF with applied diffusion, for Eve's encodings. 1024 Armknecht et al.

Additional Parameters Tabulation MinHash Encoding

Parameter Name Description Default Reference
AliceNHash Number of (tabulation-based) hash functions to use during MinHashing of Alice's data. 1024 Smith
AliceNHashBits Number of bits to be generated per hash function during MinHashing of Alice's data. Must be 8, 16, 32 or 64. 64 Smith
AliceNSubKeys Number of sub-keys to be generated from the initial 64-bit hash during MinHashing of Alice's data. Must be a divisor of 64. 8 Smith
Alice1BitHash If True, applies LSB hashing, i.e. returns only the least significant bit of the MinHash results. True Smith
EveNHash Number of (tabulation-based) hash functions to use during MinHashing of Eve's data. 1024 Smith
EveNHashBits Number of bits to be generated per hash function during MinHashing of Eve's data. Must be 8, 16, 32 or 64. 64 Smith
EveNSubKeys Number of sub-keys to be generated from the initial 64-bit hash during MinHashing of Eve's data. Must be a divisor of 64. 8 Smith
Eve1BitHash If True, applies LSB hashing, i.e. returns only the least significant bit of the MinHash results. True Smith

Additional Parameters for Two-Step-Hash Encoding

Parameter Name Description Default Reference
AliceNHashFunc Number of hash function, i.e. number of bits per N-gram, to use when populating the intermediate BFs on Alice's data 10 Ranbaduge et al.
AliceNHashCol Number of columns (length) of intermediate BFs computed on Alice's data 1000 Ranbaduge et al.
AliceRandMode Algorithm to be used for column-wise hashing of Alice's intermediate encodings. Either "PNG" for a pseudo-random number generator or "SHA" for SHA-256. "PNG" Ranbaduge et al.
EveNHashFunc Number of hash function, i.e. number of bits per N-gram, to use when populating the intermediate BFs on Eve's data 10 Ranbaduge et al.
EveNHashCol Number of columns (length) of intermediate BFs computed on Eve's data 1000 Ranbaduge et al.
EveRandMode Algorithm to be used for column-wise hashing of Eve's intermediate encodings. Either "PNG" for a pseudo-random number generator or "SHA" for SHA-256. "PNG" Ranbaduge et al.

Embedding Configuration

Argument Name: EMB_CONFIG

Parameter Name Description Default Reference
AliceAlgo Algorithm to use for embedding Alice's data. Must be "Node2Vec" or "NetMF". "Node2Vec"
AliceQuantile Drop edges with AliceQuantile lowest edgeweights in Alice's similarity graph. Must be >=0 (keep all) and < 1. 0.9 Chapter 5
AliceDiscretize If True, sets all edgeweights in Alice's similarity graph to 1. False
AliceDim Dimensionality of Alice's embeddings. 128 Grover and Leskovec (Node2Vec)
Qiu et al. (NetMF)
AliceContext Context size to use when calculating Alice's embeddings 10 Grover and Leskovec (Node2Vec)
Qiu et al. (NetMF)
AliceNegative Number of negative samples during training of Alice's embeddings (NetMF only). 1 Grover and Leskovec (Node2Vec)
Qiu et al. (NetMF)
Alice Normalize If True, normalize Alice's embeddings (NetMF only). False Qiu et al. (NetMF)
EveAlgo Algorithm to use for embedding Eve's data. Must be "Node2Vec" or "NetMF". "Node2Vec"
EveQuantile Drop edges with EveQuantile lowest edgeweights in Eve's similarity graph. Must be >=0 (keep all) and < 1. 0.9 Chapter 5
EveDiscretize If True, sets all edgeweights in Eve's similarity graph to 1. False
EveDim Dimensionality of Eve's embeddings. 128 Grover and Leskovec (Node2Vec)
Qiu et al. (NetMF)
EveContext Context size to use when calculating Eve's embeddings 10 Grover and Leskovec (Node2Vec)
Qiu et al. (NetMF)
EveNegative Number of negative samples during training of Eve's embeddings (NetMF only). 1 Grover and Leskovec (Node2Vec)
Qiu et al. (NetMF)
Eve Normalize If True, normalize Eve's embeddings (NetMF only). False Qiu et al.

Additional Parameters for Node2Vec Embedding

Parameter Name Description Default Reference
AliceWalkLen Length of the random walks performed on Alice's similarity graph 100 Grover and Leskovec
AliceNWalks Number of random walks performed per node in Alice's similarity graph 20 Grover and Leskovec
AliceP "Return Parameter" for governing random walks performed on Alice's similarity graph 250 Grover and Leskovec
AliceQ "In-Out-Parameter" for governing random walks performed on Alice's similarity graph 300 Grover and Leskovec
AliceEpochs Number of epochs Alice's embeddings are trained for 5 Grover and Leskovec
AliceSeed Seed to initialize the (pseudo) random number generators used for calculating Alice's embeddings 42
EveWalkLen Length of the random walks performed on Eve's similarity graph 100 Grover and Leskovec
EveNWalks Number of random walks performed per node in Eve's similarity graph 20 Grover and Leskovec
EveP "Return Parameter" for governing random walks performed on Eve's similarity graph 250 Grover and Leskovec
EveQ "In-Out-Parameter" for governing random walks performed on Eve's similarity graph 300 Grover and Leskovec
EveEpochs Number of epochs Eve's embeddings are trained for 5 Grover and Leskovec
EveSeed Seed to initialize the (pseudo) random number generators used for calculating Eve's embeddings 42

Alignment Configuration

Argument Name: ALIGN_CONFIG

Parameter Name Description Default Reference
RegWS Regularization Parameter for Sinkhorn solver max(0.1, Overlap/2) Grave et al.
RegInit Regularization Parameter for convex initialization 1 Grave et al.
Batchsize Batchsize for Wasserstein Procrustes. If <=1 interpreted as share of overall data, if >1 interpreted as number of examples 1 Grave et al.
LR Learning rate for optimization 200 Grave et al.
LRDecay Learning rate decay. LR is multiplied by this factor after every epoch. 1
NIterInit Number of iterations during convex initialization 5 Grave et al.
NIterWS Number of iterations per epoch of optimization 100 Grave et al.
NEpochWS Number of optimization epochs 100 Grave et al.
Sqrt If True, compute alignment on the square root of embeddings True Grave et al.
EarlyStopping Terminate optimization if loss hasn't improved for this many epochs 10 Chapter 5
Selection Algorithm to select the records used for alignment. If "GroundTruth", then matrices are constructed using ground truth, i.e. embeddings representing the same records are stored in the same rows. If "Random", random subsampling is performed. If None, a random permutation of all available records are used. None
MaxLoad Specifies the number of elements to use for alignment if Selection is not None. Must be smaller than the number of records in the smaller dataset. None
Wasserstein If True, uses unsupervised Wasserstein Procrustes for alignment. If False, uses supervised closed-form Procrustes. True

Blocking Configuration

Argument Name: BLOCKING_CONFIG

Blocking is only performed in the re-implementation of Vidanage et al.'s attack. Thus, the BLOCKING_CONFIG parameter is not present otherwise

Parameter Name Description Default Reference
Disable If true, then the blocking step is skipped False
PlainSampleSize The length of the min-hash bands. 4 Broder
PlainNumSamples The number of LSH bands. 50 Broder
AliceRandomSeed Random Seed used for blocking Alice's data 42
EveRandomSeed Random Seed used for blocking Eve's data 17