You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+4-3Lines changed: 4 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -105,11 +105,12 @@ As mentioned, the current implementation is a proof-of-concept prototype.To eval
105
105
#### Dataset Size
106
106
The source code of those test programs can be found in the root path of the project, namely `SDSSECQ.cpp` and `SDSSECQS.cpp`. The code in this repository inserts 1000 files with two keywords "Alice" and "Bob", deletes 100 files (10% deletion), and then executes the conjunctive query ("Alice" AND "Bob"). To enlarge the size of dataset, one can modify the above two files by increasing the numbers of insertions/deletions or adding more keywords.
107
107
108
-
Besides, as the number of keyword-id pairs increases, we should use a larger Bloom filter to keep the XSet for conjunctive queries. Hence, the `XSET_SIZE` and `XSET_HASH` in `Util
109
-
/CommonUtil.h` should be updated accordingly. Note that the current parameters `XSET_SIZE=2875518`and `XSET_HASH=20` can support conjunctive queries against a dataset with 100k keyword-id pairs with less than 10^-7 false positive rate. We would refer our readers to [here](https://hur.st/bloomfilter/) to compute the new Bloom filter parameters as required.
108
+
Besides, as the number of keyword-id pairs increases, we should use a larger Bloom filter to keep the XSet for conjunctive queries. Hence, the `MAX_DB_SIZE`, `XSET_FP`, and `XSET_HASH` in `Util
109
+
/CommonUtil.h` should be updated accordingly. Note that the current parameters `MAX_DB_SIZE=100000`, `XSET_FP=0.0000001`and `XSET_HASH=20` can support conjunctive queries against a dataset with 100k keyword-id pairs with less than 10^-7 false positive rate.
110
110
111
111
#### Deletion
112
-
Since the deletion is also based on Bloom filter, there are another two Bloom filter parameters, i.e., `GGM_SIZE` and `HASH_SIZE` to be set with the increasing number of deletion operations. The current parameters are `GGM_SIZE=579521` and `HASH_SIZE=5`, which are sufficient for 100 deletions (with only 10^-21 false positive rate) in the test code. Please also update these two parameters when the number of deletion increases by referring to the above Bloom filter calculator.
112
+
Since the deletion is also based on Bloom filter, there are another two Bloom filter parameters, i.e., `GGM_FP` and `HASH_SIZE` to be set to help the code automatically derive the Bloom filter size for the GGM tree.
113
+
The current parameters are `GGM_FP=0.0001` and `HASH_SIZE=5`, which is designed to achieve 10^-4 false positive rate with the 5 hash functions in the test code. Please also update these two parameters if you are looking for a smaller false positive rate or GGM tree size (more hash functions lead to a smaller tree).
0 commit comments