### AVX-512 instructions

_What is the correct setting of parameters for uniformly distributed data?_
There are two parameters, _KBLOCK_ and _stripes_. Throughout our experiments we always used _KBLOCK=4_ and _stripes=14_, which are the default settings for any kind of distribution. For uniformly distributed data _KBLOCK=16_ is faster, whereas varying _stripes_ has no effect.

_Why do you use AVX-512 instructions?_
When our Hilbert curve is used, the Intel and GNU compilers no longer auto-vectorize the code. Writing the code with explicit AVX-512 instructions reproduces the behaviour that an auto-vectorized implementation would have. Nevertheless, we believe that future compilers will profit from the locality assumptions of the Hilbert curve.

_What are the parameters KBLOCK and STRIPES?_

- KBLOCK: after every KBLOCK dimensions, check whether the $`\varepsilon`$-distance is already exceeded.
- STRIPES: the number of EGO stripes used. See Section _"2.2. Determining the bounds"_ in our paper.

_How to set KBLOCK and STRIPES?_

KBLOCK should be smaller than the dimensionality of the dataset. Within our distance calculation, we check after every KBLOCK dimensions whether we have exceeded $`\varepsilon^2`$ or not.
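
The check described above can be illustrated with a minimal Python sketch. It is not the AVX-512 implementation of this repository; the function name `within_epsilon` and its signature are hypothetical and only show where the comparison against $`\varepsilon^2`$ happens.

```python
def within_epsilon(p, q, epsilon, kblock=4):
    """Return True if the Euclidean distance between p and q is at most epsilon.

    Hypothetical helper for illustration only. The squared distance is
    accumulated in blocks of `kblock` dimensions; after each block the
    partial sum is compared with epsilon**2, so the loop stops early as
    soon as the threshold is exceeded.
    """
    eps_sq = epsilon * epsilon
    partial = 0.0
    for start in range(0, len(p), kblock):
        # Accumulate the squared differences of the next kblock dimensions.
        for a, b in zip(p[start:start + kblock], q[start:start + kblock]):
            diff = a - b
            partial += diff * diff
        # One comparison per block instead of one per dimension.
        if partial > eps_sq:
            return False
    return True
```

A larger `kblock` means fewer comparisons but later termination, which is consistent with the observation that _KBLOCK=16_ is faster on uniformly distributed data.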

The best-fitting numbers of active dimensions are $`j = 0,1,2,3,4,5`$, which correspond to $`1,2,5,14,41,122`$ stripes ($`=(3^j+1)/2`$). For more details see Section 3.1, "Determination of the Bounds", in the paper.
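
As a quick check of the formula $`(3^j+1)/2`$, the following sketch computes the number of stripes for a given number of active dimensions. The function name `stripes_for_active_dimensions` is hypothetical and not part of the code base.

```python
def stripes_for_active_dimensions(j):
    """Number of stripes for j active dimensions: (3**j + 1) / 2.

    3**j is always odd, so the integer division is exact.
    """
    if j < 0:
        raise ValueError("j must be non-negative")
    return (3 ** j + 1) // 2


for j in range(6):
    print(j, stripes_for_active_dimensions(j))
```

For $`j = 3`$ this yields the 14 stripes used in our experiments.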

In our experiments (see paper) we _always_ use the following setting:
- KBLOCK=4
- active dimensions=3, which corresponds to exactly 14 stripes

For uniformly distributed data we suggest the following parameter settings:
- KBLOCK=16
- STRIPES=1

_What does the output mean?_

Here is an example output:

N;D;JPPP;THREADS;EPSILON;STRIPES;KBLOCK;TIME;ALGTIME;SORTTIME;INDEXTIME;REORDERTIME;COUNTS;LOADPERCENT;WH
200000;64;0.000000;64;0.20000000000000;14;4;0.794607;0.579982;0.130889;0.514304;0.083736;0;0.061758;0.000000

- N ... number of objects
- D ... dimensionality (number of features)
- JPPP ... join partners per point, _nSelectivity_ (see Section 4.1.3)
- THREADS ... number of threads used
- EPSILON ... distance threshold $`\varepsilon`$ of the join
- STRIPES ... bounds (Section 3.1 in paper)
- KBLOCK ... check after every _KBLOCK_ dimensions whether we have exceeded the epsilon distance
- TIME ... time spent for the total algorithm
- ALGTIME ... time spent for join
- SORTTIME ... time spent for sorting
- INDEXTIME ... time spent for determining bounds (Section 3.1)
- REORDERTIME ... time spent for reordering the dimensions (proposed by Super-EGO)
- COUNTS ... cardinalities
- LOADPERCENT ... load in percent
- WH ... energy in watt-hours (currently turned off)
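
For post-processing, the output can be read as semicolon-separated values. The following is a minimal sketch, assuming the program prints exactly one header line followed by one result line as in the example above; `parse_result` is a hypothetical helper and not part of this repository.

```python
import csv
import io


def parse_result(text):
    """Parse the semicolon-separated output into a list of dicts, one per result row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [dict(row) for row in reader]


# The example output from above, copied verbatim.
example = (
    "N;D;JPPP;THREADS;EPSILON;STRIPES;KBLOCK;TIME;ALGTIME;SORTTIME;"
    "INDEXTIME;REORDERTIME;COUNTS;LOADPERCENT;WH\n"
    "200000;64;0.000000;64;0.20000000000000;14;4;0.794607;0.579982;0.130889;"
    "0.514304;0.083736;0;0.061758;0.000000\n"
)

row = parse_result(example)[0]
# All fields are read as strings; convert the ones that are needed.
print(int(row["STRIPES"]), int(row["KBLOCK"]), float(row["TIME"]))
```

All values are returned as strings, so numeric columns such as _TIME_ or _ALGTIME_ have to be converted explicitly before they are compared or plotted.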