# Great Galloping Cuckoos
## Algorithms Faster than $log(n)$
### [Codemash](http://codemash.org) v2.0.1.6 - 8 January 2016
## Matt Williams
[matt@matthewkwilliams.com](mailto:matt@matthewkwilliams.com)
[@matt_k_williams](http://twitter.com/matt_k_williams)
[http://matthewkwilliams.com](http://matthewkwilliams.com)
[http://20slides.com](http://20slides.com)
[http://github.com/aetherical](http://github.com/aetherical)
Note
: The lovely picture is by my daughter Althea
## Background aka Why am I Up Here?
* Raspberry Pi Addict
* Search for "real" work for Pi's
* In the process I've discovered lots of new and nifty algorithms
## Disclaimers
1. I am not a Mathmetician
2. I am not a Chemist (this will make sense later)
3. I am not an expert
4. I have become passionate about this topic
## Good Algorithms Matter
> The real reason behind choosing good algorithms to solve a problem
> is that the good ones either break the problem into smaller problems
> or are myopic; focusing only on the steps needed to reach an
> answer. All else is superfluous.
## Real World™ Considerations
Each algorithm has its "sweet spot", but the considerations tend to be among the following:
* Space
* Runtime
* Data Access
> Know thy data
> -- "Not Socrates"
> Like sand through the hourglass, so is the data of our lives
> -- Also "Not Socrates"
## The Big $O$
- $O()$ refers to the complexity of an algorithm.
- You can also think of it in terms of the number of cycles/comparisons/steps to complete its task.
## $O(n)$
### Linear Time
Size | Cycles
----:|---------:
1 | 1
2 | 2
10 | 10
100 | 100
1000 | 1000
## $O(n^2)$
## Quadratic Time
Size | Cycles
----:|---------:
1 | 1
2 | 4
10 | 100
100 | 10000
1000 | 1000000
## $O(nlog(n))$
Size | Cycles
----:|----------:
1 | 0
2 | 1
10 | 23
100 | 460
1000 | 6907
## $O(log(n))$
### Logarithmic Time
Size | Cycles
----:|----------:
1 | 0
2 | 1
10 | 3
100 | 7
1000 | 10
## $O(log(i))$
### Logarithmic Time
Size | Cycles
----:|----------:
1 | 1
2 | 1
10 | 1
100 | 2
1000 | 3
## $O(log(log(n)))$
### "Logalog"
Size | Cycles
----:|------------:
100000 | 2
1000000 | 2
10000000 | 2
100000000 | 2
1000000000 | 3
## $O(1)$
### Constant Time
Size | Cycles
----:|---------:
1 | 1
2 | 1
10 | 1
100 | 1
1000 | 1
## Searches
Looking for 812...
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Linear
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Binary
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Binary
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Binary
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Binary
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Galloping Search
- First published in 1976!!!!
- Sliding Window
- Good for cases where
- The target is at beginning or end
- The bounds of the search space are not closed (like log file)
- Approaches $O(log(i))$
## Galloping Search
First window is $2^1$ long.
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^ ^
| |
'-----'
## Galloping Search
Second window is $2^2$ long.
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^ ^
| |
'-----------------'
## Galloping Search
Third window is $2^3$ (or to the end of the array) wide. When the
proper window is found, then do a binary search on the window.
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^ ^
| |
'-----------------------------'
## Where it Beats Binary Search
- If 22 is the target, binary search takes 3 checks. Galloping search finds it on a window boundary.
* * Galloping
| o Binary
.-----.
| |
v v
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^ ^ ^
| | |
o o o
3 2 1
## Where it Beats Binary Search
- If looking for 13, then binary search takes 4 checks to find it isn't a match. Galloping finds it isn't in the array immediately.
* * Galloping
| o Binary
.--+--.
| |
v v
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^ ^ ^ ^
| | | |
o o o o
4 3 2 1
## Variants on Galloping Search
- Start at other end
- Trot -- step grows linearly (1,2,3,...) so that the indexes are
(1,3,6,10,15,...). Once the value is overshot, then hone in with
binary search. $O(\sqrt n)$
## Interpolation Search
## Interpolation Search
* Similar to how we look up a word in a dictionary.
* Instead of splitting in $1/2$ each time, make a guess where it should be in range of values
* Assumes an uniform distribution of points
* Approaches $O(log(log(n)))$
## Interpolation Search
### Predicted Location
$(seeking/max)/size$
## Interpolation Search
First check point: $(812 / 931) * 12 = 10$
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
^
|
## Search Summary
Searching for 812
Algorithm | # of Iterations
----------|---------------:
Linear | 10
Binary | 4
Galloping | 4
Interpolation | 1
> "What the hell is this monstrosity? And why the hell did he name it corned__beef?"
>
> "Jerry says the name is probably some kind of rotten pun. What does it do?"
>
> "Basically it takes the value of the characters of a demon's name, multiplies them by a number, adds another number and then divides the result by 65,353. Then it uses that result as a subscript in some kind of an array." She shook her head again. "Why 65,353? Jesus! You know, if this guy doesn't come back we may never understand some of this stuff." -- [The Wizardry Compiled by Rick Cook](http://www.baenebooks.com/showproduct.aspx?ProductID=1632&SEName=the-wizardry-compiled)
## Hash
* Used for sparsely populated universes
* Quick lookup of values
* Often implemented with a space the size of a (largeish) prime. 65353 is a large prime near the maximum size of a 16 bit unsigned Integer.
## Hash
.-----.
| |
+-----+
| |
+-----+
| |
+-----+
| |
+-----+
| |
'-----'
## Hash
Hashing function maps entity to limited space in hash
$(CRC(entity) modulus ARRAYLENGTH)$
.-----.
| |
+-----+
| 5 |
+-----+
| |
+-----+
| 9 |
+-----+
| |
'-----'
## Hash Collision
Try to insert 23; but it collides with 5.
.-----.
| |
+-----+
| 5 |
+-----+
| |
+-----+
| 9 |
+-----+
| |
'-----'
## Handling Collisions
* Rehash with larger hash table
* Buckets
## Hash: Bucket Hash
.-----.
| |
+-----+ .-----.
| 5 +-->| 23 |
+-----+ '-----'
| |
+-----+
| 9 |
+-----+
| |
'-----'
## Hash: Bucket Hash
.-----.
| A |
+-----+ .-----. .-----.
| 5 +-->| 23 +-->| 72 |
+-----+ '-----' '-----'
| |
+-----+ .-----. .-----. .-----.
| 9 +-->| 101 +-->| bee +-->| X |
+-----+ '-----' '-----' '-----'
| b |
'-----'
## Cuckoo Hash
A common cuckoo chick pushes one of its host’s eggs out of the nest. Detail of figure 1 from [Anderson et al. (2009)](http://dx.doi.org/10.1371/journal.pone.0007725)
## Cuckoo Hash
A common cuckoo chick pushes one of its host’s eggs out of the nest. Detail of figure 1 from [Anderson et al. (2009)](http://dx.doi.org/10.1371/journal.pone.0007725)
## Cuckoo Hash
Trying to insert 23 again, it collides with 5
.-----. .-----.
| | | |
+-----+ +-----+
| 5 | | |
+-----+ +-----+
| | | |
+-----+ +-----+
| 9 | | |
+-----+ +-----+
| | | |
'-----' '-----'
## Evict 5
Each hash has a separate hashing function
.-----. .-----.
| | | |
+-----+ +-----+
| 23 +----+ | |
+-----+ | +-----+
| | | | |
+-----+ | +-----+
| 9 | +---->+ 5 |
+-----+ +-----+
| | | |
'-----' '-----'
## Time Passes
.-----. .-----.
| | | M |
+-----+ +-----+
| 23 | | X |
+-----+ +-----+
| B | | |
+-----+ +-----+
| 9 | | 5 |
+-----+ +-----+
| | | |
'-----' '-----'
## Multiple Evictions
Adding 101, collides with B & B collides with X
.-----. .-----.
| | | M |
+-----+ +-----+
| 23 | | X |
+-----+ +-----+
| B | | |
+-----+ +-----+
| 9 | | 5 |
+-----+ +-----+
| | | |
'-----' '-----'
## Multiple Evictions
.-----. .-----.
| | | M |
+-----+ +-----+
| 23 | +------->+ B +--+
+-----+ | +-----+ |
| 101 +-+ | | |
+-----+ +-----+ |
| 9 | | 5 | |
+-----+ +-----+ |
| X +<-+ | | |
'-----' | '-----' |
+----------------+
## Advantages of Cuckoo Hash
* Approaches $O(1)$ for inserts
* $O(1)$ for lookups
## Limitations of Cuckoo Hash
* Needs a certain amount of empty space; if not an eviction cycle can occur
* May need to resize & rehash to keep evictions from looping
> Obi-Wan: "That Hash is our only hope."
>
> Yoda: "No.... there is another."
## Peacock Hash
* Handles collisions by adding extra smaller hashes
* Originally devised for network routers to store IP/Mac Addresses
* Realtime, constant time
## Peacock Hash
.-----.
| |
+-----+ .-----.
| 23 | | X |
+-----+ +-----+ .-----.
| B | | | | M |
+-----+ +-----+ '-----'
| 9 | | 5 |
+-----+ '-----'
| |
'-----'
## Unordered Set
* Can be implemented as a list or as a hash
* Inserts are cheap; retrievals may not be
.-.
.-+ |
.--+ 5 '--.
| X |
.+ A Z 3 +.
| Pi |
'+ cat 2.7 +'
| 99 |
'--+ 101 .--'
'-+ |
'-'
## Ordered Sets
* Elements stored sorted; inserts can be expensive
.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.-----.
| 1 | 22 | 47 | 135 | 239 | 333 | 497 | 764 | 799 | 812 | 876 | 931 |
'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'-----'
## Stored as Tree
.---.
| 333 |
'---'
/ \
/ \
/ \
/ \
.---. .---.
| 47 | | 764 |
'---' '---'
/ \ / \
/ \ / \
.---. .---. .---. .---.
| 22 | | 135 | | 497 | | 812 |
'---' '---' '---' '---'
## Set Intersection
## Set Intersection
* Surveyed implementations are $O(n)$ for intersection
* Ruby
* Java
* Go
## Galloping Intersection
## Comparisons of Set Intersections
Excerpted from [Faster Adaptive Set Intersections for Text Searching](http://www.cs.toronto.edu/~tl/papers/wea06.pdf)
Algorithm | # of comparisons
-----------|----------------:
Sequential | 119479075
Adaptive | 83326341
Small Adaptive | 68706234
Interpolation Sequential | 55275738
Interpolation Adaptive | 58558408
## Comparisons of Set Intersections
Excerpted from [Faster Adaptive Set Intersections for Text Searching](http://www.cs.toronto.edu/~tl/papers/wea06.pdf)
Algorithm | # of comparisons
-----------|----------------:
Sequential | 119479075
Interpolation Small Adaptive | 44525318
Extrapolation Small Adaptive | 50018852
Extrapolate Many Small Adaptive | 44087712
Extrapolate Ahead Small Adaptive | 43930174
## Case Study
* Search for Real Work™ for my Pi Cluster
* Pi 2B
* 4 Cores / 900 Mhz
* 1G Ram
* I/O on USB 2 Bus (480 MHz)
## Case Study
* Substructure Search
* PubChem from NIH
* 150GB compressed download
* Uncompressed, 25K substances is 120MB; whole ~3TB
* ~69 Million substances
## Smiles
* Text Description of Chemical Structures
* C1CCCCC1 is a benzene ring
* Typically Hydrogen is left out
* Can be more than one way to describe the same structure
"Benzene-aromatic-3D-balls" by Benjah-bmm27 - Own work. Licensed under Public Domain via Commons.
## Naive Solution
> SMILES is text.
> What do we use to search text?
Image by [EpicBusinessman](https://www.reddit.com/user/EpicBusinessman); from [I drew this GREP thing and kinda forgot about it.](https://www.reddit.com/r/gamegrumps/comments/1yffou/i_drew_this_grep_thing_and_kinda_forgot_about_it/)
## Naive Solution
* 5 Pi 2B, searching a subset of data
* ID, Smiles, Name
* Docker Swarm to distribute workload
## Naive Solution
Structure | Results | Time
----------|--------:|-----:
C1CCCCC1=O | ~2500 | 5s
C1CCCCC1O | ~17500 | 15s
## Fingerprints
* Pubchem has a defined bitmap to describe a molecule
* 880 bits wide
* Has >2 Carbon, >4 Carbon, Boron, A ring
## Lightbulb
* SMILES can be converted to Fingerprint with Chemistry Dev Kit
* If I create sets of substances which have a particular bit set, I
can take the intersections of those sets to find the substances
which match the fingerprint.
## Sets
* 810 sets, one for each bit in the bitmap
* One file per bit in bitmap containing a list of substances which have the bit set.
* Using unsigned 32bit ints, the sets are anywhere from 4 bytes to 225MB in size.
* Median 9.3M (2326026 substances)
* Average 40M
* 33GB total
## Optimizations
Memoization FTW!
: If I've solved it once, why solve it again?
## Optimizations
* Start from the smallest set and compare it with other sets
* In cases where the majority of the substances have this property, consider inverting the set -- checking against the substances which don't have the property
## If it's a bitmap... why not use bitmap tools
* In order to fit the universe, 10000 x 10000 1 bit PNG
* 12K to 6.8M
* Median 1.7M
* Average 2.4M
## Bitmaps
* ImageMagick on a 3.5 GHz box requires 10s/image intersection using a single core.
* `convert FILE1 FILE2 -compose Darken -composite out.png`
* Using 5 cores can do 30/minute
* Pre-Calculating all of the paths will take ~20 years.
## Credits
- Presentation created with Reveal.js
- Markdown and Diagram rendering via Markdeep
## Resources: General Topics
- [Big O notation](https://en.wikipedia.org/wiki/Big_O_notation)
- [SIMD](https://en.wikipedia.org/wiki/SIMD)
## Resources: Bitmaps
- [Better bitmap performance with Roaring bitmaps](http://arxiv.org/pdf/1402.6407v9.pdf)
- [A General SIMD-based Approach to Accelerating Compression Algorithms](http://arxiv.org/abs/1502.01916)
- [Roaring Bitmap](https://github.com/lemire/RoaringBitmap)
## Resources: Cuckoos
- [Cuckoo Hashing Visualization](http://www.lkozma.net/cuckoo_hashing_visualization/)
- [Cuckoo Hashing](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.25.4189&rep=rep1&type=pdf) (the original paper)
- [cuckoo - GoDoc](https://godoc.org/github.com/tildeleb/cuckoo) (go library)
- [garethm/cuckoo](https://github.com/garethm/cuckoo) (ruby)
## Resources: Searches
* [Binary search](http://www.programming-algorithms.net/article/40119/Binary-search)
* [Linear search](http://www.programming-algorithms.net/article/40168/Linear-search)
* [Prune and search](http://www.programming-algorithms.net/article/41235/Prune-and-search)
## Resources: Galloping Search
- [Beating the binary search algorithm – interpolation search, galloping search](http://blog.teamleadnet.com/2014/06/beating-binary-search-algorithm.html)
- [AN ALMOST OPTIMAL ALGORTTHM FOR UNBOUNDED SEARCHING](http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-1679.pdf)
## Resources: Interpolation
- [Interpolation search](http://www.programming-algorithms.net/article/40348/Interpolation-search)
## Resources: Peacock Hashes
- [Approximately-Perfect Hashing: Improving Network Throughput through Efficient Off-chip Routing Table Lookup](http://www.cise.ufl.edu/~zhuang/doc/infocom11.pdf)
- [Peacock Hashing: Deterministic and Updatable Hashing for High Performance Networking](http://www.arl.wustl.edu/~sailesh/papers/Peacoc%20Hash%20v2.doc)
## Resources: Sets
- [A Fast Set Intersection Algorithm for Sorted Sequences](http://www.cs.ucr.edu/~stelo/cpm/cpm04/25_Baeza-yates.pdf)
- [Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences](https://cs.uwaterloo.ca/~ajsaling/papers/paper-spire.pdf)
- [Experimental Comparison of Set Intersection Algorithms for Inverted Indexing](http://ceur-ws.org/Vol-1003/58.pdf)
- [Fast Set Intersection in Memory](http://research.microsoft.com/pubs/142850/p255-dingkoenig.pdf)
- [Faster Adaptive Set Intersections for Text Searching](http://www.cs.toronto.edu/~tl/papers/wea06.pdf)
- [Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions](http://www.vldb.org/pvldb/vol8/p293-inoue.pdf)
- [SIMD Compression and the Intersection of Sorted Integers](http://arxiv.org/abs/1401.6399)
## Resources: Cheminformatics
* [The PubChem Project](https://pubchem.ncbi.nlm.nih.gov/)
* [SDF Toolkit](http://cactus.nci.nih.gov/SDF_toolkit/) (perl)
* [Chemistry Development Kit](http://sourceforge.net/projects/cdk/) (java)
* [Simplified molecular-input line-entry system (SMILES)](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system)
## Resources: Raspberry Pi
* [http://raspberrypi.org](http://raspberrypi.org)
* [http://blog.hypriot.org](http://blog.hypriot.org)
* [http://matthewkwilliams.com](http://matthewkwilliams.com)