Lets start with the basics. What is a dictionary? A dictionary provides a mapping of key, value pairs. They allow a programmer to store a value under a key and look it up later.
A naive implementation could be done with two arrays, one to store the keys, and one to store the values. For managing a few keys this is fine, but once the number of keys grows large enough, this will begin to suffer because a lookup will have to inspect all elements, resulting in a computation complexity of O(n).
A more performant implementation could take advantage of the red-black binary tree data structure used behind the scene to store variant attributes. This significantly outperforms linear lookups with O(log n) operations.
This brings us to the meat of this blog post. O(log n) is certainly better than O(n), but we can do better. How does O(1) sound? Well, it sounded good to me, so I set out to make it happen.
The classic data structure for this is a 'Hash Table'. This works by passing the key through a hash function in order to map it to an array index. This provides the ability to perform an operation on the key which translates it into a number, and then the value is either inserted or retrieved from that location in the array. This unfortunately has some issues, chief of which is what happens when two keys hash to the same value. There are several strategies to handle this. The one usually taught in an undergraduate algorithms course is chaining with a linked list. Unfortunately LabVIEW doesn't have pointers, so you have to jump through some hoops to implement linked lists. Another common strategy is chaining in an array. This could be done easilly enough in LabVIEW by making an array of clusters of array. There are also other standard hash collision strategies, like linear probing, or quadratic probing, or double hashing.
While trying to decide how I would handle hash collisions, an idea struck me. What if instead of performing a linear search in the event of a collision, I perform a binary search? I decided to run with that idea and see where it took me.
The first implementation was significantly slower than using variant attributes. So this sent me down the road of benchmarking and optimizing.
Everything was really slow, taking 1000-10000x as long as using variant attributes. My first thought was that maybe debugging is causing issues, but this did not appear to be the case. My next thought was that LabVIEW has to pass this data structure into a subVI through the stack, and this could cause the slow down. Inlining brought the execution times down, but still about 10-100x too slow.
At this point, I was still concerned with memory copies. So, I went crazy with inplace element structures. This helped, but then I went back through and strategically removed many of them. Now performance was at least in the right order of magnitude. However by this point I was sure that I could make this faster, I just needed to figure out where the issue was.
After some uneffective searching, I began to question how well the hash function was working, it was pretty basic after all. Sure enough, many of the test samples were piling up in one or two of the hash locations. I set out to research various hashing methods used elsewhere. I found a few different methods around the web, and implemented one, which provided the final bit needed to have performance.
This Dictionary implementation performs faster lookups and reassignments than the variant attribute, and inserts are comparable most of the time to the variant attributes, though some times the internal array will nee to resize. So, I added one last feature, the initialize VI can be used to preallocate, and start off with an array the size you will need for your application. Using the pre-allocation feature all operations are always faster than variant attribute stores.
Source Code for this can be found here
- ShamusP's blog
- Log in or register to post comments
Comments
source
Any plan on posting the source code for this?
Is the source available for
Is the source available for this implementation?
Source code available?
Kindly share the source code for this if it's not confidential.