Dict views index performance comparison - Results

Return to listing

Overview

The following tests are designed to assess the performance of a proposed change to the python core language to support direct indexing of dict_keys, dict_items and dict_views objects.

The change would allow the results of dict.keys(), dict.items() and dict.values() to behave much more like lists.

Currently, it's only possible to implement dictionary indexing in O(n) time, as dict deletions leave 'holes' in the compact in-memory dictionary structure. This linear cost, however, is not large.

The following graphs show the time taken for python to evaluate several variations of a number of simple dictionary operations. These examples are designed to be instuctive, and representative of only the simplest operations.

Some of the code being tested is code that should never be used in real code, as it exhibits strongly non-linear performance characteristics. They have been included her to demonstrate this fact.

If there are any examples missing, or alternate syntaxes that may be more performant, please submit an issue or pull request here: github.com/stestagg/dict_index

Code annotated with "(*)" are snippets that do not run without the patches included in this change. They demonstrate the performance of allowing direct indexing of dict views.

Test Environment

Property Value
Run Time 2020-08-01 12:40:08
memory Total: 2.0 GB Available at start: 1.9 GB
release 5.4.51-1-ARCH
system Linux
processor ARMv7 Processor rev 3 (v7l)
python_compiler GCC 9.3.0
python_build ['remotes/origin/HEAD-dirty:5c3270939c', 'Aug 1 2020 09:41:11']
python_implementation CPython
python_revision 5c3270939c

Tests

Get first key

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    """
    Calling list on the dict
    """
    return list(self.values)[0]
use_iter
@Test.method
def use_iter(self):
    """
    Calling iter on the keys
    """
    return next(iter(self.values.keys()))
proposed(*)
@Test.method
def proposed(self):
    """
    Using keys indexing
    """
    return self.values.keys()[0]
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
use_iter 7.19 µs 6.45 µs 7.28 µs 8.59 µs 10.04 µs 19.91 µs 30.02 µs
proposed 5.74 µs 6.80 µs 7.44 µs 10.76 µs 9.02 µs 18.15 µs 27.33 µs
use_list 7.89 µs 7.35 µs 11.11 µs 32.11 µs 251.50 µs 4,411.30 µs 28,502.41 µs

Get first item

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    """
    Calling list on the dict
    """
    return list(self.values.items())[0]
use_iter
@Test.method
def use_iter(self):
    """
    Calling iter on the view
    """
    return next(iter(self.values.items()))
proposed(*)
@Test.method
def proposed(self):
    """
    Using view indexing
    """
    return self.values.items()[0]
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
proposed 6.55 µs 6.35 µs 7.20 µs 6.59 µs 9.94 µs 19.20 µs 27.78 µs
use_list 7.69 µs 9.96 µs 26.15 µs 125.39 µs 2,850.20 µs 28,888.07 µs 147,791.81 µs
use_iter 5.87 µs 7.37 µs 7.15 µs 7.76 µs 16.35 µs 22.67 µs 31.26 µs

Get last key

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    return list(self.values)[-1]
use_iter
@Test.method
def use_iter(self):
    return next(reversed(self.values.keys()))
proposed(*)
@Test.method
def proposed(self):
    return self.values.keys()[-1]
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
use_iter 8.13 µs 10.70 µs 14.89 µs 11.17 µs 13.46 µs 25.96 µs 36.44 µs
proposed 5.52 µs 6.30 µs 6.43 µs 10.43 µs 27.96 µs 593.50 µs 2,051.35 µs
use_list 7.74 µs 7.43 µs 9.11 µs 32.54 µs 270.31 µs 4,440.89 µs 27,958.98 µs

Iterate over all keys

Best of 5 runs, over dictionaries with 1 - 500,000 items.
list_index
@Test.method
def list_index(self):
    """ This is a bad way of doing this """
    keys = list(self.values.keys())
    for i in range(len(keys)):
        last_key = keys[i]
    return last_key
list_iter
@Test.method
def list_iter(self):
    """ This is a bad way of doing this """
    keys = list(self.values.keys())
    for key in keys:
        last_key = key
    return last_key
direct_iter
@Test.method
def direct_iter(self):
    for key in self.values.keys():
        last_key = key
    return last_key
keys_index(*)
@Test.method
def keys_index(self):
    """
    Using keys indexing. This approach is *not* recommended,
    but included here to assess the performance of sitations
    where this may be done in 'error'.
    """
    keys = self.values.keys()
    for i in range(len(keys)):
        last_key = keys[i]
    return last_key
iter_keys
@Test.method
def iter_keys(self):
    """
    Using the sensible iter()
    """
    for key in iter(self.values.keys()):
        last_key = key
    return last_key
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
direct_iter 5.93 µs 7.59 µs 11.13 µs 59.39 µs 519.20 µs 5,399.50 µs 26,108.91 µs
list_iter 9.06 µs 12.85 µs 15.24 µs 79.42 µs 692.54 µs 9,948.80 µs 55,367.35 µs
iter_keys 7.09 µs 7.89 µs 11.70 µs 59.17 µs 528.43 µs 5,846.00 µs 29,138.72 µs
keys_index 9.02 µs 10.37 µs 32.70 µs 1,166.68 µs 103,624.30 µs 11,277,766.37 µs 326,436,172.96 µs
list_index 11.98 µs 10.72 µs 26.15 µs 167.24 µs 1,648.78 µs 19,421.50 µs 100,304.69 µs

Iterate over all items

Best of 5 runs, over dictionaries with 1 - 500,000 items.
list_index
@Test.method
def list_index(self):
    """ This is a bad way of doing this """
    items = list(self.values.items())
    for i in range(len(items)):
        last_item = items[i]
    return last_item
list_iter
@Test.method
def list_iter(self):
    """ This is a bad way of doing this """
    items = list(self.values.items())
    for item in items:
        last_item = item
    return last_item
direct_iter
@Test.method
def direct_iter(self):
    for item in self.values.items():
        last_item = item
    return last_item
items_index(*)
@Test.method
def items_index(self):
    """
    Using items indexing. This approach is *not* recommended,
    but included here to assess the performance of sitations
    where this may be done in 'error'.
    """
    items = self.values.items()
    for i in range(len(items)):
        last_item = items[i]
    return last_item
iter_items
@Test.method
def iter_items(self):
    """
    Using the sensible iter()
    """
    for item in iter(self.values.items()):
        last_item = item
    return last_item
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
list_index 9.30 µs 18.43 µs 37.74 µs 257.72 µs 4,044.68 µs 44,513.06 µs 224,318.55 µs
list_iter 8.70 µs 10.67 µs 29.87 µs 168.46 µs 3,154.07 µs 32,412.46 µs 174,551.00 µs
items_index 9.98 µs 12.57 µs 45.35 µs 1,322.93 µs 106,041.41 µs 11,223,158.32 µs 327,562,007.42 µs
direct_iter 6.57 µs 13.56 µs 17.00 µs 103.30 µs 979.82 µs 10,927.85 µs 49,079.93 µs
iter_items 7.87 µs 10.74 µs 17.39 µs 104.59 µs 989.80 µs 10,337.35 µs 52,068.20 µs

Get a value from the middle

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    """
    Calling list on the dict
    """
    return list(self.values.values())[self.index]
use_iter
@Test.method
def use_iter(self):
    """
    Calling iter on the keys
    """
    return next(itertools.islice(
        self.values.values(), 
        self.index, 
        self.index+1)
    )
iter_2
@Test.method
def iter_2(self):
    """
    Count over the iter results
    """
    left = self.index
    for value in self.values.values():
        if left == 0:
            return value
        left -= 1
proposed(*)
@Test.method
def proposed(self):
    """
    Using keys indexing
    """
    return self.values.values()[self.index]
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
iter_2 5.98 µs 9.46 µs 17.56 µs 96.63 µs 974.39 µs 10,096.37 µs 52,651.04 µs
proposed 6.98 µs 7.26 µs 7.59 µs 7.20 µs 18.43 µs 371.18 µs 1,266.70 µs
use_iter 13.39 µs 13.13 µs 15.31 µs 20.48 µs 121.24 µs 1,052.48 µs 5,163.83 µs
use_list 12.20 µs 9.24 µs 11.31 µs 33.63 µs 237.76 µs 2,275.04 µs 14,447.63 µs

Retrieve random sample of items from dict

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    return random.sample(list(self.values.items()), self.sample_size)
proposed(*)
@Test.method
def proposed(self):
    return random.sample(self.values.items(), self.sample_size)
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
proposed 63.91 µs 91.76 µs 296.80 µs 2,276.65 µs 7,790.83 µs 35,813.24 µs 160,066.48 µs
use_list 46.61 µs 72.48 µs 258.28 µs 2,164.52 µs 7,558.31 µs 32,519.76 µs 151,574.83 µs

Retrieve random item from dict

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    return random.choice(list(self.values.items()))
proposed(*)
@Test.method
def proposed(self):
    return random.choice(self.values.items())
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
use_list 15.54 µs 24.13 µs 25.13 µs 134.50 µs 2,808.85 µs 28,782.44 µs 146,900.63 µs
proposed 14.61 µs 12.56 µs 15.26 µs 19.54 µs 29.61 µs 33.70 µs 626.48 µs

Access 100k items by index

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    items = list(self.values.items())
    for i in self.indices:
        last = items[i]
    return last
proposed(*)
@Test.method
def proposed(self):
    items = self.values.items()
    for i in self.indices:
        last = items[i]
    return last
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
use_list 12,167.55 µs 12,509.94 µs 12,337.35 µs 13,766.24 µs 22,801.50 µs 62,553.13 µs 193,707.15 µs
proposed 23,829.02 µs 26,460.50 µs 35,223.33 µs 132,260.37 µs 1,062,860.33 µs 11,459,335.83 µs 66,932,503.87 µs

Get last key from a dict that has had entries removed (branch misses)

Best of 5 runs, over dictionaries with 1 - 500,000 items.
use_list
@Test.method
def use_list(self):
    return list(self.values)[-1]
use_iter
@Test.method
def use_iter(self):
    return next(reversed(self.values.keys()))
proposed(*)
@Test.method
def proposed(self):        
    return self.values.keys()[-1]
Best Times
Method Dict Size
1 10 100 1,000 10,000 100,000 500,000
proposed 6.13 µs 6.44 µs 7.80 µs 9.35 µs 41.74 µs 694.11 µs 2,141.33 µs
use_iter 10.39 µs 12.57 µs 11.13 µs 8.11 µs 16.09 µs 28.57 µs 30.83 µs
use_list 10.67 µs 8.15 µs 9.39 µs 36.15 µs 298.94 µs 4,940.56 µs 25,989.85 µs