TL;DR: I needed to collect and compare keys across several large, overlapping datasets downloaded as JSON. Step one was extracting the keys from each JSON object to see how the datasets differed and which ones I actually needed. As usual with Python, there was more than one way to do it. So I put two methods to the test.

Solution 1: Set comprehension

Two for clauses: the first loops through each dictionary, and the second iterates over that dictionary's keys and adds each one to the new set.

key_set = {key for d in list_of_dicts for key in d}

Solution 2: Unpacking with set.union()

Unpacking the list of dictionaries passes each dictionary as a separate argument to set.union(). Because iterating over a dictionary yields its keys, set.union() merges all the keys into the new set.

key_set = set().union(*list_of_dicts)
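As a quick sanity check, here is a minimal sketch showing that both solutions build the same set. The sample records in list_of_dicts are hypothetical stand-ins for the parsed JSON objects, not the actual datasets.

```python
# Hypothetical sample data standing in for the parsed JSON objects.
list_of_dicts = [
    {"id": 1, "name": "a"},
    {"id": 2, "email": "b@example.com"},
    {"name": "c", "tags": []},
]

# Solution 1: nested for clauses inside a set comprehension.
comprehension_keys = {key for d in list_of_dicts for key in d}

# Solution 2: each dict becomes one argument to set.union(),
# and iterating over a dict yields its keys.
union_keys = set().union(*list_of_dicts)

print(sorted(union_keys))                # ['email', 'id', 'name', 'tags']
print(comprehension_keys == union_keys)  # True
```

Both forms also handle an empty list, since set().union() with no arguments simply returns an empty set.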

Comparison

Which is better? It depends.

For readability, I prefer set.union().

To compare performance, I tested both approaches with timeit: once on a list containing two tiny dictionaries, and again on a list of 1,000 dictionaries with 1,000 keys each.
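A comparison along these lines can be sketched with the timeit module. This is not the exact script behind the numbers in this post: the helper names (build_dicts, with_comprehension, with_union, best_time), the way keys overlap between dictionaries, and the scaled-down sizes are all assumptions chosen so the sketch runs quickly.

```python
import timeit


def build_dicts(n_dicts, n_keys):
    # Assumed layout: each dict shares about half its keys with the next one.
    step = n_keys // 2
    return [
        {f"key_{i * step + j}": j for j in range(n_keys)}
        for i in range(n_dicts)
    ]


def with_comprehension(dicts):
    return {key for d in dicts for key in d}


def with_union(dicts):
    return set().union(*dicts)


def best_time(func, dicts, number=5, repeat=5):
    # Best of `repeat` runs, reported as seconds per single call.
    runs = timeit.repeat(lambda: func(dicts), number=number, repeat=repeat)
    return min(runs) / number


small = build_dicts(2, 2)
# Scaled down from 1,000 x 1,000 to keep the sketch fast; raise as needed.
large = build_dicts(200, 200)

for label, data in (("small", small), ("large", large)):
    t_comp = best_time(with_comprehension, data)
    t_union = best_time(with_union, data)
    print(f"{label}: comprehension {t_comp:.2e} s, union {t_union:.2e} s")
```

Absolute timings will vary by machine and Python version, so the relative difference between the two functions is the number to watch.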

Small data

500000 loops, best of 5: 902 nsec per loop   # comprehension
200000 loops, best of 5: 956 nsec per loop   # union

Large data

1 loop, best of 5: 208 msec per loop   # comprehension
2 loops, best of 5: 165 msec per loop  # union

With two dicts, the comprehension was about 5.6% faster. With 1,000 dicts of 1,000 keys each, set.union() pulled ahead by roughly 21%.
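The percentages can be reproduced from the timeit output, assuming the gap is measured relative to the slower of the two timings. The percent_faster helper is a hypothetical name used only for this check.

```python
def percent_faster(slower, faster):
    # Gap between the two timings, as a share of the slower one.
    return (slower - faster) / slower * 100


small_gap = percent_faster(956, 902)  # nsec: union vs. comprehension
large_gap = percent_faster(208, 165)  # msec: comprehension vs. union

print(f"{small_gap:.1f}%")  # 5.6%
print(f"{large_gap:.1f}%")  # 20.7%
```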

The likely reason: the comprehension runs as ordinary Python bytecode and skips the upfront cost of unpacking the list items into an argument tuple, so it wins when there is almost no work to do. But once scale kicks in, the C-level loop inside set.union() dominates.

Takeaway

This post isn’t about shaving microseconds. It’s about understanding how Python works so I can make informed, intentional decisions. If I’m skimming keys from a handful of dicts, either method is fine. As the structures grow, set.union() starts to pull away as the better choice. Either way, testing beats guessing.

tags: #applied-learning, #Python, #performance-testing, #data-science

Python: Set Comprehension vs. set.union()