
Using Heaps to Speed Up Code Performance in Go | by Lucas Pereyra | Sep, 2022


A quick example guided introduction

Photo by Davide Cantelli on Unsplash

A binary heap is a special kind of tree-based data structure that has proven useful under certain scenarios, like repeated calculations using the lowest/highest value in a collection of numbers. In this article, we’ll take a closer look at heaps through an (I hope) easy-to-understand example.

Min binary heap example. Each node’s value should be lower than or equal to its children’s.

I often solve programming exercises to keep my skills sharp and to learn new things that I can add to my developer toolkit. A couple of weeks ago, I found this interesting challenge on the HackerRank platform. The problem, described simply and without metaphors, can be stated as follows:

Given a collection of N integers and an integer K, calculate the number of iterations that need to be executed until every integer in the collection is ≥ K. On each iteration, the collection is shrunk as a result of applying the following operation: take the two lowest integers and calculate their replacement as minInteger + 2*secondMinInteger. If the condition can never be met, then the result should be -1.

I tried a first custom approach that seemed to work correctly, but it failed due to performance issues. Normally, the test cases proposed by this platform not only check that your code works for specific input sets but also ensure that it meets specific execution time restrictions when large inputs are provided. My solution was correct for all the cases, but it took ages to finish some of them.

I tried some custom optimizations and ended up with a version I considered acceptable, but it didn’t pass either. After giving up, I went to the comments section and found that most devs suggested using a heap data structure. I adopted the heap approach, and it passed all the test cases. Not content with this, I researched why the heap-based approach was so much faster.

Throughout this article, I’ll share both my approach and the heap-based one, comparing them to highlight performance differences and showcasing some experiments I did on my own that helped me understand the power of heaps.

Here I share my first approach to the solution without using heaps. The algorithm I wrote was more or less like this:

  1. Sort all the integers in ascending order
  2. Take out both the lowest and the second lowest integers, calculate the new one
  3. Find the most suitable position to insert the new integer in the collection (remember that it is already sorted), then insert it
  4. Repeat from 2 to 3 until the first integer in the collection is ≥ K (which means that all items meet this condition since it is always sorted)

And my implementation in the Go programming language would be like the following:

Custom solution approach without heaps
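
As a minimal sketch of the four steps above (the function name iterationsSorted and the sample input are illustrative assumptions, and this is not necessarily the exact code submitted to the platform), the sorted-slice approach could look like this:

```go
package main

import (
	"fmt"
	"sort"
)

// iterationsSorted follows the four steps on a sorted slice.
// It returns -1 when the condition can never be met.
func iterationsSorted(k int, values []int) int {
	// Work on a copy so the caller's slice is left untouched.
	s := append([]int(nil), values...)
	sort.Ints(s) // step 1: O(n log n)

	count := 0
	for len(s) > 0 && s[0] < k { // step 4: the first item is always the lowest
		if len(s) < 2 {
			return -1 // a single value is left and it is still below k
		}
		merged := s[0] + 2*s[1] // step 2
		s = s[2:]

		// Step 3: binary search finds the position in O(log n), but the
		// insertion itself shifts elements, which is O(n) in the worst case.
		pos := sort.SearchInts(s, merged)
		s = append(s, 0)
		copy(s[pos+1:], s[pos:])
		s[pos] = merged
		count++
	}
	return count
}

func main() {
	// Sample values chosen for illustration.
	fmt.Println(iterationsSorted(7, []int{1, 2, 3, 9, 10, 12})) // prints 2
}
```

Even with binary search for the position, every insertion may move a large part of the slice, and that shifting is what dominates the running time on big inputs.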

First of all, if you’re unfamiliar with the term heap or binary heap, here is a simple resource you can look at so that we’re all on the same page. Now, by using a heap, the algorithm turned out to be simpler:

  1. Initialize the heap
  2. Take out both the lowest and the second lowest integers by popping out from the heap, then push the new integer into it
  3. Repeat step 2 until the lowest integer is ≥ K

Again, here is the Go code I wrote for this:

Heap-based solution for the problem

This code makes use of the pro_heap.MinHeap data type, which is a heap.Interface implementation in Go, and it’s defined as follows:

A heap.Interface implementation, as required by the functions of Go’s heap package
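
Since the listings are not reproduced here, the following is a minimal sketch of both pieces under stated assumptions: MinHeap is a hypothetical local stand-in for pro_heap.MinHeap, and iterationsHeap is an illustrative name. Only functions from the standard container/heap package are used.

```go
package main

import (
	"container/heap"
	"fmt"
)

// MinHeap is a hypothetical stand-in for pro_heap.MinHeap: a slice of ints
// that satisfies heap.Interface with the lowest value at the root.
type MinHeap []int

func (h MinHeap) Len() int           { return len(h) }
func (h MinHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h MinHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

// Push and Pop are called by the heap package; they only append to and
// remove from the end of the slice. The package restores the heap order.
func (h *MinHeap) Push(x interface{}) { *h = append(*h, x.(int)) }

func (h *MinHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// iterationsHeap follows the three heap-based steps.
// It returns -1 when the condition can never be met.
func iterationsHeap(k int, values []int) int {
	h := make(MinHeap, len(values))
	copy(h, values)
	heap.Init(&h) // step 1: O(n), paid once

	count := 0
	for h.Len() > 0 && h[0] < k { // h[0] is always the lowest value
		if h.Len() < 2 {
			return -1
		}
		lowest := heap.Pop(&h).(int)   // O(log n)
		second := heap.Pop(&h).(int)   // O(log n)
		heap.Push(&h, lowest+2*second) // O(log n)
		count++
	}
	return count
}

func main() {
	// Sample values chosen for illustration.
	fmt.Println(iterationsHeap(7, []int{1, 2, 3, 9, 10, 12})) // prints 2
}
```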

To check why the heap-based solution seemed much faster than mine, I did some performance experiments to compare both.

Experiment time!

First, I generated a bunch of samples (here, collections of integers) to be provided as the input of each algorithm. I tried with different N values (here, the size of the integers collection), including 1k, 10k, 100k, and 1M, but using the same K value of 700. Each integer belonging to the collection was a randomly generated number in the (0, 1000] range.
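
A sample generator along these lines could be sketched as follows. The function name generateSample and the fixed seed are assumptions made for illustration, not details taken from the experiment itself.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateSample returns n pseudo-random integers in the (0, 1000] range.
// The seed parameter is an assumption added to make runs repeatable.
func generateSample(n int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	sample := make([]int, n)
	for i := range sample {
		sample[i] = rng.Intn(1000) + 1 // Intn yields [0, 1000), so shift by one
	}
	return sample
}

func main() {
	// The four collection sizes mentioned in the text.
	for _, n := range []int{1000, 10000, 100000, 1000000} {
		sample := generateSample(n, 42)
		fmt.Println(n, len(sample))
	}
}
```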

After creating the samples, I ran each version of the solution five times against each one and documented its total execution time. Below are the results I got:

Algorithms execution time comparison

As you can see, the bigger the collection is, the longer both solutions take, and the wider the gap between their execution times becomes. But what makes such a remarkable difference?

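The snippets I shared above annotate each relevant line of code with its time complexity in big-O notation. My custom solution has a worst-case time complexity of O(n²): it may need up to n - 1 iterations, and each one inserts into a sorted collection, which costs O(n) because the items after the insertion point have to be shifted. The heap-based one has a worst-case time complexity of O(n log n): an O(n) initialization followed by up to n - 1 iterations, each made of two pops and one push that cost O(log n) apiece.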

What’s more, the O(n) term comes entirely from the heap initialization instruction: heap.Init(h). That bound is stated in the heap package documentation, and it is a one-time cost paid before the loop starts, so it does not grow with the number of iterations. Also, looking at the experimentation samples, each integer is in the (0, 1000] range, so when N > 1000, repeated values are guaranteed.

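I initially assumed that Go’s heap functions apply special optimizations to deal with collections of repeated values, as sort.Slice does by means of the pdqsort strategy. As far as I can tell, though, container/heap is a plain binary heap, so the speed-up comes from the data structure itself rather than from a hidden trick.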

If we set the heap.Init(h) statement aside, each iteration of the heap-based algorithm costs O(log n), compared with the O(n) that each insertion costs in my custom solution. That’s strongly related to how heaps work and how they can be implemented using just ordinary arrays. Let’s take a look at the following image:

A binary max heap graphical representation, and how it can be stored easily in an ordinary array, without the need for linked lists or pointers. In this example, 11 is the root value, its left child is 9, and its right child is 4. The left child of 9 is 7 and the right child of 9 is 8. The left child of 4 is 3 and its right child is 1. The left child of 7 is 2 and its right child is 5, and finally, 8 has only a left child, which is 6.
Since there is a formula to easily determine each node’s parent, left child, and right child, a heap can be represented in an array without needing linked lists or pointers.
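
For a zero-based array, the usual formulas are: the parent of index i is at (i - 1) / 2 (integer division), its left child at 2i + 1, and its right child at 2i + 2. A small sketch using the heap from the figure, stored level by level (the helper names are illustrative):

```go
package main

import "fmt"

// Index arithmetic for a binary heap stored in a zero-based array.
func parent(i int) int { return (i - 1) / 2 }
func left(i int) int   { return 2*i + 1 }
func right(i int) int  { return 2*i + 2 }

func main() {
	// The max heap from the figure, written out level by level.
	h := []int{11, 9, 4, 7, 8, 3, 1, 2, 5, 6}

	i := 1 // index of the value 9
	fmt.Println(h[parent(i)], h[left(i)], h[right(i)]) // prints 11 7 8
}
```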

If we need to retrieve the highest integer of the collection, we know it sits at index 0, so getting it is a direct access operation. Now, suppose we need to add the number 12 to this max heap. We’ll start by appending it as a child of 8. Then we’ll swap 8 and 12, 9 and 12, and finally, 11 and 12. The new item ends up in place after only three compare-and-swap operations. If we see these operations in terms of indexes to traverse in an array, we get this:

-> Added new item in index 10 (added number 12 as child of 8)
-> Compare and switch with index 4 (switch 8 with 12)
-> Compare and switch with index 1 (switch 9 with 12)
-> Compare and switch with index 0 (switch 11 with 12)

From index 10 to index 4, we’re skipping five indexes, and from index 4 to index 1, we’re skipping two. That’s the magic trick with heaps: you’ll always traverse the collection much faster since you’re skipping indexes to compare and switch. Moreover, the bigger the collection is, the bigger the gaps between indexes are: each step lands on the parent, roughly halving the index, so an insertion needs at most about log₂(n) comparisons.
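
The same insertion can be traced in code. This is a minimal sift-up sketch written for illustration (pushMax is a hypothetical helper, not part of Go’s heap package); it records every index the new value occupies on its way up.

```go
package main

import "fmt"

// pushMax appends v to the max heap h and sifts it up. It returns the
// updated heap and the indexes v occupied, starting where it was appended.
func pushMax(h []int, v int) ([]int, []int) {
	h = append(h, v)
	i := len(h) - 1
	path := []int{i}
	for i > 0 {
		p := (i - 1) / 2 // parent index
		if h[p] >= h[i] {
			break // heap property holds, nothing left to do
		}
		h[p], h[i] = h[i], h[p]
		i = p
		path = append(path, i)
	}
	return h, path
}

func main() {
	h := []int{11, 9, 4, 7, 8, 3, 1, 2, 5, 6}
	h, path := pushMax(h, 12)
	fmt.Println(path) // prints [10 4 1 0]
	fmt.Println(h)    // prints [12 11 4 7 9 3 1 2 5 6 8]
}
```

Only four of the eleven positions are touched, and doubling the size of the heap would add just one more step to the path.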

Undoubtedly, heaps are handy for specific scenarios where you may need to iterate through a collection’s highest or lowest values. However, they might not be suitable for many others. Always pay special attention to your context and requirements and ask yourself if you’re better off with a heap-based approach.

Go’s heap implementation performed well in my experiments, but this might not be the case for other technologies and providers. Carefully read the library’s docs and do some proofs of concept to assess its performance.

Hopefully, this article gave you a good idea of heaps and their common use cases. I also hope it encouraged you to study the different data structures that exist in programming and that were designed to solve specific problems like this one.

Arrays and sets are good, but knowing more advanced data structures and when to use them could help you make a difference at work and improve your company’s product.

That’s all for now. Thank you very much for reading.

Stay connected for more experiments like this!


