How to Hold a Whole Matrix With One Tree

In 1989 the Russian Researcher Boris Ryabko has invented the Binary Index Tree Data Structure; Today this invention is utilized by large-scale systems to efficiently and correctly range sum matrices, as well as to challenge candidates in programming interviews. In the following post, I demystify this ingenious invention and interview the inventor about the inspiration for this creative idea.

Aviv Yaniv
Published in
7 min readMar 14, 2021

--

Even if you feel well prepared for your next interview after solving well over 5o questions on LeetCode on backtracking, dynamic programming, design, and graphs — this question is about to surprise you 😀

This post will add a new solving method for your toolbox, so first we shall start with a 1-dimensional version of the question and then continue with the 2-dimensional method, and finally, the inventor of this ingenious data-structure shares with us the background and motivation for this creative idea.

How to Range Sum a 1-Dimensional Mutable Array Efficiently

Our problem is LeetCode #307; given a numbers array we shall support three operations: initialization, updating a specific index with a value, and range summing subarray of given indices.

This question can be solved naively with O(n) time complexity for the range sum operation, but for real-life large-scale systems, it’s too much.

Can we do better?

Let’s first examine how we would range sum efficiently if the array was immutable, as in LeetCode #303;

We can avoid the reoccurring computation of subarray sum by keeping the accumulated sum of the array:

In the accumulated sum array the 0-index will contain 0 (to denote no elements have been summed), and from there on the i’th index holds the sum of the original subarray [0:i-1]:

So to calculate the range sum of the original subarray [i:j] we shall return:

This way we can answer the range sum query in O(1).

However, in the mutable version, there is no use for accumulated sum, as updates would invalidate it.

How can we maintain the accumulated sum in a way which would be fast both for update and querying?

Here the ingenious invention of the Binary Indexed Tree comes in handy!

Let’s create another array, which shall be called BIT, that should hold accumulated sum — but each index won't hold the whole subarray sum from the beginning — but just specific parts of it.

Which parts?

The Binary Indexed Tree would be 1-based, and its i’th index holds the sum of elements from the original array that are defined by the binary representation of that index.

How that's actually done?

The algorithm sounds abstract — so be confident you shall understand it with the detailed example is followed!

When we update a value in the original array, we propagate the change to Binary Indexed Tree:

  1. We calculate the difference between the previous value (initially 0) and the updated value- shall be called difference from now on— that’s the amount we have to add to the nodes of the Binary Indexed Tree.
  2. Set index_to_update to the index of the update.

3. While index_to_update is less than or equals to the array total size.

3.1. Add difference to Binary Indexed Tree at index_to_update index.

3.2. Add to index_to_updatethe Least Significant Bit value ofindex_to_update.

e.g.

Given an array of size 16, we shall update the 11'th item (1-based indexed) to the value of 7 — and reflect that change to the Binary Indexed Tree which holds the partial sums.

(1) The initial value is 0 (when adding an item), so the value to be propagated is diffrence=7–0=7.

(2) Set index_to_update = 11 (01011)

(3) index_to_update = 11 <= 16

(3.1) Add to Binary Indexed Tree at index index_to_update = 11 the diffrence=7

(3.2) The Least Significant Bit value of index_to_update = 11 (01011) is 1 (00001), so the new index_to_update=11+1=12

(3) index_to_update = 12 <= 16

(3.1) Add to Binary Indexed Tree at index index_to_update = 12 the diffrence=7

(3.2) The Least Significant Bit value of index_to_update = 12 (01100) is 4 (00100), so the new index_to_update=12+4=16

(3) index_to_update = 16 <= 16

(3.1) Add to Binary Indexed Tree at index index_to_update = 16 the diffrence=7

(3.2) The Least Significant Bit value of index_to_update = 16 (10000) is 16 (10000), so the new index_to_update=16+16=32

(3) index_to_update = 32 IS NOT <= 16 and we finish.

When we want the sum from the beginning till a specific end-index of the original array, we sum the Binary Indexed Tree in the following manner:

  1. Set index_to_sum to the index to the end-index of the range to sum.
  2. While index_to_sumis more than zero.

2.1. Add to sum the value of Binary Indexed Tree at index_to_sum

2.2. Subtract fromindex_to_sum the Least Significant Bit value ofindex_to_sum.

e.g.

Given an array of size 16, we shall sum the subarray from the beginning till the 11'th item (1-based indexed).

(1) Set index_to_sum = 11 (01011)

(2) index_to_sum = 11 > 0

(2.1) Add to sum the value of Binary Indexed Tree at index_to_sum = 11 (which holds the sum of the 11'th item in the original array), so currently:

subarray_sum = arr[11]

(2.2) Subtract fromindex_to_sum = 11 (01011)the Least Significant Bit value is 1 (00001), so the new index_to_sum=11-1=10

(2) index_to_sum = 10 > 0

(2.1) Add to sum the value of Binary Indexed Tree at index_to_sum = 10 (which holds the sum of the 9'th and 10'th items in the original array), so currently:

subarray_sum = 
arr[9]+arr[10]+
arr[11]

(2.2) Subtract fromindex_to_sum = 10 (01010)the Least Significant Bit value is 2 (00010), so the new index_to_sum=10-2=8

(2) index_to_sum = 8 > 0

(2.1) Add to sum the value of Binary Indexed Tree at index_to_sum = 8 (which holds the sum of all items till the 8'th item in the original array), so currently:

subarray_sum = arr[1]+arr[2]+arr[3]+arr[4]+arr[5]+arr[6]+arr[7]+arr[8]+
arr[9]+arr[10]+
arr[11]

(2.2) Subtract fromindex_to_sum = 8 (01000)the Least Significant Bit value is 8 (01000), so the new index_to_sum=8-8=0

(2) index_to_sum = 0 IS NOT > 0 and we finish.

If you are interested in how it looks like as a tree, this image is better than a 1024 words:

Source: HackerHearth

The formal definition is less intuitive, but for sake of completeness:

Each element whose index i is a power of 2 contains the sum of the first i elements. Elements whose indices are the sum of two (distinct) powers of 2 contain the sum of the elements since the preceding power of 2. In general, each element contains the sum of the values since its parent in the tree, and that parent is found by clearing the least-significant bit in the index.

You probably think, OK that's a clever idea, but how is it feasible to implement in an interview setting?!

Luckily the code can be written simply and elegantly, using the fact that we can advance up (when setting a value) or down (when summing a subarray) the Binary Indexed Tree by adding or subtracting the Least Significant Bit.

The Least Significant Bit can be easily obtained by this magical formula:

This formula is correct due to the fact that numbers in the computer are represented by Two’s complement, as the negative value is calculated by inverting the digits and adding one, so when ANDing the number with its negative — only the Least Significant Bit survives.

Voilà! The following code range sums a subarray in O(log n)!

How to Range Sum a 2-Dimensional Mutable Array Efficiently

Now that we have familiarized ourselves with Binary Indexed Tree, we can face the more complex problem of how to support range sum in a matrix.

This notorious Hard question is known as LeetCode #308;

Given a 2D matrix matrix, find the sum of the elements inside the rectangle defined by its upper left corner (row1, col1) and lower right corner (row2, col2).

The above rectangle (with the red border) is defined by (row1, col1) = (2, 1) and (row2, col2) = (4, 3), which contains sum = 8.

We shall build the Binary Indexed Tree as a 2-dimensional matrix, and each time advance both rowise and columnwise to update and the query:

How the Binary Indexed Tree was Born — Interview with the Inventor

The Binary Indexed Tree was invented back in 1989 by the Russian Researcher Boris Ryabko.

When I asked him about the inspiration for this data structure he recalled:

I have known about AVL tree and other search trees since 1970 (when I was a student) and have researched source coding (or data compression) since the late 1970s. So the binary indexed tree is based on this background. Indeed, it was an epiphany, as it is difficult to add something.

This is another example, of how two seemly unrelated domains are deeply connected and can be used to solved complex problems elegantly.

Enjoyed this article? Feel free to long-press the 👏 button below 😀

--

--

Aviv Yaniv

Senior Software Development Engineer 🖥️ Economist 📈 Beer Brewer 🍻 Photographer 📷 ~ “Curiosity is my drug”