I recently developed an API to query Yelp restaurant dataset for the city of Toronto. It took the input as the business ID of a restaurant and returned the best closest restaurants to it. The dataset had more than 20,000 restaurants and comparing to each one of them would have been idiotic. Yes, I did that too and the query returned result in more than 60 seconds! So instead, I built up a QuadTree to query in a specific region, and guess what — it resulted in a much smaller time, only 20ms!
QuadTrees — What are they?
It is a data structure that is used to organize data into quadrants, where each node has exactly 4 children. Each node is usually a rectangular region, although it can be of any shape, and it stores one point.
A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a “unit of interesting spatial information”.From Wikipedia – https://en.wikipedia.org/wiki/Quadtree
How it helps is that you can define a region of space and store (x,y) coordinates in that. Whenever the region reaches its maximum capacity for storing data, it is divided into four quadrants. Thus, there will be more subdivision in the areas where there are more points and less where there are less point of interests.
Now the interesting part is that, the time to search a specific point in a Quadree is O(logN). Imagine the impact it will have when you have to search in thousands or even millions of coordinates to find one!
How to build a QuadTree
Let us first understand the algorithm behind it.
You first have an empty cell like this
Now each cell has a capacity of 1. This means that if you insert more than 2 elements in the cell, it will self-divide into 4 cells.
Now how do you select which cell to insert a point in? It is based on the co-ordinates of the point in consideration. So if the point has a co-ordinate (20, 60), it will be inserted to top-left sub-cell. If the co-ordinates are (105, 110), it will be discarded as it does not lie in the span of the given cell.
So a rough algorithm is-
Given a point with co-ordinates (x,y) and a cell
1. Check if (x,y) lies in the limits of the cell. If not, discard else proceed.
2. If the current cell is empty, insert the point and return.
3. Else if it is not empty, check if it has any children. If the cell does not have any children, then subdivide it into 4 sub-cells.
4. Repeat recursively for each of the sub-cell.
You can have two types of searching operations in a quadtree – finding a point, or find all points given in a search cell. A search cell is a cell you define and is not related to the given quad tree.
Algorithm to search for a given point (x,y)
1. Check if the point lies in the boundary of current cell. If not, return. Else proceed.
2. If the cell has children, recurse for all its children.
3. Else, return the current cell.