Index Scan: Using Rockset’s Search Index to Speed up Range Scans Over a Particular Field



Recently, InfoWorld’s Martin Heller described Rockset as a “one-of-a-kind database for operational analytics.” After testing Rockset with a variety of queries on a large collection, Heller rated Rockset 4.5 out of 5 stars. Heller’s review of Rockset can be found here.

Only one of the test queries timed out:

SELECT * FROM commons."twitter-firehose"
ORDER BY "twitter-firehose".favorite_count DESC LIMIT 10

For context, twitter-firehose is one of Rockset’s demo collections. It contains 30 million documents and represents Twitter posts over one month. Heller explained that the query’s timeout was unavoidable because, “there is no way to make that query run fast with Rockset’s converged index, or any indexing scheme I can think of: It requires a full scan and a global sort.”

But what if I told you that there IS a way to make this query run fast? Flummoxed? Keep reading to learn how to best leverage Rockset’s Converged Indexing and make queries run faster.

Behold, Index Scan

If you don’t already know how Converged Indexing works, check out our previous blog, Converged Indexing: The Secret Sauce Behind Rockset’s Fast Queries. At a high level, Rockset indexes every column of every document in multiple ways. To speed up this particular query, the answer lies in how Rockset uses the search index.


[Figure: search index]

The search index is optimized for finding which documents contain field foo with value bar. Because the search index is sorted by (field, value), it is also optimized for range scans over a particular field. A range scan reads index entries sequentially, beginning at some start value and ending at some end value.

Rockset’s Index Scan access path is specifically designed for performing range scans over the search index. Using Index Scan, we can easily find the documents where field foo has values within a certain range. Naturally, we can also use Index Scan to scan over all values of field foo in ascending or descending order; this is the same as scanning over the range [MIN, MAX].
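To make the idea of a range scan concrete, here is a minimal Python sketch of a toy index sorted by (field, value). It is only an in-memory illustration under our own assumptions: the sample documents, `build_search_index`, and `range_scan` are hypothetical names invented for this example and say nothing about how Rockset actually stores or reads its search index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample documents, keyed by document ID.
docs = {
    "d1": {"foo": 7, "baz": 1},
    "d2": {"foo": 3, "baz": 2},
    "d3": {"foo": 9, "baz": 3},
    "d4": {"foo": 5, "baz": 4},
}

def build_search_index(docs):
    """Return index entries (field, value, doc_id) sorted by (field, value)."""
    return sorted(
        (field, value, doc_id)
        for doc_id, doc in docs.items()
        for field, value in doc.items()
    )

def range_scan(index, field, low, high):
    """Read entries for `field` sequentially, from value `low` to `high` inclusive."""
    keys = [(f, v) for f, v, _ in index]
    start = bisect_left(keys, (field, low))   # first entry with value >= low
    end = bisect_right(keys, (field, high))   # one past the last entry with value <= high
    return [(value, doc_id) for _, value, doc_id in index[start:end]]

index = build_search_index(docs)

# Documents where foo is between 4 and 8.
in_range = range_scan(index, "foo", 4, 8)

# Scanning [MIN, MAX] returns every value of foo in ascending order.
all_foo = range_scan(index, "foo", float("-inf"), float("inf"))
```

Because the entries for one field sit next to each other in sorted order, the scan locates its start position with a binary search and then reads only the entries inside the range.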


[Figure: index scan]

Bringing the focus back to our original query, we can drastically speed up the query by using Index Scan to fetch the values for the field favorite_count in descending order. This way, we only need to fetch 10 rows from the collection twitter-firehose, instead of performing a full scan of the collection and doing a global sort. With the document IDs retrieved by Index Scan, we can fetch the values for the remaining fields in the collection from either the row-based store or column-based store to complete the query result set. Thus, using the Index Scan access path minimizes the amount of data fetched from the index, which dramatically cuts the query latency. Query compute cost is also reduced as a result of applying Index Scan to this query, which is pretty awesome.
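The difference between the two plans can be sketched in a few lines of Python. This is a simplified model under our own assumptions, not Rockset's implementation: `row_store`, `field_index`, and both functions are hypothetical, and the five sample rows stand in for the 30 million documents in the real collection.

```python
from itertools import islice

# Hypothetical row store: document ID -> full document.
row_store = {
    "t1": {"text": "a", "favorite_count": 12},
    "t2": {"text": "b", "favorite_count": 950},
    "t3": {"text": "c", "favorite_count": 3},
    "t4": {"text": "d", "favorite_count": 408},
    "t5": {"text": "e", "favorite_count": 77},
}

# Toy index on favorite_count: (value, doc_id) pairs kept in ascending order.
field_index = sorted(
    (doc["favorite_count"], doc_id) for doc_id, doc in row_store.items()
)

def top_k_full_scan(row_store, field, k):
    """Baseline plan: read every document, sort globally, keep the first k."""
    return sorted(row_store.values(), key=lambda d: d[field], reverse=True)[:k]

def top_k_index_scan(field_index, row_store, k):
    """Index plan: walk the sorted index from the largest value and stop after k entries."""
    doc_ids = [doc_id for _, doc_id in islice(reversed(field_index), k)]
    # Only these k documents are fetched from the row store for the remaining fields.
    return [row_store[doc_id] for doc_id in doc_ids]

top_two = top_k_index_scan(field_index, row_store, 2)
```

Both functions return the same rows for this data, but the index-based version touches only k index entries and k documents, while the baseline reads and sorts the whole collection.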


[Figure: query execution]

Unlocking the Index Scan Magic

To unlock the Index Scan magic, we need to add a hint to the query so it becomes:

SELECT * FROM commons."twitter-firehose"
ORDER BY "twitter-firehose".favorite_count DESC LIMIT 10
HINT(access_path=index_scan, index_scan_sort_field=favorite_count)

Index Scan is relatively new and has not been incorporated into Rockset’s query optimizer yet. So for now, we must specify via query hint that we want to use the Index Scan access path, and we must specify the field that Index Scan will scan over. In the future, the query optimizer will figure out the optimal access path to use, so this query hint will not be necessary.

What Kinds of Queries Benefit from Index Scan?

In general, any query that needs to do a range scan over a particular field can benefit from using Index Scan.

For example, consider running an ecommerce site. On the page where the items being sold are displayed, users can choose to apply viewing filters. A common filter is to allow users to view the items in either ascending or descending order of price. In this case, a query using Index Scan can quickly fetch the needed results:

SELECT item_name, price FROM items
ORDER BY price ASC -- or DESC
LIMIT 30
HINT(access_path=index_scan, index_scan_sort_field=price)

Another example can be found in the backend of an IoT application. Wearable fitness devices track a person’s heart rate, number of steps taken, and other useful metrics for assessing a person’s physical activity over time. Identifying patterns in these metrics can provide insights into ways to improve a person’s fitness routine. One useful pattern would be to identify the times of day when a person works out (has a higher heart rate):

SELECT time, heart_rate FROM devices
WHERE heart_rate > 72 -- resting heart rate
HINT(access_path=index_scan, index_scan_sort_field=heart_rate)

To reiterate, Rockset’s Index Scan access path can dramatically cut query latency and compute costs for queries doing range scans over a particular field. This is just one of the many improvements we have been making to Rockset’s query engine, so stay tuned for more to come!


