SQream

Build a highly scalable cost optimization service

Headquarters

New York

Industry

Software Development

Technology Used

High performance, Analytics, Big data, Genome Research, Finance, IoT, Telecom, Ad Tech, GPU, Database, Data Science, GPU Database, and Financial Services, SQream DB.

SQream provides an analytics platform that minimizes Total Time to Insight (TTTI) for time-sensitive data, on-prem and on-the-cloud. Designed for tera-to-petascale data, the GPU-powered platform enables enterprises to rapidly ingest and analyze their growing data – providing full-picture visibility for improved customer experience, operational efficiency, and previously unobtainable business insights.

Impact

Challenges

SqreamDB is a distributed database and it is quite complex to model the costs of query execution.
Need of an industry-standard SQL parser/validator which can be enhanced to keep up the pace with the competitors.
The backend engine needs to execute queries efficiently on huge volumes of data.
A particular complex SQL can generate many query plans so the cost calcutaion of each and every query plan is expensive, hence there should be a balance between the query cost and cost of selecting the best plan.

Approach

Sqream DB has a Rule-based query optimizer, but it is impossible to correctly optimize the query without considering the data, data skewness, workloads, and the variety of data. The rule-based optimizer applies different rules to optimize the query, so to cover all the plans, more rules are needed, which results in many complex rules as well as we have to add hints to make those rules work. Still, the required query efficiency can not be achieved for complex queries. Hence, the idea of Cost-Based Optimization looks lucrative, but it is a hard problem to solve in distributed databases. Since the need was urgent, we narrowed it down to Calcite which has a lot of inbuilt rules, as well as a cost-based planner known as Volcano. We wanted to leverage as much functionality from Volcano to enable us to build a Cost-Based Optimizer for Sqream DB. Calcite offers you a framework to build your own rules as well the framework is highly customizable to support all the needs that we have.

Solution

The primary goals for us were to

To achieve these goals the following High-Level Architecture was proposed to build a highly scalable cost optimization service as described in the diagram below.To achieve these goals the following High-Level Architecture was proposed to build a highly scalable cost optimization service as described in the diagram below.

01
Parsing

The whole process starts with parsing. A query, to be understood by the database engine, must first be parsed using an SQL parser, which takes a string of characters and tries to deduce its syntactic structure in the form of a parse tree. It uses a set of syntax rules called language grammar, which defines how an SQL query must look to be considered valid and acceptable to the query engine.

For instance, a rule for parsing SQL SELECT statements might look like this:
select:

<SELECT> expressionList
[<FROM> table]
[<WHERE> condition]
[<GROUP> <BY> groupingList]
[<HAVING> condition]

It declares that a SELECT statement must start with the keyword SELECT, followed by a list of fields and/or expressions to select (could be only one), and optionally any, or all, of the following: a FROM clause specifying the source of data from which to do a select (i.e. a table or a subquery); a WHERE clause filtering selected rows based on a Boolean condition; a GROUP BY clause aggregating rows together based on some keys; and a HAVING clause filtering out some groups if a condition isn't satisfied.

02
Validation

The purpose of validation is to test the semantic correctness of the query, i.e. whether a query written by a user makes sense or not. Validation ensures every object mentioned in the query exists and every operation specified can be done. If it isn't so, the process halts with an explanatory error message.

Another important purpose of validation is to correctly identify to which exact object (field, table, function) any identifier named in the query really refers and also to assign correct data types to every field, row, or expression that therein occurs.

So how we do the validations:

03
Relational algebra

Relational algebra deals with abstract transformations over sets of data, such as.

Conversion to a Relational Tree:

AST(abstract syntax tree) is not convenient for query optimization because the semantics of its nodes are too complicated. It is much more convenient to perform query optimization on a tree of relational operators, defined by the RelNode subclasses, such as Scan, Project, Filter, Join, etc. We use SqlToRelConverter, another monstrous class of Apache Calcite, to convert the original AST into a relational tree.

04
Query optimization

Query optimization is a process in which the original query is transformed according to a set of rules into some other, equivalent query that is faster to run and/or requires less computational resources for its execution.

Optimization is a process of conversion of a relation tree to another relational tree. You may do rule-based optimization with heuristic or cost-based planners, HepPlanner and VolcanoPlanner respectively. You may also do any manual rewrite of the tree without a rule. Apache Calcite comes with several powerful rewriting tools, such as RelDecorrelator and RelFieldTrimme.

Results

Improved DB performance by at least 30% for all queries by providing much-optimized execution plans. For complex queries, the optimization was more than 300% as CBO removed all the redundant and repetitive operations from the SQL
Reduced the overall build and deployment time by more than 50% by separating these concerns from the DB engine
Enabled integration into the Scala ecosystem by writing Sbt tasks and plugins. This allowed them to leverage tools and frameworks to achieve high concurrency and fault tolerance

The result was a highly secured, performant, extensible compiler frontend service that successfully optimized all the queries and generate the corresponding physical tree.

I want to thank Knoldus for their great contribution in building the CBO for Sqream enabling us to successfully integrate calcite into our system.

Gill Cohen (Sqream Project Manager)

Explore latest Case Studies

Osttra is equipped with complete knowledge of new applications with… …
OSTTRA Read More »
Knoldus helped Amway management with robust, highly available, secure, and… …
Amway Read More »
The performance and user experience benefits have translated into amazing… …
Verizon Read More »

Services

High performance systems

Cloud Engineering

Data Engineering, Strategy and Analytics

Intelligence Driven Decisioning - AI/ML

Architecture Strategy, Audit & Academy

Accelerators

Platforms

KDP

KDSP

Products

Premon

Studio9

Tech Hub

Industries

Travel

Media and Publishing

Healthcare

Retail

Consumer Internet

Finance

Hi-tech & IoT

Insights

Case Studies

OS contributions

Knolx

Blogs

Books

Community

Resources

Webinars

SQream

Build a highly scalable cost optimization service

Headquarters

Industry

Technology Used

Impact

Challenges

Approach

Solution

01Parsing

02Validation

03Relational algebra

Conversion to a Relational Tree:

04Query optimization

Results

Explore latest Case Studies

Ready to gain a competitive advantage with Future Ready Emerging Technologies?

OUR OFFERINGS

COMPANY

LEARN

CONNECT

Follow us Here:

Awards & Recognitions

Partners

© 2023 Knoldus, Inc. All Rights Reserved.

Privacy Policy | Sitemap

01
Parsing

02
Validation

03
Relational algebra

04
Query optimization