Our SQL Inference Engine

WHY?

Our design goal for optimal(xl) was for you to be able to copy a SQL select statement from Snowflake's Web UI into an Excel cell, press {ctrl + shift + Q}, and get your data, without any fuss.

Snowflake's Web UI allows you to set query context, through UI dropdowns or use directives in its SQL editor, e.g.,

use schema
snowflake_sample_data.tpcds_sf10tcl
select ... from catalog_sales

We needed a way to infer this context, so you could simply copy

select ... from catalog_sales

into your Excel sheet and get the data without having to edit your in-cell SQL to include fully-qualified table references, e.g.,

select ... from
snowflake_sample_data
.tpcds_sf10tcl.catalog_sales

Fortunately, because Snowflake makes elaborate metadata highly accessible, query context is typically easy to infer.


HOW IT WORKS

To make optimal(xl) a productive SQL playground, we interpret your in-cell SQL using a heavily-customized adaptation of Microsoft's TSql130Parser SQL parser. This enables us to infer warehouse, database and schema context so that you don't have to state it explicitly or fiddle with connection settings like you would with Power Query or VBA.

To aid our inferences, we use Snowflake metadata queries like show databases. These metadata queries are carefully crafted and cached to consume a negligible amount of compute — and are tagged so that you can easily confirm this to be the case, e.g.,

select * from
table(information_schema.query_history())
where QUERY_TAG = 'oxl_inference_engine'

We never issue metadata queries unless you already have a real query in-flight, i.e., optimal(xl) won't awaken a suspended warehouse by itself or build a metadata cache in the background.

These metadata queries are only necessary the first time your query is run or when you edit your SQL to include a new, unqualified table reference.

If your in-cell SQL is already fully qualified, optimal(xl)'s SQL interpreter will recognize this and skip metadata queries.

Importantly, optimal(xl) never guesses warehouse/database/schema context. If it can't uniquely resolve your unqualified table references into fully-qualified ones, you will be prompted — the first time you run your query — with a list of possibilities from which to select.

We add all inferred information to the structured metadata that accompanies your in-cell SQL as a cell comment.


MORE DETAILS

Are you joining tables that are in different databases or schemas? No problem. Optimal(xl) will resolve this in-cell SQL:

select *
from schema_a.tbl_a as t1
join schema_b.tbl_b as t2
on t1.sym = t2.sym

into

select *
from database_a.schema_a.tbl_1 as t1
join database_b.schema_b.tbl_2 as t2
on t1.sym = t2.sym

and resolve completely unqualified table names as well:

select *
from tbl_1 as t1
join tbl_2 as t2
on t1.sym = t2.sym

into

select *
from database_a.schema_a.tbl_1 as t1
join database_b.schema_b.tbl_2 as t2
on t1.sym = t2.sym

Optimal(xl) will resolve unqualified table references wherever it finds them, whether in a normal FROM clause or a column expression subquery or a WHERE clause subquery.

If you're using SQL variables, in Snowflake's Web UI, to make your table name dynamic, e.g.,

set tbl_1 = 'scenario_net_mv_0'
select * from TABLE($tbl_1)

and are implicitly using the current database and current schema rather than using a fully-qualified table reference, you can set a 'tbl_1' named range in Excel, enter scenario_net_mv_0 into that cell and then copy your select statement into another cell, e.g.,

scenario_net_mv_0
select * from TABLE($tbl_1)

When you run this query (ctrl + shift + Q), optimal(xl) will fully resolve $tbl_1. For example, the SQL issued to Snowflake would look like:

select * from
TABLE(
'database_a.schema_a.scenario_net_mv_0')

As we mentioned above, if we can't uniquely solve for a fully-qualified table reference, we'll pop up a listbox, the first time you run the query, with potential warehouse, database and schema matches for you to select.