Using Excel Ranges as SQL Variables

TWO APPROACHES

There are two ways to use an Excel cell (or range of cells) as an input variable for an in-cell SQL statement.

  1. Text-based:

    Enter SQL into a cell, as text, and use a $ prefix to reference named ranges or structured references:

    select * from $inp.tbl
    where sym = $inp.symbol
  2. Formula-based:

    Use plain old Excel formulas:

    ="select * from " & A1 & "
    where sym = '" & A2 & "'"

Don't confuse the dollar sign prefix, $, used in the text-based approach ($inp.tbl, $inp.symbol) with absolute cell references in an Excel formula, e.g., $A$1. Optimal(xl) borrows the $ prefix syntax from Snowflake to indicate SQL variables that are named ranges or structured references. This allows you to copy SQL back and forth between Snowflake's Web UI and an Excel cell without modifications, as described below.

Our customers typically favor the text-based approach as this allows them to use Snowflake's Web UI to write and edit SQL, and then copy it to Excel, rather than typing SQL directly into a cell or laboriously building SQL statements with concatenation formulas.

Both the formula-based and text-based approach benefit from optimal(xl)'s SQL inference engine which simplifies your SQL by automatically resolving unqualified table names into fully-qualified ones and adding omitted warehouse, schema, and database details — as structured metadata — to the in-cell comment that accompanies your in-cell SQL.

Unlike formula-based SQL statements, the text-based approach also benefits from automatic SQL-formatting of $‑prefixed range variables, as described below.


SINGLE-CELL SQL VARIABLES

Suppose you have an input cell named start_date:

6/16/2024

and text-based in-cell SQL with a where clause that references that named range:

select sum(pnl) from perf
where dt >= $start_date

Optimal(xl) will format the input date variable into an ISO-8601 literal for you when passing the SQL to Snowflake:

select sum(pnl) from perf
where dt >= '2024-06-16'

In contrast, with a formula-based approach the formula itself must properly format cell inputs into SQL literals, i.e.,

="select sum(pnl) from perf where
dt >= " & TEXT(start_date,"'yyyy-mm-dd'")

MULTI-CELL SQL VARIABLES

Text-based SQL statements also make it easy to use a range of cells with a where clause in operator. For example, if you have a range of stock symbols named inp.syms:

SNOW
MSFT
AMZN

your in-cell SQL can simply be

select sym, sum(pnl) from perf
where sym in $inp.syms
group by sym

When passing this SQL to Snowflake, the in operation will get resolved like this:

select sym, sum(pnl) from perf
where sym in ('SNOW','MSFT','AMZN')
group by sym

When building this SQL using the formula-based approach your in-cell SQL formula would have to construct the in operation manually, i.e.,

="select sym, sum(pnl) from perf
where sym in ('" & TEXTJOIN("','",
TRUE,UNIQUE(inp.syms)) & "')"

Note that because of optimal(xl)'s SQL inference engine and client-side SQL parser, it knows the data types of your text-based SQL where constraints and will format cell inputs into the correct SQL literal syntax depending on whether they are TEXT, NUMBER, DATETIME, etc.

When your text-based query's multi-cell input is from another Excel Table, instead of defining range names you can use $-prefixed structured references like this:

select sym, sum(pnl) from perf
where sym in $Table1[tickers]
group by sym

If the Table1 Excel Table comes from another optimal(xl) in-cell SQL query, optimal(xl) will track this dependency and ensure that the Table1 query is refreshed first, before the dependent query constructs its own SQL from the Table1[tickers] column.


TEXT-BASED A1-STYLE REFERENCES NOT ALLOWED

With the text-based approach, you can not use cell address literals to reference input variables. I.e., this SQL will not work:

select * from A1
where sym = B1

Even with the $-prefix syntax, this SQL will also not work:

select * from $A1
where sym = $B1

Instead you must give the input cells names and use the $-prefix to reference them:

select * from $inp.tbl
where sym = $inp.symbol

The ability to use cell address literals (A1) rather than named ranges or structured references is a common request from our customers. Unfortunately we can't allow this because doing so would open up a dangerous avenue for silent errors. For example, if you had the in-cell query:

select * from perf where dt = A5

and you inserted and/or deleted a row on or above row 5, the A5 reference in your SQL would not update — it's just text.

It will remain

select * from perf where dt = A5

and therefore now refer to the wrong cell. In the worse case, if that wrong cell also happens to contain a date, your data refresh will appear to work but bring in a different dataset than you expect.

This is why, for text-based SQL, cell input parameters must be named. The names will update when you insert or delete rows or columns, i.e., when you name cell A5 as start_date inserting a row above A5 will repoint the name start_date towards cell A6 as you would expect.

Of course, with formula-based SQL, like any Excel formula, cell address literals will update themselves when rows or columns are inserted or deleted. So when you insert a row at the very top of a sheet, the SQL formula

="select * from " & A1 & "
where sym = '" & A2 & "'"

will become

="select * from " & A2 & "
where sym = '" & A3 & "'"