There are two ways to use an Excel cell (or range of cells) as an input variable for an in-cell SQL statement.
Text-based:
Enter SQL into a cell, as text, and use a $
prefix to reference named ranges or structured references:
select * from $inp.tbl
where sym = $inp.symbol
Formula-based:
Use plain old Excel formulas:
="select * from " & A1 & "
where sym = '" & A2 & "'"
Don't confuse the dollar sign prefix, $
, used in the text-based approach ($inp.tbl
, $inp.symbol
) with absolute cell references in an Excel formula, e.g., $A$1
. Optimal(xl) borrows the $
prefix syntax from Snowflake to indicate SQL variables that are named ranges or structured references. This allows you to copy SQL back and forth between Snowflake's Web UI and an Excel cell without modifications, as described below.
Our customers typically favor the text-based approach as this allows them to use Snowflake's Web UI to write and edit SQL, and then copy it to Excel, rather than typing SQL directly into a cell or laboriously building SQL statements with concatenation formulas.
Both the formula-based and text-based approach benefit from optimal(xl)'s SQL inference engine which simplifies your SQL by automatically resolving unqualified table names into fully-qualified ones and adding omitted warehouse
, schema
, and database
details — as structured metadata — to the in-cell comment that accompanies your in-cell SQL.
Unlike formula-based SQL statements, the text-based approach also benefits from automatic SQL-formatting of $
‑prefixed range variables, as described below.
Suppose you have an input cell named start_date
:
6/16/2024 |
and text-based in-cell SQL with a where
clause that references that named range:
select sum(pnl) from perf
where dt >= $start_date
Optimal(xl) will format the input date variable into an ISO-8601 literal for you when passing the SQL to Snowflake:
select sum(pnl) from perf
where dt >= '2024-06-16'
In contrast, with a formula-based approach the formula itself must properly format cell inputs into SQL literals, i.e.,
="select sum(pnl) from perf where
dt >= " & TEXT(start_date,"'yyyy-mm-dd'")
Text-based SQL statements also make it easy to use a range of cells with a where
clause in
operator. For example, if you have a range of stock symbols named inp.syms
:
SNOW |
MSFT |
AMZN |
your in-cell SQL can simply be
select sym, sum(pnl) from perf
where sym in $inp.syms
group by sym
When passing this SQL to Snowflake, the in
operation will get resolved like this:
select sym, sum(pnl) from perf
where sym in ('SNOW','MSFT','AMZN')
group by sym
When building this SQL using the formula-based approach your in-cell SQL formula would have to construct the in
operation manually, i.e.,
="select sym, sum(pnl) from perf
where sym in ('" & TEXTJOIN("','",
TRUE,UNIQUE(inp.syms)) & "')"
Note that because of optimal(xl)'s SQL inference engine and client-side SQL parser, it knows the data types of your text-based SQL where
constraints and will format cell inputs into the correct SQL literal syntax depending on whether they are TEXT
, NUMBER
, DATETIME
, etc.
When your text-based query's multi-cell input is from another Excel Table, instead of defining range names you can use $
-prefixed structured references like this:
select sym, sum(pnl) from perf
where sym in $Table1[tickers]
group by sym
If the Table1
Excel Table comes from another optimal(xl) in-cell SQL query, optimal(xl) will track this dependency and ensure that the Table1
query is refreshed first, before the dependent query constructs its own SQL from the Table1[tickers]
column.
With the text-based approach, you can not use cell address literals to reference input variables. I.e., this SQL will not work:
select * from A1
where sym = B1
Even with the $
-prefix syntax, this SQL will also not work:
select * from $A1
where sym = $B1
Instead you must give the input cells names and use the $
-prefix to reference them:
select * from $inp.tbl
where sym = $inp.symbol
The ability to use cell address literals (A1
) rather than named ranges or structured references is a common request from our customers. Unfortunately we can't allow this because doing so would open up a dangerous avenue for silent errors. For example, if you had the in-cell query:
select * from perf where dt = A5
and you inserted and/or deleted a row on or above row 5, the A5
reference in your SQL would not update — it's just text.
It will remain
select * from perf where dt = A5
This is why, for text-based SQL, cell input parameters must be named. The names will update when you insert or delete rows or columns, i.e., when you name cell A5
as start_date
inserting a row above A5
will repoint the name start_date
towards cell A6
as you would expect.
Of course, with formula-based SQL, like any Excel formula, cell address literals will update themselves when rows or columns are inserted or deleted. So when you insert a row at the very top of a sheet, the SQL formula
="select * from " & A1 & "
where sym = '" & A2 & "'"
will become
="select * from " & A2 & "
where sym = '" & A3 & "'"