Glossary / Notes


DATETIME VALUES BEYOND EXCEL'S LIMITS

Excel's in-cell DateTime type does not support time zone information nor fractional seconds beyond milliseconds.

By default, Snowflake DATETIME values that have time zone information or fractional seconds that exceed Excel's millisecond limit are rendered on-sheet with Excel's no-time-zone, lower precision DateTime type.

For example, a Snowflake DATETIME value is truncated on‑sheet like so:

2023‑03‑28T16:02:47.697233‑07:00
2023‑03‑28T16:02:47.697

And also:

Tue, 28 Mar 2023 17:55:15 GMT
Tue, 28 Mar 2023 17:55:15

Dates before Excel's cell-entry epoch (1900‑01‑01, or 1904‑01‑01 when using Excel's alternative 1904 date system) are rendered as #VALUE! errors.

This default behavior can be modified by-column, by-query, by-workbook or globally via the optimal(xl) menu:

optimal(xl)SettingsData type defaults

If you would like to see the complete DATETIME information in Excel, as a string, without visiting your settings, simply wrap the column in your SQL with Snowflake's to_varchar function and prefix the date/time format with an escaped apostrophe. This will land your data on-sheet as text.

For textual date/time values that are imported into Snowflake as TEXT or VARIANT columns, optimal(xl) recognizes more than 12000 date/time formats and cautiously converts them to on-sheet dates for you. Read more →


EXCEL NAMED RANGES

In an Excel spreadsheet, a cell or group of cells can be given a name which can then be used by formulas in place of the familiar A1 or R1C1 notation. E.g.,=model_start_date.

If you are new to the concept of named ranges, you can learn about their basic mechanics from Microsoft's documentation.

As an aside, there has been a long-running debate over the use of named ranges in Excel. The debate often erupts in the investment banking industry, where every year freshly minted MBA's begin New Banker Training and immediately become aghast when the trainers instruct them to never, ever, use named ranges in Excel formulas.

Not only have the trainees been explicitly taught to use named ranges in their MBA programs, but anyone can plainly see that =enterprise_value/ebitda is better in every way than =C13/C47. More expressive, more readable, easier to spot errors, etc.

The trainers argue that in an MBA program you build your models, submit them, and move on to the next assignment, so you don't get a view of the longer-term problems that arise from named ranges. What does =ebitda actually tell you? Is that the last quarter's actual number? Is it the current estimate of the next fiscal quarter? Is it a 12-month blended forward estimate?

And as you try to resolve those questions with =ebitda_est_next_fiscal_qtr you often end up with two more problems: an unambiguous name for you can unambiguously mean something else to someone else and as you update your model over time, if you forget to update your named range references (and you will forget), you have formulas that look right, but are wrong, e.g., =ebitda_est_next_fiscal_qtr now refers to an actual reported ebitda number, not next quarter's estimate.

C47, while not telling you much at all, is unambiguous. It doesn't look right or wrong. It can't mislead you. If you want to know what it is, go look at cell C47.

The new MBA's respond: "Wait a second, C47 could now point to the wrong data as well, and it could be mislabeled with a stale row or column header."

And the trainers reply, "Exactly, and when you go to see what C47 represents, rather than assuming it from the given name, you will have the context to recognize those errors and fix them."

Regardless of the debate over whether or not to use named ranges in formulas, named ranges are essential for both VBA code and optimal(xl)'s text-based in-cell SQL.

To write VBA like this invites disaster:

  ThisWorkbook.Worksheets("inp").Range("D7").Value = "YTD"

Unlike using a cell address literal in a formula, i.e., =D7, which will automatically update itself when you insert or delete rows or columns that move that cell, cell address literals within VBA code do not update themselves.

In this example, if you insert or delete a row on or above row 7, or a column on or before column D, your VBA code will now reference the wrong cell.

This is why it is critical to define names for any cells that will be referenced from VBA code:

  ThisWorkbook.Worksheets("inp").Range("period").Value = "YTD"

Now you can insert or delete rows/columns/cells and your period named range will adjust to the correct cell automatically. And if you delete cell D7 your VBA code will throw a noisy error rather than silently pointing to the incorrect cell.

This same reasoning explains why optimal(xl) requires text-based in-cell SQL to be parameterized with range names (or structured references) and not cell address literals. You can read more about that here →


EXCEL STRUCTURED REFERENCES

To understand structured references it's useful to understand the problem they solve.

The Problem:

In the old days, when you had variable-sized data coming into a sheet — from a query and/or a VBA macro — there were two common methods for referencing that data in on-sheet formulas.

  1. You could oversize formulas to reference a much larger range than necessary. I.e., if you expected a maximum of a couple hundred rows, you would write formulas that could handle much more than that:

    =SUM(B8:B1000)

    That way if the data grew or shrunk, your oversized formulas would still capture the full scope of data.

    There are several problems with this approach.

    1. If you change your model and bring in more data than expected — perhaps you adjust the periodicity from quarterly to daily or extend a lookback period from 5 years to 20 years — your formulas may silently miss the expanded data range if you are not careful to extend them.

    2. Referencing the last value in a column, a common need, becomes cumbersome and is often achieved by using performance-killing volatile functions like INDIRECT and OFFSET.

    3. Because your formula are referencing a much larger range of cells than necessary, Excel has to do unnecessary work during recalculations which will slow down your model, especially as it grows.

  2. You could use VBA to define precisely-sized names for your data columns as you write the dataset to a range.

    This solves all the problems of the former approach.

    1. Formulas like =SUM(B8:B1000) have their oversized address literal replaced with a VBA-defined named range, =SUM(col.pnl), such that they always reference the exact data range whether the dataset grows or shrinks.

    2. Referencing the last value in a column becomes simple:

      =INDEX(col.dates,ROWS(col.dates))

    3. And because formulas reference named ranges that resize to match the exact number of rows in the incoming dataset, Excel wastes no computation time processing empty buffer rows.

    The downside of this approach is that it requires considerable work and maintenance to get right.

    Consider a niave implementations like this:

      Dim rs As ADODB.Recordset
    '...
    rs.CursorLocation = ADODB.CursorLocationEnum.adUseClient
    '...
    Set sh = ThisWorkbook.Worksheets("trades")
    Set r = sh.Range("th.first")
    cntRows = r.Offset(1).CopyFromRecordset(rs)

    For i = 0 To rs.Fields.Count - 1
    sh.Names.Add "col." & rs.Fields(i).Name, r.Offset(1, i).Resize(cntRows)
    Next

    This code is extremely brittle; if you remove or rename columns in your SQL the formulas that referenced those respective named ranges won't error out, they'll now point to the wrong data. And this code will break as soon as you have database column names or aliases that don't conform to Excel's requirements for range name syntax.

    Tackling these issues with code that does a good job sanitizing & managing named ranges is a lot of work.

The Solution:

Option #2 above — using VBA to define names for your data columns — is a good solution but is hard to implement.

With structured references, and the data table object to which they apply, Excel implements this for us.

Instead of oversizing your formulas, =SUM(B8:B1000), or defining your own column names when landing data on sheet, =SUM(col.pnl), you can use Excel's special syntax — based on your column names — to reference individual data columns. E.g.,

Sum all the values in the pnl column:

=SUM(Table1[pnl])

Get the last value in the biz date column:

=INDEX(Table1[biz date],ROWS(Table1[biz date]))

For more details, see Microsoft's documentation for Structured References.


PRO PLAN SUPPORT DETAILS

As a Pro Plan user, you will receive the same support as triage level 3 from the Enterprise Plan support details. I.e., email support requests will typically get a response within 24 business hours. A 'business hour' is a New York City 9AM-5PM hour on any non-US-holiday weekday.


STABLE TEXT

Stable text is in-cell text which is impervious to Excel's aggressive auto-formatter. This is text that will not trigger an auto-format when you double-click the cell or when you put the cell into edit mode {F2} and then exit via the {Enter} key. This is achieved by prefixing, with an apostrophe, any in-cell text that Excel will try to format.

Most Data Analysts, even those using Excel on-the-side as a secondary tool, are aware of Excel's auto-formatting conventions which can often mangle cell data. What is less well known is that common workarounds, like formatting a column as Text, places your on-sheet data in a precarious state.

For example, when you use the Power Query UI to designate a column as 'Text', certain values can be unstable and booby-trapped whereby simply entering and exiting the cell changes the cell value.

Suppose these gene abbreviations have come in from Power Query as a 'Text' column:

SEPT1
MARCH1

Double-clicking these cells will mangle their text by auto-converting them to dates in the current year:

1-Sep
1-Mar

In contrast, optimal(xl) will land these values on sheet as stable text. It checks whether Excel will apply an auto-format, and if so, prevents it by prefixing the in-cell text with Excel's special text-designation character, the apostrophe.

If you're scrolling through your data and inadvertently double-click or enter/exit a cell with an 'investment conviction' score of 8‑9 your data won't transmogrify into 8/9/2024. With optimal(xl), the text you see in Snowflake is the same text you'll get in Excel.


THE EXCEL EPOCH

The Excel Epoch is often considered to be 1899‑12‑30. This is the value that is returned from this zero-date VBA statement:

VBA.Format(VBA.CDate(0),"yyyy-mm-dd")

However, this date can not be entered into a cell.

When you type 0 into a cell and format it as a date you will also not get a valid date, i.e.,

1/0/1900

This is why we refer to Excel's cell-entry epoch to indicate the first date that can actually be entered into a cell: 1900‑01‑01

Confusingly, Excel's epoch can be changed, via 'Options' → 'Advanced', on a per-workbook basis to 1904‑01‑01. Even though this setting is rarely used today, it is surprisingly common to find workbooks with this alternate date system in the wild. This is because the 1900‑01‑01 system has a known bug (it thinks 1900‑02‑29 is a valid date) which long ago motivated some Excel users to adopt the 1904‑01‑01 system. And we all know that Excel workbooks can, within companies, spawn descendants practically indefinitely.

These days, a 1904‑01‑01 workbook is often uncovered when Power Query brings data into the workbook and confoundingly adjusts all DATETIME values by four years. Power Query assumes the 1900‑01‑01 date system is in force regardless of the workbook's settings.

For example, if you have a DATETIME value in a Snowflake table, 2017‑12‑31, that value will land on-sheet as 2022‑01‑01.

This won't happen with optimal(xl); we check your workbook's date system before converting Snowflake DATETIME values into Excel date/times.