Thursday, April 1, 2021

Spreadsheet is a software development paradigm

Spreadsheet is a software development paradigm

Spreadsheet tools like Excel are often used to create simple (and not so simple) business applications. This poor man’s IDE, database and BI tool is actually very common in big enterprises, so the price is probably not the only reason to use spreadsheets.

In this post, I will list some pros and cons of using Excel for software development compared to the traditional approach of using a general purpose programming language. I will also introduce CubeWeaver, a novel spreadsheet application which mitigates the drawbacks of modern spreadsheets, like error-proneness and code duplication. CubeWeaver can be used as a standalone multi-user spreadsheet application or as a compiler, which produces fully functional Excel models.

Is it possible to develop software with Excel?

A typical computer program gathers input data from a user, processes it and presents the results back to the user. A typical spreadsheet model has input sheets to gather the data, intermediate sheets which do the calculations and output sheets for presentation of results.

There are Turing machine examples based only on Excel formulas, which means that any algorithm that can be implemented using a computer can also be implemented in Excel. Practical convenience and efficiency of such an implementation may vary from one task to another.

Spreadsheets and programming languages can both be used to create software. Let us compare these two software development approaches:

Advantages of spreadsheets

  1. Intuitive and easy to use: Spreadsheets are designed to be used by non-professionals. An end user can add a small feature to a spreadsheet model in a matter of minutes, instead of opening a change request and waiting for months for the IT department to understand and implement it. This can be a game-changer in many situations.
  2. A transparent and static state is easy to debug: you typically have to use breakpoints and step through a program to find a bug. It is hard, because the state of a program changes after each line of code. The state of a spreadsheet is more transparent and does not change after the computation has finished. Just look for the first cell with a wrong value to find a bug.
  3. Reactive and declarative computations: we just define formulas and enter input values, and the system recomputes dependent cells in the correct order all by itself. This concept, which has recently become popular in UI development, has been the foundation of spreadsheets from the very beginning.
  4. Standardized user interface: many users are just used to spreadsheets. Accountants in particular seem to think in spreadsheet terms. Even custom Java-based software often looks like a bad Excel clone in the accounting departments.

Disadvantages of spreadsheets

  1. Weakly typed and semi-structured data model: Spreadsheets use a three-dimensional array of untyped cells as a data model. This is better than unstructured text, but worse than a normalized and strongly typed relational database structure. A user can always save a string into a numeric cell. There is no distinction between header and value cells. Dependencies between tables are stored in a non-obvious way inside the parameters of the VLOOKUP function (and break if a column is added or removed).
  2. High code duplication: the DRY (Don’t repeat yourself) principle has proven itself well among programmers. Programming languages offer lots of ways (like loops, functions or generics) to reduce code duplication. Spreadsheet formulas are just copied from cell to cell. This sounds convenient and easy at first, but such models become very hard to support with time.
  3. Bad scalability: large models just stop working at some point. The document becomes very slow. A file shared over a network share is locked most of the time when many users try to access it simultaneously.
  4. Lack of interface interactivity: spreadsheets do not let you change the way data is displayed. It is not possible to create pre-programmed logic, triggered by pressing a button, for example.

How does CubeWeaver solve the issues?

Here is a list of innovations that address the disadvantages, while trying to maintain the benefits of spreadsheets:

Multidimensional data model

The multidimensional data model is widely used in Business Intelligence and OLAP systems. In short, the data is stored inside the cells of a multidimensional cube, with edges labeled by names of business objects:

The application does not display the entire multidimensional cube, but a 2D slice corresponding to the selected combination of filters. It works the same way as Excel’s pivot tables.

Relational BI systems often use the snowflake schema to store multidimensional data. Cubes are implemented by fact tables, and edge labels along with some additional data are stored in dimension tables. In CubeWeaver, cubes are called worksheets, and labels on the edges of cubes are called list items.

The multidimensional data model brings structure to a spreadsheet. The list items are in fact standard relational tables with strongly typed columns and referential integrity. CubeWeaver worksheet cells are also strongly typed. A user cannot save a string into a numeric cell.

Stable cell references

A cell of a multidimensional worksheet is uniquely referenced by a combination of list items along each dimension of the worksheet. The value 935, in the image above, is uniquely referenced by: Bikes, 2020, Paris.

In CubeWeaver, list items have an ID and a Caption field. Cell references use the IDs, and the cell reference above might look like this: [PROD: 23, YEAR: 2020, CITY: 24]

Such references are much more stable than ordinal spreadsheet references like F783. A unique new ID is generated when a new list item is created. This means that the ID 23 will still reference “Bikes”, even if a new product is inserted before it.

Relational operations

Spreadsheets make use of the VLOOKUP function to perform simple join operations between tables. As already mentioned before, the VLOOKUP joins are fragile and intransparent.

CubeWeaver supports explicit relations between list items. An item can reference other items via reference attributes. It works like foreign key references between relational tables. A list of cities could have a “COUNTRY” attribute, which points to an item from the country list. A user would see a dropdown for each row of the city list:

Formulas can navigate such relations in both directions using the join() function.

The area of ​​effect of a formula

Each CubeWeaver formula specifies a part of a worksheet where the formula is applied. The area of effect is specified as a set of items for each dimension of the worksheet.

The formula from the example below is applied to the measure “Units sold”, to all years but the year “2021” and to all products. The resulting area is highlighted in blue on the worksheet:

The “area of ​​effect” concept eliminates the necessity to copy formulas. The formulas will be automatically applied to new list items as those are added to a list.

Interactivity through formulas

Apart from the standard cell formats (like numbers, dates and text), CubeWeaver also supports some interactive cell formats, like buttons, checkboxes and dropdowns.

CubeWeaver formulas are also more than just a way to compute a value of a cell. Formulas can be used to dynamically change other properties of a cell, like formatting, color and cell access level (hidden, read-only, editable).

Thus, for example, a checkbox or picklist in one sheet can show, hide, or lock cells in another sheet.

Cell buttons can be used to create fairly complex operations with cell values. Each button has a set of update operations attached to it. Each update consists of the destination cell range and a formula, which is executed one time for each of the destination cells.

Such buttons can be used:

  • To copy values from one list item to another (f.e. from previous to the next year)
  • To increase or decrease a set of cell by a certain percentage
  • To model one-way actions, like marking some sheets as “submitted”

A spreadsheet compiler

Excel is a standard business modelling and reporting format. Most banks, for example, will not accept a business model in any other format than Excel. As a consequence, it’s popularity has also become it’s curse. Unlike programming languages, spreadsheets have not changed much in the last 30 years (the principles by which modern spreadsheets work were introduced by VisiCalc in 1979 and Boeing Calc in 1985).

A Screenshot from VisiCalc 1979

Programming languages could evolve so fast, because a compiler translates them to machine code (like x86), which, in turn, is an industry standard and does not change much over time. Much like a compiler, CubeWeaver can translate a multidimensional computation model into a plain spreadsheet file. A user can specify the order and orientation of the dimensions by creating an export view for each of the worksheets. After that, the .XLSX export dialog can be used to export data and formulas to a standard Excel workbook.

Motivation behind the project

Lotus Improv (released in 1991, discontinued in 1996) was the first multidimensional spreadsheet. Nowadays, applications with a similar concept are called corporate performance management software. The tools I have worked with are relatively complex and expensive, making them not affordable for smaller companies and projects. Apart from that, most of such tools are specialized for certain tasks, like budgeting or financial forecasting. CubeWeaver is an attempt to create a simple, affordable and user friendly general purpose tool for multidimensional modelling. Below is a list of current design goals and non-goals for CubeWeaver:

  • Simplicity of the framework: a small number of powerful modelling concepts.
  • Scalability: support for large and complex models. Fast, in-memory, .NET based computation engine.
  • Long-term maintainability of the model: features, which help to avoid excessive repetitions inside the model. Tracing of formula dependencies (trace precedents/dependents). Ways to add documentation to formulas, like comments or descriptions.
  • General purpose tool: no functionality, which is special only to certain application areas. All special functionality can be modeled inside the system.
  • Extensibility: developers can add features using a REST API, user defined functions and possibly at a later point via add-in interfaces.
  • Availability: the software and documentation is accessible without registration.
  • Documentation: features are well documented. Example models and best practice recommendations are available.

Further information

Additional information about the project is available at https://cubeweaver.com/.



from Hacker News https://ift.tt/2OcU8Xs

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.