Saturday, July 11, 2020

MariaDB Temporal Data Tables

MariaDB now supports temporal data tables in the form of system-versioning tables (allowing you to query and operate on historic data), application-time periods (allow you to query and operate on a temporal range of data), and bitemporal tables (which combine both system-versioning and application-time periods).

System-Versioning Tables


System-versioned tables store the history of all changes, not only data which is currently valid. This allows data analysis for any point in time, auditing of changes and comparison of data from different points in time. Typical uses cases are:

  • Forensic analysis & legal requirements to store data for N years.
  • Data analytics (retrospective, trends etc.), e.g. to get your staff information as of one year ago.
  • Point-in-time recovery - recover a table state as of particular point in time.

System-versioned tables were first introduced in the SQL:2011 standard.

Creating a System-Versioned Table


The CREATE TABLE syntax has been extended to permit creating a system-versioned table. To be system-versioned, according to SQL:2011, a table must have two generated columns, a period, and a special table option clause:

CREATE TABLE t(
   x INT,
   start_timestamp TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
   end_timestamp TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
   PERIOD FOR SYSTEM_TIME(start_timestamp, end_timestamp)
) WITH SYSTEM VERSIONING;

In MariaDB one can also use a simplified syntax:

CREATE TABLE t (
   x INT
) WITH SYSTEM VERSIONING;

In the latter case no extra columns will be created and they won't clutter the output of, say, SELECT * FROM t. The versioning information will still be stored, and it can be accessed via the pseudo-columns ROW_START and ROW_END:

SELECT x, ROW_START, ROW_END FROM t;

Adding or Removing System Versioning To/From a Table

An existing table can be altered to enable system versioning for it.

CREATE TABLE t(
  x INT
);
ALTER TABLE t ADD SYSTEM VERSIONING;
SHOW CREATE TABLE t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `x` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING

Similarly, system versioning can be removed from a table:

ALTER TABLE t DROP SYSTEM VERSIONING;
SHOW CREATE TABLE t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `x` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

One can also add system versioning with all columns created explicitly:

ALTER TABLE t ADD COLUMN ts TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
              ADD COLUMN te TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
              ADD PERIOD FOR SYSTEM_TIME(ts, te),
              ADD SYSTEM VERSIONING;
SHOW CREATE TABLE t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `x` int(11) DEFAULT NULL,
  `ts` timestamp(6) GENERATED ALWAYS AS ROW START,
  `te` timestamp(6) GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (`ts`, `te`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING

Querying Historical Data

SELECT

To query the historical data one uses the clause FOR SYSTEM_TIME directly after the table name (before the table alias, if any). SQL:2011 provides three syntactic extensions:

  • AS OF is used to see the table as it was at a specific point in time in the past:
SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP'2016-10-09 08:07:06';
  • BETWEEN start AND end will show all rows that were visible at any point between two specified points in time. It works inclusively, a row visible exactly at start or exactly at end will be shown too.
SELECT * FROM t FOR SYSTEM_TIME BETWEEN (NOW() - INTERVAL 1 YEAR) AND NOW();
  • FROM start TO end will also show all rows that were visible at any point between two specified points in time, including start, but excluding end.
SELECT * FROM t FOR SYSTEM_TIME FROM '2016-01-01 00:00:00' TO '2017-01-01 00:00:00';

Additionally MariaDB implements a non-standard extension:

  • ALL will show all rows, historical and current.
SELECT * FROM t FOR SYSTEM_TIME ALL;

If the FOR SYSTEM_TIME clause is not used, the table will show the current data, as if one had specified FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP.

Views and Subqueries

When a system-versioned tables is used in a view or in a subquery in the from clause, FOR SYSTEM_TIME can be used directly in the view or subquery body, or (non-standard) applied to the whole view when it's being used in a SELECT:

CREATE VIEW v1 AS SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP'2016-10-09 08:07:06';

Or

CREATE VIEW v1 AS SELECT * FROM t;
SELECT * FROM v1 FOR SYSTEM_TIME AS OF TIMESTAMP'2016-10-09 08:07:06';

Use in Replication and Binary Logs

Tables that use system-versioning implicitly add the row_end column to the Primary Key. While this is generally not an issue for most use cases, it can lead to problems when re-applying write statements from the binary log or in replication environments, where a master retries an SQL statement on the slave.

Specifically, these writes include a value on the row_end column containing the timestamp from when the write was initially made. The re-occurrence of the Primary Key with the old system-versioning columns raises an error due to the duplication.

To mitigate this with MariaDB Replication, set the secure_timestamp system variable to YES on the slave. When set, the slave uses its own system clock when applying to the row log, meaning that the master can retry as many times as needed without causing a conflict. The retries generate new historical rows with new values for the row_start and row_end columns.

Transaction-Precise History in InnoDB

A point in time when a row was inserted or deleted does not necessarily mean that a change became visible at the same moment. With transactional tables, a row might have been inserted in a long transaction, and became visible hours after it was inserted.

For some applications — for example, when doing data analytics on one-year-old data — this distinction does not matter much. For others — forensic analysis — it might be crucial.

MariaDB supports transaction-precise history (only for the InnoDB storage engine) that allows seeing the data exactly as it would've been seen by a new connection doing a SELECT at the specified point in time — rows inserted before that point, but committed after will not be shown.

To use transaction-precise history, InnoDB needs to remember not timestamps, but transaction identifier per row. This is done by creating generated columns as BIGINT UNSIGNED, not TIMESTAMP(6):

CREATE TABLE t(
   x INT,
   start_trxid BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
   end_trxid BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
   PERIOD FOR SYSTEM_TIME(start_trxid, end_trxid)
) WITH SYSTEM VERSIONING;

These columns must be specified explicitly, but they can be made INVISIBLE to avoid cluttering SELECT * output.

When one uses transaction-precise history, one can optionally use transaction identifiers in the FOR SYSTEM_TIME clause:

SELECT * FROM t FOR SYSTEM_TIME AS OF TRANSACTION 12345;

This will show the data, exactly as it was seen by the transaction with the identifier 12345.

Storing the History Separately

When the history is stored together with the current data, it increases the size of the table, so current data queries — table scans and index searches — will take more time, because they will need to skip over historical data. If most queries on that table use only current data, it might make sense to store the history separately, to reduce the overhead from versioning.

This is done by partitioning the table by SYSTEM_TIME. Because of partition pruning optimization, all current data queries will only access one partition, the one that stores current data.

This example shows how to create such a partitioned table:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME (
    PARTITION p_hist HISTORY,
    PARTITION p_cur CURRENT
  );

In this example all history will be stored in the partition p_hist while all current data will be in the partition p_cur. The table must have exactly one current partition and at least one historical partition.

Partitioning by SYSTEM_TIME also supports automatic partition rotation. One can rotate historical partitions by time or by size. This example shows how to rotate partitions by size:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME LIMIT 100000 (
    PARTITION p0 HISTORY,
    PARTITION p1 HISTORY,
    PARTITION pcur CURRENT
  );

MariaDB will start writing history rows into partition p0, and when it reaches a size of 100000 rows, MariaDB will switch to partition p1. There are only two historical partitions, so when p1 overflows, MariaDB will issue a warning, but will continue writing into it.

Similarly, one can rotate partitions by time:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK (
    PARTITION p0 HISTORY,
    PARTITION p1 HISTORY,
    PARTITION p2 HISTORY,
    PARTITION pcur CURRENT
  );

This means that the history for the first week after the table was created will be stored in p0. The history for the second week — in p1, and all later history will go into p2. One can see the exact rotation time for each partition in the INFORMATION_SCHEMA.PARTITIONS table.

It is possible to combine partitioning by SYSTEM_TIME and subpartitions:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME
    SUBPARTITION BY KEY (x)
    SUBPARTITIONS 4 (
    PARTITION ph HISTORY,
    PARTITION pc CURRENT
  );

Removing Old History

Because it stores all the history, a system-versioned table might grow very large over time. There are many options to trim down the space and remove the old history.

One can completely drop the versioning from the table and add it back again, this will delete all the history:

ALTER TABLE t DROP SYSTEM VERSIONING;
ALTER TABLE t ADD SYSTEM VERSIONING;

It might be a rather time-consuming operation, though, as the table will need to be rebuilt, possibly twice (depending on the storage engine).

Another option would be to use partitioning and drop some of historical partitions:

ALTER TABLE t DROP PARTITION p0;

Note, that one cannot drop a current partition or the only historical partition.

And the third option; one can use a variant of the DELETE statement to prune the history:

DELETE HISTORY FROM t;

or only old history up to a specific point in time:

DELETE HISTORY FROM t BEFORE SYSTEM_TIME '2016-10-09 08:07:06';

or to a specific transaction (with BEFORE SYSTEM_TIME TRANSACTION xxx).

To protect the integrity of the history, this statement requires a special DELETE HISTORY privilege.

The TRUNCATE TABLE statement drops all historical records from a system-versioned-table.

Excluding Columns From Versioning

Another MariaDB extension allows to version only a subset of columns in a table. This is useful, for example, if you have a table with user information that should be versioned, but one column is, let's say, a login counter that is incremented often and is not interesting to version. Such a column can be excluded from versioning by declaring it WITHOUT VERSIONING

CREATE TABLE t (
   x INT,
   y INT WITHOUT SYSTEM VERSIONING
) WITH SYSTEM VERSIONING;

A column can also be declared WITH VERSIONING, that will automatically make the table versioned. The statement below is equivalent to the one above:

CREATE TABLE t (
   x INT WITH SYSTEM VERSIONING,
   y INT
);

Application-time Periods

MariaDB starting with 10.4.3

Support for application-time period-versioning was added in MariaDB 10.4.3.

Extending system-versioning tables, MariaDB 10.4 supports application-time period tables. Time periods are defined by a range between two temporal columns. The columns must be of the same temporal data type, such as DATE or TIMESTAMP, (though not TIME, which is currently unsupported), and of the same width.

Using time periods implicitly defines the two columns as NOT NULL. It also adds a constraint to check whether the first value is less than the second value. The constraint is invisible to SHOW CREATE TABLE statements. The name of this constraint is prefixed by the time period name, to avoid conflict with other constraints.

Creating Tables with Time Periods

To create a table with a time period, use a CREATE TABLE statement with the PERIOD table option.

CREATE TABLE test.t1(
   name VARCHAR(50), 
   date_1 DATE,
   date_2 DATE,
   PERIOD FOR date_period(date_1, date_2));

This creates a table with a time_period period and populates the table with some basic temporal values.

Examples are available in the MariaDB Server source code, at mysql-test/suite/period/r/create.result.

Adding and Removing Time Periods

The ALTER TABLE statement now supports syntax for adding and removing time periods from a table. To add a period, use the ADD PERIOD clause.

CREATE TABLE test.t2 (
   id INT PRIMARY KEY,
   time_1 TIMESTAMP(6),
   time_2 TIMESTAMP(6)
);


ALTER TABLE test.t2 ADD PERIOD FOR time_period(time_1, time_2);

To remove a period, use the DROP PERIOD clause:

ALTER TABLE test.t2 DROP PERIOD FOR time_period;

Both ADD PERIOD and DROP PERIOD clauses include an option to handle whether the period already exists:

ALTER TABLE test.t2 ADD PERIOD IF NOT EXISTS FOR time_period(time_1, time_2);

ALTER TABLE test.t2 DROP PERIOD IF EXISTS FOR time_period;

Deletion by Portion

In addition to updates, you can also remove rows that fall within certain time periods.

When MariaDB executes a DELETE FOR PORTION statement, it removes the row:

  • When the row period falls completely within the delete period, it removes the row.
  • When the row period overlaps the delete period, it shrinks the row, removing the overlap from the first or second row period value.
  • When the delete period falls completely within the row period, it splits the row into two rows. The first row runs from the starting row period to the starting delete period. The second runs from the ending delete period to the ending row period.

To test this, first populate the table with some data to operate on:

TRUNCATE test.t1;

INSERT INTO test.t1 (date_1, date_2)
VALUES ('1999-01-01', '2018-12-12'),
       ('1999-01-01', '2017-01-01'),
       ('2017-01-01', '2019-01-01'),
       ('1998-01-01', '2018-12-12'),
       ('1997-01-01', '2015-01-01'),
       ('2016-01-01', '2020-01-01'),
       ('2010-01-01', '2015-01-01');


SELECT * FROM test.t1;
+------+------------+------------+
| name | date_1     | date_2     |
+------+------------+------------+
| NULL | 1999-01-01 | 2018-12-12 |
| NULL | 1999-01-01 | 2017-01-01 |
| NULL | 2017-01-01 | 2019-01-01 |
| NULL | 1998-01-01 | 2018-12-12 |
| NULL | 1997-01-01 | 2015-01-01 |
| NULL | 2016-01-01 | 2020-01-01 |
| NULL | 2010-01-01 | 2015-01-01 |
+----+------+------------+------------+

Then, run the DELETE FOR PORTION statement:

DELETE FROM test.t1
FOR PORTION OF date_period
    FROM '2001-01-01' TO '2018-01-01';


SELECT * FROM test.t1;
+------+------------+------------+
| name | date_1     | date_2     |
+------+------------+------------+
| NULL | 1999-01-01 | 2001-01-01 |
| NULL | 1999-01-01 | 2001-01-01 |
| NULL | 2018-01-01 | 2019-01-01 |
| NULL | 1998-01-01 | 2001-01-01 |
| NULL | 1997-01-01 | 2001-01-01 |
| NULL | 2018-01-01 | 2020-01-01 |
| NULL | 2018-01-01 | 2018-12-12 |
| NULL | 2018-01-01 | 2018-12-12 |
+------+------------+------------+

Here, second row, with values ranging from 1999 to 2017 has been shrunk to 1999 to 2001. The first row, with a range from 1999 to 2018 split into two rows 1999 to 2000 and 2018-01 to 2018-12. The fifth row, ranging from 2010 to 2015 was deleted entirely.

The DELETE FOR PORTION statement has the following restrictions

  • The FROM...TO clause must be constant
  • Multi-delete is now supported

If there are DELETE or INSERT triggers, it works as following: any matched row is deleted, and then one or two rows are inserted. If the record is deleted completely, nothing is inserted.

Updating by Portion

The UPDATE syntax now supports UPDATE FOR PORTION, which modifies rows based on their occurrence in a range:

To test it, first populate the table with some data:

TRUNCATE test.t1;

INSERT INTO test.t1 (date_1, date_2)
VALUES ('1999-01-01', '2018-12-12'),
       ('1999-01-01', '2017-01-01'),
       ('2017-01-01', '2019-01-01'),
       ('1998-01-01', '2018-12-12'),
       ('1997-01-01', '2015-01-01'),
       ('2016-01-01', '2020-01-01'),
       ('2010-01-01', '2015-01-01');

Then run the update:

UPDATE test.t1 
FOR PORTION OF date_period
    FROM '2000-01-01' TO '2018-01-01'
SET name = "Generation Z";

SELECT * FROM test.t1;
+--------------+------------+------------+
 name         | date_1     | date_2     |
+--------------+------------+------------+
| Generation Z | 2000-01-01 | 2018-01-01 |
| Generation Z | 2000-01-01 | 2017-01-01 |
| Generation Z | 2017-01-01 | 2018-01-01 |
| Generation Z | 2000-01-01 | 2018-01-01 |
| Generation Z | 2000-01-01 | 2015-01-01 |
| Generation Z | 2016-01-01 | 2018-01-01 |
| Generation Z | 2010-01-01 | 2015-01-01 |
| NULL         | 1999-01-01 | 2000-01-01 |
| NULL         | 2018-01-01 | 2018-12-12 |
| NULL         | 1999-01-01 | 2000-01-01 |
| NULL         | 2018-01-01 | 2019-01-01 |
| NULL         | 1998-01-01 | 2000-01-01 |
| NULL         | 2018-01-01 | 2018-12-12 |
| NULL         | 1997-01-01 | 2000-01-01 |
| NULL         | 2018-01-01 | 2020-01-01 |
+--------------+------------+------------+

The UPDATE FOR PORTION statement has the following limitations:

  • Operation cannot modify the two temporal columns used by the time period
  • Operation cannot reference period values in the SET expression
  • FROM...TO expressions must be constant
  • Multi-delete is not supported

Using Bitemporal Tables

MariaDB 10.4 also supports bitemporal tables. These are tables that use versioning both at the system and application-time period levels. To create a bitemporal table, use

CREATE TABLE test.t3 (
   date_1 DATE,
   date_2 DATE,
   row_start TIMESTAMP(6) AS ROW START INVISIBLE,
   row_end TIMESTAMP(6) AS ROW END INVISIBLE,
   PERIOD FOR application_time(date_1, date_2),
   PERIOD FOR system_time(row_start, row_end))
WITH SYSTEM VERSIONING;

Note that, while system_time here is also a time period, it cannot be used in DELETE FOR PORTION or UPDATE FOR PORTION statements.

DELETE FROM test.t3 
FOR PORTION OF system_time 
    FROM '2000-01-01' TO '2018-01-01';
ERROR 42000: You have an error in your SQL syntax; check the manual that corresponds 
  to your MariaDB server version for the right syntax to use near
  'of system_time from '2000-01-01' to '2018-01-01'' at line 1

System Variables

There are a number of system variables related to system-versioned tables:

system_versioning_alter_history

  • Description: SQL:2011 does not allow ALTER TABLE on system-versioned tables. When this variable is set to ERROR, an attempt to alter a system-versioned table will result in an error. When this variable is set to KEEP, ALTER TABLE will be allowed, but the history will become incorrect — querying historical data will show the new table structure. This mode is still useful, for example, when adding new columns to a table.
  • Commandline: --system-versioning-alter-history=value
  • Scope: Global, Session
  • Dynamic: Yes
  • Type: Enum
  • Default Value: ERROR
  • Valid Values: ERROR, KEEP
  • Introduced: MariaDB 10.3.4

system_versioning_asof

  • Description: If set to a specific timestamp value, an implicit FOR SYSTEM_TIME AS OF clause will be applied to all queries. This is useful if one wants to do many queries for history at the specific point in time. Set it to DEFAULT to restore the default behavior. Has no effect on DML, so queries such as INSERT .. SELECT and REPLACE .. SELECT need to state AS OF explicitly.
  • Commandline: None
  • Scope: Global, Session
  • Dynamic: Yes
  • Type: Varchar
  • Default Value: DEFAULT
  • Introduced: MariaDB 10.3.4

system_versioning_innodb_algorithm_simple

  • Description: Never fully implemented and removed in the following release.
  • Commandline: --system-versioning-innodb-algorithm-simple[={0|1}]
  • Scope: Global, Session
  • Dynamic: Yes
  • Type: Boolean
  • Default Value: ON
  • Introduced: MariaDB 10.3.4
  • Removed: MariaDB 10.3.5

Limitations

  • Versioning clauses can not be applied to generated (virtual and persistent) columns.
  • mysqldump does not read historical rows from versioned tables, and so historical data will not be backed up. Also, a restore of the timestamps would not be possible as they cannot be defined by an insert/a user.

See Also



from Hacker News https://ift.tt/2CrqZl7

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.