Snapshot Isolation

Snapshot Isolation is a concurrency control technique used in databases. It was designed to provide a high degree of concurrency while minimizing the chances of conflicts among transactions.

The idea behind snapshot isolation is that each transaction works with a "snapshot" of the database at the start of the transaction. That is, it sees a consistent view of the data as it existed at the point in time when the transaction started. This means that:

  1. Reading: When a transaction reads data, it sees only the data that was in the database when the transaction began, even if other transactions commit changes to that data while the first transaction is still in progress. This eliminates the possibility of non-repeatable reads and dirty reads.

  2. Writing: If a transaction wants to modify some data, it does so on its snapshot. However, before the changes are committed, the system checks to see if any other transactions have modified the same data after the snapshot was taken. If they have (a condition known as a "write conflict"), the commit fails. This ensures that transactions do not overwrite each other's changes.

Snapshot Isolation is a very effective level of isolation for many applications, as it allows a high degree of concurrency while providing each transaction with a consistent view of the database. However, it's important to note that it doesn't completely prevent conflicts. In particular, snapshot isolation is susceptible to the phantom read problem, where new rows can appear in the results of a query if another transaction inserts new rows between the execution of the query and the transaction's commit.

Moreover, snapshot isolation may allow a phenomenon known as "write skew," which can happen when two transactions read the same data, make decisions based on that data, and then both try to update the data. In some databases, a stricter version of snapshot isolation called "Serializable Snapshot Isolation" (SSI) is used to prevent this.

Snapshot Isolation can be implemented in several ways. One common approach is through Multi-Version Concurrency Control (MVCC), where the database maintains multiple versions of each data item to support concurrent transactions.