Directory structure

This directory structure applies to version 20160314 and above.

Store directory structure

<storeName>
  |
  |--- config.xml
  |--- sha1sum.txt
  |
  |--- sstable
  |       |
  |       |--- p-<period begin VT (0 padded)>_<yyyy-MM-dd>-<a/r>
  |       |       |
  |       |       |--- <level (0 padded)>-<nb (0 padded)>-<version (no version if == 0)>
  |       |       |       |
  |       |       |       |--- blob.bin
  |       |       |       |--- data.bin
  |       |       |       |--- index.bin
  |       |       |       |--- sha1sum.txt
  |       |       .
  |       |       .
  |       .
  |       .
  |       .
  |
  |--- checkpoint
  |       |
  |       |--- <tt (0 padded)>
  |       |       |
  |       |       |--- alive.bin
  |       |       |--- amemtable.bin
  |       |       |--- rmemtable.bin
  |       |       |--- sstable.bin
  |       |       |--- sstablenumbers.txt
  |       |       |--- filelist.txt
  |       |       |--- catalog.xml
  |       |       |--- sha1sum.txt
  |       |       |--- (locked)
  |       .
  |       .
  |       .
  |
  |--- orphaned
          |
          |--- sstable
          |       |
          |       |--- p-<period begin VT (0 padded)>_<yyyyMMdd>-<a/r>
          |       |       |
          |       |       .
          |       .       .
          |       .
          |
          |--- checkpoint
                  |
                  |--- <tt (0 padded)>
                  |       |
                  |       |--- alive.bin
                  |       .
                  .       .
                  .

Sha1sum files

These files make it possible to verify file integrity with standard tools. The embedded database uses them when it starts a store to check for corrupted files.
See Sha1sum (wikipedia) and Md5sum (wikipedia) for file structure information.

When a store is opened, all checkpoint files are checked for corruption. If a corrupted file is found in a checkpoint directory, the checkpoint is discarded and its directory is moved to the orphaned directory.
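
The check described above can also be run from a script. The following is a minimal sketch, not the database's own implementation. It assumes that each sha1sum.txt uses the usual sha1sum output format (40 hexadecimal digits, a space, a mode character, then the file name) and that names are relative to the directory holding sha1sum.txt. The function name verify_sha1sums is hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List


def verify_sha1sums(directory: Path) -> List[str]:
    """Return the names listed in sha1sum.txt that are missing or whose digest differs."""
    corrupted = []
    for line in (directory / "sha1sum.txt").read_text().splitlines():
        if not line.strip():
            continue
        # Assumed line layout: <40 hex digits><space><' ' or '*'><file name>
        expected, name = line[:40], line[42:]
        target = directory / name
        if not target.is_file():
            corrupted.append(name)
            continue
        actual = hashlib.sha1(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            corrupted.append(name)
    return corrupted
```

An empty result means that every listed file is present and matches its recorded digest.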

Sstable level directories

The contents of level directories are read-only.
Flush and merge operations may only create new level directories, complete with their content.
An SSTable garbage collection operation may only delete level directories and, once they no longer contain any level directory, the period directories themselves.
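
Scripts that walk the sstable directory need to recognize period and level directory names. The sketch below parses the naming scheme shown in the store directory structure. The padding widths are not specified on this page, so the patterns accept any number of digits. Treating the version as an integer and <a/r> as a single letter are assumptions, and the names parse_period and parse_level are hypothetical.

```python
import re
from typing import Optional, Tuple

# p-<period begin VT (0 padded)>_<yyyy-MM-dd>-<a/r>
PERIOD_RE = re.compile(r"p-(\d+)_(\d{4}-\d{2}-\d{2})-([ar])")
# <level (0 padded)>-<nb (0 padded)>[-<version>], the version being omitted when it is 0
LEVEL_RE = re.compile(r"(\d+)-(\d+)(?:-(\d+))?")


def parse_period(name: str) -> Optional[Tuple[str, str, str]]:
    """Return (period begin, date, 'a' or 'r'), or None if the name does not match."""
    match = PERIOD_RE.fullmatch(name)
    return match.groups() if match else None


def parse_level(name: str) -> Optional[Tuple[int, int, int]]:
    """Return (level, number, version), or None if the name does not match."""
    match = LEVEL_RE.fullmatch(name)
    if match is None:
        return None
    level, number, version = match.groups()
    return int(level), int(number), int(version) if version is not None else 0
```

Names that still carry a .tmp extension do not match either pattern, so directories that are still being written are ignored.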

Checkpoint directories

The contents of these directories are read-only, except for the locked file, which may or may not be present. If it exists, it is an empty file.
Checkpoint operations may only create new checkpoint directories with updated content.

The sstable.bin file contains the list of sstables required by this checkpoint. Its format is internal to the embedded database.

The sstablenumbers.txt file contains the last used sstable number for each level in each period. This keeps sstable numbers strictly increasing even when sstables are removed (by a merge or a purge).
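
The purpose of recording the last used number can be illustrated with a small in-memory sketch. The on-disk format of sstablenumbers.txt is not described here, so the example only models the counters. The class name SstableNumbers and the choice of 0 as the first number are assumptions.

```python
from typing import Dict, Optional, Tuple


class SstableNumbers:
    """Tracks the last used sstable number per (period, level) pair."""

    def __init__(self, last_used: Optional[Dict[Tuple[str, int], int]] = None) -> None:
        self._last_used = dict(last_used or {})

    def next_number(self, period: str, level: int) -> int:
        """Return a number that was never handed out for this period and level."""
        key = (period, level)
        # Assumption: numbering starts at 0 when nothing has been recorded yet.
        number = self._last_used.get(key, -1) + 1
        self._last_used[key] = number
        return number
```

Because the counter is never derived from the directories currently on disk, deleting the sstable with the highest number does not cause that number to be reused.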

The filelist.txt file lists every file of the store that is required to query the store and check its integrity in the state of this checkpoint. It can be viewed as an external representation of the sstable.bin file, intended for use by backup scripts.
Each file name is written on its own line, as a path relative to the <storeName> directory. The list contains the following files:

  • config.xml
  • sha1sum.txt
  • for every needed sstable/p-<period begin VT (0 padded)>_<yyyy-MM-dd>-<a/r>/<level (0 padded)>-<nb (0 padded)>-<version (no version if == 0)>, the files:
    • blob.bin
    • data.bin
    • index.bin
    • sha1sum.txt
  • the files of the current checkpoint, under checkpoint/<current checkpoint tt>:
    • alive.bin
    • amemtable.bin
    • rmemtable.bin
    • sstable.bin
    • catalog.xml
    • sha1sum.txt
    • sstablenumbers.txt
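
A backup script built on filelist.txt could look like the following minimal sketch. It assumes one relative path per line with forward slashes as separators. The function name backup_checkpoint and its parameters are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def backup_checkpoint(store_dir: Path, checkpoint_name: str, backup_dir: Path) -> List[str]:
    """Copy every file listed in a checkpoint's filelist.txt into backup_dir."""
    file_list = store_dir / "checkpoint" / checkpoint_name / "filelist.txt"
    copied = []
    for line in file_list.read_text().splitlines():
        relative = line.strip()
        if not relative:
            continue
        destination = backup_dir / relative
        # Recreate the store layout under the backup directory.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(store_dir / relative, destination)
        copied.append(relative)
    return copied
```

The list above does not mention filelist.txt itself, so a script that needs it in the backup may have to copy it separately.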

Orphaned directory

This directory contains all invalid or corrupted files found by the embedded database on start-up. To avoid losing data, the embedded database does not delete files at start-up; it only moves them into this special directory.
The structure of the orphaned directory is similar to the structure of the store directory.

The embedded database raises a warning if you open a store that already contains an orphaned directory.

Database operations from a filesystem view

Flush

A flush consists of writing a new level directory with level=0 into an sstable period directory, from in-memory data.
First, the directory is created with the .tmp extension.
Then, all files inside it are written.
Finally, the new directory is renamed without the .tmp extension.
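
The three steps above can be sketched as follows. This is an illustration of the write-then-rename pattern, not the database's code. The function name write_directory_atomically is hypothetical, and syncing each file before the rename is an assumption that this page neither confirms nor rules out.

```python
import os
from pathlib import Path
from typing import Dict


def write_directory_atomically(final_dir: Path, files: Dict[str, bytes]) -> None:
    """Create final_dir so that it only appears under its final name once fully written."""
    tmp_dir = final_dir.with_name(final_dir.name + ".tmp")
    # Step 1: create the directory with the .tmp extension.
    tmp_dir.mkdir(parents=True)
    # Step 2: write all files inside it.
    for name, payload in files.items():
        with open(tmp_dir / name, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # assumption: files are synced before the rename
    # Step 3: rename the directory without the .tmp extension.
    os.rename(tmp_dir, final_dir)
```

If the process stops before the rename, only a *.tmp directory is left behind, which is what the first step of startup recovery cleans up.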

Merge

A merge consists of writing a new level directory with level>0 into an sstable period directory, from the directories of the immediately lower level.
The sequence of operations is the same as for the flush operation.

The merge operation does not delete the sstables it merged from. This is done by GenerationalTableSpace.

Checkpoint

A checkpoint consists of creating a new directory in the checkpoint directory.
First, the directory is created with the .tmp extension.
Then, all files inside it are written (at this time the locked file is also created).
Finally, the new directory is renamed without the .tmp extension.

Sstable garbage collection

It consists of removing entire level directories that are not required by any checkpoint.
This can also lead to removing any period directory that is left empty once all of its level directories have been deleted.
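
A minimal sketch of this operation is given below. The set required_levels stands for the level directories referenced by at least one checkpoint, written as "<period directory>/<level directory>". How that set is obtained is left out, and both it and the function name collect_garbage are hypothetical. Skipping *.tmp directories, which may belong to a flush or merge in progress, is also an assumption.

```python
import shutil
from pathlib import Path
from typing import List, Set


def collect_garbage(store_dir: Path, required_levels: Set[str]) -> List[str]:
    """Delete unreferenced level directories, then period directories left empty."""
    removed = []
    sstable_root = store_dir / "sstable"
    for period_dir in sorted(p for p in sstable_root.iterdir() if p.is_dir()):
        for level_dir in sorted(d for d in period_dir.iterdir() if d.is_dir()):
            if level_dir.name.endswith(".tmp"):
                continue  # assumption: leave directories that are still being written
            key = f"{period_dir.name}/{level_dir.name}"
            if key not in required_levels:
                shutil.rmtree(level_dir)
                removed.append(key)
        # A period directory is removed only when nothing at all is left in it.
        if not any(period_dir.iterdir()):
            period_dir.rmdir()
    return removed
```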

Startup recovery

  1. Every *.tmp directory is deleted.
  2. Every non-locked checkpoint directory is deleted.
  3. The contents of the filelist.txt files of all remaining checkpoint directories are merged.
    1. every file under sstable that is not present inside the merged list is deleted.
    2. every empty level directory is deleted.
    3. every empty period directory is deleted.

This is performed by the embedded database, but can also be done with scripts when necessary.
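
As an example of such a script, the sketch below follows the numbered steps literally. It assumes that the lock marker is a file named locked inside the checkpoint directory and that filelist.txt holds one relative path per line with forward slashes. The function name recover_store is hypothetical. A more cautious script could move files into the orphaned directory instead of deleting them.

```python
import shutil
from pathlib import Path


def recover_store(store_dir: Path) -> None:
    """Apply the start-up recovery steps to a store directory."""
    sstable_root = store_dir / "sstable"
    checkpoint_root = store_dir / "checkpoint"

    # Step 1: delete every *.tmp directory (searched under sstable and checkpoint).
    for root in (sstable_root, checkpoint_root):
        for tmp_dir in sorted(root.rglob("*.tmp")):
            if tmp_dir.is_dir():
                shutil.rmtree(tmp_dir)

    # Step 2: delete every non-locked checkpoint directory.
    # Step 3: merge the filelist.txt contents of the remaining ones.
    required = set()
    for checkpoint in sorted(p for p in checkpoint_root.iterdir() if p.is_dir()):
        if not (checkpoint / "locked").exists():  # assumption: marker file is named "locked"
            shutil.rmtree(checkpoint)
            continue
        for line in (checkpoint / "filelist.txt").read_text().splitlines():
            if line.strip():
                required.add(line.strip())

    # Step 3.1: delete every file under sstable that is absent from the merged list.
    for path in sorted(sstable_root.rglob("*")):
        if path.is_file() and path.relative_to(store_dir).as_posix() not in required:
            path.unlink()

    # Steps 3.2 and 3.3: delete empty level directories, then empty period directories.
    for period_dir in sorted(p for p in sstable_root.iterdir() if p.is_dir()):
        for level_dir in sorted(d for d in period_dir.iterdir() if d.is_dir()):
            if not any(level_dir.iterdir()):
                level_dir.rmdir()
        if not any(period_dir.iterdir()):
            period_dir.rmdir()
```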

About purge

A purge can be implemented simply with the checkpoint and SSTable GC operations.
When creating a new checkpoint, the expiration date of each sstable must be taken into account to decide whether it still has to be included in the checkpoint.
The SSTable files that are no longer used are then deleted by the SSTable GC once no checkpoint references them.
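
The idea can be sketched as two small steps. The SstableRef type, its expires_at field, and both function names are hypothetical, since this page does not describe how expiration dates are stored. Treating an sstable as expired when expires_at is less than or equal to the current time is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SstableRef:
    path: str        # e.g. "<period directory>/<level directory>"
    expires_at: int  # hypothetical expiration timestamp


def select_for_checkpoint(sstables: Iterable[SstableRef], now: int) -> List[SstableRef]:
    """Keep only the sstables that have not expired when the checkpoint is created."""
    return [table for table in sstables if table.expires_at > now]


def unreferenced(all_sstables: Iterable[SstableRef],
                 checkpoints: Iterable[Iterable[SstableRef]]) -> Set[str]:
    """Return the paths that no checkpoint references, i.e. the SSTable GC candidates."""
    referenced = {table.path for checkpoint in checkpoints for table in checkpoint}
    return {table.path for table in all_sstables if table.path not in referenced}
```

An expired sstable is therefore not deleted immediately. It becomes a GC candidate only once every checkpoint that still lists it is gone.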
