## Store access
Access to the entities happens through a well-defined interface that defines a property map for each supported domain type. A property map could look like:
```
Event {
  startDate: QDateTime
  subject: QString
  ...
}
```

This property map can be freely extended with new properties for various features. It doesn't need to adhere to any external specification and exists solely to define how to access the data.

Clients will map these properties to the values of their domain object implementations, and resources will map the properties to the values in their buffers.

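As an illustration of that mapping (all names here are hypothetical, not the actual API), a client-side domain object and a resource-side buffer view can both answer the same generic property queries:

```python
# Sketch: clients and resources expose the same property names,
# each backed by their own representation. Names are illustrative.

class ClientEvent:
    """Client-side domain object backed by plain members."""
    def __init__(self, start_date, subject):
        self.start_date = start_date
        self.subject = subject

    def get(self, prop):
        # Map the generic property names to the implementation's members.
        return {"startDate": self.start_date, "subject": self.subject}[prop]

class ResourceEvent:
    """Resource-side view backed by (de)serialized buffer contents."""
    def __init__(self, buffer):
        self.buffer = buffer  # e.g. the parsed contents of a stored buffer

    def get(self, prop):
        return self.buffer[prop]

# Both sides answer the same property queries:
client = ClientEvent("2015-06-01T10:00", "Sprint review")
resource = ResourceEvent({"startDate": "2015-06-01T10:00",
                          "subject": "Sprint review"})
assert client.get("subject") == resource.get("subject")
```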
## Storage Model
The storage model is simple:
```
Entity {
  Id
  Revision {
      Revision-Id,
      Property*
  }+
}*
```

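The schema above can be sketched with plain data structures (a minimal sketch, not the actual storage format):

```python
# Sketch of the storage model: each entity has an id and a list of
# revisions, each revision carrying its own property set.

entity = {
    "id": "event1",
    "revisions": [
        {"revision": 1, "properties": {"subject": "Draft"}},
        {"revision": 2, "properties": {"subject": "Final"}},
    ],
}

def latest(entity):
    # The highest revision is the current state of the entity.
    return max(entity["revisions"], key=lambda r: r["revision"])

assert latest(entity)["properties"]["subject"] == "Final"
```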
The store consists of entities that each have an id and a set of properties. Each entity can have multiple revisions.

An entity is uniquely identified by:
* Resource + Id

The additional revision identifies a specific instance/version of the entity.

URI scheme:
    akonadi://resource/id:revision

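A minimal sketch of parsing that URI scheme (the function name is hypothetical):

```python
# Sketch: split akonadi://resource/id:revision into its parts.
from urllib.parse import urlparse

def parse_entity_uri(uri):
    parsed = urlparse(uri)
    assert parsed.scheme == "akonadi"
    resource = parsed.netloc
    # The revision suffix is optional; without it the URI refers
    # to the latest instance of the entity.
    entity_id, _, revision = parsed.path.lstrip("/").partition(":")
    return resource, entity_id, int(revision) if revision else None

assert parse_entity_uri("akonadi://maildir1/mail42:5") == ("maildir1", "mail42", 5)
assert parse_entity_uri("akonadi://maildir1/mail42") == ("maildir1", "mail42", None)
```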
## Store Entities
Each entity can be as normalized or denormalized as is useful. It is not necessary to have one solution that fits everything.

Denormalized:
* priority is that the MIME message stays intact (signatures/encryption)
* could we still provide a streaming API for attachments?
```
Mail {
  id
  mimeMessage
}
```

Normalized:
* priority is that we can access individual members efficiently
* we don't care about exact reproducibility of e.g. an iCal file
```
Event {
  id
  subject
  startDate
  attendees
  ...
}
```

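The practical difference shows up when accessing a member: the denormalized form requires parsing the intact payload, while the normalized form allows direct lookup (contents hypothetical):

```python
# The same idea in both styles.

# Denormalized: the MIME message is stored verbatim.
mail = {"id": "mail1", "mimeMessage": "Subject: Hi\r\n\r\nHello"}

# Normalized: individual members are directly accessible.
event = {"id": "event1", "subject": "Standup",
         "startDate": "2015-06-01T09:00", "attendees": ["alice", "bob"]}

# Accessing the subject: parse vs. direct lookup.
mime_subject = mail["mimeMessage"].splitlines()[0].split(": ", 1)[1]
assert mime_subject == "Hi"
assert event["subject"] == "Standup"
```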
Of course any combination of the two can be used, including duplicating data into individual properties while keeping the complete struct intact. The question then becomes, though, which copy is used for conflict resolution (perhaps this would cause more problems than it solves).

#### Optional Properties
For each domain type, we want to define a set of required and a set of optional properties. The required properties are the minimum bar for each resource, and are required in order for applications to work as expected. Optional properties may only be shown by the UI if they are actually supported by the backend.

However, we'd like to be able to support local-only storage for resources that don't support an optional property. Each entity thus has a "local" buffer that provides default local-only storage. This local-only buffer provides storage for all properties of the respective domain type.

Each resource can freely define how the properties are split, though it should push as many as possible into the synchronized buffer so they can be synchronized. Note that the resource is free to add more properties to its synchronized buffer even though they may not be required by the specification.

The advantage of this is that a resource only needs to specify a minimal set of properties, while everything else is taken care of by the local-only buffer. This is supposed to make it easier for resource implementors to get something working.

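A minimal sketch of that split, assuming a hypothetical resource that can only synchronize a subset of an event's properties; everything else falls back to the local-only buffer:

```python
# Sketch: partition incoming properties between the synchronized
# buffer and the local-only buffer. Property names are illustrative.

def split_properties(resource_supported, properties):
    synchronized = {k: v for k, v in properties.items()
                    if k in resource_supported}
    local_only = {k: v for k, v in properties.items()
                  if k not in resource_supported}
    return synchronized, local_only

# This hypothetical resource only synchronizes subject and startDate;
# a local-only "color" property is still stored, just not synchronized.
supported = {"subject", "startDate"}
sync, local = split_properties(supported, {
    "subject": "Standup",
    "startDate": "2015-06-01T09:00",
    "color": "red",
})
assert sync == {"subject": "Standup", "startDate": "2015-06-01T09:00"}
assert local == {"color": "red"}
```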
### Value Format
Each entity value in the key-value store consists of the following individual buffers:
* Metadata: metadata that is required for every entity (revision, ...)
* Resource: the buffer defined by the resource (synchronized properties, plus values that help with synchronization such as remoteIds)
* Local-only: the default storage buffer, which is domain-type specific

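As a rough sketch of framing three independent buffers into one value (the real implementation would use its own serialization, e.g. flatbuffers; JSON with length prefixes is used here purely for illustration):

```python
# Sketch: pack the three per-entity buffers into a single value,
# each part length-prefixed so it can be read back independently.
import json
import struct

def pack_entity(metadata, resource_buffer, local_only):
    parts = [json.dumps(b).encode()
             for b in (metadata, resource_buffer, local_only)]
    return b"".join(struct.pack(">I", len(p)) + p for p in parts)

def unpack_entity(blob):
    parts, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        parts.append(json.loads(blob[offset:offset + length]))
        offset += length
    return parts  # [metadata, resource, local-only]

blob = pack_entity({"revision": 7}, {"remoteId": "abc"}, {"color": "red"})
assert unpack_entity(blob)[0] == {"revision": 7}
```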
## Database
### Storage Layout
Storage is split up into multiple named databases that reside in the same database environment.

```
 $DATADIR/akonadi2/storage/$RESOURCE_IDENTIFIER/$BUFFERTYPE.main
                                                $BUFFERTYPE.index.$INDEXTYPE
```

The resource can be effectively removed from disk (aside from its configuration) by deleting the `$RESOURCE_IDENTIFIER` directory and everything it contains.

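A small sketch of that layout and of resource removal (directory and file names are illustrative placeholders for the variables above):

```python
# Sketch: build the per-resource storage directory and remove it wholesale.
import os
import shutil
import tempfile

def resource_dir(datadir, resource_identifier):
    return os.path.join(datadir, "akonadi2", "storage", resource_identifier)

datadir = tempfile.mkdtemp()  # stand-in for $DATADIR
path = resource_dir(datadir, "maildir1")
os.makedirs(path)
# e.g. the main store and one index for a "mail" buffer type:
open(os.path.join(path, "mail.main"), "w").close()
open(os.path.join(path, "mail.index.uid"), "w").close()

# Removing the resource from disk is a single directory removal:
shutil.rmtree(resource_dir(datadir, "maildir1"))
assert not os.path.exists(path)
```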
#### Design Considerations
* The stores are split by buffer type, so a full scan (which is done by type) doesn't require filtering by type first. The downside is that an additional lookup is required to get from a revision to the data.

### Revisions
Every operation (create/delete/modify) leads to a new revision. The revision is an ever-increasing number for the complete store.

#### Design Considerations
By having one revision for the complete store instead of one per type, change replay always works across all types. This is especially important in the write-back mechanism that replays the changes to the source.

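A minimal sketch of a store-wide revision counter and the cross-type change replay it enables (class and method names are hypothetical):

```python
# Sketch: one monotonically increasing revision for the whole store,
# shared by all operations regardless of entity type.

class Store:
    def __init__(self):
        self.revision = 0
        self.log = []  # change log used for replay / write-back

    def apply(self, operation, entity_id):
        self.revision += 1
        self.log.append((self.revision, operation, entity_id))
        return self.revision

store = Store()
store.apply("create", "mail1")
store.apply("modify", "event1")  # a different type shares the same counter
store.apply("delete", "mail1")

# Replaying everything after revision 1 covers all types, in order:
replay = [op for rev, op, _ in store.log if rev > 1]
assert replay == ["modify", "delete"]
```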
### Database choice
By design we're interested in key-value stores or perhaps document databases. A fixed schema is not useful for this design, which makes SQL not very useful (it would just be a very slow key-value store). While document databases would allow indexes on certain properties (which is something we need), we have not yet found any contenders that looked like they would be useful for this system.

### Requirements
* portable; minimally: Linux, Windows, MacOS X
* multi-thread and multi-process concurrency with a single writer and multiple readers
  * This is required so we don't block clients while a resource is writing, and is deemed essential for performance and for reducing complexity.
* reasonably fast, so we can implement all necessary queries with sufficient performance
* can deal with large amounts of data
* on-disk storage with ACID properties
* memory consumption suitable for a desktop system (no in-memory stores)

Other useful properties:
* suitable for implementing some indexes (the fewer tools we need the better)
* support for transactions
* small overhead in on-disk size

### Contenders
* LMDB
  * support for mmapped values
  * good read performance, ok write performance
  * fairly complex API
  * up to double the storage size due to paging (with a 4k page size, 4001-byte values are the worst case)
  * size limit of 4GB on 32-bit systems, virtually no limit on 64-bit (leads to 2GB of actual payload with write amplification)
  * limited key-search capabilities
  * ACID transactions
  * MVCC concurrency
  * no compaction; the database always grows (pages get reused but are never freed)
* Berkeley DB (BDB)
  * performance is supposedly worse than LMDB (LMDB was written as a successor to BDB for OpenLDAP)
  * Oracle sits behind it (it has an AGPL licence though)
* RocksDB
  * => no multiprocess
* Kyoto Cabinet http://fallabs.com/kyotocabinet/
  * fast, low on-disk overhead, simple API
  * => no multiprocess
  * GPL
* HamsterDB
  * => no multiprocess
* SQLite4
  * not yet released
* BangDB
  * not yet released as open source; looks promising on paper
* Redis
  * => loads everything into memory
  * => not embeddable
* CouchDB
  * MVCC concurrency
  * document store
  * not embeddable (unless we write Akonadi in Erlang ;)
* https://github.com/simonhf/sharedhashfile
  * not portable (i.e. Windows; it's a mostly-Linux thing)
* http://sphia.org/architecture.html
  * => no multiprocess
* LevelDB
  * => no multiprocess
* EJDB http://ejdb.org/#ejdb-c-library
  * modified version of Kyoto Cabinet
  * => multiprocess requires locking, so effectively no multiprocess
  * is more of a document store
  * no updates since September 2013
* http://unqlite.org
  * bad performance with a large database (looks like O(n))
  * like LMDB, roughly 2x the data size
  * includes a document store
  * mmapped read access
  * reading at about 30% the speed of LMDB
  * slow writes with transactions

## Indexes

### Index Choice
In addition to the primary store, indexes are required for efficient lookups.

Since indexes always need to be updated, they directly affect how fast we can write data. Reads typically use only a subset of the available indexes, so a slow index doesn't affect all queries.

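That write/read trade-off can be sketched with a toy secondary index that is kept consistent on every write (all names and properties are illustrative):

```python
# Sketch: every write also updates the secondary index, so writes pay
# the indexing cost while reads pick only the index they need.

primary = {}     # entity id -> properties
date_index = {}  # startDate -> set of entity ids

def write(entity_id, properties):
    old = primary.get(entity_id)
    if old is not None:
        # Keep the index consistent when an entity is updated.
        date_index[old["startDate"]].discard(entity_id)
    primary[entity_id] = properties
    date_index.setdefault(properties["startDate"], set()).add(entity_id)

write("event1", {"startDate": "2015-06-01", "subject": "a"})
write("event2", {"startDate": "2015-06-01", "subject": "b"})
write("event1", {"startDate": "2015-06-02", "subject": "a"})  # moved

# Reads by date now avoid a full scan of the primary store:
assert date_index["2015-06-01"] == {"event2"}
```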
#### Contenders
* Xapian:
  * fast fulltext searching
  * no MVCC concurrency
  * only supports one writer at a time
  * if a reader is reading blocks that have since been changed by a writer, it throws a DatabaseModifiedException. This means most of the Xapian code needs to be in `while (1) { try { ... } catch (...) {} }` blocks and needs to be able to start from scratch.
  * wildcard searching (as of 2015-01) isn't ideal. It works by expanding the word into all other words in the query, which typically makes the query size huge. This huge query is then sent to the database. Baloo has had to configure this expansion of terms so that it consumes less memory.
  * nonexistent UTF support: it does not support text normalization or splitting terms at custom characters such as '_'.
* LMDB:
  * sorted keys
  * sorted duplicate keys
  * no FTS
  * MVCC concurrency
* SQLite:
  * SQL
  * updates lock the database for readers
  * concurrent reading is possible
  * requires duplicating the data: once in a column as data and then again in the FTS index
* LucenePlusPlus:
  * fast full text searching
  * MVCC concurrency

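The restart-from-scratch pattern described for Xapian above can be sketched generically (a hedged sketch; real code would catch Xapian's DatabaseModifiedException and reopen the database before retrying):

```python
# Sketch: retry a read from scratch when the database was modified
# underneath the reader.

class DatabaseModifiedError(Exception):
    """Stand-in for Xapian's DatabaseModifiedException."""

def read_with_retry(read_once, max_attempts=5):
    for _ in range(max_attempts):
        try:
            return read_once()
        except DatabaseModifiedError:
            continue  # reopen/restart the whole query from scratch
    raise RuntimeError("database kept changing")

# Simulated reader that fails twice before succeeding:
attempts = {"n": 0}
def flaky_read():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DatabaseModifiedError()
    return "result"

assert read_with_retry(flaky_read) == "result"
assert attempts["n"] == 3
```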
## Useful Resources
* LMDB
  * Wikipedia for a good overview: https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database
  * Benchmarks: http://symas.com/mdb/microbench/
  * Tradeoffs: http://symas.com/is-lmdb-a-leveldb-killer/
  * Disk space benchmark: http://symas.com/mdb/ondisk/
  * LMDB instead of Kyoto Cabinet as redis backend: http://www.anchor.com.au/blog/2013/05/second-strike-with-lightning/