Diffstat (limited to 'docs')
-rw-r--r--  docs/building.md                                    12
-rw-r--r--  docs/clientapi.md                                  129
-rw-r--r--  docs/design.md                                     104
-rw-r--r--  docs/designgoals.md                                 39
-rw-r--r--  docs/index.md                                       25
-rw-r--r--  docs/logging.md                                     27
-rw-r--r--  docs/queries.md                                    104
-rw-r--r--  docs/resource.md                                    59
-rw-r--r--  docs/sinksh.md (renamed from docs/akonadish.md)      0
-rw-r--r--  docs/storage.md                                     23
-rw-r--r--  docs/terminology.md                                  2
-rw-r--r--  docs/tradeoffs.md                                   36
12 files changed, 298 insertions, 262 deletions
diff --git a/docs/building.md b/docs/building.md
index 907827d..17ef54b 100644
--- a/docs/building.md
+++ b/docs/building.md
@@ -85,3 +85,15 @@ mkdir build && cd build
 cmake ..
 make install
 ```
+
+# Dependencies
+
+* ExtraCmakeModules >= 0.0.10
+* Qt >= 5.2
+* KF5::Async >= 0.1
+* flatbuffers >= 1.0
+* libgit2
+* readline
+
+## Maildir Resource
+* KF5::Mime
diff --git a/docs/clientapi.md b/docs/clientapi.md
index 219f972..be8ff19 100644
--- a/docs/clientapi.md
+++ b/docs/clientapi.md
@@ -13,16 +13,6 @@ The client API consists of:
 * property-level on-demand loading of data
 * streaming support for large properties (attachments)
 
-## Domain Types
-A set of standardized domain types is defined. This is necessary to decouple applications from resources (so a calendar can access events from all resources), and to have a "language" for queries.
-
-The definition of the domain model directly affects:
-
-* granularity for data retrieval (email property, or individual subject, date, ...)
-* queriable properties for filtering and sorting (sender, id, ...)
-
-The purpose of these domain types is strictly to be the interface and the types are not necessarily meant to be used by applications directly, or to be restricted by any other specifications (such as ical). By nature these types will be part of the evolving interface, and will need to be adjusted for every new property that an application must understand.
-
 ## Store Facade
 The store is always accessed through a store-specific facade, which hides:
 
@@ -52,118 +42,12 @@ Each modification is associated with a specific revision, which allows the synch
 ### Conflict Resolution
 Conflicts can occur at two points:
 
-* While i.e. an editor is open and we receive an update for the same entity
-* After a modification is sent to the synchronizer but before it's processed
+* In the client: while e.g. an editor is open and we receive an update for the same entity
+* In the synchronizer: after a modification has been sent to the synchronizer but before it's processed
 
 In the first case the client is responsible for resolving the conflict; in the latter case it's the synchronizer's responsibility.
 A small window exists where the client has already started the modification (i.e. the command is in the socket) and a notification that the same entity has changed has not yet arrived. In such a case the synchronizer may reject the modification because the revision the modification refers to is no longer available.
 
-This design allows the synchronizer to be in control of the revisions, and keeps it from having to wait for all clients to update until it can drop revisions.
-
-## Query System
-The query system should allow for efficient retrieval for just the amount of data required by the client. Efficient querying is supported by the indexes provided by the resources.
-
-The query always retrieves a set of entities matching the query, while not necessarily all properties of the entity need to be populated.
-
-Queries should are declarative to keep the specification simple and to allow the implementation to choose the most efficient execution.
-
-Queries can be kept open (live) to receive updates as the store changes.
-
-### Query
-The query consists of:
-
-* a set of filters to match the wanted entities
-* the set of properties to retrieve for each entity
-
-Queryable properties are defined by the [[Domain Types]] above.
-
-### Query Result
-The result is returned directly after running the query in form of a QAbstractItemModel. Each row in the model represents a matching entity.
-
-The model allows to access the domain object directly, or to access individual properties directly via the rows columns.
-
-The model is always populated asynchronously. It is therefore initially empty and will then populate itself gradually, through the regular update mechanisms (rowsInserted).
-
-Tree Queries allow the application to query for i.e. a folder hierarchy in a single query. This is necessary for performance reasons to avoid recursive querying in large hierarchies. To avoid on the other hand loading large hierchies directly into memory, the model only populates the toplevel rows automatically, all other rows need to be populated by calling `QAbstractItemModel::fetchMore(QModelIndex);`. This way the resource can deal efficiently with the query (i.e. by avoiding the many roundtrips that would be necessary with recursive queries), while keeping the amount of data in memory to a minimum (i.e. if the majority of the folder tree is collapsed in the view anyways). A tree result set can therefore be seen as a set of sets, where every subset corresponds to the children of one parent.
-
-If the query is live, the model updates itself if the update applies to one of the already loaded subsets (otherwise it's currently irrelevant and will load once the subset is loaded).
-
-#### Enhancements
-* Asynchronous loading of entities/properties can be achieved by returning an invalid QVariant initially, and emitting dataChanged once the value is loaded.
-* To avoid loading a large list when not all data is necessary, a batch size could be defined to guarantee for instance that there is sufficient data to fill the screen, and the fetchMore mechanism can be used to gradually load more data as required when scrolling in the application.
-
-#### Filter
-A filter consists of:
-
-* a property to filter on as defined by the [[Domain Types]]
-* a comparator to use
-* a value
-
-The available comparators are:
-
-* equal
-* greater than
-* less than
-* inclusive range
-
-Value types include:
-
-* Null
-* Bool
-* Regular Expression
-* Substring
-* A type-specific literal value (e.g. string, number, date, ..)
-
-Filters can be combined using AND, OR, NOT.
-
-#### Example
-```
-query = {
-    offset: int
-    limit: int
-    filter = {
-        and {
-            collection = foo
-            or {
-                resource = res1
-                resource = res2
-            }
-        }
-    }
-}
-```
-
-possible API:
-
-```
-query.filter().and().property("collection") = "foo"
-query.filter().and().or().property("resource") = "res1"
-query.filter().and().or().property("resource") = "res2"
-query.filter().and().property("start-date") = InclusiveRange(QDateTime, QDateTime)
-```
-
-The problem is that it is difficult to adjust an individual resource property like that.
-
-### Usecases ###
-Mail:
-
-* All mails in folder X within date-range Y that are unread.
-* All mails (in all folders) that contain the string X in property Y.
-
-Todos:
-
-* Give me all the todos in that collection where their RELATED-TO field maps to no other todo UID field in the collection
-* Give me all the todos in that collection where their RELATED-TO field has a given value
-* Give me all the collections which have a given collection as parent and which have a descendant matching a criteria on its attributes;
-
-Events:
-
-* All events of calendar X within date-range Y.
-
-Generic:
-* entity with identifier X
-* all entities of resource X
-
 ### Lazy Loading ###
 The system provides property-level lazy loading. This allows e.g. deferring the download of attachments until the attachment is accessed, at the expense of having to have access to the source (which could be connected via the internet).
 
@@ -173,12 +57,3 @@ Note: We should perhaps define a minimum set of properties that *must* be availa
 
 ### Data streaming ###
 Large properties such as attachments should be streamable. An API that allows retrieving a single property of a defined entity in a streamable fashion is probably enough.
-
-### Indexes ###
-Since only properties of the domain types can be queried, default implementations for commonly used indexes can be provided. These indexes are populated by generic preprocessors that use the domain-type interface to extract properties from individual entites.
-
-## Notifications ##
-A notification mechanism is required to inform clients about changes. Running queries will automatically update the result-set if a notification is received.
-
-Note: A notification could supply a hint on what changed, allowing clients to ignore revisions with irrelevant changes.
-A running query can do all of that transparently behind the scenes. Note that the hints should indeed only hint what has changed, and not supply the actual changeset. These hints should be tailored to what we see as useful, and must therefore be easy to modify.
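
Note: the streaming accessor asked for above does not exist yet; the following is a purely hypothetical sketch of its shape (none of these names are existing Sink API), only illustrating "retrieve a single property of a defined entity in a streamable fashion".

```cpp
#include <QByteArray>
#include <functional>

struct PropertyStreamObserver
{
    std::function<void(const QByteArray &chunk)> onChunk; // called per chunk
    std::function<void()> onFinished;                     // stream complete
    std::function<void(int code)> onError;                // transfer failed
};

// Assumed entry point, e.g.:
// store.streamProperty(entityId, "attachment", observer);
```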
diff --git a/docs/design.md b/docs/design.md
index 4451b49..2890450 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -1,101 +1,45 @@
-# Design Goals
-## Axioms
-1. Personal information is stored in multiple sources (address books, email stores, calendar files, ...)
-2. These sources may local, remote or a mix of local and remote
-
-## Requirements
-1. Local mirrors of these sources must be available to 1..N local clients simultaneously
-2. Local clients must be able to make (or at least request) changes to the data in the local mirrors
-3. Local mirrors must be usable without network, even if the source is remote
-4. Local mirrors must be able to syncronoize local changes to their sources (local or remote)
-5. Local mirrors must be able to syncronize remote changes and propagate those to local clients
-6. Content must be searchable by a number of terms (dates, identities, body text ...)
-7. This must all run with acceptable performance on a moderate consumer-grade desktop system
-
-Nice to haves:
-
-1. As-close-to-zero-copy-as-possible for data
-2. Simple change notification semantics
-3. Resource-specific syncronization techniques
-4. Data agnostic storage
-
-Immediate goals:
-
-1. Ease development of new features in existing resources
-2. Ease maintenance of existing resources
-3. Make adding new resources easy
-4. Make adding new types of data or data relations easy
-5. Improve performance relative to existing Akonadi implementation
-
-Long-term goals:
-
-1. Project view: given a query, show all items in all stores that match that query easily and quickly
-
-Implications of the above:
-
-* Local mirrors must support multi-reader, but are probably best served with single-writer semantics as this simplifies both local change recording as well as remote synchronization by keeping it in one process which can process write requests (local or remote) in sequential fashion.
-* There is no requirement for a central server if the readers can concurrently access the local mirror directly
-* A storage system which requires a schema (e.g. relational databases) are a poor fit given the desire for data agnosticism and low memory copying
-
 # Overview
 
-## Client API
-The client facing API hides all Sink internals from the applications and emulates a unified store that provides data through a standardized interface.
+Sink is a data access layer that additionally handles synchronization with external sources and indexing of data for efficient queries.
+
+## Store
+The client-facing Store API hides all Sink internals from the applications and emulates a unified store that provides data through a standardized interface.
 This allows applications to transparently use various data sources with various data source formats.
 
 ## Resource
 A resource is a plugin that provides access to an additional source. It consists of a store, a synchronizer process that executes synchronization & change replay to the source and maintains the store, as well as a facade plugin for the client api.
 
-## Store
+## Storage / Indexes
 Each resource maintains a store that can either store the full dataset for offline access or only metadata for quick lookups. Resources can define how data is stored.
+The store consists of revisions, with every revision containing one entity.
+
+The store additionally contains various secondary indexes for efficient lookups.
 
 ## Types
 ### Domain Type
-The domain types exposed in the public interface.
+The domain types exposed in the public interface provide standardized access to the store. The domain types and their properties directly define the granularity of data retrieval, and thus also what queries can be executed.
 
 ### Buffer Type
-The individual buffer types as specified by the resource. The are internal types that don't necessarily have a 1:1 mapping to the domain types, although that is the default case that the default implementations expect.
+The buffers used by the resources in the store may differ from resource to resource, and don't necessarily have a 1:1 mapping to the domain types.
+This allows resources to store data in a way that is convenient/efficient for synchronization, although it may require a bit more effort when accessing the data.
+The individual buffer types are specified by the resource and are internal to it. Default buffer types exist for all domain types.
+
+### Commands
+Commands are used to modify the store. The resource processes commands that are generated by clients and the synchronizer.
+
+### Notifications
+The resource emits notifications to inform clients of new revisions and other changes.
 
 ## Mechanisms
 ### Change Replay
 The change replay is based on the revisions in the store. Clients (as well as the write-back mechanism that replays changes to the source) are informed that a new revision is available. Each client can then go through all new revisions (starting from the last seen revision), and thus update its state to the latest revision.
 
-### Preprocessor pipeline
-Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering. The pipeline guarantees that the preprocessor steps are executed before the entity is persisted.
-
-# Tradeoffs/Design Decisions
-* Key-Value store instead of relational
-    * `+` Schemaless, easier to evolve
-    * `-` No need to fully normalize the data in order to make it queriable. And without full normalization SQL is not really useful and bad performance wise.
-    * `-` We need to maintain our own indexes
-
-* Individual store per resource
-    * Storage format defined by resource individually
-    * `-` Each resource needs to define it's own schema
-    * `+` Resources can adjust storage format to map well on what it has to synchronize
-    * `+` Synchronization state can directly be embedded into messages
-    * `+` Individual resources could switch to another store technology
-    * `+` Easier maintenance
-    * `+` Resource is only responsible for it's own store and doesn't accidentaly break another resources store
-    * `-` Inter`-`resource moves are both more complicated and more expensive from a client perspective
-    * `+` Inter`-`resource moves become simple additions and removals from a resource perspective
-    * `-` No system`-`wide unique id per message (only resource/id tuple identifies a message uniquely)
-    * `+` Stores can work fully concurrently (also for writing)
+### Synchronization
+The synchronizer executes a periodic synchronization that results in change commands to synchronize the store with the source.
+The change-replay mechanism is used to write changes that happened locally back to the source.
 
-* Indexes defined and maintained by resources
-    * `-` Relational queries accross resources are expensive (depending on the query perhaps not even feasible)
-    * `-` Each resource needs to define it's own set of indexes
-    * `+` Flexible design as it allows to change indexes on a per resource level
-    * `+` Indexes can be optimized towards resources main usecases
-    * `+` Indexes can be shared with the source (IMAP serverside threading)
+### Command processing
+The resources have an internal persistent command queue that is populated by the synchronizer and by clients, and that is processed continuously.
 
-* Shared domain types as common interface for client applications
-    * `-` yet another abstraction layer that requires translation to other layers and maintenance
-    * `+` decoupling of domain logic from data access
-    * `+` allows to evolve types according to needs (not coupled to specific application domain types)
+Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering, and through which every command goes before it enters the store. The pipeline guarantees that the preprocessor steps are executed on any command before the entity is persisted.
 
-# Risks
-* key-value store does not perform with large amounts of data
-* query performance is not sufficient
-* turnaround time for modifications is too high to feel responsive
-* design turns out similarly complex as Akonadi
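
Note: a small sketch of the change-replay contract described above — every consumer tracks the last revision it has seen and walks all newer revisions when notified. The names are illustrative, not Sink API.

```cpp
#include <cstdint>
#include <functional>

void replayChanges(uint64_t &lastSeenRevision, uint64_t latestRevision,
                   const std::function<void(uint64_t)> &applyRevision)
{
    // Each revision contains exactly one entity (see "Storage / Indexes"),
    // so applying revisions in order reproduces the latest state.
    while (lastSeenRevision < latestRevision) {
        ++lastSeenRevision;
        applyRevision(lastSeenRevision);
    }
}
```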
diff --git a/docs/designgoals.md b/docs/designgoals.md
new file mode 100644
index 0000000..4ffeeac
--- /dev/null
+++ b/docs/designgoals.md
@@ -0,0 +1,39 @@
+# Design Goals
+## Axioms
+1. Personal information is stored in multiple sources (address books, email stores, calendar files, ...)
+2. These sources may be local, remote, or a mix of local and remote
+
+## Requirements
+1. Local mirrors of these sources must be available to 1..N local clients simultaneously
+2. Local clients must be able to make (or at least request) changes to the data in the local mirrors
+3. Local mirrors must be usable without network, even if the source is remote
+4. Local mirrors must be able to synchronize local changes to their sources (local or remote)
+5. Local mirrors must be able to synchronize remote changes and propagate those to local clients
+6. Content must be searchable by a number of terms (dates, identities, body text ...)
+7. This must all run with acceptable performance on a moderate consumer-grade desktop system
+
+Nice to haves:
+
+1. As-close-to-zero-copy-as-possible for data
+2. Simple change notification semantics
+3. Resource-specific synchronization techniques
+4. Data-agnostic storage
+
+Immediate goals:
+
+1. Ease development of new features in existing resources
+2. Ease maintenance of existing resources
+3. Make adding new resources easy
+4. Make adding new types of data or data relations easy
+5. Improve performance relative to the existing Akonadi implementation
+
+Long-term goals:
+
+1. Project view: given a query, show all items in all stores that match that query easily and quickly
+
+Implications of the above:
+
+* Local mirrors must support multiple readers, but are probably best served with single-writer semantics, as this simplifies both local change recording and remote synchronization by keeping them in one process which can handle write requests (local or remote) sequentially.
+* There is no requirement for a central server if the readers can concurrently access the local mirror directly
+* A storage system which requires a schema (e.g. relational databases) is a poor fit given the desire for data agnosticism and low memory copying
+
diff --git a/docs/index.md b/docs/index.md
index 3019cfd..90d04b6 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,17 +1,18 @@
-# Index
-* Design
-    * Design Goals
-    * Overview
-    * Client API
-    * Storage
-    * Resource
-    * Facade
-    * Logging
-* Extending Akoandi Next
-    * Steps to add support for new types
-    * Steps for adding support for a type to a resource
+Sink is a data access layer handling synchronization, caching and indexing.
+
+Discussion of the code should be done on the kde-pim at kde.org mailing list
+or in #kontact on IRC.
+
+Note that all feature development should happen in feature branches, and that
+the mainline development branch is "develop". Master is for releases. It is
+recommended (though not required) to use the ["git flow" tools](https://github.com/nvie/gitflow) to make branched
+development easy (and easy for others to coordinate with).
+
+For further information on the project see the [KDE Phabricator instance](https://phabricator.kde.org/project/view/5/).
 
 # Documentation
 This documentation is built using [mkdocs.org](http://mkdocs.org).
 
 Use `mkdocs serve` to run a local webserver to view the docs.
+
+The documentation is also published at [http://api.kde.org/doc/sink/](http://api.kde.org/doc/sink/) and rebuilt nightly.
diff --git a/docs/logging.md b/docs/logging.md
index a495a7a..3d5ea61 100644
--- a/docs/logging.md
+++ b/docs/logging.md
@@ -10,13 +10,34 @@ For debugging purposes a logging framework is required. Simple qDebugs() proved
  * logfiles
  * a commandline monitor tool
  * some other developer tool
+This way we get complete logs even if a resource was not started from the console (e.g. because it was already running).
 
 ## Debug levels
-* trace: trace individual codepaths. Likely outputs way to much information for all normal cases and likely is only ever temporarily enabled. Trace points are likely only inserted into code fragments that are known to be problematic.
+* trace: traces individual codepaths. Likely outputs far too much information for normal cases and is only ever temporarily enabled for certain areas.
 * log: Comprehensive debug output. Enabled on demand
 * warning: Only warnings, should always be logged.
 * error: Critical messages that should never appear. Should always be logged.
 
+## Debug areas
+Debug areas split the code into sections that can be enabled/disabled as one.
+This is supposed to give finer-grained control over what is logged or displayed.
+
+Debug areas may align with classes, but don't have to; they should be defined in whatever way is most useful.
+
+Areas could be:
+
+* resource.sync.performance
+* resource.sync
+* resource.listener
+* resource.pipeline
+* resource.store
+* resource.communication
+* client.communication
+* client.communication.org.sink.resource.maildir.identifier1
+* client.queryrunner
+* client.queryrunner.performance
+* common.typeindex
+
 ## Collected information
 In addition to the regular message we want:
 
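
Note: one possible mapping of the areas above onto Qt's categorized logging, as a sketch. The mechanism (QLoggingCategory and filter rules) is stock Qt; Sink's actual logging macros may differ.

```cpp
#include <QLoggingCategory>

// Declare categories named after the debug areas listed above.
Q_LOGGING_CATEGORY(lcResourceSync, "resource.sync")
Q_LOGGING_CATEGORY(lcResourcePipeline, "resource.pipeline")

void configureLogging()
{
    // Enable/disable whole areas (and their subareas) at runtime.
    QLoggingCategory::setFilterRules(
        QStringLiteral("resource.sync.debug=true\n"
                       "resource.pipeline.debug=false"));
}

void duringSync()
{
    qCDebug(lcResourceSync) << "fetching folder list";    // "log" level
    qCWarning(lcResourcePipeline) << "slow preprocessor"; // "warning" level
}
```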
@@ -24,5 +45,5 @@ In addition to the regular message we want:
 * threadid?
 * timestamp
 * sourcefile + position + function name
-* application name / resource identfier
-* component identifier (i.e. resource access)
+* application name / resource identifier
+* area (e.g. resource access)
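
Note: most of the fields listed above map directly onto Qt's message pattern placeholders; a minimal sketch using the stock qSetMessagePattern (the exact pattern is only an example):

```cpp
#include <QtGlobal>
#include <QString>

void setupMessagePattern()
{
    // pid, threadid, timestamp, file/line/function, category ("area") and
    // the message itself are all standard placeholders.
    qSetMessagePattern(QStringLiteral(
        "[%{time yyyy-MM-dd h:mm:ss.zzz}] %{pid}/%{threadid} %{category} "
        "%{file}:%{line} %{function} - %{message}"));
}
```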
diff --git a/docs/queries.md b/docs/queries.md
new file mode 100644
index 0000000..8676392
--- /dev/null
+++ b/docs/queries.md
@@ -0,0 +1,104 @@
+## Query System
+The query system should allow for efficient retrieval of just the amount of data required by the client. Efficient querying is supported by the indexes provided by the resources.
+
+The query always retrieves a set of entities matching the query, though not all properties of each entity necessarily need to be populated.
+
+Queries are declarative to keep the specification simple and to allow the implementation to choose the most efficient execution.
+
+Queries can be kept open (live) to receive updates as the store changes.
+
+### Query
+The query consists of:
+
+* a set of filters to match the wanted entities
+* the set of properties to retrieve for each entity
+
+Queryable properties are defined by the [[Domain Types]].
+
+### Query Result
+The result is returned directly after running the query in the form of a QAbstractItemModel. Each row in the model represents a matching entity.
+
+The model allows accessing the domain object directly, or accessing individual properties via the row's columns.
+
+The model is always populated asynchronously. It is therefore initially empty and will then populate itself gradually, through the regular update mechanisms (rowsInserted).
+
+Tree queries allow the application to query e.g. for a folder hierarchy in a single query. This is necessary for performance reasons, to avoid recursive querying in large hierarchies. To avoid, on the other hand, loading large hierarchies directly into memory, the model only populates the toplevel rows automatically; all other rows need to be populated by calling `QAbstractItemModel::fetchMore(QModelIndex);`. This way the resource can deal with the query efficiently (i.e. by avoiding the many roundtrips that would be necessary with recursive queries), while keeping the amount of data in memory to a minimum (i.e. if the majority of the folder tree is collapsed in the view anyway). A tree result set can therefore be seen as a set of sets, where every subset corresponds to the children of one parent.
+
+If the query is live, the model updates itself if the update applies to one of the already loaded subsets (otherwise it's currently irrelevant and will load once the subset is loaded).
+
+#### Enhancements
+* Asynchronous loading of entities/properties can be achieved by returning an invalid QVariant initially, and emitting dataChanged once the value is loaded.
+* To avoid loading a large list when not all data is necessary, a batch size could be defined to guarantee, for instance, that there is sufficient data to fill the screen, and the fetchMore mechanism can be used to gradually load more data as required when scrolling in the application.
+
+#### Filter
+A filter consists of:
+
+* a property to filter on, as defined by the [[Domain Types]]
+* a comparator to use
+* a value
+
+The available comparators are:
+
+* equal
+* greater than
+* less than
+* inclusive range
+
+Value types include:
+
+* Null
+* Bool
+* Regular Expression
+* Substring
+* A type-specific literal value (e.g. string, number, date, ..)
+
+Filters can be combined using AND, OR, NOT.
+
+#### Example
+```
+query = {
+    offset: int
+    limit: int
+    filter = {
+        and {
+            collection = foo
+            or {
+                resource = res1
+                resource = res2
+            }
+        }
+    }
+}
+```
+
+possible API:
+
+```
+query.filter().and().property("collection") = "foo"
+query.filter().and().or().property("resource") = "res1"
+query.filter().and().or().property("resource") = "res2"
+query.filter().and().property("start-date") = InclusiveRange(QDateTime, QDateTime)
+```
+
+The problem is that it is difficult to adjust an individual resource property like that.
+
+### Usecases ###
+Mail:
+
+* All mails in folder X within date-range Y that are unread.
+* All mails (in all folders) that contain the string X in property Y.
+
+Todos:
+
+* Give me all the todos in that collection where their RELATED-TO field maps to no other todo UID field in the collection
+* Give me all the todos in that collection where their RELATED-TO field has a given value
+* Give me all the collections which have a given collection as parent and which have a descendant matching a criteria on its attributes
+
+Events:
+
+* All events of calendar X within date-range Y.
+
+Generic:
+* entity with identifier X
+* all entities of resource X
+
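
Note: a minimal sketch of how a client might consume such a result model. Only the QAbstractItemModel semantics described above are assumed (asynchronous population via rowsInserted, lazy subtree population via fetchMore); the function receiving the model is hypothetical.

```cpp
#include <QAbstractItemModel>
#include <QDebug>
#include <QObject>
#include <QSharedPointer>

void watchFolderTree(const QSharedPointer<QAbstractItemModel> &model)
{
    // The model is initially empty and fills in asynchronously.
    QObject::connect(model.data(), &QAbstractItemModel::rowsInserted,
        [model](const QModelIndex &parent, int first, int last) {
            for (int row = first; row <= last; row++) {
                const QModelIndex index = model->index(row, 0, parent);
                qDebug() << "new row:" << index.data(Qt::DisplayRole).toString();
                // Only toplevel rows are populated automatically; children
                // must be requested explicitly when the subtree is needed.
                if (model->canFetchMore(index)) {
                    model->fetchMore(index);
                }
            }
        });
}
```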
diff --git a/docs/resource.md b/docs/resource.md
index defbf9a..8c87522 100644
--- a/docs/resource.md
+++ b/docs/resource.md
@@ -4,7 +4,7 @@ The resource consists of:
 * a plugin providing the client-api facade
 * a configuration setting of the filters
 
-# Synchronizer
+## Synchronizer
 The synchronizer process is responsible for processing all commands, executing synchronizations with the source, and replaying changes to the source.
 
 Processing of commands happens in the pipeline, which executes all preprocessors before the entity is persisted.
@@ -16,7 +16,15 @@ The synchronizer process has the following primary components:
 * Listener: Opens a socket and listens for incoming connections. On connection all incoming commands are read and entered into command queues. Control commands (i.e. a sync) don't require persistence and are therefore processed directly.
 * Synchronization: Handles synchronization to the source, as well as change-replay to the source. The modification commands generated by the synchronization enter the command queue as well.
 
-# Preprocessors
+A resource can:
+
+* provide a full mirror of the source.
+* provide metadata for efficient access to the source.
+
+In the former case the local mirror is fully functional locally, and changes can be replayed to the source once a connection is established again.
+In the latter case the resource is only functional if a connection to the source is available (which is e.g. not a problem if the source is a local maildir on disk).
+
+## Preprocessors
 Preprocessors are small processors that are guaranteed to be processed before a new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity.
 
 Usecases:
@@ -33,16 +41,29 @@ The following kinds of preprocessors exist:
 
 Preprocessors are typically read-only, e.g. so as not to break signatures of emails. Extra flags that are accessible through the sink domain model can therefore be stored in the local buffer of each resource.
 
-## Requirements
+### Requirements
 * A preprocessor must work with batch processing. Because batch-processing is vital for efficient writing to the database, all preprocessors have to be included in the batch processing.
 * Preprocessors need to be fast, since they directly affect how fast a message is processed by the system.
 
-## Design
+### Design
 Commands are processed in batches. Each preprocessor thus has the following workflow:
 * startBatch is called: The preprocessor can do necessary preparation steps to prepare for the batch (like starting a transaction on an external database)
 * add/modify/remove is called for every command in the batch: The preprocessor executes the desired actions.
 * endBatch is called: If the preprocessor wrote to an external database it can now commit the transaction.
 
+### Generic Preprocessors
+Most preprocessors will likely be used by several resources, and are either completely generic or domain specific (such as only for mail).
+It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.
+
+The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
+That way it is trivial to e.g. implement a preprocessor that populates a hierarchy index of collections.
+
+### Preprocessors generating additional entities
+A preprocessor, such as an email threading preprocessor, might generate additional entities (a thread entity is a regular entity, just like the mail that spawned the thread).
+
+In such a case the preprocessor must invoke the complete pipeline for the new entity.
+
+
 ## Indexes
 Most indexes are implemented as preprocessors to guarantee that they are always updated together with the data.
 
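
Note: a minimal sketch of the batch contract described under "Design"; the interface shape and names are illustrative, not Sink's actual preprocessor API.

```cpp
#include <QByteArray>

class Preprocessor
{
public:
    virtual ~Preprocessor() = default;
    // Prepare for the batch, e.g. start a transaction on an external database.
    virtual void startBatch() {}
    // Called once per command in the batch.
    virtual void add(const QByteArray &newEntity) = 0;
    virtual void modify(const QByteArray &oldEntity, const QByteArray &newEntity) = 0;
    virtual void remove(const QByteArray &oldEntity) = 0;
    // Commit whatever startBatch() opened.
    virtual void endBatch() {}
};
```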
@@ -65,6 +86,9 @@ Index types:
 * sort indexes (i.e. sorted by date)
     * Could also be a lookup in the range index (increase date range until sufficient matches are available)
 
+### Default implementations
+Since only properties of the domain types can be queried, default implementations for commonly used indexes can be provided. These indexes are populated by generic preprocessors that use the domain-type interface to extract properties from individual entities.
+
 ### Example index implementations
 * uid lookup
     * add:
@@ -106,25 +130,14 @@ Building the index on-demand is a matter of replaying the relevant dataset and u
 
 The index's status information can be recorded using the latest revision the index has been updated with.
 
-## Generic Preprocessors
-Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail).
-It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.
-
-The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
-It is that way trivial to i.e. implement a preprocessor that populates a hierarchy index of collections.
-
-## Preprocessors generating additional entities
-A preprocessor, such as an email threading preprocessors, might generate additional entities (A thread entity is a regular entity, just like the mail that spawned the thread).
-
-In such a case the preprocessor must invoke the complete pipeline for the new entity.
-
 # Pipeline
 A pipeline is an assembly of a set of preprocessors with a defined order. A modification is always persisted at the end of the pipeline once all preprocessors have been processed.
 
-# Synchronization / Change Replay
-* The synchronization can either:
-    * Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store.
-    * If the source supports incremental changes the changeset can directly be generated from that information.
+# Synchronization
+The synchronization can either:
+
+* Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates the necessary changeset for the store.
+* If the source supports incremental changes, the changeset can be generated directly from that information.
 
 The changeset is then simply inserted into the regular modification queue and processed like all other modifications. The synchronizer has to ensure that only changes that didn't already come from the source are replayed to it. This is done by marking changes that don't require change replay to the source.
 
@@ -142,8 +155,12 @@ The remoteid mapping has to be updated in two places:
 * New entities that are synchronized immediately get a localid assigned, which is then recorded together with the remoteid. This is required to be able to reference other entities directly in the command queue (i.e. for parent folders).
 * Entities created by clients get a remoteid assigned during change replay, so the entity can be recognized during the next sync.
 
+## Change Replay
+To replay local changes to the source, the synchronizer replays all revisions of the store and maintains the current replay state in the synchronization store.
+Changes that already came from the source via the synchronizer are not replayed to the source again.
+
 # Testing / Inspection
-Resources new to be tested, which often requires inspections into the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process without running tests in a vastly different setup from how it will be run in production.
+Resources have to be tested, which often requires inspections into the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process, without running tests in a vastly different setup from how it will be run in production.
 
 To alleviate this, inspection commands are introduced. Inspection commands are special commands that the resource processes just like all other commands, and that have the sole purpose of inspecting the current resource state. Because the command is processed with the same mechanism as other commands, we can rely on the ordering of commands in a way that a prior command is guaranteed to have been executed once the inspection command is processed.
 
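
Note: a pseudocode-level sketch of the "full diff" strategy described above — compare a source listing against a store snapshot (one revision, via a transaction) and emit create/modify/remove commands into the regular modification queue. All names here are illustrative.

```cpp
#include <QByteArray>
#include <QHash>
#include <functional>

struct SyncCommands // assumed interface into the command queue
{
    std::function<void(const QByteArray &remoteId)> create, modify, remove;
};

void fullDiffSync(const QHash<QByteArray, QByteArray> &source,   // remoteid -> content hash
                  const QHash<QByteArray, QByteArray> &snapshot, // remoteid -> content hash
                  const SyncCommands &commands)
{
    for (auto it = source.constBegin(); it != source.constEnd(); ++it) {
        if (!snapshot.contains(it.key())) {
            commands.create(it.key());           // new on the source
        } else if (snapshot.value(it.key()) != it.value()) {
            commands.modify(it.key());           // changed on the source
        }
    }
    for (auto it = snapshot.constBegin(); it != snapshot.constEnd(); ++it) {
        if (!source.contains(it.key())) {
            commands.remove(it.key());           // gone from the source
        }
    }
}
```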
diff --git a/docs/akonadish.md b/docs/sinksh.md
index 9884169..9884169 100644
--- a/docs/akonadish.md
+++ b/docs/sinksh.md
diff --git a/docs/storage.md b/docs/storage.md
index 4852131..afd55d8 100644
--- a/docs/storage.md
+++ b/docs/storage.md
@@ -1,17 +1,3 @@
-## Store access
-Access to the entities happens through a well defined interface that defines a property-map for each supported domain type. A property map could look like:
-```
-Event {
-    startDate: QDateTime
-    subject: QString
-    ...
-}
-```
-
-This property map can be freely extended with new properties for various features. It shouldn't adhere to any external specification and exists solely to define how to access the data.
-
-Clients will map these properties to the values of their domain object implementations, and resources will map the properties to the values in their buffers.
-
 ## Storage Model
 The storage model is simple:
 ```
@@ -42,8 +28,7 @@ Each entity can be as normalized/denormalized as useful. It is not necessary to
 
 Denormalized:
 
-* priority is that mime message stays intact (signatures/encryption)
-* could we still provide a streaming api for attachments?
+* priority is that the mime message stays intact (signatures/encryption)
 
 ```
 Mail {
@@ -55,7 +40,7 @@ Mail {
 Normalized:
 
 * priority is that we can access individual members efficiently.
-* we don't care about exact reproducability of e.g. ical file
+* we don't care about exact reproducibility of e.g. an ical file
 ```
 Event {
     id
@@ -101,7 +86,7 @@ The resource can be effectively removed from disk (besides configuration),
 by deleting the directories matching `$RESOURCE_IDENTIFIER*` and everything they contain.
 
 #### Design Considerations
-* The stores are split by buffertype, so a full scan (which is done by type), doesn't require filtering by type first. The downside is that an additional lookup is required to get from revision to the data.
+The stores are split by buffer type, so a full scan (which is done by type) doesn't require filtering by type first. The downside is that an additional lookup is required to get from the revision to the data.
 
 ### Revisions
 Every operation (create/delete/modify) leads to a new revision. The revision is an ever-increasing number for the complete store.
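
Note: an illustrative sketch only — one ever-increasing revision counter for the complete store, where each revision points at exactly one entity. The key layout is assumed, not Sink's actual schema.

```cpp
#include <QByteArray>
#include <QMap>
#include <cstdint>

struct RevisionedStore
{
    uint64_t maxRevision = 0;
    QMap<uint64_t, QByteArray> revisions;  // revision -> entity uid
    QMap<QByteArray, QByteArray> entities; // "uid:revision" -> serialized buffer

    // Create/modify/delete all record a new revision.
    void record(const QByteArray &uid, const QByteArray &buffer)
    {
        const uint64_t revision = ++maxRevision;
        revisions.insert(revision, uid);
        entities.insert(uid + ':' + QByteArray::number(qulonglong(revision)), buffer);
    }
};
```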
@@ -167,6 +152,8 @@ Using regular files as the interface has the advantages:
 The copy is necessary to guarantee that the file remains for the client/resource even if the resource removes the file on its side as part of a sync.
 The copy could be optimized by using hardlinks, which is not a portable solution though. For some next-gen copy-on-write filesystems copying is a very cheap operation.
 
+A downside of the file-based design is that it's not possible to stream directly from a remote resource into application memory; it always has to go via a file.
+
 ## Database choice
 By design we're interested in key-value stores or perhaps document databases. This is because a fixed schema is not useful for this design, which makes
 SQL not very useful (it would just be a very slow key-value store). While document databases would allow for indexes on certain properties (which is something we need), we have not yet found any contenders that look like they would be useful for this system.
diff --git a/docs/terminology.md b/docs/terminology.md
index 1826bec..5238c79 100644
--- a/docs/terminology.md
+++ b/docs/terminology.md
@@ -13,7 +13,7 @@ It is recommended to familiarize yourself with the terms before going further in
 * resource: A plugin which provides client command processing, a store facade and synchronization for a given type of store. The resource also manages the configuration for a given source including server settings, local paths, etc.
 * store facade: An object provided by resources which provides transformations between domain objects and the store.
 * synchronizer: The operating system process responsible for overseeing the process of modifying and synchronizing a store. To accomplish this, a synchronizer loads the correct resource plugin, manages pipelines and handles client communication. One synchronizer is created for each source that is accessed by clients; these processes are shared by all clients.
-* Preprocessor: A component that takes an entity and performs some modification of it (e.g. changes the folder an email is in) or processes it in some way (e.g. indexes it)
+* preprocessor: A component that takes an entity and performs some modification of it (e.g. changes the folder an email is in) or processes it in some way (e.g. indexes it)
 * pipeline: A run-time definable set of filters which are applied to an entity after a resource has performed a specific kind of function on it (create, modify, delete)
 * query: A declarative method for requesting entities from one or more sources that match a given set of constraints
 * command: Clients request modifications, additions and deletions to the store by sending commands to a synchronizer for processing
diff --git a/docs/tradeoffs.md b/docs/tradeoffs.md
new file mode 100644
index 0000000..d0e32c1
--- /dev/null
+++ b/docs/tradeoffs.md
@@ -0,0 +1,36 @@
+# Tradeoffs/Design Decisions
+* Key-value store instead of relational
+    * `+` Schemaless, easier to evolve
+    * `+` No need to fully normalize the data in order to make it queryable. And without full normalization SQL is not really useful and performs badly.
+    * `-` We need to maintain our own indexes
+
+* Individual store per resource
+    * Storage format defined by each resource individually
+    * `-` Each resource needs to define its own schema
+    * `+` Resources can adjust the storage format to map well onto what they have to synchronize
+    * `+` Synchronization state can be embedded directly into messages
+    * `+` Individual resources could switch to another store technology
+    * `+` Easier maintenance
+    * `+` A resource is only responsible for its own store and doesn't accidentally break another resource's store
+    * `-` Inter-resource moves are both more complicated and more expensive from a client perspective
+    * `+` Inter-resource moves become simple additions and removals from a resource perspective
+    * `-` No system-wide unique id per message (only the resource/id tuple identifies a message uniquely)
+    * `+` Stores can work fully concurrently (also for writing)
+
+* Indexes defined and maintained by resources
+    * `-` Relational queries across resources are expensive (depending on the query, perhaps not even feasible)
+    * `-` Each resource needs to define its own set of indexes
+    * `+` Flexible design, as it allows indexes to be changed on a per-resource level
+    * `+` Indexes can be optimized towards a resource's main use cases
+    * `+` Indexes can be shared with the source (IMAP serverside threading)
+
+* Shared domain types as common interface for client applications
+    * `-` Yet another abstraction layer that requires translation to other layers and maintenance
+    * `+` Decoupling of domain logic from data access
+    * `+` Allows types to evolve according to needs (not coupled to specific application domain types)
+
+# Risks
+* The key-value store does not perform well with large amounts of data
+* Query performance is not sufficient
+* Turnaround time for modifications is too high to feel responsive
+* The design turns out to be as complex as Akonadi