 docs/clientapi.md   | 129
 docs/design.md      |  27
 docs/queries.md     | 104
 docs/resource.md    |  59
 docs/storage.md     |  23
 docs/terminology.md |   2
 mkdocs.yml          |   7
 7 files changed, 176 insertions(+), 175 deletions(-)
diff --git a/docs/clientapi.md b/docs/clientapi.md
index 219f972..be8ff19 100644
--- a/docs/clientapi.md
+++ b/docs/clientapi.md
@@ -13,16 +13,6 @@ The client API consists of:
 * property-level on-demand loading of data
 * streaming support for large properties (attachments)
 
-## Domain Types
-A set of standardized domain types is defined. This is necessary to decouple applications from resources (so a calendar can access events from all resources), and to have a "language" for queries.
-
-The definition of the domain model directly affects:
-
-* granularity for data retrieval (email property, or individual subject, date, ...)
-* queriable properties for filtering and sorting (sender, id, ...)
-
-The purpose of these domain types is strictly to be the interface and the types are not necessarily meant to be used by applications directly, or to be restricted by any other specifications (such as ical). By nature these types will be part of the evolving interface, and will need to be adjusted for every new property that an application must understand.
-
 ## Store Facade
 The store is always accessed through a store specific facade, which hides:
 
@@ -52,118 +42,12 @@ Each modification is associated with a specific revision, which allows the synch
 ### Conflict Resolution
 Conflicts can occur at two points:
 
-* While i.e. an editor is open and we receive an update for the same entity
-* After a modification is sent to the synchronizer but before it's processed
+* In the client: While i.e. an editor is open and we receive an update for the same entity
+* In the synchronizer: After a modification is sent to the synchronizer but before it's processed
 
 In the first case the client is repsonsible to resolve the conflict, in the latter case it's the synchronizer's responsibility.
 A small window exists where the client has already started the modification (i.e. command is in socket), and a notification has not yet arrived that the same entity has been changed. In such a case the synchronizer may reject the modification because it has the revision the modification refers to no longer available.
 
-This design allows the synchronizer to be in control of the revisions, and keeps it from having to wait for all clients to update until it can drop revisions.
-
-## Query System
-The query system should allow for efficient retrieval for just the amount of data required by the client. Efficient querying is supported by the indexes provided by the resources.
-
-The query always retrieves a set of entities matching the query, while not necessarily all properties of the entity need to be populated.
-
-Queries should are declarative to keep the specification simple and to allow the implementation to choose the most efficient execution.
-
-Queries can be kept open (live) to receive updates as the store changes.
-
-### Query
-The query consists of:
-
-* a set of filters to match the wanted entities
-* the set of properties to retrieve for each entity
-
-Queryable properties are defined by the [[Domain Types]] above.
-
-### Query Result
-The result is returned directly after running the query in form of a QAbstractItemModel. Each row in the model represents a matching entity.
-
-The model allows to access the domain object directly, or to access individual properties directly via the rows columns.
-
-The model is always populated asynchronously. It is therefore initially empty and will then populate itself gradually, through the regular update mechanisms (rowsInserted).
-
-Tree Queries allow the application to query for i.e. a folder hierarchy in a single query. This is necessary for performance reasons to avoid recursive querying in large hierarchies. To avoid on the other hand loading large hierchies directly into memory, the model only populates the toplevel rows automatically, all other rows need to be populated by calling `QAbstractItemModel::fetchMore(QModelIndex);`. This way the resource can deal efficiently with the query (i.e. by avoiding the many roundtrips that would be necessary with recursive queries), while keeping the amount of data in memory to a minimum (i.e. if the majority of the folder tree is collapsed in the view anyways). A tree result set can therefore be seen as a set of sets, where every subset corresponds to the children of one parent.
-
-If the query is live, the model updates itself if the update applies to one of the already loaded subsets (otherwise it's currently irrelevant and will load once the subset is loaded).
-
-#### Enhancements
-* Asynchronous loading of entities/properties can be achieved by returning an invalid QVariant initially, and emitting dataChanged once the value is loaded.
-* To avoid loading a large list when not all data is necessary, a batch size could be defined to guarantee for instance that there is sufficient data to fill the screen, and the fetchMore mechanism can be used to gradually load more data as required when scrolling in the application.
-
-#### Filter
-A filter consists of:
-
-* a property to filter on as defined by the [[Domain Types]]
-* a comparator to use
-* a value
-
-The available comparators are:
-
-* equal
-* greater than
-* less than
-* inclusive range
-
-Value types include:
-
-* Null
-* Bool
-* Regular Expression
-* Substring
-* A type-specific literal value (e.g. string, number, date, ..)
-
-Filters can be combined using AND, OR, NOT.
-
-#### Example
-```
-query = {
-    offset: int
-    limit: int
-    filter = {
-        and {
-            collection = foo
-            or {
-                resource = res1
-                resource = res2
-            }
-        }
-    }
-}
-```
-
-possible API:
-
-```
-query.filter().and().property("collection") = "foo"
-query.filter().and().or().property("resource") = "res1"
-query.filter().and().or().property("resource") = "res2"
-query.filter().and().property("start-date") = InclusiveRange(QDateTime, QDateTime)
-```
-
-The problem is that it is difficult to adjust an individual resource property like that.
-
-### Usecases ###
-Mail:
-
-* All mails in folder X within date-range Y that are unread.
-* All mails (in all folders) that contain the string X in property Y.
-
-Todos:
-
-* Give me all the todos in that collection where their RELATED-TO field maps to no other todo UID field in the collection
-* Give me all the todos in that collection where their RELATED-TO field has a given value
-* Give me all the collections which have a given collection as parent and which have a descendant matching a criteria on its attributes;
-
-Events:
-
-* All events of calendar X within date-range Y.
-
-Generic:
-* entity with identifier X
-* all entities of resource X
-
 ### Lazy Loading ###
 The system provides property-level lazy loading. This allows i.e. to defer downloading of attachments until the attachments is accessed, at the expense of having to have access to the source (which could be connected via internet).
 
@@ -173,12 +57,3 @@ Note: We should perhaps define a minimum set of properties that *must* be availa
 
 ### Data streaming ###
 Large properties such as attachments should be streamable. An API that allows to retrieve a single property of a defined entity in a streamable fashion is probably enough.
-
-### Indexes ###
-Since only properties of the domain types can be queried, default implementations for commonly used indexes can be provided. These indexes are populated by generic preprocessors that use the domain-type interface to extract properties from individual entites.
-
-## Notifications ##
-A notification mechanism is required to inform clients about changes. Running queries will automatically update the result-set if a notification is received.
-
-Note: A notification could supply a hint on what changed, allowing clients to ignore revisions with irrelevant changes.
-A running query can do all of that transparently behind the scenes. Note that the hints should indeed only hint what has changed, and not supply the actual changeset. These hints should be tailored to what we see as useful, and must therefore be easy to modify.
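The conflict-resolution hunk above describes a rejection window: a modification refers to the revision it was based on, and the synchronizer may reject it once that revision is no longer available. A minimal sketch of that check, with all names here illustrative assumptions rather than the actual Sink API:

```cpp
#include <cstdint>
#include <string>

// Hypothetical modification command: carries the revision the client
// based its change on (assumed structure, not the real wire format).
struct Modification {
    std::string entityId;
    int64_t baseRevision;
};

class ConflictGate {
public:
    // Oldest revision the synchronizer still keeps around; older
    // revisions have been dropped.
    void setOldestKeptRevision(int64_t rev) { oldest_ = rev; }

    // A modification is only acceptable while its base revision is still
    // available; otherwise the synchronizer rejects it and the client
    // has to resolve the conflict against the newer state.
    bool accept(const Modification &m) const { return m.baseRevision >= oldest_; }

private:
    int64_t oldest_ = 0;
};
```

This also illustrates why the design lets the synchronizer drop revisions without waiting for all clients: rejection, not coordination, closes the window.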
diff --git a/docs/design.md b/docs/design.md
index 499f527..9b64056 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -9,20 +9,37 @@ This allows applications to transparently use various data sources with various
 ## Resource
 A resource is a plugin that provides access to an additional source. It consists of a store, a synchronizer process that executes synchronization & change replay to the source and maintains the store, as well as a facade plugin for the client api.
 
-## Store
+## Store / Indexes
 Each resource maintains a store that can either store the full dataset for offline access or only metadata for quick lookups. Resources can define how data is stored.
+The store consists of revisions with every revision containing one entity.
+
+The store additionally contains various secondary indexes for efficient lookups.
 
 ## Types
 ### Domain Type
-The domain types exposed in the public interface.
+The domain types exposed in the public interface provide standardized access to the store. The domain types and their properties directly define the granularity of data retrieval and thus also what queries can be executed.
 
 ### Buffer Type
-The individual buffer types as specified by the resource. The are internal types that don't necessarily have a 1:1 mapping to the domain types, although that is the default case that the default implementations expect.
+The buffers used by the resources in the store may be different from resource to resource, and don't necessarily have a 1:1 mapping to the domain types.
+This allows resources to store data in a way that is convenient/efficient for synchronization, although it may require a bit more effort when accessing the data.
+The individual buffer types are specified by the resource and internal to it. Default buffer types exist for all domain types.
+
+### Commands
+Commands are used to modify the store. The resource processes commands that are generated by clients and the synchronizer.
+
+### Notifications
+The resource emits notifications to inform clients of new revisions and other changes.
 
 ## Mechanisms
 ### Change Replay
 The change replay is based on the revisions in the store. Clients (as well as also the write-back mechanism that replays changes to the source), are informed that a new revision is available. Each client can then go through all new revisions (starting from the last seen revision), and thus update its state to the latest revision.
 
-### Preprocessor pipeline
-Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering. The pipeline guarantees that the preprocessor steps are executed before the entity is persisted.
+### Synchronization
+The synchronizer executes a periodic synchronization that results in change commands to synchronize the store with the source.
+The change-replay mechanism is used to write back changes to the source that happened locally.
+
+### Command processing
+The resources have an internal persistent command queue that is populated by the synchronizer and clients, and continuously processed.
+
+Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering, and through which every command goes before it enters the store. The pipeline guarantees that the preprocessor steps are executed on any command before the entity is persisted.
 
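The command-processing paragraph added above can be sketched as a queue drained in order into the pipeline. This is a plain-C++ illustration with assumed names; the real Sink queue is persisted on disk and the payloads are serialized buffers:

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical command: kind of change plus a payload (a serialized
// entity in the real system).
struct Command {
    std::string type;     // e.g. "create", "modify", "delete"
    std::string payload;
};

class CommandQueue {
public:
    // Clients and the synchronizer both enqueue their commands here.
    void enqueue(Command c) { queue_.push_back(std::move(c)); }

    // The resource continuously drains the queue and feeds the commands,
    // in order, through the preprocessor pipeline before persisting.
    std::vector<Command> drain() {
        std::vector<Command> batch(queue_.begin(), queue_.end());
        queue_.clear();
        return batch;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::deque<Command> queue_;  // the real queue would be persistent
};
```

The single shared queue is what gives the ordering guarantee the docs rely on: a command enqueued later can never overtake one enqueued earlier.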
diff --git a/docs/queries.md b/docs/queries.md
new file mode 100644
index 0000000..8676392
--- /dev/null
+++ b/docs/queries.md
@@ -0,0 +1,104 @@
+## Query System
+The query system should allow for efficient retrieval for just the amount of data required by the client. Efficient querying is supported by the indexes provided by the resources.
+
+The query always retrieves a set of entities matching the query, while not necessarily all properties of the entity need to be populated.
+
+Queries are declarative to keep the specification simple and to allow the implementation to choose the most efficient execution.
+
+Queries can be kept open (live) to receive updates as the store changes.
+
+### Query
+The query consists of:
+
+* a set of filters to match the wanted entities
+* the set of properties to retrieve for each entity
+
+Queryable properties are defined by the [[Domain Types]] above.
+
+### Query Result
+The result is returned directly after running the query in form of a QAbstractItemModel. Each row in the model represents a matching entity.
+
+The model allows to access the domain object directly, or to access individual properties directly via the row's columns.
+
+The model is always populated asynchronously. It is therefore initially empty and will then populate itself gradually, through the regular update mechanisms (rowsInserted).
+
+Tree Queries allow the application to query for i.e. a folder hierarchy in a single query. This is necessary for performance reasons to avoid recursive querying in large hierarchies. To avoid on the other hand loading large hierarchies directly into memory, the model only populates the toplevel rows automatically, all other rows need to be populated by calling `QAbstractItemModel::fetchMore(QModelIndex);`. This way the resource can deal efficiently with the query (i.e. by avoiding the many roundtrips that would be necessary with recursive queries), while keeping the amount of data in memory to a minimum (i.e. if the majority of the folder tree is collapsed in the view anyways). A tree result set can therefore be seen as a set of sets, where every subset corresponds to the children of one parent.
+
+If the query is live, the model updates itself if the update applies to one of the already loaded subsets (otherwise it's currently irrelevant and will load once the subset is loaded).
+
+#### Enhancements
+* Asynchronous loading of entities/properties can be achieved by returning an invalid QVariant initially, and emitting dataChanged once the value is loaded.
+* To avoid loading a large list when not all data is necessary, a batch size could be defined to guarantee for instance that there is sufficient data to fill the screen, and the fetchMore mechanism can be used to gradually load more data as required when scrolling in the application.
+
+#### Filter
+A filter consists of:
+
+* a property to filter on as defined by the [[Domain Types]]
+* a comparator to use
+* a value
+
+The available comparators are:
+
+* equal
+* greater than
+* less than
+* inclusive range
+
+Value types include:
+
+* Null
+* Bool
+* Regular Expression
+* Substring
+* A type-specific literal value (e.g. string, number, date, ..)
+
+Filters can be combined using AND, OR, NOT.
+
+#### Example
+```
+query = {
+    offset: int
+    limit: int
+    filter = {
+        and {
+            collection = foo
+            or {
+                resource = res1
+                resource = res2
+            }
+        }
+    }
+}
+```
+
+possible API:
+
+```
+query.filter().and().property("collection") = "foo"
+query.filter().and().or().property("resource") = "res1"
+query.filter().and().or().property("resource") = "res2"
+query.filter().and().property("start-date") = InclusiveRange(QDateTime, QDateTime)
+```
+
+The problem is that it is difficult to adjust an individual resource property like that.
+
+### Usecases ###
+Mail:
+
+* All mails in folder X within date-range Y that are unread.
+* All mails (in all folders) that contain the string X in property Y.
+
+Todos:
+
+* Give me all the todos in that collection where their RELATED-TO field maps to no other todo UID field in the collection
+* Give me all the todos in that collection where their RELATED-TO field has a given value
+* Give me all the collections which have a given collection as parent and which have a descendant matching a criteria on its attributes;
+
+Events:
+
+* All events of calendar X within date-range Y.
+
+Generic:
+* entity with identifier X
+* all entities of resource X
+
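The filter model added in queries.md (a property, a comparator, a value, composed with AND/OR/NOT) can be sketched as a small predicate combinator library. Everything below is an illustrative assumption in plain C++; the real implementation would build on QVariant and the domain-type property definitions:

```cpp
#include <functional>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Assumed stand-ins: a value type covering Null/Bool/literal values,
// an entity as a property map, and a filter as a predicate.
using Value  = std::variant<std::monostate, bool, int, std::string>;
using Entity = std::map<std::string, Value>;
using Filter = std::function<bool(const Entity &)>;

// Leaf filter: the "equal" comparator on one queryable property.
Filter equals(std::string property, Value value) {
    return [property, value](const Entity &e) {
        auto it = e.find(property);
        return it != e.end() && it->second == value;
    };
}

// Combinators mirroring AND, OR, NOT from the filter description.
Filter allOf(std::vector<Filter> fs) {
    return [fs](const Entity &e) {
        for (const auto &f : fs)
            if (!f(e)) return false;
        return true;
    };
}

Filter anyOf(std::vector<Filter> fs) {
    return [fs](const Entity &e) {
        for (const auto &f : fs)
            if (f(e)) return true;
        return false;
    };
}

Filter negate(Filter f) {
    return [f](const Entity &e) { return !f(e); };
}

// The example query from the text:
// collection = foo AND (resource = res1 OR resource = res2)
const Filter exampleQuery = allOf({
    equals("collection", std::string("foo")),
    anyOf({equals("resource", std::string("res1")),
           equals("resource", std::string("res2"))})
});
```

Because the tree stays declarative (data, not control flow), an implementation is free to push such a tree down to the resource's indexes instead of evaluating it per entity.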
diff --git a/docs/resource.md b/docs/resource.md
index defbf9a..8c87522 100644
--- a/docs/resource.md
+++ b/docs/resource.md
@@ -4,7 +4,7 @@ The resource consists of:
 * a plugin providing the client-api facade
 * a configuration setting of the filters
 
-# Synchronizer
+## Synchronizer
 The synchronizer process is responsible for processing all commands, executing synchronizations with the source, and replaying changes to the source.
 
 Processing of commands happens in the pipeline which executes all preprocessors ebfore the entity is persisted.
@@ -16,7 +16,15 @@ The synchronizer process has the following primary components:
 * Listener: Opens a socket and listens for incoming connections. On connection all incoming commands are read and entered into command queues. Control commands (i.e. a sync) don't require persistency and are therefore processed directly.
 * Synchronization: Handles synchronization to the source, as well as change-replay to the source. The modification commands generated by the synchronization enter the command queue as well.
 
-# Preprocessors
+A resource can:
+
+* provide a full mirror of the source.
+* provide metadata for efficient access to the source.
+
+In the former case the local mirror is fully functional locally and changes can be replayed to the source once a connection is established again.
+In the latter case the resource is only functional if a connection to the source is available (which is i.e. not a problem if the source is a local maildir on disk).
+
+## Preprocessors
 Preprocessors are small processors that are guaranteed to be processed before an new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity.
 
 Usecases:
@@ -33,16 +41,29 @@ The following kinds of preprocessors exist:
 
 Preprocessors are typically read-only, to i.e. not break signatures of emails. Extra flags that are accessible through the sink domain model, can therefore be stored in the local buffer of each resource.
 
-## Requirements
+### Requirements
 * A preprocessor must work with batch processing. Because batch-processing is vital for efficient writing to the database, all preprocessors have to be included in the batch processing.
 * Preprocessors need to be fast, since they directly affect how fast a message is processed by the system.
 
-## Design
+### Design
 Commands are processed in batches. Each preprocessor thus has the following workflow:
 * startBatch is called: The preprocessor can do necessary preparation steps to prepare for the batch (like starting a transaction on an external database)
 * add/modify/remove is called for every command in the batch: The preprocessor executes the desired actions.
 * endBatch is called: If the preprocessor wrote to an external database it can now commit the transaction.
 
+### Generic Preprocessors
+Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail).
+It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.
+
+The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
+It is that way trivial to i.e. implement a preprocessor that populates a hierarchy index of collections.
+
+### Preprocessors generating additional entities
+A preprocessor, such as an email threading preprocessor, might generate additional entities (a thread entity is a regular entity, just like the mail that spawned the thread).
+
+In such a case the preprocessor must invoke the complete pipeline for the new entity.
+
+
 ## Indexes
 Most indexes are implemented as preprocessors to guarantee that they are always updated together with the data.
 
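The startBatch/add/endBatch workflow from the Design subsection above can be sketched as a tiny interface plus a pipeline driver. The names mirror the text but the code is an assumed illustration, not the actual Sink classes:

```cpp
#include <string>
#include <vector>

// Hypothetical minimal entity handed through the pipeline.
struct Item { std::string uid; };

class Preprocessor {
public:
    virtual ~Preprocessor() = default;
    virtual void startBatch() {}        // e.g. begin a transaction on an external database
    virtual void add(const Item &) {}   // called once per new entity in the batch
    virtual void endBatch() {}          // e.g. commit the external transaction
};

// The pipeline guarantees every preprocessor sees the complete batch
// (startBatch .. endBatch) before the entities are persisted.
void processBatch(const std::vector<Preprocessor *> &pipeline,
                  const std::vector<Item> &batch) {
    for (auto *p : pipeline) {
        p->startBatch();
        for (const auto &item : batch)
            p->add(item);
        p->endBatch();
    }
}

// Example preprocessor that just counts the callbacks it receives.
struct CountingPreprocessor : Preprocessor {
    int started = 0, added = 0, ended = 0;
    void startBatch() override { ++started; }
    void add(const Item &) override { ++added; }
    void endBatch() override { ++ended; }
};
```

Batching is what makes external-index preprocessors cheap: one transaction per batch instead of one per entity.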
@@ -65,6 +86,9 @@ Index types:
     * sort indexes (i.e. sorted by date)
         * Could also be a lookup in the range index (increase date range until sufficient matches are available)
 
+### Default implementations
+Since only properties of the domain types can be queried, default implementations for commonly used indexes can be provided. These indexes are populated by generic preprocessors that use the domain-type interface to extract properties from individual entities.
+
 ### Example index implementations
 * uid lookup
     * add:
@@ -106,25 +130,14 @@ Building the index on-demand is a matter of replaying the relevant dataset and u
 
 The indexes status information can be recorded using the latest revision the index has been updated with.
 
-## Generic Preprocessors
-Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail).
-It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.
-
-The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
-It is that way trivial to i.e. implement a preprocessor that populates a hierarchy index of collections.
-
-## Preprocessors generating additional entities
-A preprocessor, such as an email threading preprocessors, might generate additional entities (A thread entity is a regular entity, just like the mail that spawned the thread).
-
-In such a case the preprocessor must invoke the complete pipeline for the new entity.
-
 # Pipeline
 A pipeline is an assembly of a set of preprocessors with a defined order. A modification is always persisted at the end of the pipeline once all preprocessors have been processed.
 
-# Synchronization / Change Replay
-* The synchronization can either:
-    * Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store.
-    * If the source supports incremental changes the changeset can directly be generated from that information.
+# Synchronization
+The synchronization can either:
+
+* Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store.
+* If the source supports incremental changes the changeset can directly be generated from that information.
 
 The changeset is then simply inserted in the regular modification queue and processed like all other modifications. The synchronizer has to ensure only changes are replayed to the source that didn't come from it already. This is done by marking changes that don't require changereplay to the source.
 
@@ -142,8 +155,12 @@ The remoteid mapping has to be updated in two places:
 * New entities that are synchronized immediately get a localid assinged, that is then recorded together with the remoteid. This is required to be able to reference other entities directly in the command queue (i.e. for parent folders).
 * Entities created by clients get a remoteid assigned during change replay, so the entity can be recognized during the next sync.
 
+## Change Replay
+To replay local changes to the source the synchronizer replays all revisions of the store and maintains the current replay state in the synchronization store.
+Changes that already come from the source via synchronizer are not replayed to the source again.
+
 # Testing / Inspection
-Resources new to be tested, which often requires inspections into the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process without running tests in a vastly different setup from how it will be run in production.
+Resources have to be tested, which often requires inspections into the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process without running tests in a vastly different setup from how it will be run in production.
 
 To alleviate this inspection commands are introduced. Inspection commands are special commands that the resource processes just like all other commands, and that have the sole purpose of inspecting the current resource state. Because the command is processed with the same mechanism as other commands we can rely on ordering of commands in a way that a prior command is guaranteed to be executed once the inspection command is processed.
 
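The Change Replay section added above (replay all revisions since the last seen one, persist the cursor) reduces to a simple pattern. A sketch under assumed names, not the actual Sink storage API:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical change record: entity identifier plus operation kind.
struct Change {
    std::string entity;
    std::string operation;  // "create" / "modify" / "delete"
};

class RevisionStore {
public:
    // Every operation appends a new, ever-increasing revision.
    int64_t append(Change c) {
        changes_[++latest_] = std::move(c);
        return latest_;
    }

    // Replay every revision after 'lastSeen' to the handler and return the
    // new cursor, which the synchronizer would persist in its
    // synchronization store as the current replay state.
    int64_t replaySince(int64_t lastSeen,
                        const std::function<void(int64_t, const Change &)> &handler) const {
        for (auto it = changes_.upper_bound(lastSeen); it != changes_.end(); ++it)
            handler(it->first, it->second);
        return latest_;
    }

private:
    int64_t latest_ = 0;
    std::map<int64_t, Change> changes_;
};
```

Filtering out changes that originated from the source (so they are not echoed back) would be a per-change flag checked inside the handler, as the text describes.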
diff --git a/docs/storage.md b/docs/storage.md
index 4852131..afd55d8 100644
--- a/docs/storage.md
+++ b/docs/storage.md
@@ -1,17 +1,3 @@
-## Store access
-Access to the entities happens through a well defined interface that defines a property-map for each supported domain type. A property map could look like:
-```
-Event {
-    startDate: QDateTime
-    subject: QString
-    ...
-}
-```
-
-This property map can be freely extended with new properties for various features. It shouldn't adhere to any external specification and exists solely to define how to access the data.
-
-Clients will map these properties to the values of their domain object implementations, and resources will map the properties to the values in their buffers.
-
 ## Storage Model
 The storage model is simple:
 ```
@@ -42,8 +28,7 @@ Each entity can be as normalized/denormalized as useful. It is not necessary to
 
 Denormalized:
 
-* priority is that mime message stays intact (signatures/encryption)
-* could we still provide a streaming api for attachments?
+* priority is that the mime message stays intact (signatures/encryption)
 
 ```
 Mail {
@@ -55,7 +40,7 @@ Mail {
 Normalized:
 
 * priority is that we can access individual members efficiently.
-* we don't care about exact reproducability of e.g. ical file
+* we don't care about exact reproducability of e.g. an ical file
 ```
 Event {
     id
@@ -101,7 +86,7 @@ The resource can be effectively removed from disk (besides configuration),
 by deleting the directories matching `$RESOURCE_IDENTIFIER*` and everything they contain.
 
 #### Design Considerations
-* The stores are split by buffertype, so a full scan (which is done by type), doesn't require filtering by type first. The downside is that an additional lookup is required to get from revision to the data.
+The stores are split by buffertype, so a full scan (which is done by type), doesn't require filtering by type first. The downside is that an additional lookup is required to get from revision to the data.
 
 ### Revisions
 Every operation (create/delete/modify), leads to a new revision. The revision is an ever increasing number for the complete store.
@@ -167,6 +152,8 @@ Using regular files as the interface has the advantages:
 The copy is necessary to guarantee that the file remains for the client/resource even if the resource removes the file on it's side as part of a sync.
 The copy could be optimized by using hardlinks, which is not a portable solution though. For some next-gen copy-on-write filesystems copying is a very cheap operation.
 
+A downside of having a file based design is that it's not possible to directly stream from a remote resource i.e. into the application memory, it always has to go via a file.
+
 ## Database choice
 By design we're interested in key-value stores or perhaps document databases. This is because a fixed schema is not useful for this design, which makes
 SQL not very useful (it would just be a very slow key-value store). While document databases would allow for indexes on certain properties (which is something we need), we did not yet find any contenders that looked like they would be useful for this system.
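The storage layout this diff describes — per-buffertype key-value stores plus a single ever-increasing revision counter — can be sketched as follows (an illustrative model with hypothetical names, not Sink's actual schema). It shows both the benefit (a full scan by type needs no filtering) and the stated downside (going from a revision to the data needs one extra lookup).

```python
class Store:
    """Toy model: entities split by buffertype, one global revision counter."""

    def __init__(self):
        self.by_type = {}    # buffertype -> {uid: entity}
        self.revisions = {}  # revision -> (buffertype, uid)
        self.revision = 0    # ever-increasing for the complete store

    def put(self, buffertype, uid, entity):
        # Every create/modify/delete bumps the store-wide revision.
        self.revision += 1
        self.by_type.setdefault(buffertype, {})[uid] = entity
        self.revisions[self.revision] = (buffertype, uid)
        return self.revision

    def scan(self, buffertype):
        # Full scan by type: no filtering needed, the store is already split.
        return list(self.by_type.get(buffertype, {}).values())

    def by_revision(self, revision):
        # The downside: an additional lookup from revision to the data.
        buffertype, uid = self.revisions[revision]
        return self.by_type[buffertype][uid]

store = Store()
rev = store.put("mail", "uid1", {"subject": "hello"})
print(store.by_revision(rev))  # → {'subject': 'hello'}
```

A real implementation would back `by_type` and `revisions` with key-value database tables rather than in-memory dicts, but the access pattern is the same.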
diff --git a/docs/terminology.md b/docs/terminology.md
index 1826bec..5238c79 100644
--- a/docs/terminology.md
+++ b/docs/terminology.md
@@ -13,7 +13,7 @@ It is recommended to familiarize yourself with the terms before going further in
 * resource: A plugin which provides client command processing, a store facade and synchronization for a given type of store. The resource also manages the configuration for a given source including server settings, local paths, etc.
 * store facade: An object provided by resources which provides transformations between domain objects and the store.
 * synchronizer: The operating system process responsible for overseeing the process of modifying and synchronizing a store. To accomplish this, a synchronizer loads the correct resource plugin, manages pipelines and handles client communication. One synchronizer is created for each source that is accessed by clients; these processes are shared by all clients.
-* Preprocessor: A component that takes an entity and performs some modification of it (e.g. changes the folder an email is in) or processes it in some way (e.g. indexes it)
+* preprocessor: A component that takes an entity and performs some modification of it (e.g. changes the folder an email is in) or processes it in some way (e.g. indexes it)
 * pipeline: A run-time definable set of filters which are applied to an entity after a resource has performed a specific kind of function on it (create, modify, delete)
 * query: A declarative method for requesting entities from one or more sources that match a given set of constraints
 * command: Clients request modifications, additions and deletions to the store by sending commands to a synchronizer for processing
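The preprocessor and pipeline terms defined in this file can be illustrated with a small sketch (hypothetical functions, purely for illustration): a pipeline is an ordered set of steps applied to an entity after an operation, where each step either modifies the entity (e.g. changes its folder) or processes it (e.g. indexes it).

```python
def move_to_folder(entity):
    # A preprocessor that modifies the entity (assumed folder name).
    entity = dict(entity)
    entity["folder"] = "inbox"
    return entity

index = []
def indexer(entity):
    # A preprocessor that processes the entity without changing it.
    index.append(entity["subject"])
    return entity

def run_pipeline(entity, preprocessors):
    # The pipeline applies each preprocessor in order.
    for step in preprocessors:
        entity = step(entity)
    return entity

mail = run_pipeline({"subject": "hi"}, [move_to_folder, indexer])
print(mail)   # → {'subject': 'hi', 'folder': 'inbox'}
print(index)  # → ['hi']
```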
diff --git a/mkdocs.yml b/mkdocs.yml
index 9f71214..07299f4 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -5,11 +5,12 @@ pages:
   - Terminology: terminology.md
   - "Design Goals": designgoals.md
   - Overview: design.md
-  - Resource: resource.md
-  - Storage: storage.md
-  - Logging: logging.md
   - "Client API": clientapi.md
   - "Application Domain Types": applicationdomaintypes.md
+  - Queries: queries.md
+  - Resource: resource.md
+  - "Store and Indexes": storage.md
+  - Logging: logging.md
   - "Tradeoffs and Design Decisions": tradeoffs.md
 - Development:
   - "Extending Sink": extending.md