diff options
Diffstat (limited to 'docs')
-rw-r--r-- | docs/clientapi.md | 129 | ||||
-rw-r--r-- | docs/design.md | 27 | ||||
-rw-r--r-- | docs/queries.md | 104 | ||||
-rw-r--r-- | docs/resource.md | 59 | ||||
-rw-r--r-- | docs/storage.md | 23 | ||||
-rw-r--r-- | docs/terminology.md | 2 |
6 files changed, 172 insertions, 172 deletions
diff --git a/docs/clientapi.md b/docs/clientapi.md index 219f972..be8ff19 100644 --- a/docs/clientapi.md +++ b/docs/clientapi.md | |||
@@ -13,16 +13,6 @@ The client API consists of: | |||
13 | * property-level on-demand loading of data | 13 | * property-level on-demand loading of data |
14 | * streaming support for large properties (attachments) | 14 | * streaming support for large properties (attachments) |
15 | 15 | ||
16 | ## Domain Types | ||
17 | A set of standardized domain types is defined. This is necessary to decouple applications from resources (so a calendar can access events from all resources), and to have a "language" for queries. | ||
18 | |||
19 | The definition of the domain model directly affects: | ||
20 | |||
21 | * granularity for data retrieval (email property, or individual subject, date, ...) | ||
22 | * queryable properties for filtering and sorting (sender, id, ...) | ||
23 | |||
24 | The purpose of these domain types is strictly to be the interface and the types are not necessarily meant to be used by applications directly, or to be restricted by any other specifications (such as ical). By nature these types will be part of the evolving interface, and will need to be adjusted for every new property that an application must understand. | ||
25 | |||
26 | ## Store Facade | 16 | ## Store Facade |
27 | The store is always accessed through a store specific facade, which hides: | 17 | The store is always accessed through a store specific facade, which hides: |
28 | 18 | ||
@@ -52,118 +42,12 @@ Each modification is associated with a specific revision, which allows the synch | |||
52 | ### Conflict Resolution | 42 | ### Conflict Resolution |
53 | Conflicts can occur at two points: | 43 | Conflicts can occur at two points: |
54 | 44 | ||
55 | * While i.e. an editor is open and we receive an update for the same entity | 45 | * In the client: While i.e. an editor is open and we receive an update for the same entity |
56 | * After a modification is sent to the synchronizer but before it's processed | 46 | * In the synchronizer: After a modification is sent to the synchronizer but before it's processed |
57 | 47 | ||
58 | In the first case the client is responsible for resolving the conflict; in the latter case it is the synchronizer's responsibility. | 48 | In the first case the client is responsible for resolving the conflict; in the latter case it is the synchronizer's responsibility. |
59 | A small window exists where the client has already started the modification (i.e. the command is in the socket) but a notification that the same entity has changed has not yet arrived. In such a case the synchronizer may reject the modification because the revision the modification refers to is no longer available. | 49 | A small window exists where the client has already started the modification (i.e. the command is in the socket) but a notification that the same entity has changed has not yet arrived. In such a case the synchronizer may reject the modification because the revision the modification refers to is no longer available. |
60 | 50 | ||
61 | This design allows the synchronizer to be in control of the revisions, and keeps it from having to wait for all clients to update until it can drop revisions. | ||
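
The revision handling described above can be sketched as follows. This is a minimal illustration, not the actual Sink API; all class and method names are invented for the example. The synchronizer owns the revision counter and rejects modifications whose base revision it has already dropped:

```python
class Synchronizer:
    """Owns the revision counter; rejects modifications based on dropped revisions."""

    def __init__(self):
        self.current_revision = 0
        self.lowest_kept_revision = 0  # revisions below this have been dropped

    def commit(self, entity, base_revision):
        # If the client's base revision is no longer available, reject the
        # modification; the client must rebase onto the latest state first.
        if base_revision < self.lowest_kept_revision:
            return None  # rejected
        self.current_revision += 1
        return self.current_revision
```

Because the synchronizer never waits for clients before dropping old revisions, a slow client simply gets its modification rejected instead of blocking cleanup.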
62 | |||
63 | ## Query System | ||
64 | The query system should allow for efficient retrieval for just the amount of data required by the client. Efficient querying is supported by the indexes provided by the resources. | ||
65 | |||
66 | The query always retrieves a set of entities matching the query, while not necessarily all properties of the entity need to be populated. | ||
67 | |||
68 | Queries are declarative to keep the specification simple and to allow the implementation to choose the most efficient execution. | ||
69 | |||
70 | Queries can be kept open (live) to receive updates as the store changes. | ||
71 | |||
72 | ### Query | ||
73 | The query consists of: | ||
74 | |||
75 | * a set of filters to match the wanted entities | ||
76 | * the set of properties to retrieve for each entity | ||
77 | |||
78 | Queryable properties are defined by the [[Domain Types]] above. | ||
79 | |||
80 | ### Query Result | ||
81 | The result is returned directly after running the query in the form of a QAbstractItemModel. Each row in the model represents a matching entity. | ||
82 | |||
83 | The model allows accessing the domain object directly, or individual properties via the rows' columns. | ||
84 | |||
85 | The model is always populated asynchronously. It is therefore initially empty and will then populate itself gradually, through the regular update mechanisms (rowsInserted). | ||
86 | |||
87 | Tree Queries allow the application to query for i.e. a folder hierarchy in a single query. This is necessary for performance reasons to avoid recursive querying in large hierarchies. To avoid, on the other hand, loading large hierarchies directly into memory, the model only populates the toplevel rows automatically; all other rows need to be populated by calling `QAbstractItemModel::fetchMore(QModelIndex);`. This way the resource can deal efficiently with the query (i.e. by avoiding the many roundtrips that would be necessary with recursive queries), while keeping the amount of data in memory to a minimum (i.e. if the majority of the folder tree is collapsed in the view anyway). A tree result set can therefore be seen as a set of sets, where every subset corresponds to the children of one parent. | ||
88 | |||
89 | If the query is live, the model updates itself if the update applies to one of the already loaded subsets (otherwise it's currently irrelevant and will load once the subset is loaded). | ||
90 | |||
91 | #### Enhancements | ||
92 | * Asynchronous loading of entities/properties can be achieved by returning an invalid QVariant initially, and emitting dataChanged once the value is loaded. | ||
93 | * To avoid loading a large list when not all data is necessary, a batch size could be defined to guarantee for instance that there is sufficient data to fill the screen, and the fetchMore mechanism can be used to gradually load more data as required when scrolling in the application. | ||
94 | |||
95 | #### Filter | ||
96 | A filter consists of: | ||
97 | |||
98 | * a property to filter on as defined by the [[Domain Types]] | ||
99 | * a comparator to use | ||
100 | * a value | ||
101 | |||
102 | The available comparators are: | ||
103 | |||
104 | * equal | ||
105 | * greater than | ||
106 | * less than | ||
107 | * inclusive range | ||
108 | |||
109 | Value types include: | ||
110 | |||
111 | * Null | ||
112 | * Bool | ||
113 | * Regular Expression | ||
114 | * Substring | ||
115 | * A type-specific literal value (e.g. string, number, date, ..) | ||
116 | |||
117 | Filters can be combined using AND, OR, NOT. | ||
118 | |||
119 | #### Example | ||
120 | ``` | ||
121 | query = { | ||
122 | offset: int | ||
123 | limit: int | ||
124 | filter = { | ||
125 | and { | ||
126 | collection = foo | ||
127 | or { | ||
128 | resource = res1 | ||
129 | resource = res2 | ||
130 | } | ||
131 | } | ||
132 | } | ||
133 | } | ||
134 | ``` | ||
135 | |||
136 | possible API: | ||
137 | |||
138 | ``` | ||
139 | query.filter().and().property("collection") = "foo" | ||
140 | query.filter().and().or().property("resource") = "res1" | ||
141 | query.filter().and().or().property("resource") = "res2" | ||
142 | query.filter().and().property("start-date") = InclusiveRange(QDateTime, QDateTime) | ||
143 | ``` | ||
144 | |||
145 | The problem is that it is difficult to adjust an individual resource property like that. | ||
146 | |||
147 | ### Usecases ### | ||
148 | Mail: | ||
149 | |||
150 | * All mails in folder X within date-range Y that are unread. | ||
151 | * All mails (in all folders) that contain the string X in property Y. | ||
152 | |||
153 | Todos: | ||
154 | |||
155 | * Give me all the todos in that collection where their RELATED-TO field maps to no other todo UID field in the collection | ||
156 | * Give me all the todos in that collection where their RELATED-TO field has a given value | ||
157 | * Give me all the collections which have a given collection as parent and which have a descendant matching a criterion on its attributes | ||
158 | |||
159 | Events: | ||
160 | |||
161 | * All events of calendar X within date-range Y. | ||
162 | |||
163 | Generic: | ||
164 | * entity with identifier X | ||
165 | * all entities of resource X | ||
166 | |||
167 | ### Lazy Loading ### | 51 | ### Lazy Loading ### |
168 | The system provides property-level lazy loading. This allows i.e. deferring the download of attachments until the attachment is accessed, at the expense of requiring access to the source (which could be connected via the internet). | 52 | The system provides property-level lazy loading. This allows i.e. deferring the download of attachments until the attachment is accessed, at the expense of requiring access to the source (which could be connected via the internet). |
169 | 53 | ||
@@ -173,12 +57,3 @@ Note: We should perhaps define a minimum set of properties that *must* be availa | |||
173 | 57 | ||
174 | ### Data streaming ### | 58 | ### Data streaming ### |
175 | Large properties such as attachments should be streamable. An API that allows retrieving a single property of a defined entity in a streamable fashion is probably enough. | 59 | Large properties such as attachments should be streamable. An API that allows retrieving a single property of a defined entity in a streamable fashion is probably enough. |
176 | |||
177 | ### Indexes ### | ||
178 | Since only properties of the domain types can be queried, default implementations for commonly used indexes can be provided. These indexes are populated by generic preprocessors that use the domain-type interface to extract properties from individual entities. | ||
179 | |||
180 | ## Notifications ## | ||
181 | A notification mechanism is required to inform clients about changes. Running queries will automatically update the result-set if a notification is received. | ||
182 | |||
183 | Note: A notification could supply a hint on what changed, allowing clients to ignore revisions with irrelevant changes. | ||
184 | A running query can do all of that transparently behind the scenes. Note that the hints should indeed only hint what has changed, and not supply the actual changeset. These hints should be tailored to what we see as useful, and must therefore be easy to modify. | ||
diff --git a/docs/design.md b/docs/design.md index 499f527..9b64056 100644 --- a/docs/design.md +++ b/docs/design.md | |||
@@ -9,20 +9,37 @@ This allows applications to transparently use various data sources with various | |||
9 | ## Resource | 9 | ## Resource |
10 | A resource is a plugin that provides access to an additional source. It consists of a store, a synchronizer process that executes synchronization & change replay to the source and maintains the store, as well as a facade plugin for the client api. | 10 | A resource is a plugin that provides access to an additional source. It consists of a store, a synchronizer process that executes synchronization & change replay to the source and maintains the store, as well as a facade plugin for the client api. |
11 | 11 | ||
12 | ## Store | 12 | ## Store / Indexes |
13 | Each resource maintains a store that can either store the full dataset for offline access or only metadata for quick lookups. Resources can define how data is stored. | 13 | Each resource maintains a store that can either store the full dataset for offline access or only metadata for quick lookups. Resources can define how data is stored. |
14 | The store consists of revisions, with every revision containing one entity. | ||
15 | |||
16 | The store additionally contains various secondary indexes for efficient lookups. | ||
14 | 17 | ||
15 | ## Types | 18 | ## Types |
16 | ### Domain Type | 19 | ### Domain Type |
17 | The domain types exposed in the public interface. | 20 | The domain types exposed in the public interface provide standardized access to the store. The domain types and their properties directly define the granularity of data retrieval and thus also what queries can be executed. |
18 | 21 | ||
19 | ### Buffer Type | 22 | ### Buffer Type |
20 | The individual buffer types as specified by the resource. These are internal types that don't necessarily have a 1:1 mapping to the domain types, although that is the default case that the default implementations expect. | 23 | The buffers used by the resources in the store may be different from resource to resource, and don't necessarily have a 1:1 mapping to the domain types. |
24 | This allows resources to store data in a way that is convenient/efficient for synchronization, although it may require a bit more effort when accessing the data. | ||
25 | The individual buffer types are specified by the resource and internal to it. Default buffer types exist for all domain types. | ||
26 | |||
27 | ### Commands | ||
28 | Commands are used to modify the store. The resource processes commands that are generated by clients and the synchronizer. | ||
29 | |||
30 | ### Notifications | ||
31 | The resource emits notifications to inform clients of new revisions and other changes. | ||
21 | 32 | ||
22 | ## Mechanisms | 33 | ## Mechanisms |
23 | ### Change Replay | 34 | ### Change Replay |
24 | The change replay is based on the revisions in the store. Clients (as well as also the write-back mechanism that replays changes to the source), are informed that a new revision is available. Each client can then go through all new revisions (starting from the last seen revision), and thus update its state to the latest revision. | 35 | The change replay is based on the revisions in the store. Clients (as well as also the write-back mechanism that replays changes to the source), are informed that a new revision is available. Each client can then go through all new revisions (starting from the last seen revision), and thus update its state to the latest revision. |
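
The change-replay mechanism above can be sketched in a few lines (a plain-Python illustration of the concept; class names are invented, not the actual Sink API). Each client keeps a cursor of the last revision it has seen and catches up from there:

```python
class Store:
    """Append-only revision log; revision numbers start at 1."""

    def __init__(self):
        self.revisions = []

    def append(self, change):
        self.revisions.append(change)
        return len(self.revisions)  # the new latest revision

class Client:
    """Tracks the last seen revision and replays everything newer."""

    def __init__(self):
        self.last_seen = 0

    def replay(self, store):
        # Process all revisions after the last one seen, then advance the cursor.
        new = store.revisions[self.last_seen:]
        self.last_seen = len(store.revisions)
        return new
```

The write-back mechanism that replays changes to the source works the same way: it is just another consumer with its own cursor.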
25 | 36 | ||
26 | ### Preprocessor pipeline | 37 | ### Synchronization |
27 | Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering. The pipeline guarantees that the preprocessor steps are executed before the entity is persisted. | 38 | The synchronizer executes a periodic synchronization that results in change commands to synchronize the store with the source. |
39 | The change-replay mechanism is used to write back changes to the source that happened locally. | ||
40 | |||
41 | ### Command processing | ||
42 | The resources have an internal persistent command queue that is populated by the synchronizer and clients, and that is continuously processed. | ||
43 | |||
44 | Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering, and through which every command goes before it enters the store. The pipeline guarantees that the preprocessor steps are executed on any command before the entity is persisted. | ||
28 | 45 | ||
diff --git a/docs/queries.md b/docs/queries.md new file mode 100644 index 0000000..8676392 --- /dev/null +++ b/docs/queries.md | |||
@@ -0,0 +1,104 @@ | |||
1 | ## Query System | ||
2 | The query system should allow for efficient retrieval for just the amount of data required by the client. Efficient querying is supported by the indexes provided by the resources. | ||
3 | |||
4 | The query always retrieves a set of entities matching the query, while not necessarily all properties of the entity need to be populated. | ||
5 | |||
6 | Queries are declarative to keep the specification simple and to allow the implementation to choose the most efficient execution. | ||
7 | |||
8 | Queries can be kept open (live) to receive updates as the store changes. | ||
9 | |||
10 | ### Query | ||
11 | The query consists of: | ||
12 | |||
13 | * a set of filters to match the wanted entities | ||
14 | * the set of properties to retrieve for each entity | ||
15 | |||
16 | Queryable properties are defined by the [[Domain Types]] above. | ||
17 | |||
18 | ### Query Result | ||
19 | The result is returned directly after running the query in the form of a QAbstractItemModel. Each row in the model represents a matching entity. | ||
20 | |||
21 | The model allows accessing the domain object directly, or individual properties via the rows' columns. | ||
22 | |||
23 | The model is always populated asynchronously. It is therefore initially empty and will then populate itself gradually, through the regular update mechanisms (rowsInserted). | ||
24 | |||
25 | Tree Queries allow the application to query for i.e. a folder hierarchy in a single query. This is necessary for performance reasons to avoid recursive querying in large hierarchies. To avoid, on the other hand, loading large hierarchies directly into memory, the model only populates the toplevel rows automatically; all other rows need to be populated by calling `QAbstractItemModel::fetchMore(QModelIndex);`. This way the resource can deal efficiently with the query (i.e. by avoiding the many roundtrips that would be necessary with recursive queries), while keeping the amount of data in memory to a minimum (i.e. if the majority of the folder tree is collapsed in the view anyway). A tree result set can therefore be seen as a set of sets, where every subset corresponds to the children of one parent. | ||
26 | |||
27 | If the query is live, the model updates itself if the update applies to one of the already loaded subsets (otherwise it's currently irrelevant and will load once the subset is loaded). | ||
28 | |||
29 | #### Enhancements | ||
30 | * Asynchronous loading of entities/properties can be achieved by returning an invalid QVariant initially, and emitting dataChanged once the value is loaded. | ||
31 | * To avoid loading a large list when not all data is necessary, a batch size could be defined to guarantee for instance that there is sufficient data to fill the screen, and the fetchMore mechanism can be used to gradually load more data as required when scrolling in the application. | ||
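
The batched-fetching enhancement could work roughly as sketched below. This is a plain-Python sketch of the idea only; in practice it would live in a QAbstractItemModel subclass using `canFetchMore`/`fetchMore`, and the class and attribute names here are invented:

```python
class BatchedResult:
    """Loads matching rows in batches instead of all at once."""

    def __init__(self, matches, batch_size=50):
        self._matches = matches        # full (lazy) result set in the resource
        self._batch_size = batch_size  # e.g. enough rows to fill the screen
        self.rows = []                 # rows currently exposed to the view

    def can_fetch_more(self):
        return len(self.rows) < len(self._matches)

    def fetch_more(self):
        # Load the next batch; the view calls this as the user scrolls.
        start = len(self.rows)
        self.rows.extend(self._matches[start:start + self._batch_size])
```

This keeps the initial query cheap while still letting the view pull in arbitrarily many rows on demand.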
32 | |||
33 | #### Filter | ||
34 | A filter consists of: | ||
35 | |||
36 | * a property to filter on as defined by the [[Domain Types]] | ||
37 | * a comparator to use | ||
38 | * a value | ||
39 | |||
40 | The available comparators are: | ||
41 | |||
42 | * equal | ||
43 | * greater than | ||
44 | * less than | ||
45 | * inclusive range | ||
46 | |||
47 | Value types include: | ||
48 | |||
49 | * Null | ||
50 | * Bool | ||
51 | * Regular Expression | ||
52 | * Substring | ||
53 | * A type-specific literal value (e.g. string, number, date, ..) | ||
54 | |||
55 | Filters can be combined using AND, OR, NOT. | ||
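
A filter tree built from these comparators and combinators can be sketched as simple predicates (an illustrative sketch only; entities are modelled as dictionaries and the function names are invented, not the Sink API):

```python
def eq(prop, value):
    """Comparator: property equals value."""
    return lambda entity: entity.get(prop) == value

def in_range(prop, lo, hi):
    """Comparator: property within an inclusive range."""
    return lambda entity: lo <= entity.get(prop) <= hi

def and_(*filters):
    return lambda entity: all(f(entity) for f in filters)

def or_(*filters):
    return lambda entity: any(f(entity) for f in filters)

def not_(f):
    return lambda entity: not f(entity)

def run_query(entities, flt):
    """Return the set of entities matching the declarative filter tree."""
    return [e for e in entities if flt(e)]
```

Because the filter is a plain data-like tree of predicates, the implementation remains free to rewrite it against whatever indexes are available instead of scanning all entities.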
56 | |||
57 | #### Example | ||
58 | ``` | ||
59 | query = { | ||
60 | offset: int | ||
61 | limit: int | ||
62 | filter = { | ||
63 | and { | ||
64 | collection = foo | ||
65 | or { | ||
66 | resource = res1 | ||
67 | resource = res2 | ||
68 | } | ||
69 | } | ||
70 | } | ||
71 | } | ||
72 | ``` | ||
73 | |||
74 | possible API: | ||
75 | |||
76 | ``` | ||
77 | query.filter().and().property("collection") = "foo" | ||
78 | query.filter().and().or().property("resource") = "res1" | ||
79 | query.filter().and().or().property("resource") = "res2" | ||
80 | query.filter().and().property("start-date") = InclusiveRange(QDateTime, QDateTime) | ||
81 | ``` | ||
82 | |||
83 | The problem is that it is difficult to adjust an individual resource property like that. | ||
84 | |||
85 | ### Usecases ### | ||
86 | Mail: | ||
87 | |||
88 | * All mails in folder X within date-range Y that are unread. | ||
89 | * All mails (in all folders) that contain the string X in property Y. | ||
90 | |||
91 | Todos: | ||
92 | |||
93 | * Give me all the todos in that collection where their RELATED-TO field maps to no other todo UID field in the collection | ||
94 | * Give me all the todos in that collection where their RELATED-TO field has a given value | ||
95 | * Give me all the collections which have a given collection as parent and which have a descendant matching a criterion on its attributes | ||
96 | |||
97 | Events: | ||
98 | |||
99 | * All events of calendar X within date-range Y. | ||
100 | |||
101 | Generic: | ||
102 | * entity with identifier X | ||
103 | * all entities of resource X | ||
104 | |||
diff --git a/docs/resource.md b/docs/resource.md index defbf9a..8c87522 100644 --- a/docs/resource.md +++ b/docs/resource.md | |||
@@ -4,7 +4,7 @@ The resource consists of: | |||
4 | * a plugin providing the client-api facade | 4 | * a plugin providing the client-api facade |
5 | * a configuration setting of the filters | 5 | * a configuration setting of the filters |
6 | 6 | ||
7 | # Synchronizer | 7 | ## Synchronizer |
8 | The synchronizer process is responsible for processing all commands, executing synchronizations with the source, and replaying changes to the source. | 8 | The synchronizer process is responsible for processing all commands, executing synchronizations with the source, and replaying changes to the source. |
9 | 9 | ||
10 | Processing of commands happens in the pipeline which executes all preprocessors before the entity is persisted. | 10 | Processing of commands happens in the pipeline which executes all preprocessors before the entity is persisted. |
@@ -16,7 +16,15 @@ The synchronizer process has the following primary components: | |||
16 | * Listener: Opens a socket and listens for incoming connections. On connection all incoming commands are read and entered into command queues. Control commands (i.e. a sync) don't require persistency and are therefore processed directly. | 16 | * Listener: Opens a socket and listens for incoming connections. On connection all incoming commands are read and entered into command queues. Control commands (i.e. a sync) don't require persistency and are therefore processed directly. |
17 | * Synchronization: Handles synchronization to the source, as well as change-replay to the source. The modification commands generated by the synchronization enter the command queue as well. | 17 | * Synchronization: Handles synchronization to the source, as well as change-replay to the source. The modification commands generated by the synchronization enter the command queue as well. |
18 | 18 | ||
19 | # Preprocessors | 19 | A resource can: |
20 | |||
21 | * provide a full mirror of the source. | ||
22 | * provide metadata for efficient access to the source. | ||
23 | |||
24 | In the former case the local mirror is fully functional locally and changes can be replayed to the source once a connection is established again. | ||
25 | In the latter case the resource is only functional if a connection to the source is available (which is i.e. not a problem if the source is a local maildir on disk). | ||
26 | |||
27 | ## Preprocessors | ||
20 | Preprocessors are small processors that are guaranteed to be processed before a new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity. | 28 | Preprocessors are small processors that are guaranteed to be processed before a new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity. |
21 | 29 | ||
22 | Usecases: | 30 | Usecases: |
@@ -33,16 +41,29 @@ The following kinds of preprocessors exist: | |||
33 | 41 | ||
34 | Preprocessors are typically read-only, i.e. to not break signatures of emails. Extra flags that are accessible through the sink domain model can therefore be stored in the local buffer of each resource. | 42 | Preprocessors are typically read-only, i.e. to not break signatures of emails. Extra flags that are accessible through the sink domain model can therefore be stored in the local buffer of each resource. |
35 | 43 | ||
36 | ## Requirements | 44 | ### Requirements |
37 | * A preprocessor must work with batch processing. Because batch-processing is vital for efficient writing to the database, all preprocessors have to be included in the batch processing. | 45 | * A preprocessor must work with batch processing. Because batch-processing is vital for efficient writing to the database, all preprocessors have to be included in the batch processing. |
38 | * Preprocessors need to be fast, since they directly affect how fast a message is processed by the system. | 46 | * Preprocessors need to be fast, since they directly affect how fast a message is processed by the system. |
39 | 47 | ||
40 | ## Design | 48 | ### Design |
41 | Commands are processed in batches. Each preprocessor thus has the following workflow: | 49 | Commands are processed in batches. Each preprocessor thus has the following workflow: |
42 | * startBatch is called: The preprocessor can do necessary preparation steps to prepare for the batch (like starting a transaction on an external database) | 50 | * startBatch is called: The preprocessor can do necessary preparation steps to prepare for the batch (like starting a transaction on an external database) |
43 | * add/modify/remove is called for every command in the batch: The preprocessor executes the desired actions. | 51 | * add/modify/remove is called for every command in the batch: The preprocessor executes the desired actions. |
44 | * endBatch is called: If the preprocessor wrote to an external database it can now commit the transaction. | 52 | * endBatch is called: If the preprocessor wrote to an external database it can now commit the transaction. |
45 | 53 | ||
54 | ### Generic Preprocessors | ||
55 | Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail). | ||
56 | It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in. | ||
57 | |||
58 | The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented. | ||
59 | It is that way trivial to i.e. implement a preprocessor that populates a hierarchy index of collections. | ||
60 | |||
61 | ### Preprocessors generating additional entities | ||
62 | A preprocessor, such as an email threading preprocessor, might generate additional entities (a thread entity is a regular entity, just like the mail that spawned the thread). | ||
63 | |||
64 | In such a case the preprocessor must invoke the complete pipeline for the new entity. | ||
65 | |||
66 | |||
46 | ## Indexes | 67 | ## Indexes |
47 | Most indexes are implemented as preprocessors to guarantee that they are always updated together with the data. | 68 | Most indexes are implemented as preprocessors to guarantee that they are always updated together with the data. |
48 | 69 | ||
@@ -65,6 +86,9 @@ Index types: | |||
65 | * sort indexes (i.e. sorted by date) | 86 | * sort indexes (i.e. sorted by date) |
66 | * Could also be a lookup in the range index (increase date range until sufficient matches are available) | 87 | * Could also be a lookup in the range index (increase date range until sufficient matches are available) |
67 | 88 | ||
89 | ### Default implementations | ||
90 | Since only properties of the domain types can be queried, default implementations for commonly used indexes can be provided. These indexes are populated by generic preprocessors that use the domain-type interface to extract properties from individual entities. | ||
91 | |||
68 | ### Example index implementations | 92 | ### Example index implementations |
69 | * uid lookup | 93 | * uid lookup |
70 | * add: | 94 | * add: |
@@ -106,25 +130,14 @@ Building the index on-demand is a matter of replaying the relevant dataset and u | |||
106 | 130 | ||
107 | The indexes status information can be recorded using the latest revision the index has been updated with. | 131 | The indexes status information can be recorded using the latest revision the index has been updated with. |
108 | 132 | ||
109 | ## Generic Preprocessors | ||
110 | Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail). | ||
111 | It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in. | ||
112 | |||
113 | The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented. | ||
114 | It is that way trivial to i.e. implement a preprocessor that populates a hierarchy index of collections. | ||
115 | |||
116 | ## Preprocessors generating additional entities | ||
117 | A preprocessor, such as an email threading preprocessor, might generate additional entities (a thread entity is a regular entity, just like the mail that spawned the thread). | ||
118 | |||
119 | In such a case the preprocessor must invoke the complete pipeline for the new entity. | ||
120 | |||
121 | # Pipeline | 133 | # Pipeline |
122 | A pipeline is an assembly of a set of preprocessors with a defined order. A modification is always persisted at the end of the pipeline once all preprocessors have been processed. | 134 | A pipeline is an assembly of a set of preprocessors with a defined order. A modification is always persisted at the end of the pipeline once all preprocessors have been processed. |
123 | 135 | ||
124 | # Synchronization / Change Replay | 136 | # Synchronization |
125 | * The synchronization can either: | 137 | The synchronization can either: |
126 | * Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store. | 138 | |
127 | * If the source supports incremental changes the changeset can directly be generated from that information. | 139 | * Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store. |
140 | * If the source supports incremental changes the changeset can directly be generated from that information. | ||
128 | 141 | ||
129 | The changeset is then simply inserted in the regular modification queue and processed like all other modifications. The synchronizer has to ensure only changes are replayed to the source that didn't come from it already. This is done by marking changes that don't require changereplay to the source. | 142 | The changeset is then simply inserted in the regular modification queue and processed like all other modifications. The synchronizer has to ensure only changes are replayed to the source that didn't come from it already. This is done by marking changes that don't require changereplay to the source. |
130 | 143 | ||
@@ -142,8 +155,12 @@ The remoteid mapping has to be updated in two places: | |||
142 | * New entities that are synchronized immediately get a localid assigned, which is then recorded together with the remoteid. This is required to be able to reference other entities directly in the command queue (i.e. for parent folders). | 155 | * New entities that are synchronized immediately get a localid assigned, which is then recorded together with the remoteid. This is required to be able to reference other entities directly in the command queue (i.e. for parent folders). |
143 | * Entities created by clients get a remoteid assigned during change replay, so the entity can be recognized during the next sync. | 156 | * Entities created by clients get a remoteid assigned during change replay, so the entity can be recognized during the next sync. |
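
The two update paths above both feed the same bidirectional mapping, which could be sketched like this (an illustrative sketch; the class and method names are not part of the actual synchronization store):

```python
class RemoteIdMap:
    """Bidirectional localid <-> remoteid mapping kept by the synchronizer."""

    def __init__(self):
        self._local_to_remote = {}
        self._remote_to_local = {}

    def record(self, localid, remoteid):
        # Called both during sync (localid assigned for a new remote entity)
        # and during change replay (remoteid assigned for a client-created one).
        self._local_to_remote[localid] = remoteid
        self._remote_to_local[remoteid] = localid

    def localid_for(self, remoteid):
        return self._remote_to_local.get(remoteid)

    def remoteid_for(self, localid):
        return self._local_to_remote.get(localid)
```

During the next sync, `localid_for` is what lets the synchronizer recognize an entity it has seen before instead of creating a duplicate.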
144 | 157 | ||
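Both update paths can be illustrated with a minimal sketch of the mapping; the names are hypothetical and do not reflect the actual synchronization-store schema:

```python
# Sketch: bidirectional remoteid <-> localid mapping kept by the synchronizer.
class RemoteIdMap:
    def __init__(self):
        self.remote_to_local = {}
        self.local_to_remote = {}

    def record(self, remoteid, localid):
        self.remote_to_local[remoteid] = localid
        self.local_to_remote[localid] = remoteid

m = RemoteIdMap()
# Case 1: entity arrives during sync; a localid is assigned immediately so
# queued commands can reference it (e.g. as a parent folder).
m.record(remoteid="imap-uid-17", localid="local-1")
# Case 2: a client-created entity gets its remoteid during change replay,
# so the next sync recognizes it instead of duplicating it.
m.record(remoteid="imap-uid-18", localid="local-2")
assert m.remote_to_local["imap-uid-17"] == "local-1"
assert m.local_to_remote["local-2"] == "imap-uid-18"
```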
158 | ## Change Replay | ||
159 | To replay local changes to the source, the synchronizer replays all revisions of the store and maintains the current replay state in the synchronization store. | ||
160 | Changes that already come from the source via the synchronizer are not replayed to the source again. | ||
161 | |||
145 | # Testing / Inspection | 162 | # Testing / Inspection |
146 | Resources new to be tested, which often requires inspections into the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process without running tests in a vastly different setup from how it will be run in production. | 163 | Resources have to be tested, which often requires inspecting the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process, unless tests are run in a setup vastly different from production.
147 | 164 | ||
148 | To alleviate this, inspection commands are introduced. Inspection commands are special commands that the resource processes just like all other commands, and whose sole purpose is to inspect the current resource state. Because the command is processed with the same mechanism as other commands, we can rely on the ordering of commands: a prior command is guaranteed to have been executed once the inspection command is processed. | 165 | To alleviate this, inspection commands are introduced. Inspection commands are special commands that the resource processes just like all other commands, and whose sole purpose is to inspect the current resource state. Because the command is processed with the same mechanism as other commands, we can rely on the ordering of commands: a prior command is guaranteed to have been executed once the inspection command is processed.
149 | 166 | ||
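The ordering guarantee can be illustrated with a toy command queue (names and command shapes are illustrative, not the resource's actual protocol):

```python
# Sketch: an inspection command travels through the same strictly ordered
# queue as regular commands, so it observes the state *after* every
# previously queued command has been processed.
def process_queue(queue, store):
    results = []
    for cmd in queue:
        if cmd[0] == "create":
            store.add(cmd[1])
        elif cmd[0] == "inspect":
            results.append(cmd[1] in store)
    return results

store = set()
outcome = process_queue([("create", "mail-1"), ("inspect", "mail-1")], store)
assert outcome == [True]
```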
diff --git a/docs/storage.md b/docs/storage.md index 4852131..afd55d8 100644 --- a/docs/storage.md +++ b/docs/storage.md | |||
@@ -1,17 +1,3 @@ | |||
1 | ## Store access | ||
2 | Access to the entities happens through a well defined interface that defines a property-map for each supported domain type. A property map could look like: | ||
3 | ``` | ||
4 | Event { | ||
5 | startDate: QDateTime | ||
6 | subject: QString | ||
7 | ... | ||
8 | } | ||
9 | ``` | ||
10 | |||
11 | This property map can be freely extended with new properties for various features. It shouldn't adhere to any external specification and exists solely to define how to access the data. | ||
12 | |||
13 | Clients will map these properties to the values of their domain object implementations, and resources will map the properties to the values in their buffers. | ||
14 | |||
15 | ## Storage Model | 1 | ## Storage Model |
16 | The storage model is simple: | 2 | The storage model is simple: |
17 | ``` | 3 | ``` |
@@ -42,8 +28,7 @@ Each entity can be as normalized/denormalized as useful. It is not necessary to | |||
42 | 28 | ||
43 | Denormalized: | 29 | Denormalized: |
44 | 30 | ||
45 | * priority is that mime message stays intact (signatures/encryption) | 31 | * priority is that the mime message stays intact (signatures/encryption) |
46 | * could we still provide a streaming api for attachments? | ||
47 | 32 | ||
48 | ``` | 33 | ``` |
49 | Mail { | 34 | Mail { |
@@ -55,7 +40,7 @@ Mail { | |||
55 | Normalized: | 40 | Normalized: |
56 | 41 | ||
57 | * priority is that we can access individual members efficiently. | 42 | * priority is that we can access individual members efficiently. |
58 | * we don't care about exact reproducability of e.g. ical file | 43 | * we don't care about exact reproducibility of e.g. an ical file
59 | ``` | 44 | ``` |
60 | Event { | 45 | Event { |
61 | id | 46 | id |
@@ -101,7 +86,7 @@ The resource can be effectively removed from disk (besides configuration), | |||
101 | by deleting the directories matching `$RESOURCE_IDENTIFIER*` and everything they contain. | 86 | by deleting the directories matching `$RESOURCE_IDENTIFIER*` and everything they contain. |
102 | 87 | ||
103 | #### Design Considerations | 88 | #### Design Considerations |
104 | * The stores are split by buffertype, so a full scan (which is done by type), doesn't require filtering by type first. The downside is that an additional lookup is required to get from revision to the data. | 89 | The stores are split by buffertype, so a full scan (which is done by type) doesn't require filtering by type first. The downside is that an additional lookup is required to get from revision to the data.
105 | 90 | ||
106 | ### Revisions | 91 | ### Revisions |
107 | Every operation (create/delete/modify), leads to a new revision. The revision is an ever increasing number for the complete store. | 92 | Every operation (create/delete/modify), leads to a new revision. The revision is an ever increasing number for the complete store. |
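As a sketch, the revision counter behaves as follows (illustrative only; the real store keeps this in the database, and the log structure is an assumption):

```python
# Sketch: one ever-increasing revision number for the complete store;
# every create/modify/delete bumps it and records what changed.
class Store:
    def __init__(self):
        self.revision = 0
        self.log = {}  # revision -> (operation, buffertype, uid)

    def apply(self, operation, buffertype, uid):
        self.revision += 1
        self.log[self.revision] = (operation, buffertype, uid)
        return self.revision

s = Store()
s.apply("create", "mail", "m1")
s.apply("modify", "mail", "m1")
assert s.revision == 2
assert s.log[1] == ("create", "mail", "m1")
```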
@@ -167,6 +152,8 @@ Using regular files as the interface has the advantages: | |||
167 | The copy is necessary to guarantee that the file remains for the client/resource even if the resource removes the file on its side as part of a sync. | 152 | The copy is necessary to guarantee that the file remains for the client/resource even if the resource removes the file on its side as part of a sync.
168 | The copy could be optimized by using hardlinks, though that is not a portable solution. On some copy-on-write filesystems, copying is a very cheap operation. | 153 | The copy could be optimized by using hardlinks, though that is not a portable solution. On some copy-on-write filesystems, copying is a very cheap operation.
169 | 154 | ||
155 | A downside of a file-based design is that it's not possible to stream directly from a remote resource into application memory; the data always has to go via a file. | ||
156 | |||
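The hardlink optimization with a portable fallback might be sketched as follows (assuming plain files; not the actual implementation):

```python
# Sketch: hand a file over by trying a hardlink first (no data copied, but
# not portable), falling back to a plain copy. On copy-on-write filesystems
# even the fallback copy is cheap.
import os
import shutil

def hand_over(src, dst):
    try:
        os.link(src, dst)       # hardlink: same inode, no data copied
    except OSError:
        shutil.copy2(src, dst)  # portable fallback, preserves metadata
```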
170 | ## Database choice | 157 | ## Database choice |
171 | By design we're interested in key-value stores or perhaps document databases. This is because a fixed schema is not useful for this design, which makes | 158 | By design we're interested in key-value stores or perhaps document databases. This is because a fixed schema is not useful for this design, which makes |
172 | SQL not very useful (it would just be a very slow key-value store). While document databases would allow for indexes on certain properties (which is something we need), we have not yet found any contenders that look suitable for this system. | 159 | SQL not very useful (it would just be a very slow key-value store). While document databases would allow for indexes on certain properties (which is something we need), we have not yet found any contenders that look suitable for this system.
diff --git a/docs/terminology.md b/docs/terminology.md index 1826bec..5238c79 100644 --- a/docs/terminology.md +++ b/docs/terminology.md | |||
@@ -13,7 +13,7 @@ It is recommended to familiarize yourself with the terms before going further in | |||
13 | * resource: A plugin which provides client command processing, a store facade and synchronization for a given type of store. The resource also manages the configuration for a given source including server settings, local paths, etc. | 13 | * resource: A plugin which provides client command processing, a store facade and synchronization for a given type of store. The resource also manages the configuration for a given source including server settings, local paths, etc. |
14 | * store facade: An object provided by resources which provides transformations between domain objects and the store. | 14 | * store facade: An object provided by resources which provides transformations between domain objects and the store. |
15 | * synchronizer: The operating system process responsible for overseeing the process of modifying and synchronizing a store. To accomplish this, a synchronizer loads the correct resource plugin, manages pipelines and handles client communication. One synchronizer is created for each source that is accessed by clients; these processes are shared by all clients. | 15 | * synchronizer: The operating system process responsible for overseeing the process of modifying and synchronizing a store. To accomplish this, a synchronizer loads the correct resource plugin, manages pipelines and handles client communication. One synchronizer is created for each source that is accessed by clients; these processes are shared by all clients. |
16 | * Preprocessor: A component that takes an entity and performs some modification of it (e.g. changes the folder an email is in) or processes it in some way (e.g. indexes it) | 16 | * preprocessor: A component that takes an entity and performs some modification of it (e.g. changes the folder an email is in) or processes it in some way (e.g. indexes it) |
17 | * pipeline: A run-time definable set of filters which are applied to an entity after a resource has performed a specific kind of function on it (create, modify, delete) | 17 | * pipeline: A run-time definable set of filters which are applied to an entity after a resource has performed a specific kind of function on it (create, modify, delete) |
18 | * query: A declarative method for requesting entities from one or more sources that match a given set of constraints | 18 | * query: A declarative method for requesting entities from one or more sources that match a given set of constraints |
19 | * command: Clients request modifications, additions and deletions to the store by sending commands to a synchronizer for processing | 19 | * command: Clients request modifications, additions and deletions to the store by sending commands to a synchronizer for processing |