Diffstat (limited to 'docs/resource.md')
-rw-r--r-- | docs/resource.md | 91 |
1 files changed, 81 insertions, 10 deletions
diff --git a/docs/resource.md b/docs/resource.md index bba4dbb..0f3d163 100644 --- a/docs/resource.md +++ b/docs/resource.md | |||
@@ -2,7 +2,7 @@ The resource consists of: | |||
2 | 2 | ||
3 | * the synchronizer process | 3 | * the synchronizer process |
4 | * a plugin providing the client-api facade | 4 | * a plugin providing the client-api facade |
5 | * a configuration setting up the filters | 5 | * a configuration setting of the filters |
6 | 6 | ||
7 | # Synchronizer | 7 | # Synchronizer |
8 | * The synchronization can either: | 8 | * The synchronization can either: |
@@ -13,25 +13,96 @@ The changeset is then simply inserted in the regular modification queue and proc | |||
13 | The synchronizer already knows that it doesn't have to replay this changeset to the source, since replay no longer goes via the store. | 13 | The synchronizer already knows that it doesn't have to replay this changeset to the source, since replay no longer goes via the store. |
14 | 14 | ||
15 | # Preprocessors | 15 | # Preprocessors |
16 | Preprocessors are small processors that are guaranteed to be processed before a new/modified/deleted entity reaches storage. The can therefore be used for various tasks that need to be executed on every entity. | 16 | Preprocessors are small processors that are guaranteed to be processed before a new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity. |
17 | 17 | ||
18 | Use cases: | 18 | Use cases: |
19 | 19 | ||
20 | * Updating various indexes | 20 | * Update indexes |
21 | * detecting spam/scam mail and setting appropriate flags | 21 | * Detect spam/scam mail and set appropriate flags |
22 | * email filtering | 22 | * Email filtering to different folders or resources |
23 | |||
24 | Preprocessors need to be fast, since they directly affect how fast a message is processed by the system. | ||
25 | 23 | ||
26 | The following kinds of preprocessors exist: | 24 | The following kinds of preprocessors exist: |
27 | 25 | ||
28 | * filtering preprocessors that can potentially move an entity to another resource | 26 | * filtering preprocessors that can potentially move an entity to another resource |
29 | * passive filter, that extract data that is stored externally (i.e. indexers) | 27 | * passive preprocessors that extract data to be stored externally (e.g. indexers) |
30 | * flag extractors, that produce data stored with the entity (spam detection) | 28 | * flag extractors, that produce data stored with the entity (spam detection) |
31 | 29 | ||
32 | Filter typically be read-only, to i.e. not break signatures of emails. Extra flags that are accessible through the akonadi domain model, can therefore be stored in the local buffer of each resource. | 30 | Preprocessors are typically read-only so that they do not, for example, break the signatures of emails. Extra flags that are accessible through the akonadi domain model can therefore be stored in the local buffer of each resource. |
31 | |||
32 | ## Requirements | ||
33 | * A preprocessor must work with batch processing. Because batch-processing is vital for efficient writing to the database, all preprocessors have to be included in the batch processing. | ||
34 | * Preprocessors need to be fast, since they directly affect how fast a message is processed by the system. | ||
35 | |||
36 | ## Design | ||
37 | Commands are processed in batches. Each preprocessor thus has the following workflow: | ||
38 | | | * startBatch is called: The preprocessor can take any steps needed to prepare for the batch (like starting a transaction on an external database) |
39 | * add/modify/remove is called for every command in the batch: The preprocessor executes the desired actions. | ||
40 | * endBatch is called: If the preprocessor wrote to an external database it can now commit the transaction. | ||
41 | |||
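The batch workflow described above could be sketched roughly as follows. This is only an illustration: the `Command`, `Operation`, `Preprocessor`, `LoggingPreprocessor` and `processBatch` names are hypothetical and are not taken from the actual code base; only `startBatch`, `add`, `modify`, `remove` and `endBatch` are named in the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical command representation; the real command format is not shown here.
enum class Operation { Add, Modify, Remove };

struct Command {
    Operation operation;
    std::string entityId;
};

// Hypothetical interface mirroring the startBatch / add / modify / remove / endBatch workflow.
class Preprocessor {
public:
    virtual ~Preprocessor() = default;
    virtual void startBatch() {}                      // e.g. begin a transaction on an external database
    virtual void add(const Command &command) = 0;
    virtual void modify(const Command &command) = 0;
    virtual void remove(const Command &command) = 0;
    virtual void endBatch() {}                        // e.g. commit that transaction
};

// Example preprocessor that only records which calls it received.
class LoggingPreprocessor : public Preprocessor {
public:
    std::vector<std::string> log;

    void startBatch() override { log.push_back("startBatch"); }
    void add(const Command &c) override { log.push_back("add " + c.entityId); }
    void modify(const Command &c) override { log.push_back("modify " + c.entityId); }
    void remove(const Command &c) override { log.push_back("remove " + c.entityId); }
    void endBatch() override { log.push_back("endBatch"); }
};

// Runs one batch of commands through every preprocessor.
void processBatch(const std::vector<Command> &batch,
                  const std::vector<Preprocessor *> &preprocessors)
{
    for (Preprocessor *p : preprocessors) {
        p->startBatch();
    }
    for (const Command &command : batch) {
        for (Preprocessor *p : preprocessors) {
            switch (command.operation) {
            case Operation::Add:    p->add(command);    break;
            case Operation::Modify: p->modify(command); break;
            case Operation::Remove: p->remove(command); break;
            }
        }
    }
    for (Preprocessor *p : preprocessors) {
        p->endBatch();
    }
}
```

Keeping `startBatch` and `endBatch` as separate hooks lets a preprocessor that writes to an external database pay the transaction cost once per batch rather than once per command.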
42 | ## Indexes | ||
43 | Most indexes are implemented as preprocessors to guarantee that they are always updated together with the data. | ||
44 | |||
45 | Index types: | ||
46 | |||
47 | | | * fixed value indexes (e.g. uid) |
48 | | | * Input: key-value pair where key is the indexed property and the value is the uid of the entity |
49 | | | * Lookup: by key, value is always zero or more uids |
50 | | | * fixed value where we want to do smaller/greater-than comparisons (like start date) |
51 | | | * Input: |
52 | | | * Lookup: by key with comparator (greater, equal range) |
53 | | | * Result: zero or more uids |
54 | * range indexes (like the date range an event affects) | ||
55 | * Input: start and end of range and uid of entity | ||
56 | * Lookup: by key with comparator. The value denotes start or end of range. | ||
57 | | | * Result: zero or more uids |
58 | * group indexes (like tree hierarchies as nested sets) | ||
59 | * could be the same as fixed value indexes, which would then just require a recursive query. | ||
60 | * Input: | ||
61 | | | * sort indexes (e.g. sorted by date) |
62 | * Could also be a lookup in the range index (increase date range until sufficient matches are available) | ||
63 | |||
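As a rough illustration of the first two index types listed above, the sketch below keeps key/uid pairs in an ordered multimap, so that both exact lookups and comparator lookups return zero or more uids. The `OrderedIndex` class and its method names are assumptions made for this example, and an in-memory `std::multimap` merely stands in for whatever key-value store is actually used. Range, group and sort indexes are not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical ordered index: the key is the indexed property (for example a
// start date as timestamp), the value is the uid of the entity.
class OrderedIndex {
public:
    void add(std::int64_t key, const std::string &uid) { mEntries.emplace(key, uid); }

    // Fixed value lookup: all uids stored under exactly this key.
    std::vector<std::string> lookupEqual(std::int64_t key) const
    {
        const auto range = mEntries.equal_range(key);
        return collect(range.first, range.second);
    }

    // Comparator lookup: all uids whose key is greater than or equal to `key`.
    std::vector<std::string> lookupGreaterOrEqual(std::int64_t key) const
    {
        return collect(mEntries.lower_bound(key), mEntries.end());
    }

    // Comparator lookup over the inclusive key range [from, to].
    std::vector<std::string> lookupRange(std::int64_t from, std::int64_t to) const
    {
        if (from > to) {
            return {};
        }
        return collect(mEntries.lower_bound(from), mEntries.upper_bound(to));
    }

private:
    using Entries = std::multimap<std::int64_t, std::string>;

    // Copies the uids of all entries in [first, last) into a vector.
    static std::vector<std::string> collect(Entries::const_iterator first,
                                            Entries::const_iterator last)
    {
        std::vector<std::string> uids;
        for (; first != last; ++first) {
            uids.push_back(first->second);
        }
        return uids;
    }

    Entries mEntries;
};
```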
64 | ### Example index implementations | ||
65 | * uid lookup | ||
66 | * add: | ||
67 | * add uid + entity id to index | ||
68 | * update: | ||
69 | * remove old uid + entity id from index | ||
70 | * add uid + entity id to index | ||
71 | * remove: | ||
72 | * remove uid + entity id from index | ||
73 | * lookup: | ||
74 | * query for entity-id by uid | ||
75 | |||
76 | * mail folder hierarchy | ||
77 | * parent folder uid is a property of the folder | ||
78 | * store parent-folder-uid + entity id | ||
79 | * lookup: | ||
80 | * query for entity-id by uid | ||
81 | |||
82 | * mails of mail folder | ||
83 | * parent folder uid is a property of the email | ||
84 | * store parent-folder-uid + entity id | ||
85 | * lookup: | ||
86 | * query for entity-id by uid | ||
87 | |||
88 | * email threads | ||
89 | * Thread objects should be created as dedicated entities | ||
90 | * the thread uid | ||
91 | |||
92 | * email date sort index | ||
93 | * the date of each email is indexed as timestamp | ||
94 | |||
95 | * event date range index | ||
96 | * the start and end date of each event is indexed as timestamp (floating date-times would change sorting based on current timezone, so the index would have to be refreshed) | ||
97 | |||
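The uid lookup example at the top of this list could look roughly like the following sketch, where an update is a removal of the old pair followed by an insertion of the new one. `UidIndex` and its signatures are hypothetical, and an in-memory `std::unordered_multimap` is used only to keep the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical uid index mapping a uid to the ids of the entities carrying it.
class UidIndex {
public:
    // add: store uid + entity id.
    void add(const std::string &uid, const std::string &entityId)
    {
        mEntries.emplace(uid, entityId);
    }

    // remove: drop the uid + entity id pair, if present.
    void remove(const std::string &uid, const std::string &entityId)
    {
        const auto range = mEntries.equal_range(uid);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == entityId) {
                mEntries.erase(it);
                return;
            }
        }
    }

    // update: remove the old pair, then add the new one.
    void update(const std::string &oldUid, const std::string &newUid,
                const std::string &entityId)
    {
        remove(oldUid, entityId);
        add(newUid, entityId);
    }

    // lookup: query for entity ids by uid.
    std::vector<std::string> lookup(const std::string &uid) const
    {
        std::vector<std::string> entityIds;
        const auto range = mEntries.equal_range(uid);
        for (auto it = range.first; it != range.second; ++it) {
            entityIds.push_back(it->second);
        }
        return entityIds;
    }

private:
    std::unordered_multimap<std::string, std::string> mEntries;
};
```

The folder hierarchy and mails-of-folder examples would follow the same pattern, with the parent folder uid as the key.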
98 | ### On-demand indexes | ||
99 | | | To avoid building all indexes initially, and assuming not all indexes are necessarily regularly used for the complete data-set, it should be possible to skip updating an index and instead mark it as outdated. The index can then be built on demand when the first query requires it. |
100 | |||
101 | Building the index on-demand is a matter of replaying the relevant dataset and using the usual indexing methods. This should typically be a process that doesn't take too long, and that provides status information, since it will block the query. | ||
102 | |||
103 | | | The index's status can be recorded as the latest revision the index has been updated with. |
33 | 104 | ||
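A minimal sketch of the on-demand idea, assuming the store can hand the index a list of changes ordered by revision: the index remembers the last revision it has seen and replays anything newer before answering a query. The `Change` and `OnDemandIndex` names are hypothetical, and for brevity the sketch only handles additions and does not report progress.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical change record; `revision` is assumed to increase monotonically.
struct Change {
    std::int64_t revision;
    std::string key;
    std::string uid;
};

class OnDemandIndex {
public:
    // Latest revision this index has been updated with (0 means never built).
    std::int64_t indexedRevision() const { return mIndexedRevision; }

    // Replays every change newer than the indexed revision.
    // Assumes `changes` is sorted by ascending revision.
    void catchUp(const std::vector<Change> &changes)
    {
        for (const Change &change : changes) {
            if (change.revision <= mIndexedRevision) {
                continue;
            }
            mEntries.emplace(change.key, change.uid);
            mIndexedRevision = change.revision;
        }
    }

    // Brings the index up to date if it is outdated, then answers the query.
    std::vector<std::string> lookup(const std::string &key,
                                    const std::vector<Change> &changes)
    {
        if (!changes.empty() && changes.back().revision > mIndexedRevision) {
            catchUp(changes);
        }
        std::vector<std::string> uids;
        const auto range = mEntries.equal_range(key);
        for (auto it = range.first; it != range.second; ++it) {
            uids.push_back(it->second);
        }
        return uids;
    }

private:
    std::multimap<std::string, std::string> mEntries;
    std::int64_t mIndexedRevision = 0;
};
```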
34 | # Generic Preprocessors | 105 | ## Generic Preprocessors |
35 | Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail). | 106 | Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail). |
36 | It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in. | 107 | It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in. |
37 | 108 | ||