docs

author: Christian Mollekopf <chrigi_1@fastmail.fm> 2015-10-28 11:19:14 +0100
committer: Christian Mollekopf <chrigi_1@fastmail.fm> 2015-10-28 11:19:14 +0100
commit: 20f049b65c4bd8c3d0c16bbf398641675648a93f (patch)
tree: 95c0dada9435b139aa8e9d36571614820cc8392b
parent: 86987a0c6b0d9e1aa4216c5268b83a44e0bae9a4 (diff)
1 files changed, 81 insertions, 10 deletions
diff --git a/docs/resource.md b/docs/resource.md
index bba4dbb..0f3d163 100644
--- a/docs/resource.md
+++ b/docs/resource.md
@@ -2,7 +2,7 @@ The resource consists of:
 * the syncronizer process
 * a plugin providing the client-api facade
-* a configuration setting up the filters
+* a configuration setting of the filters
 # Synchronizer
 * The synchronization can either:
@@ -13,25 +13,96 @@ The changeset is then simply inserted in the regular modification queue and proc
 The synchronizer already knows that it doesn't have to replay this changeset to the source, since replay no longer goes via the store.
 # Preprocessors
-Preprocessors are small processors that are guaranteed to be processed before an new/modified/deleted entity reaches storage. The can therefore be used for various tasks that need to be executed on every entity.
+Preprocessors are small processors that are guaranteed to run before a new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity.
 Usecases:
-* Updating various indexes
+* Update indexes
-* detecting spam/scam mail and setting appropriate flags
+* Detect spam/scam mail and set appropriate flags
-* email filtering
+* Filter email into different folders or resources
-Preprocessors need to be fast, since they directly affect how fast a message is processed by the system.
 The following kinds of preprocessors exist:
 * filtering preprocessors that can potentially move an entity to another resource
-* passive filter, that extract data that is stored externally (i.e. indexers)
+* passive preprocessors that extract data that is stored externally (e.g. indexers)
 * flag extractors, that produce data stored with the entity (spam detection)
-Filter typically be read-only, to i.e. not break signatures of emails. Extra flags that are accessible through the akonadi domain model, can therefore be stored in the local buffer of each resource.
+Preprocessors are typically read-only, e.g. so as not to break the signatures of emails. Extra flags that are accessible through the akonadi domain model can therefore be stored in the local buffer of each resource.
+## Requirements
+* A preprocessor must work with batch processing. Because batch-processing is vital for efficient writing to the database, all preprocessors have to be included in the batch processing.
+* Preprocessors need to be fast, since they directly affect how fast a message is processed by the system.
+## Design
+Commands are processed in batches. Each preprocessor thus has the following workflow:
+* startBatch is called: The preprocessor can carry out any steps needed to prepare for the batch (like starting a transaction on an external database)
+* add/modify/remove is called for every command in the batch: The preprocessor executes the desired actions.
+* endBatch is called: If the preprocessor wrote to an external database it can now commit the transaction.
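The workflow above can be sketched roughly as follows. This is only an illustration under assumptions: `InMemoryStore`, `SubjectIndexer` and `process_batch` are hypothetical names that do not come from the sink code base, and the method names are snake_case stand-ins for startBatch, add/modify/remove and endBatch.

```python
class InMemoryStore:
    """Hypothetical stand-in for an external database with transactions."""

    def __init__(self):
        self.committed = {}
        self._pending = None

    def begin(self):
        # Work on a copy so nothing is visible before commit().
        self._pending = dict(self.committed)

    def commit(self):
        self.committed = self._pending
        self._pending = None

    def put(self, key, value):
        self._pending[key] = value

    def delete(self, key):
        self._pending.pop(key, None)


class SubjectIndexer:
    """Hypothetical passive preprocessor: mirrors each subject into the store."""

    def __init__(self, store):
        self.store = store

    def start_batch(self):
        self.store.begin()

    def add(self, entity):
        self.store.put(entity["id"], entity["subject"])

    def modify(self, entity):
        self.store.put(entity["id"], entity["subject"])

    def remove(self, entity):
        self.store.delete(entity["id"])

    def end_batch(self):
        self.store.commit()


def process_batch(preprocessors, commands):
    """Run every command of one batch through all preprocessors."""
    for preprocessor in preprocessors:
        preprocessor.start_batch()
    for action, entity in commands:
        for preprocessor in preprocessors:
            getattr(preprocessor, action)(entity)
    for preprocessor in preprocessors:
        preprocessor.end_batch()


store = InMemoryStore()
process_batch(
    [SubjectIndexer(store)],
    [
        ("add", {"id": 1, "subject": "hello"}),
        ("add", {"id": 2, "subject": "spam"}),
        ("modify", {"id": 1, "subject": "hello again"}),
        ("remove", {"id": 2, "subject": "spam"}),
    ],
)
```

In this sketch the external store is written only once per batch, in `end_batch`, which is the point of including preprocessors in the batch processing.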
+## Indexes
+Most indexes are implemented as preprocessors to guarantee that they are always updated together with the data.
+Index types:
+    * fixed value indexes (e.g. uid)
+        * Input: key-value pair where the key is the indexed property and the value is the uid of the entity
+        * Lookup: by key; the result is always zero or more uids
+    * fixed value where we want to do smaller/greater-than comparisons (like start date)
+        * Input:
+        * Lookup: by key with comparator (greater, equal range)
+        * Result: zero or more uids
+    * range indexes (like the date range an event affects)
+        * Input: start and end of range and uid of entity
+        * Lookup: by key with comparator. The value denotes start or end of range.
+        * Result: zero or more uids
+    * group indexes (like tree hierarchies as nested sets)
+        * could be the same as fixed value indexes, which would then just require a recursive query.
+        * Input:
+    * sort indexes (e.g. sorted by date)
+        * Could also be a lookup in the range index (increase date range until sufficient matches are available)
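As an illustration of the second index type (a fixed value with smaller/greater-than comparisons), here is a minimal sketch that keeps the keys sorted and uses binary search for lookups. `OrderedIndex` and its method names are assumptions made for this example, and the in-memory lists stand in for whatever storage the real index would use.

```python
import bisect


class OrderedIndex:
    """Hypothetical index over comparable keys, e.g. start dates."""

    def __init__(self):
        # Two parallel lists, kept sorted by key.
        self._keys = []
        self._uids = []

    def add(self, key, uid):
        position = bisect.bisect_right(self._keys, key)
        self._keys.insert(position, key)
        self._uids.insert(position, uid)

    def greater(self, key):
        """uids whose key is strictly greater than `key`."""
        return self._uids[bisect.bisect_right(self._keys, key):]

    def smaller(self, key):
        """uids whose key is strictly smaller than `key`."""
        return self._uids[:bisect.bisect_left(self._keys, key)]

    def between(self, low, high):
        """uids whose key lies in the inclusive range [low, high]."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._uids[start:end]


start_dates = OrderedIndex()
start_dates.add(20151028, "event-b")
start_dates.add(20151001, "event-a")
start_dates.add(20151115, "event-c")

after = start_dates.greater(20151028)
before = start_dates.smaller(20151028)
october = start_dates.between(20151001, 20151031)
nothing = start_dates.between(20160101, 20160131)
```

Every lookup returns zero or more uids, matching the result described for this index type. A sort index could reuse `between` and widen the range until enough matches are available.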
+### Example index implementations
+* uid lookup
+    * add:
+        * add uid + entity id to index
+    * update:
+        * remove old uid + entity id from index
+        * add uid + entity id to index
+    * remove:
+        * remove uid + entity id from index
+    * lookup:
+        * query for entity-id by uid
+* mail folder hierarchy
+    * parent folder uid is a property of the folder
+    * store parent-folder-uid + entity id
+    * lookup:
+        * query for entity-id by uid
+* mails of mail folder
+    * parent folder uid is a property of the email
+    * store parent-folder-uid + entity id
+    * lookup:
+        * query for entity-id by uid
+* email threads
+    * Thread objects should be created as dedicated entities
+    * the thread uid
+* email date sort index
+    * the date of each email is indexed as timestamp
+* event date range index
+    * the start and end date of each event is indexed as timestamp (floating date-times would change sorting based on current timezone, so the index would have to be refreshed)
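The add/update/remove/lookup steps listed for the uid lookup can be sketched as a small key-to-entity-id map. `UidIndex` is a hypothetical name and the dictionary is only a stand-in for the real index storage. The same shape is used here for the "mails of mail folder" case by treating the parent folder uid as the key.

```python
from collections import defaultdict


class UidIndex:
    """Hypothetical fixed value index: uid -> set of entity ids."""

    def __init__(self):
        self._by_uid = defaultdict(set)

    def add(self, uid, entity_id):
        self._by_uid[uid].add(entity_id)

    def remove(self, uid, entity_id):
        entity_ids = self._by_uid.get(uid)
        if entity_ids is None:
            return
        entity_ids.discard(entity_id)
        if not entity_ids:
            # Drop empty keys so the index does not grow without bound.
            del self._by_uid[uid]

    def update(self, old_uid, new_uid, entity_id):
        # Update is "remove the old pair, then add the new pair".
        self.remove(old_uid, entity_id)
        self.add(new_uid, entity_id)

    def lookup(self, uid):
        """Return zero or more entity ids stored for `uid`."""
        return sorted(self._by_uid.get(uid, ()))


# Key: parent folder uid, value: entity id of the mail.
mails_by_folder = UidIndex()
mails_by_folder.add("inbox", 1)
mails_by_folder.add("inbox", 2)
mails_by_folder.add("inbox", 3)
mails_by_folder.update("inbox", "archive", 2)  # mail 2 was moved
mails_by_folder.remove("inbox", 3)             # mail 3 was deleted

inbox = mails_by_folder.lookup("inbox")
archive = mails_by_folder.lookup("archive")
trash = mails_by_folder.lookup("trash")
```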
+### On-demand indexes
+To avoid building all indexes initially, and assuming that not all indexes are regularly used for the complete data-set, it should be possible to skip updating an index and mark it as outdated instead. The index can then be built on demand when the first query requires it.
+Building the index on-demand is a matter of replaying the relevant dataset and using the usual indexing methods. This should typically be a process that doesn't take too long, and that provides status information, since it will block the query.
+The index's status can be recorded as the latest revision the index has been updated with.
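One way to picture an on-demand index is sketched below. It assumes that the store can replay its changes from a given revision onwards. That replay interface, and the names `ChangeLog` and `OnDemandIndex`, are assumptions made for this example and are not taken from the source.

```python
class ChangeLog:
    """Hypothetical store: append-only list of (revision, action, key, uid)."""

    def __init__(self):
        self.changes = []

    @property
    def latest_revision(self):
        return len(self.changes)

    def record(self, action, key, uid):
        self.changes.append((self.latest_revision + 1, action, key, uid))

    def since(self, revision):
        """All changes with a revision greater than `revision`."""
        return self.changes[revision:]


class OnDemandIndex:
    """Hypothetical index that is only brought up to date when queried."""

    def __init__(self, log):
        self._log = log
        self.revision = 0  # latest revision this index has been updated with
        self._uids_by_key = {}

    def is_outdated(self):
        return self.revision < self._log.latest_revision

    def _catch_up(self):
        # Replay the missing changes through the usual indexing steps.
        for revision, action, key, uid in self._log.since(self.revision):
            uids = self._uids_by_key.setdefault(key, set())
            if action == "add":
                uids.add(uid)
            elif action == "remove":
                uids.discard(uid)
            self.revision = revision

    def lookup(self, key):
        if self.is_outdated():
            self._catch_up()  # blocks the query until the index is current
        return sorted(self._uids_by_key.get(key, ()))


log = ChangeLog()
index = OnDemandIndex(log)
log.record("add", "inbox", "mail-1")
log.record("add", "inbox", "mail-2")
log.record("remove", "inbox", "mail-1")

outdated_before = index.is_outdated()
result = index.lookup("inbox")
outdated_after = index.is_outdated()
```

Comparing the index revision with the store revision is enough to tell whether the index is outdated, and the replay only needs to cover the revisions the index has not seen yet.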
-# Generic Preprocessors
+## Generic Preprocessors
 Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail).
 It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.