| author | Christian Mollekopf <chrigi_1@fastmail.fm> | 2016-02-08 15:38:45 +0100 |
|---|---|---|
| committer | Christian Mollekopf <chrigi_1@fastmail.fm> | 2016-02-08 15:38:45 +0100 |
| commit | b8200434209c317ebc4883b9f87513991bae33e3 (patch) | |
| tree | 00164031c5b39eec7a5969289de21651d328d038 /docs/resource.md | |
| parent | 0376079b862cf38973a59336f3843bca2788c576 (diff) | |
| download | sink-b8200434209c317ebc4883b9f87513991bae33e3.tar.gz, sink-b8200434209c317ebc4883b9f87513991bae33e3.zip | |
Documentation
Diffstat (limited to 'docs/resource.md')
-rw-r--r-- | docs/resource.md | 59 |
1 file changed, 38 insertions(+), 21 deletions(-)
diff --git a/docs/resource.md b/docs/resource.md
index defbf9a..8c87522 100644
--- a/docs/resource.md
+++ b/docs/resource.md
@@ -4,7 +4,7 @@ The resource consists of:
 * a plugin providing the client-api facade
 * a configuration setting of the filters
 
-# Synchronizer
+## Synchronizer
 The synchronizer process is responsible for processing all commands, executing synchronizations with the source, and replaying changes to the source.
 
 Processing of commands happens in the pipeline, which executes all preprocessors before the entity is persisted.
@@ -16,7 +16,15 @@ The synchronizer process has the following primary components:
 * Listener: Opens a socket and listens for incoming connections. On connection all incoming commands are read and entered into command queues. Control commands (e.g. a sync) don't require persistence and are therefore processed directly.
 * Synchronization: Handles synchronization with the source, as well as change replay to the source. The modification commands generated by the synchronization enter the command queue as well.
 
-# Preprocessors
+A resource can:
+
+* provide a full mirror of the source.
+* provide metadata for efficient access to the source.
+
+In the former case the local mirror is fully functional offline, and changes can be replayed to the source once a connection is established again.
+In the latter case the resource is only functional if a connection to the source is available (which is not a problem if the source is e.g. a local maildir on disk).
+
+## Preprocessors
 Preprocessors are small processors that are guaranteed to run before a new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity.
 
 Use cases:
@@ -33,16 +41,29 @@ The following kinds of preprocessors exist:
 
 Preprocessors are typically read-only, e.g. to avoid breaking signatures of emails. Extra flags that are accessible through the sink domain model can therefore be stored in the local buffer of each resource.
 
-## Requirements
+### Requirements
 * A preprocessor must work with batch processing. Because batch processing is vital for efficient writing to the database, all preprocessors have to be included in the batch processing.
 * Preprocessors need to be fast, since they directly affect how fast a message is processed by the system.
 
-## Design
+### Design
 Commands are processed in batches. Each preprocessor thus has the following workflow (sketched in code below):
 * startBatch is called: The preprocessor can do the necessary preparation for the batch (like starting a transaction on an external database).
 * add/modify/remove is called for every command in the batch: The preprocessor executes the desired actions.
 * endBatch is called: If the preprocessor wrote to an external database it can now commit the transaction.
 
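As an illustration of this batch workflow, here is a minimal C++ sketch of a batch-aware preprocessor interface. The names (`Entity`, `Preprocessor`, the method names) are hypothetical and do not claim to match Sink's actual API:

```cpp
#include <string>

// Hypothetical entity as it travels through the pipeline.
struct Entity {
    std::string uid;
    std::string type; // e.g. "mail", "folder", "thread"
};

// Hypothetical batch-aware preprocessor interface, mirroring the
// startBatch / add / modify / remove / endBatch workflow described above.
class Preprocessor {
public:
    virtual ~Preprocessor() = default;
    virtual void startBatch() {}                                            // e.g. begin a transaction on an external database
    virtual void newEntity(Entity &entity) {}                               // "add" command
    virtual void modifiedEntity(const Entity &oldState, Entity &newState) {} // "modify" command
    virtual void deletedEntity(const Entity &entity) {}                     // "remove" command
    virtual void endBatch() {}                                              // e.g. commit the external transaction
};
```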
+### Generic Preprocessors
+Most preprocessors will likely be used by several resources, and are either completely generic or domain specific (such as only for mail).
+It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.
+
+The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
+That way it is trivial to, for example, implement a preprocessor that populates a hierarchy index of collections.
+
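A hedged sketch of such a generic preprocessor, building on the `Preprocessor` sketch above; `PropertyGetter` is a hypothetical stand-in for the domain type adaptor interface:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the domain type adaptors: a generic way to read a
// named property off an entity, regardless of the concrete resource.
using PropertyGetter = std::function<std::string(const Entity &, const std::string &property)>;

// A generic preprocessor that populates a parent -> children hierarchy index for collections.
class HierarchyIndexer : public Preprocessor {
public:
    explicit HierarchyIndexer(PropertyGetter getProperty) : mGetProperty(std::move(getProperty)) {}

    void newEntity(Entity &entity) override {
        // Works for any resource whose adaptor exposes a "parent" property.
        mHierarchy[mGetProperty(entity, "parent")].push_back(entity.uid);
    }

private:
    PropertyGetter mGetProperty;
    std::map<std::string, std::vector<std::string>> mHierarchy; // parent uid -> child uids
};
```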
+### Preprocessors generating additional entities
+A preprocessor, such as an email threading preprocessor, might generate additional entities (a thread entity is a regular entity, just like the mail that spawned the thread).
+
+In such a case the preprocessor must invoke the complete pipeline for the new entity.
+
+
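Continuing the same hypothetical sketch, a threading preprocessor could hand the generated entity back to the pipeline instead of writing it to storage itself, so the new entity runs through all preprocessors like any other entity:

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical threading preprocessor: generated thread entities are re-enqueued
// into the pipeline rather than persisted directly.
class ThreadingPreprocessor : public Preprocessor {
public:
    explicit ThreadingPreprocessor(std::function<void(Entity)> enqueueIntoPipeline)
        : mEnqueue(std::move(enqueueIntoPipeline)) {}

    void newEntity(Entity &mail) override {
        if (mail.type == "mail" && !threadExistsFor(mail)) {
            Entity thread;
            thread.type = "thread";
            thread.uid = "thread-" + mail.uid;
            mEnqueue(thread); // invoke the complete pipeline for the new entity
        }
    }

private:
    bool threadExistsFor(const Entity &) const { return false; /* thread lookup omitted */ }
    std::function<void(Entity)> mEnqueue;
};
```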
 ## Indexes
 Most indexes are implemented as preprocessors to guarantee that they are always updated together with the data.
 
@@ -65,6 +86,9 @@ Index types:
 * sort indexes (e.g. sorted by date)
     * Could also be a lookup in the range index (increase the date range until sufficient matches are available)
 
+### Default implementations
+Since only properties of the domain types can be queried, default implementations for commonly used indexes can be provided. These indexes are populated by generic preprocessors that use the domain-type interface to extract properties from individual entities.
+
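A sketch of one such default index, again with hypothetical names: a date sort index populated through the generic property interface, which also supports the range lookup mentioned under the index types above:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical default sort index (building on the earlier sketches): a generic
// preprocessor extracts the "date" property through the domain-type interface and
// stores date -> uid, so a query sorted by date becomes a range scan over the keys.
class DateSortIndex : public Preprocessor {
public:
    explicit DateSortIndex(PropertyGetter getProperty) : mGetProperty(std::move(getProperty)) {}

    void newEntity(Entity &entity) override {
        mIndex.emplace(std::stoll(mGetProperty(entity, "date")), entity.uid);
    }

    // Lookup in the range index: widen [from, to] until sufficient matches are available.
    std::vector<std::string> lookup(std::int64_t from, std::int64_t to) const {
        std::vector<std::string> uids;
        for (auto it = mIndex.lower_bound(from); it != mIndex.end() && it->first <= to; ++it) {
            uids.push_back(it->second);
        }
        return uids;
    }

private:
    PropertyGetter mGetProperty;
    std::multimap<std::int64_t, std::string> mIndex; // date (unix timestamp) -> entity uid
};
```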
 ### Example index implementations
 * uid lookup
     * add:
@@ -106,25 +130,14 @@ Building the index on-demand is a matter of replaying the relevant dataset and u
 
 The index's status information can be recorded using the latest revision the index has been updated with.
 
-## Generic Preprocessors
-Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail).
-It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.
-
-The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
-It is that way trivial to i.e. implement a preprocessor that populates a hierarchy index of collections.
-
-## Preprocessors generating additional entities
-A preprocessor, such as an email threading preprocessors, might generate additional entities (A thread entity is a regular entity, just like the mail that spawned the thread).
-
-In such a case the preprocessor must invoke the complete pipeline for the new entity.
-
 # Pipeline
 A pipeline is an assembly of a set of preprocessors with a defined order. A modification is always persisted at the end of the pipeline once all preprocessors have been processed.
 
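A minimal sketch of such an assembly, reusing the hypothetical `Preprocessor` interface from above; the design point is that the storage write only happens after the whole chain has run:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pipeline: an ordered set of preprocessors; the entity is only
// persisted once every preprocessor has processed it.
class Pipeline {
public:
    void appendPreprocessor(std::unique_ptr<Preprocessor> preprocessor) {
        mPreprocessors.push_back(std::move(preprocessor));
    }

    void processBatch(std::vector<Entity> &newEntities) {
        for (auto &p : mPreprocessors) p->startBatch();
        for (auto &entity : newEntities) {
            for (auto &p : mPreprocessors) p->newEntity(entity);
            persist(entity); // persisted at the end of the pipeline, after all preprocessors
        }
        for (auto &p : mPreprocessors) p->endBatch();
    }

private:
    void persist(const Entity &) { /* write to the store, omitted */ }
    std::vector<std::unique_ptr<Preprocessor>> mPreprocessors;
};
```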
-# Synchronization / Change Replay
-* The synchronization can either:
-* Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store.
-* If the source supports incremental changes the changeset can directly be generated from that information.
+# Synchronization
+The synchronization can either:
+
+* Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates the necessary changeset for the store.
+* If the source supports incremental changes, the changeset can be generated directly from that information.
 
 The changeset is then simply inserted into the regular modification queue and processed like all other modifications. The synchronizer has to ensure that only changes that didn't already come from the source are replayed back to it. This is done by marking changes that don't require change replay to the source.
 
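The following sketch illustrates the diff-based variant with hypothetical types (the incremental variant would translate the source's change notifications into the same `Command` objects directly). Changes produced here are marked as originating from the source, so they are not replayed back:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical modification command as produced by the synchronizer.
struct Command {
    enum Type { Create, Modify, Delete } type = Create;
    std::string remoteId;
    // Changes originating from the source are marked so they are not replayed back.
    bool replayToSource = false;
};

// Sketch of the diff-based variant: compare the source listing against a local
// snapshot taken at a single revision and emit the necessary changeset.
std::vector<Command> computeChangeset(const std::map<std::string, std::string> &source,   // remoteId -> content hash
                                      const std::map<std::string, std::string> &snapshot) // local mirror at one revision
{
    std::vector<Command> changeset;
    for (const auto &entry : source) {
        const auto it = snapshot.find(entry.first);
        if (it == snapshot.end()) {
            changeset.push_back({Command::Create, entry.first, false});
        } else if (it->second != entry.second) {
            changeset.push_back({Command::Modify, entry.first, false});
        }
    }
    for (const auto &entry : snapshot) {
        if (source.find(entry.first) == source.end()) {
            changeset.push_back({Command::Delete, entry.first, false});
        }
    }
    return changeset;
}
```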
@@ -142,8 +155,12 @@ The remoteid mapping has to be updated in two places:
 * New entities that are synchronized immediately get a localid assigned, which is then recorded together with the remoteid. This is required to be able to reference other entities directly in the command queue (e.g. for parent folders).
 * Entities created by clients get a remoteid assigned during change replay, so the entity can be recognized during the next sync.
 
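A minimal sketch of such a bidirectional mapping covering both update paths listed above (hypothetical; the real mapping would live in the resource's synchronization storage rather than in memory):

```cpp
#include <map>
#include <string>

// Hypothetical remoteId <-> localId mapping maintained by the synchronizer.
class RemoteIdMap {
public:
    // During sync: a newly seen remote entity immediately gets a localId recorded,
    // so later commands in the queue (e.g. a parent folder reference) can use it.
    void recordFromSync(const std::string &remoteId, const std::string &localId) {
        mRemoteToLocal[remoteId] = localId;
        mLocalToRemote[localId] = remoteId;
    }

    // During change replay: an entity created by a client gets its remoteId once
    // the source has accepted it, so the next sync recognizes the entity.
    void recordFromReplay(const std::string &localId, const std::string &remoteId) {
        recordFromSync(remoteId, localId);
    }

    std::string localId(const std::string &remoteId) const {
        const auto it = mRemoteToLocal.find(remoteId);
        return it != mRemoteToLocal.end() ? it->second : std::string();
    }

private:
    std::map<std::string, std::string> mRemoteToLocal;
    std::map<std::string, std::string> mLocalToRemote;
};
```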
+## Change Replay
+To replay local changes to the source, the synchronizer replays all revisions of the store and maintains the current replay state in the synchronization store.
+Changes that already came from the source via the synchronizer are not replayed to the source again.
+
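A sketch of the replay loop under these assumptions (hypothetical helpers; `Command` comes from the synchronization sketch above): walk all revisions since the last replayed one, skip changes that came in from the source, and remember the replay position:

```cpp
#include <cstdint>

// Hypothetical change replay loop. The load/store/write helpers are stubs standing
// in for the synchronization store and the source connection.
class ChangeReplay {
public:
    void replayChanges(std::int64_t latestRevision) {
        for (std::int64_t rev = mLastReplayedRevision + 1; rev <= latestRevision; ++rev) {
            const Command change = loadRevision(rev);
            if (change.replayToSource) {
                writeToSource(change);       // only client-originated changes reach the source
            }
            mLastReplayedRevision = rev;     // current replay state (persisted in the synchronization store)
        }
    }

private:
    Command loadRevision(std::int64_t) const { return {}; } // read the revision from the store, omitted
    void writeToSource(const Command &) {}                   // replay to the source, omitted
    std::int64_t mLastReplayedRevision = 0;
};
```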
 # Testing / Inspection
-Resources new to be tested, which often requires inspections into the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process without running tests in a vastly different setup from how it will be run in production.
+Resources have to be tested, which often requires inspections into the current state of the resource. This is difficult in an asynchronous system where the whole backend logic is encapsulated in a separate process, short of running the tests in a setup vastly different from how the system runs in production.
 
 To alleviate this, inspection commands are introduced. Inspection commands are special commands that the resource processes just like all other commands, and that have the sole purpose of inspecting the current resource state. Because the command is processed with the same mechanism as other commands, we can rely on command ordering: a prior command is guaranteed to have been executed by the time the inspection command is processed.
 
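To make the ordering guarantee concrete, here is a toy, self-contained model (not Sink code) of a modification followed by an inspection going through one queue; because the inspection is enqueued after the modification, its result necessarily reflects the applied change:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>

// Toy model of the ordering guarantee: inspection commands enter the same queue
// as all other commands, so when the inspection runs, prior commands have been applied.
int main()
{
    std::map<std::string, std::string> resourceState;
    std::queue<std::function<void()>> commandQueue;

    std::string inspected;
    commandQueue.push([&] { resourceState["mail1"] = "read"; });    // modification command
    commandQueue.push([&] { inspected = resourceState["mail1"]; }); // inspection command, enqueued afterwards

    while (!commandQueue.empty()) {   // the resource processes commands in order
        commandQueue.front()();
        commandQueue.pop();
    }

    assert(inspected == "read");      // guaranteed, because the inspection ran after the modification
    return 0;
}
```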