 docs/design.md | 104
 1 file changed, 24 insertions(+), 80 deletions(-)
diff --git a/docs/design.md b/docs/design.md
index 4451b49..2890450 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -1,101 +1,45 @@
-# Design Goals
-## Axioms
-1. Personal information is stored in multiple sources (address books, email stores, calendar files, ...)
-2. These sources may be local, remote, or a mix of local and remote
-
-## Requirements
-1. Local mirrors of these sources must be available to 1..N local clients simultaneously
-2. Local clients must be able to make (or at least request) changes to the data in the local mirrors
-3. Local mirrors must be usable without network, even if the source is remote
-4. Local mirrors must be able to synchronize local changes to their sources (local or remote)
-5. Local mirrors must be able to synchronize remote changes and propagate those to local clients
-6. Content must be searchable by a number of terms (dates, identities, body text ...)
-7. This must all run with acceptable performance on a moderate consumer-grade desktop system
-
-Nice to haves:
-
-1. As-close-to-zero-copy-as-possible for data
-2. Simple change notification semantics
-3. Resource-specific synchronization techniques
-4. Data agnostic storage
-
-Immediate goals:
-
-1. Ease development of new features in existing resources
-2. Ease maintenance of existing resources
-3. Make adding new resources easy
-4. Make adding new types of data or data relations easy
-5. Improve performance relative to existing Akonadi implementation
-
-Long-term goals:
-
-1. Project view: given a query, show all items in all stores that match that query easily and quickly
-
-Implications of the above:
-
-* Local mirrors must support multiple readers, but are probably best served with single-writer semantics: keeping all writes in one process, which handles write requests (local or remote) sequentially, simplifies both local change recording and remote synchronization.
-* There is no requirement for a central server if the readers can concurrently access the local mirror directly
-* A storage system which requires a schema (e.g. relational databases) is a poor fit given the desire for data agnosticism and low memory copying
-
 # Overview
 
-## Client API
-The client-facing API hides all Sink internals from the applications and emulates a unified store that provides data through a standardized interface.
+Sink is a data access layer that additionally handles synchronization with external sources and indexing of data for efficient queries.
+
+## Store
+The client-facing Store API hides all Sink internals from the applications and emulates a unified store that provides data through a standardized interface.
 This allows applications to transparently use various data sources with various data source formats.
 
 ## Resource
 A resource is a plugin that provides access to an additional source. It consists of a store, a synchronizer process that executes synchronization and change replay to the source and maintains the store, and a facade plugin for the client API.
 
-## Store
+## Storage / Indexes
 Each resource maintains a store that can either store the full dataset for offline access or only metadata for quick lookups. Resources can define how data is stored.
+The store consists of revisions, with every revision containing one entity.
+
+The store additionally contains various secondary indexes for efficient lookups.
 
 ## Types
 ### Domain Type
-The domain types exposed in the public interface.
+The domain types exposed in the public interface provide standardized access to the store. The domain types and their properties directly define the granularity of data retrieval and thus also which queries can be executed.
 
 ### Buffer Type
-The individual buffer types as specified by the resource. They are internal types that don't necessarily have a 1:1 mapping to the domain types, although that is the default case that the default implementations expect.
+The buffers used by the resources in the store may differ from resource to resource, and don't necessarily have a 1:1 mapping to the domain types.
+This allows resources to store data in a way that is convenient/efficient for synchronization, although it may require a bit more effort when accessing the data.
+The individual buffer types are specified by the resource and internal to it. Default buffer types exist for all domain types.
+
+### Commands
+Commands are used to modify the store. The resource processes commands that are generated by clients and the synchronizer.
+
+### Notifications
+The resource emits notifications to inform clients of new revisions and other changes.
 
 ## Mechanisms
 ### Change Replay
 The change replay is based on the revisions in the store. Clients (as well as the write-back mechanism that replays changes to the source) are informed that a new revision is available. Each client can then go through all new revisions (starting from the last seen revision), and thus update its state to the latest revision.
 
-### Preprocessor pipeline
-Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering. The pipeline guarantees that the preprocessor steps are executed before the entity is persisted.
-
-# Tradeoffs/Design Decisions
-* Key-Value store instead of relational
- * `+` Schemaless, easier to evolve
- * `-` No need to fully normalize the data in order to make it queryable. And without full normalization, SQL is not really useful and performs badly.
- * `-` We need to maintain our own indexes
-
-* Individual store per resource
- * Storage format defined by resource individually
- * `-` Each resource needs to define its own schema
- * `+` Resources can adjust the storage format to map well onto what they have to synchronize
- * `+` Synchronization state can directly be embedded into messages
- * `+` Individual resources could switch to another store technology
- * `+` Easier maintenance
- * `+` Resource is only responsible for its own store and doesn't accidentally break another resource's store
- * `-` Inter-resource moves are both more complicated and more expensive from a client perspective
- * `+` Inter-resource moves become simple additions and removals from a resource perspective
- * `-` No system-wide unique id per message (only the resource/id tuple identifies a message uniquely)
- * `+` Stores can work fully concurrently (also for writing)
+### Synchronization
+The synchronizer executes a periodic synchronization that results in change commands to synchronize the store with the source.
+The change-replay mechanism is used to write back to the source the changes that happened locally.
 
-* Indexes defined and maintained by resources
- * `-` Relational queries across resources are expensive (depending on the query, perhaps not even feasible)
- * `-` Each resource needs to define its own set of indexes
- * `+` Flexible design, as it allows changing indexes on a per-resource level
- * `+` Indexes can be optimized towards a resource's main use cases
- * `+` Indexes can be shared with the source (e.g. IMAP server-side threading)
+### Command processing
+The resources have an internal persistent command queue that is populated by the synchronizer and by clients, and that is continuously processed.
 
-* Shared domain types as common interface for client applications
- * `-` yet another abstraction layer that requires translation to other layers and maintenance
- * `+` decoupling of domain logic from data access
- * `+` allows evolving types according to needs (not coupled to specific application domain types)
+Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering, and through which every command passes before it enters the store. The pipeline guarantees that the preprocessor steps are executed on any command before the entity is persisted.
 
-# Risks
-* key-value store does not perform with large amounts of data
-* query performance is not sufficient
-* turnaround time for modifications is too high to feel responsive
-* design turns out to be as complex as Akonadi
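The Store section describes a client-facing API that presents several resources as one unified store. What follows is a minimal Python sketch of that idea under stated assumptions, not Sink's actual API. `MailFacade` and `UnifiedStore` are hypothetical names, and each facade is assumed to answer simple property-equality queries. Results are keyed by a `(resource id, entity id)` tuple because the removed tradeoff list notes that only that tuple identifies a message uniquely.

```python
from typing import Dict, Iterator, List, Tuple

Properties = Dict[str, str]
Result = Tuple[Tuple[str, str], Properties]  # ((resource id, entity id), properties)


class MailFacade:
    """Hypothetical per-resource facade exposing entities as domain-type properties."""

    def __init__(self, resource_id: str, entities: Dict[str, Properties]):
        self.resource_id = resource_id
        self._entities = entities

    def query(self, **filters: str) -> Iterator[Result]:
        # Yield every entity whose properties match all requested values.
        for entity_id, props in self._entities.items():
            if all(props.get(key) == value for key, value in filters.items()):
                yield (self.resource_id, entity_id), props


class UnifiedStore:
    """Hypothetical client-facing store: fans one query out to every facade."""

    def __init__(self, facades: List[MailFacade]):
        self.facades = facades

    def query(self, **filters: str) -> List[Result]:
        results: List[Result] = []
        for facade in self.facades:
            results.extend(facade.query(**filters))
        # Sort by (resource id, entity id) so the output order is deterministic.
        return sorted(results, key=lambda result: result[0])


store = UnifiedStore([
    MailFacade("imap", {"m1": {"sender": "alice"}, "m2": {"sender": "bob"}}),
    MailFacade("maildir", {"m1": {"sender": "alice"}}),
])
print([key for key, _ in store.query(sender="alice")])
# [('imap', 'm1'), ('maildir', 'm1')]
```

The application never sees which resource a hit came from except through the key, which is the decoupling the Store section aims for.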
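The Storage / Indexes section states that the store consists of revisions, each holding one entity, plus secondary indexes. The following is a minimal sketch under two assumptions: an in-memory list stands in for the key-value store, and only `sender` is indexed. `RevisionStore`, `write`, and `lookup` are hypothetical names. The sketch also shows the cost named in the removed tradeoffs: the index has to be maintained by hand on every write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Properties = Dict[str, str]


@dataclass
class Revision:
    number: int
    entity_id: str
    properties: Optional[Properties]  # None records a removal


@dataclass
class RevisionStore:
    indexed_properties: Tuple[str, ...] = ("sender",)
    revisions: List[Revision] = field(default_factory=list)
    latest: Dict[str, int] = field(default_factory=dict)  # entity id -> newest revision
    index: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def current(self, entity_id: str) -> Optional[Properties]:
        number = self.latest.get(entity_id)
        # Revision n lives at list position n - 1.
        return None if number is None else self.revisions[number - 1].properties

    def write(self, entity_id: str, properties: Optional[Properties]) -> int:
        """Append one revision holding one entity and update the secondary index."""
        old = self.current(entity_id)
        number = len(self.revisions) + 1
        self.revisions.append(Revision(number, entity_id, properties))
        self.latest[entity_id] = number
        for prop in self.indexed_properties:
            if old and prop in old:
                self.index.get((prop, old[prop]), set()).discard(entity_id)
            if properties and prop in properties:
                self.index.setdefault((prop, properties[prop]), set()).add(entity_id)
        return number

    def lookup(self, prop: str, value: str) -> List[str]:
        """Answer an equality query from the index instead of scanning revisions."""
        return sorted(self.index.get((prop, value), set()))


store = RevisionStore()
store.write("mail1", {"sender": "alice"})
store.write("mail2", {"sender": "bob"})
store.write("mail1", {"sender": "bob"})  # a new revision supersedes the old one
print(store.lookup("sender", "bob"))    # ['mail1', 'mail2']
print(store.lookup("sender", "alice"))  # []
```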
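The Buffer Type section lets a resource's stored buffer differ from the domain type, at the cost of some extra work on access. The following sketches that mapping using a hypothetical IMAP-style buffer that keeps raw headers and a `uid`. `ImapMailBuffer`, `header`, and `MAIL_PROPERTY_MAPPER` are illustrative names only. The `uid` field illustrates the removed point that synchronization state can be embedded directly in the stored data.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ImapMailBuffer:
    """Hypothetical resource-specific buffer: raw headers plus embedded sync state."""
    raw_headers: str
    uid: int  # synchronization state stored alongside the data


def header(name: str) -> Callable[[ImapMailBuffer], str]:
    """Build an accessor that extracts one header from the raw buffer on demand."""
    def read(buffer: ImapMailBuffer) -> str:
        for line in buffer.raw_headers.splitlines():
            key, _, value = line.partition(":")
            if key.strip().lower() == name.lower():
                return value.strip()
        return ""
    return read


# Maps domain-type property names onto accessors for this resource's buffer.
MAIL_PROPERTY_MAPPER: Dict[str, Callable[[ImapMailBuffer], str]] = {
    "subject": header("Subject"),
    "sender": header("From"),
}


def domain_property(buffer: ImapMailBuffer, name: str) -> str:
    return MAIL_PROPERTY_MAPPER[name](buffer)


buffer = ImapMailBuffer("From: alice@example.org\nSubject: Hello", uid=42)
print(domain_property(buffer, "sender"))   # alice@example.org
print(domain_property(buffer, "subject"))  # Hello
```

Parsing on every access is the "bit more effort" the section mentions. In exchange, the resource keeps data in the shape its synchronizer needs.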
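The Notifications and Change Replay sections together describe a pull model. A notification only says that a new revision exists, and each client then replays everything after its last seen revision. The following is a minimal sketch that uses an in-process listener list in place of real inter-process notifications. `Resource`, `ReplayingClient`, and `changes_since` are hypothetical names. A client that attaches late simply starts from revision 0 and catches up.

```python
from typing import Callable, List, Tuple

Change = Tuple[int, str]  # (revision number, entity id)


class Resource:
    def __init__(self) -> None:
        self.log: List[Change] = []
        self.listeners: List[Callable[[int], None]] = []

    def commit(self, entity_id: str) -> None:
        self.log.append((len(self.log) + 1, entity_id))
        # The notification carries only the new revision number, not the data.
        for notify in self.listeners:
            notify(len(self.log))

    def changes_since(self, revision: int) -> List[Change]:
        # Revision n is stored at index n - 1, so this returns revisions > revision.
        return self.log[revision:]


class ReplayingClient:
    def __init__(self, resource: Resource) -> None:
        self.resource = resource
        self.last_seen = 0
        self.applied: List[str] = []
        resource.listeners.append(self.on_new_revision)

    def on_new_revision(self, revision: int) -> None:
        # The revision number is only a trigger; the client pulls from its own position.
        for number, entity_id in self.resource.changes_since(self.last_seen):
            self.applied.append(entity_id)
            self.last_seen = number


resource = Resource()
early = ReplayingClient(resource)
resource.commit("a")
resource.commit("b")
late = ReplayingClient(resource)  # starts at revision 0
resource.commit("c")
print(early.applied, late.applied)  # ['a', 'b', 'c'] ['a', 'b', 'c']
```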
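The Synchronization section splits the work into two directions. A periodic synchronization turns differences at the source into change commands, and change replay writes local changes back. The following sketches both, assuming the source and the local store are flat id-to-value maps. It also assumes pending local changes are replayed before synchronizing; otherwise a locally created item would look like a remote removal. `synchronize` and `replay_to_source` are hypothetical names.

```python
from typing import Dict, List

Command = Dict[str, str]


def synchronize(remote: Dict[str, str], local: Dict[str, str]) -> List[Command]:
    """Diff the source against the local store and emit change commands."""
    commands: List[Command] = []
    for remote_id, value in sorted(remote.items()):
        if remote_id not in local:
            commands.append({"type": "create", "id": remote_id, "value": value})
        elif local[remote_id] != value:
            commands.append({"type": "modify", "id": remote_id, "value": value})
    # Anything only present locally has disappeared from the source.
    for remote_id in sorted(local.keys() - remote.keys()):
        commands.append({"type": "remove", "id": remote_id})
    return commands


def replay_to_source(source: Dict[str, str], local_changes: List[Command]) -> None:
    """Write back locally made changes, in revision order, to the source."""
    for change in local_changes:
        if change["type"] == "remove":
            source.pop(change["id"], None)
        else:
            source[change["id"]] = change["value"]


source = {"a": "1", "b": "2"}
local = {"a": "1", "c": "3"}  # "c" was created locally
replay_to_source(source, [{"type": "create", "id": "c", "value": "3"}])
print(synchronize(source, local))
# [{'type': 'create', 'id': 'b', 'value': '2'}]
```

The sync result is only a list of commands. It would enter the store through the same command path as client writes rather than modifying the store directly.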
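The Command processing section describes two pieces: a persistent command queue fed by clients and the synchronizer, and a preprocessor pipeline that every command passes through before persistence. The following sketches that single-writer loop, with an in-memory deque of JSON strings standing in for the persistent queue. `CommandProcessor` and `flag_junk` are hypothetical names, and `flag_junk` is just an example filtering step.

```python
import json
from collections import deque
from typing import Callable, Deque, Dict, List

Command = Dict[str, object]
Preprocessor = Callable[[Command], Command]


class CommandProcessor:
    """Single writer: drains a FIFO command queue one command at a time."""

    def __init__(self, preprocessors: List[Preprocessor]) -> None:
        # An in-memory deque of serialized commands stands in for the persistent queue.
        self.queue: Deque[str] = deque()
        self.preprocessors = preprocessors
        self.entities: Dict[str, dict] = {}
        self.revision = 0

    def enqueue(self, command: Command) -> None:
        # Clients and the synchronizer only append; they never write entities directly.
        self.queue.append(json.dumps(command))

    def process_all(self) -> int:
        while self.queue:
            command = json.loads(self.queue.popleft())
            # Every preprocessor runs on the command before the entity is persisted.
            for step in self.preprocessors:
                command = step(command)
            if command["type"] == "remove":
                self.entities.pop(command["id"], None)
            else:
                self.entities[command["id"]] = command["properties"]
            self.revision += 1
        return self.revision


def flag_junk(command: Command) -> Command:
    """Example filtering preprocessor: mark suspicious mail before it is stored."""
    properties = command.get("properties")
    if isinstance(properties, dict):
        properties["junk"] = "prize" in properties.get("subject", "").lower()
    return command


processor = CommandProcessor([flag_junk])
processor.enqueue({"type": "create", "id": "m1", "properties": {"subject": "You won a PRIZE"}})
processor.enqueue({"type": "create", "id": "m2", "properties": {"subject": "Lunch?"}})
processor.enqueue({"type": "remove", "id": "m2"})
print(processor.process_all(), processor.entities)
# 3 {'m1': {'subject': 'You won a PRIZE', 'junk': True}}
```

Because only this loop writes, the ordering between client and synchronizer commands is simply queue order. That is the simplification the removed single-writer implication argues for.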