From 296a5ac347dbccfff44d2b2a0a378e98b0480cac Mon Sep 17 00:00:00 2001
From: Christian Mollekopf
Date: Mon, 28 Dec 2015 13:52:46 +0100
Subject: Docs

---
 docs/akonadish.md              | 54 ++++++++++++++++++++++++++++++++++++++++++
 docs/applicationdomaintypes.md |  1 +
 docs/clientapi.md              |  3 +++
 docs/design.md                 | 16 +++++++++++--
 docs/logging.md                |  4 +++-
 docs/resource.md               | 17 ++++++++-----
 docs/storage.md                | 19 ++++++++++++---
 docs/terminology.md            |  2 +-
 8 files changed, 103 insertions(+), 13 deletions(-)
 create mode 100644 docs/akonadish.md

(limited to 'docs')

diff --git a/docs/akonadish.md b/docs/akonadish.md
new file mode 100644
index 0000000..b144dba
--- /dev/null
+++ b/docs/akonadish.md
@@ -0,0 +1,54 @@
+The akonadi shell is the primary interaction point from the command line. It can be used for debugging, maintenance and scripting.
+
+The syntax is:
+ `akonadish COMMAND TYPE ...`
+
+# Commands
+
+## list
+The list command executes queries and retrieves the results in the form of lists.
+Eventually you will be able to specify which properties should be retrieved; for now it's a hardcoded list for each type. It's generally useful for checking what the database contains and whether queries work.
+
+## count
+Like list, but outputs only the result count.
+
+## stat
+Prints statistics on how large the database is, how its size is distributed across indexes, etc.
+
+## create/modify/delete
+Creates/modifies/deletes entities. Currently this is only of limited use, but it already works nicely with resources. Eventually it will allow creating/modifying/deleting all kinds of entities such as events/mails/folders/....
+
+## clear
+Drops all caches of a resource but leaves the config intact. This is useful while developing because it allows, for example, retrying a sync without having to configure the resource again.
+
+## synchronize
+Synchronizes a resource.
For an imap resource that would mean that the remote server is contacted and the local dataset is brought up to date,
+for a maildir resource it simply means all data is indexed and becomes queriable by akonadi.
+
+Eventually this will also allow specifying a query, e.g. to synchronize only a specific folder.
+
+## show
+Provides the same contents as "list" but in a graphical tree view. This was really just a way for me to test whether I can actually get data into a view, so I'm not sure if it will survive as a command. For the time being it's nice to compare its performance to the QML counterpart.
+
+# Setting up a new resource instance
+akonadish is already the primary way to create resource instances:
+
+ `akonadish create resource org.kde.maildir path /home/developer/maildir1`
+
+This creates a resource of type "org.kde.maildir" and a configuration of "path" with the value "/home/developer/maildir1". Resources are stored in configuration files, so all this does is write to some config files.
+
+ `akonadish list resource`
+
+By listing all available resources we can find the identifier of the resource that was automatically assigned.
+
+ `akonadish synchronize org.kde.maildir.instance1`
+
+This triggers the actual synchronization in the resource, and from there on the data is available.
+
+ `akonadish list folder org.kde.maildir.instance1`
+
+This lists all folders that are in the resource.
+
+ `akonadish remove resource org.kde.maildir.instance1`
+
+And this will finally remove all traces of the resource instance.
diff --git a/docs/applicationdomaintypes.md b/docs/applicationdomaintypes.md
index 9a50940..4baf317 100644
--- a/docs/applicationdomaintypes.md
+++ b/docs/applicationdomaintypes.md
@@ -2,6 +2,7 @@ A set of standardized domain types is defined. This is necessary to decouple applications from resources (so a calendar can access events from all resources), and to have a "language" for queries.
The definition of the domain model directly affects: + * granularity for data retrieval (email property, or individual subject, date, ...) * queriable properties for filtering and sorting (sender, id, ...) diff --git a/docs/clientapi.md b/docs/clientapi.md index e0c66fb..219f972 100644 --- a/docs/clientapi.md +++ b/docs/clientapi.md @@ -17,6 +17,7 @@ The client API consists of: A set of standardized domain types is defined. This is necessary to decouple applications from resources (so a calendar can access events from all resources), and to have a "language" for queries. The definition of the domain model directly affects: + * granularity for data retrieval (email property, or individual subject, date, ...) * queriable properties for filtering and sorting (sender, id, ...) @@ -24,6 +25,7 @@ The purpose of these domain types is strictly to be the interface and the types ## Store Facade The store is always accessed through a store specific facade, which hides: + * store access (one store could use a database, and another one plain files) * message type (flatbuffers, ...) * indexes @@ -69,6 +71,7 @@ Queries can be kept open (live) to receive updates as the store changes. ### Query The query consists of: + * a set of filters to match the wanted entities * the set of properties to retrieve for each entity diff --git a/docs/design.md b/docs/design.md index fe0d214..772bd65 100644 --- a/docs/design.md +++ b/docs/design.md @@ -39,6 +39,16 @@ Implications of the above: # Overview +## Client API +The client facing API hides all akonadi internals from the applications and emulates a unified store that provides data through a standardized interface. +This allows applications to transparently use various data sources with various data source formats. + +## Resource +A resource is a plugin that provides access to an additional source. 
It consists of a store, a synchronizer process that executes synchronization & change replay to the source and maintains the store, as well as a facade plugin for the client API.
+
+## Store
+Each resource maintains a store that can either store the full dataset for offline access or only metadata for quick lookups. Resources can define how data is stored.
+
## Types
### Domain Type
The domain types exposed in the public interface.
@@ -46,10 +56,12 @@ The domain types exposed in the public interface.
### Buffer Type
The individual buffer types as specified by the resource. These are internal types that don't necessarily have a 1:1 mapping to the domain types, although that is the default case that the default implementations expect.

+## Mechanisms
### Change Replay
-The change replay is based on the revisions in the store. Clients (and also the write-back mechanism), are informed that a new revision is available. Each client can then go through all new revisions (starting from the last seen revision), and thus update it's state to the latest revision.
+The change replay is based on the revisions in the store. Clients (as well as the write-back mechanism that replays changes to the source) are informed that a new revision is available. Each client can then go through all new revisions (starting from the last seen revision), and thus update its state to the latest revision.

-# Client API
+### Preprocessor pipeline
+Each resource has an internal pipeline of preprocessors that can be used for tasks such as indexing or filtering. The pipeline guarantees that the preprocessor steps are executed before the entity is persisted.

# Tradeoffs/Design Decisions
* Key-Value store instead of relational
diff --git a/docs/logging.md b/docs/logging.md
index a11a943..a495a7a 100644
--- a/docs/logging.md
+++ b/docs/logging.md
@@ -19,8 +19,10 @@ For debugging purposes a logging framework is required.
Simple qDebugs() proved
## Collected information
In addition to the regular message we want:
+
* pid
* threadid?
* timestamp
* sourcefile + position + function name
-* application name
+* application name / resource identifier
+* component identifier (e.g. resource access)
diff --git a/docs/resource.md b/docs/resource.md
index d1b2bbe..aa263e8 100644
--- a/docs/resource.md
+++ b/docs/resource.md
@@ -5,12 +5,9 @@ The resource consists of:
* a configuration setting of the filters

# Synchronizer
-* The synchronization can either:
- * Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store.
- * If the source supports incremental changes the changeset can directly be generated from that information.
+The synchronizer process is responsible for processing all commands, executing synchronizations with the source, and replaying changes to the source.

-The changeset is then simply inserted in the regular modification queue and processed like all other modifications.
-The synchronizer already knows that it doesn't have to replay this changeset to the source, since replay no longer goes via the store.
+Processing of commands happens in the pipeline, which executes all preprocessors before the entity is persisted.

# Preprocessors
Preprocessors are small processors that are guaranteed to be processed before a new/modified/deleted entity reaches storage. They can therefore be used for various tasks that need to be executed on every entity.
@@ -106,7 +103,7 @@ The indexes status information can be recorded using the latest revision the ind
Most preprocessors will likely be used by several resources, and are either completely generic, or domain specific (such as only for mail). It is therefore desirable to have default implementations for common preprocessors that are ready to be plugged in.
-The domain types provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
+The domain type adaptors provide a generic interface to access most properties of the entities, on top of which generic preprocessors can be implemented.
This makes it trivial to, for example, implement a preprocessor that populates a hierarchy index of collections.

## Preprocessors generating additional entities
@@ -116,3 +113,11 @@ In such a case the preprocessor must invoke the complete pipeline for the new en

# Pipeline
A pipeline is an assembly of a set of preprocessors with a defined order. A modification is always persisted at the end of the pipeline once all preprocessors have been processed.
+
+# Synchronization / Change Replay
+* The synchronization can either:
+ * Generate a full diff directly on top of the db. The diffing process can work against a single revision/snapshot (using transactions). It then generates a necessary changeset for the store.
+ * If the source supports incremental changes the changeset can directly be generated from that information.
+
+The changeset is then simply inserted in the regular modification queue and processed like all other modifications. The synchronizer has to ensure that only changes that did not originate from the source are replayed back to it.
+
diff --git a/docs/storage.md b/docs/storage.md
index b6d73fe..f1de2db 100644
--- a/docs/storage.md
+++ b/docs/storage.md
@@ -27,7 +27,9 @@ Entity {
The store consists of entities that each have an id and a set of properties. Each entity can have multiple revisions.
An entity is uniquely identified by:
+
* Resource + Id
+
The additional revision identifies a specific instance/version of the entity.

Uri Scheme:
@@ -37,8 +39,10 @@ Uri Scheme:
Each entity can be as normalized/denormalized as useful. It is not necessary to have a solution that fits everything.
Denormalized:
+
* priority is that mime message stays intact (signatures/encryption)
* could we still provide a streaming API for attachments?
+
```
Mail {
  id
@@ -47,6 +51,7 @@ Mail {
```

Normalized:
+
* priority is that we can access individual members efficiently.
* we don't care about exact reproducibility of e.g. ical file
```
@@ -72,6 +77,7 @@ The advantage of this is that a resource only needs to specify a minimal set of

### Value Format
Each entity-value in the key-value store consists of the following individual buffers:
+
* Metadata: metadata that is required for every entity (revision, ....)
* Resource: the buffer defined by the resource (synchronized properties, values that help for synchronization such as remoteId's)
* Local-only: default storage buffer that is domain-type specific.
@@ -81,7 +87,7 @@ Each entity-value in the key-value store consists of the following individual bu
Storage is split up in multiple named databases that reside in the same database environment.

```
-    $DATADIR/akonadi2/storage/$RESOURCE_IDENTIFIER/$BUFFERTYPE.main
+    $DATADIR/storage/$RESOURCE_IDENTIFIER/$BUFFERTYPE.main
                                                    $BUFFERTYPE.index.$INDEXTYPE
```
@@ -103,29 +109,35 @@ Files are used to handle opaque large properties that should not end up in memor
For reading:
Resources...
-* store the file in ~/akonadi2/storage/$RESOURCE_IDENTIFIER_files/
+
+* store the file in $DATADIR/storage/$RESOURCE_IDENTIFIER_files/
* store the filename in the blob property.
* delete the file when the corresponding entity is deleted.

Queries...
+
* Copy the requested property to /tmp/akonadi2/client_files/ and provide the path in the property
* The file is guaranteed to exist for the lifetime of the query result.

Clients..
+
* Load the file from disk and use it as they wish (moving is fine too)

For writing:
Clients..
+
* Request a path from akonadi2 and store the file there.
* Store the path of the written file in the property.

Resources..
-* move the file to ~/akonadi2/storage/$RESOURCE_IDENTIFIER_files/ + +* move the file to $DATADIR/storage/$RESOURCE_IDENTIFIER_files/ * store the new path in the entity #### Design Considerations Using regular files as the interface has the advantages: + * Existing mechanisms can be used to stream data directly to disk. * The necessary file operations can be efficiently handled depending on OS and filesystem. * We avoid reinventing the wheel. @@ -147,6 +159,7 @@ SQL not very useful (it would just be a very slow key-value store). While docume * Memory consumption is suitable for desktop-system (no in-memory stores). Other useful properties: + * Is suitable to implement some indexes (the fewer tools we need the better) * Support for transactions * Small overhead in on-disk size diff --git a/docs/terminology.md b/docs/terminology.md index 4b49f2d..9da8851 100644 --- a/docs/terminology.md +++ b/docs/terminology.md @@ -14,7 +14,7 @@ It is recommended to familiarize yourself with the terms before going further in * store facade: An object provided by resources which provides transformations between domain objects and the store. * synchronizer: The operating system process responsible for overseeing the process of modifying and synchronizing a store. To accomplish this, a synchronizer loads the correct resource plugin, manages pipelines and handles client communication. One synchronizer is created for each source that is accessed by clients; these processes are shared by all clients. * Preprocessor: A component that takes an entity and performs some modification of it (e.g. changes the folder an email is in) or processes it in some way (e.g. 
indexes it) -* pipeline: A run-time definable set of filters which are applied to an entity after a resource has performed a specific kind of function on it (add, update, remove) +* pipeline: A run-time definable set of filters which are applied to an entity after a resource has performed a specific kind of function on it (create, modify, delete) * query: A declarative method for requesting entities from one or more sources that match a given set of constraints * command: Clients request modifications, additions and deletions to the store by sending commands to a synchronizer for processing * command queue: A queue of commands kept by the synchronizer to ensure durability and, when necessary, replayability -- cgit v1.2.3
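
---

As a postscript to the patch above: the resource-setup walkthrough added in docs/akonadish.md can be condensed into a single session sketch. This uses only commands documented in the patch; the instance identifier `org.kde.maildir.instance1` is whatever `akonadish list resource` actually reports on your system, and the `count mail` line assumes `mail` is a valid TYPE argument per the `akonadish COMMAND TYPE ...` syntax.

```sh
# Create a maildir resource instance pointing at a local maildir
# (writes config files only; no data is touched yet).
akonadish create resource org.kde.maildir path /home/developer/maildir1

# Find the identifier that was automatically assigned to the instance.
akonadish list resource

# Index the maildir contents so they become queriable.
akonadish synchronize org.kde.maildir.instance1

# Verify that data arrived.
akonadish list folder org.kde.maildir.instance1
akonadish count mail org.kde.maildir.instance1

# Tear the instance down again, removing all traces.
akonadish remove resource org.kde.maildir.instance1
```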