This initial post contains a summary and a proposal for discussion. If you have any suggestions for the summary, please share them in the comments so we can agree on it before the discussion starts.
Data Cache
Data Cache is a construct responsible for managing and propagating shared data in global and flow-local contexts. The main aims of a Data Cache are:
to avoid concurrency hazards,
to provide a multi-delegate API,
to fit well with commonly used network APIs (REST and GraphQL),
to coalesce related updates into batches.
The point regarding concurrency hazards mainly concerns runtime exceptions caused by simultaneous writes.
A Data Cache is mostly used:
in the View Model of a SwiftUI Scene, for updates of already loaded content[1],
in the View Model of a SwiftUI Scene, to avoid re-loading previously loaded content,
as a destination for updates fetched from the API,
as a source of truth for global state (sign-in status, etc.).
Future directions:
the integration of persistence (be it a database, user defaults, files, ...) is not yet fully explored.
It is not the aim of this discussion to provide a solution for the future directions, but we should take into consideration how our solution limits our ability to integrate other means of data management.
For historical context, the concept of Data Cache was introduced to allow sharing data of GraphQL queries between scenes. Our architecture does not discourage storing data in services. Data Cache is not envisioned to be a single source of truth for the entire app. For example, Location Services may share location updates without going through Data Cache.
Swift Concurrency discussion
Authorities
Before we discuss the implications of the statements above, let us quickly refresh our knowledge of actors and actor-isolated functions.
When control returns to an asynchronous function, it picks up exactly where it was. That doesn’t necessarily mean that it’ll be running on the exact same thread it was before, because the language doesn’t guarantee that after a suspension. [...] (async functions) associated with specific actors [...], and they’re always supposed to run as part of that actor. Swift does guarantee that such functions will in fact return to their actor to finish executing.[2]
Actor-isolated functions are reentrant. When an actor-isolated function suspends, reentrancy allows other work to execute on the actor before the original actor-isolated function resumes, which we refer to as interleaving.[3]
[...] both functions and data can be attributed with a global actor type to isolate them to that global actor.[4]
no context switches - instead of a full thread context switch, swapping continuations comes at the cost of a function call [6]
Earlier, we talked through some of the costs associated with concurrency such as additional memory allocations and logic in the Swift runtime. As such, you need to be careful to only write new code with Swift concurrency when the benefit of introducing concurrency into your code outweighs the cost of managing it. The code snippet here might not actually benefit from the additional concurrency of spawning a child task simply to read a value from the user's defaults. This is because the useful work done by the child task is diminished by the cost of creating and managing the task.
async let x = userDefaults.bool(forKey: "X")
if await x { ... }
[5; 22:20]
However, if execution hops on and off the main actor frequently, the overhead of switching threads can start to add up. If your application spends a large fraction of time in context switching, you should restructure your code so that work for the main actor is batched up. You can batch work by pushing the loop into the loadArticles and updateUI method calls, making sure they process arrays instead of one value at a time. Batching up work reduces the number of context switches. [5; 37:51]
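The batching advice above can be sketched as follows (all type and function names are hypothetical, for illustration only):

```swift
struct Article { let title: String }

@MainActor
func updateUI(with articles: [Article]) {
    // render the whole batch in one go
}

func refresh(articles: [Article]) async {
    // Anti-pattern: one main-actor hop per element.
    // for article in articles { await updateUI(with: [article]) }

    // Batched: a single hop processes the whole array,
    // reducing the number of context switches.
    await updateUI(with: articles)
}
```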
Observations
The citations above serve to clarify some misunderstandings we discussed in person (regarding the nature of reentrancy and context switches).
Based on these citations, it is fair to say that the overhead incurred by Swift Concurrency suspension points is nowhere near the cost of a thread context switch. However, it should not be dismissed either.
My instinct is that the Data Cache stores data meant for the UI and therefore should be bound to the @MainActor. This will work fine if we bind "almost everything" to the @MainActor. My observation is that we have been doing so in the past anyway, due to our usage of PromiseKit[7], and more recently we have been receiving updates from Combine on the main executor too.[8] Likewise, in Core Data we have rarely used anything other than the viewContext[8].
However, these days, as pointed out by Matěj, we have to deal with high-refresh-rate devices. The added workload opens the door to re-evaluating my assumptions.
For the sake of clarity (not only for this discussion, but also to improve the quality of our documentation and our collective understanding), we should start by clarifying the individual domains of isolation in our apps and how we imagine inter-domain communication[9].
Should we decide to isolate the Data Cache to a different actor, we might run into issues when working with multiple Data Caches. Having multiple Data Caches is encouraged to avoid excessive growth of the global Data Cache; data only used in a certain flow might be offloaded to a flow-local Data Cache. If the flow-local Data Cache is isolated to a different actor, we would need a more complicated abstraction to "tie" both caches together.
SwiftUI and data change observation discussion
In the current architecture, we use an ObservableObject as a View Model. This means that SwiftUI is subscribed to our object by listening to its objectWillChange publisher. The ObservableObject in turn listens to the publishers of its @Published variables, which are triggered by the willSet observer.
A publisher can be "converted" to an AsyncSequence via the values property. However, unlike the publisher, an AsyncStream does not support multicast (i.e. only one subscriber can be subscribed to one stream).
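As a sketch of the difference (names hypothetical): the values property wraps a Combine publisher in an AsyncSequence, where each for await loop opens its own subscription to the publisher, whereas a hand-rolled AsyncStream would deliver each element to at most one consumer:

```swift
import Combine

let subject = CurrentValueSubject<Int, Never>(0)

// Combine: any number of subscribers can observe the same publisher.
let cancellable = subject.sink { print("combine:", $0) }

// Swift Concurrency: `values` exposes the publisher as an AsyncSequence;
// this loop holds its own subscription.
let observer = Task {
    for await value in subject.values {
        print("async:", value)
    }
}

subject.send(1)
```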
Macros discussion
Macros are a powerful Swift feature. Some members of the community discourage developers from creating their own macros.[11] Using macros in a project incurs a build-time penalty, because custom macros (unlike those provided by Apple) require building swift-syntax and other packages. The powerful nature of macros may also have a detrimental impact on the readability of a codebase. For that reason, we should scrutinise all macros we add to our projects.
However, we already use the EnumIdentable macro, and there is a strong inclination to add macros for static URLs.
Actor/Published-based Data Cache
The current solution is an actor- and @Published-based implementation. The actor is used to serialise write access; the @Published property provides the multi-delegate solution (based on Combine). The full source code is available in DataCache.swift.
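A minimal sketch of the described shape (the generic parameter and member names are hypothetical; the real implementation may differ). A Combine subject, which is conceptually what @Published wraps, provides the multi-delegate fan-out, while actor isolation serialises the writes:

```swift
import Combine

actor DataCache<T> {
    // Combine provides the multi-delegate semantics;
    // actor isolation serialises all writes to the subject.
    private let subject: CurrentValueSubject<T, Never>

    init(initialValue: T) {
        subject = CurrentValueSubject(initialValue)
    }

    // Reading either accessor from outside the actor
    // requires a suspension point.
    var value: T { subject.value }
    var publisher: AnyPublisher<T, Never> { subject.eraseToAnyPublisher() }

    func update(with newValue: T) {
        subject.send(newValue)
    }
}
```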
At the time of writing, the Data Cache is mainly used in the following manner:
Upon loading the view, the task modifier is used to invoke the onAppear method of a View Model.
The onAppear method gets the reference to the @Published publisher from the Data Cache actor via a suspension.
After onAppear is resumed, the @Published publisher is projected (the map operator) into an Equatable subset of the contained properties.
The removeDuplicates operator is used on the projection.
We might ignore the first value, depending on the implementation.
The View Model then uses the sink operator to maintain the subscription. The Cancellable is owned by the View Model.
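The steps above can be sketched as follows (all names hypothetical, assuming a Data Cache actor that exposes its publisher behind a suspension point as described):

```swift
import Combine
import SwiftUI

struct Article: Equatable { let id: Int; let title: String }

@MainActor
final class ArticleListViewModel: ObservableObject {
    @Published private(set) var titles: [String] = []
    private var cancellable: AnyCancellable?

    // Invoked from the view's `task` modifier.
    func onAppear(cache: DataCache<[Article]>) async {
        // Hop to the cache actor to obtain the publisher (suspension point).
        let publisher = await cache.publisher
        cancellable = publisher
            // Project an Equatable subset of the cached properties...
            .map { articles in articles.map(\.title) }
            // ...and drop updates that do not change the projection.
            .removeDuplicates()
            .receive(on: DispatchQueue.main)
            .sink { [weak self] titles in self?.titles = titles }
    }
}
```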
When modifying the Data Cache in bulk, the following pattern is used:
Obtain the current value of the @Published publisher from the Data Cache (1st suspension point).
Modify the value representing the whole data cache.
Update the whole data cache via the update(with:) method (2nd suspension point).
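Sketched with hypothetical names, the bulk update pattern reads as:

```swift
struct AppState { var unreadCount: Int }

func markAllRead(cache: DataCache<AppState>) async {
    // 1st suspension point: obtain the current value from the actor.
    var state = await cache.value
    // Modify the local copy representing the whole cache.
    state.unreadCount = 0
    // 2nd suspension point: write the whole value back.
    await cache.update(with: state)
}
```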
At the time of writing, the best example in practice is in a private project (see Slack message). The relevant links contain an example scene and architectural amendments provided by the project, one of which involves a View Model-based error/loading/empty state value. Indeed, one of the main reasons for removing the state value from the global state was that, in most cases, scenes may handle a failure to fetch new data differently. The pattern of Cache Projection may be the crucial architectural link which helps us cross the boundaries of isolation in an expressive manner.
Issues with Actor/Published-based Data Cache
Suspension points on read.
Since the Data Cache itself is an actor, any attempt to access the underlying @Published publisher requires a suspension point. (Note that it is impossible to create a nonisolated accessor for mutable actor state.) Since the essential protocols of SwiftUI are bound to the @MainActor, it is impossible to perform non-suspending reads on the Data Cache from the SwiftUI context.
Suspension points in non-SwiftUI contexts
In the current implementation, we always need to introduce a suspension point if we want to read data from a Data Cache. This implies that work with the Data Cache is either offloaded to a new Task, or the function must be marked as async. This is even more problematic in a Coordinator (a coordinator basically being a View Model for a Container View), where we don't usually intend to write imperative code.
The process of write
At this moment, the API explicitly does not guarantee that a data cache update (see the example above) does not overwrite changes happening between the 1st and 2nd suspension points.
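A sketch of the hazard (names hypothetical): two concurrent read-modify-write sequences may both read the same value between their suspension points, so one write is silently lost:

```swift
func increment(cache: DataCache<Int>) async {
    let current = await cache.value       // 1st suspension point
    // If another task calls `cache.update(with:)` right here,
    // its write is lost when our stale copy is stored below.
    await cache.update(with: current + 1) // 2nd suspension point
}
```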
Downstream overhead
Projects currently using the Actor/Published data cache use the map and removeDuplicates operators to filter out unneeded updates. However, this requires storing copies of parts of the Data Cache and performing deep equality checks. In the best-case scenario (assuming such an optimisation is even implemented in the Standard Library), we might expect that collections are considered equal if they point to the same buffer. However, if collections are stored in two different buffers, the language needs to evaluate equality item by item. This operation might be even more complicated for "hash maps" (Set, Dictionary) and other collections, where "equal" items may be stored at different offsets.
Mitigation of Actor/Published issues
Suspension points on read.
TODO
Suspension points in non-SwiftUI contexts
TODO
The process of write
This might be solved by adding a "transaction function" like:
extension DataCache {
    func transaction(_ block: (inout T) -> Void) {
        var mutableCopy = value
        block(&mutableCopy)
        self.update(with: mutableCopy)
    }
}
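A hedged usage sketch (property name hypothetical): because the closure runs synchronously inside the actor, no other write can interleave between the read and the write:

```swift
await cache.transaction { state in
    state.unreadCount = 0 // read-modify-write with no interleaving window
}
```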
Downstream overhead
TODO
Macro-based Data Cache
TODO
Notes and authorities
[1] We assume that a newly loaded SwiftUI Scene may require an initial update from the API, but we may want to reflect the newly arrived data globally.
[2] SE-0296 https://github.com/swiftlang/swift-evolution/blob/main/proposals/0296-async-await.md#proposed-solution-asyncawait
[3] SE-0306 https://github.com/swiftlang/swift-evolution/blob/main/proposals/0306-actors.md#actor-reentrancy
[4] SE-0316 https://github.com/swiftlang/swift-evolution/blob/main/proposals/0316-global-actors.md#global-actors-and-instance-actors
[5] WWDC21: Swift Concurrency Behind the Scenes https://wwdcnotes.com/documentation/wwdcnotes/wwdc21-10254-swift-concurrency-behind-the-scenes/
[6] https://wwdcnotes.com/documentation/wwdcnotes/wwdc21-10254-swift-concurrency-behind-the-scenes/
[7] PromiseKit by default dispatches "everything" to main queue. https://github.com/mxcl/PromiseKit/blob/master/Sources/Configuration.swift
[8] Trust me, bro.
[9] Essentially, what kind of data and when we want to pass from one actor to another.
[10] https://forums.swift.org/t/swift-async-algorithms-proposal-broadcast-previously-shared/61210/49
[11] https://www.youtube.com/watch?v=MroBR2ProT0