Property to indicate the version of applications that have been used #41

Open
NoelDeMartin opened this issue Aug 3, 2019 · 14 comments

Comments

@NoelDeMartin
Contributor

NoelDeMartin commented Aug 3, 2019

I'm going to update a Solid application and I want it to upgrade the data on PODs that interacted with older versions of my app. To know whether this is necessary and, if so, what needs to be modified, I need to know the last version that interacted with the POD. The analogous concept in databases is migrations. Because the POD may still be accessed by old versions of the app, or even by other apps that understand the previous schema, all changes will be backwards compatible.

In order to achieve this, and maybe other features, it is necessary to store which version of the app interacted with the POD. I guess in a more complex scenario this could be granular per-container or something, but for now that's a use-case I'm not contemplating.

What I will be doing for now is use the http://vocab.org/open/ ontology and write the following information to /settings/prefs.ttl (read from pim:preferencesFile):

<https://userdomain.com/profile/card#me> <http://open.vocab.org/terms/uses> <https://github.com/user/app/releases/tag/v0.1.1> .

With this, I'll be able to know the latest version of the app and upgrade the data accordingly. But I think we could have something more specific to Solid, because I'm not sure using the settings file is the best idea.
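A minimal sketch of how that check could work, assuming the preferences file has already been parsed into (subject, predicate, object) tuples and that versions follow a three-part `vMAJOR.MINOR.PATCH` tag as in the triple above. The function names and the migration list are hypothetical; none of this is an existing Solid API.

```python
import re

# Predicate and release IRI layout taken from the triple above.
USES = "http://open.vocab.org/terms/uses"
RELEASE_PATTERN = re.compile(
    r"https://github\.com/user/app/releases/tag/v(\d+)\.(\d+)\.(\d+)"
)


def parse_version(release_iri):
    """Return (major, minor, patch) for a release IRI of this app, else None."""
    match = RELEASE_PATTERN.fullmatch(release_iri)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def latest_recorded_version(triples, web_id):
    """Highest version of this app recorded for web_id, or None if there is none."""
    versions = []
    for subject, predicate, obj in triples:
        if subject == web_id and predicate == USES:
            version = parse_version(obj)
            if version is not None:  # ignore `uses` triples written by other apps
                versions.append(version)
    return max(versions, default=None)


def pending_migrations(recorded, migrations):
    """Names of migrations introduced after the recorded version, oldest first.

    `migrations` is a hypothetical list of ((major, minor, patch), name) pairs.
    Assumption: a POD with no recorded version holds no data from this app.
    """
    if recorded is None:
        return []
    return [name for version, name in sorted(migrations) if version > recorded]
```

Comparing versions as integer tuples rather than strings keeps `v0.10.0` ordered after `v0.9.0`.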

@SimonShapiro

This is an interesting take on the role of apps. I would see it as a question of various data shapes and the mapping between them. The premise is that the data is the user's and the apps work on that. You are suggesting that there is a kind of 'master app' that will change users' data from one shape to another.

@namedgraph

I think you should think in terms of the (RDF) dataset instead. An app is just an API that allows you to project it. So if you want to version, it would make more sense to do it as graph provenance. E.g. something like this: https://api.dydra.com/provenance/index.html

@dmitrizagidulin
Member

This is an interesting take on the role of apps

That's fairly common in desktop apps, for example (where the user owns the data).

@SimonShapiro

:-) I meant in the way I understood Solid redefining the relationship between data and apps. The promise is there is no one app that has total control over data. "If you don't like the app anymore, there is no lock-in, it's your data, just change to an app you prefer."

@NoelDeMartin
Contributor Author

@SimonShapiro Yes of course, I'm not saying that my app will rule all the data on the POD. I see the uses relationship being used by multiple apps at the same time.

The thing is that the schema for data created with an older version of my app may have changed, so I'd like to migrate this data so that the new version of my app can continue working with the user's existing data. As I mentioned, this would be purely additive, so nothing prevents the user from continuing to use the older version or other apps that work with the previous schema.

@NoelDeMartin
Contributor Author

If you want a very specific example, I am building a task manager and I was using foaf:name for Task names. After getting some feedback I realized that's not the appropriate property, and I've changed it to rdfs:label for the new version. Without any migration process, users who open the new version of my app won't be able to see the names of tasks they created previously. So my idea in this case is to add rdfs:label to all existing tasks, without removing the foaf:name so that nothing is broken in previous versions.
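A rough sketch of that additive migration, assuming the task documents have already been parsed into a set of (subject, predicate, object) tuples. The function name is hypothetical, and a real migration would probably restrict itself to resources the app itself created.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def add_labels(triples):
    """Return the triples plus an rdfs:label copy of every foaf:name.

    Nothing is removed, so older app versions that read foaf:name keep working.
    """
    migrated = set(triples)
    for subject, predicate, obj in triples:
        if predicate == FOAF_NAME:
            migrated.add((subject, RDFS_LABEL, obj))
    return migrated
```

Because the result is a set and triples are only ever added, running the migration a second time leaves the data unchanged.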

@csarven
Member

csarven commented Aug 4, 2019

foaf:name is a sub property of rdfs:label. Apps should have some minimal built-in smarts to handle that, or pragmatically speaking, just treat them as interchangeable where meaningful.

There are many points of entry for apps, so unless the app information is appended to the resource description, there is no particularly reliable way to know which app interacted with what. Besides, it is preferable to decouple the application from the data, and adding such information only couples them more closely (even if only slightly). Apps should care about what the data expresses, not which app was used to create it.

From a different angle, what I would suggest is to check whether a property along the lines of oa:renderedVia ( https://www.w3.org/TR/annotation-model/#rendering-software , https://www.w3.org/TR/annotation-vocab/#renderedvia ) gets you what you want. This way, the information that you really want is part of the resource and there is no out-of-band information to deal with.

Is this satisfactory?

@NoelDeMartin
Contributor Author

@namedgraph Provenance looks promising, in particular the prov:wasGeneratedBy relationship. I think doing something with this on the containers where my app placed data would be enough.

@csarven The built-in smarts you mention to know that foaf:name is a sub property of rdfs:label would be nice, but I'm not sure how to go about it in an automated way (meaning the developer doesn't have to be aware of such a relation; for example, I just learned that foaf:name is a sub property of rdfs:label).

I see what you mean about data and app being separated, and I agree. But I think it's necessary to have some mechanism to keep track of the data created in previous versions (like the ones you mentioned with prov and annotations). Developers may make mistakes and create data with invalid properties, so it's important that new versions of an app can "fix" data from previous versions. Or else every time users interact with a newer version of an app, they risk losing data (through the app UI of course, I know they'll still have it on their POD).

I guess using one of those provenance/annotations vocabularies can get the job done, so you can close this issue if you don't think that a new property should be added to the solid vocab. I'll come back to explain my conclusions once I've implemented something.

@csarven
Member

csarven commented Aug 4, 2019

wasGeneratedBy is fine, but note that its value is an activity, not an entity (application). You can indeed attach the application to the activity.
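One way this could look, as a sketch only: the resource points at an activity through prov:wasGeneratedBy, and the activity points at the application through prov:wasAssociatedWith. The PROV terms are from the W3C PROV-O vocabulary; the helper names and any IRIs passed in are illustrative assumptions.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def provenance_triples(resource, activity, app):
    """Triples linking a resource to the app version via an intermediate activity."""
    return [
        # The object of wasGeneratedBy is the activity, never the app itself.
        (resource, PROV + "wasGeneratedBy", activity),
        (activity, RDF_TYPE, PROV + "Activity"),
        # The application is attached to the activity as a software agent.
        (activity, PROV + "wasAssociatedWith", app),
        (app, RDF_TYPE, PROV + "SoftwareAgent"),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

Passing the release IRI from the first comment as `app` would keep the version information while still following the PROV model.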

Well, when you encounter any HTTP resource, it is possible to get more information about it. This holds true for vocab terms too, e.g. foaf:name is defined somewhere, and looking up that definition reveals:

<http://xmlns.com/foaf/0.1/name>
  <http://www.w3.org/2000/01/rdf-schema#subPropertyOf>
    <http://www.w3.org/2000/01/rdf-schema#label> .

If a developer wants to use a term, it is best to look up its actual definition, and not the literal characters 'n', 'a', 'm', 'e'.
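As an illustration of the "minimal built-in smarts" idea, here is a sketch of how a client could use such sub-property statements when reading data. It assumes the vocabulary definitions have already been fetched and parsed into (subject, predicate, object) tuples; fetching and parsing are out of scope here, and the function names are hypothetical.

```python
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def sub_properties(prop, schema):
    """Return prop plus every property declared, transitively, as its sub-property."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in schema:
            if predicate == SUB_PROPERTY_OF and obj == current and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found


def values_for(subject, prop, data, schema):
    """Objects stated for subject via prop or via any of its sub-properties."""
    accepted = sub_properties(prop, schema)
    return {o for s, p, o in data if s == subject and p in accepted}
```

With the statement above in `schema`, asking for rdfs:label also returns values stored under foaf:name. The reverse does not hold: asking for foaf:name does not return plain rdfs:label values, since the sub-property relation only goes one way.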

Developers may make mistakes and create data with invalid properties

Sure, that's always the case. I don't see how announcing the app version would circumvent or can reliably work with it.

The server should announce a shape for resources, i.e. what's okay to accept, so that clients can use that information to create or update a payload.

so it's important that new versions of an app can "fix" data from previous versions

What would fixing entail any way? Fix by upgrading.. and immediately break it for older applications? Applications can choose to be backwards compatible with older/earlier shapes. If they encounter an old version, they can work with it (read/write as is) or they can lift the old shape to new - minding the possibility of breaking things. Besides, there needs to be information about shapes to know what's the current or preferred shape. Otherwise, how would we know which app is actually most current or even doing it properly?

I agree that the PROV or WA may suffice. See how much mileage you get. Perhaps we can revisit once the use cases and requirements are more clear?

@NoelDeMartin
Contributor Author

Well, when you encounter any HTTP resource, it is possible to get more information about it.

I think what we're talking about here is related to something I read in this article by Ruben:

In theory, data modeled with one vocabulary can be accessed seamlessly using another through mechanisms such as reasoning. In practice, reasoning is seldom available on the client or server, so data access patterns would need to match storage patterns exactly.

So, in theory it's possible to know that foaf:name is a subproperty of rdfs:label, but in practice it's difficult for an application to know that (not only this in particular, but all the hierarchy of properties and classes).

I don't see how announcing the app version would circumvent or can reliably work with it.

If the data shape is invalid, knowing the version that created it allows the application to interpret the data in that context. For example, foaf:name is supposed to be on a foaf:Person, but in my case it was on a prov:Activity, so it doesn't make sense to search for prov:Activity documents with a foaf:name property outside of the context of data created with my app.

What would fixing entail any way? Fix by upgrading.. and immediately break it for older applications?

No, only adding new properties.

Perhaps we can revisit once the use cases and requirements are more clear?

Yes for sure, I still have a lot to learn so I'll come back once things are more clear.

@dmitrizagidulin
Member

@NoelDeMartin, I'd like to reopen this issue if you don't mind.

It's important that we settle on a property (or a set of properties) to indicate, for any given resource, which application was used to create it, for several reasons:

One of them has to do with app identity and authorization. Just as a Solid server should record which person created a particular resource (like most file systems do now), it would be helpful (for informational or administrative reasons) to know which app created it. We care about the identities of apps (currently, to match them against a trusted app whitelist, but also in the near future to narrow their scope of access), and recording that identity is a part of that process.

The second reason is - Schema Migration (and we still have schemas and shapes, in Solid, even if we don't call them out explicitly). Here's the thing about schema migration:

  1. Breaking schema migration should be avoided at all costs (developers should be encouraged and educated on how to do non-breaking schema migration, how to degrade gracefully, and so on).
  2. Schema migration that is one-way (that involves breaking stuff) is unavoidable. Hopefully rare, yes, but there's no way to legislate it away.

There's a rich body of both research and practical experience that helps with those two points, advising both how to avoid breaking changes, and how to handle the one-way migration process when it inevitably comes up.

The reason that schema migration relates to 'what version of the application was used' is that in some cases, the schema of a data resource was not explicitly identified by the developer (either intentionally or not). And in those cases, the app version is basically the only way to figure out what schema was being used (so that decisions about either migration or, more likely, data transformation can be made).

These two reasons for deciding on the app version property, app identity recording and schema migration, of course bring up a number of other questions (which I think should be discussed in the appropriate panels):

  • Where should we store the app version / app identity metadata? The current consensus seems to be: in the data itself for RDF Resources, and in the .meta files for non-RDF resources. But we should probably make firm recommendations here.
  • What if multiple apps were responsible for creating (or modifying) a resource? (This also goes for recording the human authors of a resource). Fortunately, we're not limited to just an owner username in a unix file system, we have room for nuances.
  • How do we make specifying the schema/shape of a resource easier (so that developers don't have to rely on implicit signals like app version)?

@SimonShapiro

Let me see if I've got the correct understanding of this. Let's take 'bookmarks' as an example. I can have one document with all my 'bookmarks', so I could record all this information against that whole document. Alternatively, I can have a document per bookmark. So against each bookmark, which is essentially: a url; a creator; a title; and a creation date, I would be storing the app that created it and its version; and possibly the datashape to which it conforms. Or do we simply point to various context graphs that have all that information? Like:

:thisBookmark a :bookmark;
    :title "A really cool link"@en;
    :creator :me;
    :refersTo :coolUrl;
    :createdOn "today"^^xsd:datetime;
    :conformsTo :shapeShacl;
    :creatingApp :coolApp .

@TallTed
Contributor

TallTed commented Aug 5, 2019

Of immediate note --

:createdOn "today"^^xsd:datetime;

-- would of course have to be more like --

:createdOn "2019-08-05"^^xsd:date ;

-- as "today" is not a valid literal value for an xsd:datetime, and even if it were, once midnight passes, it would have a different meaning.
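A small sketch of producing such literals programmatically, so the value is fixed at creation time instead of being a relative word like "today". Note that the XML Schema datatype for timestamps is spelled xsd:dateTime, with a capital T. The helper names are hypothetical.

```python
from datetime import date, datetime, timezone

XSD = "http://www.w3.org/2001/XMLSchema#"


def date_literal(day):
    """N-Triples literal typed as xsd:date, e.g. for a creation date."""
    return f'"{day.isoformat()}"^^<{XSD}date>'


def datetime_literal(moment):
    """N-Triples literal typed as xsd:dateTime; requires a timezone-aware value."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the instant is unambiguous")
    return f'"{moment.isoformat()}"^^<{XSD}dateTime>'


# Example: the corrected date from the comment above, and "now" as a timestamp.
created_on = date_literal(date(2019, 8, 5))
created_at = datetime_literal(datetime.now(timezone.utc))
```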

@SimonShapiro

SimonShapiro commented Aug 5, 2019 via email
