stronger explicit coupling of code and data #86

tiborsimko · 2016-02-29T14:19:43Z

Nice proposal! Many things in the pitch are exactly what we try to achieve within the context of the CERN Open Data service and the CERN Analysis Preservation pilot.

One suggestion: the proposal seems to address running code at greater length than it addresses the relation of code to data. It may be useful to promote a closer coupling of code and data, e.g. via tools such as git-annex or git-lfs, which let researchers version both software and data in the same place even when the data itself is located on a remote storage service due to its size.

For services like Zenodo, this would make it easy to archive not only the software but also (reasonably sized) datasets at the time of a release, for example.

khinsen · 2016-02-29T14:38:21Z

@tiborsimko That's indeed an important issue, but it is difficult to deal with in our proposal, for two reasons: (1) executability and linking with data are nearly orthogonal issues, and (2) depending on the size and nature of the data, very different technical solutions are required.

What we could do is to mention the issue in some kind of outlook - something we'd look at in phase II.

lukasheinrich · 2016-03-02T21:53:04Z

Yes, I'm also interested in this.

@tiborsimko do you know if at CERN the EOS people have looked into having EOS as a git-lfs backend? (for non-CERNies, EOS is CERN's multi-PB storage solution)
