
Making FAIR data real - The community experience #12

Open
FrTr opened this issue Jun 22, 2017 · 7 comments

Comments

@FrTr

FrTr commented Jun 22, 2017

You probably already know all of this, but maybe it is still somewhat helpful for answering your questions on how to make FAIR data real. Here is a brief overview of what I have learned from our survey of almost 800 scientists.

To what extent are the FAIR principles alone sufficient to reduce fragmentation and increase interoperability? The principles have a great potential to influence the minds of stakeholders towards more efficient data sharing and reuse, but perhaps additional measures and more specifics are needed to guide implementation?

  • It should be clearly said that there is also research data beyond "publication supporting data", which is ignored often enough. Our survey admittedly showed a very strong disciplinary difference, but especially in the life and natural sciences there was a strong demand (from every second respondent) for publication of "negative" results. In other disciplines it was said that "there is no negative result". An example: dozens of researchers try the same synthesis pathway for a new molecule in a "standard way" or "second standard way", but fail, and nobody publishes it, because it is normal that "other ways" are needed (finding them is their science) and the negative result is not "publishable". A simple trustworthy entry in a database of failures would be enough for them, but currently the knowledge is just lost. It is not a big deal in the single case, just some hours of work, but it happens thousands of times.
    So the question must be answered: "What is the data that needs to be FAIR?" 45% of our researchers said they could benefit "much" or "very much" from some kind of "negative data", but I do not see it coming automatically from being FAIR alone. The disciplines know in principle what could be needed, but are not able to change their "credit system". I think there should be funding offers to disciplines to think about their data and work on such things as a whole.
  • "Reproducibility" is often misunderstood. It should be more "press the button, there it comes" (with the possibility to exchange the input data) and not "read the paper and the source code, there is all you need". Ideally, the man on the street should be able to "reproduce" science, because a scientist from another discipline is, in this respect, exactly like the man on the street. And by the way, the man on the street could benefit, too.

What are the necessary components of a FAIR data ecosystem in terms of technologies, standards, legal framework, skills etc?

  • Concerning skills, you will surely know http://edison-project.eu
  • Although it is not my preferred solution, maybe we need an EU-wide copyright collective that pays for the scientific software needed to reproduce science each time a piece of work is "reproduced" by people who do not have access to the necessary software. There could be a contract for widely used software, or funded projects could be allowed to use only software that takes part in such a contract. But of course, someone has to pay that little money (and misuse has to be prevented in a smart way). I suggest this with some qualms, because I would prefer more open solutions, but maybe we can't have our cake and eat it, too.

What existing components can be built on, and are there promising examples of joined-up architectures and interoperability around research data such as those based on Digital Objects?

  • In Germany we are beginning to have federated archiving services in some federal states (e.g. in Baden-Württemberg). This makes sense for Germany, because the federal states mainly pay for their universities. However, for each "membership" in, e.g., European, national or other networks, the universities will automatically try the most synergetic approach and therefore make these things compatible. This is not a "no-brainer" and is sometimes a project of its own, but it is happening at such melting points.
  • Secondly, I would build heavily on repositories, because they are close to their communities' needs.

Do we need a layered approach to tackle the complexity of building a global data infrastructure ecosystem, and if so, what are the layers?
Which global initiatives are working on relevant architectural frameworks to put FAIR into practice?

  • I am not sure whether we need layers, but the roles must be clear to all players. We should not get into a situation where scientists run in circles asking for payment or simply for the "delivery" of a service (like data deposition), because each "station" asked in the circle feels it is not responsible.
    So please make clear among all players what exactly universities, project funders and disciplinary or EU-level solutions should offer their scientists (e.g. as a condition for some participation/cooperation).

A large proportion of data-driven research has been shown to not be reproducible. Do we need to turn to automated processing guided by documented workflows, and if so how should this be organised?

  • In my opinion this must definitely come, but strictly driven by the scientists, not by information centers. Scientists should be allowed to take part in programs to develop their discipline's automation if they are a "relevant mass" in their discipline and have good support from central information infrastructures. There is a four-page paper by the RFII in Germany (http://www.rfii.de/?wpdmdl=2269 - you know it, I guess, although it is in German). It is impossible for me to overemphasise the importance of the scientists' role. The scientists carrying out such automation projects should talk to and include "all" scientists in their discipline to avoid isolated solutions. This should be absolutely mandatory.

What kind of roles and professions are required to put the FAIR principles into place?

  • There should be a main data or "FAIR" manager at each university (even if it is just a title at the start).
  • Maybe (as a rough idea) we need to push forward the profession of "replication science" as a science in its own right, with its own professorships.
    Let us try an analogy with industry: these professorships for "data replication science" are comparable to a specialised "quality control", but for the good "data". This is different from the current approach, where the production lines somehow check each other. The new people would focus on checking only the data. They could give valuable feedback and impetus to the data science field. Science is an industry with high-quality and sensitive products.
  • There is also a totally different kind of "data scientist" that we need, more comparable to "supply chain management" in industry, which we currently totally ignore. A company with complex products that ignored this would fail fast today (ask your factory next door). Well, I think scientific data is a complex product. So please consider a supply chain management profession perspective, too, and transfer it from factories to universities and scientific data.
@ghost

ghost commented Jul 3, 2017

Dear Frank,
I am not familiar with your community research at KIT (?). Can you please share your published work here? Starting in August 2017 I will execute case studies in a few subjects here at TU Delft and would like to know more about your work on the topic.

Many thanks in advance!
Jasmin

@FrTr
Author

FrTr commented Jul 3, 2017

Dear Jasmin,

In German we have a report and user stories online.
There are also some slides in English that summarize the results but of course cannot go into detail.
Section 3.3.2 of this work covers a few aspects of our report in English.
Unfortunately I did not have the time or funding to translate the report into English (it has 150 pages). The report was made for our German funding institution (Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg (MWK)).
If you have specific questions, you can also contact me directly. My or my successor's contact information is here on the right side.

@CaroleGoble

BioITWorld FAIR Hackathon http://www.bio-itworldexpo.com/fair-data-hackathon/ also focused on FAIR approaches to Pharmaceutical data.

Incidentally, the latest IMI call is about FAIRification:
https://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/imi2-2017-12-02.html

 

@CaroleGoble

Presentation Title: FAIRShake: Toolkit to Enable the FAIRness Assessment of Biomedical Digital Objects

Abstract: While it is clear that there will be a benefit in making biomedical digital objects more FAIR, the FAIR principles are abstract and high level. FAIRShake brings these principles into practice by encouraging digital object producers to make their products more FAIR. The FAIRShake toolkit is designed to enable the biomedical research community to assess the FAIRness of biomedical research digital objects. These include: repositories, databases, tools, journal and book publications, courses, scientific meetings and more. The FAIRShake toolkit uses the FAIR insignia to display the results of FAIR assessments. The insignia symbolizes the FAIRness of a digital object according to 16 FAIR metrics. Each square on the insignia represents the average answer to a FAIR metric question. The FAIRShake Chrome extension inserts the insignia into websites that list biomedical digital objects. Users can see the insignia and also contribute evaluations by clicking on the insignia. It is also possible to embed the insignia without the Chrome extension and to initiate FAIR evaluation projects using the FAIRShake website directly. Currently, the FAIRShake website lists four projects: evaluation of the LINCS tools and datasets, evaluation of the MOD repositories, evaluations of over 5,000 bioinformatics tools and databases, and evaluations of the repositories listed on DataMed. The project is at an early prototyping phase, so it is not ready for broad use.
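For readers unfamiliar with the insignia idea, here is a minimal sketch of the averaging behind it. This is not FAIRShake's actual API or data model; the metric IDs and answer values are hypothetical and only illustrate how "each square represents the average answer to a metric question" can be computed:

```python
# Hypothetical illustration: crowd assessments answer metric questions with a
# score in [0, 1]; each insignia square would show the per-metric average.
from statistics import mean

# One dict per evaluator, mapping a (made-up) metric ID to an answer in [0, 1].
assessments = [
    {"F1": 1.0, "F2": 0.5, "A1": 1.0, "I1": 0.0},
    {"F1": 1.0, "F2": 0.0, "A1": 0.5, "I1": 0.5},
]

def insignia_scores(assessments):
    """Average the answers to each metric question across all assessments."""
    metrics = sorted({m for a in assessments for m in a})
    return {m: mean(a[m] for a in assessments if m in a) for m in metrics}

print(insignia_scores(assessments))
# -> {'A1': 0.75, 'F1': 1.0, 'F2': 0.25, 'I1': 0.25}
```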

@peter-wittenburg

Frank raises a couple of different aspects - some have already been commented on by others. Let me try to find my way through them. I should add here that I have some knowledge of what is being done at KIT and we have had some collaborations - also on the questions Frank is raising.

  1. Is FAIR sufficient to prevent fragmentation?
    You refer to also publishing negative results, which is indeed a discussion in many if not all disciplines. But the FAIR principles do not address this question: they only state that if you produce data, you should make them FAIR. This includes negative results. It is more of a social problem that many researchers hesitate to publish data which did not lead to clear results. So FAIR does not address the social aspects, does it?
    I don't know whether I can share your view about "reproducibility". You are asking for the ideal solution, correct? That would be wonderful. But currently even your second option does not work, which is a disaster for science.
  2. Components of a FAIR data ecosystem?
    Yes, skills are desperately needed, and it is a pity that Edison will not be continued, if what I heard is correct - please correct me if I am wrong. Your second point is interesting; I can't see the direction at this moment.
  3. Architecture Examples
    Yes, indeed, together with KIT and others we built, amongst other things, the EUDAT federation, and KIT, like others, is involved in several infrastructures. I am not quite sure, Frank, what you mean by "most synergy". I just had an interaction with one of the large German companies that are starting to build federation environments for data (they call it differently). When I asked him about RDA and standardisation, he argued that standardisation would ruin his business model, since "heterogeneity" means money in economic terms, and standardisation would mean a reduction of costs. RDA (like all the other standardisation initiatives) was set up with the intention to harmonise and thus reduce costs. Whether looking for synergies will automatically lead to compatibility, I dare to doubt. Compatibility in "economic" terms currently means that companies try to convince their clients and others that their solution is the best; changes would cost time and money. So they remain in a silo.
    Your second remark is absolutely right. Some time ago we looked around in all the ESFRI research infrastructures, and the network of repositories (some call them centers with some additional tasks) is crucial for almost all of them.
  4. Layered Approach?
    You are absolutely right that when it comes to interaction with the users, there should be one interface and a clear assignment of roles. But that is not what is meant here. We were speaking about systems design: how to get a complex system built so that it is compatible. I just had another chat with one of the two founders of the Internet. They just threw TCP/IP on the floor and showed that it works for message exchange and routing, without any further-reaching claims. It was evolution that led us from FTP to the Web. If we now look at some initiatives such as IIC or IDS, they come up with coherent and comprehensive architectures to guide infrastructure development - so a slightly different approach, it seems. How do we get ahead and overcome all this fragmentation?
  5. Automatic Processing
    We made a large survey in Europe 3 years ago, interacting with about 120 departments etc., and we found that data scientists report losing about 75% of their time on data finding, integration, etc. A colleague from MIT (M. Brodie) reported about a study where data scientists reported 80% of their time being wasted. So it is obvious that we cannot go on like this, and everyone knows it. Why is it so difficult to change? We were given two major answers: 1) In cancer research, for example, there are so many variants, parameters and other choices that it is difficult to create a workflow framework that would really help; much work is still ad hoc scripting etc. 2) There is a lack of people who can really develop these kinds of flexible workflow systems - it is almost an art :)
    So yes, it seems that we are on the same page here, but developing flexible workflows is tough.
  6. Professions
    Let me be brief here since it is not my favorite area, and Edison has worked out a nice classification of job profiles - yet we are far away from getting them into practice. Your idea of a FAIR manager is a bit like what the colleagues behind Go-FAIR are dreaming of. At least the German ministry decided to fund a FAIR node - whatever it will do.

So thanks for your great input which we need to consider in the report.

@sjDCC
Member

sjDCC commented Aug 20, 2017

Thanks for the FAIRShake reference @CaroleGoble. I've found a link to a short video on YouTube, but if you have other literature references we should follow up, that would be great.

@peter-wittenburg

I just looked at the FAIRShake video and it is indeed pretty cool. If I got it right, it is ultimately the crowd's view on the FAIRness of DOs. So this makes it complementary to approaches such as DSA/WDS, where people do self-assessment based on rule sets.
Thanks Carole
Peter
