en.search-data.min.1602afe6ab5291339601cb2d29400e16ce8c3088c2a53639eaffd231d5504df4.json
[{"id":0,"href":"/docs/guide-development/","title":"Guide Development","section":"Docs","content":" Workflow development Guide # Workflows are all things that can be computed, broadly speaking.\nFor reproducibility, we want our workflows to be repeatable: producing the same output every time they are computed. This is easy enough to do to a first approximation, but might be harder to achieve than it seems when the workflow relies on external resources. But we track every execution, so it is not necessary to be overly concerned about these delicate details at every moment.\nGenerally, we want the workflows to be parametrized. Non-parametrized but strictly repeatable notebooks are less reusable since they always produce the same output data.\nOne way to create them in ODA is to build a Jupyter notebook.\nSimple ODA Jupyter Notebook to a workflow # Write a working repeatable workflow # First you need to make sure your notebook runs in a cloud environment. It needs to be repeatable - i.e. you can run it many times. If it depends on external services - try to make sure the requests are also repeatable - you might need to specify sufficient details. 
If the notebook does not produce exactly the same result every time - it\u0026rsquo;s unfortunate, but do not worry too much, it might still be reproducible (see motivation on the difference between reproducibility and repeatability)\nwrite your notebook, and make sure it runs from top to bottom make a requirements.txt with the modules you need for this notebook You can use a mock lightcurve notebook as an example.\nParametrize the notebook # Create a cell with the following tag \u0026ldquo;parameters\u0026rdquo; (see papermill manual):\nthe names of the declared variables will be used as parameter names in the MMODA service (except the default parameters, see below) if not annotated, the types of the input parameters are determined based on the parameter default value one can annotate the input parameter by putting a comment with the term from the ontology. Default parameters # Several default common parameters are always set by the MMODA frontend. These include:\nType annotation Parameter default name http://odahub.io/ontology#PointOfInterestRA RA http://odahub.io/ontology#PointOfInterestDEC DEC http://odahub.io/ontology#StartTime T1 http://odahub.io/ontology#EndTime T2 http://odahub.io/ontology#AstrophysicalObject src_name If the notebook contains parameters annotated with these types, their names will be automatically converted by the dispatcher plugin to the default ones. If some of them are omitted, they will be added to the list of workflow parameters automatically.\nNote\nNote that both target (Point of Interest) source name and target source coordinates are passed to the workflow, and in principle there is no guarantee the coordinates are those of the source. Indeed the exact choice of the coordinates for a given source depends on the energy band, desired precision, etc. 
For now, we leave it up to the workflow developer to reconcile these parameters.\nAnnotate the notebook outputs # define the notebook output, similarly creating a cell with the tag \u0026ldquo;outputs\u0026rdquo;. outputs may be strings, floats, lists outputs may also be strings which contain filenames for valid files. If they do, the whole file will be considered output. if you want to give a more detailed description of the notebook input and output, use terms from the ontology. Adding annotations to the entire notebook # Annotations can apply to parameters or to the entire notebook. In both cases they are kept in the notebook cell tagged parameters. For example:\n# oda:version \u0026#34;v0.1.1\u0026#34; # oda:reference \u0026#34;https://doi.org/10.1051/0004-6361/202037850\u0026#34; source_name = \u0026#34;Crab\u0026#34; # oda:AstrophysicalObject reference_energy = 20 # oda:keV Publish your workflow as a test service # publish the workflow to RenkuLab in the dedicated group: https://renkulab.io/gitlab/astronomy/mmoda/ add a \u0026ldquo;live-workflow\u0026rdquo; topic. once some bots do their job, the workflow will be automatically installed in MMODA (by default, on a staging instance), and you will receive an email! Try to access your new service # Assuming lightcurve-example from above was used, and the notebook name was random, you can run this: $ oda-api -u https://dispatcher-staging.odahub.io get -i lightcurve-example -p random -a n_bins=5 TODO: workflow version, plot here and in renku create\n(optional) Try a test service # install nb2workflow tooling pip install 'nb2workflow[cwl,service,rdf,mmoda]\u0026gt;=1.3.30' --upgrade. Note that this command should be the only one you need to install the necessary dependencies for the workflow engine. You may of course also need some domain-specific packages. 
inspect the notebook nbinspect my-notebook.ipynb try to run the notebook nbrun my-notebook.ipynb it will use all default parameters you can specify parameters as nbrun --inp-nbins=10 my-notebook.ipynb, if nbins happens to be one of the parameters. try to start the service nb2service my-notebook.ipynb Note\nif you experience issues testing the service due to some \u0026ldquo;import error\u0026rdquo; or other strange messages, try the containerized service (note that it will not work in Renku):\nnb2deploy $PWD test --local then, look at http://0.0.0.0:8000 for some metadata about the service try to run some simple queries in http://0.0.0.0:8000/apidocs/ Note\nIf you still experience issues with the local environment, try to develop the workflow directly in renkulab - note that some commands, like nb2deploy, will not work in this case.\nDeveloping service in Renku # https://renkulab.io/\nTODO: explain how to run server\n(optional) Add some verification test cases # To make sure your service does not break with future updates, it\u0026rsquo;s useful to express some assumptions about the service outputs in some reference cases. 
They will be tested automatically every time a new workflow version is installed.\nwe will explain later how to do this.\n"},{"id":1,"href":"/docs/guide-discovery/","title":"Guide Discovery","section":"Docs","content":" Workflow Publishing and Discovery with KG: Astronomer Guide # Latest-Version https://github.com/oda-hub/workflow-discovery/, also deployed as https://odahub.io/ Purpose of this note # We want to demonstrate on concrete and scientifically-useful working examples how an astronomer, who might indeed have relatively little interest in looking at the code, can leverage ODA Knowledge Base and Knowledge Graphs together with other valuable resources (especially Renku):\ncollaborate on workflows discover and use ODA-built services discover and use our record of globally available web-based data analysis services easily contribute your own analysis as web-services annotate your work in the ways ready for consumption by the synthetic astronomer robots, making some of the reasonable reasoning for scientists, and most of all, support scientists with discovery space empowering their irreplaceable scientific capacities. It\u0026rsquo;s clear that much of this functionality is available in other frameworks, usually custom for purpose. We make use of many re-usable open-source technologies, to support an ecosystem of tools and other developments, which can be re-used between some of our projects.\nDeveloping the workflow # The simplest way to build a workflow is to write a Jupyter notebook. We will not go into every detail here, see dedicated guide for step-by-step instructions.\nInstead, this document focuses on workflow annotation, publishing, and discovery. These features are powered by an RDF Knowledge Graph. What exactly is stored in the Knowledge Graph is described by the ontologies (which are themselves stored in some KG).\nOntology # We will describe here the simplest elements of the ontology, which are necessary for workflow annotation. 
We will not go into details about how to define various constraints and relations on/between things here.\nOntology describes relations between some things, terms (represented as RDF URIs). URI can look like a URL, e.g. https://odahub.io/ontology/sources/Mrk421 (the URL may or may not be leading to a real location, although it generally should). The URI can also be shortened, assuming a namespace prefix:\nPREFIX odaSources: \u0026lt;https://odahub.io/ontology/sources#\u0026gt; (see some default list of prefixes here)\nThis way, https://odahub.io/ontology/sources/Mrk421 becomes odaSources:Mrk421.\nIt is necessary to annotate the workflow with these terms. Specifically, to make relations between the workflow and these terms. Relations have the form of simple propositions, expressed as subject-predicate-object triples.\nFor example\noda:sdssWorkflow oda:isImportantIn oda:radioAstronomy . or\noda:sdssWorkflow astroquery:uses astroquery:sdssArchive . oda:sdssWorkflow oda:isAbout odaSources:Mrk421 . Consider that it is beneficial to use terms already used by other people, described in existing ontologies. This way we speak in the same language as other people, and will be able to more easily combine our resources. However, it can be quite an effort to understand what other people meant, which is necessary to use their terms correctly. This effort should be made consciously when possible. It is advisable to also discuss unclear points within our group, and come to a common solution.\nParticular attention should be paid to International Virtual Observatory (IVOA) vocabularies. See their rdf vocabularies here: https://www.ivoa.net/rdf/index.html and references therein. These developments are used in a variety of tables managed by CDS-Strasbourg, where many of the needed terms for astrophysical entities can be found (one can start here).\nIt is, however, often important to adopt a project-specific, narrowed-down scope. 
For example, our understanding of what an AGN is may differ from that of CDS-Strasbourg. Which is why, in unclear cases, we should not hesitate to use custom terms, such as odaSources:AGN. Then, we can also model and encode equivalence between our own understanding of the AGN and that of CDS. For example:\nodaSources:Mrk421 oda:isSubclassOf odaSources:AGN . odaSources:AGN oda:equivalentTo cds:AGN . Later, these equivalences can be reduced under specific assumptions: for example some agent may assume that oda:equivalentTo implies literal substitution in all contexts.\nWorkflow inputs # For our purposes, the most important workflow properties are set by their inputs and outputs.\nWe will use nb2workflow (commands below will need pip install nb2workflow) to add additional details and introspection on the workflow notebooks.\nname_input = \u0026#34;Mrk 421\u0026#34; # name of the object; if empty coordinates are used http://odahub.io/ontology/sourceName radius_input = 3.0 # arcmin They can be seen, for example, with\n$ nbinspect final.ipynb ... \u0026#34;name_input\u0026#34;: { \u0026#34;comment\u0026#34;: \u0026#34; name of the object; if empty coordinates are used http://odahub.io/ontology/sourceName\u0026#34;, \u0026#34;default_value\u0026#34;: \u0026#34;Mrk 421\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;name_input\u0026#34;, \u0026#34;owl_type\u0026#34;: \u0026#34;http://odahub.io/ontology/sourceName\u0026#34;, \u0026#34;python_type\u0026#34;: \u0026#34;\u0026lt;class \u0026#39;str\u0026#39;\u0026gt;\u0026#34;, \u0026#34;value\u0026#34;: \u0026#34;Mrk 421\u0026#34; }, ... 
\u0026#34;radius_input\u0026#34;: { \u0026#34;comment\u0026#34;: \u0026#34; arcmin\u0026#34;, \u0026#34;default_value\u0026#34;: 3.0, \u0026#34;name\u0026#34;: \u0026#34;radius_input\u0026#34;, \u0026#34;owl_type\u0026#34;: \u0026#34;http://www.w3.org/2001/XMLSchema#float\u0026#34;, \u0026#34;python_type\u0026#34;: \u0026#34;\u0026lt;class \u0026#39;float\u0026#39;\u0026gt;\u0026#34;, \u0026#34;value\u0026#34;: 3.0 } ... Notice how in the first case, owl_type (OWL being the ontology definition language) is derived from the comment. And in the second case, it is derived from the variable type.\nTo generate a graph from the notebook:\n$ nb2rdf final-an.ipynb final.rdf This graph can be viewed, for example, with WebVOWL. It can also be published in a common location (see nb2service --help).\nTODO: show command to publish\nWorkflow properties from capturing workflow behavior # Renku (with a renku plugin) is currently able to deduce that a workflow uses some algorithms, providing a basis for useful automatic annotation. The current plugin is dedicated to ML algorithms.\nWe expect in the future to make an ODA-specific (or, more generally, astroquery-specific) renku plugin.\nOther Domain-specific knowledge # It is possible to assign any other characteristics to the workflow. What makes sense should be decided case by case. We used oda:importantIn predicate to assign relevance to some domains, e.g. domain:transients. In many cases these predicates can be assigned based on reasoning rules.\nFrom Workflow to Web-Based Data Analysis # We made a simple tool to present an HTTP service executing a given notebook on demand. See https://github.com/oda-hub/nb2workflow\nReasoning workflows # Reasoning is transformation of knowledge.\nSince our knowledge base is the knowledge graph, our reasoning is transformation of the knowledge graph.\nI know, it may sound ambitious and unreasonable to claim capacity of our platform to reason. However, this terminology has been accepted in the community. 
This is a restricted form of general reasoning.\nIngesting data into the graph also transforms the graph. Moreover, workflows ingesting data may be guided by the present graph content. When possible, we separate reasoning from external source ingestion by ingesting first, and reasoning later: this allows to preserve. But it is not always feasible.\nReasoning is performed by executing these reasoning workflows: in response to external triggers, or just regularly.\nWorkflow execution # Curiously, it is very convenient to see workflow execution as reasoning. See more details.\nOther reasoning rules # Various standard reasoning rules can be applied.\nLiterature # Literature parsing # Simple workflows to read astronomical and arxiv publications and produce some RDF.\nhttps://github.com/oda-hub/literature-to-facts\nLiterature building # Integrating data into paper. Adding another compile step deriving data from various sources (a lot of the time - workflow executions) and producing macros for the latex.\nMade use-case first, for the easiest possible latex work.\nhttps://github.com/oda-hub/linked-data-latex\nHuman interventions into the KG # Human agents are first-class citizens in the ODA KB/KG, on par with the automated workflows. Humans are not very reproducible, but provide unique intuitively-guided inputs, owing to their own built-in very large but a bit vague Knowledge \u0026ldquo;Graphs\u0026rdquo;. Key aspect of our development here is to allow data and workflow interoperability. It is only natural that we are concerned with human-ODA interoperability. Technically, human interactions are implemented in the same way as workflow executions.\nMost Humans experience the KB through various frontends. 
These multiple light-weight frontends allow making pre-defined actions, leading to workflow executions.\nSome pre-built frontends for development needs are presented here:\nhttps://in.odahub.io/odatests/\nViewing computed workflows # As described in the details on the reasoning engine, computed workflows are fully curried workflows, which are equivalent to simple data-fetching workflows.\nAdding a workflow # it should be as simple as pushing a button. They could be synchronized from Renku. If Renku provides a simple, limited public graph, we could use it directly, without reproducing part of it in the ODA KG.\nExample of adding new workflow which reacts on astro transients # TODO\n"},{"id":2,"href":"/docs/guide-ontology/","title":"Guide Ontology","section":"Docs","content":" Ontology # Purpose # Ontology defines terms in which we describe what we do: data, workflows, publications, etc.\nWhile creating and discovering workflows it is useful to learn to speak in these terms: find and assign suitable annotations. In an increasingly large number of cases we identify and assign annotations automatically.\nThe tools we develop commit to implement the common understanding of the terms.\nDiscovering terms to use # The terms look like URLs, e.g. https://odahub.io/ontology/#AstrophysicalObject . These URLs can be directly pasted in the browser, leading to some description:\nWe advise to look into public ontologies like https://www.ivoa.net/rdf/object-type/2020-10-06/object-type.rdf and https://odahub.io/ontology/ for the available terms. If it looks like there is nothing suitable there - it may be necessary to introduce new terms.\nAdvanced: It is also possible to look into an interactive graph explorer http://graphdb.obsuks1.unige.ch/ and https://share.streamlit.io/oda-hub/streamlite-graph/javascript-lib-interaction/main/main.py .\nAdding new terms to the ontology # Sometimes, it is necessary to add a new term. 
In principle, a Workflow Developer may add a new term at will - it is their own understanding of what is being labeled. But the term will not be fully used until it is related to other terms in the Ontology, which is done either automatically or by the Ontology Developers.\nOntology Developers can improve the common ontology with http://webprotege.obsuks1.unige.ch .\nThere is also an experimental edit interface here; the edited result should be stored and uploaded manually.\n"},{"id":3,"href":"/docs/issues/","title":"Issues","section":"Docs","content":" What if a user experiences a problem? # Purpose # Explain to users how issues are handled, and what can be expected.\nProcess # user receives a kind message, \u0026ldquo;treatment redirected to humans, follow-up promised\u0026rdquo;. This may be delivered in the interactive session, and/or in the email. issue is addressed by the support, and request can be submitted again. It will be generally pre-computed by the time of the new request. user is informed by the platform that all is good, but it is clear to the user that the result is not satisfactory \u0026ldquo;feedback\u0026rdquo; button if anything is unclear, please feel free to contact [email protected] Please also consider consulting http://status.odahub.io/ to check for any current problems.\n"},{"id":4,"href":"/docs/reasoning-engine/","title":"Reasoning Engine","section":"Docs","content":" Details about the reasoning engines # Workflow entities in the KG can undergo various transformations. One key transformation is currying, understood in the same way as function currying - since a workflow, for our purposes, is very similar to a function. Currying transforms a workflow with parameters into a workflow with fewer parameters (arguments), possibly without any parameters. 
We assume that only a workflow without parameters can be computed (executed).\nWorkflow execution is\nThis approach separates:\nworkflow composition, which becomes one of the workflow transformation operations. workflow execution (computing) Reasoning engine is itself a process (workflow) which takes as an input some KG state, and produces new triples (which can be inserted back in the KG).\nCurrying worker # Execution # workflows have a property which describes what can execute them. Two forms used now are\nExecuting, computing the workflow is also a reasoning action, deriving equivalence between the given workflow and a trivial workflow which implements a request to the data store.\n"},{"id":5,"href":"/docs/workflow-development-progression/","title":"Workflow Development Progression","section":"Docs","content":" Maintaining semantic coherence in workflow development progression: from jupyter notebooks to python modules, packages, API\u0026rsquo;s # At some point, it may be advisable to move part of the code into functions of a python module (e.g. my_functions.py), stored in the same repository. The functions can be called from the workflow notebook as from my_functions import my_nice_function; my_nice_function(argument).\nIf some functions are often re-used, they can be stored in external packages, and even published on pypi (to allow pip install my_function_package).\nSometimes, the function may be in fact called remotely, through an API. From the point of view of workflow (e.g. 
notebook) where the function is called, such a remotely executed function may look very similar to a local function from a module, giving similar advantages and posing similar challenges.\nOne should be wary that extracting the functions somewhat obscures the content of the workflow, by introducing structure which is not generally automatically traced by workflow execution provenance tracking.\nSo when a reusable part of the workflow matures, it may be extracted and treated as another workflow, providing inputs to the current workflow under development.\nIt is not feasible to always design workflow to use other workflows by consuming some pre-computed inputs. As described above in this section, workflow development progression often separates some function from within the workflow, or uses. SmarkSky project and in a way in general renku plugins essentially acknowledge this feature of the workflows: they use external functions from within the code at random locations, possibly calling them multiple times.\nThis additional information about functions called by the workflow can be introduced to the workflow metadata with special annotations (see more about workflow annotation in ODA Workflow Publishing and Discovery Guide), such as oda:requestsAstroqueryService. These annotations should also include information about parameters used to annotate the workflow. This additional structure associated with workflows will be ingested in the KG. While it cannot be directly interpreted as a workflow provenance graph, it is possible to produce an additional similar-looking graph with inferred provenance, which is different but analogous to strict renku-derived provenance.\n"}]