MANGO Annotation Scope #18
Comments
Any usage that mixes columns together (e.g. error matrix, columns grouping) |
If you change the model in a way that breaks the backward compatibility you will get concrete operational consequences whatever the way you associated model with data. |
At least 2 reasons for targeting this:
|
I DO NOT want this JSON serialisation. It is both an example for our discussion and a convenient way to exercise and validate my proposal. |
At least a clear point of agreement |
For TS or spectra, I refer you to @mcdittmar's responses. For the catalog case, let's talk MANGO. Mango is a simple model with 2 docks (containers).
The content of those docks is totally free (non-entangled components?)
They are designed to carry any metadata we need to perfectly describe any measure, so that Mango instances are self-consistent. If by some magic you need to handle some of them outside the VOTable scope (SAMP, datalink...), I'd expect them to be complete.
This is not false, but this is an annotation issue. If I have a unit in some model leaf, my annotation scheme must be able to say that this unit comes from that FIELD. |
On Fri, Mar 19, 2021 at 11:01:29AM -0700, Laurent MICHEL wrote:
> we do in the model has concrete operational consequences. Which, mind you, is
> fine -- we'll have to deal with them somewhere and the DM is the
> right place for that.
If you change the model in a way that breaks the backward
compatibility you will get concrete operational consequences
whatever the way you associated model with data.
Yes, but the question is: Will changing one model in this way take
entire rest of the annotation with it or will the remaining
annotation keep working? This is what the entanglement problem
is about.
Additionally, in the explicit-annotation scheme, it's simple to keep
the old annotation around (it's just one extra INSTANCE, and as you
see, interesting columns can very simply have a dozen annotations),
so there's no problem writing VOTables that just work for old and new
clients as long as you still care to keep old clients operational.
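
To illustrate the point about keeping old and new annotations side by side, here is a minimal Python sketch. The dictionary layout, the `pick_annotation` helper and the `meas`/`meas2` prefixes are assumptions made for illustration only; none of them is taken from a proposal in this thread.

```python
# Hypothetical annotations attached to one column. The dmtype strings are
# illustrative and do not come from any published model.
annotations = [
    {"dmtype": "meas:Measurement", "value": "col_ra"},
    {"dmtype": "meas2:Measurement", "value": "col_ra"},
]


def pick_annotation(annotations, understood_prefixes):
    """Return the first annotation whose model prefix the client understands.

    understood_prefixes is ordered by client preference; None is returned
    when the client understands none of the annotations.
    """
    for prefix in understood_prefixes:
        for ann in annotations:
            if ann["dmtype"].split(":", 1)[0] == prefix:
                return ann
    return None


# A legacy client only knows "meas"; an updated client prefers "meas2".
old_client = pick_annotation(annotations, ["meas"])
new_client = pick_annotation(annotations, ["meas2", "meas"])
```

Under these assumptions both clients find something they can work with in the same table, and a client that knows neither prefix simply gets nothing back.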
|
On Fri, Mar 19, 2021 at 10:52:13AM -0700, Laurent MICHEL wrote:
> So, perhaps a clarification: is my time series use case "single
> column annotation", and if so, why? What actual usage would go
> beyond what's possible there?
Any usage that mixes columns together (e.g. error matrix, columns grouping)
Could you recommend a specific one that I should tackle to show that
this kind of thing is of course possible with explicit referencing?
|
|
IMO, the annotation must be faithful to the model, but does not require the model to be totally mapped. If you are saying that clients must be updated to take advantage of new model features, you are right, whatever the annotation scheme is; this is just because `new model class => new role => new processing`. |
On Mon, Mar 22, 2021 at 08:52:59AM -0700, Laurent MICHEL wrote:
> Yes, but the question is: Will changing one model in this way take
> entire rest of the annotation with it or will the remaining
> annotation keep working? This is what the entanglement problem
> is about.
IMO, the annotation must be faithful to the model, but does not require
the model to be totally mapped. Only data present in the dataset
have to be mapped.
If this means that we need to be very careful with what attributes we
make mandatory in our models, I totally agree.
The rest can (must) be ignored. The mapping
block represents a subset of the model. If the model changes keep
backward compatibility, the 'old' annotations remain consistent
and the interoperability between datasets mapped with different DM
versions is preserved.
Yes -- that's a minor version. These aren't a (large) problem, and
indeed I'm claiming that our system needs to be built in a way that
clients don't even notice minor versions unless they really want to
(which, I think, so far is true for all proposals).
If you are saying that clients must be updated to take advantage of
new model features, you are right, whatever the annotation scheme
is, this is just because. `new model class => new role => new
processing`.
No, that is not my point. My point is what happens in a major
version change. When DM includes Coord and Coord includes Meas and
you now need to change Meas *incompatibly* ("major version"), going to
Meas2 with entangled DMs will require new Coord2 and DM2 models,
even if nothing changes in them, simply to update the types of the
references -- which *are* breaking changes.
With the simple, stand-alone models, you just add a Meas2 annotation,
and Coord and DM remain as they are. In an ideal world, once all
clients are updated, we phase out the legacy Meas annotation. The
reality is of course going to be uglier, but still feasible, in
contrast to having to re-do all DM standards when we need to re-do
Meas.
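
The propagation argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dependency graphs and the helper `models_needing_major_bump` are hypothetical, and the model names are just the ones used in the message above.

```python
# Hypothetical dependency graphs: each model lists the models whose types it
# references directly. Names follow the example in the message above.
ENTANGLED = {"DM": ["Coord"], "Coord": ["Meas"], "Meas": []}
STANDALONE = {"DM": [], "Coord": [], "Meas": []}


def models_needing_major_bump(depends_on, changed):
    """Return every model that needs a new major version when `changed`
    breaks compatibility: the changed model itself plus all models that
    reference it, directly or transitively."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for model, deps in depends_on.items():
            if model not in affected and any(d in affected for d in deps):
                affected.add(model)
                grew = True
    return affected


print(sorted(models_needing_major_bump(ENTANGLED, "Meas")))
print(sorted(models_needing_major_bump(STANDALONE, "Meas")))
```

With the entangled graph the change to Meas reaches Coord and DM as well; with stand-alone models only Meas itself is affected.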
|
On Mon, Mar 22, 2021 at 08:32:46AM -0700, Laurent MICHEL wrote:
> Could you recommend a specific one that I should tackle to show that
> this kind of thing is of course possible with explicit referencing?
- Column grouping
[here](https://github.com/ivoa/dm-usecases/tree/main/usecases/column_grouping).
This is based on a real Vizier table
I'm afraid I don't really understand this use case: what are clients
expected to do with this grouping information? Without that, it's
hard to make any meaningful annotation.
Looking at your annotation, I'm wondering in particular: which client
should consume the ucd, description and unit annotations from the
INSTANCE-s rather than from the FIELD-s where they already are, and a
lot more easily accessible?
- Error matrix:
[here](https://github.com/ivoa/dm-usecases/tree/main/usecases/precise_astrometry).
This is based on a mock VOTable that I wrote to test my code. The
real use case is Gaia, and testing this feature on it is still
planned
I've added an annotation to this table and made a PR. I still
believe we don't have a credible use case for annotating covariances
yet, which is why I'm using "meas2": Once clients start doing
interesting things with DM annotations, we can, I believe, start
thinking about doing tricks like these. Having said that, I've
written code that uses this annotation to do something halfway
interesting using my astropy annotation implementation:
https://github.com/msdemlei/astropy#working-with-covariance
|
Again do not mix model and annotation
|
Continued in #24 |
On Fri, Mar 19, 2021 at 11:08:24AM -0700, Laurent MICHEL wrote:
At least 2 reasons for targeting this:
1. I would be happy if I could develop my client just by reading
the model spec without fighting with VOTable elements (supposing
that someone provided me with a low level library doing the dirty
job)
Hm -- complicating things a great deal to perhaps simplify standards
development a bit doesn't sound like a good deal to me.
Wouldn't you agree that out in the field, people should be taking the
annotation from the VOTables? If what you're saying instead is "aw,
VOTable is inconvenient, let's invent something else that people
should be consuming", I'd become fairly nervous.
Mind you, there's nothing wrong with thinking of alternative
representations of this stuff, and indeed, for DaCHS I'm already
telling people to add the annotations in a quick and compact way --
http://docs.g-vo.org/DaCHS/ref.html#annotation-using-sil --, but that
shouldn't drive our design. Let's not complicate matters even more
by imagining we ought to magically fix, say, CSVs (where, of course,
that's still possible by inventing a clever scheme in the spirit of
FITS+, but that ought to be an afterthought).
2. The comparison between 2 datasets is straightforward if the
quantities are 100% certified model compliant.
But wouldn't such a comparison happen in a client after it's parsed
and deserialised the instances into whatever representation it
chooses? Where would such an abstract "normalise-and-compare"
operation play a role?
|
On Wed, Mar 24, 2021 at 08:03:45AM -0700, Laurent MICHEL wrote:
> I'm afraid I don't really understand this use case
- This is a Vizier usecase, more to say.
Yes, but what *is* the use case, i.e., what sort of functionality
should be enabled? Without that, it'd be exceedingly hard to say
anything.
- I repeated several times that the model must be self-consistent
and independent of any particular dataset.
(see other issue #18)
- I would say that the issues page is not the right place to
question one of the use cases that have been proposed and validated
about 2 months ago.
Perhaps, but I'd say a use case must be explicit on *use*, which I
submit entails saying "A client wants to..." or something similar.
Is there something like that for this grouping thing?
|
See the Wiki post by Gilles. Another use case, a bit aside, is shown here as an alternative way to define (in)dependent axes. |
I wouldn't say that using annotations faithful to the model is complicating things. It is rather the opposite.
Yes I do, I even plead for these annotations, read in the VOTable, to bear the structure of the model.
You are pointing at the root of our disagreement: I do not say that your approach is not appropriate, but I claim that it makes the job tougher for clients for little benefit, whereas your way of consuming data does work with my annotation scheme. This is not a good deal. The way out of this discussion is likely somewhere in this topic |
On Fri, Mar 26, 2021 at 01:08:44AM -0700, Laurent MICHEL wrote:
See the Wiki [post](https://github.com/ivoa/dm-usecases/wiki/mango) by Gilles.
Some catalogs may have columns that give extra information about a
particular quantity (e.g. quality flag, statistical sample
size...). A client could hide such associated information at the
first stage and then show them up on demand (e.g. with a tooltip)
Gilles' original example has a clear use case -- that's measurement
annotation with "plotting error bars" and in a bright future perhaps
"automatic error propagation".
A Measurement model that does not needlessly multiply the number
of classes by mingling in various sorts of physics covers this use
case perfectly (and note that no work needs to be done if someone has
a new sort of thing with errors in that case, and clients still don't
have to put up with vague "columns related in some way" annotation).
The examples further down "limit flags or notes", "flags on
magnitudes" can be trivially solved by a class (say) relatedData
that I'd probably put into a source DM (but perhaps we could find a
better place for that once we better understand where this kind of
thing actually happens).
The annotation would then trivially be:
```
<INSTANCE dmtype="src:relatedData">
<COLLECTION>
<ITEM ref="themag"/>
<ITEM ref="flag_on_the_mag"/>
</COLLECTION>
</INSTANCE>
```
Should I put this into the dm-usecases repo? It almost seems a bit
too trivial to me...
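
For what it is worth, a client could resolve such a grouping with very little code. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the surrounding `TABLE`, the two `FIELD` elements with their names and unit, and the helper `related_fields` are assumptions made up for the example, and real VOTables carry an XML namespace that is ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment: two FIELDs plus the INSTANCE from the
# message above. Field names and the unit are invented for illustration.
DOC = """
<TABLE>
  <FIELD ID="themag" name="Vmag" datatype="float" unit="mag"/>
  <FIELD ID="flag_on_the_mag" name="q_Vmag" datatype="char"/>
  <INSTANCE dmtype="src:relatedData">
    <COLLECTION>
      <ITEM ref="themag"/>
      <ITEM ref="flag_on_the_mag"/>
    </COLLECTION>
  </INSTANCE>
</TABLE>
"""


def related_fields(table):
    """For each src:relatedData INSTANCE, list the names of the FIELDs that
    its ITEMs reference by ID."""
    fields_by_id = {f.get("ID"): f for f in table.iter("FIELD")}
    groups = []
    for inst in table.iter("INSTANCE"):
        if inst.get("dmtype") != "src:relatedData":
            continue
        refs = [item.get("ref") for item in inst.iter("ITEM")]
        groups.append([fields_by_id[r].get("name") for r in refs])
    return groups


groups = related_fields(ET.fromstring(DOC))
```

Since the ITEMs only reference FIELDs, unit, ucd and description stay where they already are and can be read from the FIELD when needed.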
In
http://viz-beta.u-strasbg.fr/viz-bin/Mango?-out.max=10&-source=I/322A/out&-out.all=1,
it seems you're doing something very much different from grouping
different columns. The associatedDataDock annotation looks more like
an "associated link" thing, and I'd respectfully ask that you check
again if this is something for DM annotation or if this wouldn't be
much better addressed using Datalink -- it sucks for everyone if
there's multiple ways to do (about) the same thing.
Another use case, a bit aside, is shown
[here](https://github.com/ivoa/dm-usecases/blob/main/usecases/time-series/modelinstanceinvot-mapping/gaia_multiband.annot.xml)
as an alternative way to define (in)dependent axes. The independent
axis is represented by a parameter and all the dependent axes are
its associated parameters.
Hm... to me, that's a bit of an argument against doing this. If this
"related columns" thing lets people do what we thought ndcube should
be doing, then I'd say one of the two should go.
|
|
This issue is a fork of #12 that diverged from the initial dependent axes topic.
Last message (#12 (comment)):
On Fri, Mar 19, 2021 at 07:23:56AM -0700, Laurent MICHEL wrote:
So, perhaps a clarification: is my time series use case "single
column annotation", and if so, why? What actual usage would go
beyond what's possible there?
Well, the thing with dmrole and dmtype to me is the annotation, but
I think what you're saying here is that the annotation should be
directly derived from the model.
That I wholeheartedly agree with,
and that's why I'm so concerned about the current MCT proposal -- if
it were some abstract musing, I'd be totally ok with it. But when
the model defines the annotation structure, whatever we do in the
model has concrete operational consequences. Which, mind you, is
fine -- we'll have to deal with them somewhere and the DM is the
right place for that.
...and I still cannot figure out why you want this -- after all, the
point of the whole exercise IMNSHO is to add information to VOTables
(and later perhaps other container formats) that is not previously in
there.
What would the use case for your free-floating annotation be, if this
is what you are proposing?
-- but why would you want to do this JSON serialisation? Wouldn't it
be much better overall to just put that value into a VOTable and
transmit that rather than fiddle around with custom JSON
dictionaries? In particular when there are quite tangible benefits
if you make it explicit in the model what exactly it is that you're
annotating?
By the way, if by "wouldn't require Python classes" you mean "You
don't have to map model classes into python classes" then yes, I
agree, that is a very desirable part of anything we come up with.
Let's avoid code generators and similar horrors as much as we can.
Nobody likes those.
I agree to all these use cases (except, as I said, even for basic
quantities the gain is enormous because we can finally express
frames, photometric systems, and the like in non-hackish ways).
But: which of these use cases would you miss with the non-entangled,
explicit-reference models?