mechanism to include a list or set of values in a string column #25
Comments
Ad-hoc/practical solutions usually involve defining a delimiter that isn't used/allowed in any of the values. Could VOTable feasibly define a standard delimiter that works across all table serialisations? Or would it have to be a mechanism where the writer specifies the delimiter as column (FIELD) or cell (TD) metadata?
If there is a common delimiter, DALI could define an xtype, e.g. xtype="word-list", and specify the standard delimiter. What about a dynamic xtype, e.g. xtype="word-list-|" where | is the delimiter? That is, DALI defines "word-list-{delim}". Given the meaning of xtype as telling the client about some structure in the value ("this is a char* you can interpret as a word-list"), this would be consistent with other uses of xtype. Of course, it would only work in cases where a delimiter is known to be safe for all possible values (recall the problem with specifying a null and a streaming output in BINARY).
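To make the dynamic-xtype idea concrete, here is a minimal Python sketch of how a client might interpret it. Note that "word-list-{delim}" is only a proposal in this thread, not something DALI defines, and the function name `split_word_list` is hypothetical.

```python
def split_word_list(xtype: str, value: str) -> list[str]:
    """Split a char* cell value using the delimiter embedded in the xtype.

    Assumes the hypothetical convention xtype="word-list-{delim}" proposed
    in this thread; it is not part of any published DALI version.
    """
    prefix = "word-list-"
    # Reject xtypes that do not follow the convention or carry no delimiter.
    if not xtype.startswith(prefix) or len(xtype) == len(prefix):
        raise ValueError(f"not a word-list xtype: {xtype!r}")
    delim = xtype[len(prefix):]
    # An empty cell is treated as an empty list rather than [""].
    if value == "":
        return []
    return value.split(delim)


words = split_word_list("word-list-|", "I|Q|U")
```

As noted above, this only works if the writer can guarantee that the delimiter never occurs inside a value; the sketch does no escaping.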
I support this idea, I think xtype is a good way to address this long-running annoyance. How about using a newline character (
Prior art: ObsCore + TAP outputs the pol_states column as a list of strings delimited by |. In general I have found | to be a good delimiter that doesn't seem to collide with other uses (and require escaping and such). For example, CAOM has several fields named
More general problem: WD-DALI-1.2 includes an xtype="multipolygon" which has VOTable metadata:
(or float). A multipolygon is {polygon} {separator} {polygon} ... so we need a delimiter in the double array to separate the component polygons. Component polygons have 6 or more numbers and there are 1 or more component polygons in a multipolygon. Since polygon is supported as a single double (or float) array, this use case is also a variable-length array that varies in both dimensions (just like list-of-string). We currently specify NaN value(s) as a separator because they are valid double values and easily parseable... but it seems like a problem that might come up again, so if VOTable supported a little more we would have to say less about parsing MultiPolygon in DALI and more of the parsing would be done by generic code. For example, a generic VOTable parser could do
A parser or application that knew what multipolygon was could use that and convert the raw arrays into a multipolygon object, as could some other structure that was encoded as multiple arrays of numbers. Side notes: the NaN option is usable for double and float but not for fixed point datatypes.
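As a rough illustration of the generic splitting step, here is a minimal Python sketch that breaks a flat double array into component arrays wherever NaN separators occur. The function name `split_on_nan` is hypothetical, and treating a run of consecutive NaN values as a single separator is an assumption of this sketch, not something taken from the DALI draft.

```python
import math


def split_on_nan(values: list[float]) -> list[list[float]]:
    """Split a flat array into sub-arrays at NaN separator values.

    Assumption: one or more consecutive NaN values form a single separator.
    """
    parts: list[list[float]] = []
    current: list[float] = []
    for v in values:
        if math.isnan(v):
            # Close the current component, ignoring repeated separators.
            if current:
                parts.append(current)
            current = []
        else:
            current.append(v)
    if current:
        parts.append(current)
    return parts


nan = float("nan")
flat = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, nan, nan, 5.0, 5.0, 6.0, 5.0, 6.0, 6.0]
polygons = split_on_nan(flat)
```

A multipolygon-aware application could then turn each sub-array into a polygon object. The same approach has no equivalent for integer columns, which have no NaN value.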
Just to get the wheels turning a little.... What about something like allowing
Since we are coming around to the final WD-DALI-1.2 and an RFC, I would like to resolve this. My current inclination is to go with a pure xtype solution in DALI or as custom xtypes, which would have something like:
xtype="words" : space-delimited list of words
xtype="phrases" : |-delimited list of phrases
...
xtype="multipolygon" : NaN-delimited list of (simple) polygons
Of course: each of the above to be discussed individually over in DALI when use cases (usually TAP) are presented. I have no concrete use case for list of
The only thing that could happen in VOTable might be to define the "words" and "phrases" xtypes here instead of DALI (debatable).
On Fri, Nov 18, 2022 at 12:10:12PM -0800, Patrick Dowler wrote:
> Since we are coming around to the final WD-DALI-1.2 and an RFC, I would like to resolve this. My current inclination is to go with a pure xtype solution in DALI or as custom xtypes, which would have something like:
> xtype="words" : space-delimited list of words
> xtype="phrases" : |-delimited list of phrases
> ...
> xtype="multipolygon" : NaN-delimited list of (simple) polygons
Ah..., sigh... Can we simply implement all this stuff (including
multipolygon) before putting it into DALI and try it for a few years?
All this (in particular including multipolygon) sounds like stuff we
will regret later to me.
That is: Can we just proceed with DALI 1.2 and then have this as "to
be investigated for 1.3"?
> The only thing that could happen in VOTable might be to define the
> "words" and "phrases" xtypes here instead of DALI (debatable).
I still think VOTable should finally get first class strings and then
string arrays on top of that. We're going to touch VOTable in 2023
anyway (for MIN/MAX). Let's briefly step back and see if we can't do
this properly and without conventions we'll later regret. I mention
in passing that the conventional separator in RegTAP and EPN-TAP is a
hash mark, which worked nicely until someone started to put in
URIs...
Anyway: DALI 1.2 needs to get out of the door, and I don't think
we'll find a good and widely implemented solution to all of this in
the time frame I'd like to see for DALI 1.2.
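The hash-mark collision mentioned above is easy to reproduce. The following minimal Python sketch uses invented values (the `ivo://example.org/...` URIs and the helper name `split_hash_list` are purely illustrative) to show how a naive split miscounts the items as soon as one of them contains a fragment identifier.

```python
def split_hash_list(value: str) -> list[str]:
    """Naively split a hash-delimited list, with no escaping support."""
    return value.split("#")


# Plain words: the convention behaves as intended.
plain = split_hash_list("image#spectrum#cube")

# Two URIs, the first carrying a fragment: the split yields three pieces.
# (Hypothetical URIs, used only for illustration.)
broken = split_hash_list("ivo://example.org/a#frag#ivo://example.org/b")
```

Here `broken` has three elements although the writer intended two, and a reader has no way to tell which hash mark was the separator.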
FWIW, TOPCAT already understands multi-polygons marked as
My only point here is that I am no longer seeking a VOTable solution to this, hence "Of course: each of the above to be discussed individually over in DALI when use cases (usually TAP) are presented." And I didn't even say I would actually bring this to DALI either: "xtype solution in DALI or as custom xtypes". That was my only point here.
There seem to be many cases where one wants to put multiple values into a character field, eg:
The columns are currently datatype="char" arraysize="*". Since words tend to be different lengths :-) the multi-dimensional array notation in VOTable isn't really usable (the first dimension has to be fixed), whereas with word lists both the length of the words and the number of words are variable.