Commit

Merge pull request #13 from ivoa-std/linetap-species-table
Linetap species table
msdemlei authored Nov 20, 2024
2 parents dcd7c26 + 21fcbf7 commit e97364e
Showing 4 changed files with 195 additions and 36 deletions.
129 changes: 103 additions & 26 deletions LineTAP.tex
@@ -104,8 +104,16 @@ \section{Introduction}
section~\ref{sect:quantities}, while the mapping between our columns and the
VAMDC-XSAMS Data Model is given in section~\ref{sect:mapping}.

During the development of the standard, a major problem in molecular
spectroscopy turned out to be species nomenclature. The core LineTAP
table sidesteps this problem by identifying species using IUPAC standard
InChIs, a choice unpopular with many practitioners. To facilitate the
use of colloquial species designations (``ethyl alcohol''), this
specification also defines a \textit{species table} associating common
names and sum formulas with InChIs in section \ref{sect:speciestable}.

When accessed using the Table Access Protocol TAP
\citep{2019ivoa.spec.0927D}, the table can be queried using the
\citep{2019ivoa.spec.0927D}, the tables can be queried using the
expressive SQL-derived query language ADQL, while query results are
available in the VOTable format, easily readable by VO client
applications. Line databases accessible in this way can be registered
@@ -220,6 +228,13 @@ \subsection{Credit}
repository of line data, it should be as simple as possible for users to
give credit to the contributors of line data.

\subsection{Resolution of Molecule Designation}
\label{uc:resolution}

A researcher wants to find lines for a molecule they have long been
calling ``Methyl Mercaptan'' or designating by a pseudo-structural
formula like \verb|CH3SHv=0|.


\subsection{Non-Use Cases}

@@ -235,6 +250,7 @@ \subsection{Non-Use Cases}
\end{itemize}



\begin{table}[hpt]
\hskip -0.05\linewidth
\begin{tabular}{p{0.43\linewidth}cp{0.5\linewidth}}
@@ -280,7 +296,7 @@ \subsection{Non-Use Cases}
\end{table}


\section{Spectral Line Data}\label{sect:quantities}
\section{Spectral Lines Table}\label{sect:quantities}

Table~\ref{tab:ltcols} gives the columns that make up the LineTAP
relational model. Implementations MUST have all columns given in this
@@ -379,12 +395,53 @@ \section{Spectral Line Data}\label{sect:quantities}

\end{itemize}

\section{Species Table}\label{sect:speciestable}
\label{ref:speciestable}

The species table is used to facilitate the referencing of molecules. As
there are many summary formulas and colloquial molecule names for common
species (and more than one species may correspond to a given summary
formula and even colloquial name), the resolution of such identifiers to
InChIs is generally non-trivial.

\section{Protocol}
\label{sect:protocol}
\subsection{Queries: LineTAP}
LineTAP's species table contains a mapping between common names and
summary formulas and InChIs. It should be populated by data providers
publishing molecule data to the best of their knowledge. It is
explicitly possible to associate multiple names with a single InChI.
There is no explicit relationship between a species table and LineTAP
tables on a given service, i.e., the presence of a species in the
species table is not a guarantee that data on it is available from any
table in the service.

In most cases, the InChIKey alone is enough to reference a molecule. The InChI
column is present in this table so that users can confirm that the
returned molecule is the one they are searching for.

\begin{table}[hpt]
\hskip -0.05\linewidth
\begin{tabular}{p{0.43\linewidth}cp{0.5\linewidth}}
\sptablerule
\textbf{Name [Unit]} \ucd{UCD}&\textbf{Type}&\textbf{Description}\\
\sptablerule
% GENERATED: python3 make-species-table.py
\texttt{inchikey} \hfil\break\ucd{} & text & \raggedright InChIKey of this species\tabularnewline
\rowsep
\texttt{inchi} \hfil\break\ucd{} & text & \raggedright InChI of this species\tabularnewline
\rowsep
\texttt{name} \hfil\break\ucd{} & text & \raggedright A common name of this species\tabularnewline
\rowsep
\texttt{formula} \hfil\break\ucd{} & text & \raggedright Chemical formula of this species in some free-ish notation\tabularnewline
\rowsep
\texttt{source\_id} \hfil\break\ucd{} & text & \raggedright VAMDC identifier of the origin of this mapping\tabularnewline

\subsection{User-defined functions}
% /GENERATED
\sptablerule
\end{tabular}
\caption{The columns that make up the Species Table. }
\label{tab:spcols}
\end{table}

\section{ADQL User-defined functions}
\label{sect:udfs}

LineTAP services MUST implement the \texttt{ivo\_specconv} user defined
@@ -541,6 +598,24 @@ \subsubsection{Characterising a Service's Data Holdings}
GROUP BY inchi
\end{lstlisting}

\subsubsection{Searching With Trivial Molecule Names}

Searching with trivial names as discussed in use
case~\ref{uc:resolution} would often be a two-step process where clients
ask the researcher which InChI would correspond to the species they
were looking for. In simple cases, however, a single joined query can be
run, too.

% please-run-a-test
\begin{lstlisting}[language=SQL]
SELECT
*
FROM casa_lines.line_tap
JOIN species.main as s USING (inchikey)
WHERE s.name='Methylidynium'
\end{lstlisting}
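
The two-step process mentioned above can be sketched client-side as
follows. This is a minimal illustration only: it uses an in-memory SQLite
database as a stand-in for a TAP service, the table layouts are reduced to
the columns needed here, and the keys (\verb|KEY-ETHANOL|, \verb|KEY-DME|),
the line titles, and the helper names are hypothetical placeholders rather
than real InChIKeys, real line data, or anything defined by this
specification.

```python
import sqlite3

# Hypothetical stand-ins for a species table and a line table.
# The keys are placeholders, NOT real InChIKeys.
SPECIES_ROWS = [
    ("KEY-ETHANOL", "ethanol", "C2H6O"),
    ("KEY-ETHANOL", "ethyl alcohol", "C2H6O"),
    ("KEY-DME", "dimethyl ether", "C2H6O"),
]
LINE_ROWS = [
    ("KEY-ETHANOL", "placeholder ethanol line"),
    ("KEY-DME", "placeholder dimethyl ether line"),
]


def make_demo_db():
    """Build an in-memory database mimicking the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE species (inchikey TEXT, name TEXT, formula TEXT)")
    conn.execute("CREATE TABLE line_tap (inchikey TEXT, title TEXT)")
    conn.executemany("INSERT INTO species VALUES (?, ?, ?)", SPECIES_ROWS)
    conn.executemany("INSERT INTO line_tap VALUES (?, ?)", LINE_ROWS)
    return conn


def candidate_keys(conn, designation):
    """Step 1: all keys whose name or formula equals the designation."""
    rows = conn.execute(
        "SELECT DISTINCT inchikey FROM species"
        " WHERE name = ? OR formula = ? ORDER BY inchikey",
        (designation, designation))
    return [r[0] for r in rows]


def lines_for(conn, inchikey):
    """Step 2: line titles for the key the user has selected."""
    rows = conn.execute(
        "SELECT title FROM line_tap WHERE inchikey = ? ORDER BY title",
        (inchikey,))
    return [r[0] for r in rows]
```

A colloquial name such as ``ethyl alcohol'' yields a single candidate here,
so step 2 can run immediately; the sum formula \verb|C2H6O| yields two
candidates (ethanol and dimethyl ether share it), which is the situation in
which a client would ask the researcher to choose before querying lines.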


\section{Mapping from VAMDCXSAMS}
\label{sect:mapping}

@@ -665,16 +740,13 @@ \section{LineTAP and the VO Registry}

\subsection{Registering LineTAP-conforming Tables}

LineTAP tables are registered using VODataService \citep{2021ivoa.spec.1102D}
LineTAP line tables are registered using VODataService \citep{2021ivoa.spec.1102D}
tablesets, where the table utype is set to
$$\hbox{\verb|ivo://ivoa.net/std/linetap#table-1.0|}.$$
$$\hbox{\verb|ivo://ivoa.net/std/linetap#lines-1.0|}.$$

The tableset is normally contained in a VODataService \xmlel{CatalogService}
record with a TAP capability, and this capability normally is an auxiliary
capability as per DDC \citep{2019ivoa.spec.0520D}. For one-table
services a full TAPRegExt \citep{2012ivoa.spec.0827D} capability is also
allowed; other resource types can be used for registration as
appropriate.
The tableset is contained in a VODataService \xmlel{CatalogResource}
record with a TAP auxiliary capability
as per DDC \citep{2019ivoa.spec.0520D}.

Further capabilities, for instance for full VAMDC or legacy SLAP
services, may be given in the same record.
@@ -714,7 +786,7 @@ \subsection{Registering LineTAP-conforming Tables}
<name>toss.ivoa_lines</name>
<title>TOSS</title>
<description> The LineTAP version of...</description>
<utype>ivo://ivoa.net/std/linetap#table-1.0</utype>
<utype>ivo://ivoa.net/std/linetap#lines-1.0</utype>
...
</table>
\end{lstlisting}
@@ -726,6 +798,12 @@ \subsection{Registering LineTAP-conforming Tables}
and is thus to be expected in most registrations of this type. Clients
are advised to use the resource description for full text searches.

Species tables are registered in exactly the same way, except their
utype is
$$\hbox{\verb|ivo://ivoa.net/std/linetap#species-1.0|}.$$
Data providers should only register line and species tables in one
resource record if the species table really has the same metadata
(description, author, source, etc.) as the line table.

\subsection{Discovering LineTAP services}

@@ -738,35 +816,34 @@ \subsection{Discovering LineTAP services}
would return TAP access URLs and the table names:

\begin{lstlisting}[language=SQL]
SELECT DISTINCT table_name, access_url
SELECT table_name, access_url
FROM rr.res_table
NATURAL JOIN rr.capability
NATURAL JOIN rr.interface
WHERE
table_utype LIKE 'ivo://ivoa.net/std/linetap#table-1.%'
table_utype LIKE 'ivo://ivoa.net/std/linetap#lines-1.%'
AND standard_id LIKE 'ivo://ivoa.net/std/tap%'
AND intf_role='std'
AND res_type='vs:catalogresource'
\end{lstlisting}

The \texttt{DISTINCT} in the main query is a rough filter that removes
entries duplicated because their tables are registred both in the main
TAP record and in an auxiliary capability.

The wildcard pattern in the utype match is to make sure minor version
increments do not prevent service discovery; by IVOA versioning rules,
all LineTAP services of major version 1 can be operated by all LineTAP
clients of version 1. We do not constrain the version of the TAP
service. Clients may want to adapt the TAP discovery pattern to match
their specific needs.


Adapting the utype, this query will work analogously for species tables.
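
To illustrate how the utype pattern behaves for both table kinds, the
following sketch mirrors the \verb|LIKE '...#lines-1.%'| prefix match on
the client side. The helper names \verb|utype_prefix| and \verb|matches|
are hypothetical and not part of LineTAP or of any library; the utype
strings are the ones given in this section.

```python
LINETAP_STD = "ivo://ivoa.net/std/linetap"


def utype_prefix(kind, major=1):
    """Return the fixed part of a pattern like '...#lines-1.%'.

    kind is "lines" or "species"; everything after the dot following
    the major version is left open, as the % wildcard does in SQL.
    """
    if kind not in ("lines", "species"):
        raise ValueError(f"unknown LineTAP table kind: {kind!r}")
    return f"{LINETAP_STD}#{kind}-{major}."


def matches(utype, kind, major=1):
    """True if utype denotes a table of this kind and major version."""
    return utype.startswith(utype_prefix(kind, major))
```

With these helpers, \verb|matches(u, "species")| accepts
\verb|ivo://ivoa.net/std/linetap#species-1.0| as well as any later minor
version, while the pre-change utype ending in \verb|#table-1.0| no longer
matches the \verb|"lines"| kind.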

\appendix
\section{Changes from Previous Versions}
\section{Changes from WD-2023-03-23}

No previous versions yet.
% these would be subsections "Changes from v. WD-..."
% Use itemize environments.
\begin{itemize}
\item Adding the species table
\item Changing the line table utype to \dots lines-1.0 (rather than
\dots table-1.0 before).
\end{itemize}


\bibliography{ivoatex/ivoabib,ivoatex/docrepo, localrefs}
2 changes: 1 addition & 1 deletion Makefile
@@ -7,7 +7,7 @@ DOCNAME = LineTAP
DOCVERSION = 1.0

# Publication date, ISO format; update manually for "releases"
DOCDATE = 2023-03-23
DOCDATE = 2024-09-18

# What is it you're writing: NOTE, WD, PR, REC, PEN, or EN
DOCTYPE = WD
24 changes: 15 additions & 9 deletions linetap.vor
@@ -1,11 +1,11 @@
<ri:Resource
xsi:type="vstd:Standard"
created="2020-10-26T11:44:00"
<ri:Resource
xsi:type="vstd:Standard"
created="2020-10-26T11:44:00"
updated="2020-10-26T11:44:00"
status="active"
xmlns:vr="http://www.ivoa.net/xml/VOResource/v1.0"
xmlns:vstd="http://www.ivoa.net/xml/StandardsRegExt/v1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:vr="http://www.ivoa.net/xml/VOResource/v1.0"
xmlns:vstd="http://www.ivoa.net/xml/StandardsRegExt/v1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ri="http://www.ivoa.net/xml/RegistryInterface/v1.0"
xsi:schemaLocation="http://www.ivoa.net/xml/VOResource/v1.0
http://www.ivoa.net/xml/VOResource/v1.0
@@ -16,7 +16,7 @@

<title>IVOA Relational model for Spectral Lines (LineTAP)</title>
<shortName>linetap</shortName>
<identifier>ivo://ivoa.net/std/linetap</identifier>
<identifier>ivo://ivoa.net/std/linetap</identifier>
<curation>
<publisher>IVOA</publisher>

@@ -61,8 +61,14 @@
<endorsedVersion status="wd">1.0</endorsedVersion>

<key>
<name>table-1.0</name>
<description>The LineTAP table schema as of version 1.0.
<name>lines-1.0</name>
<description>The LineTAP lines table schema as of version 1.0.
</description>
</key>

<key>
<name>species-1.0</name>
<description>The LineTAP species table schema as of version 1.0.
</description>
</key>

76 changes: 76 additions & 0 deletions make-species-table.py
@@ -0,0 +1,76 @@
#!/usr/bin/python3
"""
This writes LaTeX for the rows of our table of LineTAP species table
columns. Technically, this obtains the info from the standard columns of an
operational (and hopefully validated) table at dc.g-vo.org.
Dependency: python3-pyvo (and hence astropy).
"""

import pyvo

NON_NULL_COLUMNS = {'title', 'vacuum_wavelength'}
TYPE_MAP = {
("char", "*"): "text",
("unicodeChar", "*"): "text",
("int", ""): "integer",
("double", ""): "float",}


def e(tx):
"""returns tx with TeX's standard active (and other magic) characters
escaped.
"""
return tx.replace("\\", "$\\backslash$"
).replace("&", "\\&"
).replace("#", "\\#"
).replace("%", "\\%"
).replace("_", "\\_"
).replace("}", "\\}"
).replace("{", "\\{"
).replace('"', '{"}')


def get_type(datatype, arraysize, nonnull):
"""returns a simple type identifier for a VOTable datatype/arraysize.
Well, this really only knows what people have manually entered into
TYPE_MAP above...
"""
res = e(TYPE_MAP[datatype, arraysize])
if nonnull:
res = f"\\textbf{{{res}}}"
return res


def main():
svc = pyvo.tap.TAPService("http://dc.g-vo.org/tap")
rows = []

for row in svc.run_sync("""
select column_name, description, unit, ucd, datatype, arraysize
from tap_schema.columns
where
table_name='species.main'
order by column_index"""):
parts = [r"\texttt{{{}}}".format(e(row["column_name"]))]
if row["unit"]:
parts.append(e("["+row["unit"].replace("Angstrom", "Å")+"]"))
parts.append(r"\hfil\break\ucd{{{}}}".format(e(row["ucd"])))

parts.append("&")
parts.append(get_type(
row["datatype"],
row["arraysize"],
row["column_name"] in NON_NULL_COLUMNS))

parts.append("&")
parts.append(r"\raggedright "+e(row["description"]))

rows.append(" ".join(parts)+r"\tabularnewline")

print("\n\\rowsep\n".join(rows))


if __name__=="__main__":
main()
