Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADQLParser is not thread safe #128

Open
vforchi opened this issue Apr 6, 2021 · 13 comments
Open

ADQLParser is not thread safe #128

vforchi opened this issue Apr 6, 2021 · 13 comments
Assignees
Labels

Comments

@vforchi
Copy link
Contributor

vforchi commented Apr 6, 2021

I noticed that the ADQLParser is not thread safe: I was reusing the same instance and noticed that the generated queries were incorrect.
This is not necessarily a bug, but it should be mentioned in the documentation.

@gmantele
Copy link
Owner

gmantele commented Apr 8, 2021

I am actually quite surprised it is not thread safe currently. It should be, or at least I wanted it to be thread safe. I will have to investigate that.

Do you have lead on what was kept from one call to another with the same instance? I could probably find out, but if you already have an idea, it would spare me time :-).

@gmantele gmantele self-assigned this Apr 8, 2021
@gmantele gmantele added the bug label Apr 8, 2021
@vforchi
Copy link
Contributor Author

vforchi commented Apr 8, 2021

No, I just noticed that the generated queries were wrong and that I had a shared object.

@gmantele
Copy link
Owner

gmantele commented Apr 8, 2021

It is good to know that now. The parser has to be adapted anyway in order to support both ADQL-2.0 and -2.1. So, I will certainly look into this issue before releasing this new version of vollt/ADQL.

@theresadower
Copy link
Contributor

theresadower commented Apr 20, 2021

We have recently received a report at MAST from Brian van Klaveran at SLAC that some sync TAP queries are throwing errors that look like a parser threading issue. Sync queries to MAST's TAP services directly call my fork of the ADQL library with additional MAST-specific geometry support, but the issue is happening with queries that are not using the additional MAST-specific translation.

From the report:

I am able to reproduce this with the following script (doing so from California, in case latency makes any kind of difference):


#!/bin/bash
curl -s -o test1.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test2.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test3.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
curl -s -o test4.table "https://vao.stsci.edu/CAOMTAP/TapService.aspx/sync?REQUEST=doQuery&LANG=ADQL&QUERY=SELECT+*+FROM+TAP_SCHEMA.tables+WHERE+schema_name+like+'ivoa'" &
sleep 2
diff -q test1.table test2.table
diff -q test2.table test3.table
diff -q test3.table test4.table


The errors tend to vary, but it sounds like the service sometimes is processing a garbled query - possibly some kind of thread safety issue with the ADQL parser?

@gmantele
Copy link
Owner

Very interesting...and very surprising (because I never experienced this error).
Thank you @theresadower for this small script. I succeeded to reproduce this issue.

@theresadower
Copy link
Contributor

@gmantele you're welcome! There is still the very real possibility our error is in a different place, but it seemed a useful starting point, and may be able to be successfully tested against other services.

@gmantele
Copy link
Owner

After a quick look into all of this, I already have a bit of an answer:

  • Indeed, ADQLParser is not thread-safe, and on the contrary to what I said before, it can not be otherwise. The parsing needs several variables (managed by JavaCC) which depends on the current parsing state. So that will never be different and I was aware of that while using it for running queries in TAP. @vforchi , as you suggested, I have to write somewhere in the Javadoc + Documentation that ADQLParser is not thread-safe.

  • @theresadower , the reported issue is, as I said, very surprising. For each query (synchronous or not), a new query execution context (meaning a new ADQLParser with a new JDBCConnection and so ADQLTranslator) is created and attached to a single Thread. When the query is over, this context is discarded. For the moment, as far as I can imagine, the only way there is a thread concurrency would be if the same translator/connection or parser was provided by the TAP/ADQLFactory. And even though, the ADQLTranslator should not keep context related data ; instead of ADQLParser it is supposed to be thread safe. How do you specify the translator to use in your TAP implementation? Do you use a tap.properties file?

@gmantele
Copy link
Owner

@theresadower , you should also have a stack-trace somewhere in your log file related to such error (i.e. prefixed by ADQL Translation Error). If you can find any and share it with me, it would be very helpful to understand the problem.

@brianv0
Copy link

brianv0 commented Apr 20, 2021

Just a comment, since the code is in Java, you could wrap the parser entry with an ReentrantLock as a quick fix - somewhere around the parseQuery's or reuse ReInit (to do the lock) in ADQLParser.java

@gmantele
Copy link
Owner

FYI, I just tried this script against 2 other services using VOLLT/TAP-Lib (with Postgres+PgSphere) and no problem at all. So I suspect something going wrong with your translator or the way it is created/provided....but knowing so few for the moment, I may be wrong. I continue my investigations :-)

@theresadower
Copy link
Contributor

Ah, yes, we are cacheing and reusing parser/translator objects at some level, having assumed thread safety. This was done because building the full schema and geometry and UDF awareness is expensive so I wanted to avoid doing so per query. I should be able to move the level at which we're cacheing these data structures to avoid this.

@gmantele
Copy link
Owner

gmantele commented Apr 20, 2021

Just a comment, since the code is in Java, you could wrap the parser entry with an ReentrantLock as a quick fix - somewhere around the parseQuery's or reuse ReInit (to do the lock) in ADQLParser.java

Good idea. However, there are other functions (parsing functions or not) able to change anything in the class that may be used during the parsing process. Then, it would mean putting locks or synchronized at a lot of different places in the ADQLParser, which could lead to undesired lock side effects. I keep the idea though...it might be helpful somewhere else in the parsing or TAP requests processing.

@gmantele
Copy link
Owner

Ah, yes, we are cacheing and reusing parser/translator objects at some level, having assumed thread safety. This was done because building the full schema and geometry and UDF awareness is expensive so I wanted to avoid doing so per query. I should be able to move the level at which we're cacheing these data structures to avoid this.

Links between the ADQL items such as the column and table references and their corresponding TAP_SCHEMA counterparts is already done by DBChecker. So, at translation time, you should have all the TAP_SCHEMA metadata you need attached to the element to translate.

About UDF, depending on the way you declare and use them, such information may already be available during the translation as well.

Spoiler alert: in the coming version of TAP, UDF management will be highly easier as there will be the possibility to give an SQL translation pattern (so a simple string like my_udf_in_sql($1, $2)) in tap.properties which would avoid writing a Java class as required now.

And finally, information like allowed geometries and UDFs are already shared to all threads through ServiceConnection (only one instance for all threads).

Generally speaking, having a link with ServiceConnection will most of the time resolve your problem about accessing TAP_SCHEMA metadata and configuration.

Since this issue is now going a bit off topic, @theresadower , don't hesitate to contact me directly by email if you need help for these modifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants