Search Enhancement: search bar should search for the preprint if it's not already in the database #89
A couple of things to note:
The sources that have APIs and documentation, which I can hook into both search and resolution when a review is requested:
If there are other places to search, can we add them here as a prioritized list?
@prereview has funded $100.00 to this issue.
Because the search field says "Search preprints with PREreviews or requests for review by DOI, arXiv ID or title", I expect that if I paste a DOI into that field, it will fetch the DOI metadata on the fly and display the paper in PREreview, letting me request a review or add one myself. Currently, it returns no results if the paper has not been added to PREreview before. Recognizing such IDs and fetching the corresponding metadata from the relevant services would perhaps be a good first step towards this issue. It is a lot easier than arbitrary search: fetching metadata for a known ID is much cheaper than searching by free text. In my experience, querying multiple third-party search APIs to return results to the user in real time is a bit brittle. We used to do this in https://dissem.in/ and it was pretty slow (for instance, Crossref's API can be less reliable at times). We now ingest the sources proactively into our database (which is a challenge of its own given the size of these sources, of course).
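Recognizing DOIs and arXiv IDs before falling back to free-text search, as suggested here, could look something like the following. It is a sketch under assumptions: `classify_query`, `metadata_url` and `fetch_metadata` are hypothetical names, the regular expressions cover common DOI and arXiv ID shapes rather than every valid identifier, and the lookups target the public Crossref REST API (`https://api.crossref.org/works/{DOI}`) and the arXiv API (`http://export.arxiv.org/api/query?id_list=...`), which may not be the services PREreview ends up using.

```python
import re
import urllib.request
from typing import Optional
from urllib.parse import quote

# Accepts a bare DOI, a doi.org URL, or a "doi:" prefix (simplified pattern, an assumption).
DOI_RE = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)

# New-style arXiv IDs (2101.00001, 2101.00001v2) and old-style ones (hep-th/9901001).
ARXIV_RE = re.compile(
    r"^(?:arxiv:|https?://arxiv\.org/abs/)?"
    r"(\d{4}\.\d{4,5}(?:v\d+)?|[a-z-]+(?:\.[a-z]{2})?/\d{7}(?:v\d+)?)$",
    re.IGNORECASE,
)


def classify_query(query: str) -> tuple[str, str]:
    """Return ("doi" | "arxiv" | "title", normalized value) for a search-box query."""
    q = query.strip()
    if m := DOI_RE.match(q):
        return "doi", m.group(1)
    if m := ARXIV_RE.match(q):
        return "arxiv", m.group(1)
    return "title", q


def metadata_url(kind: str, ident: str) -> Optional[str]:
    """Build a metadata lookup URL for a known identifier; titles have no cheap lookup."""
    if kind == "doi":
        return "https://api.crossref.org/works/" + quote(ident, safe="/")
    if kind == "arxiv":
        return "http://export.arxiv.org/api/query?id_list=" + quote(ident, safe="/")
    return None


def fetch_metadata(query: str, timeout: float = 10.0) -> Optional[bytes]:
    """Fetch raw metadata (Crossref JSON or arXiv Atom XML) for an ID-like query."""
    kind, ident = classify_query(query)
    url = metadata_url(kind, ident)
    if url is None:
        return None  # free-text title: this still needs a real search backend
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

A free-text title makes `fetch_metadata` return `None` without any network call, because title search is the expensive case the comment warns about; only ID-shaped queries trigger a single, cheap lookup.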
@harumhelmy this one seems like a good place to start. I have a proposal for your search implementation, if you haven't already considered it. The company I currently work at recently implemented search for a product by building it from scratch, and we found this limiting compared to using a third-party search API such as those provided by Azure or Google. Would you be interested in integrating one of these cloud search engines into the application? It would work like this: your documents live in your database; you submit an index to Azure or Google, and then use their APIs to run your search queries. This gives you a host of features, such as search suggestions and a more powerful search engine. For this project, I would suggest proactively building an index from all the search sources, including your own data and third-party data as @wetneb suggests, submitting it to one of these services, and querying their API for search. Since you are already using Azure, you might consider a closer integration: its search service can pull and index data from an Azure database without much glue code, if you don't mind the platform dependence.
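The index-first approach proposed here (ingest internal and third-party records ahead of time, then send queries only to the index) can be illustrated with a toy in-memory inverted index. This is not the Azure or Google API; `Preprint`, `SearchIndex`, the source labels, and the rule that the first ingested copy of a DOI wins are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Preprint:
    doi: str
    title: str
    source: str  # hypothetical labels such as "prereview", "crossref", "arxiv"


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a hosted service would add stemming, synonyms, etc."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Toy inverted index standing in for a hosted search service."""

    def __init__(self) -> None:
        self._docs = {}                   # normalized DOI -> Preprint
        self._postings = defaultdict(set)  # token -> set of normalized DOIs

    def ingest(self, preprints) -> None:
        """Add documents ahead of query time; the first copy of a DOI wins."""
        for p in preprints:
            key = p.doi.lower()  # DOIs are case-insensitive
            if key in self._docs:
                continue
            self._docs[key] = p
            for token in tokenize(p.title):
                self._postings[token].add(key)

    def search(self, query: str) -> list[Preprint]:
        """Return documents whose titles contain every query token, sorted by title."""
        tokens = tokenize(query)
        if not tokens:
            return []
        keys = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted((self._docs[k] for k in keys), key=lambda p: p.title)
```

Ingesting PREreview's own records before third-party ones means the internal copy of a preprint takes precedence. A hosted service would replace `SearchIndex`, while the proactive ingestion step stays conceptually the same.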
I'm looking to work on this issue; has anyone read my proposal above?
@TheGuardianWolf sorry for the delay here! I think this might be a good solution! The rub is that we're also separately working on transitioning the site's backend to Postgres (it's currently on CouchDB). I don't know much about integrating cloud search engines yet, so I'm wondering: how reusable would your fix be with a Postgres backend?
For this situation, let's imagine I've finished implementing the cloud solution. The end products are:
Of these, the only thing that needs to be rewritten is the indexer for internal data, since it would have to fetch via SQL rather than NoSQL. Because you are moving from NoSQL to SQL, I imagine there will be a moderate amount of schema change; I can abstract the data fetching away from the database as much as possible to minimise the time spent rewriting that part. It would be good to get more information about the current data structure versus the proposed new one, along with any existing data indexing processes.
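Abstracting data fetching away from the database, as proposed here, might look like this: the indexer depends only on a small `PreprintSource` interface, so moving from CouchDB to Postgres means writing one new source class rather than a new indexer. All class names, document field names, and the table schema are hypothetical, and `sqlite3` stands in for Postgres only so the example runs without a server.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class PreprintRecord:
    doi: str
    title: str


class PreprintSource(Protocol):
    """The only thing the indexer needs from a database."""

    def iter_preprints(self) -> Iterator[PreprintRecord]: ...


class DocumentStoreSource:
    """Stand-in for the current CouchDB data: JSON-like documents (field names are hypothetical)."""

    def __init__(self, docs: Iterable[dict]) -> None:
        self._docs = list(docs)

    def iter_preprints(self) -> Iterator[PreprintRecord]:
        for doc in self._docs:
            if doc.get("type") == "preprint":
                yield PreprintRecord(doi=doc["doi"], title=doc["name"])


class SqlSource:
    """Stand-in for the planned Postgres backend, using sqlite3 so the sketch runs anywhere."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def iter_preprints(self) -> Iterator[PreprintRecord]:
        # Table and column names are hypothetical.
        for doi, title in self._conn.execute("SELECT doi, title FROM preprints ORDER BY doi"):
            yield PreprintRecord(doi=doi, title=title)


def build_index_documents(source: PreprintSource) -> list[dict]:
    """Indexer logic: unchanged when the backend moves from NoSQL to SQL."""
    return [{"id": r.doi, "title": r.title} for r in source.iter_preprints()]
```

Both sources produce identical index documents for the same data, which is the property that keeps the rewrite confined to the data-access layer.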
Would you be able to give me a working invite to your Slack team? The one in the README seems to be dead :( I think we could discuss this more effectively via chat.
I'm logging off for the evening (EDT here), but for a bit more (vague-ish) context: the new data structure is still a work in progress, but we should finalize it on Tuesday or shortly after. In the meantime, I'll dig up a spreadsheet that might help elucidate the current data structure.
Unfortunately I can't accept a shared channel request; I don't have the paid version of Slack! For the repo, could I suggest https://gitter.im for developer discussions? Unless there's a better solution I'm not aware of.
Sorry for the bad invite 😅 Can you try this one: https://join.slack.com/t/prereview/shared_invite/zt-9qpk9pc5-6fsyuI6hwMuenjusPxDTCw
@harumhelmy @rudietuesdays Am I correct in saying this issue is now also linked to the New Merge Platform project as issue #14?
@murkatr Correct, though part of the implementation of this in the new merged platform is also tied to building the API. Either way, it is related to the work taking place in the new merged platform. cc @harumhelmy
If a user searches for a preprint that is not already in the database, look up the preprint on various servers and allow the user to request or add a review.
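The behaviour requested in this issue (check the local database first, otherwise look the preprint up on external servers and store it so a review can be requested or added) could be wired up roughly as follows. This is a hedged sketch: `find_or_import`, the dict standing in for the database, and the resolver callables are hypothetical, and the resolvers are assumed to follow the prioritized list of sources asked for at the top of the thread. In practice each resolver would wrap an ID-specific lookup, such as Crossref for DOIs.

```python
from typing import Callable, Optional

# A resolver maps an identifier to metadata, or returns None if it does not know it.
Resolver = Callable[[str], Optional[dict]]


def find_or_import(
    identifier: str, local_db: dict, resolvers: list[Resolver]
) -> tuple[Optional[dict], str]:
    """Return (metadata, status) where status is "local", "imported" or "not_found"."""
    key = identifier.strip()
    if key in local_db:
        return local_db[key], "local"
    for resolve in resolvers:  # assumed to be ordered by priority
        try:
            meta = resolve(key)
        except (OSError, ValueError):
            continue  # a slow or failing third-party API should not break search
        if meta is not None:
            local_db[key] = meta  # persist so the preprint can now be reviewed or requested
            return meta, "imported"
    return None, "not_found"
```

Catching only `OSError` (which urllib's `URLError` subclasses) and `ValueError` (which `json.JSONDecodeError` subclasses) skips a resolver on network failures or malformed responses while still surfacing programming errors.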
IssueHunt Summary
Backers (Total: $100.00)