GO-CAM production triple store 5-10x slower #738

Closed
lpalbou opened this issue Sep 7, 2021 · 8 comments

@lpalbou
Contributor

lpalbou commented Sep 7, 2021

This query used to take 1-2s and now takes about 9s:
https://api.triplydb.com/s/SEOpV8MLi

Similarly, a query to check which GO-CAMs would have a gene product now takes about 8s where it used to be on average 1-2s:
https://api.triplydb.com/s/t8xFw4jHB

Last week I received a message from the Alliance, as we were surprised to see the number of GO-CAMs being updated long after the page had loaded. This never happened before because that query was much faster.

This currently affects the Alliance pathway section, where the number of available GO-CAMs for a given gene only gets updated after a long delay.

Tag @balhoff @kltm

Note: this is possibly a follow-up to #736.
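
For anyone who wants to compare the before/after timings reported above in a repeatable way, here is a minimal sketch of a timing harness. It is only an illustration: `time_query` and the `run_query` callable are hypothetical names, and `run_query` stands in for whatever function actually submits the SPARQL query to the triple store.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run a query callable several times and summarize wall-clock latency.

    `run_query` is a hypothetical stand-in for the code that sends the
    SPARQL query to the endpoint; it is not part of any GO tooling.
    """
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Reporting the median over several runs, rather than a single measurement, makes it easier to tell a real slowdown from a one-off spike caused by other load on the machine.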

@cmungall
Member

cmungall commented Sep 7, 2021

Hmm, do we have any idea if the reason is:

  • more models (now including mod imports?)
  • increased traffic on server
  • change in configuration of blazegraph or server on which it runs

I believe the client is not issuing the queries itself; it is going through the main GO API. If so, we can implement some kind of caching. Obviously this is not ideal, however.
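
As a rough illustration of the caching idea, here is a minimal sketch of a time-based cache placed in front of the query call. Everything in it is an assumption made for illustration: `TTLCache`, `fetch`, and the five-minute default are hypothetical and do not describe how the GO API is implemented.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny time-based cache keyed by query string (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[str], object],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # assumed: function that runs the query on the triple store
        self._ttl = ttl_seconds  # assumed freshness window; tune to how often models change
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, query: str) -> object:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the slow query
        result = self._fetch(query)
        self._entries[query] = (now, result)
        return result
```

The trade-off is the one noted above: cached answers can lag behind newly added or updated models by up to the TTL, so this hides the latency rather than fixing its cause.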

@balhoff
Member

balhoff commented Sep 7, 2021

The first query runs slightly faster on my laptop (was just testing). @kltm could you describe the machine the triplestore runs on? Is the database on an SSD?

@lpalbou
Contributor Author

lpalbou commented Sep 7, 2021

@cmungall 1) is unlikely, as in prod we already had about 3400 models, unless we started adding graphs that don’t have the right gocam type.

2) I wish, but I think we can keep the champagne cold, as I highly doubt the usage of that triple store suddenly increased.

I would think it’s 3), or that we started adding graphs without the proper gocam type or with a lot of metadata (neo?).

@balhoff
Member

balhoff commented Sep 7, 2021

@lpalbou this query is somewhat faster: https://api.triplydb.com/s/kceuMqK0N

What is the purpose of the triple pattern ?s enabled_by:/rdf:type ?identifier . in the query? Can it be removed?

@balhoff
Member

balhoff commented Sep 7, 2021

@lpalbou this query is much faster: https://api.triplydb.com/s/i05GEjcbV

It's not exactly the same; it has 3 additional results. This is because it allows for inferred links between the MF nodes. Does it suit your needs?

@kltm
Member

kltm commented Sep 7, 2021

@balhoff There has been no change to machine, software, or configuration that I'm aware of. It is a spinning media RAID setup with a large amount of RAM allocated to blazegraph instances on a fairly beefy machine. The machine is shared with other instances of blazegraph.

@vanaukenk

@kltm - would you like to leave this open?

@kltm
Member

kltm commented Jun 8, 2022

I think this can be closed for now--we're dealing with related issues elsewhere (e.g. geneontology/api-gorest-2021#5)

@kltm kltm closed this as completed Jun 8, 2022