Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Juju times out fetching charm metadata #551

Open
ChrisMacNaughton opened this issue Sep 17, 2018 · 19 comments
Open

Juju times out fetching charm metadata #551

ChrisMacNaughton opened this issue Sep 17, 2018 · 19 comments

Comments

@ChrisMacNaughton
Copy link

Conjure up regularly fails with a timeout fetching charm metadata during a deploy. This makes it impossible to use conjure-up to deploy any solution as the charms are unavailable.

conjure-up timeout

@hatched
Copy link

hatched commented Sep 17, 2018

There are some speed improvements that need to be done to the charmstore but in the meantime you can start by not requesting so much information in that request.

To deploy a charm you shouldn't need a number of those pieces of data. Especially things like stats and bundles-containing both of which are slow requests and not required to deploy. You can also try increasing the timeout from 5s to 10s.

@ryan-beisner
Copy link

ryan-beisner commented Sep 17, 2018

I still get "0 items" pretty frequently at https://jujucharms.com/u/openstack-charmers-next - could this be related?

@ryan-beisner
Copy link

screenshot from 2018-09-17 10-53-21

gnuoy pushed a commit to gnuoy/python-libjuju that referenced this issue Sep 17, 2018
The charm store often does not respond in 5s. Increase the timeout
to 20s. Charm store issue being tracked in canonical/jujucharms.com#551
. Closes issue 261.
johnsca pushed a commit to juju/python-libjuju that referenced this issue Sep 17, 2018
* Increase the timeout for charmstore connections

The charm store often does not respond in 5s. Increase the timeout
to 20s. Charm store issue being tracked in canonical/jujucharms.com#551
. Closes issue 261.

* Allow charm store timeout to be configurable
@ChrisMacNaughton
Copy link
Author

More timeouts

Traceback (most recent call last):
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 57, in _get
    timeout=self.timeout)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/raven/breadcrumbs.py", line 326, in send
    resp = real_send(self, request, *args, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/requests/adapters.py", line 521, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.jujucharms.com', port=443): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/conjureup/controllers/juju/deploy/common.py", line 30, in do_deploy
    await app.juju.client.deploy(fn)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 1281, in deploy
    await handler.execute_plan()
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 1925, in execute_plan
    result = await method(*step.args)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 2067, in deploy
    charm)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 1338, in _add_store_resources
    entity = await self.charmstore.entity(entity_url)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/juju/model.py", line 2161, in coro
    return await self.loop.run_in_executor(None, method)
  File "/snap/conjure-up/1021/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 136, in entity
    return self._meta(entity_id, includes, channel=channel)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 107, in _meta
    data = self._get(url)
  File "/snap/conjure-up/1021/lib/python3.6/site-packages/theblues/charmstore.py", line 78, in _get
    raise ServerError(message)
theblues.errors.ServerError: Request timed out: https://api.jujucharms.com/charmstore/v5/glance-268/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=owner&include=revision-info&include=published&include=stats&include=resources&include=supported-series&include=terms timeout: 5

@ryan-beisner
Copy link

The CS: timeout issues are prevalent in our CI as well. So to summarize: end users browsing jujucharms.com have a frequent error condition (0 items shown), conjure-up has a high failure rate for end users, my developers are often blocked merges which cannot complete a CI test. Please investigate the back-end of CS.

@fnordahl
Copy link
Member

At present I am greeted with this error on charm search queries like keystone and ceph:
jujucharms_com_search_timeout

@ajkavanagh
Copy link

I'm also greeted with this error:

image

Usually, refreshing solves the problem.

@jcsackett
Copy link

The culprit largely seems to be the stats component of queries on the charmstore. If possible omitting stats from the include in query params can speed things up considerably.

@fnordahl @ajkavanagh -- we are working on a deploy of jujucharms.com that should fix searching through the site, using the stats omission workaround.

@ajkavanagh @ryan-beisner @ChrisMacNaughton is the stats data necessary for your current uses? If not I would recommend removing it.

Charmstore folks are working on the issue as well, but in the meantime this approach could mitigate the problems.

ChrisMacNaughton added a commit to ChrisMacNaughton/theblues that referenced this issue Sep 18, 2018
jujugui added a commit to juju/theblues that referenced this issue Sep 18, 2018
@ryan-beisner
Copy link

ryan-beisner commented Sep 18, 2018

@Jaycee That library has been in place as-is, for better or for worse, for quite some time. What has changed that suddenly causes this to be an issue?

We can, should and will adjust the theblues library. However, it's a messy sprawl of impact. In order to actually profit, conjure-up and other projects will need to be rebuilt and redistributed with the modified theblues project. Also, other consumers of the theblues library will need to be modified to turn off stats, then and those modified projects will need to be redistributed/released.

@ryan-beisner
Copy link

FTR, bundletester, amulet also affected:

(func27-noop) ubuntu@beisner-bastion:~/git/lxd$ tests/gate-basic-bionic-queens 
2018-09-18 10:15:12,876 __init__ INFO: OpenStackAmuletDeployment:  init
2018-09-18 10:15:12,876 _add_services INFO: OpenStackAmuletDeployment:  adding services
2018-09-18 10:15:12,876 _determine_branch_locations INFO: OpenStackAmuletDeployment:  determine branch locations
ERROR:root:Request timed out: https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/bionic/nova-cloud-controller/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-actions&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=revision-info&include=stats&include=supported-series&include=manifest&include=tags&include=promulgated&include=perm&include=id timeout: 10.0
Traceback (most recent call last):
  File "tests/gate-basic-bionic-queens", line 24, in <module>
    stable=False)
  File "/home/ubuntu/git/lxd/tests/basic_deployment.py", line 57, in __init__
    self._add_services()
  File "/home/ubuntu/git/lxd/tests/basic_deployment.py", line 95, in _add_services
    other_services)
  File "/home/ubuntu/git/lxd/tests/charmhelpers/contrib/openstack/amulet/deployment.py", line 151, in _add_services
    other_services)
  File "/home/ubuntu/git/lxd/tests/charmhelpers/contrib/amulet/deployment.py", line 67, in _add_services
    constraints=svc.get('constraints'))
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/amulet/deployer.py", line 218, in add
    service_name, charm, branch=branch, series=service['series'])
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/amulet/charm.py", line 61, in fetch
    series=series)
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/amulet/charm.py", line 46, in get_charm
    return Charm(with_series(charm_path, series))
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/charmstore/lib.py", line 168, in __init__
    super(Charm, self).__init__(id, api=api, timeout=timeout)
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/charmstore/lib.py", line 115, in __init__
    AVAILABLE_INCLUDES).get('Meta')
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/theblues/charmstore.py", line 107, in _meta
    data = self._get(url)
  File "/home/ubuntu/git/lxd/.tox/func27-noop/local/lib/python2.7/site-packages/theblues/charmstore.py", line 78, in _get
    raise ServerError(message)
theblues.errors.ServerError: Request timed out: https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/bionic/nova-cloud-controller/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-actions&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=revision-info&include=stats&include=supported-series&include=manifest&include=tags&include=promulgated&include=perm&include=id timeout: 10.0

@jcsackett
Copy link

@ryan-beisner The stats component of the charmstore has always been slow; however, there's presently some issue going on with the underlying tools charmstore relies on causing a general slow down, which exacerbates the stats to the point that we're seeing these slowdowns.

theblues has been updated w/ @ChrisMacNaughton's fix: https://pypi.org/project/theblues/

Additionally, the fix is now live on jujucharms.com

@ryan-beisner
Copy link

@Jaycee Even though theblues is released with the update, the default behavior of it was not changed. That means that we will have to go find all of the calls to theblues in intermediate modules such as amulet, bundletester, and other tools, modify them to tell theblues not to ask for stats, then release those individual intermediate tools.

FTR, we are still seeing prevalent CS timeouts overnight 2018 Sept 18 to 19.

@jcsackett
Copy link

I'm aware that the change is going to need to be made--as I understand it fixes are already underway for python-jujulib and amulet.

The underlying issues on CS are to my knowledge still being worked on--it requires a sizeable amount of work. I would encourage you to make the updates to use theblues b/c the CS fixes may take some time to rollout.

@ryan-beisner
Copy link

Indeed, we have all of that in flight. Just commenting here so passers-by understand that the theblues changes alone are not going to unblock devs and ci.

@jcsackett
Copy link

Ah, ok, I'm with you now. Thanks!

@gnuoy
Copy link

gnuoy commented Sep 20, 2018

I'm still hitting >10s response times. You can see it in a openstack review https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-aodh/603318/7/2056/test_charm_amulet_full_4171/amulet-full.txt . Also a curl command takes >10s:

$ time curl -H 'Cache-Control: no-cache'  'https://api.jujucharms.com/charmstore/v5/~openstack-charmers-next/bionic/ceph-mon/meta/any?include=bundle-machine-count&include=bundle-metadata&include=bundle-unit-count&include=bundles-containing&include=charm-actions&include=charm-config&include=charm-metadata&include=common-info&include=extra-info&include=revision-info&include=supported-series&include=manifest&include=tags&include=promulgated&include=perm' -D-  | grep -v ceph
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0HTTP/1.1 200 OK
Date: Thu, 20 Sep 2018 05:56:20 GMT
Server: Apache/2.4.7 (Ubuntu)
 Access-Control-Allow-Credentials: true
 Access-Control-Allow-Headers: Bakery-Protocol-Version, Macaroons, X-Requested-With
0Access-Control-Allow-Methods: DELETE,GET,HEAD,PUT,POST,OPTIONS
 Access-Control-Allow-Origin: *
 Access-Control-Cache-Max-Age: 600
 Access-Control-Expose-Headers: WWW-Authenticate
 Content-Type: application/json
 X-Cache: MISS from juju-ps45-cdo-jujucharms-machine-27
0X-Cache-Lookup: MISS from juju-ps45-cdo-jujucharms-machine-27:3128
 Via: 1.1 juju-ps45-cdo-jujucharms-machine-27 (squid/3.3.8)
 Vary: Accept-Encoding
 Transfer-Encoding: chunked
 
100 97763    0 97763    0     0   8828      0 --:--:--  0:00:11 --:--:-- 26865

real	0m11.086s
user	0m0.021s
sys	0m0.012s```

fwiw subsequent hit a cache and are <2s. 

@ryan-beisner
Copy link

@Jaycee ^ that is with stats disabled. We are still hitting prevalent CS timeouts. In an openstack deploy, with ~40 charms, all it takes is 1 charm query to fail to spoil an expensive and lengthy CI run. The result is that our devs cannot get code through our gate to land code currently.

@ryan-beisner
Copy link

screenshot from 2018-11-08 15-24-50

Hi All, the charm store is still timing out on the web UI, reporting 0 items fairly often. What is the status of the solution for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants