Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: query_simplified_user_actions and add test cases #15

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dtgoitia
Copy link
Collaborator

Main change

With this fix, the function query_simplified_user_actions will now return all the simplified actions related to a specific playback (given the playback ID). In this case, simplified means that only the last upvote or downvote action of each user will be returned, together with the skip, if any at all. If there are no actions, an empty list will returned.

Other changes

  • Replace tabs for spaces
  • Add a comment to a function. Please @txomon revise.

Close #2

@txomon
Copy link
Owner

txomon commented Aug 1, 2018

Some comments to this PR:

  • tests are failing because flake8 is complaining
  • There are changes you have done not related to tabs to spaces (python will fail if you mix them), it's some kind of linter misconfiguration. I suggest you look at the autoformatter you are using, as flake8 wasn't complaining before and is not now, therefore we will run into this issue in the future.
  • The test case is too complex to follow now. I suggest instead of using number of loops, you parametrize with all the entries that should be inserted in useractions table
  • The query seems overly complicated, so we will refactor it once the rest of the points are cleared.

@dtgoitia dtgoitia force-pushed the test branch 2 times, most recently from c9aafc8 to 19d843d Compare August 4, 2018 23:19
@coveralls
Copy link

coveralls commented Aug 4, 2018

Pull Request Test Coverage Report for Build 58

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 57: 0.0%
Covered Lines: 508
Relevant Lines: 508

💛 - Coveralls

Copy link
Owner

@txomon txomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes TBD

([
{'user': 1, 'action': 'upvote'},
], {
0: Action.upvote.name,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use a dict, and don't use .name, it should be an object!

{'dtid': '1234', 'username': 'username'},
{'id': 1, 'dtid': '1234', 'username': 'username', 'country': None},
),
(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting shouldn't be changed here


for action in user_action:
if action['user'] is None:
user = await user_generator()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just create this statically, like 3 users and that's it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we create the users statically, we still need to somehow map the action['user'] value to the corresponding user (user1, user2...). Is it not too verbose?

user1 = await user_generator()
user2 = await user_generator()
user3 = await user_generator()
user4 = await user_generator()

if action['user'] == 1:
    user = user1
elif action['user'] == 2:
    user = user2
elif action['user'] == 3:
    user = uset3
elif action['user'] == 4:
    user = user4

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that user_action_generator just needs a dict with an 'id' key, therefore you just need to call it like

await user_action_generator(user={'id': action['user_id']}, ...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually looking for that, but I got confused with the function signature (def user_action_generator(db_conn):). I missed that all the arguments extra arguments are passed down to the generate_user_action function via *... #$@&%*!

@dtgoitia dtgoitia force-pushed the test branch 3 times, most recently from 69f6a14 to 89114e4 Compare August 9, 2018 20:46
mosbot/query.py Outdated
async with ensure_connection(conn) as conn:
result = []
async for user_action in await conn.execute(query):
result.append(dict(user_action))
return result
return result
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newline at the end of the file...

@dtgoitia dtgoitia force-pushed the test branch 2 times, most recently from 155946f to 14364e5 Compare August 11, 2018 13:13
Copy link
Owner

@txomon txomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts, but overall looks good

await user_action_generator(user={'id': action['user_id']}, playback=playback, action=action['action'])

simplified_user_action = await query_simplified_user_actions(playback_id=playback['id'], conn=db_conn)
actual_result = {i: user_action['action'] for i, user_action in enumerate(simplified_user_action)}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why compare a dict instead of the list of the result direclty. Why are you transforming the result to check it?

left join user_action ua on p.id = ua.playback_id
left join track t on p.track_id = t.id
left join "user" u on ua.user_id = u.id
where p.id = {playback_id}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked a safer way to execute this query? you are literally rendering the parameter directly, instead of escaping it or using it as a parameter. I don't know if aiopg the library we are using can do it but please have a look

https://chartio.com/resources/tutorials/how-to-execute-raw-sql-in-sqlalchemy/

@txomon txomon force-pushed the test branch 2 times, most recently from f79f1b9 to 17545ee Compare August 22, 2018 22:37
@txomon
Copy link
Owner

txomon commented Apr 10, 2019

So going back through this, I have found that the performance is a bit terrible, using explain analyze:

 Sort  (cost=93032.76..93038.25 rows=2198 width=85) (actual time=2020.756..2086.115 rows=226029 loops=1)
   Sort Key: sub3.action_timestamp
   Sort Method: external merge  Disk: 17640kB
   ->  Subquery Scan on sub3  (cost=85749.53..92910.75 rows=2198 width=85) (actual time=1536.322..1882.256 rows=226029 loops=1)
         Filter: ((sub3.rn = 1) OR (sub3.next_action = 'skip'::action))
         Rows Removed by Filter: 14088
         ->  WindowAgg  (cost=85749.53..89605.57 rows=220345 width=85) (actual time=1536.321..1782.844 rows=240117 loops=1)
               ->  Sort  (cost=85749.53..86300.40 rows=220345 width=81) (actual time=1536.315..1605.015 rows=240117 loops=1)
                     Sort Key: sub.username
                     Sort Method: external sort  Disk: 18600kB
                     ->  Subquery Scan on sub  (cost=47937.00..55649.07 rows=220345 width=81) (actual time=962.714..1387.867 rows=240117 loops=1)
                           ->  WindowAgg  (cost=47937.00..53445.62 rows=220345 width=97) (actual time=962.713..1286.604 rows=240117 loops=1)
                                 ->  Sort  (cost=47937.00..48487.86 rows=220345 width=81) (actual time=962.706..1077.643 rows=240117 loops=1)
                                       Sort Key: ua.user_id, p.id, u.username, ua.ts DESC
                                       Sort Method: external merge  Disk: 17080kB
                                       ->  Hash Left Join  (cost=10107.73..17836.53 rows=220345 width=81) (actual time=179.670..692.516 rows=240117 loops=1)
                                             Hash Cond: (ua.user_id = u.id)
                                             ->  Hash Left Join  (cost=10098.62..17246.05 rows=220345 width=72) (actual time=179.446..575.505 rows=240117 loops=1)
                                                   Hash Cond: (p.track_id = t.id)
                                                   ->  Hash Right Join  (cost=7499.76..10969.74 rows=220345 width=24) (actual time=139.819..307.686 rows=240117 loops=1)
                                                         Hash Cond: (ua.playback_id = p.id)
                                                         ->  Seq Scan on user_action ua  (cost=0.00..1527.04 rows=75404 width=20) (actual time=0.009..25.753 rows=75446 loops=1)
                                                         ->  Hash  (cost=3884.45..3884.45 rows=220345 width=8) (actual time=139.252..139.252 rows=220465 loops=1)
                                                               Buckets: 131072  Batches: 4  Memory Usage: 3192kB
                                                               ->  Seq Scan on playback p  (cost=0.00..3884.45 rows=220345 width=8) (actual time=0.008..61.668 rows=220465 loops=1)
                                                   ->  Hash  (cost=1425.05..1425.05 rows=52705 width=56) (actual time=39.358..39.358 rows=52928 loops=1)
                                                         Buckets: 65536  Batches: 2  Memory Usage: 2832kB
                                                         ->  Seq Scan on track t  (cost=0.00..1425.05 rows=52705 width=56) (actual time=0.009..16.515 rows=52928 loops=1)
                                             ->  Hash  (cost=6.27..6.27 rows=227 width=13) (actual time=0.195..0.196 rows=258 loops=1)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 20kB
                                                   ->  Seq Scan on "user" u  (cost=0.00..6.27 rows=227 width=13) (actual time=0.008..0.108 rows=258 loops=1)
 Planning time: 1.078 ms
 Execution time: 2133.643 ms
(33 rows)

@txomon
Copy link
Owner

txomon commented Apr 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Last simplified queries are not aggregating correctly per user events
3 participants