Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement for smt list #311

Merged
merged 2 commits into from
Feb 12, 2016

Conversation

babsey
Copy link
Contributor

@babsey babsey commented Dec 8, 2015

cf discussion on #309 and #310

The concept of this PR is to improve performance for smt list.
I would like to make this PR step by step. After merging I will continue writing codes for this PR.

The first step is to stop the constant writing in file.
Next steps will be limiting amount of records for print or filtering by date, script file etc...

Which of these steps do you desire?

@babsey babsey changed the title option save labels Performance improvement for smt list Dec 8, 2015
@maxalbert
Copy link
Contributor

I hadn't been aware of smt list writing to a file so far. As far as I can tell the only purpose of the generated file .smt/labels is to allow tab completion of labels (see bin/smt-complete.sh).

I can't comment on the performance implications of writing to a file in smt list. My hunch is that the overhead of this is small compared to the time it takes to extract the records from the database (at least for Django record stores), but I haven't profiled this so I may be wrong.

Regardless, if the purpose of .smt/labels really is to allow tab completion then it seems wrong to have this file generated by smt list because this requires a manual step by the user and the file will easily get out of sync with existing records. Instead, it should be generated and automatically updated by smt run and smt delete commands. @apdavison, do you agree with this or am I missing something?

@apdavison
Copy link
Contributor

As far as I can tell the only purpose of the generated file .smt/labels is to allow tab completion of labels (see bin/smt-complete.sh)

This is correct.

The reason this slows things down is not the writing to file; it is that the current implementation ends up calling project.find_records() twice.

it seems wrong to have this file generated by smt list because this requires a manual step by the user and the file will easily get out of sync with existing records. Instead, it should be generated and automatically updated by smt run and smt delete commands.

I agree with this. The smt run part is easy, we just append to the end of .smt/labels. Deletion is trickier: profiling will be needed to see whether it is more efficient to rewrite the file with the deleted labels removed or just regenerate the entire file.

@apdavison
Copy link
Contributor

Concerning the structure of this PR, I would prefer to have a separate PR for each step.

I suggest renaming this pull request to something like "Make bash completion optional" and opening a new one for filtering records (cf #309).

@@ -424,11 +424,13 @@ def list(argv): # add 'report' and 'log' as aliases
default='text',
help="FMT can be 'text' (default), 'html', 'json', 'latex' or 'shell'.")
parser.add_argument('-r', '--reverse', action="store_true", dest="reverse", default=False,
help="list records in reverse order (default: newest first)"),
help="list records in reverse order (default: newest first)")
parser.add_argument('-w', '--write', action="store_true", dest="write", default=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option should be named something like --enable-bash-completion.

@maxalbert
Copy link
Contributor

Firstly, if this PR is about making bash completion optional then the command line switch should have a different name to reflect this. (Oh never mind, I just saw that @apdavison already suggested this in a line comment.)

However, making bash completion optional never was the intention of this PR - it was meant to improve performance. I'm not completely opposed to making it optional, but I think we should avoid introducing lots of options just because we can. If it's cheap in terms of performance to keep the bash completion functionality then why have a switch to disable it? (Again, I'm not sure if it really is cheap, but it sounds to me like the performance-critical parts are unrelated to actually writing to the file .smt/labels, which is the only part that affects bash completion).

@babsey babsey changed the title Performance improvement for smt list Make bash completion optional Dec 9, 2015
@babsey babsey changed the title Make bash completion optional Performance improvement for smt list Dec 9, 2015
@babsey
Copy link
Contributor Author

babsey commented Dec 9, 2015

maxalbert is right. Let's keep it enabled. There is a way for fast writing to a file.
Can you tell me if you can confirm an improvement of the last commit?

Sorry for re-renaming the title.

@maxalbert
Copy link
Contributor

I definitely see a speed improvement with @babsey's latest change. In a project with 432 records, smt list takes 60 seconds with project.format_records() and 45 seconds with project.record_store.labels(). I'd be happy to merge this as a first step, although I still think we should break up the writing of .smt/labels into incremental updates in smt run and smt delete.

Generally speaking, 45 seconds for a simple smt list is still pretty sluggish. If we are consistent with keeping .smt/labels up to date then we could consider simply returning its contents unless the --long or --table options are specified, which would at least make a simple smt list make much faster. However, at the end of the day this only fixing the symptoms of the recordstore database being slow so ideally we should tackle the root of the issue.

Btw, if project.record_store.labels() turns out something we intend to use in more places then it could be worth adding a new method project.get_labels() which calls this under the hood - just to have slightly better encapsulation and looser coupling between classes.

@apdavison
Copy link
Contributor

@maxalbert: I agree on all points.

@babsey: please could you squash the commits? Once that's done I can merge this PR, or I could leave it open for further work. Let me know what you would prefer.

@babsey
Copy link
Contributor Author

babsey commented Dec 10, 2015

I also agree. I wrote a new method get_labels() for the project object.

Futhermore, I made the next step for improved speed of plain smt list command. This should be a very fast. The question is where I can implement these code lines. I am open-minded for your suggestions.

@apdavison: I never squashed commits but if you agree on these steps. I will try to squash these commits

@maxalbert
Copy link
Contributor

@babsey I haven't looked at the latest changes yet, but in terms of how to squash commits: I personally really like using git's interactive rebase for this. If you run

git rebase -i master

This will open up an editor which will show you all the commits on your curent branch that are on top of master, and each of them is prepended by "pick". If you keep the first one as "pick" and change all the subsequent ones to "s" or"squash" then this will combine all the commits on top of master into a single one. (Maybe make a backup of your repo before you try it the first time just in case something goes wrong.)

Once you have squashed the commits you will need to push the branch to Github again:

git push --force origin per_impr_CLI

The --force is important, otherwise git will complain because you are trying to push some changes that don't build upon changes that are already present on Github.

fast writing for bash completion

writeline to write

with open(..) as f

add project method get_labels

alternative print for plain smt list

correct code because of failed travis test

correct code because of failed travis test

correct code because of failed travis test

add method "records" to MockRecordStore

quick filter record labels by tags

add tags argument to MockRecordStore.records
apdavison added a commit that referenced this pull request Feb 12, 2016
Performance improvement for smt list
@apdavison apdavison merged commit 2c08724 into open-research:master Feb 12, 2016
@babsey babsey deleted the per_impr_CLI branch February 16, 2016 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants