Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite of batch.inc and added Drush command #39

Open
wants to merge 16 commits into
base: 7.x
Choose a base branch
from

Conversation

janjouketjalsma
Copy link

JIRA Ticket: NA

What does this Pull Request do?

Improve (if you ask me) the code of batch.inc and adds a Drush command.

What's new?

A summary of the changes can be found below

New

  • Added a Drush command

Changes to form

  • Updated description of Maximum number of Islandora links etc. to match what it is being used for: hook_cron().
    Also renamed this field internally to islandora_xmlsitemap_generate_limit_cron.
  • Added field Amount of Islandora links to fetch and process at once which is being used by both hook_cron() and the regenerate buttons to determine how many Objects to process every time the operation is called.

Changes to batch code

  • Last Modified Value is now updated every iteration of the operation so that the batch function can be interrupted and resumed at the level of iterations.
    This fixed our biggest issue as we would have crashes when trying to process a million records at which point the progress was not saved and we had to start over.
  • Now setting $sandbox['finished'] to 1 if an error occurs or we do not get results. This prevents more iterations from starting in case of an error
  • Moved bootstrapping of query processor outside main operation so that it can be tested and reused
  • Added the currently processing PID to the message displayed to the end user so that you have a basic idea of what the batch is doing.
  • Moved query generation to a function so that it can be tested and reused
  • Added a finished callback that gives some feedback on the work that was done. This is especially useful for the Drush command as there is no other feedback going to drush.

Changes to islandora_xmlsitemap_cron():

  • Chunk size is now taken from new form field called islandora_xmlsitemap_generate_chunk_size
  • Using renamed form field islandora_xmlsitemap_generate_limit_cron to determine the maximum number of PIDs to process when hook_cron() is called.

How should this be tested?

  • Previous functionality should be tested if it still works (I have found no issues).
  • The newly introduced Drush command should be tested with and without the options and with invalid values. A desciption of how to use the command is added to the readme.
  • Configuration values in the admin interface should be changed, saved then tried with different values.

Additional Notes:

  • An update has been made to the readme page. This should also be updated on the Wiki (https://wiki.duraspace.org/display/ISLANDORA/Islandora+XML+Sitemap)
  • I have tried several values for the chunk_size in the Drush command but surprisingly found lower values (like 100) to perform better (lower CPU, higher processing rate) than higher values (500) in our installation.

Interested parties

@Islandora/7-x-1-x-committers

@bryjbrown
Copy link
Member

bryjbrown commented Feb 9, 2018

@janjouketjalsma I definitely like the direction this is going in w/r/t better labeling (saying explicitly that the batch limit only applies to cron jobs, more descriptive variable names, etc) but I think we may be able to go a step further with the admin menu title and description, specifically in terms of the phrase

"...when hook_cron() is called"

Though this makes perfect sense to us as Drupal developers and it isn't at all incorrect, I wonder if we could find a more friendly and succinct description? For instance "...when running via cron" or "...when executing islandora_xmlsitemap_cron()" Both of these phrases are saying nearly the same thing, but the first one is a bit less intimidating and the last one points you to exactly what hook_cron() implementation is being called (just in case the reader is confused and wants to investigate further).

My vote would be for the second phrasing (islandora_xmlsitemap_cron()), but I don't feel too strongly about it.

I haven't investigated the rest of the code changes though so do not mistake this for a code review :)

@janjouketjalsma
Copy link
Author

@bryjbrown I agree. We should definitely find a better name things that go to the frontend. Also in the backend I'm open to suggestions. I'm not particularly fond of max_chunk_size for instance. I'll wait for some more feedback before I update the PR.

@DiegoPino
Copy link
Contributor

@bryjbrown just if you have some time: any feedback/ideas we could give @janjouketjalsma for wording things (like the hook_cron mention on the UI) to keep this pull rolling? @Islandora/7-x-1-x-committers anyone else interessted in helping here out? Pull looks pretty interessting and useful. Thanks to both of you.

@rosiel
Copy link
Member

rosiel commented Mar 20, 2018

No specific suggestions from me, since I'm still not sure what those options do... but Drupal Solr has a similar (sorta) function, and they use the phrasing "Number of items to index per cron run". Screenshots of the relevant screens (which are annoying far from each other:
screen shot 2018-03-19 at 11 20 30 pm
screen shot 2018-03-19 at 11 16 15 pm

@@ -34,13 +34,22 @@ function islandora_xmlsitemap_admin_form($form, &$form_state) {
'#description' => t('Solr field in which we can perform sorting and range queries.'),
'#default_value' => variable_get('islandora_xmlsitemap_last_modified_field', 'fgs_lastModifiedDate_dt'),
);
$form['islandora_xmlsitemap_number_of_pids_to_process'] = array(
$form['islandora_xmlsitemap_generate_limit_cron'] = array(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janjouketjalsma If you are going to change this variable name, you should add a hook_update_N() function to migrate the old setting from islandora_xmlsitemap_number_of_pids_to_process to islandora_xmlsitemap_generate_limit_cron and then unset the islandora_xmlsitemap_number_of_pids_to_process variable. Otherwise it will hang around forever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, the old refrain, "Don't forget the uninstall hook!"
islandora_xmlsitemap_uninstall() - the variable name should be updated there too.

@whikloj
Copy link
Member

whikloj commented Mar 20, 2018

Additionally @janjouketjalsma you might want to run this through phpcs --standard=Drupal to help clean up some of the style guide violations.

@bryjbrown
Copy link
Member

@DiegoPino @janjouketjalsma Having taken a break from this for a while and coming back to re-read with fresh eyes, I think the only specific wording I would change is the "title" on the islandora_xmlsitemap_generate_limit_cron variable in the admin page. Currently its "Maximum Number of Islandora links to process during hook_cron()", but I would change this last part to make it more like the Drupal Solr wording @rosiel pointed out, such as "Maximum Number of Islandora links to process per cron run". Its not a huge difference, but I think it looks better.

One thing that I would like to see in the wording for both the islandora_xmlsitemap_generate_limit_cron and islandora_xmlsitemap_generate_chunk_size titles and descriptions is how they differ, because right now I'm having a hard time understanding how they are different. They both seem to be setting a number of items to process, but according to my understanding islandora_xmlsitemap_generate_limit_cron only gets used when triggered by cron, but islandora_xmlsitemap_generate_chunk_size gets used by cron AND the admin buttons. So when cron runs, how does it use both variables? I'm sure I could figure it out by digging through the code, but we don't want to ask that of everyone that wants to configure the module. Being a bit more explicit in the description to eliminate that confusion would go a long way. 👍

@DonRichards
Copy link
Member

@janjouketjalsma I'd add @whikloj comment. I'd suggest a more detailed command /var/www/drupal/:$ drush coder-review --reviews=production,security,style,i18n,potx,sniffer --no-empty islandora_xmlsitemap
To be honest, I don't know the difference, this is just the exact command Travis runs.

Recap: This is waiting on the following? Is this correct?

  • hook_update_N() to migrate the old functions
  • uninstall needs to include the new hook
  • change is the "title" on the islandora_xmlsitemap_generate_limit_cron & islandora_xmlsitemap_generate_chunk_size
  • Describe the difference between islandora_xmlsitemap_generate_limit_cron & islandora_xmlsitemap_generate_chunk_size

I assumed this stalled because of a release. Any reason we can't get this one back on track?
@bryjbrown @rosiel @janjouketjalsma @whikloj

My interest in pushing this forward is a ticket I took on that requires the generation of a new file this PR has already created. It would make completing ISLANDORA-2275 easier if this PR is completed.

@bryjbrown
Copy link
Member

@DonRichards I don't think this stalled because of the release, I think it stalled because this PR is owned by @janjouketjalsma and we haven't heard from him. If anyone else wanted to add to this PR specifically, they'd have to make a PR against his fork I think.

As for your assessement of what needs to be done, I'd say you are correct with the slight correction that the update and uninstall hooks are for handling the renamed variable (islandora_xmlsitemap_number_of_pids_to_process to islandora_xmlsitemap_generate_limit_cron) instead of any functions.

If you would like to fix these things in a PR against @janjouketjalsma's fork, that might ease the process if @janjouketjalsma can't get to this before the code freeze.

@janjouketjalsma
Copy link
Author

janjouketjalsma commented Sep 13, 2018 via email

@bryjbrown
Copy link
Member

@janjouketjalsma No worries, if we were to finish this up by making a PR against https://github.com/LeidenUniversityLibrary/islandora_xmlsitemap, could we count on you to merge it in so it would show up here and we could wrap this thang up?

@janjouketjalsma
Copy link
Author

janjouketjalsma commented Sep 13, 2018 via email

@IAMlKeno
Copy link

IAMlKeno commented Dec 2, 2021

Hello, has this pull request been abandoned? I am encountering an issue with a site that has over 2 million objects and think this would help @janjouketjalsma

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants