Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Tasks with custom parameters #413

Merged
merged 5 commits into from
May 28, 2021

Conversation

adrianna-chang-shopify
Copy link
Contributor

@adrianna-chang-shopify adrianna-chang-shopify commented May 13, 2021

Closes: #406, #325

This PR introduces support for custom Task parameters by leveraging ActiveModel attributes & validations.

Task authors use ActiveModel::Attributes for their parameters, and can define their own validations to ensure any parameters that are passed from the UI conform to their expectations (ie. Maintenance::ParamsTask uses a presence validator and a regex to ensure that post ids is a comma-delimited list of integers).

Approach taken

  • MaintenanceTasks::Task includes relevant ActiveModel modules
  • In the view, we render a text field for every parameter the Task supports
  • These params are passed from controller => Runner#run. MaintenanceTasks::Run has a field for parameters now, and we pass the parameters to the constructor
  • On run creation, we assign the parameters (as attributes) to a Task instance for the run, and then validate whether the attributes are valid. If they aren't, we take the errors and add them to the run, which gets propagated to the UI in the form of a flash alert.
  • The run how has parameters persisted, and we've also set the parameters on the Task so that any of the methods in the Task (#process, #collection, #count) now have access to these attribute

🎩

Screen Shot 2021-05-13 at 3 13 38 PM

Screen Shot 2021-05-13 at 4 00 18 PM

To Do

  • Documentation

Let me know what you think!

Copy link
Member

@etiennebarrie etiennebarrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a CSS class, the field is almost invisible:

image

Does it make sense to start the job and raise if parameters are invalid?

Potentially the validations would be quite costly, so I think it's ok to make them in before_perform. In the mean time, if the validations are costly, it also means we do them at each interruption. Validations have a concept of context which we could use to decide which ones to run when.

I guess we could assume they're the same kind of validations that would happen in a regular controller flow, without any job involved, for now. If we realize there's a need for deeper more intensive validations, we could add them in the on_start callback to run only once.


Looks great! I have a few comments but overall really neat.

app/controllers/maintenance_tasks/tasks_controller.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/task_data.rb Outdated Show resolved Hide resolved
test/dummy/app/tasks/maintenance/params_task.rb Outdated Show resolved Hide resolved
test/jobs/maintenance_tasks/task_job_test.rb Outdated Show resolved Hide resolved
app/tasks/maintenance_tasks/task.rb Outdated Show resolved Hide resolved
Copy link

@samuelgiles samuelgiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exciting! We were talking in https://github.com/Shopify/arrive-server/pull/17707 about how we needed maintenance_tasks to support parameters to be able to convert some ShipIt tasks and now its here 👏

app/views/maintenance_tasks/tasks/show.html.erb Outdated Show resolved Hide resolved
@dirceu
Copy link

dirceu commented May 14, 2021

Looking at our internal usage, it seems like the most commonly used param types are:

  • String—usually in a constant, so validation should be both straightforward and inherently custom
  • Integer
  • Array of String
  • Array of Integer

The latter two are super common; so common, in fact, that it makes me think that people will just copy that validation regex over and over again. Would it make sense to have first-class support for array of string and array of numbers (maybe with a default, named validation)?

I don't think it's a blocker for this PR though!

@adrianna-chang-shopify
Copy link
Contributor Author

I think we're missing a CSS class, the field is almost invisible:

@etiennebarrie what browser are you using? It renders with a dark outline for me in Chrome:

Screen Shot 2021-05-14 at 10 34 37 AM

I guess we could assume they're the same kind of validations that would happen in a regular controller flow, without any job involved, for now. If we realize there's a need for deeper more intensive validations, we could add them in the on_start callback to run only once.

Good point, I overlooked the fact that these validations would run every time the job interrupts. But, this is potentially a benefit as you mentioned, since they might be evaluated in to a context that changes based on when the job first runs vs when it's being resumed. Let's leave it in #before_perform for now, and revisit moving it to #on_start if they are proving costly.

@adrianna-chang-shopify
Copy link
Contributor Author

@dirceu that's a great idea! The whole reason I went with post_ids for the test Task was because I noticed that Arrays of Integers was one of the most frequent use cases. 😁 It definitely makes sense to simplify this for users and offer them an API that takes care of these details for them, even if ActiveModel::Attributes won't give us Array support directly.

Lemme see what I can whip up

@etiennebarrie
Copy link
Member

what browser are you using?

Firefox!

Let's leave it in #before_perform for now

I feel like this is the least interesting choice. Either we're worried about the potential cost of validations and we do it once before the whole run, or we're not and we can do it at the controller level. "Validation" code can run inside collection to run at each interruption, or inside process to run for each item. So I think overall the most useful place for us to run the validations is directly before creating a Run in the controller. We should just document it clearly so that it's clear.

Would it make sense to have first-class support for array of string and array of numbers

Yeah that would be cool, but either we introduce our own DSL (instead of being a simple Active Model), or we add a type to ActiveModel::Registry, which is too invasive for a gem IMO. The last solution I see would be to have an extensive example creating a Type, but that's a bit overblown I think. If we're worried about the regex being wrongly copied, we could remove it 😄

Active Record deals pretty well with invalid data:

Post.where(id: "2,foo,4".split(",")).to_sql
# => "SELECT \"posts\".* FROM \"posts\" WHERE \"posts\".\"id\" IN (2, 4)"

One other thing I had missed is that this doesn't persist the parameters, so we can't pause/resume unless I'm missing something?

@adrianna-chang-shopify
Copy link
Contributor Author

Okay, might be a Bulma + Firefox thing in that case @etiennebarrie ?

Gotcha, yeah I misinterpreted what you were getting at before re: validations needing to run in certain contexts. Let's just move it into the controller for now.

In terms of types, IMO a nice way to do it would be to to add ActiveModel::Type::Value classes that Task authors can use if they want, without actually adding them to the registry. This is what I've been mocking up:

module MaintenanceTasks
  module Parameters
    class IntegerArrayType < ActiveModel::Type::Value
      def cast(input)
        return unless input.present?

        validate_input(input)
        input.split(",").map(&:to_i)
      end

      private

      def validate_input(input)
        unless /\A(\s?\d+(,\s?\d+\s?)*\s?)\z/.match?(input)
          error_message = "IntegerArrayType expects expects a comma-delimited "\
            "string of integers. Input received: #{input}"
          raise TypeError.new(error_message)
        end
      end
    end
  end
end
module Maintenance
  class ParamsTask < MaintenanceTasks::Task
    attribute :post_ids, MaintenanceTasks::Parameters::IntegerArrayType.new
    validates :post_ids, presence: true
    ....

IMHO this is a much better experience than requiring every user to validate / sanitize input strings from the UI or write their own type.

One other thing I had missed is that this doesn't persist the parameters, so we can't pause/resume unless I'm missing something?

LOL I completely overlooked this 😆 Should we persist params on the run as well? It might be a nice feature to be able to see that data historically in the UI as well (beyond just logging params, making them visible in the UI should increase confidence in terms of being able to track down what happened if someone ran a Task with faulty params)

@adrianna-chang-shopify adrianna-chang-shopify force-pushed the tasks-with-params-active-model branch 8 times, most recently from 096367c to 3c68d96 Compare May 17, 2021 20:18
@adrianna-chang-shopify
Copy link
Contributor Author

I've extracted playing around with ActiveModel::Type::Value objects for complex parameter support here: https://github.com/Shopify/maintenance_tasks/pull/414/files

And then made changes to this PR:

  • Added column to maintenance_tasks_runs so we can persist parameters to the Run
  • Instantiating the Task is now the responsibility of the Run (it no longer happens in the job), and parameter validation happens on creation of the Run
  • Instantiated Run in Runner with parameters, removed the passing around of parameters to the Job class

cc @etiennebarrie ready for some more feedback 🙇‍♀️

Copy link
Member

@etiennebarrie etiennebarrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two main things:

  • CLI support
  • parameters vs arguments naming

But it's looking great! ✨

app/controllers/maintenance_tasks/tasks_controller.rb Outdated Show resolved Hide resolved

attr_readonly :task_name

serialize :backtrace
serialize :parameters, Hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed it still gets saved as:

--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
post_ids: '4,2'

Can we turn the arguments Hash into a proper Hash somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not support a hash with indifferent access though, since this is how Rails represents hashes by default?

app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/runner.rb Outdated Show resolved Hide resolved
app/views/maintenance_tasks/tasks/show.html.erb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
test/dummy/app/tasks/maintenance/params_task.rb Outdated Show resolved Hide resolved
test/dummy/app/tasks/maintenance/params_task.rb Outdated Show resolved Hide resolved
@adrianna-chang-shopify
Copy link
Contributor Author

@etiennebarrie addressed most of your comments, there were a couple I'm not sure about, feel free to follow up 😄

Re: CLI support, I think it should be almost all the way there (probably just need to add the option + document it in the CLI?) , but I'd prefer to ship that separately and move forward with this.

@adrianna-chang-shopify adrianna-chang-shopify force-pushed the tasks-with-params-active-model branch 3 times, most recently from 86967ed to c2cb0b0 Compare May 25, 2021 16:20
Copy link
Member

@etiennebarrie etiennebarrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not a fan of persisting ruby/hash:ActiveSupport::HashWithIndifferentAccess in the database, and I think we should support CLI before shipping a new gem version supporting tasks with parameters, but we can figure it out later.

We still sleep where we shouldn't though.

test/application_system_test_case.rb Outdated Show resolved Hide resolved
@adrianna-chang-shopify
Copy link
Contributor Author

@etiennebarrie we could serialize it to JSON if you'd prefer?
Yes, definitely intend to have CLI support prior to cutting a new version, but this PR is large enough as is and would prefer to start a clean PR for that 😄

@etiennebarrie
Copy link
Member

we could serialize it to JSON if you'd prefer

Maybe that would make more sense 🤔

app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
test/models/maintenance_tasks/run_test.rb Outdated Show resolved Hide resolved
test/models/maintenance_tasks/run_test.rb Outdated Show resolved Hide resolved
@etiennebarrie
Copy link
Member

Since we have parameters, should we still prevent running multiple tasks with different arguments? It probably requires a few changes because we've always assumed "one Task = zero or one Run", but conceptually, two runs with different arguments are similar to two different Tasks.
It came up to me while writing about #422, users could work around the lack of parallelism by having start/end parameters and running the same Task multiple time with different arguments.

I don't think it's reasonable to tackle this here, but just leaving this here.

@adrianna-chang-shopify
Copy link
Contributor Author

@etiennebarrie re: parallelism, I'd like for us to take the time to explore viable solutions for that, with one of them being allowing multiple instances of the same Task to be run concurrently (with args). It does change some fundamental assumptions within our system as you mentioned, so I think we want to commit to it as the single solution we're going with for concurrency, rather than building it in as a temporary workaround if we intend to build out concurrency fully.

With batch support built in, I suspect a lot of the use cases pushing for concurrency support will actually just be able to use batches.

@@ -38,6 +39,12 @@ def run

private

def task_arguments
return {} unless params[:task_arguments].present?
task_attributes = Task.named(params[:id]).attribute_names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously a wrong Task name would result in a ActiveRecord::RecordInvalid raised from Runner and error caught in the action, now it will raise here and will result in a 500. We don't have tests for this because we didn't want controller tests and there's no way to test this with the UI (with a browser test, you'd need to see the UI and remove the task before clicking on Run). We could ignore that for now because like I mentioned in a previous review, parts of this logic should move to the Runner, with the CLI job needing similar handling, and fix it once we support arguments from the CLI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, interesting. Let's leave it for now, I can remove it in the CLI PR if we choose to go that route and we feel it's redundant. Alternatively, we could keep the use of the Strong Parameters API here and just rescue Task::NotFoundError and return {}. I'm suspecting that we may just want to check the keys in validate_task_arguments though, since we're already rescuing on Task::NotFoundError there. But I'll defer to the CLI PR.

app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/runner.rb Show resolved Hide resolved
app/views/maintenance_tasks/tasks/show.html.erb Outdated Show resolved Hide resolved
test/application_system_test_case.rb Show resolved Hide resolved
test/dummy/db/schema.rb Outdated Show resolved Hide resolved
test/system/maintenance_tasks/runs_test.rb Outdated Show resolved Hide resolved
@adrianna-chang-shopify
Copy link
Contributor Author

@etiennebarrie latest round of changes in 😁 Can you let me know:

@adrianna-chang-shopify adrianna-chang-shopify force-pushed the tasks-with-params-active-model branch from 35a4314 to 0d548e3 Compare May 28, 2021 20:31
@adrianna-chang-shopify
Copy link
Contributor Author

I'm going to 🚢 , if anything additional comes up we can fix forward (along with CLI support).

@adrianna-chang-shopify adrianna-chang-shopify merged commit 6c3b8c0 into main May 28, 2021
@adrianna-chang-shopify adrianna-chang-shopify deleted the tasks-with-params-active-model branch May 28, 2021 20:36
@shopify-shipit shopify-shipit bot temporarily deployed to rubygems June 7, 2021 15:25 Inactive
@etiennebarrie etiennebarrie mentioned this pull request Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explore using ActiveModel to power Tasks with parameters
5 participants