Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_external_file_to_file_set vs. add_file_to_file_set issues #332

Open
atz opened this issue Aug 1, 2017 · 3 comments
Open

add_external_file_to_file_set vs. add_file_to_file_set issues #332

atz opened this issue Aug 1, 2017 · 3 comments

Comments

@atz
Copy link
Contributor

atz commented Aug 1, 2017

Serious copy-pasta, 90+% similarity:

Also both are classes, but each defines two other classes inside the outer one. That is something we conventionally do in module namespace, but not in classes. The service classes obscure that that they are actually mostly about Updater classes. That's where much of the commonality is, and that should be broken out or otherwise reused.

Ideally, these should be refactored to use a common base class or establish an inheritance.

Other problems:

type param

YARD says:

@param [RDF::URI or String] type URI for the RDF.type that identifies the file's role within the file_set

but that eventually gets passed to type_to_uri that can raise ArgumentError with the message 'Invalid file type. You must submit a URI or a symbol.' String or symbol, which is it? RDF::URI doesn't care since it just wants #to_s, in fact always calls #to_s on the first param received. Why do we care more that it does? (but not enough to match docs and error message?)

Transactionality

In production these methods will routinely be called in parallel on the same fileset, with batches of hundreds or thousands of objects filling as many job workers as are available. So why would it work to do (in #persist):

if current_file.new_record?
  file_set.save

when the file_set being saved has no guarantee of not overwriting changes from the 11 other processors saving divergently modified versions of what was initially the same object? If there is already some magic in place to avoid this, I haven't found it.

Keep in mind, for our generated FileSet class, the number of ancestors distinct from regular ruby Object ancestors, (FileSet.ancestors - Object.ancestors).count, is 84. Would you be able to guess which of them received the .save in question? The odds are good you would be wrong. Turns out it is ActiveFedora::Associations::Builder::Orders::FixFirstLast, which gets dynamically patched onto our model here:
https://github.com/samvera/active_fedora/blob/master/lib/active_fedora/associations/builder/orders.rb#L38

I can get that far, but of course it calls super (in what is sure to be a long chain), but which one of them maybe applies magic transactional pixiedust, I can't say, but it doesn't matter since it itself calls save again via another method it monkey-patched onto our class.

@atz
Copy link
Contributor Author

atz commented Aug 1, 2017

The risk of FileSet transactionality for Hyrax IngestJob may be limited by filesets being created per upload, but this is still an architectural flaw. During migrations, for example, a dozen add_file_to_file_set calls could quite plausibly be entertained on the same fileset in parallel.

@atz
Copy link
Contributor Author

atz commented Aug 1, 2017

The Updater classes do:

if type.instance_of? Symbol

in one place and in another:

case type ... when String

This confused sniffing leads to actual logical/behavioral differences.

@atz
Copy link
Contributor Author

atz commented Aug 4, 2017

Note, in certain configurations, the #read required of the file param is actually #read(INTEGER, STRING)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backlog
Development

No branches or pull requests

1 participant