Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versioning a file does not alter file name or recharacterize file #1152

Closed
vantuyls opened this issue Jun 6, 2017 · 24 comments · Fixed by #5475
Closed

Versioning a file does not alter file name or recharacterize file #1152

vantuyls opened this issue Jun 6, 2017 · 24 comments · Fixed by #5475
Assignees
Labels
Milestone

Comments

@vantuyls
Copy link

vantuyls commented Jun 6, 2017

Descriptive summary

when loading a new version of a file, the new filename is not reflected on the file showpage and the file does not appear to be characterized.

This is problematic because a user downloading the file ought to be able to trust that the file they are downloading is the one reflected on the show page.

Hyrax Master

Expected behavior

New version's filename should be reflected and the file should be characterized.

Actual behavior

old file's name remains as does old file's characterization.

Steps to reproduce the behavior

  1. upload a file
  2. version the file with a different file name and format
  3. see that the file name and format for the new version are still for the old file
@vantuyls vantuyls added the bug label Jun 6, 2017
@mjgiarlo mjgiarlo added this to the 2.0.0 milestone Jun 6, 2017
@mjgiarlo
Copy link
Member

mjgiarlo commented Jun 6, 2017

@vantuyls I could not reproduce this with master. I was able to upload a JPG, then add a version with a PNG, then a version with a GIF, and each time:

  1. Thumbnail for the FS (and Work, as this was a single-FS Work) changed.
  2. Download filename changed.
  3. Characterization information changed.

Do you have a fresh, vanilla copy of the latest in master? Can you try again, good sir?

@mjgiarlo
Copy link
Member

mjgiarlo commented Jun 6, 2017

What didn't change was the FileSet's title. Not sure it should, though.

@vantuyls
Copy link
Author

vantuyls commented Jun 7, 2017

blarg to it working. (but also yay!)

as for the fileset title not changing, i'm of two minds on that. first, the fileset title is just the filename at this point, and if you don't change the fileset title to match the filename you run into a (dangerous, i think) problem of people thinking they're getting file.txt when really they're getting file.exe. at the very least, this is potentially confusing, at the most it's a security issue.

@jcoyne
Copy link
Member

jcoyne commented Jun 7, 2017

When you download a file, it's giving you the file name from the characterization metadata, not from the FileSet, so I wonder if this is really an issue?

@vantuyls
Copy link
Author

vantuyls commented Jun 7, 2017

oop, our versioning is working now. but the filename...

@mjgiarlo
Copy link
Member

mjgiarlo commented Jun 7, 2017

@vantuyls Users are also free to change FileSet titles. When I upload a new version, would you expect the new filename to replace a user-entered title? I'm hesitant to get in the business of conditional replacement of the title, i.e., to "sniff" the former title to see if it's filename-like.

Might it be sufficient to alert the user via a flash message when a new version is uploaded, recommending to them that they review the FileSet's title to see if it needs changing? Since this is getting into UX, I'm tagging @ggeisler and @samvera-labs/hyrax-ui-ux-advisors for their advice.

@vantuyls
Copy link
Author

vantuyls commented Jun 7, 2017

@jcoyne i think it still is.

i still think that the disconnect between the name one sees on the file showpage (especially because it is presented like a filename) and what one gets when they download is unfortunate. I'm willing to defer to others on this, though.

@mjgiarlo
Copy link
Member

mjgiarlo commented Jun 7, 2017

@vantuyls I'm thinking we should discuss this in a new issue, on the -tech list, or on a tech call. Can you make the tech call in 5m? 😃

@kerchner
Copy link
Contributor

I've observed what may be a related issue, although I'm not sure that in actual usage, this would be a realistic use case. But for what it's worth, it might uncover other issues:

  1. Upload a fileset to a work, with a file that successfully characterizes and "derivatizes" - for example, a JPG or PDF. Note that the thumbnail image (when ready) shows successfully.
  2. Upload a new version of the same fileset, with a file for which a thumbnail is not generated; for instance a text file. While the new version can be correctly downloaded, the item is now shown but with the thumbnail image from the previous version. The expected behavior is that it should show with the "no thumbnail" default image (i.e.
    image)

@vantuyls
Copy link
Author

also, @mjgiarlo, i do not recall what the result of this conversation was a few weeks back. Do we need to revisit this?

@mjgiarlo
Copy link
Member

@kerchner Yep, that's a new bug. Would you mind writing that up as a new issue? Thanks!

@mjgiarlo
Copy link
Member

@vantuyls I think I'm still here: #1152 (comment)

@vantuyls
Copy link
Author

i'm fine with a message asking./reminding to rename the file.

@mjgiarlo
Copy link
Member

@vantuyls OK. Do you think we ought to whip that into a shiny new issue sans all the comment baggage?

@kerchner
Copy link
Contributor

kerchner commented Jul 14, 2017

@mjgiarlo I've created #1350

@mjgiarlo
Copy link
Member

Thx @kerchner!

@vantuyls vantuyls modified the milestones: 2.0.0, Backlog Sep 20, 2017
@mjgiarlo mjgiarlo added this to the Backlog milestone Sep 21, 2017
@vantuyls
Copy link
Author

@mjgiarlo so, a few things are going on with versioning that are related to this issue. we may want to move this into a new issue (?):

  1. When uploading a new version, the characterization information for the new version appears to be adding to the old characterization information (see below).
  2. Date Modified metadata does not change, and i believe it should reflect the date the new version was selected/uploaded.
  3. I still have reservations about the filename not changing to reflect the filename of the item that was uploaded. While this may seem minor, at the very least it is misleading.

screen shot 2017-12-20 at 12 44 23 pm

@mjgiarlo
Copy link
Member

@vantuyls Agree that we at least need new issues for (1) and (2), and it may be worth bundling those two issues plus #1350 into a subsequent release.

@vantuyls
Copy link
Author

@no-reply i'd like to make the change 1, 2, and 3 in my comment from Dec 20, 2017 above. thoughts?

@conorom
Copy link
Contributor

conorom commented Apr 9, 2019

I've been keeping an eye on this issue. Probably what is being done now tries to pay deference to keeping the FileSet object as a sort of "aggregator", given the fact that it can technically have more than just the original_file relation, and each relation can have multiple versions, I presume. Which is fine, in the sense that if you want to do fancy stuff you need to figure that out on your own.

However, from what I've seen in this issue (and some others in our project), I think Hyrax users essentially expect to see the same behavior when a FileSet gets a new version as when the FileSet was created with its original original_file (a fun phrase) in the first place. I think that would be a good starting point, personally. Such that the "recharacterization" is done/stored as cleanly as possible. Old file versions are in "cold storage" in Fedora and their associated values are not shown in Hyrax at all.

At a minimum, the characterization metadata for the previous version should be blown away completely. There's no point hoarding it on the parent FileSet object in my view, especially when the hoarding is being done in ActiveTriples::Relation "arrays"/objects that can't guarantee order. What I mean is that, given the fact the FileSet's Solr document is what you have to work with, there are a lot of dodgy assumptions happening in the FileSetIndexer. You end up with arrays of things like file sizes, checksums, image heights (in one array), image widths (in another) and the indexer pulls first or [0] for each, with no guarantee that the height you get goes with the width you get, not to mention whether any of the indexed values apply to the current version of original_file.

solr_doc['file_size_lts'] = object.file_size[0]

solr_doc['height_is'] = Integer(object.height.first) if object.height.present?

object.original_file.digest.first.to_s

That can't be right? In fact we empty out these original_file "arrays" in heliotrope in an overwritten CharacterizeJob to try to get behavior that suits our needs:
https://github.com/mlibrary/heliotrope/blob/db4014b05cae3c6825991b31a1e900edcbb1fde1/app/jobs/characterize_job.rb#L17

I'm also not certain why adding previous versions' format_label values together on the FileSet makes sense either, which is what @vantuyls is getting at above. Again, ordering will be random. Is there a scenario where that could be useful?

"#{object.mime_type.split('/').last} (#{object.format_label.join(', ')})"

Basically I'm writing this comment (sorry it's so long) because I'm about to add more metadata deletions to our overwritten/custom CharacterizeJob.

Regarding the file name, it seems to me we're talking about both title and label. I agree with @mjgiarlo that title should be left alone and not sniffed (user settable). Also I agree with @jcoyne that the downloaded name is pulled correctly from the characterization metadata, using file.original_name here in the absence of a specific "filename" param. So that's fine at the moment:
https://github.com/samvera/hydra-head/blob/e239ac9427b584967e7ef4413abd00989c79ff0e/hydra-core/app/controllers/concerns/hydra/controller/download_behavior.rb#L83

However, the parent FileSet's label is set from the original_file file name at creation time and not editable:

file_set.label ||= label_for(file)

This is used on the Work show page, and probably in other places, including as a subsequent fall-back for download name in that line of DownloadBehavior I just linked to above. I think this definitely should be changed with the new version's file name. Or I can't see why it wouldn't be. Same for modification date.

Finally, the "set title to label if there's no title" thing in FileSet actor will not be needed if and when the Hydra::Works issue below comes to pass. The thinking there should be kept in sync, I feel.

file_set.title = [file_set.label] if file_set.title.blank?

samvera/hydra-works#325

@rjkati
Copy link

rjkati commented May 17, 2021

In Hyrax 3.0.2, I can upload a new version of the file with a different filename. When I download the new version, it has the correct filename. However, the correct filename does not display in the Hyrax file list.

@jlhardes
Copy link
Contributor

The new file version should update the metadata stored in the FileSet so if the file name has a change, that should be reflected in the FileSet metadata. This metadata is editable so a message suggesting to check that metadata might be helpful but if there are changes, those changes need to be reflected in the metadata about the file.

@jlhardes
Copy link
Contributor

The characterization of the new file does replace the old file characterization so that is improved. There is no alert to check that the Fileset title matches the filename of the new file version but since it is editable I can be OK with this issue closing.

@conorom
Copy link
Contributor

conorom commented Mar 11, 2022

@jlhardes I did mean to come back and reply to your last message but got lazy once the PR was merged.
Rather than sniff title to see if it used to be a filename or some sort, the code (which has been running in heliotrope/Fulcrum for years with no complaints) just goes "if title was the same as label before this characterization, make it the same afterwards too".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants