Move image metadata extraction to own app #552

Closed
tacruc opened this issue Mar 4, 2021 · 9 comments

Comments

@tacruc
Collaborator

tacruc commented Mar 4, 2021

Many of our issues are due to the extraction of image metadata. Therefore I propose making this part more modular, as it is independent from the main purpose and focus of the maps app. Additionally, an independent app makes it easier to change the implementation used to gather the metadata.

What I would like is an app image_metadata, which:

  • uses the preview generation process to trigger the extraction process
  • saves the metadata in one big table, or in a table layout inspired by the metadata handling of e.g. digikam
  • provides a (DAV) endpoint to query/search for certain images, by location or date
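
To make the "one big table" idea concrete, here is a minimal sketch, assuming SQLite as a stand-in for the real database. The column names, index names, and the helper functions find_in_bounds and find_in_date_range are all hypothetical and only illustrate the kind of query such an endpoint would need to answer.

```python
import sqlite3


def create_schema(conn):
    """Create a single flat metadata table (hypothetical layout)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS image_metadata (
            file_id  INTEGER PRIMARY KEY,
            lat      REAL,     -- NULL if the image has no geodata
            lng      REAL,
            taken_at INTEGER   -- Unix timestamp
        )
        """
    )
    # Indexes so that location and date searches do not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_meta_geo ON image_metadata (lat, lng)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_meta_date ON image_metadata (taken_at)"
    )


def find_in_bounds(conn, south, west, north, east):
    """Return file ids inside a bounding box; rows without geodata never match."""
    rows = conn.execute(
        "SELECT file_id FROM image_metadata "
        "WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ? "
        "ORDER BY file_id",
        (south, north, west, east),
    )
    return [row[0] for row in rows]


def find_in_date_range(conn, start, end):
    """Return file ids taken between two Unix timestamps (inclusive)."""
    rows = conn.execute(
        "SELECT file_id FROM image_metadata "
        "WHERE taken_at BETWEEN ? AND ? ORDER BY file_id",
        (start, end),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO image_metadata VALUES (?, ?, ?, ?)",
        [(1, 52.52, 13.40, 1614816000), (2, 48.85, 2.35, 1617235200)],
    )
    print(find_in_bounds(conn, 50.0, 10.0, 55.0, 15.0))
```

This sketch does not handle bounding boxes that cross the antimeridian, and a layout closer to digikam's would split the data across several tables instead.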

Most of the required methods should already exist in our current extraction process or in the https://github.com/gino0631/nextcloud-metadata app.

Related maps Issues:

External Issues, where this app might be helpful:

Any PR is welcome, as I have decided to focus first on the outstanding PRs for the core features of maps.

@UweKrause

UweKrause commented Mar 9, 2021

It seems like there already was an attempt for this:
https://help.nextcloud.com/t/mediametadata-app-to-extract-and-store-meta-data-from-media-files/1601

Also, there is this app:
https://github.com/gino0631/nextcloud-metadata
Maybe it would be possible to ask the maintainer whether he is interested in opening up his efforts to other apps?
see: gino0631/nextcloud-metadata#69

@tacruc
Collaborator Author

tacruc commented Mar 10, 2021

I have already had a look at nextcloud-metadata, and most methods to extract metadata are quite similar. The problem is not really extracting the metadata of a single file. This can be done on request, and that's what, to my knowledge, nextcloud-metadata is doing.

But this is not sufficient to ask for all pictures which have geodata. Therefore the metadata has to be extracted before the request is made and stored in some sorted way.
As the metadata has to be extracted before the request is started, it has to be done in some kind of background job. But this is the point where the questions and problems start:

  • One background job for all pictures, or one background job per picture

    • One background job for all pictures: memory leaks in the libraries used accumulate, and at some point the process crashes. Additionally, the libraries tend to crash on broken images. All remaining pictures are then not scanned and are missing from the map.
    • One background job per picture: on the initial scan, millions of background jobs might be created, so that other background jobs end up far back in the queue and are only executed after a long time. This might get worse if cron is not configured correctly.
    • Additionally, a crashing background job delays further processing by one cron interval.
  • Which process schedules and creates the background jobs

    • For now we have two ways to create background jobs:
    • File change events
      • Unfortunately, these events so far seem to miss file share events and changes in group folders and external storages, and maybe even more; I have lost track of everything that is not working.
    • maps:scan-photos
      • Creates one background job for each picture, which again leads to problems if users run this command in a cron job or execute it regularly. Scanning multiple terabytes of pictures will take a while. During this time no other background job is executed.

As far as I remember, every issue with pictures not shown on the map is related to one of the problems above.
Summarizing: for the maps app, the hard part is not the extraction of the metadata itself, but executing the extraction once and only once* for all pictures in a reliable way.
*) As long as the picture is not changed.
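
The "once and only once, unless the picture changed" requirement can be sketched roughly as follows. This is only an illustration under assumptions, not how maps does it: scan_once, the seen mapping, and the extract callback are hypothetical names, and the etag stands in for whatever change marker the storage provides.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def scan_once(
    files: Iterable[Tuple[int, str]],
    seen: Dict[int, str],
    extract: Callable[[int], None],
) -> List[int]:
    """Run extract() for every file whose etag differs from the recorded one.

    files: (file_id, etag) pairs of all candidate pictures.
    seen:  file_id -> etag of the last processed version (persisted by the caller).
    Returns the ids of files whose extraction raised an exception.
    """
    failed = []
    for file_id, etag in files:
        if seen.get(file_id) == etag:
            continue  # unchanged since the last extraction, so skip it
        try:
            extract(file_id)
        except Exception:
            # A broken image must not stop the remaining pictures.
            failed.append(file_id)
        # Assumption: the etag is recorded even on failure, so a broken
        # image is only retried after it has changed.
        seen[file_id] = etag
    return failed
```

Note that the try/except only isolates errors raised as exceptions. A hard crash or memory exhaustion inside a native library would still kill the whole process, so the extraction would have to run in a separate process to survive that.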

@Galbar

Galbar commented Apr 6, 2021

The preview generation app has a background job that generates previews incrementally. It somehow keeps track of new/changed files since the last execution. Maybe the background job for pictures/tracks could do something similar?
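
I have not checked how the preview generation app tracks changes internally. One way such incremental tracking could work, purely as an assumption, is a deduplicated queue of changed file ids that a single recurring job drains in bounded batches. PendingQueue and its methods are hypothetical names.

```python
class PendingQueue:
    """Deduplicated FIFO of file ids waiting for metadata extraction."""

    def __init__(self):
        # A dict keeps insertion order and ignores repeated keys,
        # so it works as an ordered set.
        self._pending = {}

    def on_file_changed(self, file_id):
        """Called from a file change event; repeated changes are stored once."""
        self._pending[file_id] = None

    def drain(self, limit):
        """Remove and return at most `limit` file ids, oldest first."""
        batch = list(self._pending)[:limit]
        for file_id in batch:
            del self._pending[file_id]
        return batch

    def __len__(self):
        return len(self._pending)
```

With this shape there is one recurring job instead of one job per picture, and the batch limit keeps a large initial scan from blocking other background jobs. It still depends on the change events being complete.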

@tacruc
Collaborator Author

tacruc commented Apr 6, 2021

One idea of mine was to just use the preview generation process and create a preview provider that extracts the metadata.
It's a little hacky, but it might be worth investigating in this direction.

@GAS85

GAS85 commented Oct 1, 2021

How about improving the existing command? E.g. Face Recognition has a timeout option, occ face:background_job [-t|--timeout TIMEOUT], that is usually set lower than the cron job execution period. That way the admin can easily control the background job behavior.
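
A time budget like this could look roughly as follows. This is a sketch of the general technique, not the Face Recognition implementation; run_with_timeout and its parameters are hypothetical names.

```python
import time
from collections import deque


def run_with_timeout(pending, process, timeout_seconds, clock=time.monotonic):
    """Process queued items until the queue is empty or the time budget is spent.

    pending: deque of work items; unprocessed items stay in it for the next run.
    clock:   injectable time source, which makes the function testable.
    Returns the number of items processed in this run.
    """
    deadline = clock() + timeout_seconds
    done = 0
    while pending and clock() < deadline:
        process(pending.popleft())
        done += 1
    return done


if __name__ == "__main__":
    queue = deque(range(1000))
    # Stop after roughly 0.01 seconds; the rest waits for the next run.
    count = run_with_timeout(queue, lambda item: time.sleep(0.001), 0.01)
    print(count, "processed,", len(queue), "left")
```

The budget is only checked between items, so a single slow picture can still overrun it. That is why the timeout would be set noticeably lower than the cron period.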

@killi199

killi199 commented Jan 3, 2022

More related maps Issues: #645 and #655

@tacruc
Collaborator Author

tacruc commented Mar 3, 2023

Apparently some kind of metadata feature is currently being added to the DAV API. Therefore this can be closed.

@tacruc tacruc closed this as completed Mar 3, 2023
@pklampros

That sounds great! Any issues/PRs you can point us to so that we can follow?

@tacruc
Collaborator Author

tacruc commented Mar 4, 2023

Yes, for GPS extraction:
nextcloud/server#33511
Documentation: nextcloud/documentation#9659

Improvement ideas and current limitations: nextcloud/server#36809

But there is even more; just search for "metadata" in the server pull requests.
