"Download all my photos"-button for face detection #3173

Open
JobDoesburg opened this issue Jun 7, 2023 · 3 comments
Labels
app:facedetection (Issues regarding the facedetection-app), feature (Issues regarding a complete new feature), priority: low (Should be dealt with when nothing else remains.)

@JobDoesburg
Contributor

Is your feature request related to a problem? Please describe.

I want to download all photos of me

Describe the solution you'd like

A "Download all my photos"-button

Motivation

Describe alternatives you've considered

Additional context

@JobDoesburg JobDoesburg added priority: low Should be dealt with when nothing else remains. feature Issues regarding a complete new feature app:facedetection Issues regarding the facedetection-app labels Jun 7, 2023
@DeD1rk
Member

DeD1rk commented Aug 20, 2023

This would be really nice to have. There's also #2770.

I think we could tackle both of them by writing an AWS Lambda function that zips a list of S3 objects and saves the archive to S3. The original problems for #2770 then no longer apply.
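
As a rough illustration of the zipping step such a Lambda would perform, here is a minimal sketch. The `build_zip` name, the `files` entries with `name` and `url` keys, and the injected `fetch` callable are all assumptions made for this example; a real Lambda would fetch from S3 and upload the result rather than return bytes.

```python
import io
import zipfile
from typing import Callable, Dict, List


def build_zip(files: List[Dict[str, str]], fetch: Callable[[str], bytes]) -> bytes:
    """Fetch every listed file and return the bytes of a ZIP archive.

    `fetch` is a hypothetical stand-in for downloading one object by URL.
    """
    buffer = io.BytesIO()
    # Photos are usually already compressed, so store them without deflating.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for entry in files:
            archive.writestr(entry["name"], fetch(entry["url"]))
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch short; for large albums, streaming to a temporary file would probably be the safer choice.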

I think it would be nice to keep track of those archives with a model, for zipped albums as well as face detection photos. Then we can:

  • make sure they're recreated when needed (when a photo is added to, deleted from, or (un)hidden in an album, or when a user requests their facedetection photos and the matches have changed).
  • make sure we don't keep large orphaned archives around once new ones have been created.
  • limit how often people can request their facedetection photos (the matches can change quite frequently, and creating archives would probably be very resource-intensive).
  • pre-generate the zips for albums.
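
The rate-limiting point could be as simple as a cooldown check. This is only a sketch: the `may_request_archive` name and the one-hour `REQUEST_COOLDOWN` are assumptions, since no concrete limit has been proposed here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical cooldown; no concrete value has been agreed on.
REQUEST_COOLDOWN = timedelta(hours=1)


def may_request_archive(last_requested: Optional[datetime], now: datetime) -> bool:
    """Return True if the user may trigger a new archive at time `now`."""
    if last_requested is None:
        # The user has never requested an archive before.
        return True
    return now - last_requested >= REQUEST_COOLDOWN
```

The timestamp of the last request could live on the same model that tracks the archive.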

We could keep track of the exact files that need to be included in an archive by hashing the photos (filenames or even content hashes) and storing that digest on the zip model. That makes it easy and efficient to check whether the stored archive is up to date.
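
A minimal sketch of that check, assuming each photo already has a fixed-length digest string (the `archive_digest` and `is_up_to_date` names are made up for this example):

```python
import hashlib
from typing import Iterable


def archive_digest(photo_digests: Iterable[str]) -> str:
    """Digest over a set of per-photo digests, independent of input order."""
    hasher = hashlib.sha1()
    # Sorting makes the result reproducible regardless of query order.
    for photo_digest in sorted(photo_digests):
        hasher.update(photo_digest.encode())
    return hasher.hexdigest()


def is_up_to_date(stored_digest: str, photo_digests: Iterable[str]) -> bool:
    """Compare the digest stored on the archive with the current photos."""
    return stored_digest == archive_digest(photo_digests)
```

Adding, removing, or replacing a photo changes the combined digest, so a mismatch means the archive has to be rebuilt.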

For the UI, when an up-to-date archive already exists, we can just render a link to it. For facedetection, if an archive doesn't exist yet, we can have JS make an API call to trigger its creation, and then either poll until it's done or keep the response hanging while we await the Lambda's completion within the request cycle.
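
The polling variant boils down to a loop like the one below. In practice this would be JS in the browser; it is sketched in Python here only to show the logic. `get_url` is a hypothetical callable standing in for the API call, returning the download URL once the archive exists and `None` before that.

```python
import time
from typing import Callable, Optional


def wait_for_archive(
    get_url: Callable[[], Optional[str]],
    interval: float = 2.0,
    timeout: float = 60.0,
) -> Optional[str]:
    """Poll `get_url` until it returns a URL or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        url = get_url()
        if url is not None:
            return url
        # Give up if another sleep would take us past the deadline.
        if time.monotonic() + interval > deadline:
            return None
        time.sleep(interval)
```

A timeout matters here because the archive creation might fail without the client ever being told.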

@T8902 T8902 self-assigned this Jun 9, 2024
@T8902 T8902 added this to the Release 56 milestone Nov 4, 2024
@T8902 T8902 removed their assignment Nov 4, 2024
@T8902 T8902 modified the milestones: Release 56, Release 57 Nov 20, 2024
@DeD1rk
Member

DeD1rk commented Nov 23, 2024

@LucAngevare here's some inspiration (definitely not ready to copy-paste) for how parts of the backend could look (this is for doing an album, facedetection would be similar of course):

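import hashlib
import json
import logging

import boto3
from django.conf import settings
from django.db import models

# Album and secure_token are assumed to be imported from the project's own modules.

logger = logging.getLogger(__name__)


class DownloadZip(models.Model):
    token = models.CharField(
        max_length=40,
        default=secure_token,
        editable=False,
        help_text="Token used by a Lambda to authenticate "
        "to the API to report that this ZIP file has been created.",
    )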

    digest = models.CharField(
        max_length=40,
        editable=False,
        help_text="Digest of the (digests of the) photos in the ZIP file.",
    )

    file = models.FileField(null=True, upload_to="downloads/zip/")


def _get_download_zip_digest(photos):
    """Return a digest over the digests of the provided queryset of photos.

    To make the digest reproducible, the queryset of photos is ordered by the
    photos' individual digests. The SHA1 digest over these represents the files
    included in a ZIP file and can be used to check if the ZIP file is up to date.
    """
    # Named `hasher` to avoid shadowing the built-in `hash`.
    hasher = hashlib.sha1()
    for photo_digest in photos.values_list("_digest", flat=True).order_by("_digest"):
        hasher.update(photo_digest.encode())
    return hasher.hexdigest()


def create_download_zip(album: Album):
    photos = album.photo_set.all()
    zip_digest = _get_download_zip_digest(photos)

    album.download_zip = DownloadZip(digest=zip_digest)
    album.download_zip.save()
    album.save()

    # Trigger creating the ZIP file.
    if settings.PHOTOS_ZIP_LAMBDA_ARN is None:
        logger.warning(
            "No ZIP Lambda ARN has been configured. ZIP file will be created locally."
        )

        # TODO: local version or maybe celery task (open and zip the files, save it and process the fact that it's done).

    else:
        s3_client = boto3.client(
            service_name="s3",
            aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
            aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
        )

        # Get a presigned request that allows the lambda to upload the ZIP file directly to S3.
        presigned_post_data = s3_client.generate_presigned_post(
            settings.AWS_STORAGE_BUCKET_NAME,
            "downloads/album/" + album.slug + ".zip",
            Fields={"acl": settings.AWS_DEFAULT_ACL, "Content-Type": "application/zip"},
            Conditions=[
                {"acl": settings.AWS_DEFAULT_ACL},
                {"Content-Type": "application/zip"},
            ],
            ExpiresIn=3600,
        )

        lambda_client = boto3.client(
            service_name="lambda",
            aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
            aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
        )

        # Invoke the lambda to create the ZIP file.
        response = lambda_client.invoke(
            FunctionName=settings.PHOTOS_ZIP_LAMBDA_ARN,
            InvocationType="Event",
            Payload=json.dumps(
                {
                    "api_url": settings.BASE_URL,
                    "token": album.download_zip.token,
                    "upload": presigned_post_data,
                    "files": [
                        {"url": photo.file.url, "name": f"{i:04d}"}
                        for i, photo in enumerate(photos)
                    ],
                }
            ),
        )

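        # Asynchronous ("Event") invocations return 202 when the request is accepted.
        if response["StatusCode"] != 202:
            raise RuntimeError(
                f"ZIP Lambda invocation returned {response['StatusCode']}, expected 202."
            )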

It'd probably be pretty easy to start off without an actual AWS Lambda and run it in the webserver (as a Celery task). Whenever an album is saved, we can then simply:

  • Compute the (new) digest of the album.
  • Check if the album has a DownloadZip and if that has the right digest.
  • If not, delete the old one and trigger creating a new one.
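
Those three steps could look roughly like this, stripped of Django. `StoredZip`, `refresh_zip`, and the `create`/`delete` callables are hypothetical stand-ins for the `DownloadZip` model and whatever ends up triggering the Celery task or Lambda.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredZip:
    """Stand-in for the archive record; only the digest matters here."""

    digest: str


def refresh_zip(
    current_digest: str,
    stored: Optional[StoredZip],
    create: Callable[[str], StoredZip],
    delete: Callable[[StoredZip], None],
) -> StoredZip:
    """Return an archive record matching `current_digest`, recreating it if stale."""
    if stored is not None and stored.digest == current_digest:
        # The existing archive already contains exactly these photos.
        return stored
    if stored is not None:
        # Remove the outdated archive so it doesn't linger as an orphan.
        delete(stored)
    return create(current_digest)
```

Passing `create` and `delete` in as callables keeps the decision logic testable without touching storage.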
