Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make --archive save the links to the downloaded songs as soon as the songs are downloaded. Closes #2196 #2220

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

Shajal-Kumar
Copy link

Title

Write downloaded songs to the archive file incrementally to prevent data loss on interruption
or
Make --archive save the links to the downloaded songs as soon as the songs are downloaded #2196

Description

This PR modifies the download/downloader.py and utils/archive.py modules to ensure that the archive file is created as soon as the script runs and updated incrementally. The initialize and add_entry methods have been added to archive.py to create the archive file if it does not already exist and add an entry to the archive file while flushing it immediately, respectively. Additionally, the download_multiple_songs method in the Downloader class has been updated to call add_entry as soon as each song is successfully downloaded. This change ensures that even if the download process is interrupted, the archive file will contain the URLs of all songs downloaded up to that point.

Related Issue

This change addresses issue #2196 where the archive file was not being updated during the download process, causing data loss if the process was interrupted.

Motivation and Context

This change is necessary to prevent data loss when using the --archive flag. Previously, the archive file was not created at the beginning of the script and if the download was interrupted, the archive file remained empty, even if most of the songs had been downloaded. By writing to the archive file incrementally, we ensure that users have an up-to-date record of all successfully downloaded songs, even in case of an unexpected interruption.

How Has This Been Tested?

Manually tested by downloading a playlist with songs that would not be present on the download source and interrupting the process at various points to ensure the archive file contains all URLs of downloaded songs.
Verified that the archive file is created at the start of the download process and updated immediately after each successful download.

Screenshots (if appropriate)

SpotDL 1
SpotDL2

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the CONTRIBUTING document
  • I have read the CORE VALUES document
  • I have added tests to cover my changes
  • All new and existing tests passed

@Silverarmor Silverarmor changed the base branch from master to dev November 3, 2024 02:46
@Silverarmor Silverarmor requested a review from xnetcat November 3, 2024 02:46
@Silverarmor
Copy link
Member

Thanks @Shajal-Kumar - I've changed the base to dev branch, can you resolve conflicts in downloader.py?

@Shajal-Kumar
Copy link
Author

@Silverarmor I've resolved the conflicts in downloader.py. Can you please check it?

@Shajal-Kumar
Copy link
Author

Hi @Silverarmor, I wanted to follow up and ask if there are any other changes that I need to make.

@Silverarmor
Copy link
Member

I've requested a code review from @xnetcat who will look into it when they are free :)

@xnetcat
Copy link
Member

xnetcat commented Nov 14, 2024

This won't save the data incrementally. You would have to move the add entry code to the search and download function, while making sure that only one thread has access to the file at the time.

Copy link
Member

@xnetcat xnetcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as stated in the comment

@Shajal-Kumar
Copy link
Author

Hello @xnetcat, I'll make the required changes to the search_and_download function. Could you please tell me how I'm supposed to ensure that only one thread has access to the file at a time? Should I separately create a thread lock or do I have to use a part of the existing code while implementing the incremental updates to the archive?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants