Add Purge Threshold #256

Open
wants to merge 3 commits into
base: main

@ChrisLeinbach ChrisLeinbach commented Dec 13, 2024

Problem

Currently, the threshold at which files are purged from disk is hard-coded at 95%. Depending on the user and the system, this can be either too high or too low. Adjusting the threshold to suit user preferences would mean editing the script that runs the disk cleanup. This is less than ideal because it puts the control out of reach for less technical users, and the edit would be wiped out or cause issues during updates.

Proposed Change

Add a purge threshold setting. This setting lets the user control the disk usage percentage at which the purge activities are run.

Detailed Description of Changes

  1. Adds a configuration item to advanced.php to set the purge threshold. Also adds a relevant description and warnings.
    • The current setting allows a range of 20-99 percent. These values felt sensible to me but I'm open to changing them.
    • Note: The purge threshold is still active when the user has set Keep mode. This does present a risk that a user will set a low threshold while in Keep mode and the services will be shut down sooner than they expect. This may warrant a more advanced implementation if the risk is felt to be sufficiently large.
  2. Adds a snippet to update_birdnet_snippets.sh to set the default purge threshold. The default matches the threshold that was previously hard-coded.
  3. Changes the space equivalence check in disk_check.sh to use the newly defined purge threshold.
  4. Adds the purge threshold to install_config.sh.
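
As a rough illustration of the changes listed above, here is a minimal sketch of a configurable threshold check. It is not taken from the PR's diff: the variable name `PURGE_THRESHOLD`, the helper functions `clamp_threshold` and `should_purge`, the use of `>=` for the comparison, and the choice of `/` as the filesystem to inspect are all assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical names: PURGE_THRESHOLD, clamp_threshold and should_purge are
# illustrative only and do not come from the PR's diff.

# Clamp a requested threshold into the 20-99 range described above,
# falling back to the previously hard-coded 95 when the value is not a number.
clamp_threshold() {
    local value="$1"
    if ! [[ "$value" =~ ^[0-9]+$ ]]; then
        echo 95
        return
    fi
    value=$((10#$value))   # force base 10 so "095" is not read as octal
    if (( value < 20 )); then
        echo 20
    elif (( value > 99 )); then
        echo 99
    else
        echo "$value"
    fi
}

# Succeed (exit status 0) when disk usage has reached the threshold.
# Using >= here is an assumption about how the comparison is done.
should_purge() {
    local used_percent="$1" threshold="$2"
    (( used_percent >= threshold ))
}

PURGE_THRESHOLD="$(clamp_threshold "${PURGE_THRESHOLD:-95}")"

# Percentage of the root filesystem in use, from POSIX-format df output.
used="$(df -P / | awk 'NR==2 {gsub("%", "", $5); print $5}')"
used="${used:-0}"

if should_purge "$used" "$PURGE_THRESHOLD"; then
    echo "usage ${used}% >= ${PURGE_THRESHOLD}%: purge would run"
else
    echo "usage ${used}% < ${PURGE_THRESHOLD}%: nothing to do"
fi
```

Keeping the clamp and the comparison in separate functions means the same range check could back both the settings page and the cleanup script, though whether the PR structures it that way is not something this sketch claims.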

Adds a purge threshold setting. This setting allows the user to
control at what filled percentage the purge activities are run.
This replaces the default, hard coded 95% point.

The purge threshold defaults at the original 95%. I set a minimum
of 20 and max of 99 because those values felt sensible but am open
to changing those based on feedback.

Note: The purge threshold is still active when the keep option is
set. I added a note for this but this still presents some risk
where users who change this while in Keep mode could have their
services shut down earlier than they expect.

Patch: Fix a couple of typos in initial changes and improve
formatting.

@alexbelgium

Hi, this does look useful for people who want to avoid running their disks to the limit, or who share the system's storage with another app.

@Nachtzuster
Owner

The 'Amount of files to keep for each species' setting is meant to further limit disk usage if the 95% safety valve is not enough.
Have you considered using that?

@ChrisLeinbach
Author

> The 'Amount of files to keep for each species' setting is meant to further limit disk usage if the 95% safety valve is not enough.
> Have you considered using that?

I had tried it. Admittedly, I wasn't very patient and didn't take the time to debug it, but for whatever reason it didn't clear up my usage, at least not as fast as I had expected. By the time this drew my attention, it was already a problem for my Pi, so I took the most direct path.

I'm going to change my threshold back to 95%, set it to keep all files, and let it build up some data over the next few days. I'll report back with an issue if my debugging shows that the 'keep x files' functionality truly isn't working as it should.

Having said that, it feels a bit off to me that two parallel but unlinked methods of managing disk space are implemented. I'm wondering if there is appetite for rewriting the disk management scripts into a single authoritative script?

@alexbelgium

alexbelgium commented Jan 3, 2025

> The 'Amount of files to keep for each species' setting is meant to further limit disk usage if the 95% safety valve is not enough.
> Have you considered using that?
>
> I had tried it and admittedly wasn't very patient nor did I take the time to debug it but for whatever reason it didn't clear up my usage, at least as fast as I had expected. By the time this drew my attention, it was a problem for my Pi so I was trying to take the most direct path.

Hi, the cron job only runs once a day, at 2 am. In theory this should be enough. Keep in mind that it all depends on the number of birds: if you have 100 species with 100 mp3 recordings each at 4 MB per file, that makes 40 GB :-)
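
To make that estimate concrete, the arithmetic can be sketched in a few lines of bash. The figures are the illustrative ones from the sentence above (decimal units, 1 GB = 1000 MB), not measurements, and the variable names are made up for the example.

```shell
#!/bin/bash
# Illustrative figures from the comment above; real counts and sizes will differ.
species=100
recordings_per_species=100
mb_per_recording=4

# Total footprint in MB, then converted to GB (1 GB = 1000 MB).
total_mb=$((species * recordings_per_species * mb_per_recording))
total_gb=$((total_mb / 1000))

echo "${total_mb} MB, about ${total_gb} GB"
```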

Run this script to see the number of recordings available for each of your species. Keep in mind that you will often have more files than the configured minimum: files from the last 30 days are protected, as are all files with a no-purge tag.

#!/bin/bash

source /etc/birdnet/birdnet.conf
base_dir="$HOME/BirdSongs/Extracted/By_Date"
# Stop if the recordings directory is missing; the relative find below depends on it
cd "$base_dir" || exit 1

# Get unique species
bird_names=$(
    sqlite3 -readonly "$HOME"/BirdNET-Pi/scripts/birds.db <<EOF
.mode column
.headers off
SELECT DISTINCT Com_Name FROM detections;
.quit
EOF
)

# Sanitize the bird names (remove single quotes and replace spaces with underscores)
sanitized_names="$(echo "$bird_names" | tr ' ' '_' | tr -d "'" | grep '[[:alnum:]]')"
# Remove trailing underscores
sanitized_names=$(echo "$sanitized_names" | sed 's/_*$//')

# Create an associative array to store species and their file counts
declare -A species_file_counts

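# Read each sanitized species name and count its files (PNG images excluded)
while read -r species; do
    # Skip blank lines: an empty name is not a valid associative array key
    [ -z "$species" ] && continue
    # A species with no matching directory makes find fail; count that as zero
    file_count=$(find */"$species" -type f -name "*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*.*" \
        -not -name "*.png" 2>/dev/null | wc -l)
    species_file_counts["$species"]=$file_count
done <<<"$sanitized_names"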

# Sort the species by file count in descending order and print them
for species in "${!species_file_counts[@]}"; do
    echo "$species : ${species_file_counts[$species]}"
done | sort -t ':' -k2 -nr

@ChrisLeinbach
Author

> Hi, the cron job only runs once a day, at 2 am.

This is the part I was missing. For whatever reason I assumed that it ran as frequently as the disk check script. I switched to this repo after discovering the issue and realizing mcguirepr89's was unmaintained. Given that sequence of events, the 'keep x files' cleanup would never have run.

alexbelgium added a commit to alexbelgium/BirdNET-Pi that referenced this pull request Jan 8, 2025