I prefer to take photos in Canon's 'raw' image format (.CR2) and process them later, producing final images (.JPG) that I'm happy with. Typically, I will end up with only a handful of jpegs that I would like to keep despite having shot a great many photos.
I would like to keep only the raw files that correspond to the jpegs that I've produced; .CR2 files are massive and it's simply not worth the drive space to keep every shot taken. If I've photographed a big event I will have shot something like 1000 .CR2 files and, after processing, only around 100 .JPG files will remain (the "Keepers").
My original .CR2 files reside in a directory called "RAWs" and the processed images are placed in a "JPEGs" directory. keepraws.py examines the filenames of the images in the JPEGs directory and goes looking for the corresponding .CR2 file in the RAWs directory. A bit of regex is used because Darktable, my image processing software, names the jpegs somewhat differently from the original raw image filename. For example, the raw image from the camera will be of the form "MB__1435.CR2" and the jpeg will thus be "2017-09-21_21-40-00_MB__1435.jpg". If I've worked up another jpeg from the same raw file, Darktable will name that file "2017-09-21_21-40-00_MB__1435_01.jpg". Additionally, Darktable will save a metadata file for each raw image, named "MB__1435.CR2.xmp" and, yes, I want these kept too!
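To make the filename matching concrete, here is a minimal sketch of the kind of regex involved. It is not necessarily the pattern keepraws.py itself uses; the names JPEG_NAME and raw_stem are made up for illustration, and the pattern assumes every jpeg carries the "YYYY-MM-DD_HH-MM-SS_" prefix and an optional two-digit "_NN" suffix, as in the examples above.

```python
import re

# Assumed jpeg naming (based only on the examples in this description):
#   <date>_<time>_<raw stem>[_NN].jpg
# JPEG_NAME and raw_stem are hypothetical names, not taken from keepraws.py.
JPEG_NAME = re.compile(
    r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}_"  # timestamp prefix added on export
    r"(?P<stem>.+?)"                          # original raw stem, e.g. MB__1435
    r"(?:_\d{2})?"                            # optional suffix for extra versions
    r"\.jpe?g$",                              # .jpg or .jpeg, any letter case
    re.IGNORECASE,
)


def raw_stem(jpeg_name):
    """Return the raw file stem for a jpeg filename, or None if it does not match."""
    match = JPEG_NAME.match(jpeg_name)
    return match.group("stem") if match else None
```

With this pattern, both "2017-09-21_21-40-00_MB__1435.jpg" and "2017-09-21_21-40-00_MB__1435_01.jpg" give the stem "MB__1435", from which the names "MB__1435.CR2" and "MB__1435.CR2.xmp" can be built. A differently configured Darktable export name would need a different pattern.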
The retained .CR2 and .CR2.xmp files are copied into a new "KeptRaws" directory; the original raw images are not deleted by the script.
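The copy step can be sketched roughly as follows. This is an illustration under assumptions rather than the script's actual code: the function name copy_keepers and its parameters are hypothetical, and it expects a list of raw stems such as "MB__1435" that have already been worked out from the jpeg filenames.

```python
import shutil
from pathlib import Path


def copy_keepers(stems, raw_dir, kept_dir):
    """Copy <stem>.CR2 and <stem>.CR2.xmp from raw_dir into kept_dir.

    Hypothetical helper: nothing in raw_dir is moved or deleted.
    Returns the names of the files that were copied.
    """
    raw_dir, kept_dir = Path(raw_dir), Path(kept_dir)
    kept_dir.mkdir(parents=True, exist_ok=True)  # create KeptRaws if missing
    copied = []
    # set() drops repeats, since several jpegs can come from one raw file
    for stem in sorted(set(stems)):
        for name in (f"{stem}.CR2", f"{stem}.CR2.xmp"):
            src = raw_dir / name
            if src.is_file():  # skip quietly if the raw or sidecar is absent
                shutil.copy2(src, kept_dir / name)  # copy2 also keeps timestamps
                copied.append(name)
    return copied
```

For example, copy_keepers(["MB__1435"], "RAWs", "KeptRaws") would copy "MB__1435.CR2" and its .xmp sidecar if they exist, leaving the RAWs directory untouched.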