Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: AudioStim.close to not breed ffmpeg processes by moviepy #252

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yarikoptic
Copy link
Contributor

As initially enjoyed by @caravanuden while observing "Too many open files" crash message while iterating over the list of audio files with a few hundreds of them in a simple code like:

from __future__ import print_function
import sys
import os
import time
from glob import glob
from pliers.stimuli import VideoStim, AudioStim
from pliers.extractors import MelspectrogramExtractor


spect = MelspectrogramExtractor()
PID = os.getpid()
print("PID: %d" % PID)
for i, f in enumerate(sys.argv[1:]): # [:1]*1000):
    print("%4d %30s \t#file: %d" % (i, f, len(glob('/proc/%d/fd/*' % PID))), end='\r')
    sys.stdout.flush()
    audio_stim = AudioStim(filename=f, sampling_rate=44100)

    # get spectrogram
    spectral_stim = spect.transform(audio_stim)
    audio_stim.close()

We had to add the .close() at the end to prevent breeding of ffmpeg zombies. I am not quite certain why gc didn't pick those up -- something might be holding on to those instances, thus not calling any of the __del__s. (in general in python3 it is not reliable to rely on those -- attributes might disappear before actual content of them thus not being able to close them at all). Overall I was not sure about the life cycle of any object there so this is a minimalistic workaround which worked for us

@tyarkoni
Copy link
Collaborator

tyarkoni commented Feb 5, 2018

Thanks! One concern though is that without a matching open() method, this seems incomplete. If you close() a clip this way, without destroying the whole stim, then a user can potentially try to pass it through a Transformer again later, and it will break unexpectedly. Could you add an open() method that just wraps _load_clip (maybe with a check to see if it's already loaded, and then the initializer can point to open)? Thanks!

@tyarkoni
Copy link
Collaborator

tyarkoni commented Feb 5, 2018

Oh, and we should probably also do the same thing for VideoStim objects. Happy to add it myself if you don't have time.

@caravanuden
Copy link

I'll handle it! I'll simply add open() and close() methods for VideoStims and AudioStims.

@coveralls
Copy link

coveralls commented Feb 6, 2018

Coverage Status

Coverage decreased (-12.9%) to 79.657% when pulling 0ba932d on yarikoptic:bf-ffmpeg-audiofiles into 8dae7ec on tyarkoni:master.

@yarikoptic
Copy link
Contributor Author

yarikoptic commented Feb 6, 2018

sorry, just got to the normal keyboard again.
Sure, you are welcome to pick up and fix it up the best way you see it fit! Few comments though:

  • _load_clip is invoked by the constructor, so thus avoiding open (unless provided outside instance, and in that case probably shouldn't be closed inside)
  • you could benefit from exposing Stim's as a context manager, defining something like
    def __enter__(self):
        if not self.clip:
            self.open()
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

within the base Stim class (didn't look in detail if all of them have .clip and .open etc) so ppl could use constructs such as

with AudioStim('filename') as stim:
   stim.dousefulstuff()

and do not need to mess around with open/close dance

  • looked "deeper". So -- the whole exhaustion is from the use of @_memoize with caches every transform result by default, thus slowly growing that cache of TransformResults which carry .stim so original moviepy instance is held along with its zombified by then ffmpeg etc. Thus another "more proper" way to avoid it for @caravanuden is
from pliers import config
config.set_option('cache_transformers', False)

but I wonder why caching is enabled (not disabled) by default. Also it might be worth make it so it would cache up to a specified number of transformers, with the oldest used last time kicked out first -- would then scale better for larger use-cases i guess

@tyarkoni
Copy link
Collaborator

tyarkoni commented Feb 6, 2018

Thanks for the comments, this is helpful!

I agree that implementing __enter__ and __exit__ makes sense.

Re: caching, the reason it's on by default is that many of the Extractor classes cost money to use, so it made sense to me to memoize the results by default--otherwise we're just costing our users (and ourselves) money unnecessarily. The better approach would be to set the defaults on a class-specific basis, or at least, to have a separate caching settings for paid classes. If nothing else, we should probably add a class-level flag indicating whether a service is free or not.

On the topic of caching, we should probably move to disk-based caching--or at least, make that an option. I initially started out using joblib, but there were some issues I couldn't easily resolve that prompted the move to a simpler solution (my vague recollection is it had to do with pickling).

@tyarkoni
Copy link
Collaborator

tyarkoni commented Apr 2, 2018

Any update on this, @yarikoptic @caravanuden? If you don't have time to get to it, that's fine--@qmac or I can probably get to it soon. Just let us know, as it does seem like a fairly high priority to fix. Thanks!

@adelavega
Copy link
Member

This looks pretty stale to me. Okay to close?

@tyarkoni tyarkoni closed this Nov 12, 2020
@yarikoptic
Copy link
Contributor Author

It was stale but there were no indication here that it was fixed one way or another (it might have been, I do not follow the development)

@tyarkoni tyarkoni reopened this Nov 12, 2020
@tyarkoni
Copy link
Collaborator

don't ever change, @yarikoptic

@yarikoptic
Copy link
Contributor Author

"NEVER!" said Yarik who closed a few dozen stale bugs in DataLad in the past week ;) (in my defense -- to the best of my knowledge they all were either addressed, superseded, or no longer pertinent ;))

@adelavega
Copy link
Member

hey, in my defense there's a difference between closing a PR, and closing the issue ;)

@yarikoptic
Copy link
Contributor Author

Hey - initial description was so rich that I totally missed that it was a PR! Should we convert it to an issue? ;-) or may be there is no issue? Anyways, I guess I was just missing you guys add jumped onto an opportunity for a chat ;-) feel free to re close

@adelavega adelavega closed this Sep 21, 2022
@adelavega adelavega reopened this Sep 21, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 21, 2022

Codecov Report

Base: 75.64% // Head: 75.59% // Decreases project coverage by -0.05% ⚠️

Coverage data is based on head (711c99b) compared to base (6e5e23b).
Patch coverage: 25.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #252      +/-   ##
==========================================
- Coverage   75.64%   75.59%   -0.06%     
==========================================
  Files          65       65              
  Lines        3876     3880       +4     
==========================================
+ Hits         2932     2933       +1     
- Misses        944      947       +3     
Impacted Files Coverage Δ
pliers/stimuli/audio.py 90.69% <25.00%> (-6.74%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@adelavega adelavega marked this pull request as draft November 14, 2022 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants