Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Change function vak.prep.frame_classification.dataset_arrays.make_npy_files_for_each_split to remove spectrogram/audio files from dataset path after making the npy files
* Modify prep_spectrogram_dataset so that it no longer makes a directory 'spectrogram_generated_{timenow}' -- that way we don't have to delete the directory when we remove the spectrograms after converting to npy files later
* Rename get_train_dur_replicate_split_name -> get_train_dur_replicate_subset_name in src/vak/common/learncurve.py
* Modify src/vak/prep/frame_classification/learncurve.py to no longer make duplicate npy files for each subset name, and to add subset names in a separate column from split so that we can specify subsets directly in learncurve
* Add subset parameter to src/vak/datasets/frame_classification/frames_dataset.py, that takes precedence over split parameter when selecting part of dataframe to use for grabbing samples
* Add subset parameter to src/vak/datasets/frame_classification/window_dataset.py, that takes precedence over split parameter when selecting part of dataframe to use for grabbing samples
* Rename split parameter of vak.train.frame_classification to subset, and use when making training dataset instance
* Use subset inside of src/vak/learncurve/frame_classification.py
* Have StandardizeSpect.fit_dataset_path take subset argument and have it take precedence over split when fitting, as with dataset classes
* Use split + subset when calling StandardizeSpect.fit_dataset_path in src/vak/train/frame_classification.py
* Use subset not split argument when calling training functions for model families in src/vak/train/train_.py
* WIP: Use subset with ParametricUMAPDataset (haven't added argument to dataset class yet)
* Add function `make_index_vectors_for_each_subset` to src/vak/prep/frame_classification/learncurve.py, rename `make_learncurve_splits` to `make_subsets_from_dataset_df` and have it call `make_index_vectors`
* Revise a couple things in docstring in src/vak/prep/frame_classification/dataset_arrays.py
* Have audio_format default to None in src/vak/prep/frame_classification/dataset_arrays.py and raise ValueError if input_type is audio but audio_format is None
* Fix parameter order of function in src/vak/prep/frame_classification/learncurve.py to match order of dataset_arrays so it's not confusing, and set default of audio_format to None, raise a ValueError if input_type is audio but audio_format is None
* In src/vak/prep/frame_classification/frame_classification.py, call make_subsets_from_dataset_df with correct arguments (now renamed from make_learncurve_splits_from_dataset_df)
* Add src/vak/datasets/frame_classification/helper.py with helper functions that return filenames of indexing vectors for subsets of (training) data
* Import helper in src/vak/datasets/frame_classification/__init__.py
* Use helper functions to load indexing vectors for subsets in classmethod of src/vak/datasets/frame_classification/window_dataset.py
* Use helper functions to load indexing vectors for subsets in classmethod of src/vak/datasets/frame_classification/frames_dataset.py
* Rewrite functions in src/vak/prep/frame_classification/frame_classification.py -- realize I can just use frame npy files to make indexing vectors, so I don't need input type, audio format, etc.
* Fix args to make_index_vectors_for_each_subset and fix how we concatenate dataset_df in src/vak/prep/frame_classification/learncurve.py
* Fix how we use subset in FramesDataset.__init__
* Fix how we use subset in WindowDataset.__init__
* Change word 'split' -> 'subset' in src/vak/learncurve/frame_classification.py
* Fix docstrings in src/vak/datasets/frame_classification/window_dataset.py
* Fix docstrings in src/vak/datasets/frame_classification/frames_dataset.py
* Fix a typo in a docstring in src/vak/datasets/frame_classification/window_dataset.py
* Fix subset parameter of classmethod for ParametricUMAPDataset class; move logic from classmethod into __init__ although I'm not sure this is a good idea
* Rename frame_classification/dataset_arrays.py to frame_classification/make_splits.py and rewrite 'make_npy_paths' as 'make_splits', have it move/copy/create audio or spectrogram files in split dirs, in addition to making npy files, and update the 'audio_path' or 'spect_path' columns with the files in the split dirs
* Remove constants from src/vak/datasets/frame_classification/constants.py that are no longer used for 'frames' files
* Use make_splits function in src/vak/prep/frame_classification/frame_classification.py
* Modify make_dataframe_of_spect_files function in src/vak/prep/spectrogram_dataset/spect_helper.py so it no longer converts mat files into npz files, instead it just finds/collates all the spect files and returns them in the dataframe; any converting is done by frame_classification.make_splits with the output of this function
* Fix typo in list comprehension and add info to docstring in src/vak/prep/frame_classification/make_splits.py
* Fix imports in src/vak/prep/frame_classification/__init__.py after renaming module to 'make_splits'
* Remove other occurrences of 'spect_output_dir' from src/vak/prep/spectrogram_dataset/spect_helper.py; it is no longer a parameter and is not used
* No longer pass 'spect_output_dir' into 'prep_spectrogram_dataset' in src/vak/prep/spectrogram_dataset/prep.py
* Remove unused import in src/vak/prep/spectrogram_dataset/spect_helper.py
* Add logger statement in src/vak/prep/frame_classification/make_splits.py
* Fix src/vak/prep/frame_classification/learncurve.py so functions use either spect or audio to get frames and make indexing vectors
* Fix src/vak/prep/frame_classification/frame_classification.py so we pass needed parameters into make_subsets_from_dataset_df
* Make x_path relative to dataset_path in src/vak/prep/frame_classification/frame_classification.py, since that's what downstream functions/classes expect
* Rename x_path -> source_path in src/vak/prep/frame_classification/make_splits.py
* Rename x_path -> source_path in src/vak/prep/frame_classification/learncurve.py
* Rewrite frame_classification.WindowDataset to load audio/spectrograms directly from 'frame_paths'
* Add FRAMES_PATH_COL_NAME to src/vak/datasets/frame_classification/constants.py
* Rewrite make_splits.py to add frames_path column to dataframe, and have frame_classification models use that column always; this way we keep the original 'audio_path' and 'spect_path' columns as metadata, and avoid if/else logic everywhere in dataset classes
* Fix WindowDataset to use constant to load frame paths column, and to validate input type, revise docstring
* Fix FramesDataset the same way as WindowDataset: load frame paths with constant, load inside __getitem__ with helper function _load_frames, validate input type, fix order of attributes in docstring
* Use self.dataset_path to build frames_path in WindowDataset
* Use self.dataset_path to build frames_path in FramesDataset, and pass into transform as 'frames_path', not 'source_path'
* Rename 'source_path' -> 'frames_path' inside src/vak/transforms/defaults/frame_classification.py
* Rename 'source_path' -> 'frames_path' in FrameClassificationModel methods, in src/vak/models/frame_classification_model.py
* Rename 'source_path' -> 'frames_path' in src/vak/predict/frame_classification.py
* Add SPECT_KEY to common.constants
* Fix how StandardizeSpect.from_dataset_path builds frames_path paths, and use constants.SPECT_KEY when loading from frames path
* Use common.constants.SPECT_KEY inside _load_frames method of WindowDataset
* Use common.constants.SPECT_KEY inside _load_frames method of FramesDataset
* Add newline at end of src/vak/common/constants.py
* Add FRAME_CLASSIFICATION_DATASET_AUDIO_FORMAT to src/vak/datasets/frame_classification/constants.py
* Add function load_frames to src/vak/datasets/frame_classification/helper.py
* Have WindowDataset._load_frames use helper.load_frames
* Have FramesDataset._load_frames use helper.load_frames
* Rename GENERATED_TEST_DATA -> GENERATED_TEST_DATA_ROOT in tests/scripts/vaktestdata/constants.py
* Rename GENERATED_TEST_DATA -> GENERATED_TEST_DATA_ROOT in tests/scripts/vaktestdata/dirs.py
* Add tests/scripts/vaktestdata/spect.py
* Import spect module in tests/scripts/vaktestdata/__init__.py
* Call vaktestdata.spect.prep_spects in prep section of script tests/scripts/generate_data_for_tests.py
* Fix spect_dir_npz fixture in tests/fixtures/spect.py to use directory of just .spect.npz files that is now generated by the generate_test_data script
* Add SPECT_NPZ_EXTENSION to src/vak/common/constants.py
* Use common.SPECT_NPZ_EXTENSION in src/vak/prep/spectrogram_dataset/audio_helper.py
* Fix prep.frame_classification.make_splits to remove any .spect.npz files remaining in dataset_path, that were not moved into splits
* Fix vak.prep.frame_classification.learncurve.make_index_vectors_for_subsets to use frame_paths column instead of 'source' paths (audio_path or spect_path) -- so we are using files that definitely exist and are already assigned to splits
* WIP: Rewriting unit tests in tests/test_prep/test_frame_classification/test_learncurve.py
* WIP: Rewriting unit tests in tests/test_prep/test_frame_classification/test_make_splits.py
* WIP: Add tests/test_datasets/test_frame_classification/test_helper.py
* Rename specific_config -> specific_config_toml_path
* WIP: Rewriting tests/test_prep/test_frame_classification/test_make_splits.py
* Add src/vak/prep/frame_classification/get_or_make_source_files.py
* Add src/vak/prep/frame_classification/assign_samples_to_splits.py
* Rewrite 'prep_frame_classification_dataset' to use helper functions factored out into other modules: get_or_make_source_files and assign_samples_to_splits
* Capitalize in docstring in src/vak/prep/spectrogram_dataset/prep.py
* Add TIMEBINS_KEY to src/vak/common/constants.py
* Finish fixing unit test for vak.prep.frame_classification.make_splits
* Add imports in src/vak/prep/frame_classification/__init__.py
* Revise docstring of src/vak/prep/audio_dataset.py to refer to 'source_files_df'
* Revise docstring of src/vak/prep/spectrogram_dataset/spect_helper.py to refer to 'source_files_df'
* Revise docstring of src/vak/prep/spectrogram_dataset/prep.py to refer to 'source_files_df'
* Revise src/vak/prep/frame_classification/get_or_make_source_files.py to refer to 'source_files_df', in docstring and inside function
* In 'prep_frame_classification_dataset', differentiate between 'source_files_df' and 'dataset_df'
* Delete birdsong-recognition-dataset configs from tests/data_for_tests/configs
* Fix a docstring in noxfile.py
* Remove tests/scripts/vaktestdata/spect.py
* Add model_family field in tests/data_for_tests/configs/configs.json, remove configs for birdsong-recognition-dataset
* Add model_family field to ConfigMetadata dataclass in tests/scripts/vaktestdata/config_metadata.py
* Remove call to vaktestdata.spect.prep_spects() since we are going to call other functions that will make spectrograms
* Change parameters order of frame_classification.get_or_make_source_files, add pre-conditions/validators
* Fix order of args to get_or_make_source_files in src/vak/prep/frame_classification/frame_classification.py
* Add more to docstring of src/vak/prep/frame_classification/get_or_make_source_files.py
* Add 'spect_output_dir' and 'data_dir' fields to tests/data_for_tests/configs/configs.json
* Rewrite ConfigMetadata dataclass, add docstring and converters, add spect_output_dir and data_dir attributes
* Add functions to make more directories in tests/data_for_tests/generated in tests/scripts/vaktestdata/dirs.py
* Import get_or_make_source_files in tests/scripts/vaktestdata/__init__.py
* Add more constants with names of directories to make in tests/data_for_tests/generated in tests/scripts/vaktestdata/constants.py
* Add tests/scripts/vaktestdata/get_or_make_source_files.py
* Add 'spect-output-dir/' to data_dir paths in tests/data_for_tests/configs/configs.json
* Rename tests/scripts/vaktestdata/get_or_make_source_files.py -> tests/scripts/vaktestdata/source_files.py, rewrite function that makes source files + csv files we use with tests
* Fix tests/scripts/vaktestdata/__init__.py to import source_files module, remove import of get_or_make_source_files module that was renamed to source_files
* Import missing module constants and fix order of arguments to prep_spectrogram_dataset in src/vak/prep/frame_classification/get_or_make_source_files.py
* Change 3 configs to have spect_format option set to npz
* Remove import of module spect in tests/scripts/vaktestdata/__init__.py
* Flesh out function in tests/scripts/vaktestdata/source_files.py
* Add log statements in tests/scripts/generate_data_for_tests.py
* Fix typo in src/vak/prep/frame_classification/get_or_make_source_files.py
* Add SPECT_FORMAT_EXT_MAP to src/vak/common/constants.py
* Use vak.common.constants.SPECT_FORMAT_EXT_MAP in src/vak/prep/spectrogram_dataset/prep.py so that we correctly remove source file extension to pair with annotation file
* Fix attributes of ConfigMetadata so we don't convert None to 'None'
* Copy annotation files to spect_output_dir so we can prep from that dir, in tests/scripts/vaktestdata/source_files.py
* Change name of logger in tests/scripts/generate_data_for_tests.py
* Fix attributes in ConfigMetadata so we don't convert strings to bool
* Remove fixtures from tests/fixtures/annot.py after removing corresponding source data
* Fix import in src/vak/prep/frame_classification/__init__.py
* Fix import in src/vak/prep/frame_classification/frame_classification.py
* Add tests/fixtures/source_files with fixtures to get csv files
* Add fixtures that return dataframes directly in tests/fixtures/source_files.py
* Add tests/test_prep/test_frame_classification/test_get_or_make_source_files.py
* Add tests/test_prep/test_frame_classification/test_assign_samples_to_splits.py
* Fix factory functions in tests/fixtures/source_files.py
* Fix assembled path in tests/fixtures/source_files.py
* Fix unit test in tests/test_prep/test_frame_classification/test_make_splits.py to use fixture so it's faster and less verbose
* Remove fixtures that no longer exist from specific_annot_list fixture in tests/fixtures/annot.py
* Remove fixtures for data that doesn't exist in tests/fixtures/audio.py
* Remove birdsong-rec from parametrize in tests/test_cli/test_predict.py
* Remove birdsongrec from parametrize in tests/test_cli/test_prep.py
* Remove birdsongrec from parametrize in tests/test_cli/test_train.py
* Remove birdsongrec and other data no longer in source from parametrizes in tests/test_common/test_annotation.py
* Remove birdsongrec from parametrize in tests/test_predict/test_frame_classification.py
* Remove birdsongrec from parametrize in tests/test_prep/test_frame_classification/test_frame_classification.py
* Remove birdsongrec from parametrize in tests/test_prep/test_prep.py
* Remove birdsongrec from parametrize in tests/test_prep/test_sequence_dataset.py
* Remove birdsongrec from parametrize in tests/test_train/test_frame_classification.py
* Remove birdsongrec from parametrize in tests/test_train/test_train.py
* Remove unit tests from tests/test_common/test_files/test_files.py that test on data removed from source data
* Remove parametrize that uses wav/textgrid data removed from source data
* Fix fixture in tests/fixtures/spect.py
* Actually write unit tests in tests/test_datasets/test_frame_classification/test_helper.py
* Fix prep.frame_classification.make_splits to not convert frame labels npy paths to 'None' when they are None
* Fix assert helper in tests/test_prep/test_frame_classification/test_frame_classification.py
* Remove spect_key and audio_format parameters from functions in src/vak/prep/frame_classification/learncurve.py, no longer used
* Change order of params for make_subsets_from_dataset_df
* Change order of args in call to make_subsets_from_dataset_df inside prep_frame_classification_dataset
* Rename some variables to 'subset_df' in src/vak/prep/frame_classification/learncurve.py and revise docstrings
* Finish adding/fixing unit tests in tests/test_prep/test_frame_classification/test_learncurve.py
* Fix bug in unit test in tests/test_prep/test_frame_classification/test_make_splits.py
* Fix unit tests in tests/test_prep/test_spectrogram_dataset/test_prep.py
* Fix unit test in tests/test_prep/test_spectrogram_dataset/test_spect_helper.py
* Fix unit test in tests/test_transforms/test_transforms.py
* Use torch.testing.assert_close instead of assert_allclose in tests/test_nn/test_loss/test_dice.py
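
Several of the changes above have a `subset` parameter take precedence over `split` when selecting which rows of the dataset dataframe to draw samples from. The sketch below illustrates only that precedence rule; it is not the code in vak. The function name `select_rows`, the column names 'split' and 'subset', the example paths, and the subset name are all assumptions made for illustration, and the actual dataset classes additionally load indexing vectors for subsets.

```python
from typing import Optional

import pandas as pd


def select_rows(
    dataset_df: pd.DataFrame, split: str, subset: Optional[str] = None
) -> pd.DataFrame:
    """Return the rows to draw samples from (hypothetical helper).

    If ``subset`` is given it takes precedence over ``split``;
    otherwise rows are selected by ``split``.
    """
    if subset is not None:
        # Assumed column name 'subset', kept separate from 'split'.
        return dataset_df[dataset_df["subset"] == subset].copy()
    return dataset_df[dataset_df["split"] == split].copy()


# Toy dataframe; paths and the subset name are made up for this example.
dataset_df = pd.DataFrame(
    {
        "frames_path": ["train/a.npy", "train/b.npy", "train/c.npy", "val/d.npy"],
        "split": ["train", "train", "train", "val"],
        "subset": ["train-dur-4.0-replicate-1", "train-dur-4.0-replicate-1", None, None],
    }
)

whole_train = select_rows(dataset_df, split="train")
one_subset = select_rows(dataset_df, split="train", subset="train-dur-4.0-replicate-1")
```

Here `whole_train` contains every row whose split is 'train', while `one_subset` contains only the rows labeled with the named subset, regardless of the `split` argument.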