Releases: DCMLab/dimcat
v3.3.0
3.3.0 (2024-11-18)
Features
- make_histogram() and make_violin_plot() (ec89ce2)
Bug Fixes
- avoids duplicate column when applying stage criterion (66aa952)
- check column directly for the time being (fields are not yet updated automatically) (aa3376c)
- correct inner range index even if its start positions overlap those of the outer start mask (eb45b17)
- stop Dataset.iter_features() after iterating through all features (6035174)
- workaround for avoiding PermissionError on Windows when using tempfile (e5f3823)
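The Windows `PermissionError` workaround listed in these bug fixes concerns temporary files. A common cause on Windows is that a file created by `tempfile.NamedTemporaryFile` with the default `delete=True` cannot be opened a second time by name while the first handle is still open. The sketch below shows the usual workaround (create with `delete=False`, close, reopen, then delete manually). It is not DiMCAT's actual code, and `write_via_tempfile` is a hypothetical helper.

```python
import os
import tempfile


def write_via_tempfile(text: str, suffix: str = ".txt") -> str:
    """Hypothetical helper: write text to a temporary file, reopen it by name, return its content."""
    # delete=False keeps the file on disk after the handle is closed,
    # so it can be reopened by name even on Windows.
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=suffix, delete=False, encoding="utf-8"
    )
    try:
        tmp.write(text)
        tmp.close()  # release the first handle before reopening the path
        with open(tmp.name, encoding="utf-8") as f:
            return f.read()
    finally:
        tmp.close()  # harmless if the file is already closed
        os.remove(tmp.name)  # manual cleanup, since delete=False


content = write_via_tempfile("hello")
```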
Documentation
- adds developer's guide notebook (4fcd554)
- Dataset (7717f16)
- Dataset for developers (b52692d)
- fixes table in CONTRIBUTING.rst (813da3f)
- more work on the developers docs (fed676b)
- moves architecture primer to developers guide (19c7373)
- removes pip install (pre-commit is part of the [docs] optional requirements) (ba2258f)
- several docstrings (a735464)
v3.2.0
3.2.0 (2024-01-30)
Highlights
- new property DimcatResource.metadata, populated by Dataset
- new PrevalenceAnalyzer and its result types:
  - PrevalenceMatrix
  - RelativePrevalenceMatrix
  - CulledPrevalenceMatrix
  - CulledRelativePrevalenceMatrix
  - GroupwisePrevalenceMatrix
- additional plot types:
  - make_line_plot
  - make_scatter_3d_plot
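The prevalence matrix variants listed in these highlights can be illustrated with pandas. This is a minimal sketch under assumptions the release notes do not state: rows are pieces, columns are feature types, a "relative" matrix normalizes each row to sum to 1, and a "culled" matrix drops types that occur in fewer than a given share of pieces (for the culled relative variant, culling happens first and normalizing second). The function names and the toy data are hypothetical and do not mirror DiMCAT's API.

```python
import pandas as pd


def relative_prevalence(matrix: pd.DataFrame) -> pd.DataFrame:
    """Normalize each row (here: one piece) so that its values sum to 1."""
    return matrix.div(matrix.sum(axis=1), axis=0)


def culled_prevalence(matrix: pd.DataFrame, min_share: float) -> pd.DataFrame:
    """Keep only columns (types) that occur in at least `min_share` of the rows."""
    share = (matrix > 0).mean(axis=0)  # fraction of rows in which each type occurs
    return matrix.loc[:, share >= min_share]


# Made-up counts: rows are pieces, columns are chord types.
counts = pd.DataFrame(
    {"I": [4, 2, 6], "V": [2, 2, 0], "vii": [0, 0, 1]},
    index=["piece_a", "piece_b", "piece_c"],
)
relative = relative_prevalence(counts)
culled = culled_prevalence(counts, min_share=0.5)  # drops "vii" (1 of 3 pieces)
culled_relative = relative_prevalence(culled)
```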
Features
- adds basis of PrevalenceAnalyzer and PrevalenceMatrix(Result) (1f5469f)
- adds DimcatResource.join_on_index() (269c12c)
- adds methods .get_culled_matrix() and .get_relative_matrix() to PrevalenceMatrix, together with the relevant subclasses as result types (523418d)
- adds plotting.make_line_plot(), factoring out _make_plots() boilerplate that all plotting functions share (a1ba183)
- adds plotting.make_scatter_3d_plot() (ec13ba8)
- adds property DimcatResource.metadata, which the Dataset populates upon feature extraction but which is not serialized (c5e2eee)
- adds relevant properties and methods to PrevalenceMatrix (328fff6)
- adds utils.str2pd_interval (da24cb9)
- allows friendly comparison for FriendlyEnums, such as "desc" == SortOrder.DESCENDING -> True (03d57f6)
- implements PrevalenceAnalyzer.compute() staticmethod, .__init__(), .groupby_apply() and Schema (25d558b)
- implements PrevalenceMatrix.combine_results() (35fbb39)
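The friendly comparison of FriendlyEnums (`"desc" == SortOrder.DESCENDING -> True`) can be reproduced with the standard library. The sketch below is a hypothetical re-creation, not DiMCAT's implementation: the member values `"asc"` and `"desc"` are assumptions, and making the match case-insensitive is an extra choice made here.

```python
from enum import Enum


class FriendlyEnum(str, Enum):
    """Hypothetical re-creation: members compare equal to matching strings."""

    def __eq__(self, other):
        if isinstance(other, str):
            # Case-insensitive matching is an extra choice made in this sketch.
            return self.value.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        # str provides its own __ne__, so keep it consistent with __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ removes the inherited __hash__, so restore one. Note that
    # strings differing only in case compare equal but may hash differently.
    def __hash__(self):
        return hash(self.value)


class SortOrder(FriendlyEnum):
    ASCENDING = "asc"
    DESCENDING = "desc"
```

Because the enum subclasses `str` and overrides `__eq__`, Python tries the member's `__eq__` first even when the plain string is on the left-hand side.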
Bug Fixes
- catches pandas 2.2.0 warning(s) (88e86bd)
- enables DimcatResource.from_resource_path() by expecting a "corpus" and a "piece" column (6c70e61)
- import Self from typing_extensions (not typing) to maintain Python 3.10 compatibility (a104358)
- infer_schema_from_df() can now deal with a column MultiIndex that involves integer values (a43fdf7)
- plotting functions allow for a single string as argument to 'hover_data' (57230df)
Documentation
- docstring for DimcatConfig (fe35049)
v3.1.0
3.1.0 (2024-01-16)
Features
- adds analyzers.PhraseDataAnalyzer() which takes features.PhraseAnnotations and produces results.PhraseData (a4a7dd5)
- adds basic HarmonyLabelSlicer (a9b48a8)
- adds convenience module dimcat.enums for easily importing any enum from DiMCAT (f626f90)
- adds DimcatResource.store_resource() (57a12f5)
- adds helper functions to resources.utils (b6f5cd2)
- adds Metadata.get_corpus_names() to retrieve the names in chronological order (a5c3988)
- adds methods .get_steps() and .get_last_step() to Pipeline and to Dataset (61af067)
- adds plotting.make_box_plot() (313fe12)
- adds resources.utils.transpose_notes_to_c() (7f37551)
- adds SmallestUnit.CORPUS_GROUP member for completeness and streamlines .get_grouping_levels() methods (2de9524)
- enhances DimcatConfig.matches() with 'variant' and 'covariant' arguments; adds base.make_config_from_specs() to mirror base.make_object_from_specs() (f097ed5)
- enhances utility functions and adds Resources.get_resource_name() (9c59c6b)
- first version of make_phrase_selection_masks() (72f03fd)
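To illustrate what transposing notes to C involves, here is a sketch assuming that notes are encoded as tonal pitch classes on the line of fifths (C = 0, G = 1, F = -1) and that the tonic is given in the same encoding. Under that assumption, transposition reduces to subtracting the tonic's position. `transpose_to_c()` and `tpc2name()` are hypothetical helpers and do not reproduce the signature of `resources.utils.transpose_notes_to_c()`.

```python
from typing import List, Sequence

# One diatonic cycle on the line of fifths, starting from F (= -1).
FIFTHS_LETTERS = "FCGDAEB"


def tpc2name(tpc: int) -> str:
    """Convert a line-of-fifths position (C = 0) to a name such as 'F#' or 'Bb'."""
    shifted = tpc + 1  # shift so that F is 0
    letter = FIFTHS_LETTERS[shifted % 7]
    accidentals = shifted // 7  # positive: sharps, negative: flats
    return letter + ("#" * accidentals if accidentals > 0 else "b" * -accidentals)


def transpose_to_c(tpcs: Sequence[int], tonic_tpc: int) -> List[int]:
    """Express each tonal pitch class relative to the tonic, i.e. as if the tonic were C."""
    return [tpc - tonic_tpc for tpc in tpcs]


# D, F#, A, C# in D major (tonic D = 2)
melody = [2, 6, 3, 7]
in_c = transpose_to_c(melody, tonic_tpc=2)
names = [tpc2name(tpc) for tpc in in_c]
```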
Bug Fixes
- adds 'ignore_exceptions' argument to Dataset.extract_feature(), which defaults to True (remedies the previous behaviour of adding unprocessed features to the Dataset when exceptions occurred) (28e91fd)
Documentation
- updates notebooks submodule (12b4818)
v3.0.0
3.0.0 (2023-12-13)
⚠ BREAKING CHANGES
- eliminates .apply_steps() in favour of a single .apply_step(*step), that is, with a variadic argument. For backward compatibility, the method still accepts a single list or tuple
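The variadic calling convention described for `.apply_step(*step)` can be sketched with a hypothetical normalizer that accepts steps as separate positional arguments or, for backward compatibility, as a single list or tuple. `normalize_steps` is not part of DiMCAT.

```python
from typing import Any, Tuple


def normalize_steps(*step: Any) -> Tuple[Any, ...]:
    """Hypothetical helper: collect steps passed variadically or as one list/tuple."""
    # Backward compatibility: a single list or tuple is unpacked into its items.
    if len(step) == 1 and isinstance(step[0], (list, tuple)):
        return tuple(step[0])
    return step
```

One consequence of this design is that a single step which is itself a list or tuple would be unpacked, so steps are assumed not to be sequences.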
Features
- adds four additional columns to HarmonyLabels and BassNotes which contain the (main) chord tones expressed as scale degrees (396dce9)
- adds Result.compute_entropy() and Transitions.compute_information_gain() (c1257a8)
- AdjacencyGroupSlicers now process the required_feature during .fit_to_dataset(), store it as property .slice_metadata and join it onto any processed Metadata object. In the future, there could also be a mode where this metadata is joined onto any processed feature. (cea586e)
- empowers NgramTable to make_bi/ngram_tables and NgramTuples with components made up from different columns and with individual join_str and fillna settings (c8488cf)
- enables adding context_columns for the NgramTable's methods .get_bi/ngram_tuples() and get_bi/ngram_table(). The NgramAnalyzer therefore adds the relevant column names in post-processing. (fe7ee3a)
- enables applying Slicers to Metadata by joining them on the SliceIntervals (DimcatIndex) (a4c3929)
- enables dropping ngram rows which include/correspond to terminals (89a2552)
- enables the detailed control of terminals which may differ for different n-gram components (except the first one). (f6a807f)
- HarmonyLabels and BassNotes features now come with an intervals_over_bass and (for the former) with an intervals_over_root column (be8d06d)
- includes "root" as auxiliary column for BassNotes (3f7bd35)
- makes the 'data' argument to PipelineStep.process() a variadic one, too (concordant with .apply_step()), while still accepting a single argument that can be a list or tuple (7a37aaa)
- Metadata.get_composition_years() now with 'group_cols' parameter to compute composition year means of groups (e.g. corpora) (fef9860)
- methods .make_ngram_table() and .make_bigram_table() of NgramTable now actually return a new NgramTable, whereas the previous functions of that name (which returned dataframes) have been renamed to .make_bigram_df() and .make_ngram_df(). (8dbff20)
- NgramTable gets the convenience method .compute_information_gain() to skip an intermediate call to .get_transitions() (5b37414)
- NgramTable._get_transitions() is cached and now complete with the terminal_symbols argument (bd12568)
- reduces the amount of parentheses in n-grams by not turning 'single' components (with only one column) into tuples (02f91d4)
- streamlines turning n-grams into strings and allows for doing it recursively (useful when columns making up n-gram components contain tuples themselves) (745df2e)
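The entropy and information gain features can be illustrated in plain Python. This sketch assumes that information gain means the reduction in entropy of the next symbol once the previous one is known, H(next) - H(next | previous); the release notes do not define it, so DiMCAT's formula may differ. It also pads sequences with a hypothetical terminal symbol `"END"` and optionally drops bigrams containing it, loosely mirroring the terminal handling described in these features. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Bigram = Tuple[Hashable, Hashable]


def make_bigrams(
    seq: Sequence[Hashable], terminal: Hashable = "END", drop_terminals: bool = False
) -> List[Bigram]:
    """Pair each element with its successor; the last element is followed by `terminal`."""
    padded = list(seq) + [terminal]
    bigrams = list(zip(padded[:-1], padded[1:]))
    if drop_terminals:
        bigrams = [bigram for bigram in bigrams if terminal not in bigram]
    return bigrams


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy in bits of the distribution given by raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(bigrams: Sequence[Bigram]) -> float:
    """Assumed definition: H(next) - H(next | previous), in bits."""
    h_next = entropy(Counter(nxt for _, nxt in bigrams).values())
    successors: Dict[Hashable, Counter] = {}
    for prev, nxt in bigrams:
        successors.setdefault(prev, Counter())[nxt] += 1
    n = len(bigrams)
    # Conditional entropy: entropy of each successor distribution, weighted by frequency.
    h_conditional = sum(
        sum(counter.values()) / n * entropy(counter.values())
        for counter in successors.values()
    )
    return h_next - h_conditional


progression = ["I", "V", "I", "V", "I"]
with_terminal = make_bigrams(progression)
without_terminal = make_bigrams(progression, drop_terminals=True)
gain = information_gain(without_terminal)
```

In the toy progression each chord fully determines its successor once the terminal is dropped, so knowing the previous chord removes the whole bit of uncertainty about the next one.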
Bug Fixes
- adapts scipy.stats.entropy() to fix bug caused by pd.Float64Dtype (4938170)
- allow DimcatResource.filter_index_level() to just drop the level without filtering rows (5c07d97)
- applying a Grouper needs to be an inner join. Also, the index levels should come in systematic order, first the grouper levels, then the remaining ones (8f80fc2)
- enables (de-)serialization for Filter objects (976c179)
- fills up missing 'quarterbeats_all_endings' column for older parts of the dataset (390e0a5)
- Groupers that use metadata now should use Dataset.get_metadata(raw=True) (5d35b20)
- grouping by a single level that contains tuples resulted in several levels in the resulting MultiIndex; this fix is applied for completeness before the whole function gets simplified (0ed6091)
- NgramTable.get_default_analysis() returns Transitions (b90f0ae)
- omit duplicate computation of 'proportions' by Transitions._sort_combined_result() (2f09bbb)
- raise NotImplementedError when trying to use convenience methods directly on Transitions object (7cf61c4)
- re-inserts missing import (02bd96c)
- singular ngram_components should also become strings (even if they are not joined on 'join_str') (3987162)
- when an index level is dropped, make sure to remove it from the default_groupby (260f8f1)
- when applying a Filter with drop_level=True, do not turn a Dataset into a GroupedDataset (as per virtue of the respective parent Grouper) (937002c)
- when Counter is used with smallest_unit=GROUP, it recurs to self.compute() (737b6a6)
Reverts
- eliminates .apply_steps() in favour of a single .apply_step(*step), that is, with a variadic argument. For backward compatibility, the method still accepts a single list or tuple (fab8e13)
v2.3.0
2.3.0 (2023-12-09)
Features
- all schemas retrieved via the .schema or .pickled_schema property allow for loading dicts without 'dtype' key by assuming their own dtype as default (9ff060e)
- new category of objects: Filters. They extend any Grouper by adding the init args 'keep_values', 'drop_values', and 'drop_level' to it. They use these arguments to post-process any resource first processed by the corresponding grouper. This required renaming the relatively new HasCadenceAnnotations and HasHarmonyLabels to HasCadenceAnnotationsGrouper and HasHarmonyLabelsGrouper, to differentiate them from the new HasCadenceAnnotationsFilter and HasHarmonyLabelsFilter. The other two filters that have been implemented so far are the CorpusFilter and the PieceFilter. As an aside, Groupers do not complain anymore when they are applied to a resource that has already been grouped by a Grouper of the same type. If the grouping level exists but isn't the first one, it is systematically made the first one. This applies, by extension, to the Filters (for now) (ec3d1f7)
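The Filter mechanism described here (a Grouper plus `keep_values`, `drop_values`, and `drop_level`) can be approximated on a pandas MultiIndex. The sketch assumes that the grouping level already exists in the index after the Grouper has run; `filter_grouped()` and the level name `has_cadences` are hypothetical.

```python
from typing import Hashable, Iterable, Optional

import numpy as np
import pandas as pd


def filter_grouped(
    df: pd.DataFrame,
    level: str,
    keep_values: Optional[Iterable[Hashable]] = None,
    drop_values: Optional[Iterable[Hashable]] = None,
    drop_level: bool = False,
) -> pd.DataFrame:
    """Post-process a grouped frame by keeping/dropping rows based on one index level."""
    values = df.index.get_level_values(level)
    mask = np.ones(len(df), dtype=bool)
    if keep_values is not None:
        mask &= values.isin(list(keep_values))
    if drop_values is not None:
        mask &= ~values.isin(list(drop_values))
    result = df[mask]
    if drop_level:
        # The grouping level carries no information once only one value remains.
        result = result.droplevel(level)
    return result


index = pd.MultiIndex.from_tuples(
    [(True, "op1"), (False, "op2"), (True, "op3")], names=["has_cadences", "piece"]
)
df = pd.DataFrame({"count": [5, 3, 7]}, index=index)
kept = filter_grouped(df, "has_cadences", keep_values=[True], drop_level=True)
```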
Bug Fixes
- adapts NgramAnalyzer's init args & schema (3e51f97)
- align_with_grouping() did not work for NgramTables because pandas prevents merge with diverging column nlevels, even if one of the sides has no columns (e51625f)
- allows passing a list of list (instead of a list of tuples) to DimcatIndex.from_tuples(), useful for de-serializing from JSON (0cff3c1)
- extends app_tests.test_analyze() to the actual plotting; warns about non-Analyzer PipelineSteps applied after an Analyzer (72ef210)
- ensures facet titles are strings (ed185f7)
- improves (de-)serialization of DimcatIndex and, by extension, the MappingGroupers' 'grouped_units' field (f59673d)
- parses music21.key.KeySignature the same way as music21.key.Key (5aa7902)
- the frictionless workaround for copying a resource with no path specified is now complete (5d1426d)
- the frictionless workaround for copying a resource with no path specified is now complete (98ee01d)
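The `DimcatIndex.from_tuples()` fix matters because JSON has no tuple type: tuples are serialized as arrays and come back as lists. A minimal round trip with pandas, converting each list back to a tuple before rebuilding the index, could look like this. `index_from_json()` is a hypothetical helper, not DiMCAT's method.

```python
import json
from typing import Sequence

import pandas as pd


def index_from_json(text: str, names: Sequence[str]) -> pd.MultiIndex:
    """Rebuild a MultiIndex from a JSON array of arrays (lists, not tuples)."""
    loaded = json.loads(text)
    return pd.MultiIndex.from_tuples([tuple(item) for item in loaded], names=names)


original = pd.MultiIndex.from_tuples(
    [("corpus_a", "piece_1"), ("corpus_b", "piece_2")], names=["corpus", "piece"]
)
serialized = json.dumps(list(original))  # tuples become JSON arrays
restored = index_from_json(serialized, names=list(original.names))
```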
v2.2.0
2.2.0 (2023-12-07)
Features
- adds HasHarmonyLabels grouper (4fa92de)
- enables .get_feature("metadata") for Dataset and DimcatPackage which, in return, enables Dataset.get_metadata(raw=False) (default), i.e. returning a processed Metadata feature (old behaviour, i.e. without processing, via Dataset.get_metadata(raw=True)) (731c4d1)
Bug Fixes
v2.1.0
2.1.0 (2023-12-07)
Features
- adds 'dimension_column' as argument for all Analyzers; enables default_analyzer for Metadata (d192a07)
- adds convenience module dimcat.enums for easily importing any enum from DiMCAT (7cd3a3f)
- adds PieceGrouper (55c6d54)
- enables .make_ranking_table() for NgramTable (convenience for calling .make_ngram_tuples() first) (5a788d5)
- enables group_cols and group_modes for bubble_plots, too (0d8ba17)
- includes the UnitOfAnalysis enum as 'group_cols' argument for Result's methods (efd7fdb)
- introduces new HarmonyLabelsFormat "ROMAN_REDUCED" (0a08952)
- NgramTable.get_transitions() returns new result type Transitions (0723fe1)
- NgramTable.make_ngram_tuples() now actually returns tuples, not tables (which are retrieved via .make_ngram_table()). They come as a new Result type, NgramTuples, which also allows for .make_ranking_table() (3746437)
- NgramTable() uses the new Transitions for both .plot() and .plot_grouped() (b12b130)
- PieceGrouper and CorpusGrouper move the respective index level to level 0 (1b7f436)
- Transitions result type plots methods return Plotly heatmaps (3df7880)
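Moving a grouping level to position 0, as PieceGrouper and CorpusGrouper now do, corresponds in pandas to reordering the index levels. A sketch with a hypothetical helper:

```python
import pandas as pd


def move_level_to_front(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Return a frame whose row index has `level` as its first level; other levels keep their order."""
    order = [level] + [name for name in df.index.names if name != level]
    return df.reorder_levels(order)


index = pd.MultiIndex.from_tuples(
    [("corpus_a", "piece_1", 0), ("corpus_a", "piece_2", 0)],
    names=["corpus", "piece", "i"],
)
df = pd.DataFrame({"count": [3, 5]}, index=index)
reordered = move_level_to_front(df, "piece")
```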
Bug Fixes
- base.resolve_object_spec() needs to check if config first, then if DimcatObject (8c8c4d2)
- do not convert "count" column to "Int64" by default (because of Plotly bug); instead convert integer columns when making ranking tables to prevent counts coming as floats (59bd92a)
- Pipeline calls step.process_resource() instead of ._process_resource() because otherwise the call to .check_resource() is skipped (024bf65)
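The Pipeline fix follows a template-method pattern: the public `process_resource()` validates via `check_resource()` before delegating to the private `_process_resource()`, so calling the private method directly skips validation. The toy classes below illustrate the pattern; their names and the resource representation (plain dicts with a `"type"` key) are hypothetical.

```python
from typing import Any, Dict

Resource = Dict[str, Any]  # toy stand-in for a resource


class ToyStep:
    """Toy pipeline step that only accepts resources of certain types."""

    allowed_types = ("Notes",)

    def check_resource(self, resource: Resource) -> None:
        if resource.get("type") not in self.allowed_types:
            raise ValueError(
                f"{type(self).__name__} cannot process {resource.get('type')!r}"
            )

    def process_resource(self, resource: Resource) -> Resource:
        # Public entry point: validate first, then delegate.
        self.check_resource(resource)
        return self._process_resource(resource)

    def _process_resource(self, resource: Resource) -> Resource:
        # Implementation only; calling this directly bypasses check_resource().
        return {**resource, "processed_by": type(self).__name__}


class ToyPipeline:
    def __init__(self, *steps: ToyStep) -> None:
        self.steps = steps

    def process_resource(self, resource: Resource) -> Resource:
        for step in self.steps:
            # Calling the public method keeps validation in place.
            resource = step.process_resource(resource)
        return resource
```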
Documentation
v2.0.0
2.0.0 (2023-11-27)
⚠ BREAKING CHANGES
- the Enums HarmonyLabelsFormat and NotesFormat are losing the formats that are currently not implemented
- renaming of arguments and properties: grouped_pieces => grouped_units; piece_groups => grouping. This enables future subclasses of MappingGrouper to flexibly group both pieces and slices, depending on the 'smallest_unit'
Features
- adds CadenceFormat, CadenceCounter() analyzer and CadenceCounts() result type where .plot_grouped() defaults to .make_pie_chart() (2d56ce1)
- adds CriterionGrouper() and its first subclass HasCadenceAnnotations() (94290f2)
- adds method DimcatIndex.filter() (ff8deba)
- adds new feature CadenceLabels and lets all Feature._format_dataframe() end on the new ._sort_columns() (5873550)
- adds property Dataset.extractable_features (961532d)
- adds Result.make_ranking_table() and allows Results.combine_results() to re-combine if the groups are a subset of the columns (4d0f313)
- adds two new methods to Package, .get_resource() and .replace_resource() (8f74b10)
- both Dataset and DimcatResource now have a method .apply_steps() that creates a Pipeline and applies it, and .apply_step() that creates a step directly, without turning it into a Pipeline first. DimcatResources now get methods .make_bar_plot(), .make_bubble_plot(), and .make_pie_chart(), which follow the same principle as .plot() and .plot_grouped(): they apply the specified PipelineSteps or the default Analyzer and call the respective method on the result. (eef8e1b)
- enables Result.make_pie_chart() (e156d45)
- enables two new BassNotesFormat values: SCALE_DEGREE_MAJOR and SCALE_DEGREE_MINOR (d52b8b0)
- factors out MappingGrouper() from CustomPieceGrouper() (4a1ebf5)
- features pass their format arg to super().__init__(); removes unused format values (to be extended later) and properly integrates NotesFormat and HarmonyLabelsFormat. Also, the Notes feature no longer ignores the arguments 'merge_ties' and 'weight_grace_notes' but acts on them. (d3daaef)
- introduces DimcatResource.from_resource_and_dataframe() to copy properties of existing resource but detach the new resource and set a new dataframe (usually a transformation of the previous one) (4ad7a6b)
- introduces font_size argument to all plotting functions for convenience [saves one from typing, e.g., layout=dict(font=dict(size=30))] (ecb7817)
- makes all enum values specified in DimcatConfigs case-insensitive by subclassing Marshmallow's enum field and having 'by_value' default to True. Likewise, get_class() now accepts dtypes in case-insensitive manner (45cb9d0)
- moves CADENCE_COLORS to dimcat.plotting (cce4f38)
- Result.combine_results() now returns a new Result object. The creation of the combined dataframe has been moved to ._combine_results(), which is used by the plotting methods (8251b7a)
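Case-insensitive enum values, which these features obtain for DimcatConfigs by subclassing Marshmallow's enum field, can also be achieved with the standard library's `Enum._missing_` hook. This sketch swaps in that technique instead of Marshmallow and uses a hypothetical `ExampleFormat` enum.

```python
from enum import Enum


class CaseInsensitiveEnum(str, Enum):
    """Enum whose members can be looked up by value regardless of case."""

    @classmethod
    def _missing_(cls, value):
        # Called by Enum only after the exact-value lookup has failed.
        if isinstance(value, str):
            lowered = value.lower()
            for member in cls:
                if member.value.lower() == lowered:
                    return member
        return None  # lets Enum raise its usual ValueError


class ExampleFormat(CaseInsensitiveEnum):
    NAME = "NAME"
    FIFTHS = "FIFTHS"
```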
Bug Fixes
- .plot_grouped() shows bar plot when no grouper has been applied (and no grouping level has been requested) or a bubble plot in all other cases. This includes adding the arguments df, x_col, and y_col to Result.make_bubble_plot(). (b37d8fd)
- adapts mwd notebook to show grouped plots (b438d33)
- adds missing column names (4252ad9)
- allows any FeatureSpecs for the 'features' argument of the FeatureProcessingStep (i.e., in its Marshmallow schema) (59a5c71)
- avoids duplicating convenience columns (d91a0d5)
- colorlover is not an optional dependency anymore (f8b93a0)
- DcmlAnnotations was missing "chord" column (669be4b)
- DimcatResource._extract_feature() makes use of all config options (2eda6bf)
- includes DimcatResource._drop_rows_with_missing_values() (2569c97)
- pass x_col and y_col to plotting.update_figure_layout() and set any axis called 'piece' to type "categorical" to avoid automatic conversion to dates (a54b66e)
- removes column name in faceted plots (68bd7a7)
- replaces segmenting approach of CadenceLabels() (incl. label-to-label durations) with bare occurrences (c326bf2)
- ResourceTransformation() should not copy the resource's "descriptor_filename" because an existing descriptor might not apply anymore (4982b4d)
- superclass FeatureProcessingStep should not reject any types if _allowed_features is None (26a506e)
- type annotations for .get_metadata() (ce1bdfb)
- when copying or transforming, get kwargs from existing DimcatResource and pass them to the respective constructor (e1c0cda)
Documentation
v1.1.0
1.1.0 (2023-11-25)
Features
- adds ClassVar DimcatResource._default_formatted_column and property DimcatResource.formatted_column to allow producing Results (and plots) with both the original and the formatted values. The properties formatted_column and value_column cannot be set directly (anymore). The former is to be controlled by the parameter 'format', which all Features now accept and serialize, too, whereas 'value_column' will probably remain immutable. (fc64c36)
- eliminates results.LineOfFifthsDistribution() by merging the line-of-fifths plotting functionality into Result(), which decides based on the analyzed_resource's 'format' property whether it '.uses_line_of_fifths_colors' or not and, whenever necessary, removes GroupMode.COLOR from the 'group_modes'. This also gets rid of the special result types PitchClassDurations() and ScaleDegreeDurations() and of the special analyzer proportions.ScaleDegreeVectors() (5390ae7)
- homogenizes code between dimcat.plotting.make_lof_bubble_plot() and make_lof_bar_plot(), making the latter accept an 'x_names_col' argument, too (ce8ae53)
- introduces TypeVar R for DimcatResource (14e0615)
- Results are now initialized (and serialized) with parameters value_column, dimension_column (required) and formatted_column (optional). This allows, for example, for using the values to organize markers along the x-axis (e.g. numerically) while formatted_column may determine how the values are displayed. The dimension column comes from the new ClassVar Analyzer._dimension_column_name (136833b)
Bug Fixes
- extends the functions that add convenience columns so that they first check whether the columns are already present (e.g. because the Feature is being created during a FeatureTransformation) (a3e53a4)
- tighter checks in Analyzer.check_resource() (fa6575b)
Documentation
- includes v1.0.0 retrospectively (479baee)
v1.0.1
1.0.1 (2023-11-24)
Bug Fixes
- has store_as_json_or_yaml() create target directories automatically (e5fcf4f)
- includes the key columns in BassNotes feature (864edc1)
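Creating target directories automatically before writing a file typically amounts to `Path.parent.mkdir(parents=True, exist_ok=True)`. The sketch below writes JSON only (YAML would require a third-party package), and `store_as_json()` is a hypothetical stand-in for `store_as_json_or_yaml()`, not its actual implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


def store_as_json(obj: Any, filepath: Union[str, Path]) -> Path:
    """Hypothetical helper: serialize `obj` to JSON, creating missing parent directories."""
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)  # no error if they already exist
    with path.open("w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2)
    return path


# Usage: the nested directories do not exist beforehand.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = store_as_json({"a": 1}, Path(tmp_dir) / "nested" / "deeper" / "out.json")
    stored = json.loads(target.read_text(encoding="utf-8"))
```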
Documentation
- configures release-please-action to use docs/CHANGELOG.md and converts the previous /CHANGELOG.rst (d9ef418)
- enables the inclusion of markdown documents in the documentation and enables MyST extensions (86c19a9)
- enables unittest_metacorpus submodule for RTD (15d3317)
- includes jupyter_sphinx Sphinx extension for rendering interactive Plotly figures (d363468)
- moves changelog, authors, and license under the top-level heading "Imprint" (7032162)
- updates docs requirements to the latest release of dimcat[docs] (dc04b76)