Releases: google-deepmind/clrs

v2.0.0 CLRS-Text Algorithmic Reasoning Language Benchmark

18 Jul 19:03

Main Changes

Added the CLRS-Text Algorithmic Reasoning Language Benchmark to the codebase

CLRS-Text is a textual version of the traces generated by thirty algorithms selected from the third edition of the standard "Introduction to Algorithms" textbook by Cormen, Leiserson, Rivest, and Stein. It serves to consolidate and unify previous lines of research in this direction and offers a robust test bed for evaluating language models' out-of-distribution reasoning capabilities.

CLRS-Text is compatible with Hugging Face and can generate data in many formats, such as JSON.

For more details, refer to the README.md.
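As a rough illustration of the Hugging Face integration, the sketch below wraps the new text generator in a `datasets.Dataset`. The module path, generator name (`clrs_gen`), and keyword arguments are assumptions inferred from the changelist below (#135, #144); the README documents the exact interface.

```python
# A minimal sketch of wrapping CLRS-Text in a Hugging Face dataset.
# ASSUMPTION: the generator entry point `clrs_gen` and its kwargs follow
# the clrs_text module added in #135/#144; check the README for the
# exact interface.
import datasets
from clrs._src.clrs_text import huggingface_generators

# Sample insertion_sort traces at two input lengths.
algos_and_lengths = {'insertion_sort': [16, 32]}

ds = datasets.Dataset.from_generator(
    huggingface_generators.clrs_gen,
    gen_kwargs={
        'algos_and_lengths': algos_and_lengths,
        'num_samples': 100,  # total question/answer pairs to draw
    },
)
print(ds[0])  # one algorithm trace rendered as text fields
```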

Base CLRS script update
Revamped the base script, incorporating the modelling and data improvements from the paper "A Generalist Neural Algorithmic Learner".
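For context, a minimal sketch of the dataset pipeline the revamped script builds on, using the create_dataset API described in the README; the folder path, algorithm, and batch size here are illustrative choices, not prescribed values.

```python
# A minimal sketch of loading CLRS-30 data, assuming the create_dataset
# API from the README; folder, algorithm, and batch size are illustrative.
import clrs

train_ds, num_samples, spec = clrs.create_dataset(
    folder='/tmp/CLRS30',   # where the downloaded dataset lives
    algorithm='bfs',
    split='train',
    batch_size=32)

for feedback in train_ds.as_numpy_iterator():
  # Each `feedback` batch bundles inputs and per-step hints
  # (feedback.features) with the expected outputs (feedback.outputs).
  break
```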

What's Changed

  • Add ml_collections to requirements.txt. by @copybara-service in #139
  • Added script to generate json for all algorithms. by @copybara-service in #140
  • Adding huggingface generators for clrs text by @mcleish7 in #135
  • Add clrs_text to init.py by @copybara-service in #144
  • Roll forward PR #104 by @RerRayne in #146
  • Add CLRS-Text details in the README files. by @copybara-service in #148

New Contributors

  • @mcleish7 made their first contribution in #135
  • @RerRayne made their first contribution in #146

Full Changelog: v1.0.0...v2.0.0

CLRS 1.0.0

01 Jun 16:07

Main changes

  • Extended the benchmark from 21 to 30 tasks by adding the following (a sampling sketch follows these lists):
    • Activity selection (Gavril, 1972)
    • Longest common subsequence
    • Articulation points
    • Bridges
    • Kosaraju's strongly connected components algorithm (Aho et al., 1974)
    • Kruskal's minimum spanning tree algorithm (Kruskal, 1956)
    • Segment intersection
    • Graham scan convex hull algorithm (Graham, 1972)
    • Jarvis' march convex hull algorithm (Jarvis, 1973)
  • Added new baseline processors:
    • Deep Sets (Zaheer et al., NIPS 2017) and Pointer Graph Networks (Veličković et al., NeurIPS 2020) as particularisations of the existing Message-Passing Neural Network processor.
    • End-to-End Memory Networks (Sukhbaatar et al., NIPS 2015)
    • Graph Attention Networks v2 (Brody et al., ICLR 2022)
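As a quick sanity check on the new tasks, the sketch below draws samples from one of them via the public sampler API from the README. The task name 'graham_scan' is an assumption; consult the algorithm specs for the canonical naming.

```python
# A minimal sketch of sampling a newly added task via the sampler API.
# ASSUMPTION: 'graham_scan' matches the name used in the algorithm specs.
import clrs

sampler, spec = clrs.build_sampler(
    name='graham_scan',  # convex hull task added in this release
    num_samples=100,
    length=16)           # problem size (number of input points)

feedback = sampler.next(batch_size=32)
# feedback.features carries inputs and per-step hints; feedback.outputs
# carries the target hull, ready to feed a baseline model.
```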

Detailed changes

  • Add PyPI installation instructions. by @copybara-service in #6
  • Fix README typo. by @copybara-service in #7
  • Expose Sampler base class in public API. by @copybara-service in #8
  • Add dataset reader. by @copybara-service in #12
  • Patch imbalanced samplers for DFS-based algorithms. by @copybara-service in #15
  • Disk-based samplers for convex hull algorithms. by @copybara-service in #16
  • Avoid dividing by zero in F_1 score computation. by @copybara-service in #18
  • Sparsify the graphs generated for Kruskal. by @copybara-service in #20
  • Option to add an LSTM after the processor. by @copybara-service in #19
  • Include dataset class and creation using tensorflow_datasets format. by @copybara-service in #23
  • Change types of DataPoint and DataPoint members. by @copybara-service in #22
  • Remove unnecessary data loading procedures. by @copybara-service in #24
  • Modify example to run with the tf.data.Datasets dataset. by @copybara-service in #25
  • Expose processors in CLRS by @copybara-service in #21
  • Update CLRS-21 to CLRS-30. by @copybara-service in #26
  • Update README with new algorithms. by @copybara-service in #27
  • Add dropout to example. by @copybara-service in #28
  • Make example download dataset. by @copybara-service in #30
  • Force full dataset pipeline to be on the CPU. by @copybara-service in #31
  • Set default dropout to 0.0 for now. by @copybara-service in #32
  • Added support for GATv2 and masked GATs. by @copybara-service in #33
  • Pad memory in MemNets and disable embeddings. by @copybara-service in #34
  • baselines.py refactoring (2/N) by @copybara-service in #36
  • baselines.py refactoring (3/N). by @copybara-service in #38
  • Update readme. by @copybara-service in #37
  • Generate more samples in tasks where the number of signals is small. by @copybara-service in #40
  • Fix MemNet embeddings by @copybara-service in #41
  • Supporting multiple attention heads in GAT and GATv2. by @copybara-service in #42
  • Use GATv2 + add option to use different number of heads. by @copybara-service in #43
  • Fix GAT processors. by @copybara-service in #44
  • Fix samplers_test by @copybara-service in #47
  • Update requirements.txt by @copybara-service in #45
  • Fix bug in hint loss for CATEGORICAL type, where the number of unmasked datapoints (jnp.sum(unmasked_data)) was computed over the whole time sequence instead of the pertinent time slice; see the masking sketch after this list. by @copybara-service in #53
  • Use internal rng for batch selection. Makes batch sampling deterministic given seed. by @copybara-service in #49
  • baselines.py refactoring (6/N) by @copybara-service in #52
  • Time-chunked datasets. by @copybara-service in #48
  • Fix potential bug in edge diff decoding. by @copybara-service in #54
  • Losses for chunked data. by @copybara-service in #55
  • Fix hint losses, mostly for decode_diffs=True. Previously, only one of the terms of the MASK-type loss was masked by gt_diff, and the loss was averaged over all time steps, including steps without diffs, which therefore contributed 0 to the loss. Now we average only over the non-zero-diff steps. by @copybara-service in #57
  • Adapt baseline model to process multiple algorithms with a single processor. by @copybara-service in #59
  • Explicitly denote a hint learning mode, to delimit the tasks of interest to CLRS. by @copybara-service in #60
  • Give names to encoder and decoder params. This facilitates analysis, especially in multi-algorithm training. by @copybara-service in #63
  • Symmetrise the weights of sampled weighted undirected Erdős-Rényi graphs. by @copybara-service in #62
  • Fix dataset size for augmented validation + test sets. by @copybara-service in #65
  • Fix bug when hint mode is 'none': the multi-algorithm version needs something in the list of diff decoders. by @copybara-service in #66
  • Change requirements to a fixed tensorflow datasets nightly build. by @copybara-service in #68
  • Patch KMP algorithm to incorporate the "reset" node. by @copybara-service in #69
  • Allow for multiple-batch evaluation in example run script. by @copybara-service in #70
  • Fix bug in SearchSampler: arrays should be sorted. by @copybara-service in #71
  • Record separate hint eval scores for analysis. by @copybara-service in #72
  • Symmetrised edges for PGN. by @copybara-service in #73
  • Option for noise in teacher forcing by @copybara-service in #74
  • Regularize PGN_MASK losses by predicting min_value-1 at missing edges instead of -10^5 by @copybara-service in #75
  • Make encoded_decoded_nodiff default mode, and add flag to control teacher forcing noise. by @copybara-service in #76
  • Detailed evaluation of hints in verbose mode. by @copybara-service in #79
  • Pass processor factory instead of processor string when creating model. This makes it easier to add new processors as processor parameters don't need to be passed down to model and net. by @copybara-service in #81
  • Update README. by @copybara-service in #82
  • Use large negative number instead of 0 to discard non-connected edges for max aggregation in PGN processor. by @copybara-service in #83
  • Add tensorflow requirement. by @copybara-service in #84
  • Change deprecated tree_multimap to tree_map by @copybara-service in #85
  • Increase version number for PyPI release. by @copybara-service in #87
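Several of the entries above change how hint losses are masked and averaged (#53, #57). As a rough illustration, not the library's actual loss code, the sketch below shows the shape of the #53 fix: the normaliser counts unmasked datapoints in the pertinent time slice only, rather than across the whole trajectory. All names and shapes are illustrative.

```python
# Illustrative sketch of the #53 masking fix (not the library's code).
# logits, targets: [T, B, C] one-hot hint trajectories; unmasked: [T, B].
import jax
import jax.numpy as jnp

def categorical_hint_loss_at_step(logits, targets, unmasked, t):
  log_probs = jax.nn.log_softmax(logits[t], axis=-1)      # [B, C]
  per_item = -jnp.sum(targets[t] * log_probs, axis=-1)    # [B]
  masked = per_item * unmasked[t]                         # drop masked items
  # The fix: normalise by the unmasked count at step t only.
  # (The bug divided by jnp.sum(unmasked), summed over all T steps.)
  return jnp.sum(masked) / jnp.maximum(jnp.sum(unmasked[t]), 1.0)
```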

Full Changelog: v0.0.2...v1.0.0

CLRS 0.0.2

26 Aug 17:40

The CLRS Algorithmic Reasoning Benchmark.

CLRS 0.0.1

26 Aug 17:24

Initial release of CLRS Algorithmic Reasoning Benchmark.