Huffman container #20

Merged
merged 5 commits into from
Jun 14, 2024

antiguru
Owner

@antiguru antiguru commented Feb 27, 2024

Implement a Huffman-based region that encodes its data based on previously collected frequencies.

This is an import of the Huffman container from Differential Dataflow, with minor modifications to adapt it to this crate. It is not fully integrated yet: it hard-codes the containers for encoded slices and unencoded data (`Vec<u8>` and `Vec<B>`), which should themselves be region-allocated. That is possible, but it comes with a long list of trait bounds. I'll post a comment below that captures this.
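
For readers unfamiliar with the idea, here is a minimal sketch of deriving a Huffman code from previously collected frequencies. It is not the implementation imported in this PR: `Node`, `build_codes`, and `assign` are hypothetical names, and only the `BTreeMap<B, i64>` input mirrors the container's `stats` field.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, BinaryHeap};

/// A node of the Huffman tree: a leaf carrying a symbol, or a fork with two children.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Node<B> {
    Leaf(B),
    Fork(Box<Node<B>>, Box<Node<B>>),
}

/// Builds a map from each symbol to its bit string from observed counts.
/// Returns an empty map if `stats` is empty.
fn build_codes<B: Ord + Clone>(stats: &BTreeMap<B, i64>) -> BTreeMap<B, Vec<bool>> {
    // Min-heap keyed on (count, unique tie-breaker, node).
    // `Reverse` turns the standard max-heap into a min-heap.
    let mut heap: BinaryHeap<Reverse<(i64, usize, Node<B>)>> = stats
        .iter()
        .enumerate()
        .map(|(i, (sym, count))| Reverse((*count, i, Node::Leaf(sym.clone()))))
        .collect();
    let mut next_id = heap.len();

    // Repeatedly merge the two least frequent subtrees until one tree remains.
    while heap.len() > 1 {
        let Reverse((c1, _, n1)) = heap.pop().unwrap();
        let Reverse((c2, _, n2)) = heap.pop().unwrap();
        let fork = Node::Fork(Box::new(n1), Box::new(n2));
        heap.push(Reverse((c1 + c2, next_id, fork)));
        next_id += 1;
    }

    let mut codes = BTreeMap::new();
    if let Some(Reverse((_, _, root))) = heap.pop() {
        assign(&root, &mut Vec::new(), &mut codes);
    }
    codes
}

/// Walks the tree, appending `false` for left edges and `true` for right edges.
fn assign<B: Ord + Clone>(
    node: &Node<B>,
    prefix: &mut Vec<bool>,
    codes: &mut BTreeMap<B, Vec<bool>>,
) {
    match node {
        Node::Leaf(sym) => {
            // A lone symbol would otherwise get an empty code; give it one bit.
            let code = if prefix.is_empty() { vec![false] } else { prefix.clone() };
            codes.insert(sym.clone(), code);
        }
        Node::Fork(left, right) => {
            prefix.push(false);
            assign(left, prefix, codes);
            prefix.pop();
            prefix.push(true);
            assign(right, prefix, codes);
            prefix.pop();
        }
    }
}
```

Frequent symbols end up closer to the root and get shorter codes, which is why the container gathers `stats` before it switches to encoded storage.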

h/t @frankmcsherry for the initial implementation.

@antiguru antiguru changed the title from "Import Huffman container from Differential" to "Huffman container" Feb 27, 2024
@antiguru
Owner Author

Capturing a work-in-progress diff that turns the underlying storage in the Huffman container into regions. Note that this does not compile yet because the trait bounds aren't complete. Unfortunately, the iterator type needs to leak into the trait bounds to enable the CopyIter(..): CopyOnto<RR> bound.

diff --git a/src/impls/huffman.rs b/src/impls/huffman.rs
index ff369d3..870f2a8 100644
--- a/src/impls/huffman.rs
+++ b/src/impls/huffman.rs
@@ -9,16 +9,22 @@ use self::huffman::Huffman;
 use self::wrapper::Wrapped;
 
 /// A container that contains slices `[B]` as items.
-pub struct HuffmanContainer<B: Ord + Clone> {
+pub struct HuffmanContainer<B: Ord + Clone, BR, RR>
+where
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
+{
     /// Either encoded data or raw data.
-    inner: Result<(Huffman<B>, Vec<u8>), Vec<B>>,
+    inner: Result<(Huffman<B>, BR), RR>,
     /// Counts of the number of each pattern we've seen.
     stats: BTreeMap<B, i64>,
 }
 
-impl<B> HuffmanContainer<B>
+impl<B, BR, RR> HuffmanContainer<B, BR, RR>
 where
     B: Ord + Clone,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
 {
     /// Prints statistics about encoded containers.
     pub fn print(&self) {
@@ -32,9 +38,25 @@ where
     }
 }
 
-impl<B> Region for HuffmanContainer<B>
+impl<B, BR, RR> Default for HuffmanContainer<B, BR, RR>
+where
+    B: Ord + Clone,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
+{
+    fn default() -> Self {
+        Self {
+            inner: Err(Vec::new()),
+            stats: Default::default(),
+        }
+    }
+}
+
+impl<B, BR, RR> Region for HuffmanContainer<B, BR, RR>
 where
     B: Ord + Clone + Sized + 'static,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
 {
     type ReadItem<'a> = Wrapped<'a, B>;
 
@@ -91,11 +113,13 @@ where
     }
 }
 
-impl<B> CopyOnto<HuffmanContainer<B>> for &[B]
+impl<B, BR, RR> CopyOnto<HuffmanContainer<B, BR, RR>> for &[B]
 where
     B: Ord + Clone + Sized + 'static,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
 {
-    fn copy_onto(self, target: &mut HuffmanContainer<B>) -> (usize, usize) {
+    fn copy_onto(self, target: &mut HuffmanContainer<B, BR, RR>) -> (usize, usize) {
         for x in self.iter() {
             *target.stats.entry(x.clone()).or_insert(0) += 1;
         }
@@ -114,47 +138,59 @@ where
     }
 }
 
-impl<B, const N: usize> CopyOnto<HuffmanContainer<B>> for [B; N]
+impl<B, BR, RR, const N: usize> CopyOnto<HuffmanContainer<B, BR, RR>> for [B; N]
 where
     B: Ord + Clone + Sized + 'static,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
 {
-    fn copy_onto(self, target: &mut HuffmanContainer<B>) -> (usize, usize) {
+    fn copy_onto(self, target: &mut HuffmanContainer<B, BR, RR>) -> (usize, usize) {
         self.as_slice().copy_onto(target)
     }
 }
 
-impl<B, const N: usize> CopyOnto<HuffmanContainer<B>> for &[B; N]
+impl<B, BR, RR, const N: usize> CopyOnto<HuffmanContainer<B, BR, RR>> for &[B; N]
 where
     B: Ord + Clone + Sized + 'static,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
 {
-    fn copy_onto(self, target: &mut HuffmanContainer<B>) -> (usize, usize) {
+    fn copy_onto(self, target: &mut HuffmanContainer<B, BR, RR>) -> (usize, usize) {
         self.as_slice().copy_onto(target)
     }
 }
 
-impl<B> CopyOnto<HuffmanContainer<B>> for Vec<B>
+impl<B, BR, RR> CopyOnto<HuffmanContainer<B, BR, RR>> for Vec<B>
 where
     B: Ord + Clone + Sized + 'static,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
 {
-    fn copy_onto(self, target: &mut HuffmanContainer<B>) -> (usize, usize) {
+    fn copy_onto(self, target: &mut HuffmanContainer<B, BR, RR>) -> (usize, usize) {
         self.as_slice().copy_onto(target)
     }
 }
 
-impl<B> CopyOnto<HuffmanContainer<B>> for &Vec<B>
+impl<B, BR, RR> CopyOnto<HuffmanContainer<B, BR, RR>> for &Vec<B>
 where
     B: Ord + Clone + Sized + 'static,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
 {
-    fn copy_onto(self, target: &mut HuffmanContainer<B>) -> (usize, usize) {
+    fn copy_onto(self, target: &mut HuffmanContainer<B, BR, RR>) -> (usize, usize) {
         self.as_slice().copy_onto(target)
     }
 }
 
-impl<'a, B> CopyOnto<HuffmanContainer<B>> for Wrapped<'a, B>
+impl<B, BR, RR> CopyOnto<HuffmanContainer<B, BR, RR>> for Wrapped<'_, B>
 where
     B: Ord + Clone + Sized + 'static,
+    for<'a> BR: Region<ReadItem<'a>=&'a [u8], Index=(usize, usize)> + 'a,
+    for<'a> CopyIter<super::huffman::Encode<'a, B, std::slice::Iter<'a, B>>>: CopyOnto<BR>,
+    for<'a> RR: Region<Index=(usize, usize)> + 'a,
+    &[B]: CopyOnto<RR>,
 {
-    fn copy_onto(self, target: &mut HuffmanContainer<B>) -> (usize, usize) {
+    fn copy_onto(self, target: &mut HuffmanContainer<B, BR, RR>) -> (usize, usize) {
         match self.decode() {
             Ok(decoded) => {
                 for x in decoded {
@@ -179,28 +215,15 @@ where
                 (start, raw.len())
             }
             (Err(symbols), Ok((huffman, bytes))) => {
-                let start = bytes.len();
-                bytes.extend(huffman.encode(symbols.iter()));
-                (start, bytes.len())
+                CopyIter(huffman.encode(symbols.iter())).copy_onto(bytes)
             }
             (Err(symbols), Err(raw)) => {
-                let start = raw.len();
-                raw.extend(symbols.iter().cloned());
-                (start, raw.len())
+                symbols.copy_onto(raw)
             }
         }
     }
 }
 
-impl<B: Ord + Clone> Default for HuffmanContainer<B> {
-    fn default() -> Self {
-        Self {
-            inner: Err(Vec::new()),
-            stats: Default::default(),
-        }
-    }
-}
-
 mod wrapper {
     use std::fmt::Debug;
 
@@ -592,13 +615,13 @@ mod huffman {
 
 #[cfg(test)]
 mod tests {
-    use crate::{CopyOnto, Region};
+    use crate::{CopyOnto, Region, CopyRegion};
 
     use super::*;
 
     #[test]
     fn test_huffman() {
-        let copy = |r: &mut HuffmanContainer<u8>, item: [u8; 3]| {
+        let copy = |r: &mut HuffmanContainer<u8, CopyRegion<u8>, CopyRegion<u8>>, item: [u8; 3]| {
             let index = item.copy_onto(r);
             assert_eq!(Wrapped::decoded(item.as_slice()), r.index(index));
         };
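
To illustrate why the iterator type leaks into the bounds, here is a small self-contained sketch. The traits are simplified stand-ins, not this crate's real definitions: `Region`, `CopyOnto`, and `CopyIter` only mimic the shape used in the diff, while `Bytes`, `Encode`, and `push_encoded` are hypothetical. The toy encoder flips bits instead of Huffman-encoding.

```rust
/// Simplified stand-in for the crate's `Region` trait (assumption).
trait Region: Default {
    type Index;
}

/// Simplified stand-in for `CopyOnto`: copy `self` into `R`, return an index.
trait CopyOnto<R: Region> {
    fn copy_onto(self, target: &mut R) -> R::Index;
}

/// Wrapper meaning "copy every item of this iterator", like `CopyIter` in the diff.
struct CopyIter<I>(I);

/// A byte region that stores items back to back and hands out `(start, end)` offsets.
#[derive(Default)]
struct Bytes(Vec<u8>);

impl Region for Bytes {
    type Index = (usize, usize);
}

impl<I: Iterator<Item = u8>> CopyOnto<Bytes> for CopyIter<I> {
    fn copy_onto(self, target: &mut Bytes) -> (usize, usize) {
        let start = target.0.len();
        target.0.extend(self.0);
        (start, target.0.len())
    }
}

/// Toy encoder standing in for `huffman.encode(symbols.iter())`.
struct Encode<'a>(std::slice::Iter<'a, u8>);

impl<'a> Iterator for Encode<'a> {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        // Not Huffman coding: just flip all bits so the output differs from the input.
        self.0.next().map(|b| b ^ 0xFF)
    }
}

/// Generic over the byte region `BR`. Because the bound is placed on
/// `CopyIter<Encode<'a>>`, the concrete iterator type has to be named
/// in the where-clause, for every lifetime `'a`.
fn push_encoded<BR>(symbols: &[u8], target: &mut BR) -> BR::Index
where
    BR: Region,
    for<'a> CopyIter<Encode<'a>>: CopyOnto<BR>,
{
    CopyIter(Encode(symbols.iter())).copy_onto(target)
}
```

Every impl that forwards to such a function has to repeat the same where-clause, which is how the bounds grow as the storage becomes generic.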

antiguru added 2 commits June 7, 2024 16:36
Add a bare-bones Region and CopyOnto implementation.

h/t @frankmcsherry for the original implementation.

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru marked this pull request as ready for review June 14, 2024 19:07
@antiguru antiguru merged commit 889b7a4 into main Jun 14, 2024
8 checks passed
@antiguru antiguru deleted the huffman branch June 14, 2024 19:13
@github-actions github-actions bot mentioned this pull request Jun 14, 2024