Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpectedly high memory usage #9

Open
YarekTyshchenko opened this issue Aug 17, 2023 · 5 comments
Open

Unexpectedly high memory usage #9

YarekTyshchenko opened this issue Aug 17, 2023 · 5 comments

Comments

@YarekTyshchenko
Copy link

YarekTyshchenko commented Aug 17, 2023

Perhaps I'm missing some configuration options, but I would have expected Bloom filter to have static memory usage:

[MemoryDiagnoser]
public class Benchmark
{
    private static readonly int items = 30_000_000;
    [Benchmark]
    public void BloomFilter()
    {
        var bf = FilterBuilder.Build(1000, 0.01);
        for (var i = 0; i < items; i++)
        {
            bf.Add($"property_{i}_name");
        }
        for (var i = 0; i < items; i++)
        {
            if (!bf.Contains($"property_{i}_name"))
            {
                Console.WriteLine($"False negative {i}");
            }
        }
    }

    [Benchmark]
    public void Dictionary()
    {
        var bf = new Dictionary<string, bool>();
        for (var i = 0; i < items; i++)
        {
            bf.Add($"property_{i}_name", true);
        }
        for (var i = 0; i < items; i++)
        {
            if (!bf.ContainsKey($"property_{i}_name"))
            {
                Console.WriteLine($"False negative {i}");
            }
        }
    }
}
BenchmarkDotNet v0.13.7, macOS Big Sur 11.6.5 (20G527) [Darwin 20.6.0]
Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 6.0.202
  [Host]     : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2


|      Method |     Mean |    Error |   StdDev |         Gen0 |        Gen1 |       Gen2 | Allocated |
|------------ |---------:|---------:|---------:|-------------:|------------:|-----------:|----------:|
| BloomFilter |  9.538 s | 0.0471 s | 0.0440 s | 3774000.0000 |           - |          - |  11.03 GB |
|  Dictionary | 17.162 s | 0.2386 s | 0.1993 s | 1008000.0000 | 131000.0000 | 11000.0000 |   6.37 GB |

// * Hints *
Outliers
  Benchmark.Dictionary: Default -> 2 outliers were removed (17.85 s, 18.53 s)

Tested with various different options, and Dictionary uses consistently less memory

@vla
Copy link
Owner

vla commented Sep 7, 2023

Allocated memory does not mean actual usage,bloom use BitArray as storage, it actually takes up less memory than Dict.

[MemoryDiagnoser]
public class Issues9
{
    public int DataSize = 3_000_000;

    private IList<byte[]> filterData;

    private IList<string> dictData;

    [GlobalSetup]
    public void Setup()
    {
        filterData = new List<byte[]>(DataSize);
        dictData = new List<string>(DataSize);

        for (var i = 0; i < DataSize; i++)
        {
            filterData.Add(Encoding.UTF8.GetBytes($"property_{i}_name"));
            dictData.Add($"property_{i}_name");
        }
    }

    [Benchmark]
    public void BloomFilter()
    {
        var filter = FilterBuilder.Build(1000, 0.01);

        for (var i = 0; i < DataSize; i++)
        {
            filter.Add(filterData[i]);
        }

        for (var i = 0; i < DataSize; i++)
        {
            if (!filter.Contains(filterData[i]))
            {
            }
        }
    }

    [Benchmark]
    public void Dictionary()
    {
        var bf = new Dictionary<string, bool>();

        for (var i = 0; i < DataSize; i++)
        {
            bf.Add(dictData[i], true);
        }

        for (var i = 0; i < DataSize; i++)
        {
            if (!bf.ContainsKey(dictData[i]))
            {
            }
        }
    }
}

@YarekTyshchenko
Copy link
Author

Hmm, I still can't make sense of the results:

|      Method |     Mean |    Error |  StdDev |        Gen0 |      Gen1 |      Gen2 | Allocated |
|------------ |---------:|---------:|--------:|------------:|----------:|----------:|----------:|
| BloomFilter | 471.4 ms |  6.60 ms | 5.85 ms | 153000.0000 |         - |         - | 457.77 MB |
|  Dictionary | 814.9 ms | 10.60 ms | 9.91 ms |   1000.0000 | 1000.0000 | 1000.0000 | 309.41 MB |

@YarekTyshchenko
Copy link
Author

YarekTyshchenko commented Sep 21, 2023

Since bloom filter is a probabilistic structure, it should occupy a fraction of memory of the dict, unless the bf configuration has storage that's larger than all 3 million byte arrays.
/shrug, I'm sure I'm misunderstanding something here, I'm looking to use it as a way to reduce cache misses, so I'm happy with high error rate, I'm also looking for a nicely bounded memory usage scaling to basically infinity of elements

Is there any other way of measuring total size of the dict and bf? (short of making a test application and seeing its actual memory usage). I appreciate that Allocations are actually measuring the wrong thing.

@vla
Copy link
Owner

vla commented Sep 28, 2023

Write a console program, and do not close the program after the test is successful. Then, check how much memory the process currently occupies.

@vla
Copy link
Owner

vla commented Sep 28, 2023

Is there any other way of measuring total size of the dict and bf?

The size has been allocated during initialization

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants