PHPLIB-1237 Implement Parallel Benchmarks LDJSON multi-file import #1166

Merged on Sep 20, 2023 (8 commits)

@GromNaN (Member) commented Sep 19, 2023

Fix PHPLIB-1237

Parallel Benchmarks specs: LDJSON multi-file import

Implementations:

  • Using Driver's BulkWrite in a single thread
  • Using library's Collection::insertMany in a single thread
  • Using multiple forked processes
  • Using amphp/parallel-functions with worker pool

To get the fastest result:

  • Reading files is done using stream_get_line
  • Document insertion is done using Driver's BulkWrite
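
The two choices above could be combined roughly as in the sketch below. This is only an illustration, not the code from this PR: `readLdjsonBatches()`, the batch size, the `test.corpus` namespace and the use of `json_decode()` are all assumptions, and the commented-out part needs the `mongodb` extension and a reachable server.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: reads an LDJSON file line by line with
 * stream_get_line() and yields the raw JSON lines in batches.
 *
 * @return Generator<int, list<string>>
 */
function readLdjsonBatches(string $path, int $batchSize = 1000): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException('Cannot open ' . $path);
    }

    try {
        $batch = [];
        // stream_get_line() strips the delimiter and returns false at EOF.
        while (($line = stream_get_line($fh, 10_000, "\n")) !== false) {
            if ($line === '') {
                continue; // skip blank lines
            }

            $batch[] = $line;
            if (count($batch) === $batchSize) {
                yield $batch;
                $batch = [];
            }
        }

        if ($batch !== []) {
            yield $batch; // last, possibly smaller, batch
        }
    } finally {
        fclose($fh);
    }
}

// With ext-mongodb loaded, each batch could go straight to the driver
// (assumed namespace and decoding, not taken from the PR):
//
// $manager = new MongoDB\Driver\Manager('mongodb://127.0.0.1/');
// foreach (readLdjsonBatches($file) as $batch) {
//     $bulk = new MongoDB\Driver\BulkWrite();
//     foreach ($batch as $json) {
//         $bulk->insert(json_decode($json, true));
//     }
//     $manager->executeBulkWrite('test.corpus', $bulk);
// }
```

Batching keeps memory bounded regardless of file size, and going through the driver directly skips the objects the library creates for each operation.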

@GromNaN GromNaN requested review from jmikola and alcaeus September 19, 2023 21:11
@GromNaN GromNaN self-assigned this Sep 19, 2023

@GromNaN (Member, Author) left a comment

Only Multi-File import is implemented for now, with 3 strategies:

  • Single process, no parallel
  • Forked processes using pcntl extension
  • Worker processes using AMP async framework
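
As a rough sketch of the forked-process strategy (not the code from this PR), the file list can be split into chunks and each chunk handed to a child created with `pcntl_fork()`. The helper names `partitionFiles()` and `runInForks()` are hypothetical, and the sketch assumes the CLI SAPI with the `pcntl` extension loaded.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits the file list into at most $processes chunks
 * of near-equal size.
 *
 * @param list<string> $files
 * @return list<list<string>>
 */
function partitionFiles(array $files, int $processes): array
{
    if ($files === [] || $processes < 1) {
        return [];
    }

    return array_chunk($files, (int) ceil(count($files) / $processes));
}

/**
 * Forks one child per chunk, runs $task on every file of the chunk in that
 * child, then waits for all children. Requires the pcntl extension.
 *
 * @param list<list<string>> $chunks
 * @param callable(string): void $task
 */
function runInForks(array $chunks, callable $task): void
{
    $pids = [];
    foreach ($chunks as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Failed to fork');
        }

        if ($pid === 0) {
            // Child process: handle its own files, then terminate.
            foreach ($chunk as $file) {
                $task($file);
            }

            exit(0);
        }

        $pids[] = $pid; // Parent: remember the child and keep forking.
    }

    // Parent: block until every child has finished.
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }
}
```

In this sketch `$task` would import a single file; creating the database client inside `$task`, so that each child uses its own connection, is an assumption of the sketch rather than something stated in the PR.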

Results on my Mac:

\MongoDB\Benchmark\DriverBench\ParallelBench

    benchMultiFileImport....................I2 - Mo10.596s (±1.69%)
    benchMultiFileImportFork # 1 proc.......I2 - Mo12.458s (±0.29%)
    benchMultiFileImportFork # 2 proc.......I2 - Mo6.255s (±0.77%)
    benchMultiFileImportFork # 3 proc.......I2 - Mo3.915s (±4.47%)
    benchMultiFileImportFork # 4 proc.......I2 - Mo2.938s (±0.85%)
    benchMultiFileImportFork # 5 proc.......I2 - Mo2.368s (±1.87%)
    benchMultiFileImportFork # 7 proc.......I2 - Mo1.909s (±2.03%)
    benchMultiFileImportFork # 9 proc.......I2 - Mo1.683s (±7.78%)
    benchMultiFileImportFork # 12 proc......I2 - Mo1.763s (±6.02%)
    benchMultiFileImportFork # 15 proc......I2 - Mo1.806s (±6.66%)
    benchMultiFileImportFork # 19 proc......I2 - Mo1.667s (±9.02%)
    benchMultiFileImportFork # 24 proc......I2 - Mo1.827s (±4.88%)
    benchMultiFileImportFork # 30 proc......I2 - Mo1.474s (±3.26%)
    benchMultiFileImportAmp # 1 proc........I2 - Mo10.494s (±0.62%)
    benchMultiFileImportAmp # 2 proc........I2 - Mo4.849s (±4.03%)
    benchMultiFileImportAmp # 3 proc........I2 - Mo3.314s (±1.43%)
    benchMultiFileImportAmp # 4 proc........I2 - Mo2.504s (±1.19%)
    benchMultiFileImportAmp # 5 proc........I2 - Mo2.066s (±1.31%)
    benchMultiFileImportAmp # 7 proc........I2 - Mo1.667s (±1.84%)
    benchMultiFileImportAmp # 9 proc........I2 - Mo1.505s (±2.48%)
    benchMultiFileImportAmp # 12 proc.......I2 - Mo1.672s (±5.32%)
    benchMultiFileImportAmp # 15 proc.......I2 - Mo1.699s (±5.74%)
    benchMultiFileImportAmp # 19 proc.......I2 - Mo2.015s (±7.34%)
    benchMultiFileImportAmp # 24 proc.......I2 - Mo1.932s (±3.17%)
    benchMultiFileImportAmp # 30 proc.......I2 - Mo2.243s (±16.66%)

Results in CI

The benchmark takes 23 minutes to run in CI, so I'll have to reduce the number of benchmarked cases.

subject                   set      mem_peak  mode     rstdev
benchMultiFileImport               13.679mb  18.958s  ±0.38%
benchMultiFileImportFork  1 proc   1.221mb   22.659s  ±1.09%
benchMultiFileImportFork  2 proc   1.221mb   11.921s  ±0.35%
benchMultiFileImportFork  3 proc   1.221mb   12.206s  ±0.79%
benchMultiFileImportFork  4 proc   1.221mb   12.049s  ±0.06%
benchMultiFileImportFork  5 proc   1.221mb   12.149s  ±0.50%
benchMultiFileImportFork  7 proc   1.221mb   11.988s  ±1.32%
benchMultiFileImportFork  9 proc   1.221mb   12.171s  ±0.54%
benchMultiFileImportFork  12 proc  1.221mb   12.109s  ±0.53%
benchMultiFileImportFork  15 proc  1.221mb   12.256s  ±1.09%
benchMultiFileImportFork  19 proc  1.221mb   12.417s  ±0.83%
benchMultiFileImportFork  24 proc  1.221mb   12.916s  ±1.73%
benchMultiFileImportFork  30 proc  1.221mb   13.463s  ±0.90%
benchMultiFileImportAmp   1 proc   2.827mb   19.740s  ±0.63%
benchMultiFileImportAmp   2 proc   2.865mb   10.671s  ±2.21%
benchMultiFileImportAmp   3 proc   2.903mb   10.877s  ±1.17%
benchMultiFileImportAmp   4 proc   2.941mb   10.565s  ±0.38%
benchMultiFileImportAmp   5 proc   2.982mb   10.671s  ±0.97%
benchMultiFileImportAmp   7 proc   3.056mb   10.568s  ±0.42%
benchMultiFileImportAmp   9 proc   3.136mb   10.973s  ±0.96%
benchMultiFileImportAmp   12 proc  3.259mb   11.088s  ±0.76%
benchMultiFileImportAmp   15 proc  3.372mb   11.192s  ±0.41%
benchMultiFileImportAmp   19 proc  3.557mb   11.520s  ±0.33%
benchMultiFileImportAmp   24 proc  3.754mb   11.909s  ±0.30%
benchMultiFileImportAmp   30 proc  3.983mb   12.520s  ±0.74%

@@ -0,0 +1,24 @@
{
"name": "mongodb/mongodb-benchmark",
Member Author

This is a proposal to give the benchmark its own dependencies by creating a dedicated Composer project. The benchmark already needs a different PHP version, and now it also needs the amphp dependency (which requires PHP 8).

The library mongodb/mongodb is installed from the parent path.

@GromNaN GromNaN marked this pull request as ready for review September 19, 2023 21:33
#[Revs(1)]
public function benchMultiFileImportAmp(array $params): void
{
wait(parallelMap(

@GromNaN (Member, Author) commented Sep 19, 2023

This implementation uses worker processes that communicate with the parent process: each worker receives serialized functions to call and sends back serialized results.

Comment on lines 55 to 58
$fileContents = str_repeat(file_get_contents(Data::LDJSON_FILE_PATH), 5_000);
foreach (self::getFileNames() as $file) {
file_put_contents($file, $fileContents);
}
Member

I like how you prepare the files for each run instead of relying on the archive that contains a few hundred MB of the same data 👍

@GromNaN (Member, Author) left a comment

Review applied. Here are the new results on my machine.

I switched to a driver bulk write to get the best performance and to avoid intermediate PHP objects for the command and other things. It's 8.5% faster.

benchMultiFileImportBulkWrite...........I0 - Mo9.959s (±0.00%)
benchMultiFileImportInsertMany..........I0 - Mo10.801s (±0.00%)
benchMultiFileImportFork # 1 proc.......I0 - Mo12.016s (±0.00%)
benchMultiFileImportFork # 2 proc.......I0 - Mo6.000s (±0.00%)
benchMultiFileImportFork # 4 proc.......I0 - Mo3.002s (±0.00%)
benchMultiFileImportFork # 8 proc.......I0 - Mo1.548s (±0.00%)
benchMultiFileImportFork # 13 proc......I0 - Mo1.504s (±0.00%)
benchMultiFileImportFork # 20 proc......I0 - Mo1.524s (±0.00%)
benchMultiFileImportFork # 34 proc......I0 - Mo1.469s (±0.00%)
benchMultiFileImportAmp # 1 proc........I0 - Mo10.034s (±0.00%)
benchMultiFileImportAmp # 2 proc........I0 - Mo5.013s (±0.00%)
benchMultiFileImportAmp # 4 proc........I0 - Mo2.637s (±0.00%)
benchMultiFileImportAmp # 8 proc........I0 - Mo1.705s (±0.00%)
benchMultiFileImportAmp # 13 proc.......I0 - Mo1.644s (±0.00%)
benchMultiFileImportAmp # 20 proc.......I0 - Mo1.783s (±0.00%)
benchMultiFileImportAmp # 34 proc.......I0 - Mo2.011s (±0.00%)
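
For reference, the 8.5% figure matches the ratio of the two single-process mode times above. The short sketch below only redoes that arithmetic with the numbers copied from the local results; the variable names are illustrative.

```php
<?php

declare(strict_types=1);

// Mode times in seconds, copied from the local results above.
$bulkWrite  = 9.959;
$insertMany = 10.801;

// How much longer insertMany takes compared to the driver's bulk write.
$slowerPct = ($insertMany / $bulkWrite - 1) * 100;

// The same gap expressed as time saved by switching to the bulk write.
$savedPct = (1 - $bulkWrite / $insertMany) * 100;

printf(
    "insertMany is %.1f%% slower; the bulk write saves %.1f%% of the time\n",
    $slowerPct,
    $savedPct
);
```

Measured as extra time for insertMany the gap is about 8.5%; measured as time saved by the bulk write it is about 7.8%.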

In GitHub Actions:

benchmark      subject                         set      mem_peak  mode
ParallelBench  benchMultiFileImportBulkWrite            13.361mb  17.157s
ParallelBench  benchMultiFileImportInsertMany           14.100mb  16.229s
ParallelBench  benchMultiFileImportFork        1 proc   1.220mb   18.366s
ParallelBench  benchMultiFileImportFork        2 proc   1.220mb   9.754s
ParallelBench  benchMultiFileImportFork        4 proc   1.220mb   9.783s
ParallelBench  benchMultiFileImportFork        8 proc   1.220mb   9.927s
ParallelBench  benchMultiFileImportFork        13 proc  1.220mb   10.010s
ParallelBench  benchMultiFileImportFork        20 proc  1.220mb   9.969s
ParallelBench  benchMultiFileImportFork        34 proc  1.220mb   10.232s
ParallelBench  benchMultiFileImportAmp         1 proc   2.833mb   16.461s
ParallelBench  benchMultiFileImportAmp         2 proc   2.871mb   9.082s
ParallelBench  benchMultiFileImportAmp         4 proc   2.946mb   8.973s
ParallelBench  benchMultiFileImportAmp         8 proc   3.099mb   9.245s
ParallelBench  benchMultiFileImportAmp         13 proc  3.303mb   9.487s
ParallelBench  benchMultiFileImportAmp         20 proc  3.601mb   9.796s
ParallelBench  benchMultiFileImportAmp         34 proc  4.158mb   10.494s

It looks like my MacBook has 10 cores (8 performance and 2 efficiency), while the GitHub Actions runner may have only 2.
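
The core-count guess can be checked against the speedup relative to the 1-process run. The sketch below uses a few of the fork timings copied from the tables above; the `speedup()` helper is hypothetical.

```php
<?php

declare(strict_types=1);

/** Speedup of a parallel run relative to the single-process run. */
function speedup(float $singleSeconds, float $parallelSeconds): float
{
    return $singleSeconds / $parallelSeconds;
}

// Fork mode times in seconds, keyed by process count (from the tables above).
$runs = [
    'local' => [1 => 12.016, 2 => 6.000, 8 => 1.548, 34 => 1.469],
    'ci'    => [1 => 18.366, 2 => 9.754, 8 => 9.927, 34 => 10.232],
];

foreach ($runs as $name => $times) {
    foreach ($times as $procs => $seconds) {
        printf("%-5s %2d proc: %.2fx\n", $name, $procs, speedup($times[1], $seconds));
    }
}
```

Locally the speedup keeps growing up to 8 processes (about 7.8x) and then flattens, while in CI it stops at about 1.9x with 2 processes, which is consistent with a 2-core runner.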

@jmikola (Member) left a comment

One actual suggestion to fix a bug, but LGTM after that.

I'll defer to you on whether file() is preferable to reading in a stream.

@GromNaN (Member, Author) commented Sep 20, 2023

> I'll defer to you on whether file() is preferable to reading in a stream.

file() is actually slower than fgets() and SplFileObject when xdebug is disabled.

stream_get_line is the winner.

with PHP version 8.2.9, xdebug ✅, opcache ❌

benchReadUsingFile......................I9 - Mo1.905ms (±3.01%)
benchReadUsingSplFileObject.............I9 - Mo3.293ms (±5.11%)
benchReadUsingFgets.....................I9 - Mo1.847ms (±25.05%)
benchReadUsingStreamGetLine.............I9 - Mo1.566ms (±5.72%)

with PHP version 8.2.9, xdebug ❌, opcache ✅

benchReadUsingFile......................I9 - Mo1.863ms (±4.92%)
benchReadUsingSplFileObject.............I9 - Mo1.396ms (±11.91%)
benchReadUsingFgets.....................I9 - Mo1.230ms (±5.81%)
benchReadUsingStreamGetLine.............I9 - Mo876.894μs (±5.04%)

use PhpBench\Attributes\Iterations;
use PhpBench\Attributes\Revs;
use PhpBench\Attributes\Warmup;

#[Revs(10)]
#[Warmup(1)]
#[Iterations(10)]
final class ReadFileBench
{
    private const FILE = __DIR__.'/data.txt';

    public function benchReadUsingFile(): void
    {
        foreach (file(self::FILE, FILE_IGNORE_NEW_LINES | FILE_NO_DEFAULT_CONTEXT) as $line) {
            // process $line
        }
    }

    public function benchReadUsingSplFileObject(): void
    {
        $file = new \SplFileObject(self::FILE);
        foreach ($file as $line) {
            if ($line !== '') {
                // process $line
            }
        }
    }

    public function benchReadUsingFgets(): void
    {
        $fh = fopen(self::FILE, 'r');
        while (($line = fgets($fh)) !== false) {
            if ($line !== '') {
                // process $line
            }
        }
        fclose($fh);
    }

    public function benchReadUsingStreamGetLine(): void
    {
        $fh = fopen(self::FILE, 'r');
        while (($line = stream_get_line($fh, 10_000, "\n")) !== false) {
            // process $line
        }
        fclose($fh);
    }
}

        run: "vendor/bin/phpbench run --report=aggregate --report=bar_chart_time --report=env --output html"

      - name: Upload HTML report
        uses: actions/upload-artifact@v3
        with:
          name: phpbench-${{ github.sha }}.html
-         path: .phpbench/html/index.html
+         path: ./benchmark/.phpbench/html/index.html
Member

Should we document instructions for referencing benchmarks somewhere? This may warrant a new heading in CONTRIBUTING.md.

Member Author

I think the goal is to run benchmarks on evergreen and publish reports there. PHPLIB-1187

@GromNaN GromNaN merged commit 82a6397 into mongodb:master Sep 20, 2023
12 checks passed
@GromNaN GromNaN deleted the PHPLIB-1237 branch September 20, 2023 18:18
@GromNaN GromNaN changed the title PHPLIB-1237 Implement Parallel Benchmarks PHPLIB-1237 Implement Parallel Benchmarks LDJSON multi-file import Sep 20, 2023