Commit

Upper limit on the number of processors set to 8
yohasebe committed May 13, 2023
1 parent 52dc4d7 commit 930cbb0
Showing 3 changed files with 11 additions and 5 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -8,6 +8,10 @@ WP2TXT extracts text and category data from Wikipedia dump files (encoded in XML

## Changelog

+ **May 2023**
+
+ - Problems caused by too many parallel processors are addressed by setting the upper limit on the number of processors to 8.
+
**April 2023**

- File split/delete issues fixed
@@ -186,7 +190,7 @@ Command line options are as follows:
-g, --category-only Extract only article title and categories
-s, --summary-only Extract only article title, categories, and summary text before first heading
-f, --file-size=<i> Approximate size (in MB) of each output file (default: 10)
- -n, --num-procs Number of proccesses to be run concurrently (default: max num of available CPU cores minus two)
+ -n, --num-procs Number of proccesses (up to 8) to be run concurrently (default: max num of available CPU cores minus two)
-x, --del-interfile Delete intermediate XML files from output dir
-t, --title, --no-title Keep page titles in output (default: true)
-d, --heading, --no-heading Keep section titles in output (default: true)
8 changes: 5 additions & 3 deletions bin/wp2txt
@@ -3,6 +3,7 @@
# frozen_string_literal: true

DEBUG_MODE = false
+ MAX_PROCESSORS = 8

require_relative "../lib/wp2txt"
require_relative "../lib/wp2txt/utils"
@@ -34,7 +35,7 @@ class WpApp
opt :category_only, "Extract only article title and categories", default: false, short: "-g"
opt :summary_only, "Extract only article title, categories, and summary text before first heading", default: false, short: "-s"
opt :file_size, "Approximate size (in MB) of each output file", default: 10, short: "-f"
- opt :num_procs, "Number of proccesses to be run concurrently (default: max num of CPU cores minus two)", short: "-n"
+ opt :num_procs, "Number of proccesses (up to #{MAX_PROCESSORS}) to be run concurrently (default: max num of CPU cores minus two)", type: Integer, short: "-n"
opt :del_interfile, "Delete intermediate XML files from output dir", short: "-x", default: false
opt :title, "Keep page titles in output", default: true, short: "-t"
opt :heading, "Keep section titles in output", default: true, short: "-d"
@@ -55,10 +56,11 @@ class WpApp
output_dir = opts[:output_dir]
tfile_size = opts[:file_size]
num_processors = Etc.nprocessors
- num_processes = if opts[:num_procs] && opts[:num_procs].to_i <= num_processors
+ num_processes = if opts[:num_procs] && opts[:num_procs].to_i <= num_processors && opts[:num_procs].to_i <= MAX_PROCESSORS
opts[:num_procs]
else
- num_processors - 2
+ minus2 = num_processors - 2
+ minus2 < MAX_PROCESSORS ? minus2 : MAX_PROCESSORS
end
num_processes = 1 if num_processes < 1
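
The selection logic in this hunk can be read as a small pure function. The sketch below is an illustration only, not code from the commit: the helper name `resolve_num_processes` and its `available` and `cap` parameters are hypothetical, introduced so the behavior can be tried without running the CLI. It assumes, as the hunk suggests, that a request above either limit falls back to the default (cores minus two, capped at `MAX_PROCESSORS`) rather than being clamped to the limit itself.

```ruby
require "etc"

MAX_PROCESSORS = 8

# Hypothetical helper (not part of wp2txt): mirrors the branch in the hunk above.
# requested: value of --num-procs, or nil when the option is omitted
# available: number of CPU cores (defaults to what Etc reports)
# cap:       hard upper limit on worker processes
def resolve_num_processes(requested, available = Etc.nprocessors, cap = MAX_PROCESSORS)
  count =
    if requested && requested.to_i <= available && requested.to_i <= cap
      # The request is within both limits, so it is used as given.
      requested.to_i
    else
      # No request, or a request over a limit: cores minus two, capped.
      [available - 2, cap].min
    end
  # Never run with fewer than one process.
  count < 1 ? 1 : count
end

puts resolve_num_processes(4, 16)   # request within limits
puts resolve_num_processes(16, 16)  # request over the cap
puts resolve_num_processes(nil, 2)  # default on a two-core machine
```

Under these assumptions, a request of 16 on a 16-core machine is not honored: it falls through to the default branch and ends up at the cap of 8.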

2 changes: 1 addition & 1 deletion lib/wp2txt/version.rb
@@ -1,5 +1,5 @@
# frozen_string_literal: true

module Wp2txt
- VERSION = "1.1.2"
+ VERSION = "1.1.3"
end
