Remove deprecated torque options

This commit removes the deprecated TORQUE/OpenPBS queue options:

* QUEUE_QUERY_TIMEOUT
* NUM_NODES
* NUM_CPUS_PER_NODE
* QSTAT_OPTIONS
* MEMORY_PER_JOB
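Anyone upgrading past this change may want to check an existing configuration for the removed keywords first. Below is a minimal sketch of such a check. `find_removed_torque_options` is a hypothetical helper, not part of ERT. The suggested replacements come from the deprecation messages this commit deletes from `config_schema_deprecations.py`. The sketch also assumes that `--` starts a comment in ERT config files and that a line has the shape `QUEUE_OPTION <SYSTEM> <OPTION> [value]`.

```python
# Hypothetical helper, not part of ERT: flags the TORQUE queue options
# removed in this commit and suggests what to use instead.
REMOVED_TORQUE_OPTIONS = {
    # Advice taken from the former deprecation messages.
    "QUEUE_QUERY_TIMEOUT": "remove the line (it was already ignored)",
    "QSTAT_OPTIONS": "remove the line (it was already ignored)",
    "NUM_NODES": "use NUM_CPU on a single compute node",
    "NUM_CPUS_PER_NODE": "use NUM_CPU",
    "MEMORY_PER_JOB": "use REALIZATION_MEMORY",
}


def find_removed_torque_options(config_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, option, advice) for every removed option found."""
    findings = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        # Assumption: "--" starts a comment; drop it before tokenizing.
        tokens = line.split("--", 1)[0].split()
        # Assumed shape: QUEUE_OPTION <QUEUE_SYSTEM> <OPTION> [value ...]
        if len(tokens) >= 3 and tokens[0] == "QUEUE_OPTION":
            option = tokens[2]
            if option in REMOVED_TORQUE_OPTIONS:
                findings.append((lineno, option, REMOVED_TORQUE_OPTIONS[option]))
    return findings


example = """\
QUEUE_SYSTEM TORQUE
QUEUE_OPTION TORQUE MEMORY_PER_JOB 16gb
QUEUE_OPTION TORQUE QUEUE_QUERY_TIMEOUT 254
NUM_CPU 4
"""
for lineno, option, advice in find_removed_torque_options(example):
    print(f"line {lineno}: {option}: {advice}")
```

Like the old deprecation checks, this matches the option name regardless of which queue system the line targets.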
equinor · Dec 20, 2024 · 13a3e24
1 parent 2e9c71a

Showing 15 changed files with 19 additions and 473 deletions.
diff --git a/docs/ert/reference/configuration/keywords.rst b/docs/ert/reference/configuration/keywords.rst
@@ -1825,9 +1825,8 @@ in :ref:`queue-system-chapter`. In brief, the queue systems have the following o
   ``BHIST_CMD``, ``SUBMIT_SLEEP``, ``PROJECT_CODE``, ``EXCLUDE_HOST``,
   ``MAX_RUNNING``
 * :ref:`TORQUE <pbs-systems>` — ``QSUB_CMD``, ``QSTAT_CMD``, ``QDEL_CMD``,
-  ``QSTAT_OPTIONS``, ``QUEUE``, ``CLUSTER_LABEL``, ``MAX_RUNNING``, ``NUM_NODES``,
-  ``NUM_CPUS_PER_NODE``, ``MEMORY_PER_JOB``, ``KEEP_QSUB_OUTPUT``, ``SUBMIT_SLEEP``,
-  ``QUEUE_QUERY_TIMEOUT``
+  ``QUEUE``, ``CLUSTER_LABEL``, ``MAX_RUNNING``, ``KEEP_QSUB_OUTPUT``,
+  ``SUBMIT_SLEEP``
 * :ref:`SLURM <slurm-systems>` — ``SBATCH``, ``SCANCEL``, ``SCONTROL``, ``SACCT``,
   ``SQUEUE``, ``PARTITION``, ``SQUEUE_TIMEOUT``, ``MAX_RUNTIME``, ``INCLUDE_HOST``,
   ``EXCLUDE_HOST``, ``MAX_RUNNING``

diff --git a/docs/ert/reference/configuration/queue.rst b/docs/ert/reference/configuration/queue.rst
@@ -251,12 +251,6 @@ The following is a list of all queue-specific configuration options:
     QUEUE_OPTION TORQUE QSTAT_CMD /path/to/my/qstat
     QUEUE_OPTION TORQUE QDEL_CMD /path/to/my/qdel
 
-.. _torque_qstat_options:
-.. topic:: QSTAT_OPTIONS
-
-  Options to be supplied to the ``qstat`` command. This defaults to :code:`-x`,
-  which tells the ``qstat`` command to include exited processes.
-
 .. _torque_queue:
 .. topic:: QUEUE
 
@@ -283,37 +277,6 @@ The following is a list of all queue-specific configuration options:
 
   If ``n`` is zero (the default), then it is set to the number of realizations.
 
-.. _torque_nodes_cpus:
-.. topic:: NUM_NODES, NUM_CPUS_PER_NODE
-
-  The support for running a job over multiple nodes is deprecated in Ert,
-  but was previously accomplished by setting NUM_NODES to a number larger
-  than 1.
-
-  NUM_CPUS_PER_NODE is deprecated, instead please use NUM_CPU to specify the
-  number of CPU cores to reserve on a single compute node.
-
-.. _torque_memory_per_job:
-.. topic:: MEMORY_PER_JOB
-
-  You can specify the amount of memory you will need for running your
-  job. This will ensure that not too many jobs will run on a single
-  shared memory node at once, possibly crashing the compute node if it
-  runs out of memory.
-
-  You can get an indication of the memory requirement by watching the
-  course of a local run using the ``htop`` utility. Whether you should set
-  the peak memory usage as your requirement or a lower figure depends on
-  how simultaneously each job will run.
-
-  The option to be supplied will be used as a string in the ``qsub``
-  argument. You must specify the unit, either ``gb`` or ``mb`` as in
-  the example::
-
-    QUEUE_OPTION TORQUE MEMORY_PER_JOB 16gb
-
-  By default, this value is not set.
-
 .. _torque_keep_qsub_output:
 .. topic:: KEEP_QSUB_OUTPUT
 
@@ -332,23 +295,6 @@ The following is a list of all queue-specific configuration options:
 
     QUEUE_OPTION TORQUE SUBMIT_SLEEP 0.5
 
-.. _torque_queue_query_timeout:
-.. topic:: QUEUE_QUERY_TIMEOUT
-
-  The driver allows the backend TORQUE/PBS system to be flaky, i.e. it may
-  intermittently not respond and give error messages when submitting jobs
-  or asking for job statuses. The timeout (in seconds) determines how long
-  ERT will wait before it will give up. Applies to job submission (``qsub``)
-  and job status queries (``qstat``). Default is 126 seconds.
-
-  ERT will do exponential sleeps, starting at 2 seconds, and the provided
-  timeout is a maximum. Let the timeout be sums of series like 2+4+8+16+32+64
-  in order to be explicit about the number of retries. Set to zero to disallow
-  flakyness, setting it to 2 will allow for one re-attempt, and 6 will give two
-  re-attempts. Example allowing six retries::
-
-    QUEUE_OPTION TORQUE QUEUE_QUERY_TIMEOUT 254
-
 .. _torque_project_code:
 .. topic:: PROJECT_CODE
 

diff --git a/docs/everest/config_generated.rst b/docs/everest/config_generated.rst
@@ -1037,27 +1037,12 @@ Simulation settings
     The kill command
 
 
-**qstat_options (optional)**
-    Type: *Optional[str]*
-
-    Options to be supplied to the qstat command. This defaults to -x, which tells the qstat command to include exited processes.
-
-
 **cluster_label (optional)**
     Type: *Optional[str]*
 
     The name of the cluster you are running simulations in.
 
 
-**memory_per_job (optional)**
-    Type: *Optional[str]*
-
-    You can specify the amount of memory you will need for running your job. This will ensure that not too many jobs will run on a single shared memory node at once, possibly crashing the compute node if it runs out of memory.
-    You can get an indication of the memory requirement by watching the course of a local run using the htop utility. Whether you should set the peak memory usage as your requirement or a lower figure depends on how simultaneously each job will run.
-    The option to be supplied will be used as a string in the qsub argument. You must specify the unit, either gb or mb.
-
-
-
 **keep_qsub_output (optional)**
     Type: *Optional[int]*
 
@@ -1070,15 +1055,6 @@ Simulation settings
     To avoid stressing the TORQUE/PBS system you can instruct the driver to sleep for every submit request. The argument to the SUBMIT_SLEEP is the number of seconds to sleep for every submit, which can be a fraction like 0.5
 
 
-**queue_query_timeout (optional)**
-    Type: *Optional[int]*
-
-
-    The driver allows the backend TORQUE/PBS system to be flaky, i.e. it may intermittently not respond and give error messages when submitting jobs or asking for job statuses. The timeout (in seconds) determines how long ERT will wait before it will give up. Applies to job submission (qsub) and job status queries (qstat). Default is 126 seconds.
-    ERT will do exponential sleeps, starting at 2 seconds, and the provided timeout is a maximum. Let the timeout be sums of series like 2+4+8+16+32+64 in order to be explicit about the number of retries. Set to zero to disallow flakyness, setting it to 2 will allow for one re-attempt, and 6 will give two re-attempts. Example allowing six retries:
-
-
-
 **project_code (optional)**
     Type: *Optional[str]*
 

diff --git a/src/ert/config/parsing/config_schema_deprecations.py b/src/ert/config/parsing/config_schema_deprecations.py
@@ -181,36 +181,6 @@
         "for the Ensemble Smoother update algorithm. "
         "Please use ENKF_ALPHA and STD_CUTOFF keywords instead.",
     ),
-    DeprecationInfo(
-        keyword="QUEUE_OPTION",
-        message="QUEUE_QUERY_TIMEOUT as QUEUE_OPTION is ignored. "
-        "Please remove the line.",
-        check=lambda line: "QUEUE_QUERY_TIMEOUT" in line,
-    ),
-    DeprecationInfo(
-        keyword="QUEUE_OPTION",
-        message="QSTAT_OPTIONS as QUEUE_OPTION to the TORQUE is ignored. "
-        "Please remove the line.",
-        check=lambda line: "QSTAT_OPTIONS" in line,
-    ),
-    DeprecationInfo(
-        keyword="QUEUE_OPTION",
-        message="NUM_CPUS_PER_NODE as QUEUE_OPTION to Torque is deprecated and will removed in "
-        "the future. Replace by NUM_CPU.",
-        check=lambda line: "NUM_CPUS_PER_NODE" in line,
-    ),
-    DeprecationInfo(
-        keyword="QUEUE_OPTION",
-        message="NUM_NODES as QUEUE_OPTION to Torque is deprecated and will removed in "
-        "the future. Replace by NUM_CPU on a single compute node.",
-        check=lambda line: "NUM_NODES" in line,
-    ),
-    DeprecationInfo(
-        keyword="QUEUE_OPTION",
-        message="MEMORY_PER_JOB as QUEUE_OPTION to TORQUE is deprecated and will be removed in "
-        "the future. Replace by REALIZATION_MEMORY.",
-        check=lambda line: "MEMORY_PER_JOB" in line,
-    ),
     DeprecationInfo(
         keyword="QUEUE_OPTION",
         message="Memory requirements in LSF should now be set using REALIZATION_MEMORY and not"
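Each deprecation entry removed above pairs a `keyword` with a `check` predicate on the config line. Below is a simplified, hypothetical stand-in that shows how such a table could be applied. `SimpleDeprecationInfo` and `collect_warnings` are illustrative names. The sketch assumes the arguments after the keyword arrive as a list of tokens; the real ERT class, its field types, and the exact value passed to `check` may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimpleDeprecationInfo:
    """Hypothetical, simplified stand-in for ERT's DeprecationInfo."""

    keyword: str
    message: str
    # Predicate on the arguments after the keyword; None means "always applies".
    check: Optional[Callable[[list[str]], bool]] = None


def collect_warnings(
    config_lines: list[list[str]], infos: list[SimpleDeprecationInfo]
) -> list[str]:
    """Return the message of every entry whose keyword and check match a line."""
    warnings: list[str] = []
    for tokens in config_lines:
        if not tokens:
            continue
        keyword, args = tokens[0], tokens[1:]
        for info in infos:
            if info.keyword == keyword and (info.check is None or info.check(args)):
                warnings.append(info.message)
    return warnings


# One of the entries removed by this commit, re-expressed with the stand-in.
infos = [
    SimpleDeprecationInfo(
        keyword="QUEUE_OPTION",
        message="QSTAT_OPTIONS as QUEUE_OPTION to the TORQUE is ignored. "
        "Please remove the line.",
        check=lambda args: "QSTAT_OPTIONS" in args,
    ),
]
lines = [["QUEUE_OPTION", "TORQUE", "QSTAT_OPTIONS", "-x"], ["NUM_CPU", "2"]]
print(collect_warnings(lines, infos))
```

Once an entry is deleted from the table, a matching line no longer produces a warning. How the parser then treats the unknown option is decided elsewhere and is not shown in this diff.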

diff --git a/src/ert/config/queue_config.py b/src/ert/config/queue_config.py
@@ -126,34 +126,19 @@ class TorqueQueueOptions(QueueOptions):
     qstat_cmd: NonEmptyString | None = None
     qdel_cmd: NonEmptyString | None = None
     queue: NonEmptyString | None = None
-    memory_per_job: NonEmptyString | None = None
-    num_cpus_per_node: pydantic.PositiveInt = 1
-    num_nodes: pydantic.PositiveInt = 1
     cluster_label: NonEmptyString | None = None
     job_prefix: NonEmptyString | None = None
     keep_qsub_output: bool = False
 
-    qstat_options: str | None = pydantic.Field(default=None, deprecated=True)
-    queue_query_timeout: str | None = pydantic.Field(default=None, deprecated=True)
-
     @property
     def driver_options(self) -> dict[str, Any]:
         driver_dict = asdict(self)
         driver_dict.pop("name")
         driver_dict["queue_name"] = driver_dict.pop("queue")
         driver_dict.pop("max_running")
         driver_dict.pop("submit_sleep")
-        driver_dict.pop("qstat_options")
-        driver_dict.pop("queue_query_timeout")
         return driver_dict
 
-    @pydantic.field_validator("memory_per_job")
-    @classmethod
-    def check_memory_per_job(cls, value: str | None) -> str | None:
-        if not torque_memory_usage_format.validate(value):
-            raise ValueError("wrong memory format")
-        return value
-
 
 @pydantic.dataclasses.dataclass
 class SlurmQueueOptions(QueueOptions):
@@ -322,23 +307,6 @@ def from_dict(cls, config_dict: ConfigDict) -> QueueConfig:
             if tags:
                 queue_options.project_code = "+".join(tags)
 
-        if selected_queue_system == QueueSystem.TORQUE:
-            _check_num_cpu_requirement(
-                config_dict.get("NUM_CPU", 1), queue_options, raw_queue_options
-            )
-
-        for _queue_vals in all_validated_queue_options.values():
-            if (
-                isinstance(_queue_vals, TorqueQueueOptions)
-                and _queue_vals.memory_per_job
-                and realization_memory
-            ):
-                _throw_error_or_warning(
-                    "Do not specify both REALIZATION_MEMORY and TORQUE option MEMORY_PER_JOB",
-                    "MEMORY_PER_JOB",
-                    selected_queue_system == QueueSystem.TORQUE,
-                )
-
         return QueueConfig(
             job_script,
             realization_memory,
@@ -369,22 +337,6 @@ def submit_sleep(self) -> float:
         return self.queue_options.submit_sleep
 
 
-def _check_num_cpu_requirement(
-    num_cpu: int, torque_options: TorqueQueueOptions, raw_queue_options: list[list[str]]
-) -> None:
-    flattened_raw_options = [item for line in raw_queue_options for item in line]
-    if (
-        "NUM_NODES" not in flattened_raw_options
-        and "NUM_CPUS_PER_NODE" not in flattened_raw_options
-    ):
-        return
-    if num_cpu != torque_options.num_nodes * torque_options.num_cpus_per_node:
-        raise ConfigValidationError(
-            f"When NUM_CPU is {num_cpu}, then the product of NUM_NODES ({torque_options.num_nodes}) "
-            f"and NUM_CPUS_PER_NODE ({torque_options.num_cpus_per_node}) must be equal."
-        )
-
-
 def _parse_realization_memory_str(realization_memory_str: str) -> int:
     if "-" in realization_memory_str:
         raise ConfigValidationError(
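The `driver_options` property turns the options dataclass into a dict of driver arguments by popping the entries the driver does not take. Below is a sketch using a trimmed stand-in built with the standard-library `dataclasses` module. `SimplifiedTorqueOptions` is hypothetical. Its `name`, `max_running`, `submit_sleep` and `project_code` fields are assumed to come from the `QueueOptions` base class, which this diff does not show. The sketch also explains why the `pop("qstat_options")` and `pop("queue_query_timeout")` calls had to be deleted together with the fields: `dict.pop` without a default raises `KeyError` for a missing key.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class SimplifiedTorqueOptions:
    """Hypothetical, trimmed stand-in for TorqueQueueOptions after this commit."""

    name: str = "torque"
    max_running: int = 0
    submit_sleep: float = 0.0
    project_code: Optional[str] = None
    qsub_cmd: Optional[str] = None
    qstat_cmd: Optional[str] = None
    qdel_cmd: Optional[str] = None
    queue: Optional[str] = None
    cluster_label: Optional[str] = None
    job_prefix: Optional[str] = None
    keep_qsub_output: bool = False

    @property
    def driver_options(self) -> dict[str, Any]:
        driver_dict = asdict(self)
        driver_dict.pop("name")
        # The driver takes "queue_name" rather than "queue".
        driver_dict["queue_name"] = driver_dict.pop("queue")
        # Scheduler-level settings that are not driver arguments.
        driver_dict.pop("max_running")
        driver_dict.pop("submit_sleep")
        # No pops for qstat_options/queue_query_timeout: those fields are gone,
        # and dict.pop(key) without a default would raise KeyError.
        return driver_dict


opts = SimplifiedTorqueOptions(queue="normal", cluster_label="cluster1")
print(opts.driver_options)
```

The resulting keys line up with the remaining `OpenPBSDriver.__init__` parameters shown further down. That is consistent with the dict being passed on as keyword arguments, but this is an assumption, since the call site is not part of this diff.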

diff --git a/src/ert/scheduler/openpbs_driver.py b/src/ert/scheduler/openpbs_driver.py
@@ -124,9 +124,6 @@ def __init__(
         queue_name: str | None = None,
         project_code: str | None = None,
         keep_qsub_output: bool | None = None,
-        memory_per_job: str | None = None,
-        num_nodes: int | None = None,
-        num_cpus_per_node: int | None = None,
         cluster_label: str | None = None,
         job_prefix: str | None = None,
         qsub_cmd: str | None = None,
@@ -139,9 +136,6 @@ def __init__(
         self._queue_name = queue_name
         self._project_code = project_code
         self._keep_qsub_output = keep_qsub_output
-        self._memory_per_job = memory_per_job
-        self._num_nodes: int | None = num_nodes
-        self._num_cpus_per_node: int | None = num_cpus_per_node
         self._cluster_label: str | None = cluster_label
         self._job_prefix = job_prefix
         self._max_pbs_cmd_attempts = 10
@@ -158,45 +152,15 @@ def __init__(
         self._finished_job_ids: set[str] = set()
         self._finished_iens: set[int] = set()
 
-        if self._num_nodes is not None and self._num_nodes > 1:
-            logger.warning(
-                "OpenPBSDriver initialized with num_nodes > 1, "
-                "this behaviour is deprecated and will be removed"
-            )
-
-        if self._num_cpus_per_node is not None and self._num_cpus_per_node > 1:
-            logger.warning(
-                "OpenPBSDriver initialized with num_cpus_per_node, "
-                "this behaviour is deprecated and will be removed. "
-                "Use NUM_CPU in the config instead."
-            )
-
     def _build_resource_string(
         self, num_cpu: int = 1, realization_memory: int = 0
     ) -> list[str]:
         resource_specifiers: list[str] = []
 
         cpu_resources: list[str] = []
-        if self._num_nodes is not None:
-            cpu_resources += [f"select={self._num_nodes}"]
-        if self._num_cpus_per_node is not None:
-            num_nodes = self._num_nodes or 1
-            if num_cpu != self._num_cpus_per_node * num_nodes:
-                raise ValueError(
-                    f"NUM_CPUS_PER_NODE ({self._num_cpus_per_node}) must be equal "
-                    f"to NUM_CPU ({num_cpu}). "
-                    "Please remove NUM_CPUS_PER_NODE from the configuration"
-                )
         if num_cpu > 1:
             cpu_resources += [f"ncpus={num_cpu}"]
-        if self._memory_per_job is not None and realization_memory > 0:
-            raise ValueError(
-                "Overspecified memory pr job. "
-                "Do not specify both memory_per_job and realization_memory"
-            )
-        if self._memory_per_job is not None:
-            cpu_resources += [f"mem={self._memory_per_job}"]
-        elif realization_memory > 0:
+        if realization_memory > 0:
             cpu_resources += [f"mem={realization_memory // 1024**2 }mb"]
         if cpu_resources:
             resource_specifiers.append(":".join(cpu_resources))
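With `num_nodes`, `num_cpus_per_node` and `memory_per_job` gone, the CPU and memory part of the resource string depends only on `num_cpu` and `realization_memory`, which is given in bytes. Below is a standalone sketch of just the portion visible in this hunk, written as a free function with a hypothetical name. The rest of `_build_resource_string` lies outside the diff.

```python
def build_cpu_memory_resources(
    num_cpu: int = 1, realization_memory: int = 0
) -> list[str]:
    """Sketch of the visible part of _build_resource_string after this commit.

    realization_memory is in bytes.
    """
    resource_specifiers: list[str] = []
    cpu_resources: list[str] = []
    if num_cpu > 1:
        cpu_resources += [f"ncpus={num_cpu}"]
    if realization_memory > 0:
        # Integer division converts bytes to whole mebibytes, rounding down.
        cpu_resources += [f"mem={realization_memory // 1024**2}mb"]
    if cpu_resources:
        resource_specifiers.append(":".join(cpu_resources))
    return resource_specifiers


print(build_cpu_memory_resources(4, 16 * 1024**3))  # ['ncpus=4:mem=16384mb']
print(build_cpu_memory_resources())  # []
```

Because the memory request can now come only from REALIZATION_MEMORY, the old "Overspecified memory pr job" conflict check disappears along with MEMORY_PER_JOB.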

diff --git a/src/everest/config/simulator_config.py b/src/everest/config/simulator_config.py
@@ -119,21 +119,10 @@ class SimulatorConfig(BaseModel, HasErtQueueOptions, extra="forbid"):  # type: i
     qsub_cmd: str | None = Field(default="qsub", description="The submit command")
     qstat_cmd: str | None = Field(default="qstat", description="The query command")
     qdel_cmd: str | None = Field(default="qdel", description="The kill command")
-    qstat_options: str | None = Field(
-        default="-x",
-        description="Options to be supplied to the qstat command. This defaults to -x, which tells the qstat command to include exited processes.",
-    )
     cluster_label: str | None = Field(
         default=None,
         description="The name of the cluster you are running simulations in.",
     )
-    memory_per_job: str | None = Field(
-        default=None,
-        description="""You can specify the amount of memory you will need for running your job. This will ensure that not too many jobs will run on a single shared memory node at once, possibly crashing the compute node if it runs out of memory.
-    You can get an indication of the memory requirement by watching the course of a local run using the htop utility. Whether you should set the peak memory usage as your requirement or a lower figure depends on how simultaneously each job will run.
-    The option to be supplied will be used as a string in the qsub argument. You must specify the unit, either gb or mb.
-    """,
-    )
     keep_qsub_output: int | None = Field(
         default=0,
         description="Set to 1 to keep error messages from qsub. Usually only to be used if somethign is seriously wrong with the queue environment/setup.",
@@ -142,13 +131,6 @@ class SimulatorConfig(BaseModel, HasErtQueueOptions, extra="forbid"):  # type: i
         default=0.5,
         description="To avoid stressing the TORQUE/PBS system you can instruct the driver to sleep for every submit request. The argument to the SUBMIT_SLEEP is the number of seconds to sleep for every submit, which can be a fraction like 0.5",
     )
-    queue_query_timeout: int | None = Field(
-        default=126,
-        description="""
-    The driver allows the backend TORQUE/PBS system to be flaky, i.e. it may intermittently not respond and give error messages when submitting jobs or asking for job statuses. The timeout (in seconds) determines how long ERT will wait before it will give up. Applies to job submission (qsub) and job status queries (qstat). Default is 126 seconds.
-    ERT will do exponential sleeps, starting at 2 seconds, and the provided timeout is a maximum. Let the timeout be sums of series like 2+4+8+16+32+64 in order to be explicit about the number of retries. Set to zero to disallow flakyness, setting it to 2 will allow for one re-attempt, and 6 will give two re-attempts. Example allowing six retries:
-    """,
-    )
     project_code: str | None = Field(
         default=None,
         description="String identifier used to map hardware resource usage to a project or account. The project or account does not have to exist.",

diff --git a/src/everest/config_keys.py b/src/everest/config_keys.py
@@ -123,7 +123,6 @@ class ConfigKeys:
     TORQUE_QDEL_CMD = "qdel_cmd"
     TORQUE_QUEUE_NAME = "name"
     TORQUE_CLUSTER_LABEL = "cluster_label"
-    TORQUE_MEMORY_PER_JOB = "memory_per_job"
     TORQUE_KEEP_QSUB_OUTPUT = "keep_qsub_output"
     TORQUE_SUBMIT_SLEEP = "submit_sleep"
     TORQUE_PROJECT_CODE = "project_code"

diff --git a/src/everest/queue_driver/queue_driver.py b/src/everest/queue_driver/queue_driver.py
@@ -32,8 +32,6 @@
     (ConfigKeys.TORQUE_QDEL_CMD, "QDEL_CMD"),
     (ConfigKeys.TORQUE_QUEUE_NAME, "QUEUE"),
     (ConfigKeys.TORQUE_CLUSTER_LABEL, "CLUSTER_LABEL"),
-    (ConfigKeys.CORES_PER_NODE, "NUM_CPUS_PER_NODE"),
-    (ConfigKeys.TORQUE_MEMORY_PER_JOB, "MEMORY_PER_JOB"),
     (ConfigKeys.TORQUE_KEEP_QSUB_OUTPUT, "KEEP_QSUB_OUTPUT"),
     (ConfigKeys.TORQUE_SUBMIT_SLEEP, "SUBMIT_SLEEP"),
     (ConfigKeys.TORQUE_PROJECT_CODE, "PROJECT_CODE"),
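The table above maps Everest simulator keys to ERT queue option names. Below is a sketch of rendering a simulator section as ERT `QUEUE_OPTION` lines with the visible part of that table. `to_queue_option_lines` is a hypothetical helper. The `ConfigKeys` constants are replaced by the string values shown in `config_keys.py`. `ConfigKeys.CORES_PER_NODE` is left out because its value is not shown here.

```python
# Visible subset of the Everest-to-ERT TORQUE option table after this commit,
# with ConfigKeys constants replaced by their string values.
TORQUE_OPTION_MAP = [
    ("qdel_cmd", "QDEL_CMD"),
    ("name", "QUEUE"),
    ("cluster_label", "CLUSTER_LABEL"),
    ("keep_qsub_output", "KEEP_QSUB_OUTPUT"),
    ("submit_sleep", "SUBMIT_SLEEP"),
    ("project_code", "PROJECT_CODE"),
]


def to_queue_option_lines(simulator: dict) -> list[str]:
    """Hypothetical helper: render the set simulator values as ERT lines."""
    lines = []
    for everest_key, ert_option in TORQUE_OPTION_MAP:
        value = simulator.get(everest_key)
        if value is not None:
            lines.append(f"QUEUE_OPTION TORQUE {ert_option} {value}")
    return lines


print(to_queue_option_lines({"name": "normal", "submit_sleep": 0.5}))
```

This helper simply ignores unknown keys. The real `SimulatorConfig`, however, is declared with `extra="forbid"`, so under pydantic's documented behavior an Everest config that still sets `memory_per_job`, `qstat_options` or `queue_query_timeout` would be rejected at validation rather than silently dropped.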

diff --git a/tests/ert/unit_tests/config/config_dict_generator.py b/tests/ert/unit_tests/config/config_dict_generator.py
@@ -167,8 +167,6 @@ def valid_queue_values(option_name, queue_system):
     elif option_name in queue_options_by_type["posfloat"][queue_system]:
         return small_floats.map(str)
     elif option_name in queue_options_by_type["posint"][queue_system]:
-        if option_name in {"NUM_NODES", "NUM_CPUS_PER_NODE"}:
-            return st.just("1")
         return positives.map(str)
     elif option_name in queue_options_by_type["bool"][queue_system]:
         return booleans.map(str)