Refactor everserver #9367

frode-aarstad · 2024-11-27T14:08:21Z

Issue
Resolves #9290

Approach
Short description of the approach

(Screenshot of new behavior in GUI if applicable)

PR title captures the intent of the changes, and is fitting for release notes.
Added appropriate release note label
Commit history is consistent and clean, in line with the contribution guidelines.
Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

When there are user facing changes: Updated documentation
New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
Large PR: Prepare changes in small commits for more convenient review
Bug fix: Add regression test for the bug
Bug fix: Create Backport PR to latest release

verveerpj

A few comments. Rebasing will probably cause some issues now...

verveerpj · 2024-12-02T10:02:06Z

src/everest/detached/jobs/everest_server_api.py

+from everest.util import makedirs_if_needed
+
+
+def _get_machine_name() -> str:


I think there is a get_machine_name in ERT already, can we use that?

In general, there might be other port/authentication related stuff in ERT. Maybe there should be a utils file for that somewhere?

Will replace with the one in ert_shared

verveerpj · 2024-12-02T10:05:38Z

src/everest/detached/jobs/everest_server_api.py

+            "running": status.running,
+            "waiting": status.waiting,
+            "pending": status.pending,
+            "complete": status.complete,
+            "failed": status.failed,
+        },


This code is going to change slightly in PR #9342, somebody will have to adapt

verveerpj · 2024-12-02T10:10:23Z

src/everest/detached/jobs/everest_server_api.py

+        uvicorn.run(
+            self.app,
+            host="0.0.0.0",
+            port=self.port,
+            ssl_keyfile=self.key_path,
+            ssl_certfile=self.cert_path,
+            ssl_version=ssl.PROTOCOL_SSLv23,
+            ssl_keyfile_password=self.key_pw,
+        )


Recently a log_level=logging.CRITICAL was added to that call, otherwise we got log messages in the console. If there is a better solution that would be fine, otherwise maybe add it here.

Will add it here

codecov-commenter · 2024-12-05T09:09:27Z

Codecov Report

Attention: Patch coverage is 95.17544% with 11 lines in your changes missing coverage. Please review.

Project coverage is 91.86%. Comparing base (90c14a9) to head (43eedff).
Report is 19 commits behind head on main.

Files with missing lines	Patch %	Lines
src/everest/detached/jobs/everest_server_api.py	95.73%	7 Missing ⚠️
src/everest/detached/__init__.py	75.00%	2 Missing ⚠️
src/everest/detached/jobs/everserver.py	96.07%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9367      +/-   ##
==========================================
+ Coverage   91.83%   91.86%   +0.03%     
==========================================
  Files         434      434              
  Lines       26735    26815      +80     
==========================================
+ Hits        24551    24633      +82     
+ Misses       2184     2182       -2

Flag	Coverage Δ
cli-tests	`39.37% <ø> (+0.04%)`	⬆️
everest-models-test	`34.76% <29.38%> (+0.02%)`	⬆️
gui-tests	`72.17% <ø> (+0.04%)`	⬆️
integration-test	`37.14% <84.64%> (+0.14%)`	⬆️
performance-tests	`51.91% <ø> (-0.01%)`	⬇️
test	`40.82% <92.54%> (+0.13%)`	⬆️
unit-tests	`73.97% <29.38%> (-0.07%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

oyvindeide · 2024-12-05T09:45:25Z

src/everest/detached/jobs/everserver.py

+        response = requests.post(
+            url + "/" + START_ENDPOINT, verify=cert, auth=auth, proxies=PROXY


Did you consider starting the experiment here:

ert/src/everest/bin/everest_script.py

Line 132 in 6c86e6a

print("Everest server found!")

, where we start the server initially? Then passing the config through here.

What is the difference between everserver.py and everest_script.py ?

Something like:

@app.post("/experiments/") async def submit_experiment(experiment: EverestConfig, ...):

Whats the advantage of doing this?

We force more separation by starting the experiment through an api instead of relying on files on disk. We would also like to extend this concept to ert, and having a clear separation helps then.

What is the difference between everserver.py and everest_script.py ?

everserver.py is the actual server, while everest_script the client which starts the server and queries for status

So the current client does:

submit_server_to_cluster(path_to_config_file) wait_for_server monitor

think in the future we would like:

connect_to_already_running_server requests.post(server_url, json=everest_config) monitor

Then we are one step closer to having the experiment server, and more clear separation between the frontend and the backend.

oyvindeide

Looks good, I was wondering a bit about the structure? Have asked some questions below.

oyvindeide · 2024-12-12T09:58:57Z

src/everest/detached/jobs/everserver.py

@@ -308,26 +171,57 @@ def main():
        return

    try:
+        # wait until the api server is running
+        is_running = False
+        while not is_running:


Might need some timeouts here? We should be doing something of the same for dark_storage i guess? Also starting a server there, would it be possible to alight this and that?

I think that is work for another PR if we want to start evereserver as we do the storage server.

But here we can throw an exception after a few retries

oyvindeide · 2024-12-12T10:04:54Z

src/everest/detached/jobs/everest_server_api.py

+class ExperimentRunner(threading.Thread):
+    def __init__(self, everest_config, state: dict):
+        super().__init__()
+
+        self.everest_config = everest_config
+        self.state = state
+        self.status: Optional[ExperimentRunnerStatus] = None
+
+    def run(self):
+        run_model = EverestRunModel.create(
+            self.everest_config,
+            simulation_callback=partial(_sim_monitor, shared_data=self.state),
+            optimization_callback=partial(_opt_monitor, shared_data=self.state),
+        )
+
+        evaluator_server_config = EvaluatorServerConfig(
+            custom_port_range=range(49152, 51819)
+            if run_model.ert_config.queue_config.queue_system == QueueSystem.LOCAL
+            else None
+        )
+
+        try:
+            run_model.run_experiment(evaluator_server_config)
+            self.status = ExperimentRunnerStatus(
+                status="Experiment finished", exit_code=run_model.exit_code
+            )
+        except Exception:
+            self.status = ExperimentRunnerStatus(
+                status="Experiment failed", message=traceback.format_exc()
+            )
+
+    def get_status(self) -> Optional[ExperimentRunnerStatus]:
+        return self.status


Should we consider using a background task for this instead? https://fastapi.tiangolo.com/tutorial/background-tasks/ Would be fairly similar, but can offload more to fastapi.

something like:

class ExperimentTask: def __init__(self, model: EverestRunModel ) -> None: self._model = model self.model_type = str(model.name()) self._events: List[StatusEvents] = [] def cancel(self) -> None: if self._model is not None: self._model.cancel() async def run(self): loop = asyncio.get_running_loop() logger.info(f"Starting experiment") port_range = None if self._model.queue_system == QueueSystem.LOCAL: port_range = range(49152, 51819) evaluator_server_config = EvaluatorServerConfig(custom_port_range=port_range) simulation_future = loop.run_in_executor( None, lambda: self._model.start_simulations_thread( evaluator_server_config ), )

then the endpoint can do:

@app.post("/experiments/", response_model=ExperimentOut) async def submit_experiment(experiment: EverestConfig, background_tasks: BackgroundTasks): try: run_model = EverestRunModel.create( self.everest_config, simulation_callback=partial(_sim_monitor, shared_data=self.state), optimization_callback=partial(_opt_monitor, shared_data=self.state), ) except ValueError as e: return HTTPException( status_code=420, detail="Experiment was not valid, failed with: {e}", ) task = ExperimentTask(model=run_model) experiments[experiment_id] = task background_tasks.add_task(task.run) return ExperimentOut(id=uuid_here, type=task.model_type)

then the next step after this PR is to extract the events from the run model.

oyvindeide · 2024-12-12T10:32:55Z

src/everest/detached/jobs/everest_server_api.py

+security = HTTPBasic()
+
+
+class EverestServerAPI(threading.Thread):


Looks like this code is mostly moved around? What is the advantage of using a class here instead of:

def everserver_thread(output_dir: str, optimization_output_dir: str): app = FastAPI() state = { SIM_PROGRESS_ENDPOINT: {}, STOP_ENDPOINT: False, } runner: Optional[ExperimentRunner] = None authentication = _generate_authentication() cert_path, key_path, key_pw = _generate_certificate( ServerConfig.get_certificate_dir(output_dir) ) host = ert_shared_get_machine_name() port = _find_open_port(host, lower=5000, upper=5800) _write_hostfile( ServerConfig.get_hostfile_path(output_dir), host, port, cert_path, authentication ) def check_user(credentials: HTTPBasicCredentials) -> None: if credentials.password != authentication: raise HTTPException( status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid credentials", headers={"WWW-Authenticate": "Basic"}, ) def log(request: Request) -> None: logging.getLogger("everserver").info( f"{request.scope['path']} entered from {request.client.host if request.client else 'unknown host'} with HTTP {request.method}" ) @app.get("/") def get_status(request: Request, credentials: HTTPBasicCredentials = Depends(security)): log(request) check_user(credentials) ... ... uvicorn.run( app, host="0.0.0.0", port=port, ssl_keyfile=key_path, ssl_certfile=cert_path, ssl_version=ssl.PROTOCOL_SSLv23, ssl_keyfile_password=key_pw, log_level=logging.CRITICAL, )

Personally I find it easier to read the endpoints when they are decorated, but there might be good reasons for making a class instead.

Just to make the everserver.py file smaller and the code more readable

I have no problem with moving it, but think the history between moving/refactoring and adding the new end-point, as it was a bit difficult to tell them apart. Personally I prefer the decorated functions instead of the class approach, as I find them easier to read.

… experiment

verveerpj · 2024-12-20T12:52:04Z

I see that there is some rebasing needed in any case. My PR will also cause some need for manually fixing when rebasing, in particular the way optimization exit code is handled. Since you are going to need to resolve things anyway, I will merge my PR, let me know if you need some help with the rebase.

frode-aarstad · 2024-12-20T13:00:31Z

I see that there is some rebasing needed in any case. My PR will also cause some need for manually fixing when rebasing, in particular the way optimization exit code is handled. Since you are going to need to resolve things anyway, I will merge my PR, let me know if you need some help with the rebase.

Sounds good

frode-aarstad marked this pull request as draft November 27, 2024 14:08

frode-aarstad force-pushed the refactor-everserver branch 2 times, most recently from 1f0ef43 to 080ce06 Compare November 28, 2024 12:00

frode-aarstad self-assigned this Nov 28, 2024

frode-aarstad added the release-notes:maintenance Automatically categorise as maintenance change in release notes label Nov 28, 2024

frode-aarstad marked this pull request as ready for review November 28, 2024 12:17

frode-aarstad marked this pull request as draft November 28, 2024 12:18

frode-aarstad marked this pull request as ready for review November 28, 2024 13:23

frode-aarstad force-pushed the refactor-everserver branch 2 times, most recently from eed62a1 to bc30caf Compare November 28, 2024 19:16

verveerpj reviewed Dec 2, 2024

View reviewed changes

frode-aarstad force-pushed the refactor-everserver branch from bc30caf to cc87699 Compare December 5, 2024 08:53

oyvindeide reviewed Dec 5, 2024

View reviewed changes

frode-aarstad force-pushed the refactor-everserver branch 4 times, most recently from 8690547 to 43eedff Compare December 10, 2024 09:54

oyvindeide reviewed Dec 12, 2024

View reviewed changes

frode-aarstad force-pushed the refactor-everserver branch from ce686fd to cfa67d4 Compare December 12, 2024 12:16

Split everserver functionality between starting server and submitting…

cfa871f

… experiment

frode-aarstad force-pushed the refactor-everserver branch from cfa67d4 to cfa871f Compare December 12, 2024 12:24

eivindjahren added the christmas-review label Dec 13, 2024

oyvindeide removed the christmas-review label Dec 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor everserver #9367

Refactor everserver #9367

frode-aarstad commented Nov 27, 2024 •

edited

Loading

verveerpj left a comment

verveerpj Dec 2, 2024

frode-aarstad Dec 5, 2024

verveerpj Dec 2, 2024

verveerpj Dec 2, 2024

frode-aarstad Dec 5, 2024

codecov-commenter commented Dec 5, 2024 •

edited

Loading

oyvindeide Dec 5, 2024

frode-aarstad Dec 5, 2024

oyvindeide Dec 5, 2024

frode-aarstad Dec 5, 2024

oyvindeide Dec 5, 2024

oyvindeide Dec 5, 2024 •

edited

Loading

oyvindeide Dec 6, 2024

oyvindeide left a comment

oyvindeide Dec 12, 2024

frode-aarstad Dec 12, 2024

frode-aarstad Dec 12, 2024

oyvindeide Dec 12, 2024

oyvindeide Dec 12, 2024

frode-aarstad Dec 12, 2024

oyvindeide Dec 18, 2024

verveerpj commented Dec 20, 2024

frode-aarstad commented Dec 20, 2024

		from everest.util import makedirs_if_needed


		def _get_machine_name() -> str:

		response = requests.post(
		url + "/" + START_ENDPOINT, verify=cert, auth=auth, proxies=PROXY

		security = HTTPBasic()


		class EverestServerAPI(threading.Thread):

Refactor everserver #9367

Are you sure you want to change the base?

Refactor everserver #9367

Conversation

frode-aarstad commented Nov 27, 2024 • edited Loading

When applicable

verveerpj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Dec 5, 2024 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oyvindeide Dec 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oyvindeide left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

verveerpj commented Dec 20, 2024

frode-aarstad commented Dec 20, 2024

frode-aarstad commented Nov 27, 2024 •

edited

Loading

codecov-commenter commented Dec 5, 2024 •

edited

Loading

oyvindeide Dec 5, 2024 •

edited

Loading