Status of running coffea 202x on lxplus #1025

ikrommyd · 2024-02-02T19:35:01Z

ikrommyd
Feb 2, 2024
Maintainer

I'm starting this discussion to ask about the status of running analyses on lxplus with coffea 202x.
One very common thing people within CMS want is for their code to run on lxplus. Not everyone can get an LPC account, and I think a lot of people are used to simply sshing into lxplus for work.
I've tried using https://github.com/cernops/dask-lxplus with coffea 202x in the past, but the workers just wouldn't spawn. I could obviously be doing something wrong, so my question is: are we able to run coffea 202x on lxplus?
If not, I believe we should be able to.

@lgray @nsmith- @JaLuka98 @valsdav mentioning a few people who might know the answer.

ikrommyd · 2024-02-20T18:41:24Z

ikrommyd
Feb 20, 2024
Maintainer Author

I was able to run the script below on lxplus using a pocketcoffea apptainer image that @valsdav created with coffea 2023 and dask-lxplus installed.
One worker spawned to compute the muon pt of the two partitions.
I will soon attempt a more complicated example than just grabbing the pt.

import os
import socket

import dask
from distributed import Client
from dask_lxplus import CernCluster
from coffea.nanoevents import NanoEventsFactory

# Input files, each mapped to the name of the tree to read.
files = {
    "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root": "Events",
    "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012C_DoubleMuParked.root": "Events",
}

if __name__ == "__main__":
    # Shell lines exported in each worker job before the dask worker starts.
    # The proxy path is specific to my account; adjust it to your own.
    env_extra = [
        "export XRD_RUNFORKHANDLER=1",
        "export X509_USER_PROXY=/tmp/x509up_u154691",
        f"export PYTHONPATH=$PYTHONPATH:{os.getcwd()}",
    ]

    cluster = CernCluster(
        cores=1,
        memory="4GB",
        disk="1GB",
        image_type="singularity",
        worker_image="/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-analysis/general/pocketcoffea:lxplus-cc7-coffea2023",
        death_timeout="3600",
        # The scheduler runs on the lxplus node this script is started from.
        scheduler_options={"port": 8786, "host": socket.gethostname()},
        # Extra HTCondor submit-file entries for the worker jobs.
        job_extra={
            "log": "dask_job_output.log",
            "output": "dask_job_output.out",
            "error": "dask_job_output.err",
            "should_transfer_files": "Yes",
            "when_to_transfer_output": "ON_EXIT",
            "+JobFlavour": '"longlunch"',
        },
        env_extra=env_extra,
    )
    # Let dask scale between 0 and 10 workers depending on the load.
    cluster.adapt(minimum=0, maximum=10)
    client = Client(cluster)
    # Build the events array lazily and compute the muon pt on the cluster.
    events = NanoEventsFactory.from_root(files).events()
    out = events.Muon.pt
    print(dask.compute(out))
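
The proxy path in the script above is tied to one account. A minimal sketch of how the same three lines could be built for any user, assuming the proxy sits at the default `voms-proxy-init` location `/tmp/x509up_u<uid>` and that an already-set `X509_USER_PROXY` should take precedence. `build_env_extra` is a hypothetical helper name, not part of dask-lxplus, and whether the workers can actually read that path depends on the site setup:

```python
import os


def build_env_extra(cwd=None, uid=None, environ=None):
    """Return the shell lines for the worker jobs (hypothetical helper)."""
    cwd = os.getcwd() if cwd is None else cwd
    uid = os.getuid() if uid is None else uid
    environ = os.environ if environ is None else environ
    # Respect an explicit X509_USER_PROXY; otherwise fall back to the
    # default voms-proxy-init location (assumption: the proxy is in /tmp).
    proxy = environ.get("X509_USER_PROXY", f"/tmp/x509up_u{uid}")
    return [
        "export XRD_RUNFORKHANDLER=1",
        f"export X509_USER_PROXY={proxy}",
        f"export PYTHONPATH=$PYTHONPATH:{cwd}",
    ]


if __name__ == "__main__":
    for line in build_env_extra():
        print(line)
```

The returned list could then be passed as `env_extra=build_env_extra()` in place of the literal list.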


ikrommyd · 2024-02-23T11:24:37Z

ikrommyd
Feb 23, 2024
Maintainer Author

I'm able to run more complicated workflows on lxplus with the dask-lxplus plugin, using @valsdav's apptainer image with coffea 2023 and dask-lxplus installed. It looks like a good idea to build coffea images with dask-lxplus installed.
However, most of the time the workers take a while to spawn (it can be around an hour) and the condor jobs sit idle for that time, so some communication with the lxplus admins about giving dask workers higher priority may be required.

[ikrommyd@lxplus990 egamma-tnp]$ condor_q


-- Schedd: bigbird18.cern.ch : <188.185.71.83:9618?... @ 02/23/24 12:18:01
OWNER    BATCH_NAME        SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
ikrommyd CernCluster-28   2/23 12:00      _      1      _      1 5615362.0
ikrommyd CernCluster-0    2/23 12:00      _      1      _      1 5615363.0
ikrommyd CernCluster-19   2/23 12:00      _      1      _      1 5615364.0
ikrommyd CernCluster-26   2/23 12:00      _      1      _      1 5615365.0
ikrommyd CernCluster-29   2/23 12:00      _      1      _      1 5615366.0
ikrommyd CernCluster-23   2/23 12:00      _      1      _      1 5615367.0
ikrommyd CernCluster-45   2/23 12:00      _      1      _      1 5615368.0
ikrommyd CernCluster-27   2/23 12:00      _      1      _      1 5615369.0
ikrommyd CernCluster-34   2/23 12:00      _      1      _      1 5615370.0
ikrommyd CernCluster-44   2/23 12:00      _      1      _      1 5615371.0
ikrommyd CernCluster-37   2/23 12:00      _      1      _      1 5615372.0
ikrommyd CernCluster-5    2/23 12:00      _      1      _      1 5615373.0
ikrommyd CernCluster-30   2/23 12:00      _      1      _      1 5615374.0
ikrommyd CernCluster-47   2/23 12:00      _      1      _      1 5615375.0
ikrommyd CernCluster-20   2/23 12:00      _      1      _      1 5615376.0
ikrommyd CernCluster-11   2/23 12:00      _      1      _      1 5615377.0
ikrommyd CernCluster-31   2/23 12:00      _      1      _      1 5615378.0
ikrommyd CernCluster-16   2/23 12:00      _      1      _      1 5615379.0
ikrommyd CernCluster-13   2/23 12:00      _      1      _      1 5615380.0
ikrommyd CernCluster-40   2/23 12:00      _      1      _      1 5615381.0
ikrommyd CernCluster-43   2/23 12:00      _      1      _      1 5615382.0
ikrommyd CernCluster-21   2/23 12:00      _      1      _      1 5615383.0
ikrommyd CernCluster-8    2/23 12:01      _      1      _      1 5615384.0
ikrommyd CernCluster-6    2/23 12:01      _      1      _      1 5615385.0
ikrommyd CernCluster-2    2/23 12:01      _      1      _      1 5615386.0
ikrommyd CernCluster-10   2/23 12:01      _      1      _      1 5615387.0
ikrommyd CernCluster-32   2/23 12:01      _      1      _      1 5615388.0
ikrommyd CernCluster-14   2/23 12:01      _      1      _      1 5615389.0
ikrommyd CernCluster-48   2/23 12:01      _      1      _      1 5615390.0
ikrommyd CernCluster-35   2/23 12:01      _      1      _      1 5615391.0
ikrommyd CernCluster-9    2/23 12:01      _      1      _      1 5615392.0
ikrommyd CernCluster-42   2/23 12:01      _      1      _      1 5615393.0
ikrommyd CernCluster-46   2/23 12:01      _      1      _      1 5615394.0
ikrommyd CernCluster-17   2/23 12:01      _      1      _      1 5615395.0
ikrommyd CernCluster-24   2/23 12:01      _      1      _      1 5615396.0
ikrommyd CernCluster-7    2/23 12:01      _      1      _      1 5615397.0
ikrommyd CernCluster-4    2/23 12:01      _      1      _      1 5615398.0
ikrommyd CernCluster-12   2/23 12:01      _      1      _      1 5615399.0
ikrommyd CernCluster-18   2/23 12:01      _      1      _      1 5615400.0
ikrommyd CernCluster-39   2/23 12:01      _      1      _      1 5615401.0
ikrommyd CernCluster-15   2/23 12:01      _      1      _      1 5615402.0
ikrommyd CernCluster-1    2/23 12:01      _      1      _      1 5615403.0
ikrommyd CernCluster-25   2/23 12:01      _      1      _      1 5615404.0
ikrommyd CernCluster-36   2/23 12:01      _      1      _      1 5615405.0
ikrommyd CernCluster-38   2/23 12:01      _      1      _      1 5615406.0
ikrommyd CernCluster-33   2/23 12:01      _      1      _      1 5615407.0
ikrommyd CernCluster-41   2/23 12:01      _      1      _      1 5615408.0
ikrommyd CernCluster-3    2/23 12:01      _      1      _      1 5615409.0
ikrommyd CernCluster-22   2/23 12:01      _      1      _      1 5615410.0
ikrommyd CernCluster-49   2/23 12:16      _      _      1      1 5615415.0
ikrommyd CernCluster-55   2/23 12:16      _      _      1      1 5615416.0
ikrommyd CernCluster-54   2/23 12:16      _      _      1      1 5615417.0
ikrommyd CernCluster-56   2/23 12:16      _      _      1      1 5615418.0
ikrommyd CernCluster-52   2/23 12:16      _      _      1      1 5615419.0
ikrommyd CernCluster-51   2/23 12:16      _      _      1      1 5615420.0
ikrommyd CernCluster-53   2/23 12:16      _      _      1      1 5615422.0
ikrommyd CernCluster-50   2/23 12:16      _      _      1      1 5615423.0
ikrommyd CernCluster-58   2/23 12:16      _      _      1      1 5615424.0
ikrommyd CernCluster-57   2/23 12:16      _      _      1      1 5615425.0
ikrommyd CernCluster-65   2/23 12:16      _      _      1      1 5615426.0
ikrommyd CernCluster-77   2/23 12:16      _      _      1      1 5615427.0
ikrommyd CernCluster-72   2/23 12:16      _      _      1      1 5615428.0
ikrommyd CernCluster-69   2/23 12:16      _      _      1      1 5615429.0
ikrommyd CernCluster-82   2/23 12:16      _      _      1      1 5615430.0
ikrommyd CernCluster-64   2/23 12:16      _      _      1      1 5615431.0
ikrommyd CernCluster-66   2/23 12:16      _      _      1      1 5615432.0
ikrommyd CernCluster-68   2/23 12:16      _      _      1      1 5615433.0
ikrommyd CernCluster-61   2/23 12:16      _      _      1      1 5615434.0
ikrommyd CernCluster-84   2/23 12:16      _      _      1      1 5615435.0
ikrommyd CernCluster-80   2/23 12:16      _      _      1      1 5615436.0
ikrommyd CernCluster-60   2/23 12:16      _      _      1      1 5615437.0
ikrommyd CernCluster-62   2/23 12:16      _      _      1      1 5615438.0
ikrommyd CernCluster-81   2/23 12:16      _      _      1      1 5615439.0
ikrommyd CernCluster-83   2/23 12:16      _      _      1      1 5615440.0
ikrommyd CernCluster-70   2/23 12:16      _      _      1      1 5615441.0
ikrommyd CernCluster-73   2/23 12:16      _      _      1      1 5615442.0
ikrommyd CernCluster-67   2/23 12:16      _      _      1      1 5615443.0
ikrommyd CernCluster-63   2/23 12:16      _      _      1      1 5615444.0
ikrommyd CernCluster-85   2/23 12:16      _      _      1      1 5615445.0
ikrommyd CernCluster-76   2/23 12:16      _      _      1      1 5615446.0
ikrommyd CernCluster-71   2/23 12:16      _      _      1      1 5615447.0
ikrommyd CernCluster-59   2/23 12:16      _      _      1      1 5615448.0
ikrommyd CernCluster-86   2/23 12:16      _      _      1      1 5615449.0
ikrommyd CernCluster-74   2/23 12:16      _      _      1      1 5615450.0
ikrommyd CernCluster-78   2/23 12:16      _      _      1      1 5615451.0
ikrommyd CernCluster-79   2/23 12:16      _      _      1      1 5615452.0
ikrommyd CernCluster-75   2/23 12:16      _      _      1      1 5615453.0
ikrommyd CernCluster-94   2/23 12:16      _      _      1      1 5615454.0
ikrommyd CernCluster-90   2/23 12:16      _      _      1      1 5615455.0
ikrommyd CernCluster-98   2/23 12:16      _      _      1      1 5615456.0
ikrommyd CernCluster-87   2/23 12:16      _      _      1      1 5615457.0
ikrommyd CernCluster-88   2/23 12:16      _      _      1      1 5615458.0
ikrommyd CernCluster-96   2/23 12:16      _      _      1      1 5615460.0
ikrommyd CernCluster-95   2/23 12:16      _      _      1      1 5615461.0
ikrommyd CernCluster-91   2/23 12:16      _      _      1      1 5615462.0
ikrommyd CernCluster-92   2/23 12:16      _      _      1      1 5615463.0
ikrommyd CernCluster-89   2/23 12:16      _      _      1      1 5615464.0
ikrommyd CernCluster-93   2/23 12:16      _      _      1      1 5615465.0
ikrommyd CernCluster-97   2/23 12:17      _      _      1      1 5615466.0
ikrommyd CernCluster-99   2/23 12:17      _      _      1      1 5615467.0

Total for query: 100 jobs; 0 completed, 0 removed, 51 idle, 49 running, 0 held, 0 suspended
Total for ikrommyd: 100 jobs; 0 completed, 0 removed, 51 idle, 49 running, 0 held, 0 suspended
Total for all users: 8363 jobs; 994 completed, 0 removed, 3796 idle, 3564 running, 9 held, 0 suspended
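
To keep an eye on how many worker jobs are still idle without scrolling through the full listing, the summary line can be parsed. This is a minimal sketch that assumes the `Total for query: ...` line has the format shown in the output above; `parse_condor_summary` is a hypothetical helper name:

```python
import re

SUMMARY_RE = re.compile(r"^Total for query: (\d+) jobs; (.*)$")


def parse_condor_summary(text):
    """Extract job counts from the 'Total for query' line of condor_q output."""
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match is None:
            continue
        counts = {"total": int(match.group(1))}
        # The remainder looks like "0 completed, 0 removed, 51 idle, ..."
        for item in match.group(2).split(","):
            number, state = item.split()
            counts[state] = int(number)
        return counts
    raise ValueError("no 'Total for query' line found")


if __name__ == "__main__":
    sample = (
        "Total for query: 100 jobs; 0 completed, 0 removed, "
        "51 idle, 49 running, 0 held, 0 suspended"
    )
    counts = parse_condor_summary(sample)
    print(counts["running"], "running,", counts["idle"], "idle")
```

The text to parse could come from running `condor_q` through `subprocess.run(..., capture_output=True, text=True)`.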


valsdav · 2024-02-23T11:32:10Z

valsdav
Feb 23, 2024

Great that the image improves the workflow! As for workers being slow to start, I guess that is just the HTCondor queue; not much to be done.

Asking for a shorter queue usually gets you workers faster (for example, avoid the "nextweek" queue).


lgray · 2024-02-23T15:54:12Z

lgray
Feb 23, 2024
Maintainer

There's nothing stopping you from using the shorter queues (so long as they are not too short). If you lose critical data, dask will (try to) recalculate it.

We should figure out if the cluster admins would be ok with a more rapid-acquisition, short-lived queue. Even "longlunch" takes some time to spin up workers.
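
One way to follow the "shortest queue that is not too short" advice is to pick the job flavour from the expected worker lifetime. This is a rough sketch, assuming the wall-time limits listed in the CERN batch documentation (espresso 20 minutes up to nextweek 1 week); `shortest_flavour` and `job_extra_for` are hypothetical helper names:

```python
# Maximum wall time per CERN HTCondor job flavour, in seconds
# (assumption: values as listed in the CERN batch documentation).
JOB_FLAVOURS = [
    ("espresso", 20 * 60),
    ("microcentury", 60 * 60),
    ("longlunch", 2 * 60 * 60),
    ("workday", 8 * 60 * 60),
    ("tomorrow", 24 * 60 * 60),
    ("testmatch", 3 * 24 * 60 * 60),
    ("nextweek", 7 * 24 * 60 * 60),
]


def shortest_flavour(needed_seconds):
    """Return the shortest flavour whose limit covers needed_seconds."""
    for name, limit in JOB_FLAVOURS:
        if needed_seconds <= limit:
            return name
    raise ValueError(f"no job flavour allows {needed_seconds} seconds")


def job_extra_for(needed_seconds):
    """Build the '+JobFlavour' submit entry; the value must be quoted."""
    return {"+JobFlavour": f'"{shortest_flavour(needed_seconds)}"'}


if __name__ == "__main__":
    # Workers expected to live about 90 minutes.
    print(job_extra_for(90 * 60))
```

The resulting dictionary can be merged into the `job_extra` argument shown earlier in the thread.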


ikrommyd · 2024-02-23T15:59:16Z

ikrommyd
Feb 23, 2024
Maintainer Author

What about docker/apptainer images, by the way? Should the default coffea image have dask-lxplus installed, or should this be a separate image?


oshadura · 2024-02-23T16:02:43Z

oshadura
Feb 23, 2024

We are moving images to https://github.com/CoffeaTeam/af-images (for both old and new coffea), and we can add a separate image for dask-lxplus, just as will be done for other facilities.

