coll_tuned_dynamic_rules_filename option #8157
Yea, both these features are fully supported. Let me first talk about the second one, the configuration file passed through the coll_tuned_dynamic_rules_filename MCA parameter. The first approach, via the coll_tuned_<COLL NAME>_algorithm MCA parameter, selects a single algorithm for that collective regardless of message or communicator size.
@bosilca Thank you very much for the fast reply! Your examples are very useful. As far as I understand, collective operation IDs are not defined explicitly in coll_tuned.h, but they match the order in which the operations are declared in that file, e.g. Allgather has ID 0, Barrier has ID 5, etc. Also, in your example with the Alltoall operation: is this a mistake? The comment should look like
You're right, we moved the collective IDs into base/coll_base_functions.h. As for the comments, they might indeed not be very accurate; I was playing with the files and missed updating the comments. I'll fix them in my answer.
@miharulidze all algorithms must specify a rule for message size of zero (https://github.com/open-mpi/ompi/blob/master/ompi/mca/coll/tuned/coll_tuned_dynamic_file.c#L200). Otherwise, coll/tuned will switch to fixed rules.
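A minimal rules file illustrating this requirement might look like the following sketch. The layout (counts first, then one rule per line with message size, algorithm ID, faninout, and segment size) reflects my reading of coll_tuned_dynamic_file.c; the collective ID of 3 for Alltoall and the algorithm numbers are assumptions, so verify them against base/coll_base_functions.h and ompi_info for your version. Note that the first message-size rule starts at 0:

```
1           # number of collectives described in this file
3           # collective ID (assumed to be Alltoall here)
1           # number of comm-size rules
64          # applies to comm sizes >= 64
2           # number of message-size rules
0     1 0 0 # from message size 0: algorithm 1, faninout 0, segsize 0
32768 2 0 0 # from 32768 bytes onward: algorithm 2
```

Omitting the `0 ...` line is exactly the case described above: the file parses but coll/tuned falls back to the fixed rules.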
@bosilca, @mkurnosov Thank you for the support! Maybe it's a good idea to add some sort of generic template for such a rules file to the documentation?
@miharulidze @bosilca I suggest adding a list of the algorithms.
Based on prior experience, we are not really good at investing time in maintaining documentation. Instead of listing the algorithms themselves, I would add text explaining how a user can list all algorithms for each collective using ompi_info.
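For reference, this can be done from the command line today; ompi_info enumerates the coll/tuned MCA parameters, including the per-collective *_algorithm parameters and their valid values (exact output and flags may vary between Open MPI versions):

```
# List all coll/tuned MCA parameters, including the *_algorithm enums
$ ompi_info --param coll tuned --level 9

# Narrow the output to a single collective, e.g. Scatter
$ ompi_info --param coll tuned --level 9 | grep scatter_algorithm
```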
Not sure if it's just me or an actual bug in the file-processing part of the code. I'm trying to play with self-defined dynamic rules for Scatter, because the current fixed decision has the following logic.
I'm simply trying to raise the switch point from 512 B to 8192 B, so I have the following definition file.
However, when I run it, it looks to me like only the first character of the message size was parsed, i.e., 8192 -> 8. For example:
I did multiple tries and they all showed the same behavior. Am I missing something here?
Comm size is 64 in your file, so it shouldn't be using your tuning vars.
@wckzhang Well, apparently it is using the rules if you look at the osu_scatter output I attached; it just changed the switch point from 512 B to 8 B instead of the 8192 B I intended. Also, my understanding is that comm size works the same way as message size, i.e. it should be interpreted as "anything larger than comm size 64". I could be wrong, though, since I have not read the code. BTW, I forgot to mention that I also played with multiples of the comm size, with no effect on the above behavior.
Yeah, anything larger than comm size 64 would be using the message size. What comm size are you using? I've never seen the dynamic tuning detect the wrong number. The parsing code is fairly sensitive, and if you have slight formatting errors it will disregard the file completely. The function for parsing config files is ompi_coll_tuned_read_rules_config_file; you can fairly easily add some prints or turn on logging in this function to check whether the file is being read properly. My best guess is that the file is being parsed incorrectly. I don't know if the formatting is lost in e-mail, but the newline and '#' handling code (ompi_coll_base_file_getnext_long and skiptonewline) is a bit finicky. I really don't see how rc = fscanf(fptr, "%li", val); could read 8192 as 8.
William
My comm size is 1280 for the above testing.
This is exactly what I thought as well. If the file were dropped completely, I would know my format is wrong and could fix it. But if you take a look at the result I posted, it apparently worked for the two message-size regions I set; the second region just started at 8 B instead of 8192 B.
I played a bit more with this, and it looks like my earlier comment about only the first character of the message size being parsed was wrong. It appears to behave that way because my comm size is 1280, so that was just a coincidence. The actual config file that gives me what I need looks like the one below.
The second range, 10485760, is calculated from 8192 * 1280, with 8192 being the message size I'd like the rule to start from and 1280 being the comm size. Why? I don't know, because that still looks like a bug to me. My understanding is that this value should be comm-size agnostic.
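The file contents did not survive in this thread; based on the numbers quoted (a comm-size rule of 64 and a switch point of 10485760 = 8192 * 1280), a plausible reconstruction would look like the sketch below. The Scatter collective ID placeholder and the algorithm numbers are illustrative, not taken from the original file; look the real ID up in base/coll_base_functions.h for your build:

```
1               # number of collectives described in this file
<SCATTER ID>    # Scatter's collective ID (see base/coll_base_functions.h)
1               # number of comm-size rules
64              # applies to comm sizes >= 64
2               # number of message-size rules
0        2 0 0  # from size 0: algorithm 2, faninout 0, segsize 0
10485760 3 0 0  # from 8192 * 1280 bytes onward: algorithm 3
```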
Ah... you're hitting this issue. Message size has a vague definition, and at one point I brought up an issue about the discrepancies; let me see if I can find it.
See: #7672
I don't really like where we're at with the message sizes, but there's a table in that issue you can refer to for correct sizing.
There's also an issue in the collectives-tuning repo - open-mpi/ompi-collectives-tuning#24
Oh, the table is a little outdated now, since I revised scatter and gather to use datatype size * comm size * scount.
So this explains it. But I have to say that it is very counter-intuitive and inconsistent with the existing fixed decision rules. For example, take a look at the code snippet I posted for Scatter.
If I just want to do something similar with a dynamic rule file, but change 512 to 8192 so that for all comm sizes >= 64 algorithm 2 is used when total_dsize < 8192 and algorithm 3 for the rest, how do I achieve that? With the current restriction of the file format, do we have to list every comm size and calculate the message size for each one?
Yeah, I also think it's counter-intuitive and inconsistent: comm size is already taken into account in the tuning file, so why does it need to be taken into account again? Unfortunately, there isn't a way to do that with the dynamic code today. @bosilca has major interests in this area; should we re-discuss the message size issue?
Agreed.
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.0.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Distribution tarball from open-mpi.org.
./configure --prefix=$(pwd)/build --with-ucx=/path-to-ucx-installation/ --enable-orterun-prefix-by-default
Please describe the system on which you are running
Details of the problem
Dear Open MPI developers,
I'm trying to provide tuned selection of collective algorithms for the tuned component. It seems there are two ways:
1. --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_<COLL NAME>_algorithm <ALGORITHM ID>. This method works fine, and I notice a big difference between algorithms while running the OSU benchmarks. At the same time, this method does not allow me to do fine-grained tuning, like specifying a communicator size, message thresholds, etc.
2. --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_dynamic_rules_filename <PATH TO ALGORITHM RULES>. This paper (actually, the only example of a rules file I found via Google) shows an example of tuning Alltoall algorithm selection for different use cases. I have also done some experiments with Alltoall, but it showed no difference between several algorithms/thresholds at all.
Here are my questions:
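For concreteness, the two approaches above correspond to mpirun invocations along these lines (the benchmark binary, rank count, and rules-file path are placeholders; the MCA parameter names are the ones quoted above):

```
# First approach: force one algorithm for a collective globally
mpirun -np 64 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_alltoall_algorithm 2 \
    ./osu_alltoall

# Second approach: fine-grained rules read from a file
mpirun -np 64 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_dynamic_rules_filename ./rules.conf \
    ./osu_alltoall
```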
I'll be grateful for any help.
Thank you in advance!