Code refactor for mscclang #3

Merged on May 7, 2024 (76 commits)
Changes from 67 commits
Commits
58b7d9f
integration branch
Binyang2014 Mar 20, 2024
e039484
WIP
Binyang2014 Mar 20, 2024
3527e32
fix
Binyang2014 Mar 22, 2024
24c0863
WIP
Binyang2014 Mar 23, 2024
c32229d
WIP
Binyang2014 Mar 23, 2024
4929ef6
WIP need algo
Binyang2014 Mar 23, 2024
787645b
WIP
Binyang2014 Mar 24, 2024
0448b1d
WIP
Binyang2014 Mar 24, 2024
e23dc18
WIP need fuse
Binyang2014 Mar 24, 2024
2e08484
WIP
Binyang2014 Mar 24, 2024
e634e6f
WIP
Binyang2014 Mar 25, 2024
f8fe329
need more fuse
Binyang2014 Mar 25, 2024
b4c08c9
WIP
Binyang2014 Mar 25, 2024
e406632
WIP
Binyang2014 Mar 26, 2024
3321c5f
WIP
Binyang2014 Mar 26, 2024
7e4bd8b
WIP
Binyang2014 Mar 26, 2024
7074c01
WIP
Binyang2014 Mar 26, 2024
f31d9b4
Now for deps
Binyang2014 Mar 27, 2024
dc8d44e
let make instance work
Binyang2014 Mar 27, 2024
085be4a
enable instance
Binyang2014 Mar 28, 2024
e613558
fix
Binyang2014 Mar 28, 2024
79f450a
update ignore
Binyang2014 Mar 28, 2024
c4a10dd
bug fix
Binyang2014 Mar 29, 2024
ec4a112
update
Binyang2014 Apr 2, 2024
82de232
update
Binyang2014 Apr 2, 2024
171e894
fix
Binyang2014 Apr 5, 2024
99ff31c
WIP
Binyang2014 Apr 7, 2024
93683b5
WIP
Binyang2014 Apr 8, 2024
10b648c
WIP
Binyang2014 Apr 8, 2024
52fd030
update
Binyang2014 Apr 8, 2024
7dd76b6
WIP
Binyang2014 Apr 8, 2024
451f31d
WIP
Binyang2014 Apr 8, 2024
b2ceb13
WIP
Binyang2014 Apr 8, 2024
b683d7f
update
Binyang2014 Apr 8, 2024
3cf049e
WIP
Binyang2014 Apr 8, 2024
b1fd952
Done for today
Binyang2014 Apr 8, 2024
a4728fa
update packet algo
Binyang2014 Apr 19, 2024
42c4a7d
fix comments
Binyang2014 Apr 19, 2024
b494b75
OPT
Binyang2014 Apr 19, 2024
c2bd38f
Fix
Binyang2014 Apr 22, 2024
bb3aebe
WIP
Binyang2014 Apr 24, 2024
40217f9
WIP
Binyang2014 Apr 24, 2024
2e5bac6
WIP
Binyang2014 Apr 24, 2024
eb4f612
fix
Binyang2014 Apr 24, 2024
2617280
fix
Binyang2014 Apr 24, 2024
66d0495
fix
Binyang2014 Apr 24, 2024
8cfcc30
Fix UT
Binyang2014 Apr 25, 2024
2ba1760
update
Binyang2014 Apr 25, 2024
52783e9
WIP
Binyang2014 Apr 25, 2024
01a0745
WIP
Binyang2014 Apr 25, 2024
13e902a
update
Binyang2014 Apr 25, 2024
e52cabf
revert
Binyang2014 Apr 29, 2024
8f46276
revert
Binyang2014 Apr 29, 2024
775d9d6
fix
Binyang2014 Apr 29, 2024
8a58c84
WIP
Binyang2014 Apr 29, 2024
da282fb
WIP
Binyang2014 Apr 29, 2024
1f5114b
WIP
Binyang2014 Apr 29, 2024
5e978bf
WIP
Binyang2014 Apr 29, 2024
a90647e
WIP
Binyang2014 Apr 29, 2024
0e77d88
fix
Binyang2014 Apr 29, 2024
e05edcf
WIP
Binyang2014 Apr 29, 2024
54463bf
WIP
Binyang2014 Apr 29, 2024
ca3f10a
WIP
Binyang2014 Apr 29, 2024
b9c9734
add back
Binyang2014 Apr 29, 2024
a3b9745
update
Binyang2014 Apr 29, 2024
d81936f
Fix
Binyang2014 Apr 30, 2024
2651faf
revert
Binyang2014 Apr 30, 2024
bb7a584
address comments
Binyang2014 May 6, 2024
d072a34
WIP
Binyang2014 May 6, 2024
dda74f9
Fix
Binyang2014 May 6, 2024
f47dfe9
Fix
Binyang2014 May 6, 2024
daef76a
WIP
Binyang2014 May 6, 2024
3d2d838
WIP
Binyang2014 May 6, 2024
5a34cd5
WIP
Binyang2014 May 6, 2024
02b5de9
WIP
Binyang2014 May 6, 2024
06d7776
done
Binyang2014 May 6, 2024
6 changes: 3 additions & 3 deletions .github/workflows/codeql.yml
@@ -19,12 +19,12 @@ jobs:

     steps:
     - name: Checkout repository
-      uses: actions/checkout@v2
+      uses: actions/checkout@v4

     - name: Initialize CodeQL
-      uses: github/codeql-action/init@v1
+      uses: github/codeql-action/init@v3
       with:
         languages: python

     - name: Perform CodeQL Analysis
-      uses: github/codeql-action/analyze@v1
+      uses: github/codeql-action/analyze@v3
47 changes: 44 additions & 3 deletions .github/workflows/tests.yaml
@@ -11,14 +11,14 @@ jobs:

     strategy:
       matrix:
-        python-version: [3.6, 3.7, 3.8, 3.9]
+        python-version: ['3.8', '3.9', '3.10']

     name: Test with Python ${{ matrix.python-version }}

     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install msccl and dependencies
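The new matrix quotes each version. The PR does not say why, but a likely reason (an assumption) is that an unquoted `3.10` is read as a number and collapses to `3.1`. The sketch below imitates that with Python's own float parsing rather than a YAML loader; the function names are made up for illustration.

```python
# Illustration only: float() stands in for how an unquoted numeric scalar
# such as 3.10 is interpreted. No YAML library is involved here.
def as_unquoted_scalar(text: str) -> str:
    """Round-trip a version through float, as an unquoted number would be."""
    return str(float(text))


def as_quoted_scalar(text: str) -> str:
    """A quoted scalar stays a string, so the trailing zero survives."""
    return text


for version in ["3.8", "3.9", "3.10"]:
    print(version, "unquoted:", as_unquoted_scalar(version),
          "quoted:", as_quoted_scalar(version))
```

Only the `3.10` entry changes under the float round-trip, which matches the point at which the matrix started quoting its values.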
@@ -28,3 +28,44 @@ jobs:
     - name: Run tests and check at least 90% coverage
       run: |
         pytest
+
+  compare_outputs:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.8', '3.9', '3.10']
+    name: Compare outputs with Python ${{ matrix.python-version }}
+
+    steps:
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Checkout current branch
+      uses: actions/checkout@v4
+    - name: Install msccl and dependencies
+      run: |
+        pip install --upgrade pip
+        pip install -r requirements.txt
+    - name: Copy test script/config to temp directory
+      run: |
+        cp tests/generate_example_results.py $RUNNER_TEMP/
+        cp tests/configs/example-config.json $RUNNER_TEMP/
+    - name: generate outputs
+      run: |
+        python $RUNNER_TEMP/generate_example_results.py examples/mscclang/ $RUNNER_TEMP/example-config.json $RUNNER_TEMP/tests/pr-outputs/
+    - name: Checkout main branch
+      uses: actions/checkout@v4
+      with:
+        ref: main
+    - name: Install msccl and dependencies
+      run: |
+        pip install --upgrade pip
+        pip install -r requirements.txt
+    - name: generate outputs
+      run: |
+        python $RUNNER_TEMP/generate_example_results.py examples/mscclang/ $RUNNER_TEMP/example-config.json $RUNNER_TEMP/tests/main-outputs/
+    - name: Compare outputs
+      run: |
+        diff -rw $RUNNER_TEMP/tests/main-outputs/ $RUNNER_TEMP/tests/pr-outputs/
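The `compare_outputs` job generates the example outputs once from the PR branch and once from `main`, then fails if `diff -rw` finds any difference. For running a similar check locally without `diff`, a rough Python approximation might look like the sketch below. `compare_trees` and `normalized` are hypothetical helpers, not part of msccl, and the whitespace handling only approximates `diff -w`.

```python
from pathlib import Path


def normalized(path: Path) -> list:
    """Return the file's lines with all whitespace stripped out (roughly diff -w)."""
    return ["".join(line.split()) for line in path.read_text().splitlines()]


def compare_trees(left: Path, right: Path) -> list:
    """Return sorted relative paths that exist on one side only or differ in content."""
    left_files = {p.relative_to(left) for p in left.rglob("*") if p.is_file()}
    right_files = {p.relative_to(right) for p in right.rglob("*") if p.is_file()}

    # Files present in only one tree (diff -r reports these as "Only in ...")
    problems = [str(p) for p in left_files ^ right_files]

    # Files present in both trees whose whitespace-insensitive content differs
    for rel in left_files & right_files:
        if normalized(left / rel) != normalized(right / rel):
            problems.append(str(rel))
    return sorted(problems)
```

An empty list corresponds to `diff -rw` exiting with status 0, for example `compare_trees(Path("main-outputs"), Path("pr-outputs")) == []`.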

3 changes: 3 additions & 0 deletions .gitignore
@@ -131,3 +131,6 @@ dmypy.json

 # Pyre type checker
 .pyre/
+
+# vscode
+.vscode/
4 changes: 2 additions & 2 deletions msccl/autosynth/ndv4_plans.py
@@ -7,7 +7,7 @@
 from msccl.programs.alltoall_a100_yifan import alltoall_hierarchical
 from msccl.programs.alltoall_a100_8kp1 import alltoall_three_step
 from msccl.topologies import fully_connected
-from msccl.language.ir import ThreadblockPolicy
+from msccl.language.types import ThreadblockPolicy

 def register_ndv4_plans():

@@ -47,4 +47,4 @@ def ndv4_alltoall_three_step(prog, nodes):
 def ndv4_alltoall_hierarchical_config2(prog, nodes):
     alltoall_hierarchical(num_nodes=nodes, gpus_per_node=8)
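This PR moves `ThreadblockPolicy` from `msccl.language.ir` to `msccl.language.types`. Code outside the repository that has to run against both layouts could try the new location first and fall back to the old one. The `import_first` helper below is hypothetical, not part of msccl, and the demo uses standard-library modules so that it runs without msccl installed.

```python
import importlib


def import_first(attr: str, module_names: list):
    """Return `attr` from the first module in `module_names` that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not present in this version, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of {module_names}")


# Stand-in demo: the first module does not exist, the second one does.
OrderedDict = import_first("OrderedDict", ["no_such_module_xyz", "collections"])

# With msccl installed, the equivalent call would be (assumption, untested here):
# ThreadblockPolicy = import_first(
#     "ThreadblockPolicy", ["msccl.language.types", "msccl.language.ir"])
```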


8 changes: 4 additions & 4 deletions msccl/autosynth/registry.py
@@ -9,7 +9,7 @@
 import humanfriendly

 from msccl.language import MSCCLProgram, ir_to_xml
-from msccl.language.ir import ThreadblockPolicy
+from msccl.language.types import ThreadblockPolicy
 import msccl.language.collectives as lang_collectives
 from msccl.topologies import distributed_fully_connected

@@ -62,7 +62,7 @@ def wrapped(machines):
     return decorator


-def register_msccl_program(local_topology, collective, machine_type, machines=lambda x: True, sizes=None, protocol='Simple',
+def register_msccl_program(local_topology, collective, machine_type, machines=lambda x: True, sizes=None, protocol='Simple',
     chunk_factor=1, priority=0, collective_obj=None, instances=1, inplace=False, threadblock_policy=ThreadblockPolicy.auto,
     interleaved_replication=True, dependence_nop=False):
     def decorator(fun):
@@ -81,7 +81,7 @@ def wrapped(machines):
                 co = lang_collectives.ReduceScatter(topology.num_nodes(), chunk_factor, inplace)
             else:
                 raise RuntimeError(f'No collective_obj in msccl.language.collectives known for "{collective}"')
-            prog = MSCCLProgram(name, topology, co, instances, protocol, threadblock_policy=threadblock_policy,
+            prog = MSCCLProgram(name, topology, co, instances, protocol, threadblock_policy=threadblock_policy,
                 interleaved_replication=interleaved_replication, dependence_nop=dependence_nop)
             with prog:
                 fun(prog, machines)
@@ -96,4 +96,4 @@ def wrapped(machines):
             machine_type, machines, sizes, protocol, priority)
         # Return the original function to not break other usage
         return fun
-    return decorator
+    return decorator
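The tail of `register_msccl_program` follows a common pattern: the decorator registers a wrapped builder, then returns the original function so that direct callers are unaffected. A minimal sketch of that pattern is below. `REGISTRY`, `register_plan`, and `build_alltoall` are hypothetical names; the real function builds an `MSCCLProgram` and hands off to msccl's own registration code, most of which is outside these hunks.

```python
import functools

REGISTRY = []  # hypothetical store; msccl keeps its own registry


def register_plan(collective, priority=0):
    """Register a wrapped builder under `collective` and return the function as-is."""
    def decorator(fun):
        @functools.wraps(fun)
        def wrapped(machines):
            # The real code sets up an MSCCLProgram here; this sketch just calls fun.
            return fun(machines)

        REGISTRY.append((collective, priority, wrapped))
        # Return the original function to not break other usage
        return fun
    return decorator


@register_plan("alltoall", priority=1)
def build_alltoall(machines):
    return f"alltoall plan for {machines} machines"
```

After decoration, `build_alltoall` is still the plain function, while `REGISTRY` holds the wrapped version for lookup by collective name.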