Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Windows support #1513

Open
zcbenz opened this issue Oct 22, 2024 · 12 comments
Open

[Feature] Windows support #1513

zcbenz opened this issue Oct 22, 2024 · 12 comments

Comments

@zcbenz
Copy link
Contributor

zcbenz commented Oct 22, 2024

I tried building MLX on Windows, of course it didn't build but I think it is not very hard to make it work there.

A few problems I met so far:

  • CMake can not find OpenBLAS automatically on Windows, I had to hardcode the path:
    -DBLAS_LIBRARIES=openblas64/lib/libopenblas.a -DLAPACK_LIBRARIES=openblas64/lib/libopenblas.a -DBLAS_INCLUDE_DIRS=openblas64/include -DLAPACK_INCLUDE_DIRS=openblas64/include
  • MSVC has no _Complex but lapack_complex_float is defined to it.
  • gguf-tools does not support Windows but we can turn off it with -DMLX_BUILD_GGUF=OFF.
  • io/load.h uses unix system calls for IO, I can make it work on Windows by replacing them with stdio.h equivalents.
  • The make_compiled_preamble.sh needs a Windows version.
  • mx.compile is going to need many works to find and call MSVC.

And there will likely be much more problems since compilations are mostly blocked by io/load.h.

I don't really like Windows nor enjoy working on it, but the majority of world is using it and I figured out whatever I'm going to work on I would need Windows support to reach wider audience. What do you think?

@zcbenz
Copy link
Contributor Author

zcbenz commented Oct 25, 2024

I found someone already did some work on this: https://github.com/lighttransport/mlx/commits/lte/

@awni
Copy link
Member

awni commented Dec 9, 2024

Thanks for all the PRs to get this working. I wonder what you think about a good long-term strategy for supporting Windows.

We don't have plans to test or optimize regularly for Windows machines. So that might make maintenance a little tricky -> compilation could inadvertently break, performance might be slow, etc.

Luckily so far many of the PRs to get windows support have been quite simple (and in some cases improve the quality of the code) so it's not been a difficult decision to merge them. But I'm also wondering about what the right strategy is for PRs which introduce more complexity?e

I don't have answers to the above questions.. but maybe a good thing to discuss a bit more while you are in the process of experimenting on Windows.

@zcbenz
Copy link
Contributor Author

zcbenz commented Dec 10, 2024

I'll add Windows build to node-mlx, so at least the C++ interface will be tested routinely, which is how the x64 mac build is being tested.

It is fine if an unsupported build breaks in master and relies on contributors for fixes, most large open source projects I worked with work this way. The key is to leave good comments for the untested code path so maintainers know whether to leave them be or just remove them.

For the Windows build of MLX, most changes are already there, with current PRs you can already build a working python extension. Some tests are crashing and need fixes, but I think most of them will not be Windows-specific. The only large change to come is support for mx.compile, which I may not do this time. So you can already evaluate the maintenance burden for Windows support, which I think is quite light.

@awni
Copy link
Member

awni commented Dec 10, 2024

That all sounds like a good plan. Thanks for your input.

with current PRs you can already build a working python extension

Very cool!

Some tests are crashing and need fixes, but I think most of them will not be Windows-specific.

Indeed any underlying bugs exposed in the new compiler/windows we will definitely want to fix.

@zcbenz
Copy link
Contributor Author

zcbenz commented Dec 14, 2024

With above PRs the C++ tests are passing on Windows, and python tests only have 2 failures left:

======================================================================
FAIL: test_buffer_protocol (test_array.TestArray.test_buffer_protocol)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cygwin\home\zcbenz\codes\mlx\python\tests\test_array.py", line 1566, in
test_buffer_protocol
    self.assertEqual(
AssertionError: 'I' != 'L'
- I
+ L
 : mlx.core.uint32<class 'numpy.uint32'>

======================================================================
FAIL: test_multivariate_normal (test_random.TestRandom.test_multivariate_normal)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cygwin\home\zcbenz\codes\mlx\python\tests\test_random.py", line 200, in
test_multivariate_normal
    check_jointly_gaussian(data, mean, cov)
  File "C:\cygwin\home\zcbenz\codes\mlx\python\tests\test_random.py", line 177, in
check_jointly_gaussian
    self.assertTrue(
AssertionError: array(False, dtype=bool) is not true

@stemann
Copy link
Contributor

stemann commented Dec 14, 2024

Please take care not to break MinGW in the process of focusing on MSVC.

@stemann
Copy link
Contributor

stemann commented Dec 15, 2024

@zcbenz If you manage to solve an issue (or stumble on the reason) that caused MLX on Windows hanging on something utterly simple (Stream construction followed by destruction), it would be utterly appreciated :-)
MLX.jl/test/stream_tests.jl#L4-L8 @ 0771ac
Hang in run: https://github.com/stemann/MLX.jl/actions/runs/12342585758/job/34442589815#step:6:100
Hang in other run(where I let it run much longer): https://github.com/stemann/MLX.jl/actions/runs/12342297430/job/34441970336

@zcbenz
Copy link
Contributor Author

zcbenz commented Dec 16, 2024

@stemann Unfortunately I'm not familiar with neither MinGW nor Julia so I can't help directly. Does the C++ tests of MLX pass with MinGW? If not you should probably focus on making the C++ tests pass first, and then try to create a C++ tests case that reproduces the hang, which I can look into.

@stemann
Copy link
Contributor

stemann commented Dec 16, 2024

Sure - thanks for the quick reply. I’m going to park Windows as broken for the Julia API for the time being.

It was also on v0.21.0, so maybe v0.21.1+ will magically make it not hang 🤞😊

@stemann
Copy link
Contributor

stemann commented Dec 16, 2024

Validating build on mingw32, and the full plethora of Julia platforms in JuliaPackaging/Yggdrasil#10019 : Windows build is currently broken (due to #1690).

@stemann
Copy link
Contributor

stemann commented Dec 18, 2024

If anyone fancies taking the (libblastrampoline) MinGW tests and/or examples for a spin, there's a build of the recent main branch available for download from https://buildkite.com/julialang/yggdrasil/builds/15678#0193d95c-e190-48eb-b664-f0e384591eb5:

EDIT (2024-12-18 18:34 UTC, below this line):

Likely dependencies:

tutorial.exe, distributed.exe hangs:

~/Downloads/mlx/julia-1.11.2/bin$ ls -la ../../bin/
total 16724
drwx------+ 1 Administrator None        0 Dec 18 19:25 .
drwxr-xr-x+ 1 Administrator None        0 Dec 18 19:20 ..
-rwx------+ 1 Administrator None   134288 Jan  1  1970 distributed.exe
-rwx------+ 1 Administrator None 13080004 Jan  1  1970 libmlx.dll
-rwx------+ 1 Administrator None   177271 Jan  1  1970 linear_regression.exe
-rwx------+ 1 Administrator None   177768 Jan  1  1970 logistic_regression.exe
-rwx------+ 1 Administrator None   133560 Jan  1  1970 metal_capture.exe
-rwx------+ 1 Administrator None  3219625 Jan  1  1970 tests.exe
-rwx------+ 1 Administrator None   180219 Jan  1  1970 tutorial.exe

~/Downloads/mlx/julia-1.11.2/bin $ ../../bin/tests.exe
[doctest] doctest version is "2.4.11"
[doctest] run with "--help" for options
Segmentation fault

~/Downloads/mlx/julia-1.11.2/bin $ ../../bin/tutorial.exe
array([[1, 1],
       [1, 1]], dtype=float32)

# HANG

~/Downloads/mlx/julia-1.11.2/bin $ ../../bin/linear_regression.exe
Segmentation fault

~/Downloads/mlx/julia-1.11.2/bin $ ../../bin/logistic_regression.exe
Segmentation fault

~/Downloads/mlx/julia-1.11.2/bin $ ../../bin/metal_capture.exe
terminate called after throwing an instance of 'std::invalid_argument'
  what():  [new_stream] Cannot make gpu stream without gpu backend.

# HANG

~/Downloads/mlx/julia-1.11.2/bin $ ../../bin/distributed.exe
No communication backend found

# HANG

@stemann
Copy link
Contributor

stemann commented Dec 18, 2024

I have opened #1720 for tracking the hang on MinGW.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants