Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdma-snax integration #199

Merged
merged 31 commits into from
Jul 23, 2024
Merged

xdma-snax integration #199

merged 31 commits into from
Jul 23, 2024

Conversation

IveanEx
Copy link

@IveanEx IveanEx commented Jul 22, 2024

This PR enables the availability of the xdma on snax cluster, which will be the next generation distributed DMA focusing on intra-cluster data movement, inter-cluster data movement, and data movement between any cluster and global memory.

The function of it includes:

  1. The data fetch unit that tries to fetch data from TCDM in 8B granularity
  2. The datapath that, if one of the channels to TCDM is blocked by contention, other channels can still fetch the data It maximizes the bus utilization.
  3. The address generation unit that supports n-dimensional affine space fetching
  4. The xdma datapath extension interface allows users to plug in any element-wise processing unit, enabling computaion-on-copy capability.
  5. Designed in Chisel, with configurable data extension, bit width, and many more.

At present, only the intra-cluster capacity is enabled. Three extensions are designed as the reference of xdma datapath extension, including:

  1. Memset: Setting each byte in the given memory region to arbitrary value very fast, in one xdma request.
  2. Transposer: Transpose the 8x8 INT8 matrix on copy.
  3. MaxPool SIMD

The runtime is also provided to control xdma by standard snax CSR interface, with C headers automatically generated with the generation of hardware.

Copy link

@rgantonio rgantonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suuuupper good work @IveanEx !

I bet you work so hard on this. I like that we have this interesting xDMA already and it's a good work in progress!

I have things that need to be critical about though:

  1. The PR is huge haha! It's too big honestly and very hard to read through carefully. Remember, since SNAX is something that is shared amongst our team, it is best to make "byte-sized" (get the pun? 😆 ) PRs. I will strongly recommend to keep it to less than a thousand lines. What are ways to make it small? Separate the hardware build and the software tests in separate PRs! That would be a great fix already!
  2. I am a bit iffy about the differences between the streamers and the xDMA. They have the same units like the AGUs and data movers. Is there no way that they could be re-used or placed in the common cells section? You have a CommonCells directory inside the xDMA but then there are also re-usable cells in others. Maybe better to have all common cells at some directory that can be used through all! It now feels like we are not embracing the "reusability" of these modules.
  3. I guess it's for testing purposes but are we really making the xDMA like a separate accelerator? I was thinking it should be inside the cluster too! My worry with this setup is that there will be a HUGE clean-up afterwards. Which I fear may be prone to more work 😅

There are nice things I like though! Like the smart case-statements for the reading and writing for the CSRs. I hope the compiler is smart enough just to route the values! I would really love to have a separate PR actually fix that 😄

GOOD JOB!

I won't approve yet as we need to discuss more. The next step after this is to create documentation for this! Don't under estimate it. Lack of sufficient documentation is a problem in open source stuff since we're always rushing to implement things but never taking the time to make it easier for other people to understand...

I also suggest to ask @xiaoling-yi's opinion here since there is much similarity in your modules.

hw/chisel_acc/.gitignore Outdated Show resolved Hide resolved
Comment on lines +329 to +346
gen_chisel_file(
chisel_path=args.chisel_path,
chisel_param="snax.xdma.xdmaTop.xdmaTopGen",
gen_path=" --clusterName " + str(cfg["cluster"]["name"]) +
" --tcdmDataWidth " + str(cfg["cluster"]["data_width"]) +
" --axiDataWidth " + str(cfg["cluster"]["dma_data_width"]) +
" --addressWidth " + str(cfg["cluster"]["addr_width"]) +
" --tcdmSize " + str(cfg["cluster"]["tcdm"]["size"]) +
" --readerDimension " + str(snax_xdma_cfg["reader_agu_dimension"]) +
" --writerDimension " + str(snax_xdma_cfg["writer_agu_dimension"]) +
" --readerBufferDepth " + str(snax_xdma_cfg["reader_buffer"]) +
" --writerBufferDepth " + str(snax_xdma_cfg["writer_buffer"]) +
(" --HasMemset " if snax_xdma_cfg["has_memset"] else "") +
(" --HasMaxPool " if snax_xdma_cfg["has_maxpool"] else "") +
(" --HasTransposer " if snax_xdma_cfg["has_transposer"] else "") +
" --target-dir " + args.gen_path +
cfg["cluster"]["name"] + "_xdma/"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is smart too because rather than having a scala parameter generated, it can be this one. However, this implies it is very custom to the xDMA only. Not something that is quite "generalized" like the accelerators. I guess it is fine for now since the intent is to have the xDMA a SNAX thing. But what about the different extensions? That implies that people have to (1) change this wrapper gen to add another extension, (2) modify the scala files for the extension, and (3) add the configurations too.

The SNAX accelerators just need the configuration files since they are "generalized" already and do not need too many modifications. Even the accelerator wrapper names are all centralized in the configuration files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. it needs to be optimized. It should be done in a seperate PR, I think it is better to rewrite gen_chisel_file function.

@jorendumoulin
Copy link

I guess it's for testing purposes but are we really making the xDMA like a separate accelerator? I was thinking it should be inside the cluster too! My worry with this setup is that there will be a HUGE clean-up afterwards. Which I fear may be prone to more work 😅

I'm not completely in the loop but I agree with this sentiment! There are starting to be a lot of variations on the system with the gemm accelerator, and this would add an extra one. Can we not include this xDMA in every snax cluster from now on and replace the old DMA entirely? Or are there still some disadvantages?

There would just need to be a new definition of snitch_dma_1d and snitch_dma_2d etc.., but then all of the existing tests with the dma engine will now also test your new one, so you don't have to write tests yourself 😄

@IveanEx
Copy link
Author

IveanEx commented Jul 23, 2024

I guess it's for testing purposes but are we really making the xDMA like a separate accelerator? I was thinking it should be inside the cluster too! My worry with this setup is that there will be a HUGE clean-up afterwards. Which I fear may be prone to more work 😅

I'm not completely in the loop but I agree with this sentiment! There are starting to be a lot of variations to the system with the gemm accelerator, and this would add an extra one. Can we not include this xDMA in every snax cluster from now on and replace the old DMA entirely? Or are there still some disadvantages?

There would just need to be a new definition of snitch_dma_1d and snitch_dma_2d etc.., but then all of the existing tests with the dma engine will now also test your new one, so you don't have to write tests yourself 😄

It cannot replace the original idma at present, as it does not have the function to move the data from / to DRAM yet.

Yes, like any other accelerator, it can be disabled, and which extension should be included is completely configurable.

I also agree that now there are too many versions in snax... I plan to clean the cfg file and merge many things with @xiaoling-yi's contribution. But should happen when she come back 😊😊

@rgantonio
Copy link

@xiaoling-yi we will push this since it's a WIP that is being cut into multiple PRs anyway. So kindly take a look when you have the time. If you see something, tell @IveanEx !

Thankkkss~

Copy link

@rgantonio rgantonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!!! ❤️ ❤️ ❤️ ❤️ ❤️

Can you make the issue list of things to fix? Add the - [ ] check boxes 😄 Thankkkss!

@IveanEx IveanEx merged commit b933fce into main Jul 23, 2024
27 checks passed
@IveanEx IveanEx deleted the fkyd/xdma_snax_integration branch July 23, 2024 13:11
@jorendumoulin
Copy link

It cannot replace the original idma at present, as it does not have the function to move the data from / to DRAM yet.

Is it the plan to do this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary anymore. All ci use this chiselgen to generate the code, so it is naturally tested.

Furthermore, this is the source of error that reandomly fails.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay!

@IveanEx IveanEx linked an issue Jul 23, 2024 that may be closed by this pull request
7 tasks
@IveanEx IveanEx removed a link to an issue Jul 23, 2024
7 tasks
@IveanEx
Copy link
Author

IveanEx commented Jul 23, 2024

It cannot replace the original idma at present, as it does not have the function to move the data from / to DRAM yet.

Is it the plan to do this?

Yes, it will be done in the future. But more discussion should be made on improvement / change of AXI bus.

@IveanEx IveanEx changed the title Fkyd/xdma snax integration xdma-snax integration Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants