Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ImportVerilog] add stream concat operation #7784

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

chenbo-again
Copy link

@chenbo-again chenbo-again commented Nov 8, 2024

@fabianschuiki, @hailongSun2000
Add moore.stream_concat operation for streaming like concat, it's currently a draft version and call for comments. for lvalue, {1 << {i32, i32, i32}} will be treated like

(!moore.ref<i32>, !moore.ref<i32>, !moore.ref<i32> 0) -> <stream_concat<{!moore.i32, !moore.i32, !moore.i32}, 0>>

and it will be process in the core dialect.

Because the stream concat operation is so flexible, various datatype is acceptable as input, so I just Postponed the actual analysis. And stream concat oepration just be treated like a conversion from input to a packed type.

@fabianschuiki
Copy link
Contributor

Hey @chenbo-again, thanks a lot for working on this! 🥳 I'm wondering if we need a dedicated type for the streams. Do you know whether SystemVerilog has a specific type that it returns for streaming operations such as {<<{...}}? If you can store a stream in a variable somewhere (as a stream type), we probably have to be able to represent streams as separate types and values. But if that's not possible, we might be able to fully handle streams inside of ImportVerilog without creating special streaming operations in the IR. It would be interesting to find out what all the different streaming operations are, and if they are just syntactic sugar for bit slices and concats, or if there's more to it.

@chenbo-again
Copy link
Author

I thought that before but have no answer, because streaming concat operation can be operated on bit-stream type(described in IEEE 1800-2017), dynamically sized array can be included.

Both source_t and dest_t can include one or more dynamically sized data in any position (for example, a structure containing a dynamic array followed by a queue of bytes). If the source type, source_t, includes dynamically sized variables, they are all included in the bit stream. If the destination type, dest_t, includes unbounded dynamically sized types, the conversion process is greedy: compute the size of the source_t, subtract the size of the fixed-size data items in the destination, and then adjust the size of the first dynamically sized item in the destination to the remaining size; any remaining dynamically sized items are left empty.
IEEE 1800-2017 6.24.3

And @hailongSun2000 proposed that we can distinguish sized one and unsized one, and maybe we don't have any choice to deal all the things in ImportVerilog stage.

@hailongSun2000
Copy link
Member

hailongSun2000 commented Nov 11, 2024

Hey, @chenbo-again! Thanks for your work on the streaming operator. Presently, we have a dedicated op(moore.dyn_extract) to slice a value with a dynamic low bit. So maybe we can use moore.(dyn_)extract(_ref) and moore.concat(_ref) to represent moore.streaming_concat. WDYT?

@chenbo-again
Copy link
Author

chenbo-again commented Nov 11, 2024

Hey, @chenbo-again! Thanks for your work on the streaming operator. Presently, we have a dedicated op(moore.dyn_extract) to slice a value with a dynamic low bit. So maybe we can use moore.(dyn_)extract(_ref) and moore.concat(_ref) to represent moore.streaming_concat. WDYT?

Hi, @hailongSun2000 .
I tried it, but I'm afraid it is not suitable here, we need a dynamic sliceWidth extract here, it seems like a unsythesisable expression and not present in moore for now.

@hailongSun2000
Copy link
Member

Here are some cases:

module Foo();
  int a = 1;
  int b = 2;
  int c = 3;
  bit [127:0] d;
  int size1 = 32;
  const int size2 = 32;
  parameter size3 = 32;
  initial begin
    d = {<<int{a, b, c}};           // Ok, slice size = 32
    d = {<<logic[3:0]{a, b, c}};    // Ok, slice size = 4
    d = {<<size3{a, b, c}};         // Ok, slice size = 32   parameter is a constant

    d = {<<size1{a, b, c}};         // Not, error: reference to non-constant variable 'size1' is not allowed in a constant expression
    d = {<<size2{a, b, c}};         // Not, as above
  end
endmodule

Slang will throw an error if slice size is not a constant value. So maybe we only need to use moore.extract and moore.extract_ref. The key point is how to determine the size of the dynamic array and queue.

@chenbo-again
Copy link
Author

Recently I pushed a commit try to implement streaming concat operation with concat & extract operation. But currently it's limited that the operand must be sized IntType. It's not easy to deal the cases which the operands is unsized, and I believe in the future we will find an answer and implement it.

for now, streaming concat assignment expression {<< byte {a, b, c}} = {<< byte {a, b, c}}; will emit moore ir like below:

      %0 = moore.extract_ref %a from 0 : <i32> -> <i8>
      %1 = moore.extract_ref %a from 8 : <i32> -> <i24>
      %2 = moore.extract_ref %1 from 0 : <i24> -> <i8>
      %3 = moore.extract_ref %1 from 8 : <i24> -> <i16>
      %4 = moore.extract_ref %3 from 0 : <i16> -> <i8>
      %5 = moore.extract_ref %3 from 8 : <i16> -> <i8>
      %6 = moore.extract_ref %b from 0 : <i32> -> <i8>
      %7 = moore.extract_ref %b from 8 : <i32> -> <i24>
      %8 = moore.extract_ref %7 from 0 : <i24> -> <i8>
      %9 = moore.extract_ref %7 from 8 : <i24> -> <i16>
      %10 = moore.extract_ref %9 from 0 : <i16> -> <i8>
      %11 = moore.extract_ref %9 from 8 : <i16> -> <i8>
      %12 = moore.extract_ref %c from 0 : <i32> -> <i8>
      %13 = moore.extract_ref %c from 8 : <i32> -> <i24>
      %14 = moore.extract_ref %13 from 0 : <i24> -> <i8>
      %15 = moore.extract_ref %13 from 8 : <i24> -> <i16>
      %16 = moore.extract_ref %15 from 0 : <i16> -> <i8>
      %17 = moore.extract_ref %15 from 8 : <i16> -> <i8>
      %18 = moore.concat_ref %17, %16, %14, %12, %11, %10, %8, %6, %5, %4, %2, %0 : (!moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>, !moore.ref<i8>) -> <i96>
      %19 = moore.read %a : <i32>
      %20 = moore.read %b : <i32>
      %21 = moore.read %c : <i32>
      %22 = moore.extract %19 from 0 : i32 -> i8
      %23 = moore.extract %19 from 8 : i32 -> i24
      %24 = moore.extract %23 from 0 : i24 -> i8
      %25 = moore.extract %23 from 8 : i24 -> i16
      %26 = moore.extract %25 from 0 : i16 -> i8
      %27 = moore.extract %25 from 8 : i16 -> i8
      %28 = moore.extract %20 from 0 : i32 -> i8
      %29 = moore.extract %20 from 8 : i32 -> i24
      %30 = moore.extract %29 from 0 : i24 -> i8
      %31 = moore.extract %29 from 8 : i24 -> i16
      %32 = moore.extract %31 from 0 : i16 -> i8
      %33 = moore.extract %31 from 8 : i16 -> i8
      %34 = moore.extract %21 from 0 : i32 -> i8
      %35 = moore.extract %21 from 8 : i32 -> i24
      %36 = moore.extract %35 from 0 : i24 -> i8
      %37 = moore.extract %35 from 8 : i24 -> i16
      %38 = moore.extract %37 from 0 : i16 -> i8
      %39 = moore.extract %37 from 8 : i16 -> i8
      %40 = moore.concat %39, %38, %36, %34, %33, %32, %30, %28, %27, %26, %24, %22 : (!moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8, !moore.i8) -> i96
      moore.blocking_assign %18, %40 : i96

maybe it's usable for many case for now.

@fabianschuiki
Copy link
Contributor

This looks nice! I like your idea of splitting this into a dynamically-sized and statically-sized problem. Only supporting streaming operators with statically-known sizes sounds like a great starting point. Especially since Moore doesn't support dynamic arrays yet. It might be easier to add streaming operator support for dynamic arrays after we add support for dynamic arrays themselves, because we've established all the available ops at that point. Very cool work!

@hailongSun2000
Copy link
Member

hailongSun2000 commented Nov 13, 2024

Your example is LGTM! However, could we reduce the moore.extract_ref amount? My idea is we can moore.concat(_ref) a, b, c, and we get a 96-bit result( assume it's the %0). Then, we can create

%1 = moore.extract(_ref) %0, 0    // [7:0]
%2 = moore.extract(_ref) %0, 8    // [15:8]
... 
%11 = moore.extrac(_ref) %0, 80  // [87:80]
%12 = moore.extrac(_ref) %0, 88  // [95:88]

finally, we can create moore.concat(_ref) %1, %2,..., %11, %12 to together them. WDYT @fabianschuiki , @chenbo-again 🤔?

@chenbo-again
Copy link
Author

thank you for your review! :)

lib/Conversion/ImportVerilog/Expressions.cpp Outdated Show resolved Hide resolved
Comment on lines 980 to 985
for (auto stream : expr.streams()) {
auto value = context.convertLvalueExpression(*stream.operand);
if (!value)
return {};
operands.push_back(value);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that each stream can also have a withExpr and constantWithWidth. Could you add error messages if either of these are set, and tell the user that we don't support those cases? It's important that we only succeed if we were able to process everything the user has type, and not silently ignore parts of the input.

Comment on lines 796 to 798
for (auto stream : expr.streams()) {
auto value = context.convertRvalueExpression(*stream.operand);
if (!value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that each stream can also have a withExpr and constantWithWidth. Could you add error messages if either of these are set, and tell the user that we don't support those cases? It's important that we only succeed if we were able to process everything the user has type, and not silently ignore parts of the input.

// CHECK: [[BYTE4:%.+]] = moore.extract [[CONCAT]] from 32 : l48 -> l8
// CHECK: [[BYTE5:%.+]] = moore.extract [[CONCAT]] from 40 : l48 -> l8
// CHECK: [[RESULT:%.+]] = moore.concat [[BYTE5]], [[BYTE4]], [[BYTE3]], [[BYTE2]], [[BYTE1]], [[BYTE0]] : (!moore.l8, !moore.l8, !moore.l8, !moore.l8, !moore.l8, !moore.l8) -> l48
vec_5 = {<<byte{vec_3, vec_4}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for checking different-sized variables here!

// CHECK: [[BYTE4_REF:%.+]] = moore.extract_ref [[CONCAT_REF]] from 32 : <l48> -> <l8>
// CHECK: [[BYTE5_REF:%.+]] = moore.extract_ref [[CONCAT_REF]] from 40 : <l48> -> <l8>
// CHECK: [[RESULT_REF:%.+]] = moore.concat_ref [[BYTE5_REF]], [[BYTE4_REF]], [[BYTE3_REF]], [[BYTE2_REF]], [[BYTE1_REF]], [[BYTE0_REF]] : (!moore.ref<l8>, !moore.ref<l8>, !moore.ref<l8>, !moore.ref<l8>, !moore.ref<l8>, !moore.ref<l8>) -> <l48>
{<<byte{vec_3, vec_4}} = vec_5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a few more test cases that check other ways of how streams can be described? For example, I think your implementation already supports nested streams. And there's also the possibility of having a stream go in the other direction >>. I also noticed that the spec allows for streams to specify a slice size instead of a type, so something like {<<16{...}}.

The SystemVerilog specification contains a few examples for streams which might be great to include here, or use for inspiration for the different kinds of streams that we should test. For example, in IEEE 1800-2017 section 11.4.14.2 "Re-ordering of the generic stream" I see the following test:

int j = { "A", "B", "C", "D" };
{ >> {j}} // generates stream "A" "B" "C" "D"
{ << byte {j}} // generates stream "D" "C" "B" "A" (little endian)
{ << 16 {j}} // generates stream "C" "D" "A" "B"
{ << { 8'b0011_0101 }} // generates stream 'b1010_1100 (bit reverse)
{ << 4 { 6'b11_0101 }} // generates stream 'b0101_11
{ >> 4 { 6'b11_0101 }} // generates stream 'b1101_01 (same)
{ << 2 { { << { 4'b1101 }} }} // generates stream 'b1110

Pretty sure you could remove the string constants "A", "B", "C", "D" and then just check whether you get the correct slices of the variable j with your implementation. Or replace the string with a easily-recognizable constant, like 32'h87654321.

In IEEE 1800-2017 section 11.4.14.3 "Streaming concatenation as an assignment target (unpack)" I see the following test:

int a, b, c;
logic [10:0] up [3:0];
logic [11:1] p1, p2, p3, p4;
bit [96:1] y = {>>{ a, b, c }}; // OK: pack a, b, c
int j = {>>{ a, b, c }}; // error: j is 32 bits < 96 bits
bit [99:0] d = {>>{ a, b, c }}; // OK: d is padded with 4 bits
{>>{ a, b, c }} = 23'b1; // error: too few bits in stream
{>>{ a, b, c }} = 96'b1; // OK: unpack a = 0, b = 0, c = 1
{>>{ a, b, c }} = 100'b11111; // OK: unpack a = 0, b = 0, c = 1; 96 MSBs unpacked, 4 LSBs truncated
{ >> {p1, p2, p3, p4}} = up; // OK: unpack p1 = up[3], p2 = up[2], p3 = up[1], p4 = up[0]

Variations of these tests might be good to have. You don't need to check for error reporting if the error is reported by Slang and not your code; Slang already contains tests for this and we don't need to re-check Slang. But for any error message you produce in your code, and every branch/early exit in your code, you'd want to have a test that checks whether things work as expected there 😃

lib/Conversion/ImportVerilog/Expressions.cpp Show resolved Hide resolved
Comment on lines 799 to 803
if (stream.constantWithWidth.has_value() || stream.withExpr) {
mlir::emitError(operandLoc) << "Moore only support streaming "
"concatenation without 'with operation'";
return {};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can invoke context.convertRvalueExpression() to handle withExpr if it's existing. And then ignore the same level variable.
For your case: (Only handle withExpr)

vec_1 = {<<byte{vec_0, arr with [7:0]}};

It's AST looks like

{
                              "operand": {
                                "kind": "NamedValue",
                                "type": "logic$[63:0]",
                                "symbol": "2199025903840 arr"
                              },
                              "withExpr": {
                                "kind": "RangeSelect",
                                "type": "logic$[7:0]",
                                "selectionKind": "Simple",
                                "value": {
                                  "kind": "NamedValue",
                                  "type": "logic$[63:0]",
                                  "symbol": "2199025903840 arr"
                                },
   ......

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, when the stream contains multiple variable-size data packets, or each data packet contains more than one variable-size data item, or the size of the data to be unpacked is stored in the middle of the stream, this mechanism can become cumbersome and error-prone. To overcome these problems, the unpack operation allows a with expression to explicitly specify the extent of a variable-size field within the unpack operation.

with expression is dedicated for dynamic stream operands, and it is only used for unpacked operands. however concat operation only supports packed sized operands in systemverilog, so currently we have no method to implement with expression.

By the way, I think to support full stream concat operation, the key is to create a conversion from any type to packed type, and then we can use packed value to do concat, extract and finally implement stream concat.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we have a powerful op moore.conversion can convert any type to any type, so I make a demo in 0e7f6bd, but currently I don't know how it works

    // CHECK: %[[TMP1:.*]] = moore.read %vec_3 : <l16>
    // CHECK: %[[TMP2:.*]] = moore.read %arr_1 : <uarray<64 x l1>>
    // CHECK: %[[TMP3:.*]] = moore.extract %[[TMP2]] from 0 : uarray<64 x l1> -> uarray<16 x l1>
    // CHECK: %[[TMP4:.*]] = moore.conversion %[[TMP3]] : !moore.uarray<16 x l1> -> !moore.l16
    // CHECK: %[[TMP5:.*]] = moore.concat %[[TMP1]], %[[TMP4]] : (!moore.l16, !moore.l16) -> l32
    // CHECK: %[[TMP6:.*]] = moore.extract %[[TMP5]] from 0 : l32 -> l8
    // CHECK: %[[TMP7:.*]] = moore.extract %[[TMP5]] from 8 : l32 -> l8
    // CHECK: %[[TMP8:.*]] = moore.extract %[[TMP5]] from 16 : l32 -> l8
    // CHECK: %[[TMP9:.*]] = moore.extract %[[TMP5]] from 24 : l32 -> l8
    // CHECK: %[[TMP10:.*]] = moore.concat %[[TMP6]], %[[TMP7]], %[[TMP8]], %[[TMP9]] : (!moore.l8, !moore.l8, !moore.l8, !moore.l8) -> l32
    // CHECK: moore.blocking_assign %vec_1, %[[TMP10]] : l32
    vec_1 = {<<byte{vec_3, arr_1 with [15:0]}};
    // CHECK: %[[TMP1:.*]] = moore.extract_ref %arr_1 from 0 : <uarray<64 x l1>> -> <uarray<16 x l1>>
    // CHECK: %[[TMP2:.*]] = moore.conversion %[[TMP1]] : !moore.ref<uarray<16 x l1>> -> !moore.ref<l16>
    // CHECK: %[[TMP3:.*]] = moore.concat_ref %vec_3, %[[TMP2]] : (!moore.ref<l16>, !moore.ref<l16>) -> <l32>
    // CHECK: %[[TMP4:.*]] = moore.extract_ref %[[TMP3]] from 0 : <l32> -> <l8>
    // CHECK: %[[TMP5:.*]] = moore.extract_ref %[[TMP3]] from 8 : <l32> -> <l8>
    // CHECK: %[[TMP6:.*]] = moore.extract_ref %[[TMP3]] from 16 : <l32> -> <l8>
    // CHECK: %[[TMP7:.*]] = moore.extract_ref %[[TMP3]] from 24 : <l32> -> <l8>
    // CHECK: %[[TMP8:.*]] = moore.concat_ref %[[TMP4]], %[[TMP5]], %[[TMP6]], %[[TMP7]] : (!moore.ref<l8>, !moore.ref<l8>, !moore.ref<l8>, !moore.ref<l8>) -> <l32>
    // CHECK: %[[TMP9:.*]] = moore.read %vec_1 : <l32>
    // CHECK: moore.blocking_assign %[[TMP8]], %[[TMP9]] : l32
    {<<byte{vec_3, arr_1 with [15:0]}} = vec_1;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, the moore.conversion operation is more or less a placeholder for more detailed conversion operations in the future. Ideally we would have more precise operations, such as moore.pack_sbv and moore.unpack_sbv, moore.int_to_logic and moore.logic_to_int, and other operations to convert between compatible unpacked types. I have started doing this for the sign/zero extension that is needed for resizing, but more work is needed. For now I think it's fine to just rely on moore.conversion to express whatever type conversion you need, and we'll fill in the details of how those conversions work later.

lib/Conversion/ImportVerilog/Expressions.cpp Outdated Show resolved Hide resolved
test/Conversion/ImportVerilog/errors.sv Outdated Show resolved Hide resolved
lib/Conversion/ImportVerilog/Expressions.cpp Outdated Show resolved Hide resolved
@hailongSun2000
Copy link
Member

By the way, please don't forget to update the commit history. @fabianschuiki had tweaked the moore.conversion in this PR(#7783).

@hailongSun2000
Copy link
Member

Thanks for your work again 😃!

@fabianschuiki
Copy link
Contributor

@chenbo-again Feel free to request commit access to LLVM such that your CI builds run automatically without anyone approving them. That way you can also land the PR yourself once CI passes and you're happy with the reviews and code 🙂

Copy link
Contributor

@fabianschuiki fabianschuiki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks a lot for making this all work 🙌. Let me know if you want to merge this yourself once you're happy with it, or if you would like me to merge this for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants