Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Validation for bytes #264

Closed
ash2k opened this issue Oct 14, 2024 · 7 comments
Closed

[Feature Request] Validation for bytes #264

ash2k opened this issue Oct 14, 2024 · 7 comments
Labels
Feature New feature or request

Comments

@ash2k
Copy link

ash2k commented Oct 14, 2024

Feature description:

Allow to check that a bytes field contains only allowed octets. This cannot be done with CEL matches() because bytes in the field may not be valid UTF-8.

Problem it solves or use case:

I want to validate that a bytes field has only certain bytes in it. Specifically, that it's a valid HTTP header name / value.

HTTP header values are not UTF-8, can have binary data: https://datatracker.ietf.org/doc/html/rfc9110#name-field-values.

I also want to avoid the cost of validating HTTP header names as UTF-8 that proto does (so have to use bytes), but instead to validate them manually.

Proposed implementation or solution:

?

Contribution:

Willing to do code review.

Examples or references:

Additional context:

#263.

@ash2k ash2k added the Feature New feature or request label Oct 14, 2024
@rodaine
Copy link
Member

rodaine commented Oct 16, 2024

matches supports escaping of octet values. We do this in our conformance tests. Is that not sufficient?

@ash2k
Copy link
Author

ash2k commented Oct 17, 2024

Indeed, it works fine! I got confused by the description of the validation rule. I read it as if it says that the data that is being validated must be utf-8 encoded, not that the pattern itself should be utf-8 encoded:

// The value of the field must be valid UTF-8 or validation will fail with a
// runtime error.

// `pattern` requires the field value to match the specified regular
// expression ([RE2 syntax](https://github.com/google/re2/wiki/Syntax)).
// The value of the field must be valid UTF-8 or validation will fail with a
// runtime error.
// If the field value doesn't match the pattern, an error message is generated.
//
// ```proto
// message MyBytes {
// // value must match regex pattern "^[a-zA-Z0-9]+$".
// optional bytes value = 1 [(buf.validate.field).bytes.pattern = "^[a-zA-Z0-9]+$"];
// }
// ```
optional string pattern = 4 [(predefined).cel = {
id: "bytes.pattern"
expression: "!string(this).matches(rules.pattern) ? 'value must match regex pattern `%s`'.format([rules.pattern]) : ''"
}];

This worked for me:

message HeaderValues {
  // At least one item, but it can be empty.
  repeated bytes value = 1 [
    (buf.validate.field).repeated.min_items = 1,
    (buf.validate.field).repeated.items.bytes.pattern = "^[^\\x00-\\x08\\x0A-\\x1F\\x7F]*$"
  ];
}

message HeaderKV {
  bytes key = 1 [(buf.validate.field).bytes.pattern = "^:?[0-9a-zA-Z!#$%&'*+-.^_|~`]+$"];
  HeaderValues value = 2 [(buf.validate.field).required = true];
}

Thank you!

@ash2k ash2k closed this as completed Oct 17, 2024
@rodaine
Copy link
Member

rodaine commented Oct 17, 2024

I got confused by the description of the validation rule.

Hm, this might be copy-pasta from the pattern rule over on strings (which does have the UTF-8 requirement). I'll fix up the documentation to be clearer. Thanks!

@ash2k
Copy link
Author

ash2k commented Oct 17, 2024

@rodaine Thank you for an excellent library!

@ash2k
Copy link
Author

ash2k commented Oct 18, 2024

Hm, I've added some proper tests and I'm getting an error: runtime error: error evaluating bytes.pattern: invalid UTF-8 in bytes, cannot convert to string. I believe this is because string(this) fails in the CEL expression.

optional string pattern = 4 [(predefined).cel = {
id: "bytes.pattern"
expression: "!string(this).matches(rules.pattern) ? 'value must match regex pattern `%s`'.format([rules.pattern]) : ''"
}];

I think we might need an overload for matches() that works natively on bytes.

@ash2k ash2k reopened this Oct 18, 2024
@ash2k
Copy link
Author

ash2k commented Oct 18, 2024

I think the error is generated here: https://github.com/google/cel-go/blob/master/common/types/bytes.go#L104-L106

@rodaine
Copy link
Member

rodaine commented Oct 21, 2024

Superceded by #268

@rodaine rodaine closed this as completed Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants