refactor: Removing type information in Struct literal. #103

Closed
wants to merge 5 commits into from
Changes from 1 commit
97 changes: 35 additions & 62 deletions crates/iceberg/src/spec/values.rs
@@ -554,13 +554,14 @@ impl From<&Literal> for JsonValue {
                 PrimitiveLiteral::Decimal(_) => todo!(),
             },
             Literal::Struct(s) => {
Author

Hey @liurenjie1024, how should we design this method? Since the iterator now only returns the value instead of the (id, value, name) tuple, I'm having trouble setting up the test for this method.

Author

Update: I used enumerate to account for the id property in the JSON object. I'm not sure this is the correct approach.
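A minimal sketch of the enumerate approach, using hypothetical, simplified stand-ins for Literal and Struct rather than the iceberg crate's types (the real code builds serde_json values, not strings). It makes one caveat visible: enumerate yields a value's position, not its field id. In the test data in this diff, the fields id, name and address have field ids 1, 2 and 3, while enumerate would produce keys 0, 1 and 2.

```rust
// Hypothetical, simplified stand-in for the crate's Literal; not the iceberg API.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i32),
    Str(String),
}

// After this PR a struct literal holds only its values, in field order.
pub struct Struct {
    pub values: Vec<Option<Literal>>,
}

impl Struct {
    // Mirrors the new iter(): values only, None for null fields.
    pub fn iter(&self) -> impl Iterator<Item = Option<&Literal>> {
        self.values.iter().map(|v| v.as_ref())
    }
}

// Render one value as a JSON fragment (strings are not escaped in this sketch).
fn value_to_json(v: Option<&Literal>) -> String {
    match v {
        Some(Literal::Int(i)) => i.to_string(),
        Some(Literal::Str(s)) => format!("\"{}\"", s),
        None => "null".to_string(),
    }
}

// The enumerate approach: each value's JSON key is its position in the struct.
// Caveat: a position is not an Iceberg field id, so code that looks values up
// by field id cannot read this output back.
pub fn struct_to_json_by_position(s: &Struct) -> String {
    let entries: Vec<String> = s
        .iter()
        .enumerate()
        .map(|(pos, v)| format!("\"{}\":{}", pos, value_to_json(v)))
        .collect();
    format!("{{{}}}", entries.join(","))
}
```

For the id/name/address values used in this diff's tests, the keys come out as "0", "1" and "2".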

Contributor

Sorry for the late reply. Is this method really useful? I think we should expose this method.

Author

Isn't this Struct method supposed to return a hashmap? The StructType in Iceberg requires a NestedField with the id and value.

Contributor

I think it's supposed to return a JSON object whose field names match the struct's field names, but that isn't feasible without a schema. I think this method is the correct approach.

Contributor

Sorry, I meant we should remove the entire implementation of From<&Literal> for JsonValue. Since we have decided to remove type info from Struct, this method signature is incorrect. Conversion to and from JSON should be delegated to the ser/de module.
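A sketch of why the signature matters, again with hypothetical stand-ins for Literal and Type rather than the iceberg crate's types: once a struct literal holds only values, a conversion can recover field ids only if it is handed the type. This is plain std Rust, not the serde-based ser/de module referred to above. Keying the object by field id mirrors what try_from_json in this diff looks up (field.id.to_string()).

```rust
// Hypothetical stand-ins for Literal and Type; not the iceberg crate's API.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i32),
    Str(String),
    Struct(Vec<Option<Literal>>), // values only, no ids or names
}

#[derive(Debug, Clone)]
pub enum Type {
    Int,
    Str,
    Struct(Vec<(i32, Type)>), // (field id, field type), in field order
}

// Unlike From<&Literal>, this signature receives the type, so a struct's
// field ids come from the schema instead of from the literal.
// Strings are emitted without escaping in this sketch.
pub fn to_json(lit: &Literal, ty: &Type) -> Result<String, String> {
    match (lit, ty) {
        (Literal::Int(i), Type::Int) => Ok(i.to_string()),
        (Literal::Str(s), Type::Str) => Ok(format!("\"{}\"", s)),
        (Literal::Struct(values), Type::Struct(fields)) if values.len() == fields.len() => {
            let mut entries = Vec::with_capacity(values.len());
            for (value, (id, field_ty)) in values.iter().zip(fields.iter()) {
                let json = match value {
                    Some(v) => to_json(v, field_ty)?,
                    None => "null".to_string(),
                };
                // Key by field id, the same key try_from_json looks up.
                entries.push(format!("\"{}\":{}", id, json));
            }
            Ok(format!("{{{}}}", entries.join(",")))
        }
        _ => Err(format!("literal {:?} does not match type {:?}", lit, ty)),
    }
}
```

A mismatched literal/type pair becomes an error instead of silently producing JSON, which a From impl has no way to detect.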

Author

mobley-trent, Nov 25, 2023

serde_json doesn't seem to have a method to extract a JsonValue from a Literal; this was only possible through From<&Literal> for JsonValue.

This line raises a compiler error when I remove From<&Literal> for JsonValue:

fn check_json_serde(json: &str, expected_literal: Literal, expected_type: &Type) {
    ...
    let expected_json_value: JsonValue = (&expected_literal).into(); // error raised here

Error:

^^^^ the trait `From<&values::Literal>` is not implemented for `serde_json::Value`

Contributor

Yes, after removing this, direct ser/de to and from JSON no longer works. The correct way to do ser/de is through the ser/de module; see this method as an example. It delegates ser/de to the serialization/deserialization system.

Author

mobley-trent, Dec 2, 2023

So you are saying I should refactor check_json_serde to follow the structure of check_convert_with_avro?

Also, just a reminder: the code doesn't build when I remove From<&values::Literal>, so I don't think removing it entirely will work, @liurenjie1024.
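A sketch of the round-trip structure being discussed, assuming hypothetical stand-in types and an intermediate Raw enum that plays the role RawLiteral plays in check_convert_with_avro (RawLiteral::try_from(literal, &type) in the diff). Both directions take the type, so no From<&Literal> impl is needed. Rebuilding a struct iterates the schema's fields and looks each one up by its id key, as try_from_json does, so values come back in schema order even if the object's keys are reordered. check_roundtrip is a hypothetical helper name.

```rust
// Hypothetical stand-ins; not the iceberg crate's Literal, Type, or RawLiteral.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i32),
    Str(String),
    Struct(Vec<Option<Literal>>), // values only, in schema field order
}

#[derive(Debug, Clone)]
pub enum Type {
    Int,
    Str,
    Struct(Vec<(i32, Type)>), // (field id, field type)
}

// Intermediate form standing in for what a serializer would consume.
#[derive(Debug, Clone, PartialEq)]
pub enum Raw {
    Int(i32),
    Str(String),
    Null,
    Object(Vec<(String, Raw)>), // keys are field ids as strings
}

// Literal -> Raw; the type supplies the field ids the literal no longer carries.
pub fn to_raw(lit: &Literal, ty: &Type) -> Result<Raw, String> {
    match (lit, ty) {
        (Literal::Int(i), Type::Int) => Ok(Raw::Int(*i)),
        (Literal::Str(s), Type::Str) => Ok(Raw::Str(s.clone())),
        (Literal::Struct(values), Type::Struct(fields)) if values.len() == fields.len() => {
            let mut entries = Vec::with_capacity(values.len());
            for (value, (id, field_ty)) in values.iter().zip(fields.iter()) {
                let raw = match value {
                    Some(v) => to_raw(v, field_ty)?,
                    None => Raw::Null,
                };
                entries.push((id.to_string(), raw));
            }
            Ok(Raw::Object(entries))
        }
        _ => Err(format!("{:?} does not match {:?}", lit, ty)),
    }
}

// Raw -> Literal; struct values are rebuilt in schema order by id lookup,
// and a missing key becomes a null value.
pub fn from_raw(raw: Raw, ty: &Type) -> Result<Option<Literal>, String> {
    match (raw, ty) {
        (Raw::Null, _) => Ok(None),
        (Raw::Int(i), Type::Int) => Ok(Some(Literal::Int(i))),
        (Raw::Str(s), Type::Str) => Ok(Some(Literal::Str(s))),
        (Raw::Object(mut entries), Type::Struct(fields)) => {
            let mut values = Vec::with_capacity(fields.len());
            for (id, field_ty) in fields {
                let key = id.to_string();
                let pos = entries.iter().position(|(k, _)| *k == key);
                let value = match pos {
                    Some(pos) => from_raw(entries.remove(pos).1, field_ty)?,
                    None => None,
                };
                values.push(value);
            }
            Ok(Some(Literal::Struct(values)))
        }
        (raw, ty) => Err(format!("{:?} does not match {:?}", raw, ty)),
    }
}

// Round trip in the spirit of check_convert_with_avro: literal -> raw -> literal.
pub fn check_roundtrip(expected: Literal, ty: &Type) {
    let raw = to_raw(&expected, ty).expect("literal should serialize");
    let back = from_raw(raw, ty).expect("raw value should deserialize");
    assert_eq!(back, Some(expected));
}
```

Under these assumptions, a JSON test helper would take the expected literal together with its type, rather than calling (&expected_literal).into().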

Contributor

liurenjie1024, Dec 4, 2023

Hi @mobley-trent, yes, we are planning to delegate conversion between literal values and JSON values to the ser/de module, just as we did with Avro. You're right that completely removing From<&values::Literal> will not work before then. I think we can put this PR on hold until the JSON conversion is finished.

-                JsonValue::Object(JsonMap::from_iter(s.iter().map(|(id, value, _)| {
-                    let json: JsonValue = match value {
-                        Some(val) => val.into(),
-                        None => JsonValue::Null,
-                    };
-                    (id.to_string(), json)
-                })))
+                JsonValue::Array(
+                    s.iter()
+                        .map(|value| match value {
+                            Some(val) => val.into(),
+                            None => JsonValue::Null,
+                        })
+                        .collect()
+                )
             }
             Literal::List(list) => JsonValue::Array(
                 list.iter()
@@ -600,24 +601,18 @@ impl From<&Literal> for JsonValue {
 pub struct Struct {
     /// Vector to store the field values
     fields: Vec<Literal>,
-    /// Vector to store the field ids
-    field_ids: Vec<i32>,
-    /// Vector to store the field names
-    field_names: Vec<String>,
     /// Null bitmap
     null_bitmap: BitVec,
 }

 impl Struct {
     /// Create a iterator to read the field in order of (field_id, field_value, field_name).
-    pub fn iter(&self) -> impl Iterator<Item = (&i32, Option<&Literal>, &str)> {
+    pub fn iter(&self) -> impl Iterator<Item = Option<&Literal>> {
         self.null_bitmap
             .iter()
             .zip(self.fields.iter())
-            .zip(self.field_ids.iter())
-            .zip(self.field_names.iter())
-            .map(|(((null, value), id), name)| {
-                (id, if *null { None } else { Some(value) }, name.as_str())
+            .map(|(null, value)| {
+                if *null { None } else { Some(value) }
             })
     }
 }
@@ -652,16 +647,12 @@ impl IntoIterator for Struct {
     }
 }

-impl FromIterator<(i32, Option<Literal>, String)> for Struct {
-    fn from_iter<I: IntoIterator<Item = (i32, Option<Literal>, String)>>(iter: I) -> Self {
+impl FromIterator<Option<Literal>> for Struct {
+    fn from_iter<I: IntoIterator<Item = Option<Literal>>>(iter: I) -> Self {
         let mut fields = Vec::new();
-        let mut field_ids = Vec::new();
-        let mut field_names = Vec::new();
         let mut null_bitmap = BitVec::new();

-        for (id, value, name) in iter.into_iter() {
-            field_ids.push(id);
-            field_names.push(name);
+        for value in iter.into_iter() {
             match value {
                 Some(value) => {
                     fields.push(value);
@@ -675,8 +666,6 @@ impl FromIterator<(i32, Option<Literal>, String)> for Struct {
         }
         Struct {
             fields,
-            field_ids,
-            field_names,
             null_bitmap,
         }
     }
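The value-only Struct from the two hunks above can be sketched in plain std Rust. Assumptions: Vec<bool> stands in for BitVec, Literal is a hypothetical two-variant stand-in, and the placeholder stored in a null slot is an arbitrary choice, since the diff elides what the real from_iter pushes in that case.

```rust
use std::iter::FromIterator;

// Hypothetical stand-in for the crate's Literal; not the iceberg API.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i32),
    Str(String),
}

// Value-only struct literal: one slot per field plus a parallel null flag.
// Vec<bool> stands in for the BitVec used in the diff.
#[derive(Debug, Default)]
pub struct Struct {
    fields: Vec<Literal>,
    null_bitmap: Vec<bool>,
}

impl Struct {
    // Yields values in field order; slots flagged null come back as None.
    pub fn iter(&self) -> impl Iterator<Item = Option<&Literal>> {
        self.null_bitmap
            .iter()
            .zip(self.fields.iter())
            .map(|(null, value)| if *null { None } else { Some(value) })
    }
}

impl FromIterator<Option<Literal>> for Struct {
    fn from_iter<I: IntoIterator<Item = Option<Literal>>>(iter: I) -> Self {
        let mut s = Struct::default();
        for value in iter {
            match value {
                Some(v) => {
                    s.fields.push(v);
                    s.null_bitmap.push(false);
                }
                None => {
                    // Placeholder keeps `fields` aligned with `null_bitmap`;
                    // iter() never returns it because the slot is flagged null.
                    s.fields.push(Literal::Int(0));
                    s.null_bitmap.push(true);
                }
            }
        }
        s
    }
}
```

Keeping both vectors the same length is what lets iter() pair them with a single zip; without the placeholder, every value after a null would be paired with the wrong flag.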
@@ -828,20 +817,16 @@ impl Literal {
                 if let JsonValue::Object(mut object) = value {
                     Ok(Some(Literal::Struct(Struct::from_iter(
                         schema.fields().iter().map(|field| {
-                            (
-                                field.id,
-                                object.remove(&field.id.to_string()).and_then(|value| {
-                                    Literal::try_from_json(value, &field.field_type)
-                                        .and_then(|value| {
-                                            value.ok_or(Error::new(
-                                                ErrorKind::DataInvalid,
-                                                "Key of map cannot be null",
-                                            ))
-                                        })
-                                        .ok()
-                                }),
-                                field.name.clone(),
-                            )
+                            object.remove(&field.id.to_string()).and_then(|value| {
+                                Literal::try_from_json(value, &field.field_type)
+                                    .and_then(|value| {
+                                        value.ok_or(Error::new(
+                                            ErrorKind::DataInvalid,
+                                            "Key of map cannot be null",
+                                        ))
+                                    })
+                                    .ok()
+                            })
                         }),
                     ))))
                 } else {
@@ -1558,7 +1543,7 @@ mod _serde {
                     optional: _,
                 }) => match ty {
                     Type::Struct(struct_ty) => {
-                        let iters: Vec<(i32, Option<Literal>, String)> = required
+                        let iters: Vec<Option<Literal>> = required
                             .into_iter()
                             .map(|(field_name, value)| {
                                 let field = struct_ty
@@ -1570,7 +1555,7 @@ mod _serde {
                                     )
                                 })?;
                                 let value = value.try_into(&field.field_type)?;
-                                Ok((field.id, value, field.name.clone()))
+                                Ok(value)
                             })
                             .collect::<Result<_, Error>>()?;
                         Ok(Some(Literal::Struct(super::Struct::from_iter(iters))))
@@ -1660,9 +1645,7 @@ mod tests {
         let avro_schema = schema_to_avro_schema("test", &schema).unwrap();
         let struct_type = Type::Struct(StructType::new(fields));
         let struct_literal = Literal::Struct(Struct::from_iter(vec![(
-            1,
-            Some(expected_literal.clone()),
-            "col".to_string(),
+            Some(expected_literal.clone())
         )]));

         let mut writer = apache_avro::Writer::new(&avro_schema, Vec::new());
@@ -1689,9 +1672,7 @@ mod tests {
         let avro_schema = schema_to_avro_schema("test", &schema).unwrap();
         let struct_type = Type::Struct(StructType::new(fields));
         let struct_literal = Literal::Struct(Struct::from_iter(vec![(
-            1,
-            Some(literal.clone()),
-            "col".to_string(),
+            Some(literal.clone())
         )]));
         let mut writer = apache_avro::Writer::new(&avro_schema, Vec::new());
         let raw_literal = RawLiteral::try_from(struct_literal.clone(), &struct_type).unwrap();
@@ -1839,18 +1820,14 @@ mod tests {
             record,
             Literal::Struct(Struct::from_iter(vec![
                 (
-                    1,
-                    Some(Literal::Primitive(PrimitiveLiteral::Int(1))),
-                    "id".to_string(),
+                    Some(Literal::Primitive(PrimitiveLiteral::Int(1)))
                 ),
                 (
-                    2,
                     Some(Literal::Primitive(PrimitiveLiteral::String(
                         "bar".to_string(),
-                    ))),
-                    "name".to_string(),
+                    )))
                 ),
-                (3, None, "address".to_string()),
+                (None),
             ])),
             &Type::Struct(StructType::new(vec![
                 NestedField::required(1, "id", Type::Primitive(PrimitiveType::Int)).into(),
@@ -2206,18 +2183,14 @@ mod tests {
         check_convert_with_avro(
             Literal::Struct(Struct::from_iter(vec![
                 (
-                    1,
-                    Some(Literal::Primitive(PrimitiveLiteral::Int(1))),
-                    "id".to_string(),
+                    Some(Literal::Primitive(PrimitiveLiteral::Int(1)))
                 ),
                 (
-                    2,
                     Some(Literal::Primitive(PrimitiveLiteral::String(
-                        "bar".to_string(),
-                    ))),
-                    "name".to_string(),
+                        "bar".to_string()
+                    )))
                 ),
-                (3, None, "address".to_string()),
+                (None)
             ])),
             &Type::Struct(StructType::new(vec![
                 NestedField::required(1, "id", Type::Primitive(PrimitiveType::Int)).into(),