Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Translations #74

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

Robbe7730
Copy link

As requested in #72 this PR add support for translations through the following methods in Gtfs:

  • get_stop_translated
  • get_trip_translated
  • get_route_translated

Future objects can be translated similarly to Stop, Trip and Route by implementing the Translatable trait using the translate function in Gtfs.

Copy link
Collaborator

@antoine-de antoine-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi thanks for the contribution!

I feel the API could be a bit simpler and I feel we could offer a bit more type checks and performance by limiting the number of strings 🤔

I'm not used to the GTFS translation part so maybe I'm off the mark but I'll throw some ideas.

Note: I feel that since the record_id way of getting to know the id of the object is written in the documentation it should be handled by the library

maybe

impl Translation { //<- GTFS owns an instance to this Translation object
  pub fn translate_field<T: Translatable>(&self, field: TranslatableField, obj: &T, original_value: &str, lang: &str) -> Option<String> {
   let record_id = obj.record_id(); //<- added in the Translatable trait, returns a TranslationRecordId being a (&str, &Option<String>) based on the record_id spec
...
}
}

// at first I thought we could split the ObjectType from the field, but I'm not sure it's worth the added complexity. Any thoughts on this?
pub enum TranslatableField {
   //we hardcode all possible translatable fields (not much I think) 
   RouteShortName,
   RouteLongName,
   StopCode,
...
}

// and Translation could be something like
// Note: I'm not really sure about the nested hashmap, but this makes it easy to take references and not allocate new strings no?
struct Translation {
by_id: HashMap<TranslatableField, HashMap<Lang, HashMap<TranslationRecordId, String>>>,
by_value: HashMap<TranslatableField, HashMap<Lang, HashMap<String, String>>>,
}

It's all speudo code, I don't know if it's understandable :/

What do you think about this?

I think this feature is a bit complex so we need to think carefully about its design.

#[derive(Derivative)]
#[derivative(Default(bound = ""))]
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)]
pub struct Translation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can rename this to RawTranslation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! That would make more sense 😄

@Robbe7730
Copy link
Author

I feel the API could be a bit simpler and I feel we could offer a bit more type checks and performance by limiting the number of strings 🤔

Yeah, I tought so too 😛 stringly typed APIs are pretty bad in general, but it's the best I could do. The specification does not striclty specify which fields cannot be translated, only which should not be translated, so technically all fields can be translated. (That is if I'm interpreting that correctly of course...).

Note: I feel that since the record_id way of getting to know the id of the object is written in the documentation it should be handled by the library

That's true, however I'm not sure how you would go about telling gtfs.translate which type it is, therefor I looked up the key in the struct's own .translate(). Sidenote which may not have been clear from my original post: the gtfs.translate() function is not meant to be used outside of the Translatable interface, maybe we should put that somewhere else. The users would only ever use the get_xxx_tranlated or xxx.translate(gtfs) functions.

On another note: Halfway through implementing this I realized that the data I was using uses a completely off-specification structure for translations, so I ended up having to fork this library and implement my own parsing. This means that the drive to continue implementing and improving may not be quite as high as it was when I started, so if it takes a while to respond or make changes, that's why 🙂 I do however want to deliver a decent contribution, so I'll try my best to get this working as good as it can!

@kylerchin
Copy link
Contributor

What is left on this pull req to do? I would love to tackle this! This seems like the last major item to complete to make this library production-ready for user facing applications

@antoine-de
Copy link
Collaborator

Nice! I think @Robbe7730 is not available anymore, it is good if you can spare some times to finish this PR 🎉

The RawTranslation is nice as it is, but it should be nice to have an easy to use struct.

If you plan to use it, do you think of a convenient way to represent the data?

@kylerchin
Copy link
Contributor

I'm still brainstorming ideas, what do you want for the regular Gtfs?

@kylerchin
Copy link
Contributor

At minimum, I want RawGtfs table_name to be an enum. The other fields, I might do an Enum inside an enum? I'll do a MapReduce analysis of my entire global agency dataset to see what combinations exist globally.

@kylerchin
Copy link
Contributor

kylerchin commented Jan 13, 2024

My friend proposed this solution

#[derive(Hash, PartialEq)]
enum Translatable<'a> {
  record_id(&'a str),
  field_value(&'a str),
}
struct Translation {
  mapping: HashMap<(Lang,FieldName,Translatable), String>
}

thoughts? @antoine-de

@antoine-de
Copy link
Collaborator

seems fine for me 👍

Maybe we can also think about how we would like to use them as end users?

Would the solution proposed above fit your need @kylerchin ?

What would you prefer?
Something like:

Solution 1

let fr_title: Option<&str> = gtfs.translate_field(StopField::Title, &my_stop, Lang::FR);

-> how can we enforce in the type system that we don't give a RouteField?

Solution 2

let fr_title: Option<&str> = my_stop.translate(Lang::FR, StopField::Title);

-> the problem is for this we would need to store the translations in each items, and since there are translatable field on stop times, I think the memory cost will be too big.

Solution 3

let fr_title: Option<&str> = gtfs.translate_stop_title(my_stop, Lang::FR);

Orthogonal decision

Should Lang be an Enum or a raw str

Enum enforce better typing, but might be a pain to maintain with all possible lang?

Return types

Should the translation return Option<&str> or a &str defaulting to the main lang if no translation is available?
I'd prefer an explicit Option, but it might be more difficult to use (maybe we can have different functions to choose this?).
And if we want a fallback to the original field, should we pass it explicitly like

let fr_title: Option<&str> = gtfs.translate_field(StopField::Title, &my_stop, Lang::FR, &my_stop.title);

leading to potential errors, or should we be smart and have a way to get the title from the enum.

Side note

On a side note, maybe we can first focus on finishing the Raw part of the translations, where we just need to map the specification, without thinking about usage, and iterate of a nice API afterward?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants