-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LC-501: standardize stage properties to camelCase #203
base: main
Are you sure you want to change the base?
Conversation
lucille-core/src/main/java/com/kmwllc/lucille/util/CamelCaseConfigConverter.java
Outdated
Show resolved
Hide resolved
lucille-core/src/main/java/com/kmwllc/lucille/util/CamelCaseConfigConverter.java
Outdated
Show resolved
Hide resolved
lucille-core/src/main/java/com/kmwllc/lucille/util/CamelCaseConfigConverter.java
Outdated
Show resolved
Hide resolved
lucille-core/src/main/java/com/kmwllc/lucille/util/CamelCaseConfigConverter.java
Outdated
Show resolved
Hide resolved
lucille-core/src/main/java/com/kmwllc/lucille/util/CamelCaseConfigConverter.java
Outdated
Show resolved
Hide resolved
lucille-core/src/main/java/com/kmwllc/lucille/util/CamelCaseConfigConverter.java
Show resolved
Hide resolved
lucille-core/src/main/java/com/kmwllc/lucille/util/CamelCaseConfigConverter.java
Show resolved
Hide resolved
import org.slf4j.LoggerFactory; | ||
import org.apache.commons.text.CaseUtils; | ||
|
||
public class CamelCaseConfigConverter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need a unit test for this. Can be a simple test that runs the conversion on one input conf file and confirms that the output matches what we expect. The conf file should have some examples of edge cases like multiple instances of the same stage property name on one line (used as a property and as a value) and some other snake case properties in connectors or elsewhere that shouldn't be converted because they're not stage properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created test for this in 1134f0a
@@ -53,25 +53,25 @@ | |||
* - "offset" : number of character offset from start of document | |||
* - "length" : number of characters in this chunk | |||
* - "chunk_number" : chunk number | |||
* - "total_chunk_number" : total chunk number produced from parent document | |||
* - "total_chunks" : total chunk number produced from parent document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Should these document field names be camel case too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kiratraynor to answer both of your questions. I did have the same thoughts, and short answer is, that we can, but this would make it a nightmare to create a script for conversion. Some external libraries (like Tika for TikaExtractor) produces snake_case for lucille documents. And so any other stages further down the pipeline that takes in the name of the field (like source: "field_produced_by_tika") cannot be changed in the script convertor. Also, there is the case where CSVConnector, SolrConnector, VFSConnector, DatabaseConnector could produce fields with snake_case, meaning that the script would have to know which fields not to convert in cases like this, especially for CSVConnector and DatabaseConnector, which is dependent on the input.
I'm not sure if we'd want to change these or not, but there are a few examples of stages that have values that are in snake case like |
No description provided.