How do we convert the handsontable sample table representation to a PEP-compatible sample table representation? #376

Open
nleroy917 opened this issue Aug 22, 2024 · 3 comments

@nleroy917
Member

Overview

Probably the most bug-prone step in the PEPhub UI's sample table is converting the data representation used by handsontable into the data representation used in our database. Specifically, we need to convert an array of arrays into an array of objects. You can view the current function deployed now.

Essentially the function must convert this:

[
  ['col1', 'col2', 'col3'],
  ['s1_col1', 's1_col2', 's1_col3'],
  ['s2_col1', 's2_col2', 's2_col3'],
]

Into this:

[
  { col1: 's1_col1', col2: 's1_col2', col3: 's1_col3' },
  { col1: 's2_col1', col2: 's2_col2', col3: 's2_col3' },
]
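
For a well-formed table the conversion itself is small. Here is a minimal Python sketch, assuming the first row is the header and every row has the same length. The name `rows_to_objects` is hypothetical and is not the function currently deployed:

```python
def rows_to_objects(table):
    """Convert an array-of-arrays (header row first) into a list of dicts."""
    if not table:
        return []
    # The first row holds the column names; the remaining rows are samples.
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


table = [
    ["col1", "col2", "col3"],
    ["s1_col1", "s1_col2", "s1_col3"],
    ["s2_col1", "s2_col2", "s2_col3"],
]
samples = rows_to_objects(table)
```

This version does no validation at all, which is where the difficulties discussed in this issue come from.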

Things that make it tricky

This conversion raises several problems, or rather questions that need to be answered:

  1. What happens if a user has duplicate column names? This will lead to data loss, because attributes are overwritten.
  2. What if a user has an empty column? This leads to objects with null as an attribute (which feels wrong).
  3. What if the user skips a row? Should it be kept as a blank sample, or should we be smart enough to know that they don't want that row as a sample?
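
The first question can be reproduced in a few lines. This is only an illustration in Python; the column names and values are made up for the example:

```python
from collections import Counter

# Illustrative header with a repeated column name.
header = ["sample_name", "genome", "genome"]
row = ["s1", "hg38", "mm10"]

# dict() keeps only the last value for a repeated key, so "hg38" is silently lost.
naive = dict(zip(header, row))

# Counting header names up front makes the problem visible before any data is dropped.
duplicates = [name for name, count in Counter(header).items() if count > 1]
```

Here `naive` ends up with two keys instead of three, and `duplicates` names the offending column, which could be used to build a warning for the user.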

The conversion seems to be lossy by nature. Balancing not doing magic behind the scenes against promptly warning the user about potential errors makes the function quite difficult to write, and I am looking for assistance.

@nleroy917
Member Author

@nsheff
Contributor

nsheff commented Aug 26, 2024

Here are some thoughts:

  1. Do as much validation as possible on the server, not on the client. This way, anything posting bad data, whether from this particular client or other clients, will get a nice response. If we make the validators here, then other attempts to update things will run into issues.
  2. Consider converting from array of arrays to array of objects inside Python.

I think to solve your concern, I would write something that seems simpler than the function you linked. I would just do this:

  1. Consider first the header row.
  2. Check for duplicates. If any duplicates, return "Duplicate column header error" and fail.
  3. Check for nulls. If nulls are at the end of the array, do nothing (I guess just discard them). If nulls are in the middle of the array (there are values after nulls), return "Missing column header" error.
  4. Now look at the row data. Check for any rows with everything null. If there are rows with everything null, just remove them.
  5. Otherwise, create your objects and try to insert them into the table.
  6. Return any error from the database if it fails.
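
Steps 1 through 5 could be sketched in Python roughly as follows. This is only an illustration under some assumptions: empty cells arrive as `None`, values under discarded trailing null headers are dropped, and the names `TableValidationError` and `table_to_samples` are hypothetical. The database insert and its error handling (steps 5 and 6) are left out.

```python
class TableValidationError(ValueError):
    """Raised when the header row cannot be turned into attribute names."""


def table_to_samples(table):
    """Validate an array-of-arrays and convert it into a list of dicts."""
    if not table:
        return []

    # Step 1: consider the header row first.
    header, *rows = table
    header = list(header)

    # Step 2: fail on duplicate column names (nulls are handled separately).
    names = [name for name in header if name is not None]
    if len(set(names)) != len(names):
        raise TableValidationError("Duplicate column header error")

    # Step 3: discard nulls at the end of the header...
    while header and header[-1] is None:
        header.pop()
    # ...but fail if a null is followed by a real column name.
    if any(name is None for name in header):
        raise TableValidationError("Missing column header")

    samples = []
    for row in rows:
        # Step 4: remove rows in which every cell is null.
        if all(cell is None for cell in row):
            continue
        # Step 5: build the object; zip() stops at the trimmed header length.
        samples.append(dict(zip(header, row)))
    return samples
```

Raising an exception with the two messages from the list keeps the function free of any UI concerns, so the same check could run on the server and its message could be returned to whichever client posted the data.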

@sanghoonio
Member

@khoroshevskyi @nleroy917 how much processing should we do server-side?

sanghoonio added a commit that referenced this issue Aug 28, 2024
…use making ph_id read only makes it impossible to overwrite if pasting a larger table