Tabular files: semicolon is not supported as a separator by ".csv" files #37

Open
1 task
vkush opened this issue Dec 5, 2024 · 1 comment
Labels
external This is an external issue (e.g. on Dataverse itself), we will not work on it but only monitor it repo4cat NFDI4Cat Central Data Repository

Comments

@vkush
Member

vkush commented Dec 5, 2024

Catalysis researchers use the semicolon ";" as a separator in .csv files. This is not supported during ingest of tabular files, where only the comma "," is currently accepted (because of "Comma Separated Values"). According to Wikipedia, .csv files can also use other separators, not only a comma.
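To illustrate the difference, here is a minimal sketch that guesses which separator a file uses before it is uploaded. It relies only on `csv.Sniffer` from the Python standard library; the function name `detect_delimiter`, the candidate list, and the sample data are assumptions made up for this example and are not part of Dataverse or its ingest code.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t") -> str:
    """Guess the field separator of a CSV text sample.

    Only the characters in `candidates` are considered. csv.Sniffer
    raises csv.Error if it cannot settle on a delimiter.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Hypothetical sample in the style of a German-locale export:
# ';' separates fields and ',' is the decimal mark.
german_style = "temperature;pressure;yield\n300;1,5;0,82\n350;2,0;0,91\n"
comma_style = "temperature,pressure,yield\n300,1.5,0.82\n350,2.0,0.91\n"

print(detect_delimiter(german_style))
print(detect_delimiter(comma_style))
```

The sniffer is a heuristic, so its guess may be wrong on very short or irregular files; it is better suited to warning a user than to silently choosing how to parse.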

Issues are known and under discussion:

  • We also have to check whether additional spaces and multiple underscores are interpreted properly, especially once all the issues above are solved. With a proper comma-separated .csv file this currently works; use test-file-spaces-underscores.csv as an example.
@vkush vkush added external This is an external issue (e.g. on Dataverse itself), we will not work on it but only monitor it repo4cat NFDI4Cat Central Data Repository labels Dec 5, 2024
@dalito
Member

dalito commented Dec 5, 2024

German researchers without much of a data background may create files with ; as the field separator (and , as the decimal separator). But this should be avoided!

Unfortunately, Excel produces the ; when the German locale is set. I have also seen problems with date and time formats in the past when the German locale was used.

Instead of adapting to such problematic csv files, I would prefer a validator that rejects or at least flags incompatible csv files. Do the BasCat validation tools maybe address this? @khatamirad
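As a rough idea of what such a flagging step could look like, here is a small sketch using only the Python standard library. It is not the BasCat tooling and not anything Dataverse provides; the function name `validate_csv` and the two checks are assumptions chosen for illustration.

```python
import csv
import io


def validate_csv(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    The text is parsed as comma-separated, which is what the ingest expects.
    """
    rows = list(csv.reader(io.StringIO(text)))  # default delimiter is ','
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]

    # A single header field containing ';' suggests a semicolon-separated file.
    if len(header) == 1 and ";" in header[0]:
        problems.append("header looks semicolon-separated; expected ','")

    # Every data row should have as many fields as the header.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number}: {len(row)} fields, expected {len(header)}"
            )
    return problems


# Hypothetical German-locale export: ';' separator, ',' decimal mark.
for problem in validate_csv("a;b;c\n1;2,5;3\n"):
    print(problem)
```

A file that passes returns an empty list, so a caller could reject the upload whenever the list is non-empty or just show the messages as warnings. Date and time formats are not checked in this sketch.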
