thread 'main' panicked at 'a csv record: Error(UnequalLengths ... )' #91

Closed
elazar opened this issue Oct 8, 2021 · 5 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@elazar
elazar commented Oct 8, 2021

Using the 0.0.20 Homebrew package:

# Execution with error and full backtrace
RUST_BACKTRACE=full tidy-viewer Chase9989_Activity_20211008.CSV
thread 'main' panicked at 'a csv record: Error(UnequalLengths { pos: Some(Position { byte: 69, line: 1, record: 1 }), expected_len: 7, len: 8 })', src/main.rs:185:20
stack backtrace:
   0:        0x1100652b1 - __mh_execute_header
   1:        0x110081b7b - __mh_execute_header
   2:        0x110061f7a - __mh_execute_header
   3:        0x1100668c5 - __mh_execute_header
   4:        0x1100664af - __mh_execute_header
   5:        0x110066fb0 - __mh_execute_header
   6:        0x110066a4e - __mh_execute_header
   7:        0x110065737 - __mh_execute_header
   8:        0x1100669ba - __mh_execute_header
   9:        0x11008eacf - __mh_execute_header
  10:        0x11008ebb5 - __mh_execute_header
  11:        0x10ff78137 - __mh_execute_header
  12:        0x10ff6dc54 - __mh_execute_header
  13:        0x10ff714d6 - __mh_execute_header
  14:        0x10ff714ec - __mh_execute_header
  15:        0x110064b54 - __mh_execute_header
  16:        0x10ff70bd9 - __mh_execute_header

# Offending line - plain ASCII
head -n 1 Chase9989_Activity_20211008.CSV
Details,Posting Date,Description,Amount,Type,Balance,Check or Slip #

# Offending line - hexdump (error happens with both Windows and UNIX line endings)
head -n 1 Chase9989_Activity_20211008.CSV | hexdump -C
00000000  44 65 74 61 69 6c 73 2c  50 6f 73 74 69 6e 67 20  |Details,Posting |
00000010  44 61 74 65 2c 44 65 73  63 72 69 70 74 69 6f 6e  |Date,Description|
00000020  2c 41 6d 6f 75 6e 74 2c  54 79 70 65 2c 42 61 6c  |,Amount,Type,Bal|
00000030  61 6e 63 65 2c 43 68 65  63 6b 20 6f 72 20 53 6c  |ance,Check or Sl|
00000040  69 70 20 23 0d 0a                                 |ip #..|
00000046
@alexhallam
Owner

Thanks for putting so much effort into this issue. I am stumped on this problem. Nothing appears to be wrong with your CSV, and I was able to run it on Linux.

Are you able to run the examples in the readme without fail?

Runs without error on my machine

> cat Chase9989_Activity_20211008.CSV
Details,Posting Date,Description,Amount,Type,Balance,Check or Slip #

Result

> tv Chase9989_Activity_20211008.CSV
      tv dim: 0 x 7
      Details Posting Date Description Amount Type Balance Check or Slip #
# hexdump 
> head -n 1 Chase9989_Activity_20211008.CSV | hexdump -C
00000000  44 65 74 61 69 6c 73 2c  50 6f 73 74 69 6e 67 20  |Details,Posting |
00000010  44 61 74 65 2c 44 65 73  63 72 69 70 74 69 6f 6e  |Date,Description|
00000020  2c 41 6d 6f 75 6e 74 2c  54 79 70 65 2c 42 61 6c  |,Amount,Type,Bal|
00000030  61 6e 63 65 2c 43 68 65  63 6b 20 6f 72 20 53 6c  |ance,Check or Sl|
00000040  69 70 20 23 0a                                    |ip #.|
00000045

@alexhallam added the bug and help wanted labels Oct 8, 2021
@alexhallam
Owner

  1. Can you run the following?
wget https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/diamonds.csv
tv diamonds.csv
  2. What can you tell me about your system architecture?

I would like to set up a virtual machine that matches yours to test.

@elazar
Author

elazar commented Oct 9, 2021

@alexhallam Your test file seems to work.

I did a bit more digging and it appears that the first line works on its own, but the first two lines don't. That explains why you weren't able to reproduce it.

cat Chase9989_Activity_20211008.CSV| head -n 1 | tidy-viewer

      tv dim: 0 x 7
      Details Posting Date Description Amount Type Balance Check or Slip #

cat Chase9989_Activity_20211008.CSV| head -n 2 | tidy-viewer
thread 'main' panicked at 'a csv record: Error(UnequalLengths { pos: Some(Position { byte: 69, line: 2, record: 1 }), expected_len: 7, len: 8 })', src/main.rs:185:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

More info on the offending line:

cat Chase9989_Activity_20211008.CSV| awk '{if(NR==2) print $0}'
DEBIT,10/08/2021,"POS DEBIT                AT&T   *PAYMENT           800-288-2020 FL",-63.33,MISC_DEBIT, ,,

cat Chase9989_Activity_20211008.CSV| awk '{if(NR==2) print $0}' | hexdump -C
00000000  44 45 42 49 54 2c 31 30  2f 30 38 2f 32 30 32 31  |DEBIT,10/08/2021|
00000010  2c 22 50 4f 53 20 44 45  42 49 54 20 20 20 20 20  |,"POS DEBIT     |
00000020  20 20 20 20 20 20 20 20  20 20 20 41 54 26 54 20  |           AT&T |
00000030  20 20 2a 50 41 59 4d 45  4e 54 20 20 20 20 20 20  |  *PAYMENT      |
00000040  20 20 20 20 20 38 30 30  2d 32 38 38 2d 32 30 32  |     800-288-202|
00000050  30 20 46 4c 22 2c 2d 36  33 2e 33 33 2c 4d 49 53  |0 FL",-63.33,MIS|
00000060  43 5f 44 45 42 49 54 2c  20 2c 2c 0a              |C_DEBIT, ,,.|
0000006c

What I was able to dig up on my system architecture:

sw_vers
ProductName:	macOS
ProductVersion:	11.6
BuildVersion:	20G165

arch
i386

uname -a
Darwin Bees-MacBook-Air.local 20.6.0 Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64 x86_64

@alexhallam
Owner

alexhallam commented Oct 9, 2021

I see now. Thanks for the additional info.

The problem here is that there is an additional comma on the second row. That is telling tv that there are 7 columns in the header, but 8 columns in the second row.

This is not a proper csv.
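To locate rows like this before handing a file to tv, a minimal sketch using Python's standard csv module is given here. It is not part of tidy-viewer; the function name find_ragged_rows is hypothetical, and the sample text is the two lines quoted earlier in this thread.

```python
import csv
import io


def find_ragged_rows(text):
    """Return (record_number, field_count) for each data row whose
    field count differs from the header. The header is record 0."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    return [
        (number, len(row))
        for number, row in enumerate(reader, start=1)
        if len(row) != expected
    ]


# The two lines from the report: 7 header fields, 8 fields in the data row.
sample = (
    "Details,Posting Date,Description,Amount,Type,Balance,Check or Slip #\n"
    'DEBIT,10/08/2021,"POS DEBIT                AT&T   *PAYMENT'
    '           800-288-2020 FL",-63.33,MISC_DEBIT, ,,\n'
)

print(find_ragged_rows(sample))
```

The quoted Description field is parsed as a single field, so the extra field here comes only from the trailing comma.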

How to handle these poorly formatted CSV files is tracked in open issue #79. For now, you can remove the last column from your data set or add a new column to the header.

Here is a working example where I removed the last comma on the second row.

> cat Chase9989_Activity_20211008.CSV 
Details,Posting Date,Description,Amount,Type,Balance,Check or Slip #
DEBIT,10/08/2021,"POS DEBIT                AT&T   *PAYMENT           800-288-2020 FL",-63.33,MISC_DEBIT, ,
> tv Chase9989_Activity_20211008.CSV -u 9

      tv dim: 1 x 7
      Details Posting … Descript… Amount Type      Balance Check or… 
1     DEBIT   10/08/20… POS DEBI… -63.3  MISC_DEB…         NA        

I added the -u 9 to make the max column width 9 just to truncate some of those long cells.
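For files with many such rows, the manual fix above could be automated along these lines. This is only a sketch under assumptions: it uses Python's standard csv module rather than anything in tv, the function name trim_trailing_empty_fields is hypothetical, and it drops surplus fields only when they are empty so that no data is lost.

```python
import csv
import io


def trim_trailing_empty_fields(text):
    """Rewrite CSV text so rows longer than the header lose their
    surplus trailing fields, but only while those fields are empty."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return ""
    writer.writerow(header)
    width = len(header)
    for row in reader:
        # Stop trimming as soon as a non-empty surplus field is found.
        while len(row) > width and row[-1] == "":
            row.pop()
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Details,Posting Date,Description,Amount,Type,Balance,Check or Slip #\n"
    'DEBIT,10/08/2021,"POS DEBIT                AT&T   *PAYMENT'
    '           800-288-2020 FL",-63.33,MISC_DEBIT, ,,\n'
)

cleaned = trim_trailing_empty_fields(sample)
print(cleaned)
```

Rows whose surplus fields contain data are left unchanged, so they would still need a wider header or a manual fix.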

@alexhallam
Owner

Same as #79
