Tabs are stripped #20

emilyyyylime · 2024-09-04T06:29:50Z

The tab character \t (U+0009 Horizontal Tabulation) gets stripped by this crate (for example strip_str("\t") -> ""), which is not what I expected.

Is the tab character considered an ANSI escape sequence by this crate? If so the documentation should make it clearer.

luser · 2024-09-04T13:28:15Z

> Is the tab character considered an ANSI escape sequence by this crate? If so the documentation should make it clearer.

Hrm! Not intentionally!

luser · 2024-09-04T13:35:45Z

Oh, I guess I didn't dive deeply enough into how all of this works under the hood:

strip-ansi-escapes/src/lib.rs, lines 168 to 173 in 830038d:

    fn execute(&mut self, byte: u8) {
        // We only care about executing linefeeds.
        if byte == b'\n' {
            self.err = writeln!(self.writer).err();
        }
    }

execute gets called for all C0 and C1 control characters, and the tab character is in C0. TIL!
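For reference, the C0 and C1 ranges mentioned here can be written down as two small predicates. This is only an illustrative sketch: the helper names `is_c0_control` and `is_c1_control` are made up for this note and are not part of the crate or of `vte`.

```rust
/// C0 control characters occupy bytes 0x00..=0x1F.
/// (Hypothetical helper, for illustration only.)
fn is_c0_control(byte: u8) -> bool {
    byte <= 0x1F
}

/// C1 control characters occupy 0x80..=0x9F.
/// (Hypothetical helper, for illustration only.)
fn is_c1_control(byte: u8) -> bool {
    (0x80..=0x9F).contains(&byte)
}
```

Under these definitions `is_c0_control(b'\t')` is true, which is why a callback that only handles `\n` ends up dropping tabs.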

Would you like to submit a patch? I migrated to a new PC a few months ago and don't have this repo checked out in a convenient place at the moment.

emilyyyylime · 2024-09-04T14:33:24Z

@luser which characters exactly do we need to special-case here? Only tabs? What about vertical tabs, carriage returns, NULs, and other characters that have mostly lost their meaning today?

emilyyyylime · 2024-09-04T15:00:50Z

Actually, could we just print every byte passed to execute? It seems the actual ESCAPE character and the sequence that follows it aren't passed to it.

luser · 2024-09-04T20:47:52Z

> which characters exactly do we need to special case here? Only tabs?

Adding tabs to the special-case list so that it consists of just \n and \t is what I am suggesting, yes. I don't think any of the others are going to do anything particularly useful, honestly.
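The change suggested here might look roughly like the following. It is a sketch, not the actual patch: `Performer` is a stand-in type invented for this note (the real performer implements `vte::Perform`), and writing the byte directly with `write_all` is an assumption rather than what the crate does.

```rust
use std::io::{self, Write};

/// Stand-in for the crate's performer type (hypothetical; the real one
/// implements `vte::Perform`).
struct Performer<W: Write> {
    writer: W,
    err: Option<io::Error>,
}

impl<W: Write> Performer<W> {
    /// In this model, called once for each control byte the parser sees.
    fn execute(&mut self, byte: u8) {
        // Pass through linefeeds and tabs; drop every other control byte.
        if matches!(byte, b'\n' | b'\t') {
            if let Err(e) = self.writer.write_all(&[byte]) {
                // Record the failure without clearing an earlier error.
                self.err = Some(e);
            }
        }
    }
}
```

Feeding `\t`, BEL, `\n`, and `\r` to `execute` on a `Performer` backed by a `Vec<u8>` leaves only `\t` and `\n` in the buffer.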

> Actually could we just print every byte passed to execute?

While we could, I think that given the stated purpose of this crate it's reasonable to be judicious in what we pass through. \t feels justifiable to transmit. I would want to see a proposed use case for including any others. This crate has been in reasonably widespread use for 6 years now and you seem to be the first person who has noticed that it was omitting tabs (or at least the first to take the time to file a bug on it). It seems reasonable to state that the current state of affairs isn't causing many highly-visible problems for people. :) To be clear—this is entirely my opinion, I don't have any data to back it up.

What would probably be useful would be to document the behavior we have implemented somewhere, maybe in the top-level crate docs?

strip-ansi-escapes/src/lib.rs, line 3 in 830038d:

    //! This can be used to take output from a program that includes escape sequences and write

You don't have to write these docs if you submit a PR, I'm just writing this down while I'm thinking about it. We could link to the ANSI escape code page on Wikipedia, and describe the categories of what this library strips:

- C0 and C1 control characters, except for \n and \t.
- All ESC sequences, which start with byte 0x1B; this includes all CSI and OSC sequences.
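The two categories above can be modelled with a small standalone function. This is a deliberately simplified sketch and not the crate's implementation, which relies on the `vte` parser. The name `strip_sketch` is hypothetical, and sequences such as DCS are not handled.

```rust
/// Simplified model of the stripping rules: drop ESC sequences and all
/// C0/C1 controls except `\n` and `\t`. NOT the crate's real implementation.
fn strip_sketch(input: &str) -> String {
    let mut out = String::new();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\u{1b}' => match chars.next() {
                // CSI: ESC [ ... ending with a final byte in 0x40..=0x7E.
                Some('[') => {
                    for c in chars.by_ref() {
                        if ('\u{40}'..='\u{7e}').contains(&c) {
                            break;
                        }
                    }
                }
                // OSC: ESC ] ... terminated by BEL or by ST (ESC \).
                Some(']') => {
                    while let Some(c) = chars.next() {
                        if c == '\u{07}' {
                            break;
                        }
                        if c == '\u{1b}' {
                            if chars.peek() == Some(&'\\') {
                                chars.next();
                            }
                            break;
                        }
                    }
                }
                // Other escapes: optional intermediates 0x20..=0x2F,
                // then a single final character.
                Some(first) => {
                    let mut cur = first;
                    while ('\u{20}'..='\u{2f}').contains(&cur) {
                        match chars.next() {
                            Some(next) => cur = next,
                            None => break,
                        }
                    }
                }
                None => {}
            },
            // The two controls that are allowed through.
            '\n' | '\t' => out.push(c),
            // Remaining C0 (0x00..=0x1F) and C1 (0x80..=0x9F) controls.
            c if c <= '\u{1f}' || ('\u{80}'..='\u{9f}').contains(&c) => {}
            c => out.push(c),
        }
    }
    out
}
```

With this model, `strip_sketch("\x1b[31mred\x1b[0m\tok\n")` keeps the tab and the newline while the colour codes disappear.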

Nice catch, BTW! Thanks for taking the time to report it!

emilyyyylime · 2024-09-28T16:32:42Z

np! Sorry for taking so long to come back to this. I mostly empathise with your concerns about introducing changes to such a widely used crate, but at the very least, some characters that I think appear reasonably often are \r, \0, and possibly the rest of the common C-style escapes \a, \b, \v, \f (bell, backspace, vertical tab, and form feed). Would you be interested in whitelisting them as well? Another option would be to provide different functions with different granularity levels of escaping, allowing for no downstream breakage. (I still think the default functionality should keep \t, \r, and \0, however.)

emilyyyylime · 2024-09-28T16:38:26Z

It actually seems that the minimal set of bytes I'd expect to come through exactly overlaps with the escape sequences that Rust provides: https://doc.rust-lang.org/reference/tokens.html#byte-escapes. This seems to me to indicate some universality to them, at least within the Rust ecosystem. As for \a, \b, \v, and \f, I'm willing to compromise.
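One way to picture the "granularity levels" idea together with the Rust-escape set is an allow-list selected by a policy value. Everything here is hypothetical: neither the `Passthrough` enum nor its variants exist in the crate, and the variant names are only placeholders for the options discussed in this thread.

```rust
/// Hypothetical passthrough policies; none of these exist in the crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Passthrough {
    /// Only `\n` (the behaviour described at the top of this issue).
    NewlineOnly,
    /// `\n` and `\t`.
    NewlineAndTab,
    /// The control characters that have Rust byte escapes:
    /// `\n`, `\r`, `\t`, and `\0`.
    RustEscapes,
}

impl Passthrough {
    /// Returns true if `byte` should be written through unchanged.
    fn allows(self, byte: u8) -> bool {
        match self {
            Passthrough::NewlineOnly => byte == b'\n',
            Passthrough::NewlineAndTab => matches!(byte, b'\n' | b'\t'),
            Passthrough::RustEscapes => matches!(byte, b'\n' | b'\r' | b'\t' | b'\0'),
        }
    }
}
```

A callback like `execute` could then consult `policy.allows(byte)` instead of hard-coding the comparison, so existing callers keep their current behaviour unless they opt in to a wider policy.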

emilyyyylime · 2024-09-28T16:52:46Z

I also confirmed that the characters we strip are exactly the range 0..32 (and also bytes that aren't valid UTF-8, in the case of the plain strip(); not sure if this is of any concern).

emilyyyylime added a commit to emilyyyylime/typst-ansi-hl that referenced this issue Sep 4, 2024

Temporarily work around luser/strip-ansi-escapes#20

0e0167b

emilyyyylime mentioned this issue Sep 4, 2024

Add --[no-]unindent for removing leading indentation frozolotl/typst-ansi-hl#7

Merged

frozolotl pushed a commit to frozolotl/typst-ansi-hl that referenced this issue Oct 3, 2024

Temporarily work around luser/strip-ansi-escapes#20

3684626
