Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper wrapping in str_format #8

Open
Dentomologist opened this issue May 5, 2022 · 1 comment
Open

Improper wrapping in str_format #8

Dentomologist opened this issue May 5, 2022 · 1 comment

Comments

@Dentomologist
Copy link

There are several cases where str_format improperly handles wrapping. Setup:

#include <string>
#include <vector>

#include <OptionParser.h>

int main(int argc, char* argv[])
{
  std::vector<std::string> descriptions{
    // buggy (no padding): newline with no space afterward
    "Lorem ipsum dolor\nsit",
    // correct: newline with space afterward
    "Lorem ipsum dolor\nsit ",
    // buggy (no wrap): autowrapped with no spaces afterward
    "Lorem ipsum dolor sit amet, consectetur adipiscing Loremipsumdolorsitamet,consecteturadipiscing",
    // buggy (no padding): autowrapped with one space afterward
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed",
    // correct: autowrapped with two spaces afterward
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed ",
    // buggy (no padding when wrapping in window, maybe ok if window is wide enough to fit word, depending on desired behavior):
    //      string longer than len with no whitespace
    "Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtempor",
    // buggy (no wrapping, whole string repeated with no padding):
    //      string with exactly one whitespace in position after len
    "Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtempor foobar",
    // correct, after first line: newline with no spaces afterward
    "Lorem ipsum\ndolor\nsit",
    // buggy (no wrapping), after first line; autowrapped with no spaces afterward
    "Lorem\nipsum dolor sit amet, consectetur adipiscing Loremipsumdolorsitamet,consecteturadipiscing"
  };

  optparse::OptionParser parser;
  char flag = 'a';
  for (std::string desc : descriptions)
  {
    std::string long_flag = std::string{flag} + "oo";
    parser.add_option(std::string("-") + flag, std::string("--") + long_flag)
      .action("store_true")
      .help(desc);
    flag++;
  }

  std::vector<std::string> args{"foobar", "-h"};
  auto options = parser.parse_args(args);

  return 0;
}

On Windows the column width is 80, which results in the following str_format parameters:

len = 80
pre = 24

Some of the examples below depend on these values, but any column width will show similar behavior for other strings.

Annotated output:

Options:
  -h, --help            show this help message and exit

			// buggy (no padding): newline with no space afterward
			// "Lorem ipsum dolor\nsit",		
  -a, --aoo             Lorem ipsum dolor
sit

			// correct: newline with space afterward
			// "Lorem ipsum dolor\nsit ",
  -b, --boo             Lorem ipsum dolor
                        sit

			// buggy (no wrap): autowrapped with no spaces afterward
			// "Lorem ipsum dolor sit amet, consectetur adipiscing Loremipsumdolorsitamet,consecteturadipiscing",
  -c, --coo             Lorem ipsum dolor sit amet, consectetur adipiscing Loremipsumdolorsitamet,consecteturadipiscing
			
			// buggy (no padding): autowrapped with one space afterward
			// "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed",
  -d, --doo             Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed

			// correct: autowrapped with two spaces afterward
			// "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed ",
  -e, --eoo             Lorem ipsum dolor sit amet, consectetur adipiscing
                        elit, sed
			
			// buggy (no padding when wrapping in window, maybe ok if window is wide enough to fit word, depending on desired behavior)
			// "Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtempor",
  -f, --foo             Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtempor
			
			// buggy (no wrap, whole string repeated): string with exactly one whitespace in position after len
			// "Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtempor foobar",
  -g, --goo             Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtempor foobar
Loremipsumdolorsitamet,consecteturadipiscingelit,seddoeiusmodtempor foobar

			// correct: after first line, newline with no spaces afterward
			// "Lorem ipsum\ndolor\nsit",
  -h, --hoo             Lorem ipsum
                        dolor
                        sit

			// buggy (no wrapping): after first line, autowrapped with no spaces afterward
			// "\nLorem ipsum dolor sit amet, consectetur adipiscing Loremipsumdolorsitamet,consecteturadipiscing"
  -i, --ioo
                        Lorem
                        ipsum dolor sit amet, consectetur adipiscing Loremipsumdolorsitamet,consecteturadipiscing

Diagnosis:
After printing some (possibly empty) portion of the string, if str_format looks ahead and doesn't find any additional whitespace it breaks out of the scanning loop and executes:

  ss << p << s.substr(linestart) << endl;
  return ss.str();

... where p is the whitespace prefix string. On the first line p is the empty string, and afterward it has size == pre and filled with spaces.
Issues:

  • There's a gap between where the end of the line is detected and when p is filled in with whitespace, and if there's no more whitespace after the current position the loop breaks during that gap, causing ss << p to output the empty string instead of the filled pre string.
  • The final output doesn't check to see if the current line is going to be too long and hence need another line break.
  • If the first whitespace character in the string is not a newline and is at a position after the wrapping point then the entire string will be output without any wrapping, followed by the entire string wrapped normally. This happens because pos and linestart (both size_t aka unsigned variables) are initialized to 0, and in this situation they will still be 0 when the following is executed:
ss << p << s.substr(linestart, pos - linestart - 1) << endl;

pos - linestart - 1 is 0 - 0 - 1, and unsigned arithmetic results in size_t::max aka string::npos, causing substring to output the entire line. linestart is then set to pos (still 0), pos is set to one position after the first space, and things proceed normally from there.

@weisslj
Copy link
Owner

weisslj commented May 7, 2022

@Dentomologist thanks for the detailed report, I try to look into this! If you want to write a patch, I am happy to review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants