Readme and corresponding verbose help cosmetics.
MrAlexSee committed Jun 13, 2018
1 parent 1706161 commit c478dbc
Showing 2 changed files with 8 additions and 8 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -2,11 +2,11 @@

This software is called **SOPanG** (Shift-Or for Pan-Genome). It can be used for matching patterns in elastic-degenerate (ED) text (simplified pan-genome model). Authors: Aleksander Cisłak, Szymon Grabowski, Jan Holub.

-ED text is in the following format: `{A,C,}GAAT{AT,A}ATT`.
-Braces determine the start and end of each indeterminate segment (degenerate segments, i.e. segments having multiple variants), and commas delimit segment variants.
+ED text is in a format possibly best explained with an example: `{A,C,}GAAT{AT,A}ATT`.
+Braces determine the start and end of each non-deterministic segment (a segment having multiple variants), and commas delimit segment variants.
If a comma is not preceded by a string of letters or it is a trailing symbol in a segment, it indicates an empty word.
To give an example, all three notations: `{,A,C}`, `{A,,C}`, and `{A,C,}` mean the same, which is a segment which accepts either a string `A`, or a string `C`, or an empty word.
-Determinate segments (i.e. segments having a single variant) are stored as regular contiguous strings.
+Deterministic segments (i.e. segments having a single variant) are stored as regular contiguous strings.
Note that, e.g., `{AC,CG}` and `{AC, CG}` are not the same (the latter would expect a space in its second variant).
Therefore, you should not use whitespaces in the ED text if not intended.
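
The format rules above can be illustrated with a minimal parsing sketch. It is not SOPanG's actual parser: the names `Segment` and `parseEDText` are hypothetical, and the sketch assumes that braces are never nested and that a comma outside braces is an ordinary character.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// One segment of ED text, stored as its list of variants.
// A deterministic segment holds exactly one variant.
using Segment = std::vector<std::string>;

// Hypothetical helper: splits ED text such as "{A,C,}GAAT{AT,A}ATT" into segments.
std::vector<Segment> parseEDText(const std::string &text)
{
    std::vector<Segment> segments;
    std::string current; // characters of the variant (or plain string) being read
    Segment variants;    // variants collected inside the current pair of braces
    bool inBraces = false;

    for (const char c : text)
    {
        if (c == '{')
        {
            if (inBraces)
                throw std::invalid_argument("nested '{' in ED text");
            if (!current.empty())
            {
                // A plain string before the brace is a deterministic segment.
                segments.push_back(Segment{current});
                current.clear();
            }
            inBraces = true;
        }
        else if (c == '}')
        {
            if (!inBraces)
                throw std::invalid_argument("unmatched '}' in ED text");
            variants.push_back(current); // empty if the segment ends with a comma
            current.clear();
            segments.push_back(variants);
            variants.clear();
            inBraces = false;
        }
        else if (c == ',' && inBraces)
        {
            variants.push_back(current); // empty if nothing precedes the comma
            current.clear();
        }
        else
        {
            current += c; // whitespace is kept, so "{AC, CG}" yields " CG"
        }
    }

    if (inBraces)
        throw std::invalid_argument("unterminated '{' in ED text");
    if (!current.empty())
        segments.push_back(Segment{current});
    return segments;
}
```

With this sketch, `parseEDText("{A,C,}GAAT{AT,A}ATT")` yields four segments, the first of which holds the variants `A`, `C`, and the empty word. The notations `{,A,C}` and `{A,,C}` produce the same three variants in a different order.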

@@ -19,7 +19,7 @@ Add Boost library to the path for compilation by setting `BOOST_DIR` in the make
Type `make` for optimized compile.
Comment out `OPTFLAGS` in the makefile in order to disable optimization.

-Tested with gcc 64-bit 7.2.0 and Boost 1.67.0 (the latter is not time-critical, used only for parameter and data parsing and formatting) on Ubuntu 17.10 Linux version 4.13.0-36 64-bit.
+Tested with gcc 64-bit 7.2.0 and Boost 1.67.0 (the latter is not performance-critical, used only for parameter and data parsing and formatting) on Ubuntu 17.10 Linux version 4.13.0-36 64-bit.

A binary (compiled executable) for Linux is available in the release (file name `sopang`).

@@ -103,8 +103,8 @@ Parameter name | Parameter description
---------------------- | ---------------------
`nSegments` | total number of segments
`alphabet` | alphabet for character sampling
-`nDegeneratePositions` | number of segments (must be smaller than or equal to `nSegments`) which are degenerate (indeterminate), i.e., contain multiple variants
-`nMaxSegmentVariants` | maximum number of variants (`a`), the number of variants for each degenerate segment will be sampled from the interval `[2, a]`
+`nDegeneratePositions` | number of segments (must be smaller than or equal to `nSegments`) which are non-deterministic, i.e., contain multiple variants
+`nMaxSegmentVariants` | maximum number of variants (`a`), the number of variants for each non-deterministic segment will be sampled from the interval `[2, a]`
`nMaxVariantLength` | maximum length of each segment variant (`b`), the length for each variant will be sampled from the interval `[0, b]` (segments might contain empty words)
`outFile` | output file path
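
The sampling described by these parameters could look roughly like the sketch below. It is not the repository's generator. The function names and the `seed` argument are hypothetical. The uniform distributions, the random choice of which positions are non-deterministic, and the length range `[1, b]` for deterministic segments are assumptions that the table does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <string>
#include <vector>

using Segment = std::vector<std::string>; // one segment = its list of variants

// Hypothetical helper: a string of `length` characters drawn uniformly from `alphabet`.
std::string randomString(const std::string &alphabet, int length, std::mt19937 &rng)
{
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string s;
    for (int i = 0; i < length; ++i)
        s += alphabet[pick(rng)];
    return s;
}

// Hypothetical generator following the parameter table (a = nMaxSegmentVariants,
// b = nMaxVariantLength). The seed parameter is an addition for reproducibility.
std::vector<Segment> generateSegments(int nSegments, const std::string &alphabet,
                                      int nDegeneratePositions, int nMaxSegmentVariants,
                                      int nMaxVariantLength, unsigned seed)
{
    assert(nSegments >= 0 && !alphabet.empty());
    assert(nDegeneratePositions >= 0 && nDegeneratePositions <= nSegments);
    assert(nMaxSegmentVariants >= 2 && nMaxVariantLength >= 1);

    std::mt19937 rng(seed);

    // Assumption: non-deterministic positions are chosen uniformly at random.
    std::vector<int> order(nSegments);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<bool> isDegenerate(nSegments, false);
    for (int i = 0; i < nDegeneratePositions; ++i)
        isDegenerate[order[i]] = true;

    std::uniform_int_distribution<int> variantCount(2, nMaxSegmentVariants); // [2, a]
    std::uniform_int_distribution<int> variantLength(0, nMaxVariantLength);  // [0, b]
    std::uniform_int_distribution<int> plainLength(1, nMaxVariantLength);    // assumption: [1, b]

    std::vector<Segment> segments(nSegments);
    for (int i = 0; i < nSegments; ++i)
    {
        const int count = isDegenerate[i] ? variantCount(rng) : 1;
        for (int v = 0; v < count; ++v)
        {
            const int length = isDegenerate[i] ? variantLength(rng) : plainLength(rng);
            segments[i].push_back(randomString(alphabet, length, rng));
        }
    }
    return segments;
}
```

For example, `generateSegments(10, "ACGT", 4, 3, 5, 42u)` returns 10 segments, 4 of which have 2 or 3 variants of length at most 5. Variants within a segment are sampled independently, so duplicates are possible.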

4 changes: 2 additions & 2 deletions params.hpp
@@ -38,10 +38,10 @@ struct Params

const string verboseInfoString =
"This software is called SOPanG (Shift-Or for Pan-Genome). It can be used for matching patterns in elastic-degenerate (ED) text (simplified pan-genome model). Authors: Aleksander Cislak, Szymon Grabowski, Jan Holub.\n"
-"ED text is in the following format: {A,C,}GAAT{AT,A}ATT. Braces determine the start and end of each indeterminate segment (i.e. segment having multiple variants), and commas delimit segment variants.\n"
+"ED text is in a format possibly best explained with an example: {A,C,}GAAT{AT,A}ATT. Braces determine the start and end of each non-deterministic segment (i.e. segment having multiple variants), and commas delimit segment variants.\n"
"If a comma is not preceded by a string of letters or it is a trailing symbol in a segment, it indicates an empty word.\n"
"To give an example, all three notations: {,A,C}, {A,,C}, and {A,C,} mean the same, which is a segment which accepts either a string A, or a string C, or an empty word.\n"
-"Determinate segments (i.e. segments having a single variant) are stored as regular contiguous strings.\n"
+"Deterministic segments (i.e. segments having a single variant) are stored as regular contiguous strings.\n"
"Note that, e.g., {AC,CG} and {AC, CG} are not the same (the latter would expect a space in its second variant). Therefore, you should not use whitespaces in the ED text if not intended.\n"
"SOPanG returns the end positions of pattern occurrences in the ED text. More precisely, it returns the set of segment indexes in which pattern occurrences end (without possible duplicates).";
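
To make the reported output concrete, here is a naive reference sketch of matching over ED text. It tracks the set of matched pattern prefixes explicitly and is not the bit-parallel Shift-Or algorithm that SOPanG takes its name from. The name `matchEndSegments` is hypothetical, and 0-based segment indexes are an assumption, since the help text does not state the base.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Segment = std::vector<std::string>; // one segment = its list of variants

// Hypothetical reference matcher: returns the (0-based, assumed) indexes of
// segments in which at least one occurrence of `pattern` ends.
std::set<int> matchEndSegments(const std::vector<Segment> &segments,
                               const std::string &pattern)
{
    const std::size_t m = pattern.size();
    std::set<int> result; // a set, so duplicates are dropped automatically
    if (m == 0)
        return result;

    // active[j] is true if some path through the text read so far ends with
    // the first j characters of the pattern; active[0] is always true.
    std::vector<bool> active(m, false);
    active[0] = true;

    for (std::size_t s = 0; s < segments.size(); ++s)
    {
        std::vector<bool> merged(m, false); // union of states over all variants
        merged[0] = true;

        for (const std::string &variant : segments[s])
        {
            // Every variant starts from the states at the segment boundary.
            std::vector<bool> cur = active;
            for (const char c : variant)
            {
                std::vector<bool> next(m, false);
                next[0] = true;
                for (std::size_t j = 0; j < m; ++j)
                {
                    if (!cur[j] || pattern[j] != c)
                        continue;
                    if (j + 1 == m)
                        result.insert(static_cast<int>(s)); // occurrence ends here
                    else
                        next[j + 1] = true;
                }
                cur = next;
            }
            // An empty variant leaves cur == active, i.e. the states pass through.
            for (std::size_t j = 0; j < m; ++j)
                if (cur[j])
                    merged[j] = true;
        }
        active = merged;
    }
    return result;
}
```

For the example text `{A,C,}GAAT{AT,A}ATT`, split into four segments, the pattern `TA` ends in segments 2 and 3: the final `T` of `GAAT` followed by the `A` of either variant in `{AT,A}`, and the `T` of variant `AT` followed by the first `A` of `ATT`.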
