Tsumugu exclusion/inclusion logic and rules

Currently tsumugu follows a simple algorithm to determine whether a path should be completely excluded, partially excluded, or included:

  1. When a regex is parsed, a rev_inner regex is generated by replacing variables (${UBUNTU_LTS}, etc.) with (?<distro_ver>.+) (i.e., match anything). rev_inner is then used like this:

    pub fn is_others_match(&self, text: &str) -> bool {
        !self.inner.is_match(text) && self.rev_inner.is_match(text)
    }
  2. Next, the user's exclusions and inclusions are preprocessed. Each exclusion that is a prefix of any inclusion is put into list_only_regexes; all other exclusions are put into instant_stop_regexes. All inclusions go into include_regexes.

  3. While worker threads are handling listing requests:

    1. The path is first checked against instant_stop_regexes and include_regexes:

      for regex in &self.instant_stop_regexes {
          if regex.is_match(text) {
              return Comparison::Stop;
          }
      }
      for regex in &self.include_regexes {
          if regex.is_match(text) {
              return Comparison::Ok;
          }
      }
    2. Then the path is checked against the rev_inner regex with is_others_match(); if it matches, the path is also completely excluded (a fast shortcut).

      This is used for cases like Fedora, which has many versions (currently from 1 to 40). Listing version folders that are not in ${FEDORA_CURRENT} wastes time and bandwidth, and this trick skips those unmatched versions.

    3. Finally, if the path matches list_only_regexes, files under this directory are ignored (unless they match include_regexes), but subdirectories are still listed. Paths that match no regex are included as usual.
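The variable replacement in step 1 can be sketched with plain string handling. This is a minimal sketch, not tsumugu's actual code: the helper name `rev_inner_source` and the example pattern in the note below are hypothetical, and it assumes variables are always written as `${NAME}`. The returned string would still have to be compiled into a regex.

```rust
/// Hypothetical helper: build the pattern source for `rev_inner` by replacing
/// every `${NAME}` placeholder with a catch-all named group.
fn rev_inner_source(pattern: &str) -> String {
    const CATCH_ALL: &str = "(?<distro_ver>.+)";
    let mut out = String::with_capacity(pattern.len());
    let mut rest = pattern;
    while let Some(start) = rest.find("${") {
        match rest[start..].find('}') {
            Some(len) => {
                // Copy the text before the placeholder, then the catch-all group.
                out.push_str(&rest[..start]);
                out.push_str(CATCH_ALL);
                // Continue scanning after the closing brace.
                rest = &rest[start + len + 1..];
            }
            // Unterminated placeholder: keep the remaining text verbatim.
            None => break,
        }
    }
    out.push_str(rest);
    out
}
```

For a pattern such as `^/fedora/releases/${FEDORA_CURRENT}/`, the sketch returns `^/fedora/releases/(?<distro_ver>.+)/`. A path for a version outside `${FEDORA_CURRENT}` would match this catch-all pattern but not the original one, which is the condition is_others_match() tests. A pattern with more than one variable would yield repeated group names, which this sketch does not handle.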

This process still lists some paths unnecessarily. However, the logic suits the need to filter OS versions well.
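The preprocessing in step 2 decides which exclusions only suppress files and which stop listing outright. The sketch below illustrates that split under one loudly stated assumption: "prefix" is treated as a plain string-prefix test on the pattern source, which may differ from what tsumugu actually does. The `Partitioned` struct, the `partition` function, and the example patterns are all hypothetical.

```rust
/// Hypothetical container mirroring the three lists named in the text.
#[derive(Debug, Default, PartialEq)]
struct Partitioned {
    instant_stop: Vec<String>,
    list_only: Vec<String>,
    include: Vec<String>,
}

/// Split exclusions into list-only and instant-stop patterns.
fn partition(exclusions: &[&str], inclusions: &[&str]) -> Partitioned {
    let mut result = Partitioned {
        // Every inclusion is kept as-is.
        include: inclusions.iter().map(|s| s.to_string()).collect(),
        ..Default::default()
    };
    for &exclusion in exclusions {
        // Assumption: an exclusion is a "prefix" of an inclusion when the
        // inclusion's pattern source starts with the exclusion's source.
        if inclusions.iter().any(|inc| inc.starts_with(exclusion)) {
            result.list_only.push(exclusion.to_string());
        } else {
            result.instant_stop.push(exclusion.to_string());
        }
    }
    result
}
```

With exclusions `^/fedora/` and `^/debian/` and the single inclusion `^/fedora/releases/40/`, the first exclusion lands in `list_only` (its subdirectories must still be listed to reach the inclusion) and the second in `instant_stop`.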

Also note that the following function is currently used to generate the relative path for comparison:

pub fn relative_to_str(relative: &[String], filename: Option<&str>) -> String {
    let mut r = relative.join("/");
    if r.starts_with('/') {
        warn!("unexpected / at the beginning of relative ({r})");
    } else {
        r.insert(0, '/');
    }
    if r.len() != 1 {
        if r.ends_with('/') {
            warn!("unexpected / at the end of relative ({r})")
        } else {
            r.push('/')
        }
    }

    // here r already has / at the end
    match filename {
        None => r,
        Some(filename) => {
            assert!(!filename.starts_with('/') && !filename.ends_with('/'));
            format!("{}{}", r, filename)
        }
    }
}

As a result:

  1. All relative paths used for comparison start with "/".
  2. Directory paths end with "/"; file paths do not.

Examples:

  1. http://example.com/file => /file
  2. http://example.com/dir => /dir/
  3. http://example.com/dir/file => /dir/file
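These three examples can be checked against a standalone copy of the function. The copy below is a sketch for experimentation only: it uses `eprintln!` in place of `warn!` so that it compiles without a logging crate, which is an assumption made here and not how tsumugu logs.

```rust
/// Standalone copy of `relative_to_str`, with `eprintln!` standing in for
/// `warn!` (an assumption so the sketch needs only the standard library).
fn relative_to_str(relative: &[String], filename: Option<&str>) -> String {
    let mut r = relative.join("/");
    if r.starts_with('/') {
        eprintln!("unexpected / at the beginning of relative ({r})");
    } else {
        // Every path used for comparison starts with '/'.
        r.insert(0, '/');
    }
    if r.len() != 1 {
        if r.ends_with('/') {
            eprintln!("unexpected / at the end of relative ({r})");
        } else {
            // Non-root directories get a trailing '/'.
            r.push('/');
        }
    }
    match filename {
        // A directory: the trailing '/' is already in place.
        None => r,
        // A file: append the name, which must not carry slashes itself.
        Some(filename) => {
            assert!(!filename.starts_with('/') && !filename.ends_with('/'));
            format!("{r}{filename}")
        }
    }
}
```

Calling `relative_to_str(&[], Some("file"))` yields `/file`, `relative_to_str(&["dir".to_string()], None)` yields `/dir/`, and `relative_to_str(&["dir".to_string()], Some("file"))` yields `/dir/file`.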

Note that, for compatibility, the following rewrite is applied: a user regex that starts with ^ but not with ^/ has its leading ^ replaced with ^/ (this might break some very rare regexes).

So you can write /something$ to exclude ALL files and directories named something, instead of using two regexes (^something$ and /something$, to match something at the root and elsewhere, respectively).
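The anchor rewrite can be sketched as a small string transformation. The function name `normalize_anchor` is hypothetical, and the sketch only mirrors the rule as described here, not tsumugu's actual implementation.

```rust
/// Hypothetical helper: rewrite a leading `^` to `^/` unless the pattern
/// already starts with `^/`. Patterns without a leading `^` are untouched.
fn normalize_anchor(pattern: &str) -> String {
    if pattern.starts_with('^') && !pattern.starts_with("^/") {
        // Skip the one-byte '^' and re-add it together with the slash.
        format!("^/{}", &pattern[1..])
    } else {
        pattern.to_string()
    }
}
```

Under this rule `^something$` becomes `^/something$`, while `^/something$` and `/something$` are left unchanged.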

Also, the upstream itself is NOT included in the path used for comparison. So if your upstream is set to https://some.example.com/dir/, you need the exclusion ^something/, not ^dir/something/, to exclude https://some.example.com/dir/something/.
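To see why the upstream's own path never appears in the compared string, consider this minimal sketch. The helper `path_for_comparison` is hypothetical and works on raw URL strings purely for illustration; tsumugu builds its relative paths differently, as shown in relative_to_str above.

```rust
/// Hypothetical helper: derive the string a regex would be compared against
/// by dropping the upstream prefix from a full URL.
/// Returns `None` when the URL does not live under the upstream.
fn path_for_comparison(upstream: &str, url: &str) -> Option<String> {
    let rest = url.strip_prefix(upstream)?;
    // The compared path always starts with a single '/'.
    Some(format!("/{}", rest.trim_start_matches('/')))
}
```

With upstream `https://some.example.com/dir/`, the URL `https://some.example.com/dir/something/` maps to `/something/`, so a pattern mentioning `dir/` would never match it.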

If in doubt, test with tsumugu list.