Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashTable use in code emitted by [GeneratedRegex] #110112

Closed
Atulin opened this issue Nov 23, 2024 · 4 comments
Closed

HashTable use in code emitted by [GeneratedRegex] #110112

Atulin opened this issue Nov 23, 2024 · 4 comments
Labels
area-System.Text.RegularExpressions question Answer questions and provide assistance, not an issue with source code or documentation.

Comments

@Atulin
Copy link

Atulin commented Nov 23, 2024

I added the BannedApisAnalyzer to my project to ban all the obsolete stuff like non-generic collections, WebClient, and so on. To my surprise, I got a warning coming from RegexGenerator.g.cs file about the use of HashTable. And, sure enough, the generated code does use it, when the regex contains named capture groups.

[GeneratedRegex(@"(^|\s)(?'tag'#[\w-]{3,})($|\s)")]
private static partial Regex HashtagRegex { get; }

generates

private HashtagRegex_1()
{
    base.pattern = "(^|\\s)(?'tag'#[\\w-]{3,})($|\\s)";
    base.roptions = RegexOptions.None;
    ValidateMatchTimeout(Utilities.s_defaultTimeout);
    base.internalMatchTimeout = Utilities.s_defaultTimeout;
    base.factory = new RunnerFactory();
    base.CapNames = new Hashtable { { "0", 0 } ,  { "1", 1 } ,  { "2", 2 } ,  { "tag", 3 }  };
    base.capslist = new string[] {"0", "1", "2", "tag" };
    base.capsize = 4;
}

I checked out the source code of the generator (click) and it is, indeed, what is supposed to be emitted:

            if (rm.Tree.CaptureNumberSparseMapping is not null)
            {
                writer.Write("        base.Caps = new Hashtable {");
                AppendHashtableContents(writer, rm.Tree.CaptureNumberSparseMapping.Cast<DictionaryEntry>().OrderBy(de => de.Key as int?));
                writer.WriteLine($" }};");
            }
            if (rm.Tree.CaptureNameToNumberMapping is not null)
            {
                writer.Write("        base.CapNames = new Hashtable {");
                AppendHashtableContents(writer, rm.Tree.CaptureNameToNumberMapping.Cast<DictionaryEntry>().OrderBy(de => de.Key as string, StringComparer.Ordinal));
                writer.WriteLine($" }};");
            }

I was wondering if there's any specific reason to use a HashTable here, instead of a Dictionary<string, int>? It seemed to me that non-generic collections are kept in the language still pretty much only for backward compatibility purposes, and new code should not be using them, in favor of lists and dictionaries.

And here it is, some pretty dang new code still referencing them.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 23, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 23, 2024
@Atulin Atulin changed the title HashTable use in code emitted by RegexGenerator HashTable use in code emitted by [GeneratedRegex] Nov 23, 2024
@DaZombieKiller
Copy link
Contributor

DaZombieKiller commented Nov 23, 2024

The property being assigned to, Regex.CapNames, enforces use of HashTable by allocating one if you pass in any different kind of IDictionary object. Opting to use Dictionary<string, int> here would result in an additional allocation.

@stephentoub
Copy link
Member

And that's because these properties store the data into fields that are exposed and are strongly-typed as Hashtable:

protected internal Hashtable? caps; // if captures are sparse, this is the hashtable capnum->index
protected internal Hashtable? capnames; // if named captures are used, this maps names->index

It's an unfortunate legacy design that we can't change without it being massively breaking, due to these being public (protected) surface area.

@stephentoub stephentoub added question Answer questions and provide assistance, not an issue with source code or documentation. area-System.Text.RegularExpressions and removed untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Nov 23, 2024
@stephentoub stephentoub closed this as not planned Won't fix, can't repro, duplicate, stale Nov 23, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@Atulin
Copy link
Author

Atulin commented Nov 23, 2024

Ah, just another case of the consequences of breaking change prevention rearing its ugly head, then. Guess I'll just have to figure out how to disable the analyzer for generated files or stop using capture groups there, no big deal.

Thanks for the answers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Text.RegularExpressions question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

3 participants