Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add keyword filters for node subscription #83

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Wikeolf
Copy link

@Wikeolf Wikeolf commented Dec 22, 2024

The implementation is somewhat crude with a high degree of code coupling, which might require consideration regarding merging. These commits primarily introduce a keyword filtering feature for node subscriptions. The aim is to address issues similar to #32 and #42, allowing users to filter out nodes that are confirmed to be unable to connect to the internet. This helps avoid problems like rule download failures during the first startup.

@7Sageer
Copy link
Owner

7Sageer commented Dec 22, 2024

Thanks for contributing! The previous issue #55 seemed to mention this issue, which seems to be the real reason - nodes with true connection latency are unlikely to be so outrageous. Just now I tested the speed of the cold start, which is just a few seconds, I think that's acceptable.

Does the PR has any other functional improvement?

@Wikeolf
Copy link
Author

Wikeolf commented Dec 22, 2024

Thanks for contributing! The previous issue #55 seemed to mention this issue, which seems to be the real reason - nodes with true connection latency are unlikely to be so outrageous. Just now I tested the speed of the cold start, which is just a few seconds, I think that's acceptable.

Does the PR has any other functional improvement?

I apologize for not describing the problem clearly in #42. In fact, the issue is similar to #32.

In my environment, I use keyword filtering to exclude nodes that do not connect to the Internet, such as official websites and traffic information, which are mixed into the subscription data. This approach improves the issue of rule downloads failing during the first startup. I also attempted to change the default DNS configuration, but removing the geosite did not resolve the problem.

image
This PR implements a feature to filter nodes in subscription information using keywords, allowing users to exclude unnecessary nodes. While this functionality is common in other subscription converters, my implementation is coupled. Whether to merge it into the mainline still requires consideration. Perhaps it should be refined further before being merged?

If you have any suggestions or alternative approaches, I would greatly appreciate them, as this solution does effectively address my issue.

@jaydong2016
Copy link

据说 singbox 使用自动选择组作为代理下载,并不会真的执行 urltest 选择最低延迟节点,只会选择组内的第一个节点作为代理节点,所以如果第一个节点不可用时,是可能会造成规则下载不来

@7Sageer
Copy link
Owner

7Sageer commented Dec 23, 2024

据说 singbox 使用自动选择组作为代理下载,并不会真的执行 urltest 选择最低延迟节点,只会选择组内的第一个节点作为代理节点,所以如果第一个节点不可用时,是可能会造成规则下载不来

这个问题有点意思,我有时间去深究一下

If you have any suggestions or alternative approaches, I would greatly appreciate them, as this solution does effectively address my issue.

关于这部分,我觉得可能做成默认排除这几个关键词比较好?大部分用户可能不会深入了解这个功能,Filter Keyword的提示也不是很明确。再考虑到”剩余,官网,到期“这类词语应该不会出现在正常节点中,也许直接简单的过滤比较好?

@Wikeolf
Copy link
Author

Wikeolf commented Dec 24, 2024

关于这部分,我觉得可能做成默认排除这几个关键词比较好?大部分用户可能不会深入了解这个功能,Filter Keyword的提示也不是很明确。再考虑到”剩余,官网,到期“这类词语应该不会出现在正常节点中,也许直接简单的过滤比较好?

I have considered whether this should be default filtered. It seems there would be no ambiguity with default filtering in the web UI, but without a "filters" field in the API link, some nodes are always filtered out, which is not what users who only use the API would expect—although it is unknown if such users really exist.

There is indeed a possibility that users may not need to use or understand this feature. Perhaps it should be placed in advanced options, or an additional display switch should be added.

It is true that nodes containing these keywords are invalid, but some proxy sellers might include them in the subscription to inform users of the expiration date and remaining traffic.

Moreover, such words may have synonyms. If it's just a simple filter, I'm afraid there'll be some leaks. Giving users the option to enable this feature and giving them the ability to customize keywords may help in cases where subscription contain low-quality nodes, and users can add custom keywords, such as the name of the country/region to be filtered.

I may refactor the functionality implemented in this PR soon to achieve decoupling.

@7Sageer
Copy link
Owner

7Sageer commented Dec 24, 2024

Or a simple idea to let the user choose the name of the node downloading rule set directly in the advanced options?

@Wikeolf
Copy link
Author

Wikeolf commented Dec 24, 2024

Or a simple idea to let the user choose the name of the node downloading rule set directly in the advanced options?

This may be the simplest and most direct method, which would require the user to know the name of the node in the subscription information provided by the proxy seller, or the user to have a node of their own.

This filtering feature, or directly specifying the node option for the download rule, is just a temporary workaround for issues like #32 and #42. Perhaps the best way to fix this bug is to provide feedback to upstream and fix the urltest for the rule download.

I think this PR can be retained. I will try to improve it as much as I can, as I believe this feature has some other usage scenarios besides bypassing the issues mentioned above. I will respect your opinion on whether to merge it or not.

@7Sageer
Copy link
Owner

7Sageer commented Dec 24, 2024

据说 singbox 使用自动选择组作为代理下载,并不会真的执行 urltest 选择最低延迟节点,只会选择组内的第一个节点作为代理节点,所以如果第一个节点不可用时,是可能会造成规则下载不来

I will take a look at this problem as soon as I have enough time (maybe in few weeks) and try to solve this.

Perhaps the best way to fix this bug is to provide feedback to upstream and fix the urltest for the rule download.

I agree with you, I think this PR makes the GUI a little confusing to use for the first users although it can really solve the problem. And my other two ideas above just do not work for all situations.

Another possible solution is creating a mirror repository of the current GitHub repo using a CDN or a Cloudflare Worker and downloading directly from it. There might already be a repository like this, so changing the ruleset repo link seems better.

I'll prioritize upstream repair and mirror repositories. If these scenarios don't make substantial progress, I'll re-evaluate this PR as a fallback. Next, I'll start by researching the upstream URLtest implementation, and I'll also investigate the options for the mirror repository. I will keep you updated on my progress.

@7Sageer
Copy link
Owner

7Sageer commented Jan 10, 2025

I've decided to use gh-proxy as the solution since it works well with Cloudflare Workers and meets our requirements. I'll monitor its performance over the next few days. If it proves stable and efficient, I'll close this PR. Thank you again for all your contributions!

@Wikeolf
Copy link
Author

Wikeolf commented Jan 11, 2025

I've decided to use gh-proxy as the solution since it works well with Cloudflare Workers and meets our requirements. I'll monitor its performance over the next few days. If it proves stable and efficient, I'll close this PR. Thank you again for all your contributions!

I saw this change, is this gh-proxy built using your Cloudflare Worker?
It seems that Cloudflare has some limitations on the usage of Workers for free users. If the project has hardcoded the reverse proxy built by your Cloudflare Worker, it may cause some trouble for you.
If you need to use a reverse proxy, I think it's reasonable to implement a sub-module for reverse proxy in this project. This way, the requests can be distributed to each user, who has their own free quota to use, rather than concentrating on your link, which seems too dangerous for your wallet.

@7Sageer
Copy link
Owner

7Sageer commented Jan 11, 2025

Thanks for the heads-up! I've considered the Cloudflare Worker limitations, and I am aware that the free tier has some limitations. However, the current free tier allowance seems adequate for our anticipated use case, especially given the anticipated traffic based on current usage.

Also, I haven't linked a credit card to my worker account, so there won't be any unexpected charges. If the usage does reach the limit, I'd happily consider the pro plan for $5😂.

I appreciate your suggestion of implementing a sub-module for reverse proxy – it's definitely something I'll actively investigate if the need arises or if gh-proxy proves unsustainable. For now, I don't think that's necessarry for the extra difficulty in coding and deployment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants