For detached mode, add a mechanism to provide a list of valid consumers that can serve up S3. #1979

Open
2 tasks
JustinKyleJames opened this issue Jun 10, 2021 · 4 comments

@JustinKyleJames
Contributor

JustinKyleJames commented Jun 10, 2021

  • master
  • 4-2-stable

We have had requests to limit the number of servers that can serve S3 in detached mode. Provide this list and adjust voting accordingly.

If this list does not exist, then all consumers would be assumed to be able to serve up the request.

@trel
Member

trel commented Jun 10, 2021

so this is the 'allow' list... do we also need to provide a complementary 'ignore' list?

and we'd need to consider when those are in conflict.

do we use regular expressions?

if so, the defaults would be...
allow = ['.*']
ignore = []

OR

just have an ignore list? then the default is easier (empty list), and no need for regex to start.

@jbeal-work

My model when I read about the s3 connector was that you registered a consumer as being able to serve a resource.

So each S3 resource would have a list of consumers through which its data could be accessed.

Then, when my client connected to a provider, the provider would load balance over the list of available consumers.
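To make the load-balancing model concrete, here is a minimal sketch of a provider rotating over the consumers registered for one S3 resource. Everything in it is an assumption for illustration: the `ConsumerBalancer` class, its method name, and the hostnames are hypothetical, and round-robin is only the simplest stand-in. As the issue description notes, iRODS picks a server through resource voting, not through a balancer like this.

```python
import itertools


class ConsumerBalancer:
    """Hypothetical round-robin selector over the consumers of one S3 resource."""

    def __init__(self, consumers):
        consumers = list(consumers)
        if not consumers:
            raise ValueError("at least one consumer must be registered")
        # itertools.cycle repeats the list endlessly in its original order.
        self._cycle = itertools.cycle(consumers)

    def next_consumer(self):
        """Return the next consumer hostname in rotation."""
        return next(self._cycle)


# Hypothetical hostnames registered for a single S3 resource.
balancer = ConsumerBalancer(["gw1.example.org", "gw2.example.org"])
for _ in range(3):
    print(balancer.next_consumer())
```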

@trel
Member

trel commented Jun 11, 2021

Except for the caveat of distinction between consumer and provider (which is a distinction/shortening of 'Catalog Service Provider' and 'Catalog Service Consumer')... I'm reading your description @jbeal-work the same as what @JustinKyleJames described... a context string entry for a particular s3 resource that defines 'from where' it can be served (when in HOST_MODE=cacheless_detached).

The question I'm asking is whether that definition should be the include list, the exclude list, or BOTH. A default of excluding the empty list would be the equivalent of today's behavior, and the most easily backwards compatible. A default of including 'all' would also be the equivalent of today's behavior, but requires that we use a regex (.*) (probably preferred) or a special keyword (all), potentially polluting the server namespace a tiny bit.

The scenario that brings this question is a Zone having 250 iRODS servers, mostly with a similar configuration and networking. If there are only 2-3 machines that are NOT supposed to serve s3 content (b/c they are providers behind HA, or some other networking reason), then it will be much more reasonable to define the 'exclude' or 'ignore' list with 2-3 hostnames than defining an allow list of 247 hostnames (which will probably not fit in the context string itself, varchar(1000)).
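A quick back-of-the-envelope check of the size argument, as a sketch only. The hostnames and the comma separator are made up for illustration; the 1000-character limit is the varchar(1000) figure quoted above. A real context string also carries other keys, so the space actually available would be smaller still.

```python
CONTEXT_STRING_LIMIT = 1000  # varchar(1000), as quoted in the comment above

# Hypothetical 250-server zone with 20-character hostnames.
hostnames = [f"irods{n:03d}.example.org" for n in range(1, 251)]
excluded = hostnames[:3]   # the 3 machines that must NOT serve S3
allowed = hostnames[3:]    # the remaining 247 machines


def list_length(hosts, sep=","):
    """Length of the hosts joined into one separator-delimited string."""
    return len(sep.join(hosts))


print("allow list: ", list_length(allowed), "chars, fits:",
      list_length(allowed) <= CONTEXT_STRING_LIMIT)
print("ignore list:", list_length(excluded), "chars, fits:",
      list_length(excluded) <= CONTEXT_STRING_LIMIT)
```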

It seems the most robust and future-proof solution is to provide both include and exclude lists, with defaults, both allowing regex, with clearly defined priority when they are in conflict/overlap (aka 'exclude' trumps 'include').

So, by default it would be...

candidate_server_list = include_list (default=all) - exclude_list (default=[])
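A minimal sketch of that rule, assuming both lists hold regular expressions that must match the full hostname, and that 'exclude' wins on overlap. The function name, parameter names, and hostnames are hypothetical, not part of the S3 plugin.

```python
import re


def candidate_servers(all_servers, include_patterns=None, exclude_patterns=None):
    """Return servers matching any include pattern and no exclude pattern.

    Defaults mirror the proposal: include everything, exclude nothing.
    """
    include_patterns = [".*"] if include_patterns is None else include_patterns
    exclude_patterns = [] if exclude_patterns is None else exclude_patterns

    def matches_any(host, patterns):
        # fullmatch avoids accidental partial matches on substrings.
        return any(re.fullmatch(p, host) for p in patterns)

    return [
        host
        for host in all_servers
        if matches_any(host, include_patterns)
        and not matches_any(host, exclude_patterns)  # exclude trumps include
    ]


# Hypothetical zone with one provider behind HA that must not serve S3.
servers = ["irods1.example.org", "irods2.example.org", "ha-provider.example.org"]
print(candidate_servers(servers))
print(candidate_servers(servers, exclude_patterns=[r"ha-.*"]))
```

With no lists configured, every server stays a candidate, which matches today's behavior.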

@jbeal-work

So I would not expect all our consumers to be S3 capable. I would expect to have a set of virtual or physical irods/s3 gateways.

I would expect the S3 resource to be a table with, for example:

Resource_id
Resource_name
S3_DEFAULT_HOSTNAME=cog.sanger.ac.uk;
S3_SECRET=******
S3_ACCESS= *****
S3_REGIONNAME=
S3_RETRY_COUNT=1
S3_WAIT_TIME_SEC=3
S3_PROTO=HTTPS
ARCHIVE_NAMING_POLICY=consistent

You would then register a host by adding an entry to a many-to-many table, with one column for the S3 resource and another for the consumer that can be used to access it. Removing a machine would mean deleting an entry from the table.

s3_resource: id
consumer: id

Finding which consumers can be used is just a select consumer where s3_resource=3
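A rough sketch of that many-to-many layout, using an in-memory SQLite database. The table names, column names, and hostnames are assumptions made for illustration; this is not the iRODS catalog schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s3_resource (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE consumer (id INTEGER PRIMARY KEY, hostname TEXT NOT NULL);
    -- Join table: one row per (resource, consumer) pairing.
    CREATE TABLE s3_resource_consumer (
        s3_resource_id INTEGER NOT NULL REFERENCES s3_resource(id),
        consumer_id    INTEGER NOT NULL REFERENCES consumer(id),
        PRIMARY KEY (s3_resource_id, consumer_id)
    );
""")

# Hypothetical data: one S3 resource served by two gateway hosts.
conn.execute("INSERT INTO s3_resource VALUES (3, 's3_resc')")
conn.executemany("INSERT INTO consumer VALUES (?, ?)",
                 [(1, "gw1.example.org"), (2, "gw2.example.org")])
conn.executemany("INSERT INTO s3_resource_consumer VALUES (?, ?)",
                 [(3, 1), (3, 2)])


def consumers_for(conn, resource_id):
    """Return hostnames of consumers registered for the given S3 resource."""
    rows = conn.execute(
        """SELECT c.hostname
           FROM consumer AS c
           JOIN s3_resource_consumer AS rc ON rc.consumer_id = c.id
           WHERE rc.s3_resource_id = ?
           ORDER BY c.hostname""",
        (resource_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(consumers_for(conn, 3))

# Removing a machine is a single delete from the join table.
conn.execute(
    "DELETE FROM s3_resource_consumer WHERE s3_resource_id = 3 AND consumer_id = 2"
)
print(consumers_for(conn, 3))
```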

Development

No branches or pull requests

3 participants