When using proxy #153

Open
ciokan opened this issue Jan 17, 2017 · 9 comments

ciokan commented Jan 17, 2017

I see sockets still being used (without the proxy) even when a proxy opener is provided and obj.lookup_rdap is used. Is this safe when doing a lot of requests?

Owner

secynic commented Jan 17, 2017

It only proxies the HTTP requests (RDAP lookups). You are probably seeing the sockets being used for ASN lookups via DNS. The ASN lookups are generally very fast, so I wouldn't worry about the overhead on those.

As a test, try setting bootstrap=True, and see if those non-proxied sockets disappear.
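A minimal sketch of that test, assuming the `proxy_opener` argument of `IPWhois` and the `bootstrap` argument of `lookup_rdap` mentioned in this thread. The proxy URL and the helper names `build_proxy_opener` and `lookup_via_proxy` are made up for illustration:

```python
from urllib import request


def build_proxy_opener(proxy_url):
    """Build a urllib opener that sends HTTP(S) requests through proxy_url."""
    handler = request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return request.build_opener(handler)


def lookup_via_proxy(ip, proxy_url):
    """RDAP lookup with bootstrap=True, so no separate ASN lookup is needed first."""
    # Third-party import kept local so the opener helper works without ipwhois.
    from ipwhois import IPWhois

    obj = IPWhois(ip, proxy_opener=build_proxy_opener(proxy_url))
    return obj.lookup_rdap(bootstrap=True)
```

If the non-proxied sockets disappear with `bootstrap=True`, they most likely came from the ASN lookup rather than from the RDAP query itself.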

I am working on a bulk lookup solution for this library, so this would be a good consideration.

Author

ciokan commented Jan 18, 2017

I'm not worried about the overhead. I'm worried about getting banned when doing multiple requests, since this thing is exposed as an API. For HTTP proxies you have to do a CONNECT over the socket and authenticate. I have working code for that if you're interested. Of course, you will have to pull the proxy details out of the ProxyHandler.

If I find the time, maybe I'll submit a PR. We will probably have to create a separate (single-point) method that opens sockets while taking the provided proxies into account. Those may be at least of the types HTTP (with CONNECT), SOCKS5 and SOCKS4.
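As a rough sketch of the first step such a single-point method would need, the snippet below classifies a proxy URL by type before any socket is opened. `ProxySpec`, `SUPPORTED_KINDS` and `parse_proxy_url` are hypothetical names, not part of ipwhois, and the URL scheme convention (`http://`, `socks4://`, `socks5://`) is an assumption:

```python
from collections import namedtuple
from urllib.parse import urlsplit

# Hypothetical container for the pieces a socket factory would need.
ProxySpec = namedtuple('ProxySpec', 'kind host port username password')

# The proxy types named in the comment above.
SUPPORTED_KINDS = ('http', 'socks4', 'socks5')


def parse_proxy_url(url):
    """Split a proxy URL such as socks5://user:pw@10.0.0.1:1080 into a ProxySpec."""
    parts = urlsplit(url)
    kind = parts.scheme.lower()
    if kind not in SUPPORTED_KINDS:
        raise ValueError('Unsupported proxy type: %r' % kind)
    if not parts.hostname or parts.port is None:
        raise ValueError('Proxy URL needs an explicit host and port: %r' % url)
    return ProxySpec(kind, parts.hostname, parts.port,
                     parts.username, parts.password)
```

A socket factory could then dispatch on `spec.kind`: a CONNECT handshake for `http`, and a SOCKS client for the other two.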

Owner

secynic commented Jan 18, 2017

I haven't had any issues with bans, only rate limiting.

From what I understand (correct me if I'm not reading this correctly), if a proxy is provided, you want the ability to route all traffic (DNS, WHOIS, HTTP) over a SOCKS4/5 or HTTP proxy? Also, you mention proxies (plural); do you mean to load-balance across multiple proxy IPs, or to specify a different proxy server per lookup method?

Maybe you can clarify a bit. For instance, ASN lookups are best performed over DNS (https://github.com/secynic/ipwhois/blob/master/ipwhois/net.py#L217), but fall back to whois (https://github.com/secynic/ipwhois/blob/master/ipwhois/net.py#L285) and HTTP (https://github.com/secynic/ipwhois/blob/master/ipwhois/net.py#L384).
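To illustrate why the DNS path produces traffic that an HTTP proxy never sees: an ASN lookup over DNS is a TXT query against a zone operated by Team Cymru. The sketch below only builds the query name. The zone names follow Team Cymru's published IP-to-ASN service and are an assumption about what the linked net.py code queries; `cymru_origin_qname` is a hypothetical helper:

```python
import ipaddress


def cymru_origin_qname(ip):
    """Return the DNS name whose TXT record carries the origin ASN for ip."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. '229.225.125.74.in-addr.arpa' for IPv4.
    pointer = addr.reverse_pointer
    if addr.version == 4:
        return pointer[:-len('.in-addr.arpa')] + '.origin.asn.cymru.com'
    # IPv6 uses reversed nibbles under a separate zone.
    return pointer[:-len('.ip6.arpa')] + '.origin6.asn.cymru.com'
```

The name is resolved through the system resolver (UDP/TCP port 53), which is why these sockets bypass a `ProxyHandler`.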

Edit:
I just realized I skipped over your comment on CONNECT. Please elaborate on this, as I think you may be hinting at persistent connections, which wouldn't apply to the REST (RDAP) queries but may apply to the ASN lookups.

Author

ciokan commented Jan 18, 2017

I'm providing the proxy per lookup. Nothing fancy, as this is something to be done by the user, and honestly I don't see it fitting inside your package since it's not a one-size-fits-all type of thing.

Regarding traffic, whoever uses proxies does it for a reason, so all traffic originating from this package should go through the supplied proxy. People do it to overcome limitations such as banning or rate limiting.

Regarding the CONNECT bit, it was just a hint. I was referring to the parts where you open sockets. Using an HTTP proxy for a raw socket requires writing a CONNECT request into it first (of course, the proxy has to support CONNECT and allow port 43).

Here's a bit from what I'm using on a similar project:

import base64
import socket


def http_proxy_connect(address=None, proxy=None, auth=None):
    """ Open a socket to address, tunnelled through an HTTP proxy if one is given.

    Returns (socket, status, response_headers); status is 0 for a direct connection.
    """
    def valid_address(addr):
        """ Verify that an IP/port tuple is valid """
        return (isinstance(addr, (list, tuple)) and len(addr) == 2
                and isinstance(addr[0], str) and isinstance(addr[1], int))

    if not valid_address(address):
        raise ValueError('Invalid target address')

    if not proxy:
        # No proxy given: connect straight to the target.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect(tuple(address))
        return s, 0, {}

    if not valid_address(proxy):
        raise ValueError('Invalid proxy address')

    _headers = {
        'host': address[0]
    }

    if auth:
        if isinstance(auth, str):
            # A ready-made Proxy-Authorization header value.
            _headers['proxy-authorization'] = auth
        elif isinstance(auth, (tuple, list)):
            if len(auth) == 1:
                raise ValueError('Invalid authentication specification')

            t = auth[0]
            # Must be a tuple for the % formatting below (auth may be a list).
            args = tuple(auth[1:])

            if t.lower() == 'basic' and len(args) == 2:
                auth_basic = '%s:%s' % args
                # Replaces the to_base64 helper, which was not part of the snippet.
                token = base64.b64encode(auth_basic.encode('utf-8')).decode('ascii')
                _headers['proxy-authorization'] = 'Basic ' + token
            else:
                raise ValueError('Invalid authentication specification')
        else:
            raise ValueError('Invalid authentication specification')

    s = socket.socket()
    s.connect(tuple(proxy))

    # newline='' keeps '\r\n' untranslated in both directions on Python 3.
    fp = s.makefile('rw', newline='')
    fp.write('CONNECT %s:%d HTTP/1.1\r\n' % tuple(address))
    fp.write('\r\n'.join('%s: %s' % (k, v) for (k, v) in _headers.items()) + '\r\n\r\n')
    fp.flush()

    statusline = fp.readline().rstrip('\r\n')

    if statusline.count(' ') < 2:
        fp.close()
        s.close()
        raise IOError('Bad response')

    version, _status, statusmsg = statusline.split(' ', 2)

    if version not in ('HTTP/1.0', 'HTTP/1.1'):
        fp.close()
        s.close()
        raise IOError('Unsupported HTTP version')
    try:
        _status = int(_status)
    except ValueError:
        fp.close()
        s.close()
        raise IOError('Bad response')

    response_headers = {}

    # Read the proxy's response headers up to the blank line that ends them.
    while True:
        line = fp.readline().rstrip('\r\n')
        if line == '':
            break
        if ':' not in line:
            continue
        k, v = line.split(':', 1)
        response_headers[k.strip().lower()] = v.strip()

    # Closing the file wrapper leaves the underlying socket open for the caller.
    fp.close()
    return s, _status, response_headers
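
Once the CONNECT handshake succeeds (status 200), the returned socket can be used as if it were connected directly to the whois server on port 43. A minimal sketch of the query step, assuming the plain WHOIS exchange from RFC 3912 (send the query followed by CRLF, read until the server closes the connection); `whois_query` is a hypothetical helper, not part of ipwhois:

```python
import socket


def whois_query(sock, query, bufsize=4096):
    """Send one WHOIS query over a connected socket and return the full reply."""
    sock.sendall(query.encode('ascii') + b'\r\n')
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            # The server signals the end of the reply by closing the connection.
            break
        chunks.append(data)
    return b''.join(chunks).decode('utf-8', errors='replace')
```

Because the function only needs a connected socket, it does not care whether that socket came from a direct connection or from a proxy tunnel.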

Owner

secynic commented Jan 18, 2017

Understood. Actually, I originally wrote the proxy support for corporate networks that commonly block outbound port 43. The library started with only whois; more recently the RDAP protocol was introduced, so support was tacked on and the library was rewritten. This was before anonymous proxies/VPNs were very popular for these types of things.

That being said, you make a good point. Let me look over your code and see what we can do.

Thanks for the detailed info.

secynic self-assigned this Jan 18, 2017
secynic added this to the 1.0.0 milestone Jan 20, 2017

Owner

secynic commented Mar 29, 2017

Update: I apologize for the delays. I have been busy with work and other side projects.

This is currently sitting in priority behind:

  1. Deprecate asn_alts/allow_permutations in favor of new argument: asn_methods #158 asn_alts deprecation
  2. Bulk query wrapper #134 Bulk whois (has been open longer, and needs consideration for CONNECT implementation)

Owner

secynic commented Jul 11, 2017

@ciokan I added bulk lookup support in experimental.py (ipwhois.experimental.bulk_lookup_rdap):
https://github.com/secynic/ipwhois/blob/dev/ipwhois/experimental.py

You won't need to worry about getting banned for the ASN lookups, since the Cymru bulk ASN lookup can be done with a single request (ipwhois.experimental.get_bulk_asn_whois).
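
To show why a bulk ASN lookup needs only one request, here is a sketch of the payload shape. It assumes the `begin` / `end` framing from Team Cymru's documented bulk whois interface (sent over a single connection to their whois service on port 43); how `get_bulk_asn_whois` builds its request internally is not shown in this thread, and `build_cymru_bulk_query` is a hypothetical helper:

```python
def build_cymru_bulk_query(addresses, verbose=True):
    """Build a single bulk whois payload covering every address in addresses."""
    lines = ['begin']
    if verbose:
        # Asks the server for the extended set of columns per address.
        lines.append('verbose')
    lines.extend(addresses)
    lines.append('end')
    return '\n'.join(lines) + '\n'
```

One payload carries every address, so the remote side sees one connection instead of one per IP.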

I believe this will solve your problem (at least for the short term). I would like to get v1.0.0 out soon, so I will open a new issue linked to this to be addressed in 1.x.x for individual queries. Let me know your thoughts, and if you get a chance to test.

Owner

secynic commented Jul 18, 2017

Moving to 1.1.0 to remove any confusion, instead of opening a new issue.

Owner

secynic commented Oct 26, 2018

@ciokan Did you get a chance to test this?

secynic modified the milestones: 1.1.0, 1.2.0 Jan 29, 2019
secynic removed this from the 1.2.0 milestone Jul 31, 2019