iget for fetching indexes from a non-sequence iterable #517

Open
groutr opened this issue Jun 23, 2021 · 4 comments

@groutr
Contributor

groutr commented Jun 23, 2021

It would be useful, I think, to have a version of itertoolz.get that works on iterables that don't support indexing (e.g. sets, iterators, etc.).

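from toolz import drop

def iget(ind, seq):
    # Yield the items of seq at the given indices, in ascending index order.
    seq = iter(seq)
    j = 0  # index of the next item seq will produce
    for i in sorted(ind):
        if j < i:
            # Skip the items between the previous index and this one.
            seq = drop(i - j, seq)
            j = i
        if j == i:  # false for a repeated index, which is silently skipped
            yield next(seq)
            j += 1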

With dicts now preserving insertion order, it makes sense to pull out the nth item in insertion order, or the nth line of a file:

with open('file1.txt') as fin:
    lines = tuple(iget([2, 5, 8, 9], fin))
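
For anyone who wants to try the idea without toolz installed, here is a minimal standard-library sketch. The name `iget_sketch` is hypothetical (not part of toolz), and an in-memory `io.StringIO` stands in for the file. It assumes non-negative indices and drops duplicates.

```python
import io
from itertools import islice


def iget_sketch(ind, iterable):
    """Yield the items at the unique positions in `ind`, in ascending order.

    Hypothetical helper for illustration only; not a toolz function.
    """
    it = iter(iterable)
    pos = 0  # position of the next item `it` will produce
    for i in sorted(set(ind)):
        # Skip ahead to position i and take exactly one item.
        # If the iterable is exhausted, islice simply yields nothing.
        for item in islice(it, i - pos, i - pos + 1):
            yield item
        pos = i + 1


# An in-memory stand-in for a 12-line file.
fin = io.StringIO("".join(f"line {k}\n" for k in range(12)))
lines = tuple(iget_sketch([2, 5, 8, 9], fin))
```

Unlike the version above, this one stops quietly when the iterable runs out instead of calling `next` on an exhausted iterator.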
@eriknw
Member

eriknw commented Jun 25, 2021

Thanks @groutr. This has come up before: #97

I'm up for adding this functionality in some way.

@groutr
Contributor Author

groutr commented Jun 29, 2021

After giving this some more thought, I find extending nth more appealing. It does require a mental shift from providing absolute indices (counted from the start of the iterator) to providing relative indices (counted from the current position of the iterator).

My take on extending nth:

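from collections.abc import Sequence
from toolz import drop

def nth(n, seq):
    """ Yield the elements at the relative offsets in n

    >>> tuple(nth(1, 'ABC'))
    ('B',)
    """
    seq = iter(seq)
    if not isinstance(n, Sequence):
        n = (n,)
    for i in n:
        # Skip i items from the current position, then yield the next one.
        seq = drop(i, seq)
        yield next(seq)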

If I want indices 1 and 2 from 'ABC', that would be "skip 1 element and return the next one, then skip 0 elements and return the next one".

>>> tuple(nth([1, 0], 'ABC'))
('B', 'C')

@Hugovdberg

Hugovdberg commented Jul 12, 2021

Wouldn't it be nicer to first calculate the differences between consecutive indices and use those to determine how many items to drop?

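from collections.abc import Sequence
from operator import sub
from toolz import compose, concat, curry, drop, first
from toolz.curried import map, sliding_window

def lazy(f):
    yield f

@curry
def unpack_args(f, args):
    return f(*args)


def nth(n, seq):
    """ Yield the elements at the (strictly increasing) absolute indices in n

    >>> tuple(nth(1, 'ABC'))
    ('B',)
    """
    seq = iter(seq)
    if not isinstance(n, Sequence):
        n = (n,)
    else:
        sub1 = lambda x: x-1
        # For each consecutive pair (a, b), compute b - a - 1: the number of items to skip.
        skip_n = compose(map(compose(sub1, unpack_args(sub), reversed)), sliding_window(2))
        n = concat((lazy(first(n)), skip_n(n)))
    for i in n:
        seq = drop(i, seq)
        yield next(seq)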

This doesn't require the mental workout of getting the differences right (especially the zero skip needed for consecutive items). Also, changing an index is a lot less error-prone this way.

>>> list(nth([1, 2, 5, 8], range(100)))
[1, 2, 5, 8]
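
For readers less used to the point-free style, the offset computation in `skip_n` above is roughly equivalent to this plain-Python sketch. The helper name `to_relative` is hypothetical, and the explicit check for strictly increasing indices is an addition of mine, not something the code above does.

```python
def to_relative(indices):
    """Convert strictly increasing absolute indices into skip counts.

    Hypothetical helper for illustration only; not a toolz function.
    """
    prev = -1  # pretend the previously fetched item sat just before index 0
    offsets = []
    for i in indices:
        if i <= prev:
            raise ValueError("indices must be non-negative and strictly increasing")
        # Number of items to skip between the previous index and this one.
        offsets.append(i - prev - 1)
        prev = i
    return offsets


print(to_relative([1, 2]))        # [1, 0]
print(to_relative([1, 2, 5, 8]))  # [1, 0, 2, 2]
```

The resulting offsets are exactly what the relative-index version of nth expects, so `[1, 2]` becomes `[1, 0]` as in the 'ABC' example earlier in the thread.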

A major restriction of both methods is that they can only take indices in strictly increasing order:

>>> list(nth([1, 2, 1], "abcdefghijklmnopqrstuvwxyz"))
[...]
ValueError: Indices for islice() must be None or an integer: 0 <= x <= sys.maxsize.

Perhaps reading from seq in sorted index order and then rearranging the results into the requested order would be more robust.

@Hugovdberg

A more stable approach would look like this, although it is a lot uglier and iterates over n multiple times:

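from collections.abc import Sequence
from operator import sub
from toolz import compose, concat, drop, first
from toolz.curried import map, sliding_window

# lazy and unpack_args are the helpers defined in the previous comment.
def nth(n, seq):
    """ Yield the elements at the absolute indices in n, in the requested order

    >>> tuple(nth(1, 'ABC'))
    ('B',)
    """
    seq = iter(seq)
    if not isinstance(n, Sequence):
        orig_order, n = (0,), (n,)
    else:
        sub1 = lambda x: x-1
        skip_n = compose(map(compose(sub1, unpack_args(sub), reversed)), sliding_window(2))
        # Sort the indices, remembering each one's position in the request.
        orig_order, n = zip(*sorted(enumerate(n), key=lambda x:x[1]))
        n = concat((lazy(first(n)), skip_n(n)))
    output = []
    for o, i in zip(orig_order, n):
        if i == -1:
            # Repeated index: reuse the value fetched for the previous one.
            output.append((o, value))
            continue
        seq = drop(i, seq)
        value = next(seq)
        output.append((o, value))
    for _, value in sorted(output, key=lambda x: x[0]):
        yield value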

Duplicated indices and indices that are not monotonically increasing now work as expected:

>>> list(nth([1, 2, 1], "abcdefghijklmnopqrstuvwxyz"))
['b', 'c', 'b']
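
As a point of comparison, the same behaviour can probably be had with a single pass and a lookup table, without computing differences at all. This is only a sketch under my own assumptions: the name `nth_any_order` is hypothetical, indices are non-negative integers, and an `IndexError` for out-of-range indices is my choice rather than anything proposed above.

```python
def nth_any_order(indices, iterable):
    """Yield items at the given absolute indices, in the order requested.

    Hypothetical helper for illustration only; not a toolz function.
    Duplicates and non-monotonic indices are allowed.
    """
    indices = list(indices)
    if not indices:
        return
    wanted = set(indices)
    last = max(wanted)
    found = {}
    # Single pass: remember only the requested items, stop after the largest index.
    for pos, item in enumerate(iterable):
        if pos in wanted:
            found[pos] = item
        if pos >= last:
            break
    for i in indices:
        if i not in found:
            raise IndexError(f"index {i} is out of range")
        yield found[i]


print(list(nth_any_order([1, 2, 1], "abcdefghijklmnopqrstuvwxyz")))  # ['b', 'c', 'b']
```

Like the version above, this buffers the requested items before yielding anything, so it is not fully lazy; it trades that for not needing a second sort of the output.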
