Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclusions: "all_missing" #359

Open
jamesrkg opened this issue May 27, 2019 · 22 comments
Open

Exclusions: "all_missing" #359

jamesrkg opened this issue May 27, 2019 · 22 comments
Assignees
Labels

Comments

@jamesrkg
Copy link

On 184bfcef7f8f4331b229c0e7606db425:

>>> ds.exclude('disposition in [0] or missing(bix_MY_ecommerce_unaided_recode)')`
>>> ds.get_exclusion()

ExceptionTraceback (most recent call last)
<ipython-input-21-22f17b03337f> in <module>()
----> 1 ds.get_exclusion()

C:\ProgramData\Anaconda2\lib\site-packages\scrunch\datasets.pyc in get_exclusion(self)
   2281             return None
   2282         expr = exclusion['body'].get('expression')
-> 2283         return prettify(expr, self) if expr else None
   2284 
   2285     def add_filter(self, name, expr, public=False):

C:\ProgramData\Anaconda2\lib\site-packages\scrunch\expressions.pyc in prettify(expr, ds)
    796         return _transform(_func, args, nest=nest)
    797 
--> 798     return _process(_resolve_variables(expr))

C:\ProgramData\Anaconda2\lib\site-packages\scrunch\expressions.pyc in _process(fragment, arent)
    783             return list(fragment.values())[0]
    784 
--> 785         args = [_process(arg, _func) for arg in fragment['args']]
    786         child_functions = [
    787             arg.get('function')

C:\ProgramData\Anaconda2\lib\site-packages\scrunch\expressions.pyc in _process(fragment, arent)
    794             _func == 'or'
    795         )
--> 796         return _transform(_func, args, nest=nest)
    797 
    798     return _process(_resolve_variables(expr))

C:\ProgramData\Anaconda2\lib\site-packages\scrunch\expressions.pyc in _transform(f, args, nest)
    752             result = '%s(%s)' % (functions[f], args[0])
    753         else:
--> 754             raise Exception('Unknown function "%s"' % f)
    755 
    756         if nest:

Exception: Unknown function "all_missing"

Has the Crunch API exposed the missing function with a new name @jjdelc ?

If so, it looks like scrunch.expressions.CRUNCH_FUNC_MAP may be out of date.

@jamesrkg jamesrkg added the bug label May 27, 2019
@jamesrkg jamesrkg changed the title ds.get_exclusion Exception: Unknown function "all_missing" Exclusions: "all_missing" May 27, 2019
@jamesrkg
Copy link
Author

It looks like we will also need to add all_missing as a supported function when applying an exclusion as well:

>>> ds.exclude('disposition in [0] or all_missing(bix_MY_ecommerce_unaided_recode)')`

ValueError: unknown method "all_missing", valid methods are: [bin, all, duplicates, selected, not_selected, any]

@jamesrkg
Copy link
Author

@jjdelc can you advise on the place of missing vs missing_all? I've done some testing:

expressions.py:

CRUNCH_FUNC_MAP = {
    'valid': 'is_valid',
    'missing': 'all_missing',
    'bin': 'bin',
    'selected': 'selected',
    'not_selected': 'not_selected',
}

CRUNCH_METHOD_MAP = {
    'any': 'any',
    'all': 'all',
    'duplicates': 'duplicates',
    'bin': 'bin',
    'selected': 'selected',
    'not_selected': 'not_selected',
    'all_missing': 'all_missing'
}
>>> conn.ds.get_exclusion()
u'disposition in [0] or missing(bix_MY_ecommerce_unaided_recode)'

>>> conn.ds.exclude('disposition in [0] or bix_MY_ecommerce_unaided_recode.all_missing()')
>>> conn.ds.get_exclusion()
u'disposition in [0] or bix_MY_ecommerce_unaided_recode.all_missing()'

all_missing is completely new to us/scrunch.

@jjdelc
Copy link
Contributor

jjdelc commented May 30, 2019

This looks like a translation problem. Looking at the logic used to transform, there seems to be a nomenclature problem.

The methods are translated to method(var) and the functions are translated as var.func_name() which should be the other way around.

The all_missing crunch function returns True when all the subvariables of an array are missing (be this a multiple response categorical or a categorical array).

I'm not sure of what would be the syntax that feels more natural to you, but depending on your choise of method(var) vs var.function() it should be added to just one of the constants you modified above.

i'd recommend avoid renaming the functions in a map, as this could cause collisions in the future if we grow more functions on zz9 that are already used in scrunch.

@jamesrkg
Copy link
Author

Just to clarify, the above CRUNCH_FUNC_MAP and CRUNCH_METHOD_MAP show the result of my testing, in scrunch master all_missing is not present at all, so there is a bug here somewhere as shown at the top of the ticket.

@jjdelc do you mean all_missing is not new/has always been there? If so, what can explain us now getting an error when scrunch tries to return ds.get_exclusions() then?

@jjdelc
Copy link
Contributor

jjdelc commented May 31, 2019

I think this function has been around for a while.
It is possible that this exclusion filter was set in another way? And now that is read from scrunch is failing to translate.

The fix for this would be in 2 part:

  • Decide if you want all_missing(variable) or variable.all_missing() syntax. And add the function to one of the variables only depending on the desired syntax.
  • Decide if we want to fallback on the code for new functions that may grow on Crunch that scrunch doesn't know of yet, so they don't break and at least have some representation, maybe function(variable) is good enough - Arguments may be a problem to interpret, but we could device a generic way to handle simple cases and only decide to break on more complex nested expressions.

@jamesrkg
Copy link
Author

jamesrkg commented Jun 4, 2019

In the original example we set the exclusion using scrunch in the same way we always have, so nothing seems to be different on our side with this. It is definitely new behavior for this to throw an error:

>>> ds.exclude('disposition in [0] or missing(bix_MY_ecommerce_unaided_recode)')`
>>> ds.get_exclusion()

...
Exception: Unknown function "all_missing"
  • Regarding all_missing(alias) vs alias.all_missing(), the former is more consistent with established use for missing(alias).
  • @xbito you may need to comment on the generic case for unknown future functions as regards the parser, but the function(alias) representation looks fine for me.

@jjdelc
Copy link
Contributor

jjdelc commented Jun 4, 2019

It looks like this problem of going back and forth happens within scrunch. But overall we should avoid translating functions because we could cause conflicts as the Crunch API grows more functions that may collide with some of Scrunch translations.

@jamesrkg
Copy link
Author

jamesrkg commented Jun 6, 2019

Being able to translate filter payloads back to scrunch syntax is actually a very important feature because it allows us to save a filter to re-apply later. An example of a situation where we might want to do this is described for derive_weight, here (see the Note after the main method description/example):

https://github.com/Crunch-io/scrunch/wiki/User-Reference#derive_weight

Of course, it's also just important for human users to be able to make sense of an existing filter.

@jjdelc
Copy link
Contributor

jjdelc commented Jun 6, 2019

Sorry, with translating I meant to have function names differ from Scrunch and the Crunch API. If we have a function called is_valid, it should also be is_valid in Scrunch instead of calling it valid. The day we introduce a function called valid will cause problems.

@jamesrkg
Copy link
Author

jamesrkg commented Jun 7, 2019

Right, I get it now. This came up in a conversation I had with Fernando yesterday as well and I think we all agree on this point. Making a switch to Crunch-native function names across the board for scrunch's expression syntax is something we can cope with on our side if you want to move in that direction.

@jjdelc
Copy link
Contributor

jjdelc commented Jun 7, 2019

We could raise a deprecation warning if the translations are used. And have some compatibility if needed temporarily, but I think we should move forward avoiding changing function names.

@mathiasbc
Copy link
Contributor

we were translating the function missing to the crunch function all_missing but I think we wanted is_missing so I changed that. I will hold the release until @jamesrkg confirms this is ok.
https://docs.crunch.io/object-reference/object-reference.html?highlight=all_missing#binary-functions

@jjdelc
Copy link
Contributor

jjdelc commented Jul 15, 2019

Careful about changing function names as there may be a bunch of scripts that use it. Do you have a PR to look at the change?

@jamesrkg
Copy link
Author

Based on the docs is_missing is for a non-array variables and all_missing is for an array variable. Is this perhaps why we ended up with just missing in the scrunch syntax, as a catch-all for both? It does feel tedious to have to decide which to use depending on the variable structure.

I know above we agreed to disallow non-canon function names but in the case of is_valid/is_missing and all_valid/all_missing the convenience, in addition to being able to use those as-is, of being able to use the shorthand valid/missing would be greatly appreciated. I feel that the forcing the distinction here for the typical scrunch user is too strict a requirement and will lead to more complicated code.

@mathiasbc
Copy link
Contributor

So what should we do ? I think we should change the parser to use strict names for all_missing, is_missing. Cut a release with a warning that missing is no longer supported.

@xbito
Copy link
Contributor

xbito commented Jul 18, 2019

After having a conversation with Jamie this is what I think we should do:

  1. Allow is_missing/all_missing is_valid/all_valid in expressions in scrunch
  2. Continue allowing missing/valid shortcuts where scrunch makes the determination if the variable passed is an array or not to select which missing function to pass to Crunch. But, throw a deprecation warning.
  3. When coming back from crunch to scrunch through get_exclusion lets use the canonical Crunch function name all_missing/is_missing all_valid/is_valid.

@jamesrkg
Copy link
Author

Can we support all of the logical functions here:

https://docs.crunch.io/object-reference/object-reference.html?highlight=all_missing#logic-functions

And with regard to the shorthand valid and missing, can we plan to keep these indefinitely? I'd rather not ever deprecate them if possible. If so some other warning may be more appropriate.

@jjdelc
Copy link
Contributor

jjdelc commented Jul 19, 2019

Agreed to @xbito's comment. The support for the other functions should be dealt with in another ticket. Since I suppose some syntax choices need to be made.

@mathiasbc
Copy link
Contributor

mathiasbc commented Jul 19, 2019

that's not gonna work, we use a dictionary to map srunch functions to Crunch functions and viceversa. I can't have 2 keys or 2 values mapping to is_missing or all_missing. Currently the scrunch function missing maps to all_missing there is no magic happening where Crunch knows what to do depending on the wrapped variable.

My proposal is to either:

  1. Support canonical all_missing and is_missing in Scrunch which directly map to Crunch functions, which implies dropping support for missing.
  2. Keep support for missing mapping to all_missing and add is_missing as scrunch function.

The rest of the functions would need to be addressed on a new ticket as mentioned by Jj

@jamesrkg
Copy link
Author

In that case I guess the request here is not to use a dict as-is, but to wrap this with a function that can, in the case of missing, differentiate between the required all_missing or is_missing, based on the dataset's variables catalog.

@mathiasbc
Copy link
Contributor

That is going to be a lot more work and if we are going to start supporting variable context analysis then we should plan how to do it. For the context of this ticket we simply translate the JSON that comes from Crunch into a human-friendly form. Making the parser aware of the type of variables it's translating is a whole new world.

@mathiasbc
Copy link
Contributor

@xbito The code to deal with expressions is here: https://github.com/Crunch-io/scrunch/blob/master/scrunch/expressions.py.

mappings happen here: https://github.com/Crunch-io/scrunch/blob/master/scrunch/expressions.py#L59

This function parse_expr translates natural expressions to the JSON synax: https://github.com/Crunch-io/scrunch/blob/master/scrunch/expressions.py#L168

The function process_expr translates variable names/aliases to urls: https://github.com/Crunch-io/scrunch/blob/master/scrunch/expressions.py#L454. Is usually used as a last step of the process to send data to the Crunch API

And finally pretiffy transforms a JSON expression to the string equivalent that we use: https://github.com/Crunch-io/scrunch/blob/master/scrunch/expressions.py#L691

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants