Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizations #5

Open
projectgroepsteenplaat opened this issue Dec 24, 2019 · 7 comments
Open

optimizations #5

projectgroepsteenplaat opened this issue Dec 24, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

@projectgroepsteenplaat
Copy link

  • Indentation can be minified
  • Sometimes it transforms a literal into a variable when the literal is only used once and it becomes less minified
  • Large numbers will be converted to hex, but for example 1000000000000 could be 10**12 but instead I get 0xe8d4a51000, which is shorter than the original, but could be smaller
  • Repeating the same command a lot of times in succession can be minified to a for loop with the command in it.
  • The print command will convert an integer to a string so print(str(2)) can become print(2)
  • str(1) can become '1' and str(2) can become '2' and so on
  • import decimal can become import decimal as d, this will be more bytes for the import, but will be less for every call to the decimal module
  • if 1 == 2: print("Hello, how are you?") can be removed completely as 1 does never equal 2
  • print(1+1) can become print(2)
  • pow(a,b) can become a**b as that is shorter
  • pow(a,b,c) can become a**b%c, but this is slower so it may should be optional
  • a=1;a=2 can be changed into a=2 as that will be the final value anyway
  • print("Hello, how are you?");print("Hello, how are you now?") can become
    a="Hello, how are you";print(a+"?");print(a+" now?")
  • for i in range(10):pass becomes for i in range(10):0, but it can be completely ignored
@dflook
Copy link
Owner

dflook commented Dec 24, 2019

Hi, thanks for taking the time to create this issue!

I realise I haven't written the goals for this project anywhere, so here they are:

  1. With no transforms enabled, the output should produce exactly the same program as the input when parsed (an identical AST).
  2. Transformations should take sensible, well written code and produce code of the same or shorter length with the same behaviour. The existing transformations do things that would be difficult, tedious or just look strange to a programmer.
  3. Never produce broken code.
  4. Works for supported versions of cpython and pypy.

Non-Goals:

  1. Minifier performance.
  2. Transformations that are targeted at badly written code.
  3. Obfuscation.

The following suggestions I feel are out of scope, as the code can be minified by just writing better code:

  • Repeating the same command a lot of times in succession can be minified to a for loop with the command in it.
  • The print command will convert an integer to a string so print(str(2)) can become print(2)
  • str(1) can become '1' and str(2) can become '2' and so on
  • if 1 == 2: print("Hello, how are you?") can be removed completely as 1 does never equal 2
  • print(1+1) can become print(2)
  • a=1;a=2 can be changed into a=2 as that will be the final value anyway
  • for i in range(10):pass becomes for i in range(10):0, but it can be completely ignored

I appreciate you can't always change the input code but it's a lot of effort to add these. Without doing a level of dynamic analysis which is difficult or impossible, it would only work for relatively trivial examples.

For the following suggestions, I thought python-minifier already produced optimal code - have you got any examples where that's not the case?

  • Indentation can be minified
  • Sometimes it transforms a literal into a variable when the literal is only used once and it becomes less minified
  • import decimal can become import decimal as d, this will be more bytes for the import, but will be less for every call to the decimal module

These are good ideas! I think these can be in the next version:

  • Large numbers will be converted to hex, but for example 1000000000000 could be 10**12 but instead I get 0xe8d4a51000, which is shorter than the original, but could be smaller
  • pow(a,b) can become a**b as that is shorter
  • pow(a,b,c) can become a**b%c, but this is slower so it may should be optional

This is a good idea, but more difficult:

  • print("Hello, how are you?");print("Hello, how are you now?") can become
    a="Hello, how are you";print(a+"?");print(a+" now?")

@dflook dflook added the enhancement New feature or request label Jan 2, 2020
@projectgroepsteenplaat
Copy link
Author

projectgroepsteenplaat commented Feb 14, 2020

edit: indentation at the way I did it becomes invisible so I added indent for indentation

x = input()
if x == "hi":
indent>for i in range(2):
indent>indent>pass
indent>print("Hello")
print("Hello, how are you?")

becomes:
x=input()
if x=='hi':
indent>for i in range(2):0
indent>print('Hello')
print('Hello, how are you?')

could be:
x=input()
if x=='hi':
space>for i in range(2):0
space>print('Hello')
print('Hello, how are you?')
or even:
x=input()
if x=='hi':print('Hello')
print('Hello, how are you?')

And this:
import decimal
print(decimal.Decimal(4))
print(decimal.Decimal(7))
print(decimal.Decimal(9))

Remains the same but could be:
import decimal as d
print(d.Decimal(4))
print(d.Decimal(7))
print(d.Decimal(9))

Which is +4 bytes for import and -6 bytes for every print, and thus -18 is an improvement of 14 bytes

For the literal that is only used once I can't find an example anymore but I have had it once.

Kind regards

@dflook
Copy link
Owner

dflook commented Feb 14, 2020

Hi, thanks again for creating those examples!

Bear in mind that after minification indents become tab characters, which are a single byte long - the same as spaces. I use tabs because the output is more readable.

In your example we can eliminate the for loop entirely, but that's because we know there are no side effects to the range() call. Determining that in the general case is a difficult problem. We could specifically recognise some simple cases, but it's not a feature I plan to add.

In the decimal example I think you have rename-globals disabled, which prevents that transformation because it introduces the new name d, which may not be desirable. If I enable rename-globals, I actually get

B=print
import decimal as A
B(A.Decimal(4))
B(A.Decimal(7))
B(A.Decimal(9))

@projectgroepsteenplaat
Copy link
Author

Numbers greater than 10 ^ 45 get minified with hex, but would be shorter in base 36 (int(string, 36) will convert back to decimal)

@amaank404
Copy link

the one that shows conversion of 1000000000000 into 10**12 instead of 0xe8d4a51000 might not be a great idea considering the performance penelty of multiplying the number 12 times at runtime. this might even become a problem for performance requirements of some projects. How about make that optional. I have worked with that power operator and it is slow sometimes.

@mqyhlkahu
Copy link

@amaank404

might not be a great idea considering the performance penelty of multiplying the number 12 times at runtime. this might even become a problem for performance

Because both 10 and 12 are constant values, the result of 10**12 is calculated at compile time, before the code is run.

import dis
dis.dis("10 ** 12")
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (1000000000000)
              4 RETURN_VALUE

@amaank404
Copy link

I understand that now, I believe I was still new to this field back then. Thanks for knowledge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants