Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub Optional chaining resurgent #29

Closed
dirkf opened this issue Jun 28, 2022 · 22 comments
Closed

GitHub Optional chaining resurgent #29

dirkf opened this issue Jun 28, 2022 · 22 comments
Labels
js-syntax Problem caused by unsupported JS syntax wontfix This will not be worked on

Comments

@dirkf
Copy link

dirkf commented Jun 28, 2022

See JustOff/github-wc-polyfill#68 (comment) for an example of this nonsense.

So I thought it should be possible to apply an edit to the GH scripts using this nice extension. If they say a.b.c?.d, we should be able to replace that with ((_x=>(_x==null)?undefined:_x.d)(a.b.c)). There was already a minor example in the gh-script-optchain fix (which I renamed below after removing that).

I edited lib/main.js, also removing an undefined diagnostic for fixes.a:

--- a/lib/main.js
+++ b/lib/main.js
@@ -78,10 +78,14 @@ function evaluateFix(fix, script, csp, contentReplace) {
                           "data-module-id": "./chunk-index2.js",
                           "data-src": "https://github.githubassets.com/assets/chunk-index2-a3fdc9f7.js"});
             break;
-        case "gh-script-optchain":
+        case "gh-script-nullish":
             // works only for this specific minimizer output...
-            contentReplace.push([/([a-zA-Z]+)\?\?/g, "((typeof($1)!==undefined)&&($1!==null))?($1):"]);
-            contentReplace.push([`this.matchFields?.join("-")`, `((y)=>y?y.join("-"):y)(this.matchFields)`]);
+            contentReplace.push([/([a-zA-Z_$][a-zA-Z_$0-9]*)\?\?/g, "((typeof($1)!==undefined)&&($1!==null))?($1):"]);
+            contentReplace.push([`H.integrity=S.sriHashes[t],`, ``]);
+            break;
+        case "gh-script-optchain":
+            contentReplace.push([/(([a-zA-Z_$][a-zA-Z_$0-9]*\.)*([a-zA-Z_$][a-zA-Z_$0-9]*))\?\.([a-zA-Z_$][a-zA-Z_$0-9]*)/g,
+                                "((_x => (_x==null)?undefined:_x.$3)($1))"]);
             contentReplace.push([`H.integrity=S.sriHashes[t],`, ``]);
             break;
         case "gh-worker-csp":
@@ -158,7 +162,7 @@ function supersededFixes(service) {
         superseded.add("tmx-optchain");
     }
     if (service.isNullishCoalescingSupported) {
-        superseded.add("gh-script-optchain");
+        superseded.add("gh-script-nullish");
     }
     return superseded;
 }
@@ -814,7 +818,7 @@ class HTTPObserver {
                 }
                 break;
             case "http-on-modify-request":
-                if (gService.isSeaMonkey && fixes.a.includes("sm-cookie")) {
+                if (gService.isSeaMonkey && fixes.fixes.includes("sm-cookie")) {
                     try {
                         this.cookie = subject.getRequestHeader("Cookie");
                     } catch (e) {

and lib/builtin-rules.js:

--- a/lib/builtin-rules.js
+++ b/lib/builtin-rules.js
@@ -11,7 +11,7 @@
 // For easier maintainability, please keep in logical-alphabetical order:
 //   generally sorted by host name part, minus "www."
 //   "www.dhl.de" sorts as "dhl"
-//   "static.ce-cdn.net" sorts together with it's parent, "godbolt.org"
+//   "static.ce-cdn.net" sorts together with its parent, "godbolt.org"
 
 exports = String.raw`
 developer.apple.com
@@ -31,6 +31,9 @@ gist.github.com/socket-worker.js$script
 github.com/assets-cdn/worker/socket-worker-*.js$script
 gist.github.com/assets-cdn/worker/socket-worker-*.js$script
     gh-worker-csp
+github.githubassets.com/assets/*.js$script
+    $script-content,gh-script-optchain
+
 ! --
 godbolt.org
     std-queueMicrotask

(Anyone who asks why this isn't a GH PR should try making one when GH scripts aren't working)

Now, I've abolished the console errors, but apparently the scripts are failing silently. I'm getting this
None of the "sha512" hashes in the integrity attribute match the content of the subresource.
apparently from the catch at the end of TracingListener.onStopRequest.

Any advice?

@dirkf
Copy link
Author

dirkf commented Jun 28, 2022

The Debugger shows that (eg) the script that was previously delivered with v=n(y=>y?.tagName==="TURBO-FRAME", ...) now has v=n(y=>((_x => (_x==null)?undefined:_x.y)(y))==="TURBO-FRAME", ...) in its place.

@martok
Copy link
Owner

martok commented Jun 28, 2022

Optional chaining and nullish coalescing are both supported by current UXP. Please explain a) what breaks, b) how the cited issue is a problem on the supported platforms, and c) what this patch (don't worry about the not-PR) is supposed to fix.

@dirkf
Copy link
Author

dirkf commented Jun 28, 2022

SeaMonkey 2.53 doesn't yet handle optional chaining (though the next .release should do). That might fall outside the supported platforms, but it would still be helpful to understand if I failed to do something obvious.

Normally breaking language changes (due to incompatible JS versions being served as text/javascript' instead of application/javascript;version=ES2017` or some such) cause scripted site functionality to fail because important functions following the invalid syntax are never parsed and loaded. If the change affects dynamic behaviour rather than language parsing, some functionality may fail with a diagnostic in the error log instead.

By removing unsupported language features from the delivered scripts, there should be no missing functionality, even if possibly one of the edits causes a dynamic error. However I see neither parsing errors nor dynamic errors.

If this were made to work we also need in the conditional list in lib/main.js ca. l.62:

        superseded.add("gh-script-optchain");

@martok
Copy link
Owner

martok commented Jun 29, 2022

Ah, right.

Basically your approach was correct, and last time this happened I ran into the same error: GH uses script integrity (good, actually!) so it refuses to execute the script after it was modified.
To get around this, add gh-integrity to the fixes for github.com, that should strip the protection and run the modified script.

Also, good catch on the typos.

@dirkf
Copy link
Author

dirkf commented Jun 30, 2022

With some git archaeology I traced the gh-integrity issue. That revives the SyntaxError: expected expression, got '.' diagnostics (ie, the scripts are being parsed).

Now I just have to fix the REs: in general nested REs would be needed but my hope that a set of REs targeting a.b.c?.d(x), a.b.c?.d[x] and a.b.c?.d would cover most of the GH scripts looks optimistic.

One frequent pattern is document.head?.querySelector(selector_expression)?.content. This needs to be turned into a function call with a try {return document.head.querySelector(selector_expression).content;} catch {return undefined;} body.

The logging feature is reassuring although because String.prototype.replace() is used (otherwise good) there isn't an easy way to see exactly how the suspect scripts have been rewritten.

@SeaHOH
Copy link

SeaHOH commented Jun 30, 2022

The difficulty is implement the expression short-circuits, there are too many usage modes of optional chaining. No simple way, babeljs can be competent, but too slow.

@SeaHOH
Copy link

SeaHOH commented Jun 30, 2022

runtime
behaviors
app_assets_modules_github_behaviors_pjax_ts

and those scripts will load scripts with check attribute "integrity", must be rewrite as empty.

@dirkf
Copy link
Author

dirkf commented Jun 30, 2022

This gh-script-optchain fix should handle a string of JS identifiers interpersed with the three optional chaining syntaxes (?.(...), ?.[...] and plain ?.), provided the ... don't have nested optional chaining, in either value (failure => undefined) or calling (failure => ()=>undefined) context:

            let replacer = function (mch, p1, p2) {
                let chg = false;
                p1 = p1.replace(/\?\.([\])])?/g, (m, p)=>(chg=true,p==null?".":p));
                if (!chg) return mch;
                return `((()=>{try {return ${p1};} catch (e) {return ${p2?"(()=>undefined)":"undefined"};}})())${p2==null?"":p2}`;
            }

            contentReplace.push(
                [/((?!(?:return|eturn|turn|urn|rn|n)\()(?:[a-zA-Z_$][a-zA-Z_$0-9]*(?:\([^)]*\)|\[[^]]*\])?\??\.(?:\([^)]*\)|\[[^]]*\])?)+(?:[a-zA-Z_$][a-zA-Z_$0-9]*)?)(\()?/g,
                replacer]);

This generates completely unreadable (so no worse than before) but so far valid JS.

I can't make SM's regex force a match that has at least one ?., so potentially all JS access expressions are matched, although the match is returned unchanged if no optional chaining was found. There is a guard to stop return(x.y.z... being matched, though it would be nice to have an easier way.

Now just these are not being processed:

21:13:58.446 SyntaxError: missing name after . operator  environment-926f8956f4a3.js:1:5641
21:14:03.707 SyntaxError: missing name after . operator  app_assets_modules_github_behaviors_pjax_ts-683fa5a6e739.js:1:13858
21:14:03.710 SyntaxError: expected expression, got '.'  app_assets_modules_github_behaviors_keyboard-shortcuts-helper_ts-app_assets_modules_github_be-af52ef-8ba1beffba70.js:2:19
21:14:03.717 SyntaxError: expected expression, got '.'  behaviors-2e283b89d4f5.js:9:1667
21:14:03.733 SyntaxError: missing name after . operator  optimizely-2fe0304a629f.js:1:5286

The optional chaining syntax seems to be matched but not replaced, even if the contentReplace is repeated.

@dirkf
Copy link
Author

dirkf commented Jul 1, 2022

@SeaHOH

... babeljs can be competent, but too slow.

Indeed, it's a tool that should be deployed (probably is, or an equivalent) in GH's server build process, but they insist on setting the switch that makes ES2020 code.

runtime
behaviors
app_assets_modules_github_behaviors_pjax_ts

... those scripts will load scripts with check attribute "integrity", must be rewrite as empty.

If/As this isn't being done in PF, how should it be done?

After further testing I have a version that's working well enough for the issue ... menu to be populated and the Delete command to work, but I still have (from the Ctrl+Shift+J Browser Console):

14:47:51.865 SyntaxError: expected expression, got '.'  app_assets_modules_github_behaviors_pjax_ts-683fa5a6e739.js:1:17048
14:47:51.869 SyntaxError: expected expression, got '.'  app_assets_modules_github_behaviors_keyboard-shortcuts-helper_ts-app_assets_modules_github_be-af52ef-8ba1beffba70.js:2:19
14:47:51.879 SyntaxError: expected expression, got '.'  behaviors-d4237c1eb484.js:9:1667
14:47:51.930 SyntaxError: expected expression, got '.'  optimizely-2fe0304a629f.js:1:16702
14:47:51.974 TypeError: 'getAttribute' called on an object that does not implement interface Element.  element-registry-9f412aa21311.js:1:927

The last of these (probably all the rest too) is a script that isn't being processed even thought the script source is handled perfectly under test. The Developer Tools Browser Console (which is mysteriously different from the stand-alone display) attributes it to the TypeScript source const strategyName = (child?.getAttribute('data-load-on') || 'ready') as keyof typeof strategies, which presumably transpiles into the file delivered as element-registry-9f412aa21311.js.

The almost working code:

            let replacer = function (mch, p1, p2) {
                let chg = false;
                // only replace in context (varname|fnname(...)|arrname[...])?.(varname|(...)|[...])
                p1 = p1.replace(/([a-zA-Z_$0-9)\]])\?\.([a-zA-Z_$([])/g, (m, q1, q2)=>(chg=true, q1 + ('[('.includes(q2)?"":".") + q2));
            if (!chg) return mch;
            // x.call -> x.call.bind(x): https://nemisj.com/some-interesting-aspects-of-call-method-in-javascript/
            return `((()=>{try {return ${p1.replace(/^((?:.|\s)*)\.call$/, "$&.bind($1)")};} catch (e) {return ${p2?"(()=>undefined)":"undefined"};}})())${p2==null?"":p2}`;
        }

        contentReplace.push(
            [/((?!(?:return|eturn|turn|urn|rn|n)\()(?:[a-zA-Z_$][a-zA-Z_$0-9]*(?:\([^)]*\)|\[[^]]*\])?\??\.(?:\([^)]*\)|\[[^]]*\])?)+(?:[a-zA-Z_$][a-zA-Z_$0-9]*)?)(\()?/g,
            replacer]);

@dirkf
Copy link
Author

dirkf commented Jul 1, 2022

@SeaHOH
Copy link

SeaHOH commented Jul 3, 2022

If/As this isn't being done in PF, how should it be done?

         case "gh-script-integrity":
             contentReplace.push([`.integrity=`, `._integrity=`]);
             break;

and isHTMLDocument() should check one more type which is nsIContentPolicy.TYPE_FETCH.


outdated

my dirty scheme (?. => &&, ?? => ||), fast enough and works well in most cases.

  • apply optchainSub to all script (for reduce configuration items, it's common), and repeat replace() until no change.
  • apply others to corresponding files.
  • (replace) at last, write and apply special substitutions to other cases.
function optchainFill(match, split, expr, target) {
  return `${split}${expr}&&${expr}${"([".includes(target)?"":"."}${target}`;
}

const optchainSub = [/([,;:= {}<>!|&(\)?])((?:[a-zA-Z_$][a-zA-Z_$0-9]*|\([^()]+?\))(?:(?:\.[a-zA-Z_$][a-zA-Z_$0-9]*)*(?:\[[^[\]]+?\]|\([^()]*?\))?)*)\?\.([a-zA-Z_$]|[([])/g, optchainFill],
      nullishSub = [/([a-zA-Z_$0-9\)\]])\?\?/g, "$1||"],
      integritySub = [".integrity=", "._integrity="],
      integritySubHtml = [/(<script[^<]+?)integrity=".+?"/g, "$1"];

/*
 * special substitutions *
  * behaviors
  [/=(\(0[^\)]+\)\(.\.querySelector\("\.js-code-editor"\)\))\?\.editor;/, '=(([a]=[$1])=>a?a.editor:undefined)();']
  * notifications-v2
  [/=(.\.match\(.+?)\?\.pop\(\)\|\|"",/, '=(([a]=[$1])=>a?a.pop():"")(),']
*/

@SeaHOH

This comment was marked as outdated.

@dirkf
Copy link
Author

dirkf commented Jul 3, 2022

Like @SeaHOH I managed to get a working set of patches, at least to edit and label posts. Unfortunately, it's equally incomprehensible!

There are some runtime JS errors logged that I expect can be fixed by reviewing the replacements, but the patched GH scripts don't throw SyntaxErrors.

@SeaHOH

This comment was marked as resolved.

@dirkf
Copy link
Author

dirkf commented Jul 4, 2022

The point I was making is that incomprehensibility is a general characteristic of any complex regex solution (see also Perl), even with the (?x) flag.

I've posted the now apparently fully working version as above. This is the code corresponding to that posted earlier:

            let replacer = function (mch, p1, p2) {
                let chg = false;
                // only replace in context (varname|fnname(...)|arrname[...])?.(varname|(...)|[...])
                if (p1 != null) {
                    p1 = p1.replace(/([a-zA-Z_$0-9)\]])\?\.([a-zA-Z_$([])/g, (m, q1, q2)=>(chg=true, q1 + ('[('.includes(q2)?"":".") + q2));
                }
                if (!chg) return mch;
                // split the expression ${path}${tail} with no initial . in tail; bind `this` if necessary
                // thanks https://nemisj.com/some-interesting-aspects-of-call-method-in-javascript/
                const [p, e] = p1.split(/(?=(?:\.(?=[a-zA-Z_$])|[([])([a-zA-Z_$][a-zA-Z_$0-9]*|\([^)]*\)|\[[^\]]*\])$)/).concat([null]);
                const try_txt = (e?
                    `const _1 = ${p}; const _2 = ${p1}; return (typeof _2 === 'function' && 'constructor' in _2)?_2.bind(_1):_2`:
                    "return " + p1);
                const catch_txt = "return " + (p2 != null? "(()=>undefined)": "undefined");
                const ret = `((()=>{try {${try_txt};} catch (e) {${catch_txt};}})())${p2==null?"":p2}`;
                print("modifying " + mch + " to " + ret);
                return ret;
            }

            contentReplace.push(
                // match and sanitise quite a general JS access expression, rooted at identifier or (0, identifier[.identifier]*), avoiding embedded {} expressions
                // not return(..., if(... identifier              |\(0,[  identifier[?].               ]*  identifier               [(...(not {))|[...not {]]]* [?].[(...(not ))|[...not ]]]+     [identifier]                 (
                [/(?:if\(|return\(|((?:(?:[a-zA-Z_$][a-zA-Z_$0-9]*|\(0,(?:[a-zA-Z_$][a-zA-Z_$0-9]*\??\.)*[a-zA-Z_$][a-zA-Z_$0-9]*\))(?:\((?:\$\{|[^{)])*\)|\[(?:\$\{|[^{\]])*\])*\??\.(?:\([^)]*\)|\[[^\]]*\])*)+(?:[a-zA-Z_$][a-zA-Z_$0-9]*)?))(\()?/g,
                 replacer]);
            // const o=(0,$t.P)(t.querySelector(".js-code-editor"))?.editor
            contentReplace.push(
                [/(\([^)]*\)\((?:[a-zA-Z_$][a-zA-Z_$0-9]*\.)*[a-zA-Z_$][a-zA-Z_$0-9]*\([^)]*\)\)\?\.[a-zA-Z_$][a-zA-Z_$0-9]*)(\()?/g,
                 replacer]);

Happy to PR this, but only tested on my SM 2.53.12.

Enabling the add-on's debug flag and filtering for PF: modifying in the Browser console gives a list of the substitutions from which a more specific and possibly more efficient set of patterns and replacements could be generated, but that would be even less likely to work after GH spin the release roulette wheel.

@SeaHOH

This comment was marked as outdated.

@UCyborg
Copy link
Contributor

UCyborg commented Jul 11, 2022

Interesting work, manually replacing code sequences with those operators with the older syntax equivalent is definitely a pain in the butt, particularly when there are multiple occurrences.

BTW, I just updated to freshly released SeaMonkey 2.53.13. Release notes only mention optional chaining while nullish coalescing operator support appears to have been implemented already in 2.53.12, but its release notes don't mention it.

@dirkf
Copy link
Author

dirkf commented Jul 12, 2022

My last attempt (to handle action logs) started going into a infinite-ish loop, so I reverted.

The SM support for these features is correct, but 2.5.13 also removes some legacy experimental features that were used in popular extensions.

@SeaHOH
Copy link

SeaHOH commented Jul 16, 2022

update, Optional Chaining and Nullish Coalescing replacers were merged to nullishOPSub, became simple and faster. inspiration comes from dirkf‘s scheme.

function nullishOPFill(match, split, identifiers, target, operator) {
  let optchain = true, expression = identifiers;
  if (target) {
    expression += target;
  }
  else {
    operator = "??";
    optchain = /\?\.[a-zA-Z_$[(]/.test(identifiers);
  }
  if (optchain) {
    let async_ = expression.includes("await ");
    expression = expression.replace(..._optchainSub);
    expression = `${async_?"await ":""}(${async_?"async ":""}()=>{try{return ${expression}}catch(e){if(!/is (undefined|null|not a function)/.test(e.message||"")){throw e}}})()`;
  }
  return operator ? operator === "??" ?
      `${split}![undefined,null].includes(_NP_=${expression})?_NP_:` :
      `${split}[undefined,null].includes(_NP_=${expression})?undefined:_NP_${operator}` :
      `${split}${expression}`;
}

const _optchainSub = [/([a-zA-Z_$0-9)\]])\?\.([([]?)/g, ((_,o,t) => `${o}${t||"."}`)],
      nullishOPSub = [/([?,;:= {}()<>!|&](?!return\()|return(?=\())((?:[a-zA-Z_$][a-zA-Z_$0-9]*|\([^()]+(?:\([^()]*\))?[^()]*\))(?:(?:\??\.[a-zA-Z_$][a-zA-Z_$0-9]*)*(?:\[[^[\]]+?\]|\((?:[^()]+\([^()]*\))?[^()]*\))?)*)(?:\?\?|(\?\.(?:[a-zA-Z_$][a-zA-Z_$0-9]*|\[[^[\]]+?\]|\((?:[^()]+\([^()]*\))?[^()]*\))(?:(?:\??\.[a-zA-Z_$][a-zA-Z_$0-9]*)*(?:\[[^[\]]+?\]|\((?:[^()]+\([^()]*\))?[^()]*\))?)*)([([]|\?\?)?)/g, nullishOPFill],
      // "_NP_" MUST be declared, front of main HTML is OK
      nullishOPSubHtml = ["<link", `<script>var _NP_;</script>\n<link`],
      integritySub = [".integrity=", "._integrity="],
      integritySubHtml = [/(<script[^<]+?)integrity=".+?"/g, "$1"];

applied scripts:

*.js
nullishOPSub

runtime-*.js
integritySub

behaviors-*.js
integritySub

app_assets_modules_github_behaviors_pjax_ts-*.js
integritySub

@dirkf
Copy link
Author

dirkf commented Aug 2, 2022

https://github.com/dirkf/palefill/releases/tag/v1.19df is working for me.

@rofl0r
Copy link

rofl0r commented Aug 2, 2022

it's great that palefill, unlike youtube-dl, sees regular releases 👍

@dirkf
Copy link
Author

dirkf commented Aug 2, 2022

You are looking at a tool dependency!

@martok martok added wontfix This will not be worked on js-syntax Problem caused by unsupported JS syntax labels Feb 4, 2023
@martok martok closed this as not planned Won't fix, can't repro, duplicate, stale Feb 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
js-syntax Problem caused by unsupported JS syntax wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

5 participants