
Structure Tools #407

Open · wants to merge 30 commits into develop
Conversation


@CocoisBuggy (Contributor) commented Nov 4, 2024

I'm experimenting with a feature for area navigation and climbing data input, and I don't want to hammer the current GQL queries to get the data out, especially since I mostly just need pointers indicating the overall structure.

I've got the speed down to an acceptable time, but it really should be faster. I need to eliminate the lookup stage in the pipeline, which means embedding an ObjectId for each area's parent; I just haven't done it yet.

Queries can't run over 900ms at the moment, which I feel is probably too lenient. I'm just putting these commits here for now while I work on the rest of the PR.
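
To make that ceiling concrete, the sort of guard I have in mind looks roughly like this (a sketch: `getAreaStructure` and `rootAreaId` are hypothetical stand-ins, not the real API):

```ts
import { performance } from 'node:perf_hooks'

declare function getAreaStructure (rootId: string): Promise<unknown> // hypothetical query entry point
declare const rootAreaId: string // hypothetical fixture

it('resolves the area structure within the 900ms budget', async () => {
  const start = performance.now()
  await getAreaStructure(rootAreaId)
  // Fail if the query exceeds the (intentionally lenient) 900ms ceiling.
  expect(performance.now() - start).toBeLessThan(900)
})
```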

Other unrelated commits

I can kill these from the PR. They are not crucial; they mostly relate to making the development environment a little more forgiving:

  1. Added shell.nix (I'm on NixOS)
  2. Made the replica set env var optional during seeding
  3. Added a config in the VS Code settings to prevent VS Code extensions from mangling existing shell scripts

I ran into a one-off issue where the replica set specification was throwing off the seeding script in a development environment. This code lets you unset the variable if you don't need it while in dev.

The nix shell is by no means a complete environment, but if no one minds me leaving it here it would save me some trouble; it is not required.

If you are unfamiliar with NixOS, this file is intended to give my environment the necessary mongo utils / yarn / node and so forth.

You can pin a node version and all sorts of other magic. I would recommend any nerds with sufficient free time investigate, but you will find it very hard to escape once it has ensnared you.
The query is too slow. It remains to be seen whether this is because the scope is simply too liberal, but my intuition is that it comes down to the regex search and full-document retrieval.

The current root query time is along the lines of 16 seconds (unthinkably slow), so I'll try another approach; otherwise this architecture is simply no good.

value? This may be down to me not manually running some cron operation in my development environment? I will work it out later.
There is still some query tuning that I want to do; I might squash this commit later.

Cost reference:
 - Texas: ~50ms
 - California: ~900ms
Achieved by pruning the data being passed through the pipeline.
The only way I can think of to speed this up is to eliminate the parent lookup stage in the pipeline, which seems to double the time the query takes to run to completion.
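
Concretely, the pruning amounts to projecting early; a minimal sketch, assuming an `areas` collection and the field names discussed here (not the exact pipeline):

```ts
import mongoose from 'mongoose'

// Project down to the structural pointers as early as possible so every
// later stage, including the costly parent lookup, carries far less data.
const structure = await mongoose.connection
  .collection('areas')
  .aggregate([
    { $project: { _id: 1, area_name: 1, ancestors: 1, pathTokens: 1 } }
    // ...remaining stages ($match, $lookup, etc.) now operate on the
    // slimmed-down documents.
  ])
  .toArray()
```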

The natural way to address this would be to embed a parent pointer, which I assume should be easy enough, since there must be existing code to maintain the integrity of

 - pathTokens
 - ancestors

So I'll take a look at doing that too.
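
A sketch of what that backfill might look like, assuming `ancestors` ends up stored as an array of ObjectIds ending at the immediate parent (the real representation may differ):

```ts
import mongoose from 'mongoose'

// Hypothetical one-off backfill: set `parent` to the last ancestor, or null
// for root areas, using an aggregation-pipeline update (MongoDB 4.2+).
await mongoose.connection.collection('areas').updateMany({}, [
  {
    $set: {
      parent: {
        $cond: [
          { $gt: [{ $size: { $ifNull: ['$ancestors', []] } }, 0] },
          { $arrayElemAt: ['$ancestors', -1] },
          null
        ]
      }
    }
  }
])
```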
Rather than duplicating logic from other code units over and over.
The mongo shell was renamed to mongosh a little while ago, and I can't be bothered to symlink the name, so now if mongo is not available the script will fall back to mongosh.

+ Add mongosh to the nix shell
The tests are not passing at this commit, even though TSC is not reporting type issues. I will try to correct whatever is preventing the type errors from being reported.
Previously you would need to re-enter the auth credentials.
This commit should be squashed later...
This code, as far as I can tell, hasn't been touched for a while and is just clutter? If I am wrong about this, then I need to know why we are still importing data that we should presumably now be maintaining.
I'm not sure why I didn't do this initially
There is a new, simple source of truth for area structure derivations. Each area document holds a parent reference, which points to its parent area (or none, if it is a root node). The derived fields, such as

 - Children
 - Ancestry

have been collated in `embeddedRelations`. The denormalized fields `ancestors` and `pathTokens` have also been colocated into a single entity path that makes propagating updates to denormalized data easier (e.g., when updating an area's name). This leaves us space for adding new fields, and finally exposes the ACTUAL id of the document so that we can index effectively.
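
Roughly, the shape being described (a sketch of my reading of this commit; field names inside `embeddedRelations` are illustrative, not a verified copy of the schema):

```ts
import { type ObjectId } from 'mongodb'

interface AreaStructure {
  _id: ObjectId
  area_name: string
  // The new source of truth: the immediate parent, or null for root nodes.
  parent: ObjectId | null
  embeddedRelations: {
    // Derived: direct children of this area.
    children: ObjectId[]
    // Derived: the old `ancestors` + `pathTokens` collated root-first, so a
    // rename touches one array and the real _id is available for indexing.
    ancestry: Array<{ _id: ObjectId, name: string }>
  }
}
```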

I would suggest that using the parent field is the most straightforward and safest thing to do in pretty much all scenarios, but if you are a MongoDB veteran you may find a need for the array indexing features, though as far as I can tell this is rarely the case.
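
In practice that means direct children come from a plain indexed equality match on `parent`; a sketch, reusing the hypothetical field names from above:

```ts
import mongoose from 'mongoose'
import { type ObjectId } from 'mongodb'

declare const someAreaId: ObjectId // placeholder for an area's _id

// The common case: direct children via the parent pointer.
const children = await mongoose.connection
  .collection('areas')
  .find({ parent: someAreaId })
  .toArray()

// The array-indexing alternative, e.g. "areas whose second ancestor is
// someAreaId", via a positional path into the ancestry array. Possible,
// but rarely what you want.
const depthTwo = await mongoose.connection
  .collection('areas')
  .find({ 'embeddedRelations.ancestry.1._id': someAreaId })
  .toArray()
```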

 - Added tests
 - Updated some existing tests
 - Data returned at the gql layer should be identical, but I have not
   fully de-risked those tests. All I can say is that they are passing.

This commit does not address #126, but gives us an enviable head start on it. It is, however, needed by #407.
I'm going to go live in the mountains for a week or so, but I'll take a laptop, so I may want to work on this a bit.
@vnugent added this to the v2 milestone Nov 27, 2024
It's all in, I think. I added a new test to cover name uniqueness but no
doubt more cases will come up.