
v3 data cd #175

Merged
merged 24 commits into from
Mar 11, 2024

Conversation

gmbronco
Collaborator

  • some syntax cleanup and isolation of legacy code (tokenService) so jobs can run separately
  • will refactor the token service to be a prisma client instead
  • will work on simplifying the table schema for easier data syncing
  • one action for syncing all the pool data

@gmbronco
Collaborator Author

gmbronco commented Feb 27, 2024

A few questions:

  • I’d like to merge and drop the “dynamic data” tables, because they’re 1:1. Is there a reason they’re separated that I’m not aware of?
  • Batch multicalls together logically, eg: if we are syncing pool tokens, let’s get all the info in one call logically matching what we need for the DB schema.
  • When would we need a block number in onchain calls? Should we always be getting the latest?
  • Easier to get types from multicall directly when we know what we want to get, rather than wrapping in a multicaller class.
  • Typecasting read function return types using AbiType
  • The original onchain syncing has a huge fn call. I find it problematic because it’s trying to coordinate a lot of data sources into one update, which makes it error prone. We can split it into smaller chunks to make it easier to handle.
  • Most of the time we don’t need specific return types, unless we need strong guarantees something isn’t broken between two different domains – for now, trying to set them up is slowing down iterations
  • Don’t we want to store all config params from the vault getPoolConfig?
  • I cannot get over not using arrow notation, it’s hard to go back to the 2000s :)

@franzns
Collaborator

franzns commented Feb 28, 2024

A few questions:

  • I’d like to merge and drop the “dynamic data” tables, because they’re 1:1. Is there a reason they’re separated that I’m not aware of?

Pool.dynamicData? The idea was that we would never need to update the pool table, as it's static, and only update the dynamic data.

  • Batch multicalls together logically, eg: if we are syncing pool tokens, let’s get all the info in one call logically matching what we need for the DB schema.

yes, makes sense.

  • When would we need a block number in onchain calls? Should we always be getting the latest?

True, it's a legacy product. Let's get rid of it

  • Easier to get types from multicall directly when we know what we want to get, rather than wrapping in a multicaller class.

Ok, let's set up a nice pattern for how to use the viem multicaller.

  • Typecasting read function return types using AbiType

Oh nice!

  • The original onchain syncing has a huge fn call. I find it problematic because it’s trying to coordinate a lot of data sources into one update, which makes it error prone. We can split it into smaller chunks to make it easier to handle.

On the other hand, this makes sure a pool is either updated entirely or not at all. Wonder what is worse, having a pool with inconsistent data or with outdated data.

  • Most of the time we don’t need specific return types, unless we need strong guarantees something isn’t broken between two different domains – for now, trying to set them up is slowing down iterations

It's just nice to work with, but agree, it's cumbersome...

  • Don’t we want to store all config params from the vault getPoolConfig?

I think we don't need most of that, and it will probably still change. I would store it as we need it.

  • I cannot get over not using arrow notation, it's hard to go back to the 2000s :)

🤣

@gmbronco
Collaborator Author

Pool.dynamicData? The idea was that we would never need to update the pool table, as it's static, and only update the dynamic data.

with one table it's going to be easier to do DB upserts – something for another PR – an added benefit would be native upsert support in prisma > 4.6.0
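For illustration, a minimal sketch of what a single-table upsert could look like – the helper, column names, and compound key below are hypothetical, not the actual schema:

```typescript
// Hypothetical sketch: with static and dynamic columns merged into one
// table, a single upsert per pool replaces the insert + dynamicData
// update pair. Field and key names are illustrative only.
type PoolRow = { id: string; chain: string; name: string; totalShares: string };

function buildPoolUpsertArgs(row: PoolRow) {
    const { id, chain, ...mutable } = row;
    return {
        where: { id_chain: { id, chain } }, // assumed compound unique key
        create: row, // full row the first time we see the pool
        update: mutable, // only mutable columns on conflict
    };
}

// Usage (sketch): await prisma.prismaPool.upsert(buildPoolUpsertArgs(row));
```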

block number
True, it's a legacy product. Let's get rid of it

i'll leave a todo note

Typecasting read functions in multicall

Turns out AbiType works only with tuples as a return type, because they are decoded as objects. Arrays are just returned as arrays, so we need to parse them manually, and AbiType doesn't provide a return type for that. But we don't have to worry much about it once we abstract it in a fetching function.
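As a rough illustration of the manual parsing mentioned above – the return shape and field names are made up, not the real vault ABI (and viem would give bigints; strings keep the sketch simple):

```typescript
// Sketch: viem/AbiType decodes struct (tuple) returns into named objects,
// but a function returning plain arrays just yields positional arrays,
// so we name the fields ourselves in a small parsing helper.
type RawTokenInfo = readonly [readonly string[], readonly string[]]; // [tokens, raw balances]

function parseTokenInfo(raw: RawTokenInfo) {
    const [tokens, balances] = raw;
    // zip the positional arrays into objects matching the DB schema
    return tokens.map((address, i) => ({ address, balance: balances[i] }));
}

const parsed = parseTokenInfo([['0xaaa', '0xbbb'], ['1', '2']]);
// parsed[1] → { address: '0xbbb', balance: '2' }
```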

  • The original onchain syncing has a huge fn call. I find it problematic because it’s trying to coordinate a lot of data sources into one update, which makes it error prone. We can split it into smaller chunks to make it easier to handle.

On the other hand, this makes sure a pool is either updated entirely or not at all. Wonder what is worse, having a pool with inconsistent data or with outdated data.

I think it's better to keep consistency and fetch everything in one multicall – even do decorations, eg: total liquidity. We can revise this only when it gets slow.

@gmbronco
Collaborator Author

gmbronco commented Mar 4, 2024

@franzns – here is the updated V3 syncing with the following updates:

Jobs:

  • sync pools — takes all the pools from the subgraph and upserts current data
  • sync join exits — subgraph to DB – all events on v3 and just 100 days for v2
    syncing is based on subgraph pagination and fetches 1000 records per call; to fully sync missing events run:
yarn task sync-join-exits-v2 1

as many times as needed (probably ~10 - 15x)

Read:

  • poolGetJoinExits – lists events, filterable by chain, pool, user

@gmbronco gmbronco marked this pull request as ready for review March 4, 2024 21:36
@gmbronco
Collaborator Author

gmbronco commented Mar 6, 2024

New work:

  •  Jobs for syncing:
    • V3 swaps
    • V3 Join/exits
    • V2 Join/exits
  • Adding pools – renamed this to addPools, which finds new pools in the subgraph and adds them to the DB
  • Syncing pools – syncPools checks logs, fetches onchain data for affected pools, and upserts it to the DB
  • When we need to do a full sync with subgraph and onchain data, we can run addPools with specific IDs and it will handle all the data with upserts

@@ -6,8 +6,18 @@ const jobsController = JobsController();
async function run(job: string = process.argv[2], chain: string = process.argv[3]) {
console.log('Running job', job, chain);

if (job === 'sync-changed-pools-v3') {
return jobsController.addMissingPoolsFromSubgraph(chain);
if (job === 'add-pools-v3') {
Collaborator

Is this class just used so one can run the jobs locally using yarn task? Maybe add a comment if so.

};
};

export function QueriesController(tracer?: any) {
Collaborator

@franzns franzns Mar 7, 2024

This won't scale; we'll need to split that into multiple queries controllers. Though I am not even sure whether these should be controllers or whether we just leave them in the pool service. Maybe we create another category "loaders" where we put the query logic divided by domain (pools, users, etc.). Same for mutations: have another category "mutators" divided by domain.

Collaborator Author

yes, agree – i consider this a placeholder; let's give it a revamp once we collect more actions. i like the loaders / mutators split by domain, same goes for the jobs controller.

const tokenPrices = await prisma.prismaTokenPrice.findMany({
where: {
timestamp: {
gte: parseInt(timestamp),
Collaborator

shouldn't this be equal to? Otherwise you get a lot of prices, and it also won't work properly for "refilling" old swaps.

Collaborator Author

yes, good catch.

});

// Get events since the latest event, or limit to the number of days we want to keep them in the DB
const since = Math.floor(+new Date(Date.now() - daysToSync * 24 * 60 * 60 * 1000) / 1000);
Collaborator

please use a variable for this; it's not obvious how many days this is
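For example, the suggested refactor could look something like this (the constant and helper names are suggestions, not existing code):

```typescript
// Sketch: name the magic numbers so the retention window reads clearly.
const SECONDS_PER_DAY = 24 * 60 * 60;

// Unix timestamp (seconds) for "daysToSync days ago"; equivalent to the
// inline Math.floor(+new Date(Date.now() - daysToSync * ...) / 1000).
function sinceTimestamp(daysToSync: number, nowMs: number = Date.now()): number {
    return Math.floor(nowMs / 1000) - daysToSync * SECONDS_PER_DAY;
}
```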

// Store only the events that are not already in the DB
const existingEvents = await prisma.poolEvent.findMany({
where: {
id: { in: swaps.map((event) => event.id) },
Collaborator

why do we need this? we already only get swaps with a greater blockNumber than what we have stored.

Collaborator

@franzns franzns Mar 7, 2024

maybe we don't even need to check against existing events? The create already runs with skipDuplicates.

Collaborator Author

does prisma detect duplicates by ID? If so, this is probably not needed.

Collaborator Author

although it's useful, because then we don't need to fetch USD prices for already-synced events.

Collaborator

I'd remove it completely; there might be some duplicate swaps, but it would save another potentially big query (as id: { in: XXX } is a huge array)

const since = Math.floor(+new Date(Date.now() - daysToSync * 24 * 60 * 60 * 1000) / 1000);
const where =
latestEvent?.blockTimestamp && latestEvent?.blockTimestamp > since
? { blockNumber_gte: String(latestEvent.blockNumber) }
Collaborator

I think this should be gt not gte

Collaborator Author

initially i did it on purpose to overlap events in case we missed something, but it probably doesn't make sense, because most likely all events from the same block are available atomically (?)

Maybe adding id_not_in would make things more predictable – i mean, we would be getting only the missing swaps for sure?

Collaborator

Yes, events from the same block are available atomically.
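Taken together, the resume logic with the suggested gt fix might look like this – the shapes and filter names are a sketch, not the actual code:

```typescript
// Sketch: resume from the last stored event's block only when it falls
// inside the retention window; otherwise fall back to the timestamp cutoff.
// blockNumber_gt (not gte) is safe because a block's events land atomically.
type LatestEvent = { blockNumber: number; blockTimestamp: number } | null;

function nextSyncWhere(latest: LatestEvent, since: number): Record<string, string> {
    return latest && latest.blockTimestamp > since
        ? { blockNumber_gt: String(latest.blockNumber) }
        : { blockTimestamp_gte: String(since) };
}
```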


// Prepare DB entries
const dbEntries = await Promise.all(
events.map(async (event) => {
Collaborator

@franzns franzns Mar 8, 2024

can you refactor this to how swaps are synced? No reason they are written so differently. The swaps sync seems super clean compared to this...

const events = joinExits.filter((event) => !existingEvents.some((existing) => existing.id === event.id));

// Prepare DB entries
const dbEntries = await Promise.all(
Collaborator

same as for sync join exits v3 – can it be refactored to how swaps are done?

*
* @param chainId
*/
async reloadPools(chainId: string) {
Collaborator

would move this to the pool controller, as it is not triggered by a job

Collaborator Author

i think we're starting to see a pattern emerge of organising things around models. looks like jobs is pretty much pools at the moment. how about forgetting about a jobs-specific controller and using "model/domain" naming instead?

Collaborator

I see what you mean. We can evolve it later

*
* @param chainId
*/
async updatePools(chainId: string) {
Collaborator

isn't this the same as "reloadPools" above?

Collaborator Author

true, removing

const pools = await prisma.prismaPool.findMany();
const ids = pools.map((pool) => pool.id);
const client = getV3JoinedSubgraphClient(balancerV3, balancerPoolsV3);
const newPools = await client.getAllInitializedPools({ id_not_in: ids });
Collaborator

@franzns franzns Mar 8, 2024

this is clever! But we need to be wary of the max return limit – the subgraph returns at most 1000 entities.

Collaborator

maybe safer to get all pools with paging and then filter here on node?

Collaborator Author

getV3JoinedSubgraphClient should handle paging – the pools subgraph client needs to have the same paginated query as the vault has; i'll add it.
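The paging discussed here can be sketched as id-based cursoring to stay under the 1000-entity cap; fetchPage below is a stand-in for the real subgraph client call, not an existing function:

```typescript
// Sketch: page through a subgraph using id_gt as a cursor. Each page asks
// for at most pageSize entities, so the server-side return limit is never hit.
type Entity = { id: string };

async function fetchAllPaginated<T extends Entity>(
    fetchPage: (lastId: string, first: number) => Promise<T[]>, // where: { id_gt: lastId }
    pageSize = 1000,
): Promise<T[]> {
    const all: T[] = [];
    let lastId = '';
    while (true) {
        const page = await fetchPage(lastId, pageSize);
        all.push(...page);
        if (page.length < pageSize) break; // short page means we're done
        lastId = page[page.length - 1].id; // advance the cursor
    }
    return all;
}
```

With all pools fetched, the `id_not_in` filtering can then happen node-side without worrying about the limit.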

}

// Get the token prices needed for calculating token balances and total liquidity
const dbPrices = await prisma.prismaTokenPrice.findMany({
Collaborator

this should be prismaTokenCurrentPrice

Comment on lines 118 to 120
await prisma.prismaPoolToken.deleteMany({ where: { poolId: pool.id } });
await prisma.prismaPoolTokenDynamicData.deleteMany({ where: { poolTokenId: { startsWith: pool.id } } });
await prisma.prismaPoolExpandedTokens.deleteMany({ where: { poolId: pool.id } });
Collaborator

is that needed? I wonder if we could have unexpected side effects if upsertPools is called regularly – then pools would end up having no tokens quite often.

Collaborator

Ok, I see the problem. There is no upsertMany that works properly. Hmm, need to add a todo here.

Collaborator Author

we can wrap it in a transaction, so there won't be an intermittent "empty" state

isPoolPaused: configResult[i].result!.isPoolPaused,
isPoolInRecoveryMode: configResult[i].result!.isPoolInRecoveryMode,
} as PoolData;
const parsedResults = pools.map((pool, i) => {
Collaborator

this whole parsing is so ugly. We need a way to abstract that properly.

Collaborator Author

i agree that something similar to the multicaller would be great. I added this here: web3/multicaller-viem.ts – but making it work with the types is another challenge, and i don't think we need to fight for this one. There are just a few places where we are fetching onchain data, and they are always opinionated – i mean we always know what we are fetching, so it's easy to match inputs. if anything i'd consider that a separate task.

swapFee: String(onchainPoolData.swapFee ?? '0'),
},
},
poolTokenDynamicData: onchainPoolData.tokens.map((tokenData) => ({
Collaborator

shouldn't this use the transformer?

Comment on lines 58 to 77
/** TODO: enrich updates with USD values
const tokenAddresses = Array.from(
new Set(Object.values(onchainData).flatMap((pool) => pool.tokens.map((token) => token.address))),
);

// Get the token prices needed for calculating token balances and total liquidity
const dbPrices = await prisma.prismaTokenCurrentPrice.findMany({
where: {
tokenAddress: { in: tokenAddresses },
chain: chain,
},
include: {
token: true,
},
});

// Build helper maps for token prices and decimals
const decimals = Object.fromEntries(dbPrices.map(({ token }) => [token.address, token.decimals]));
const prices = Object.fromEntries(dbPrices.map((price) => [price.tokenAddress, price.price]));
*/
Collaborator

Would be great to actually add this to this PR :)

@@ -801,28 +791,78 @@ enum GqlPoolStakingGaugeStatus {
input GqlPoolJoinExitFilter {
poolIdIn: [String!]
chainIn: [GqlChain!]
userAddress: String
Collaborator

Gotta triple check whether any of these changes break the current UI. I think they don't, but need to check it!

Collaborator Author

@gmbronco gmbronco Mar 8, 2024

i was careful to only add new attributes, so join/exits have all the same attrs as swaps

@gmbronco gmbronco merged commit db04691 into v3-canary Mar 11, 2024
1 check passed