trade.py trade definitely does more work than it should
#182
Replies: 8 comments
-
After consuming around 12GB of memory (my system only has 16GB, so I'm unsure if that 12 is an upper bound, or if it just settled at a "reasonable" 66-67% of total system memory) and 2-3 minutes of wall time, the following command was finally able to output something (granted, said something is still not the most helpful, but that's a separate issue I've brought up before [1]):
(Measured using GNU time.) For comparison, the actual commodities buy command that even got me those points to check the trades of:
A command that at minimum had to be executing queries on distance and multiple queries on commodity prices used 155.7MB of memory and 30 seconds of time, versus a command that outputs two precise locations' data diffed against each other and then sorted, which took 150 seconds and 11.6GB of memory. And yes, if it weren't obvious, I was interested in how

Anyway, something has clearly broken here, and I'm trying to resist the urge to dive into the code at the moment to root-cause it versus actually playing the game and enjoying that CG.

(P.S. [1] refers to my previously opened issue #143, with a subitem asking for

(P.P.S. It would also be pretty nice if the
-
By the way, the reason

Refactoring

I saw that the

P.S. This line in the
-
Don't underestimate my potential for derp -- I totally started TD as a learn-me-some-python project; to emphasize that:
https://python.godbolt.org/z/nn7KrEqv8 and that may be as far as I looked? I also wouldn't have known of/thought of https://python.godbolt.org/z/a9WxcrEPb

This was also probably before I discovered I wasn't going to be able to scale performance with threading :)

There's another issue/discussion elsewhere where I have suggested what was always my original plan: for TD to be its own service so that the data is loaded once. SQLite was just the most accessible ACID-storage option I had at the time; I didn't actually intend to use it as a database, and that's really what you're seeing in this code: me trying to avoid the lure of writing SQL queries, believing that to do so would make it harder to funnel things through TradeDB rather than pure SQLite.

I seem to remember that scoped loading either existed or was planned. I don't know whether I simply screwed it up with order of operations in this one or the sql-creep bent this out of shape.

I've also commented elsewhere that a HUGE problem with the station-items table is that it's a single table instead of separate buying/selling tables, which results in a pair of "x_price > 0" indexes that turn out to have logarithmic complexity to maintain - so when the db was much smaller it was an optimization, but now it's a giant hindering turdburger :( (a sketch of the two shapes is below)

I was working on changes to move those into their own tables, alongside changes to the cache build so that it doesn't rebuild the file from scratch every time, allowing the db to benefit from reuse. https://github.com/kfsone/Trade-Dangerous/tree/kfsone/inplace-cache-rebuild
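To make that concrete, here is a minimal sketch of the two shapes under discussion, assuming illustrative table and column names (the real TradeDangerous schema differs in detail):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Single-table shape: one row per station/item pair, with partial
# "x_price > 0" indexes that must be maintained on every price update.
con.executescript("""
CREATE TABLE StationItem (
    station_id   INTEGER NOT NULL,
    item_id      INTEGER NOT NULL,
    demand_price INTEGER NOT NULL DEFAULT 0,
    supply_price INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (station_id, item_id)
);
CREATE INDEX si_demand ON StationItem (item_id) WHERE demand_price > 0;
CREATE INDEX si_supply ON StationItem (item_id) WHERE supply_price > 0;
""")

# Split-table shape: a row only exists where a station actually buys or
# sells the item, so the price predicate (and its index upkeep) disappears.
con.executescript("""
CREATE TABLE StationBuying (
    station_id INTEGER NOT NULL,
    item_id    INTEGER NOT NULL,
    price      INTEGER NOT NULL,
    PRIMARY KEY (station_id, item_id)
);
CREATE TABLE StationSelling (
    station_id INTEGER NOT NULL,
    item_id    INTEGER NOT NULL,
    price      INTEGER NOT NULL,
    PRIMARY KEY (station_id, item_id)
);
""")
```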
-
When I was very much younger, I studied chemistry at university and also took a lot of illicit mind-altering drugs. Either way, "ACID storage" means something unrelated to computers to me; could you educate me?
The utterly ridiculous database size is a large (if not the exclusive) reason we've started routinely purging data more than 30 days old. To put some numbers to this: before the purge the DB was ~5.5GB in size; it's now ~500MB. So the vast majority of the data available (for example from spansh) is wildly out of date, sometimes literally years old. This should also hugely reduce the daily listings.csv download size, which has been mentioned elsewhere.
-
Atomic, Consistent, Isolated, Durable: if you're in the middle of writing something to the disk, it can't end up in an unrecoverable state.

Say the first 1KB of your file is an index: four bytes of an ID and four bytes of an offset into the file where the data for that ID is at, e.g. (3612, 8000) - meaning the offset for id 3612 is end-of-index + 8000 bytes. You get an update that 3612 was deleted and 3613 added, but the new record is bigger so it has to be relocated, and it turns out there's a big enough free space at offset 2000. Writing this data into your file is going to take a non-trivial quantity of time - dozens or even hundreds of cpu instructions. Say the code does: seek(index_position) and then loses power partway through the remaining steps (the full sequence is sketched below).

There's no data at the new offset, you didn't move the old data into the free list, and the old data is still there but now orphaned. If those values are something to do with payments, you just threw money away.

There are ways you can work around it to provide stability, a common one being that you write the steps you need to execute to a "journal", and you don't allow access to the data while the journal is non-empty. After that you still have to be careful not to put the data into an intractable state, but you can at least recover from crashes or power outages. https://en.wikipedia.org/wiki/ACID

Unfortunately, as I was building up TD, I kept finding SQLite to not be all that; it was fairly routine for it to render the database unrecoverable. It was still giving me headaches at work up until about 4-5 years ago, but - touch wood - I haven't seen one in at least 4 years.
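A minimal sketch of that failure mode and the journaled workaround, in Python, using a made-up single-file format (the index layout, offsets, and function names are purely illustrative - this is not TD or SQLite code):

```python
import struct

# Index entries as described above: a 4-byte id followed by a 4-byte offset.
RECORD = struct.Struct("<ii")

def relocate_record(f, index_pos, record_id, new_offset, payload):
    """Naive, non-ACID update: several separate writes, no journal."""
    f.seek(index_pos)
    f.write(RECORD.pack(record_id, new_offset))  # index now points at new_offset...
    # ...a crash or power cut HERE leaves the file broken: nothing is at the
    # new offset yet, the old record is orphaned, the free list never updated.
    f.seek(new_offset)
    f.write(payload)

def relocate_with_journal(f, journal, index_pos, record_id, new_offset, payload):
    """Journaled variant: record the intent first, clear it only when done."""
    journal.seek(0)
    journal.write(b"%d %d\n" % (record_id, new_offset))
    journal.flush()        # readers stay out while the journal is non-empty
    relocate_record(f, index_pos, record_id, new_offset, payload)
    journal.seek(0)
    journal.truncate()     # an empty journal means the data is consistent again
```

If the process dies between the journal write and the truncate, recovery just replays the journal entry before allowing access again.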
-
I didn't even notice that
I did notice that, and you've mentioned it somewhat elsewhere IIRC. Honestly, it may make sense at this point to just migrate to a pure NoSQL key-store instead of using SQL. Though I'm honestly surprised that sqlite is so inefficient as to not be feasible to offload to, for concurrent lookup at least, given Python's inability to do threading and poor multi-process data-passing ability (pre 3.13 at least, if that ever materialises). That is, for all the areas where additional pure-data checks are being done in Python rather than SQL (like "is the price less than this number?" or "is the pad size..." etc.).
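Purely for illustration, pushing one of those pure-data checks down into SQL might look roughly like this (the column names, the numeric pad-size encoding, and the database path are all assumptions, not the actual TradeDangerous schema):

```python
import sqlite3

con = sqlite3.connect("data/TradeDangerous.db")  # assumed path

# Let SQLite filter on price and pad size instead of fetching every row
# and re-checking the same conditions in Python.
query = """
SELECT si.station_id, si.item_id, si.supply_price
  FROM StationItem AS si
  JOIN Station     AS stn ON stn.station_id = si.station_id
 WHERE si.supply_price > 0
   AND si.supply_price <= :max_price
   AND stn.max_pad_size >= :min_pad
"""
rows = con.execute(query, {"max_price": 1000, "min_pad": 2}).fetchall()
```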
-
I'm a sysadmin by trade, not a programmer. I'll hack crappy code together to achieve something simple when I've no other choice, so a lot of this goes over my head. My question therefore is: how much of these discussions is realistically achievable? I've done a lot of project management, and this project has zero budget and very limited resources, so I look at this and, whilst it may be an academically interesting discussion, does it lead to something that can be made to happen with reasonable expectations on time input for all concerned?
-
This was, and still is, an actual issue. The discussion parts came after, but I'm pretty sure

Anyway, since it is a discussion now, I guess I'll use this to ask: is the sqlite database getting

To the issue itself, I have the following SQL query, tested locally:
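A sketch of that lookup (not the original query verbatim); it assumes System and Station tables with name, system_id and station_id columns, which is only an approximation of the real schema, and a database path that will differ per install:

```python
import sqlite3

con = sqlite3.connect("data/TradeDangerous.db")  # assumed path

# Station names are not unique ("Walker Port" exists in many systems), so
# gather every station with the given name, then keep only the one whose
# parent system carries the (unique) system name.
query = """
WITH sys AS (
    SELECT system_id FROM System WHERE name = :system_name
),
stations AS (
    SELECT station_id, system_id FROM Station WHERE name = :station_name
)
SELECT station_id
  FROM stations
 WHERE system_id IN (SELECT system_id FROM sys)
"""
rows = con.execute(query, {
    "system_name": "...",           # whichever system you actually want
    "station_name": "Walker Port",  # deliberately ambiguous name, per below
}).fetchall()
print(rows)
```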
Obviously, the system and station name specifications need to be replaced with variable entries, but I chose that one since there are a ton of "Walker Port" stations, so it's a nice way to test the non-uniqueness criterion. I have it set up this way because I saw there was an index for "find a station by name" (non-unique) but no index set up for "find a system by name" (unique), so the best way I could think of was "find all stations with that name, then filter to stations whose system id matches the system name".

Incidentally, is there a reason there's an index to find stations by name but not systems by name? It seems like it would make more sense to filter by system name first, since it's unique, and then filter stations on the system_id, since there's an index for that too. Maybe I'm misunderstanding the way the DB is set up or some use case there, or it has to do with how foreign keys work (I'm not super familiar with what optimisations SQL can do with those constraints; if they link it directly then it makes sense that finding a station by name would let you immediately check its system name too).

Unfortunately, I couldn't quite figure out a clean way to get it to return two system-station pairs' data in one query without running into uniqueness issues while avoiding JOIN. Obviously could do

Here's a solution relying on JOIN (that also drags in the item stats rather than assuming we have that loaded in Python already to query against):
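Again only as a sketch of what such a JOIN-based query might look like (StationItem and its demand_price/supply_price columns are assumptions about the schema, and the system/station names are placeholders):

```python
import sqlite3

con = sqlite3.connect("data/TradeDangerous.db")  # assumed path

# Resolve both endpoints by their unique system name plus station name,
# and pull the per-item stats along in the same query via StationItem.
query = """
SELECT sys.name, stn.name, si.item_id, si.demand_price, si.supply_price
  FROM System      AS sys
  JOIN Station     AS stn ON stn.system_id = sys.system_id
  JOIN StationItem AS si  ON si.station_id = stn.station_id
 WHERE (sys.name = :sys_a AND stn.name = :stn_a)
    OR (sys.name = :sys_b AND stn.name = :stn_b)
"""
rows = con.execute(query, {
    "sys_a": "...", "stn_a": "...",  # first system/station pair
    "sys_b": "...", "stn_b": "...",  # second system/station pair
}).fetchall()
```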
It performs basically within the same standard deviation as the single query using CTEs above, so it's probably fine. I remembered some vague tribal knowledge of "JOIN with way too large tables can lead to massive inefficiencies", but I'm guessing that either was never true or got optimised better sometime in the last, oh, 10-20 years since I last stared at SQL... The LHS 1348 system has a Haxel Terminal too; the above query properly avoids that (as it should).

I'll see about prototyping that into an actual python script and measuring its performance; for sure it'll beat the current implementation of

That actual prototype though will likely be either later this weekend or when I get time later, as I attempt to actually still have fun playing the game rather than doing tools for the game.
-
Calling
trade.py source destination
appears to attempt to load the entire database and then perhaps do some additional processing, considering the amount of time it takes. This is obviously wasteful: it has been given two exact start and end zones, so all it really needs to do is pull out the data for both of those locations and then compare them. And if pulling out the data requires loading the entire database rather than searching the database with much lower memory requirements... then that sounds like a pretty major usability issue considering how large the database is now.
I've noticed this for all the commands really; they seem to spend tons of time loading the entire database and then doing whatever culling operations, if any, are requested of the operation, making all queries slow. The load could be avoided by just making the application a REPL, something I believe has been mentioned in previous issues, or some sort of (local) server-client architecture, though making cases that obviously don't need a full load not do the full load would be a good step too.
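To illustrate the REPL idea only (nothing here is existing TradeDangerous API; load_database and run_command are hypothetical placeholders for whatever the real entry points would be):

```python
import shlex

def load_database():
    """Hypothetical: the expensive one-time load of systems/stations/prices."""
    ...

def run_command(db, argv):
    """Hypothetical: dispatch one trade.py-style command against loaded data."""
    ...

def main():
    db = load_database()  # pay the multi-gigabyte load cost exactly once
    while True:
        try:
            line = input("td> ")
        except EOFError:
            break
        if line.strip() in ("quit", "exit"):
            break
        run_command(db, shlex.split(line))  # e.g. "trade SOURCE DESTINATION"

if __name__ == "__main__":
    main()
```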