Are individual crew results across multiple years a more complete record? #5
See https://eodg.atm.ox.ac.uk/user/dudhia/rowing/bumps/linc/linc_m1e.txt: Lincoln I do not race for a few days (and drop down one place each day), but the fact that they didn't race is represented in the chart. |
For the same chart, https://eodg.atm.ox.ac.uk/user/dudhia/rowing/bumps/sjoh/sjoh_m1e.txt shows St John's I dropping out, while Brasenose II (https://eodg.atm.ox.ac.uk/user/dudhia/rowing/bumps/bras/bras_m2e.txt) don't race for five days but have a position each day. They then race for one day before dropping out! |
Which suggests there's merit in capturing on a per day and per crew basis whether the crew raced or dropped out? |
I agree. Some crews clearly drop out, never to race again. I'd assume that the next day all crews behind them effectively start one station up, so there isn't a gap left in the start order. Do you think this is correct? Either that division has one fewer racing crew, or one crew gets moved up from each division so the bottom division has one fewer crew. I think this happened relatively recently in Cambridge, with a crew getting a penalty that resulted in them not racing again, but I'd need to find out when that was (or if I imagined it). In the Eights results, as soon as they have more than one division (1874), I can't see any higher division crews not racing each day.
If a crew doesn't race that day, but does race later on, I'd assume the same happens on the days they didn't race: all crews behind start one station up, so there isn't a gap left. The Eights rules changed in 1841: from then on you lose one place each day you don't race, whereas previously you went to the bottom. This is shown differently from when a crew never races again: we see where the crew would have been in the start order, but if the crew withdraws we end their line with a dot.
I'm assuming that when a crew doesn't race that day, we don't know whether they'll be coming back later; it's only when racing is completed that we would know whether we should show their virtual position in a bumps chart. In Anu's text results, this means that each day a crew doesn't race we don't know whether to put 999 position results, or track +1 each day with a -1 in the flags column, until the end of the set. I'm not sure whether this is a problem or not.
I'd almost prefer to represent the two cases the same way, so that even if the crew doesn't race again you still show their virtual start position. That way you can do the results incrementally each day and not change previous results based on the next day of racing.
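To make the post-1841 rule concrete (a crew that doesn't race loses one place that day, and the crew behind starts one station up), here is a minimal Python sketch. The function name `drop_non_racers` and the crew order are made up for illustration, and it deliberately ignores any bumps among the crews that did race. It is not taken from bumps.py.

```python
def drop_non_racers(order, not_racing):
    """Return the next day's start order after non-racing crews lose a place.

    Assumption: a non-racing crew swaps with the crew directly behind it,
    provided that crew raced. Bumps between racing crews are ignored here.
    """
    result = list(order)
    i = 0
    while i < len(result) - 1:
        if result[i] in not_racing and result[i + 1] not in not_racing:
            # The racing crew behind moves up one station.
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip the crew that has just been moved down
        else:
            i += 1
    return result


# Hypothetical start order, head of the river first.
day1 = ["Brasenose", "Lincoln", "Christ Church II", "St John's"]
day2 = drop_non_racers(day1, {"Lincoln"})
day3 = drop_non_racers(day2, {"Lincoln"})
print(day2)
print(day3)
```

Because each day only depends on the previous day's order, the positions can be built up incrementally without knowing whether the crew will come back later.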
If that's the way to do it, then for the tg_format you need a variant of the 'e' code, which means move this crew exactly this many places but they didn't actually race, so something like 'v' for they virtually moved places. |
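A rough Python sketch of how such per-day codes could be applied to a single crew. The `(code, places)` tuples are a made-up representation, not the actual tg_format syntax; the sign convention (positive means moving up the order) and the use of 'x' for a complete withdrawal are also assumptions.

```python
def track_crew(start, days):
    """Follow one crew through a set of racing days.

    days is a list of (code, places) pairs, a hypothetical representation:
      'e' - raced and moved exactly `places` (positive = up the order)
      'v' - did not race, but virtually moved `places`
      'x' - withdrew completely; tracking stops
    Returns a list of (position, raced) pairs, with position None after 'x'.
    """
    position = start
    history = []
    for code, places in days:
        if code == "x":
            history.append((None, False))
            break
        position -= places  # moving up means a smaller position number
        history.append((position, code == "e"))
    return history


# A crew starting 5th: up one, two days not racing, then withdrawn.
history = track_crew(5, [("e", 1), ("v", -1), ("v", -1), ("x", 0)])
print(history)
```

Keeping the virtual position alongside a "raced" flag means a chart can draw both the line and the fact that the crew sat out that day.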
Having tried to do this, I run into a problem - I end up with Lincoln ahead of Christ Church II on the last day, whereas the chart above is the other way around. I can't quite figure out where the difference comes from. So maybe I should abandon this idea, and just think how to reproduce the kind of chart above... |
So using 'x' as a code for withdrawing completely, and 'v' for the virtual move but not racing that day, here's the result I get: |
I think the earliest ad_results file in this repo for Eights is 1892, and for Torpids is 1900. Is there a reason for not having the earlier ones? In https://eodg.atm.ox.ac.uk/user/dudhia/rowing/bumps/e1847/e1847m.txt we don't have the flags to indicate which days should be shown as not raced (i.e. Brasenose II is shown as '0 1 0 0 2 1 2 -99'). Is this why you didn't include them? If the 'not raced' flags are only in the per-crew history, then do we need to use these as the source and generate the tg_results data from that? For example, https://eodg.atm.ox.ac.uk/user/dudhia/rowing/bumps/bras/bras_m2e.txt has the full information: |
If that's the case, we need to get all the per-crew stats files, and read them all. Do you have them all downloaded and available somewhere, or do we need to crawl Anu's site and download everything? |
Lots of food for thought! As for why the results in this repo stop at 1892 for Eights and 1900 for Torpids: 1891 Eights is the first event with crews dropping out, and I haven't written any parsing/exporting code to handle that behaviour. For Torpids I think I just stopped at a round number. I've just pushed up Torpids results back to 1880 until I encounter the first crew dropping out. |
I've pulled down the per crew stats files just in case we want to make use of them. |
Okay, thanks. I'll have a look at whether I can automatically create results files for all years from this data. It might be quicker just to manually write out each year, since most of the tricky years (<1890) only have a single division per year, but the challenge of doing it automatically is an interesting one! |
I've written the tool to read these files and turn them into sets of results, including coping where crews drop out or skip races. I currently have three issues:
|
Problem 1:
1969: The Merton 2nd and 3rd boats have the same data. The ad_results for 1969 don't list Merton 2, just the 1st and 3rd boats. So I think in the ad_results file Merton 3 needs renaming as Merton 2, and in the per-crew results the entry for Merton 3 needs the 1969 line removing (or changing to 999). I have changes that fix this.
1981: Taking the ad_results data as correct, the problem is with the per-crew results file for LMH. I have a change that fixes this.
1978: I think the ad_results for this set is wrong: for many crews the results for the last three days are added together and the total is put as the second day change, with the last two days wrong. The top of the first division also doesn't match between the two sources of data, but the other way around (the ad_results file has Keble going up one each day, the per-crew file has them going up two on the first and third day). So it's not clear how to fix this yet. |
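The 1978 pattern (the second-day change holding the sum of the last three days) could be flagged automatically when comparing the two sources. A minimal sketch, assuming a four-day set and that each crew's results are available as a list of daily changes; the function name and the sample numbers are invented for illustration, not real 1978 data.

```python
def looks_like_summed_tail(suspect, reference):
    """Check whether suspect's day-2 value is the sum of reference's days 2-4.

    Both arguments are lists of four daily position changes for one crew.
    Returns False when the two sources already agree.
    """
    if len(suspect) != 4 or len(reference) != 4:
        return False
    if suspect == reference:
        return False
    return suspect[0] == reference[0] and suspect[1] == sum(reference[1:])


# Invented example: the reference has +1 each day, the suspect has 1, 3, 0, 0.
flagged = looks_like_summed_tail([1, 3, 0, 0], [1, 1, 1, 1])
print(flagged)
```

This only detects the pattern; it does not say which source is right, so a third source would still be needed for crews like Keble where the disagreement goes the other way.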
Problem 2: Examples of missing files from recent results: (I can't see these files on https://web.archive.org/web/*/eodg.atm.ox.ac.uk/user/dudhia/rowing/bumps/*)
Examples of missing files from old results: (again I can't see any evidence of per crew files)
Assuming that we only really care about regenerating results from <1900, then I think we only need to worry about these older colleges, rather than the lower boats from the first list, which can probably be done manually unless you have more data stashed away somewhere! |
I'll have more time to devote to this next week but here are my thoughts (I'm reading your posts in order).
Hmm, yes, I see the same.
Probably a few things going on here. I definitely didn't get every file in my download. I went through every college listed in the 'contents' section and attempted to get up to ten boats. Looking at the men's 1995 Torpids I missed Manchester and possibly Templeton (if they are considered different to Green Templeton). The 1995 results also look like not much racing happened full stop.
I agree it'll end up being a manual process. Which reminds me to schedule some time to input some older Cambridge results.
👍
👍
Well spotted that the second day looks like the sum of the last three days! Yes, the ad_results file looks broken. It would be nice to have a third source of results. I might be mis-remembering this, but did The Times print results at some point?
I don't have anything extra to hand but I'm up for some manual data entry! |
Okay, I've got to a point where things are basically working and manual checking & data entry are the next steps. Write a bumps chart: Convert an 'ad_format' results chart to a 'tg_format' results chart: Read all the per-crew data files, and try to generate all the tg_format files: The bumps.py file has support for the new results codes (v, x, w, d). |
I've added another option to the last command: if you give it a third directory, it doesn't output results files for any sets that it finds in that third directory. My suggestion is that we both generate this set of 114 candidate files and, once we have verified that a results file is correct, commit it into the results directory so that it won't be generated next time. We can also investigate the missing.txt file, and add crew information into escapes.py so that it will generate more candidate files next time around. |
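For reference, the "skip anything already in the third directory" behaviour amounts to a set difference on file names. A minimal sketch of that idea only; the function name `candidates_to_write` and the file names are hypothetical, and the real tool may well do this differently.

```python
import tempfile
from pathlib import Path


def candidates_to_write(candidate_names, verified_dir):
    """Return the candidate file names that are not already in verified_dir."""
    verified = {p.name for p in Path(verified_dir).iterdir() if p.is_file()}
    return sorted(name for name in candidate_names if name not in verified)


# Demo with made-up file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "eights1847.txt").write_text("already verified\n")
    remaining = candidates_to_write(["eights1847.txt", "eights1848.txt"], tmp)

print(remaining)
```

As verified files are committed, the list of remaining candidates shrinks on each run.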
See http://eodg.atm.ox.ac.uk/user/dudhia/rowing/bumps/ball/ball_m1t.txt for an example. In particular it contains data on whether a crew raced on a particular day.