Skip to content
This repository has been archived by the owner on Sep 11, 2024. It is now read-only.

Merge function inherently flawed #50

Open
kelle opened this issue May 13, 2016 · 7 comments
Open

Merge function inherently flawed #50

kelle opened this issue May 13, 2016 · 7 comments
Assignees
Labels

Comments

@kelle
Copy link
Member

kelle commented May 13, 2016

Merge function needs to retain id/primary keys from modified db and not generate new ones by default.

@kelle kelle added the Critical label May 13, 2016
@kelle
Copy link
Member Author

kelle commented May 13, 2016

until this is fixed, we should go back to modifying bdnycdev.db in dropbox and rely on Dropbox notify us of conflicted copies. Only people with write access include me, @dr-rodriguez , and @hover2pi

@hover2pi
Copy link
Member

I don't fully understand the problem.

What should happen is that records from the modified SOURCES table are merged into the master SOURCES table and given a new source_id. Then the source_id column is updated in all relevant tables.

Is this not what happens?

There could be a problem arising from the SOURCES table not having a NOT NULL column requirement since this is what astrodbkit looks for when testing record duplication. For example, if you modify a record in the SOURCES table then merge, it will think they are two separate sources. Is that what you mean?

@kelle
Copy link
Member Author

kelle commented May 19, 2016

This workflow is dangerous because the modified database could reference the source_id's in multiple tables which are not merged at the same time as the sources table. In general, the idea of changing primary keys of tables when merging is very unsettling. Every effort should be made to maintain the primary key from the modified table and give big big warnings when it will be changed.

@paigegiorlagodfrey
Copy link

Perhaps if the source_ids are a unique identifier instead of a random ordered number set. Such as, a randomly generated string of numbers and characters of a set length. OR, and better, the shortname of the object minus the +/-. For example, 1503+2525 would have source_id = 15032525. In the case where objects share shortnames, we'd need a remedy, but in most cases this would be unique and not necessary to renumber when merged like the current workflow.

@dr-rodriguez
Copy link
Member

The source_ids are already a unique identifier. I haven't tested if source_ids are being updated, but found the code that's supposed to do it, so I can look into that later.

I thought the problem was more with non-source_id tables. For example, updating a publication creates a new publication id rather than replacing the existing one as it has no way of checking for duplicates. This new record is not updated throughout the sources table, or anywhere else.

@dr-rodriguez
Copy link
Member

For example, if you modify a record in the SOURCES table then merge, it will think they are two separate sources. Is that what you mean?

Just tested and indeed this is what happens. There is no way to check for duplicates in the SOURCES table and so the source_id would not be replaced anywhere. The same thing happens for the PUBLICATIONS table (and likely elsewhere): an updated record becomes a new one whose new id is not reflected in SOURCES.

@hover2pi
Copy link
Member

hover2pi commented Jun 3, 2016

Right, so the first order solution is to require a UNIQUE NOT NULL column in addition to the id column.

The source_id changing across other tables could be remedied by just not allowing individual table merges. Modified databases should always be completely merged into the master so source_ids can be updated in all tables if something in the SOURCES table changes.

@hover2pi hover2pi self-assigned this Jul 6, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants