-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create alembic script for institutions table update #48
Comments
A few thoughts and questions. Just trying to make the column names more explicit, and keep things flexible for the future. Some of these are probably questions for @nongarak and @Kibrael.
|
@nongarak can you post the answer Larry gave us about foreign owned/controlled entities that are covered for US reporting? |
By having a separate lookup table we get a few things.
Yeah. We should figure out how to differentiate those.
👍 Yep. I think the the ISO-3166 standard. To add to the fun there, it sounds like the standard allows for those US subdivisions to be either their own country ( |
@Kibrael and I chatted about some of this...
|
Here's my take on the current state of things here. There's still a fair number of questions, but nothing that is a show-stopper from starting work on all this. I'll work with @Kibrael and @nongarak to try to get the rest answered over the coming days. Below are the tables we'll need for this work. I've included the data for the lookup tables when available.
|
column | datatype | constraints | notes / questions |
---|---|---|---|
lei |
CHAR(20) |
required, unique | Table's primary key |
legal_name |
VARCHAR |
required | |
tax_id |
CHAR(9) ❓ |
❓required, unique | Q1: Do we know if this is only EIN (12-3456789 ) format, or will SSN-formatted IDs (123-45-6789 ) come through as well? Q2: Do we strip dashes and just have this be 9 int digits? |
rssd_id |
INTEGER |
❓required, unique | |
primary_federal_regulator_id |
VARCHAR(4) |
❓ | Foreign key to federal_regulator table. |
hmda_institution_type_id |
CHAR(❓) |
❓ | Foreign key to hmda_institution_type table. |
sbl_institution_type_id |
VARCHAR ❓ |
❓ | Foreign key to sbl_institution_type table. |
hq_address_street_1 |
VARCHAR |
required | Q: Do we need more than 2 street_x columns? @nongarak's demo showed how GLEIF's address data can has up to 5 (I think), and the first is frequently filled with C/O type data, not actual address data. Or will downstream NIC/GLEIF data munging get us down to just 2 useful fields? |
hq_address_street_2 |
VARCHAR |
||
hq_address_city |
VARCHAR |
||
hq_address_state |
CHAR(2) |
Foreign key to address_state table |
|
hq_address_zip |
CHAR(5) ❓ |
required | Q: Do we need to support 12345-6789 extended zips? Q: Do all territories have "zip codes"? Should we make this a more universal hq_address_postal_code ? |
parent_lei |
CHAR(20) |
||
parent_legal_name |
VARCHAR |
❓ | Q: Required if parent_lei or parent_rssd_id is set? |
parent_rssd_id |
INTEGER |
||
top_holder_lei |
CHAR(20) |
||
top_holder_legal_name |
VARCHAR |
❓ | Q: Required if parent_lei or parent_rssd_id is set? |
top_holder_rssd_id |
INTEGER |
federal_regulator
table
column | datatype | constraints | notes / questions |
---|---|---|---|
id |
VARCHAR(4) |
required, unique | |
name |
VARCHAR |
required, unique |
As for the data we insert into this lookup table, I think we can just follow what's in NIC. Per page 26 of NIC BULK DATA DOWNLOAD DATA DICTIONARY AND REFERENCE GUIDE
The Primary Federal Regulator column contains the name of the agency that is the primary regulator of a depository institution, non-deposit trust company, bank holding company, thrift holding company, farm credit bank, or federal housing enterprise. The following agencies are represented:
FCA = Farm Credit Administration
FDIC = Federal Deposit Insurance Corporation
FHFA = Federal Housing Finance Agency
FRS = Federal Reserve System
NCUA = National Credit Union Administration
OCC = Office of the Comptroller of the Currency
OTS = Office of Thrift Supervision (only valid until July 21, 2011)
Q: Should CFPB be in that list too?
hmda_institution_type
table
column | datatype | constraints | notes / questions |
---|---|---|---|
id |
❓ | required, unique | Table's primary key |
name |
VARCHAR |
required, unique |
@Kibrael, what values will go in this table? Is this the same as NIC's ENTITY_TYPE? Per page 14 of Per page 26 of NIC BULK DATA DOWNLOAD DATA DICTIONARY AND REFERENCE GUIDE:
The Entity Type field is derived from other fields. The entity types listed below are in alphabetic
sequence by entity code.AGB - Agreement Corporation - Banking
AGI - Agreement Corporation - Investment
BHC - Bank Holding Company
...
sbl_institution_type
table
column | datatype | constraints | notes / questions |
---|---|---|---|
id |
❓ | required, unique | Table's primary key |
name |
VARCHAR |
required, unique |
@Kibrael, what values go in this table?
address_state
table
column | datatype | constraints | notes / questions |
---|---|---|---|
code |
CHAR(2) |
required, unique | Table's primary key. |
name |
VARCHAR |
required, unique |
Values here should be pretty self-explanatory. All 50 states and D.C. and territories...
code | name |
---|---|
AS |
American Samoa |
DC |
District of Columbia |
GU |
Guam |
MP |
Northern Mariana Islands |
PR |
Puerto Rico |
UM |
United States Minor Outlying Islands |
VI |
Virgin Islands, U.S. |
...assuming we're going with ISO-3166.
Whoops, sorry for the delay. Turns out I did not have notifications turned on for GH Public. I'm glad you've answered the hard questions without me though! Some thoughts: re: TIN vs. SSN. I'll ask Regs, but you're right - sole props / single member LLCs can use SSN instead of TIN. I doubt we'll have that many of them, but it's a case worth considering. We probably couldn't publish those. re: street address columns. I think it's probably best if we include all of them, as we will be pulling it straight from GLEIF. Long term maybe we could do some data munging, but I don't think it's a priority and also there is zero consistency in how folks submit this data. re: ZIP code. There is nothing in the reg specifying what HQ address means, so I think we can safely suck in whatever GLEIF has. If we can suck in the full zip5+4 code for their confirmation purposes, we can later strip it to zip5 for any other analysis/reporting purposes if we need to. All US territories also use ZIP code. I don't think there's a need for calling it "postal code" re: parent/top_holder_legal_name - definitely required if the parent RSSD or LEI are set, but also can be set without those things (not all parents will have RSSD or LEI). re: should CFPB be on the prim fed reg list - that's definitely a @Kibrael question, but if there's no decisive answer we can ask Regs. My gut says no, because I feel like PFR has some definition somewhere that we aren't aware of, but also CFPB has "primary authority to enforce federal consumer financial laws for banks and other depository institutions with total assets of more than $10 billion, and their affiliates" (though I'm not sure what the difference between primary supervision/enforcement authority and PFR might be). re: HMDA institution type code: I haven't parsed through it all, but this ticket outlines the logic used to describe it (on GHE) re: SBL Type of Financial Institution - here is what the reg says the options are.
I guess that means we need to add another column for free form text field to your table, @hkeeler re: @Kibrael's earlier question about the readout from Larry on foreign controlled entities, he said there should be a US HQ address that they would report. So a big international bank (like HSBC or whatever) would report their US entity information. |
Other notes:
If you want _ between every word like hq_address_line_1 that's fine with me. |
I went through and changed all of the column names in my code to match the names hans gave above (except the address naming convention). The final list is this, in order of data point in the rule:
|
@nargis-sultani you're right, technically we don't need the domains field in this dataset. I have that all cleaned up and parsed out in a separate table and that's probably more appropriate, I can remove it from the current iteration of the dataset here. |
|
the fields that need to be added and modified:
lei
tax_id
rssd
respondent_name (modified from
name
)parent_id_rssd
parent_name
top_holder_rssd
top_holder_name
prim_fed_reg
street_line1
street_line2
city
state_cd
zip_cd
parent_lei
top_holder_lei
fi_type
The text was updated successfully, but these errors were encountered: