Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querying AGE3 database #3

Open
MorganArrington-NOAA opened this issue Feb 23, 2023 · 12 comments
Open

Querying AGE3 database #3

MorganArrington-NOAA opened this issue Feb 23, 2023 · 12 comments

Comments

@MorganArrington-NOAA
Copy link
Contributor

MorganArrington-NOAA commented Feb 23, 2023

@BethMatta-NOAA @EstherDGoldstein-NOAA

Starting a thread about trying to query directly from the AGE3 database to pull data into R. Talking with both of you, it sounds like this is something that would help all of us streamline things. There is some draft code at the top of the "Read Opus Files in R" script in this repository. I repurposed this chunk from code I've used to query the RACEBASE database and haven't yet tried to make this connection with AGE3. I wonder if we will need to work with Jon to make sure we have the right permissions? I have no idea how it works to get set up initially. For RACEBASE, it required working with IT and getting a username and password.

@EstherDGoldstein-NOAA
Copy link
Contributor

I got this from Jon and connected to AGE3 using it. I was able to see the tables, I just didn't know how to query properly.

dbconnection <- odbcDriverConnect("Driver={SQL Server};Server={AKC0SS-VI-071.nmfs.local,1919};Database=AGE3;Trusted_Connection=yes;")

@MorganArrington-NOAA
Copy link
Contributor Author

Oh that's awesome! I think the function I put in there may let us query then, just by updating the table names from RACEBASE tables to the appropriate AGE3 table. I'll check this out today and let you know how it works.

@BethMatta-NOAA
Copy link

Cool, thank you both for sharing this. For querying, I found this page that might be useful? I haven't tried any of it yet.

@BethMatta-NOAA
Copy link

So I tried the code you provided in combo with stuff from that link I sent and I think this will do it; we just have to make sure we're adjusting the querying for the relevant tables. This is a simple example but it works.

#Test querying from age3

library(RODBC)

#establish connection
con <- odbcDriverConnect("Driver={SQL Server};Server={AKC0SS-VI-071.nmfs.local,1919};Database=AGE3;Trusted_Connection=yes;")

#see table names
sqlTables(con, tableType = "TABLE")$TABLE_NAME

#try a query
surveys <- sqlFetch(con, "survey_cruises")
str(surveys)

qry <- "SELECT * FROM survey_cruises WHERE cruise_number = 197902"
cruise197902 <- sqlQuery(con, qry)
str(cruise197902)

odbcCloseAll() #closes connection

@BethMatta-NOAA
Copy link

The hardest thing will probably be getting the SQL commands right (if you're like me and not used to using them)

@BethMatta-NOAA
Copy link

So, I started delving a little deeper and it looks like we may need to join tables prior to querying the data we want. I looked at the "nir_scans" table and it doesn't appear that you can query by species (other than maybe by parsing it from the scan name, although this seems like a pain) or other collection variables. Here's the table structure.

str(NIR)
'data.frame': 114718 obs. of 17 variables:
$ scan_id : int 699 700 701 702 703 704 705 706 707 708 ...
$ data_session_id : int 5 5 5 5 5 5 5 5 5 5 ...
$ structure_id : int 6355684 6355714 6355833 6355948 6356017 6356152 6356183 6356683 6356687 6356770 ...
$ side : chr NA NA NA NA ...
$ broken : int 0 0 0 0 0 0 0 0 0 0 ...
$ crystallized : int 0 0 0 0 0 0 0 0 0 0 ...
$ other_problem : int 0 0 0 0 0 0 0 0 0 0 ...
$ percent_affected: int NA NA NA NA NA NA NA NA NA NA ...
$ unscannable : int 0 0 0 0 0 0 0 0 0 0 ...
$ scanned : int NA NA NA NA NA NA NA NA NA NA ...
$ comments : chr "" "" "" "" ...
$ scan_name : chr "162201701140_22_OT1" "162201701140_30_OT1" "162201701140_46_OT1" "162201701140_70_OT1" ...
$ run_number : int 1 1 1 1 1 1 1 1 1 1 ...
$ instrument : chr "T" "T" "T" "T" ...
$ misc_data : num NA NA NA NA NA NA NA NA NA NA ...
$ timestamp : POSIXct, format: "2022-02-03 10:18:00" "2022-02-03 10:39:00" "2022-02-03 12:20:00" "2022-02-03 13:06:00" ...
$ structure_weight: num NA NA NA NA NA NA NA NA NA NA ..

And I couldn't really tell in which table we could find the traditional age data. Should I start an email thread with Jon, or does he have a github account?

@EstherDGoldstein-NOAA
Copy link
Contributor

EstherDGoldstein-NOAA commented Feb 23, 2023

Thanks for delving into this! Jon does have a really great schematic of the database on the wall in his office, and also has a Github account @JonShort4.

@MorganArrington-NOAA
Copy link
Contributor Author

Thanks, Beth! Oh interesting. Looks like there isn't a "species" field in the NIR data table. I wonder if that's something that would be fairly straightforward to add to make querying easier? Definitely a question for Jon.

@BethMatta-NOAA
Copy link

Hi @JonShort4, we are learning how to query from age3 directly in R and have a few questions about the tables that came up in the comment thread above. When you have a chance, could you look this over and maybe we could have a follow-up conversation next week?

@JonShort4
Copy link

Hi guys, yeah we should have a conversation about this before you put much effort into it. I already have data views that join tables and make querying very easy. We should meet though to make sure they provide what you are looking for.

@BethMatta-NOAA
Copy link

BethMatta-NOAA commented Feb 23, 2023

Sweet! That would be excellent. What day/time would be good for everyone? I could do this afternoon or pretty much any time next week.

@BethMatta-NOAA
Copy link

BethMatta-NOAA commented Mar 9, 2023

I'm sure you're all way ahead of me, but just in case you need to join tables in your query, here's a way to do it. This joins the "fish with ages" view to the species and survey cruises tables. I was not very familiar with SQL queries prior to this, so it was a learning experience for me and would love to spare you some grief if you're in a similar boat.

library(RODBC)

#set up query
species <- 'WALLEYE POLLOCK' #as recorded in species table
collection_year <- 2018 # Enter year samples taken
collection_type <- 'survey' #'survey', 'fishery', 'special'
region <- 'BS'
cruise <- 201801
survey <- 1 #1 for groundfish trawl, 2 for longline, and 3 for hydroacoustic (in survey cruises table)

#Query from age3
dbconnection <- odbcDriverConnect("Driver={SQL Server};Server={AKC0SS-VI-071.nmfs.local,1919};Database=AGE3;Trusted_Connection=yes;")

#age data
data <- sqlQuery(dbconnection,paste("SELECT A.collection_type, A.collection_year, A.cruise_number, A.date_collected,
A.final_age, A.length, A.read_age, A.readability, A.region, A.sex, A.species_code, A.specimen, A.test_age,
A.vessel_code, A.weight, B.survey_type, B.description, C.common_name
FROM fish_with_ages_v AS A
JOIN survey_cruises AS B
ON A.cruise_number = B.cruise_number
AND A.vessel_code = B.vessel_code
JOIN species as C
ON A.species_code = C.species_code
WHERE (C.common_name = '",species,"') AND (A.collection_year = ",collection_year,")
AND (A.collection_type = '",collection_type,"') AND (A.region = '",region,"')
AND (A.cruise_number = ",cruise,") AND (B.survey_type = ",survey,")", sep=""),believeNRows=FALSE)

odbcClose(dbconnection)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants