Merge Jackson2024 #1161

Merged
merged 13 commits into from
Apr 5, 2024

iseultj
Contributor

@iseultj iseultj commented Apr 4, 2024

Before merging: should I also add this to single genome host assoc.? Raw reads and aligned BAMs are available for T. forsythia and S. mutans (plus BAMs aligned to T. denticola, but these are basically unusable as-is for downstream analysis).

Pull Request

This PR is for a

For the following list(s):

  • ancientmetagenome-environmental (README)
  • ancientmetagenome-hostassociated (README)
  • ancientsinglegenome-hostassociated (README)

This is to close #1160

PR Workflow

  1. Open this PR with sample metadata on the samples metadata sheet (:tada: you're already here!)
  2. Wait for checks for sample metadata to pass
  3. (If checks fail) make corrections, and push changes to this branch (no need to open a new PR!)
  4. (Once passed) comment on this PR @spaam-bot please autofill <table_name> <project_id> to get a half-filled template! (may take a minute or so to get the comment with the file)
  5. Fill in the template, and verify autofilled data correct!
  6. Once filled in, append(!) the new rows from the TEMPLATE file to the end of the corresponding library metadata file, and update this PR
  7. Wait for checks for library metadata to pass
  8. Request review!
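Step 6 is where a header line can sneak in by accident (as happened further down this thread). Here is a minimal sketch of scripting the append so that only data rows are copied. It assumes both files are plain tab-separated tables with a single, identical header line. `append_template_rows` is a hypothetical helper, not part of AMDirT or spaam-bot.

```python
import csv
from pathlib import Path


def append_template_rows(template_path: str, library_path: str) -> int:
    """Append the data rows of a filled-in TEMPLATE TSV to a library TSV.

    Hypothetical helper: assumes both files are tab-separated, each with a
    single header line, and that the two headers match exactly.
    """
    with open(template_path, newline="") as fh:
        template = list(csv.reader(fh, delimiter="\t"))
    with open(library_path, newline="") as fh:
        target_header = next(csv.reader(fh, delimiter="\t"))

    header = template[0]
    rows = [row for row in template[1:] if row]  # skip blank lines
    if header != target_header:
        raise ValueError("template columns do not match the library table")

    existing = Path(library_path).read_text()
    with open(library_path, "a", newline="") as fh:
        # Make sure the first new row does not get glued onto the last old one.
        if existing and not existing.endswith("\n"):
            fh.write("\n")
        # Write only the data rows, never the template's header line.
        csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(rows)
    return len(rows)
```

The header comparison is a cheap safeguard: if the template and the library table ever drift apart in column order, the script refuses to append rather than silently misaligning columns.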

Pre-review checklist (new publications)

  • Publication is published
    • Preprints currently not accepted?
  • Checked the publication is not already in the database?
  • [x] Checked samples in this publication are not previously published data?
    • Newly re-sequenced metagenomes are OK!
  • Samples are shotgun metagenomes and not amplicon data
    • Note: hostassociated-singlegenome may also contain whole-genome enriched data
  • Checked the list follows conventions as described in the corresponding sample type's README file (e.g. using ERS/SRS accession codes for ENA/SRA)?
  • Once sample table validation is completed, library metadata has been added
    • Use @spaam-bot please autofill <table_name> <project_id> to get a half-filled template! May take a minute or so to get the comment with the file
    • Fill in template, and verify autofill data correct!
    • Once filled in, append(!) the new rows to the end of the corresponding library metadata file
  • Changelog is updated to include the publication under 'Added'?

@iseultj iseultj requested a review from jfy133 April 4, 2024 12:35
Contributor Author

iseultj commented Apr 4, 2024

@spaam-bot please autofill ancientsinglegenome-hostassociated_libraries.tsv Jackson2024

fixed accidentally pasting in header
Contributor Author

iseultj commented Apr 4, 2024

@spaam-bot please autofill ancientsinglegenome-hostassociated_libraries.tsv Jackson2024

Member

jfy133 commented Apr 4, 2024

@iseultj it needs to be the table name without the _libraries.tsv suffix. I realise now the terminology is unclear 😅
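The bot takes the bare table name followed by the project ID. Below is a hypothetical helper that builds the comment from a metadata file name by stripping the `_libraries.tsv` or `_samples.tsv` suffix. `autofill_command` is an illustrative name and is not part of spaam-bot.

```python
def autofill_command(table_file: str, project_id: str) -> str:
    """Build the spaam-bot autofill comment from a metadata file name.

    Hypothetical helper: the bot expects the bare table name, so any
    "_libraries.tsv" or "_samples.tsv" suffix is stripped first.
    """
    for suffix in ("_libraries.tsv", "_samples.tsv"):
        if table_file.endswith(suffix):
            table_file = table_file[: -len(suffix)]
            break
    return f"@spaam-bot please autofill {table_file} {project_id}"
```

For example, `autofill_command("ancientsinglegenome-hostassociated_libraries.tsv", "Jackson2024")` gives the same command that worked later in this thread.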

Contributor Author

iseultj commented Apr 4, 2024

Oh cool! that makes sense 😄 could you merge the other PR I made (#1162) so Streptococcus mutans is an allowed species? Thanks!

Member

jfy133 commented Apr 4, 2024

I approved it, do you not have merge permissions?

Contributor Author

iseultj commented Apr 4, 2024

No, sorry!

Member

jfy133 commented Apr 4, 2024

Oh! OK! Merged for you!

Contributor Author

iseultj commented Apr 4, 2024

@spaam-bot please autofill ancientsinglegenome-hostassociated Jackson2024

Contributor Author

iseultj commented Apr 4, 2024

All done! I think it might be better to use the FTP link for the submitted file rather than the file generated by ENA/SRA, which is what the autofill currently does. I have put in the FTP links and md5sums for the submitted files, because for the aligned BAM the generated file is a FASTQ that you have to realign anyway (and I assume people would prefer to go to the raw data in that case). If you prefer the generated file, let me know and I'll modify it again :)
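Whichever file is recorded, the md5sum column lets anyone confirm that a download is intact. This is a minimal sketch that assumes the file has already been fetched from the FTP link to a local path. `md5sum` and `verify_download` are hypothetical helper names.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat even for multi-GB BAM/FASTQ files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> bool:
    """Compare a downloaded file with the md5 recorded in the library table."""
    return md5sum(path) == expected_md5.strip().lower()
```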

Member

jfy133 commented Apr 4, 2024

@spaam-bot please autofill ancientsinglegenome-hostassociated Jackson2024

Member

jfy133 commented Apr 4, 2024

@spaam-bot please autofill ancientmetagenome-hostassociated Jackson2024

Member

@jfy133 jfy133 left a comment

Overall really good :D a couple of structural changes but the accuracy is good!
Regarding generated vs. submitted files: we prefer the generated one, as this is more consistent. I've occasionally found submitted files to be a bit broken in ways that were fixed in the generated version.

Thank you @iseultj !
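To compare the two options, the ENA Portal API `filereport` endpoint can list both the regenerated FASTQ files and the originally submitted files for a run. The sketch below only builds the query URL and does not fetch anything. The `fastq_*` and `submitted_*` field names reflect our understanding of that API and should be checked against the ENA documentation. `filereport_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(run_accession: str, generated: bool = True) -> str:
    """Build an ENA Portal API filereport query for one run accession.

    generated=True asks for ENA's regenerated FASTQ files (fastq_* fields);
    generated=False asks for the files as originally submitted (submitted_*).
    Field names are assumptions to be checked against the ENA docs.
    """
    prefix = "fastq" if generated else "submitted"
    fields = ",".join(f"{prefix}_{col}" for col in ("ftp", "md5", "bytes"))
    query = urlencode(
        {"accession": run_accession, "result": "read_run", "fields": fields}
    )
    return f"{ENA_FILEREPORT}?{query}"
```

Because both queries return the same three columns (link, md5, size), switching a library row from the submitted file to the generated one is just a matter of re-running the query with `generated=True`.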

Comment on lines 665 to 667
Jackson2024 2024 10.1093/molbev/msae017 Killuragh 52.6 -8.33 Ireland KGH2-F Homo sapiens 3775 10.1093/molbev/msae017 bacteria Tannerella forsythia tooth chromosome ENA "raw,reference_aligned" PRJEB64128 ERS15977755
Jackson2024 2024 10.1093/molbev/msae017 Killuragh 52.6 -8.33 Ireland KGH2-B Homo sapiens 3775 10.1093/molbev/msae017 bacteria Streptococcus mutans tooth chromosome ENA "raw,reference_aligned" PRJEB64128 ERS15977754
Jackson2024 2024 10.1093/molbev/msae017 Killuragh 52.6 -8.33 Ireland KGH1-E Homo sapiens 3775 10.1093/molbev/msae017 bacteria Tannerella forsythia tooth chromosome ENA "raw,reference_aligned" PRJEB64128 ERS15977753
Member

See comment above about having sample IDs per tooth; however, in this case you would just have two KGH2 lines, one for each of the two species.

You can remove the quotes around the raw,reference_aligned bit

Comment on lines 3064 to 3066
Jackson2024 2024 10.1093/molbev/msae017 KGH2-B ENA PRJEB64128 ERS15977754 KGH2-B-EX3-UDG1-Hi4.mutans double AccuPrime Pfx DNA full-udg Illumina HiSeq 2500 SINGLE WGS 3493602 ERR11658334 bam_mapped ftp.sra.ebi.ac.uk/vol1/run/ERR116/ERR11658334/KGH2-B-EX3-UDG1-Hi4.trimmed25bp25q.mutans.sorted.grouped.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR116/ERR11658334/KGH2-B-EX3-UDG1-Hi4.trimmed25bp25q.mutans.sorted.grouped.bam.bai 4990a7e776b5881a681f3c2d31a82a1c;bcb496be7425fd6b653f0c96a3826c48 171788860;6736
Jackson2024 2024 10.1093/molbev/msae017 KGH2-F ENA PRJEB64128 ERS15977755 KGH2-F-EX3-UDG1.forsythia double AccuPrime Pfx DNA full-udg Illumina HiSeq 2500 SINGLE WGS 221487 ERR11658336 bam_mapped ftp.sra.ebi.ac.uk/vol1/run/ERR116/ERR11658336/KGH2-F-EX3-UDG1.trimmed25bp25q.forsythia.sorted.grouped.fish.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR116/ERR11658336/KGH2-F-EX3-UDG1.trimmed25bp25q.forsythia.sorted.grouped.fish.bam.bai 8358ef4f9945c68d64cb650ea2ca44e6;36939624efba902cdd817ce0485b8c06 11720746;8664
Jackson2024 2024 10.1093/molbev/msae017 KGH1-E ENA PRJEB64128 ERS15977753 KGH1-E-WEX1-UDG1.forsythia double AccuPrime Pfx DNA full-udg Illumina NovaSeq 6000 PAIRED WGS 841339 ERR11658335 bam_mapped ftp.sra.ebi.ac.uk/vol1/run/ERR116/ERR11658335/KGH1-E-WEX1-UDG1.trimmed25bp25q.forsythia.sorted.grouped.fish.bam;ftp.sra.ebi.ac.uk/vol1/run/ERR116/ERR11658335/KGH1-E-WEX1-UDG1.trimmed25bp25q.forsythia.sorted.grouped.fish.bam.bai 8e5939d645c3c5e5935b2e2dc8468e89;efc79ec9b721cfd9a0b482f71f4a3483 33654513;9720
Member

Also, for the BAMs the data type should be fastq_mapped, using the generated file, as ENA doesn't offer a regenerated BAM file.
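In the library rows above, each library lists several files (the BAM and its `.bai` index) in parallel, semicolon-separated columns for download link, md5 and size. Here is a small consistency check. It assumes that position *i* in each list refers to the same file. `split_file_columns` and `ArchiveFile` are hypothetical names.

```python
from typing import List, NamedTuple


class ArchiveFile(NamedTuple):
    url: str
    md5: str
    size_bytes: int


def split_file_columns(links: str, md5s: str, sizes: str) -> List[ArchiveFile]:
    """Pair up the semicolon-separated link, md5 and size columns of one row."""
    parts = [col.split(";") for col in (links, md5s, sizes)]
    # All three lists must describe the same number of files.
    if len({len(p) for p in parts}) != 1:
        raise ValueError("link, md5 and size columns list different numbers of files")
    return [ArchiveFile(u, m, int(s)) for u, m, s in zip(*parts)]
```

Applied to the KGH2-B row, this yields two records: the BAM (171788860 bytes) and its index (6736 bytes).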

iseultj and others added 3 commits April 5, 2024 09:17
…tassociated_samples.tsv


rounded to nearest century

Co-authored-by: James A. Fellows Yates <[email protected]>
Fixed date to nearest century and removed quotation marks around data type
Contributor Author

iseultj commented Apr 5, 2024

@spaam-bot please autofill ancientmetagenome-hostassociated Jackson2024

Contributor Author

iseultj commented Apr 5, 2024

@jfy133 I've made most of those changes. I don't think it really makes sense to treat them as one sample, given that when analysed there are two different genomes from each tooth, and if people want to include them in metagenomic analysis, each subsample of the tooth is quite different. If you're OK with that, I think this is ready to merge: I have updated the dates and FTP links, and the automatically generated table is working well :)

@jfy133 jfy133 self-requested a review April 5, 2024 11:11
Member

@jfy133 jfy133 left a comment

Fair enough if you consider the subsamples as distinct samples! Will retain structure as is :)

OK, looks good. I see you removed the BAM files, but I guess that's still OK as you have the raw reads anyway, so no need to duplicate :) I will fix the conflict and merge in. Thanks for the rapid turnaround @iseultj!

github-actions bot commented Apr 5, 2024

AMDirT, version 1.5.0

Samples

Ancient Metagenome Host Associated

ancientmetagenome-hostassociated_samples.tsv is valid

Ancient Single Genome Host Associated

ancientsinglegenome-hostassociated_samples.tsv is valid

Ancient Metagenome Environmental

ancientmetagenome-environmental_samples.tsv is valid

Libraries

Ancient Metagenome Host Associated

ancientmetagenome-hostassociated_libraries.tsv is valid

Ancient Single Genome Host Associated

ancientsinglegenome-hostassociated_libraries.tsv is valid

Ancient Metagenome Environmental

ancientmetagenome-environmental_libraries.tsv is valid

@jfy133 jfy133 merged commit d7e1cb2 into master Apr 5, 2024
1 check passed
@jfy133 jfy133 deleted the jackson2024 branch April 5, 2024 11:15
@jfy133 jfy133 mentioned this pull request Apr 5, 2024