A question about halLiftover #301

Open
Marh32 opened this issue Apr 25, 2024 · 5 comments

Comments

@Marh32

Marh32 commented Apr 25, 2024

Hi,

I'm sorry to bother you. I was a little confused when using halLiftover. When using halLiftover to locate the corresponding conserved element (e.g., Conserved Element 1) in the target genome based on annotations from the reference genome, I occasionally receive empty results. How can I ascertain whether this outcome is due to the genuine absence of this element in the corresponding region of the target genome (Case 1) or because of issues such as poor assembly quality of the target genome, leading to the entire region not aligning properly (Case 2)? Thank you so much for your help.

Best regards,
Hao

[Image: Picture1]

@glennhickey
Collaborator

There's no easy way to check this from the cactus output. You could

  • make a pairwise alignment, maybe with another tool. If the element is aligned there, that would be evidence of a missed alignment (otherwise it would be more evidence of an assembly issue).
  • export a MAF of the region with cactus-hal2maf, using a --maximumGapLength big enough to span your gap (note: this won't work for very big gaps). If you see a big insertion and deletion in the MAF, that'd be a sign of under-alignment.
  • do a liftover of the element and its flanking regions, to see if the flanking regions are present in the target genome. If they are, and there is some sequence in between, you can manually compare it to your missing element...
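
As a rough sketch of the flanking-region idea in the last bullet, the following Python helper builds a three-line BED (upstream flank, element, downstream flank) that could then be lifted over. The function name, the `_up`/`_down` name suffixes, and the 1000 bp default flank are illustrative assumptions, not part of halLiftover or cactus.

```python
def bed_with_flanks(chrom, start, end, name, flank=1000, chrom_size=None):
    """Return BED4 lines for an element plus its upstream and downstream flanks.

    Coordinates are 0-based, half-open, as in BED. `flank` and the name
    suffixes are illustrative choices, not a halLiftover convention.
    """
    up_start = max(0, start - flank)  # do not run off the start of the sequence
    down_end = end + flank
    if chrom_size is not None:
        down_end = min(chrom_size, down_end)  # clip to sequence length if known
    rows = [
        (chrom, up_start, start, f"{name}_up"),
        (chrom, start, end, name),
        (chrom, end, down_end, f"{name}_down"),
    ]
    # Skip flanks that are empty (e.g. element at position 0).
    return ["\t".join(str(field) for field in row) for row in rows if row[2] > row[1]]


if __name__ == "__main__":
    # Hypothetical element; write these lines to a file and pass it to halLiftover.
    for line in bed_with_flanks("chr1", 5000, 5200, "CE1"):
        print(line)
```

The resulting file can be given to halLiftover in the usual way (`halLiftover <halFile> <srcGenome> <srcBed> <tgtGenome> <tgtBed>`). If both flanks lift over but the element itself does not, the target sequence between the lifted flanks is the place to look.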

@Marh32
Author

Marh32 commented Apr 26, 2024

OK, thank you so much for your reply. Are there any tools that can extract a specific region of a HAL file to FASTA format (retaining the alignment information)?

@Marh32
Author

Marh32 commented Apr 27, 2024

In addition, why can a single line in a BED file correspond to multiple alignment results? Does this indicate that a single contig in the BED file aligns to multiple regions?

My understanding is that HAL files are indeed derived from constructing a homology map based on anchors produced by tools like LASTZ during whole-genome alignments, eventually leading to the formation of full-genome comparisons. If an element in a BED file does not reside within a block, it should return an empty result, whereas if it's within a block, it should return a unique mapping result. Why would there be a situation where multiple results are returned?
[Images: Picture1; Screenshot 2024-04-27 at 22 47 41]

@glennhickey
Collaborator

If there's one copy of gene A in species 1 and two copies in species 2, then all three copies will (probably) be aligned together in Cactus. Due to such paralogous relationships, you can expect a given query region to map to multiple reference regions. There's a tool, halSynteny, that tries to filter this somewhat. You can run it yourself or within cactus-hal2chains.
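
To see which elements are affected, one option is to group the lifted-over BED lines by element name and flag names whose pieces land on more than one target sequence. This is only a minimal sketch: it assumes the output BED carries the source element's name in column 4, and the function names are hypothetical. Several lines on the same sequence may be either fragments of one alignment or tandem paralogs, so this check is a first pass only.

```python
from collections import defaultdict


def group_lifted_by_name(bed_lines):
    """Map each element name (BED column 4) to its lifted (chrom, start, end) pieces."""
    hits = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, cannot attribute the piece to an element
        chrom, start, end, name = fields[:4]
        hits[name].append((chrom, int(start), int(end)))
    return dict(hits)


def names_on_multiple_sequences(hits):
    """Return sorted names whose lifted pieces fall on more than one target sequence."""
    return sorted(
        name for name, pieces in hits.items()
        if len({chrom for chrom, _, _ in pieces}) > 1
    )


if __name__ == "__main__":
    # Hypothetical liftover output for two elements.
    lifted = [
        "chrA\t100\t200\tCE1",
        "chrB\t900\t1000\tCE1",
        "chrA\t300\t350\tCE2",
        "chrA\t360\t400\tCE2",
    ]
    print(names_on_multiple_sequences(group_lifted_by_name(lifted)))
```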

@Marh32
Author

Marh32 commented Apr 29, 2024

Thank you so much for your reply. I have tried using halSynteny to filter it, and I get the following results:
[Image: Screenshot 2024-04-29 at 14 46 23]
In this situation, should the alignment in the third line be considered an error, or attributed to a paralogous relationship? When searching for orthologous genes or conserved elements, should I filter out alignments like the third line? Also, I find that some alignment information is missing between blocks in the returned result (for example, from 30317824 in the first row to 30323901 in the second row). Is there any way I can get this missing alignment information? Thank you for your help.
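
As a small illustration of the gap question, the uncovered stretches between consecutive blocks can be listed with a few lines of Python. The sequence name and the outer block boundaries below are hypothetical; only the two coordinates 30317824 and 30323901 come from the result described above.

```python
def gaps_between_blocks(blocks):
    """Return (chrom, gap_start, gap_end) for uncovered stretches between blocks.

    `blocks` is a list of (chrom, start, end) in BED-style coordinates.
    Overlapping or touching blocks produce no gap.
    """
    by_chrom = {}
    for chrom, start, end in blocks:
        by_chrom.setdefault(chrom, []).append((start, end))

    gaps = []
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        covered_end = intervals[0][1]
        for start, end in intervals[1:]:
            if start > covered_end:
                gaps.append((chrom, covered_end, start))
            covered_end = max(covered_end, end)
    return gaps


if __name__ == "__main__":
    # "chrX", 30317000 and 30324500 are made-up values for illustration.
    blocks = [("chrX", 30317000, 30317824), ("chrX", 30323901, 30324500)]
    for chrom, gap_start, gap_end in gaps_between_blocks(blocks):
        print(chrom, gap_start, gap_end, gap_end - gap_start)
```

For the two coordinates quoted above, the gap is 30323901 - 30317824 = 6077 bases. Intervals found this way could be inspected with the MAF export suggested earlier in the thread.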
