This week was mostly focused on studying the paper named Superbubbles, Ultrabubbles, and Cacti by Benedict Paten et. al (which can be found here) in order to improve the bubble detection capabilities of my script. As you may remember from last week, VCF validation didn't go as expected, and I think Bubble Detection may be (at least part of) what's causing the issue. So, I was suggested to use that paper as a reference for a new, more general, Bubble Detection algorithm.
The current approach, which is heavily inspired by Flavia95's work, albeit with a few changes (such as using a BFS instead of a DFS), has two main drawbacks:
- It is impossible to detect multiple (simple) bubbles that happen at the same distance from the root. This is due to the algorithm looking for a single node at a given level. This is clearly not the case when multiple bubbles are open at the same time.
- It does not detect superbubbles properly. While there is a simple way to detect when a bubble is opening, it's harder to tell when it must be closed.
An example of a problematic graph |
- Contract the grey edges in the biedged graph
- Merge the vertices in each 3-edge-connected component, to obtain a cactus graph
- Contract the edges in each cycle, to obtain a bridge forest
The main problem with applying this algorithm to a Handlegraph instead of a biedged graph is that in step 1 the grey edges get "eliminated" (or rather, contracted) so that only black edges remain. But what does this entail in a Handlegraph? It can't be multiple nodes because there is no edge (grey edge in the biedged graph) connecting them; it also makes no sense to have a single node representing the whole graph. This is still an open issue, and I'll try to solve it by taking a close look at how vg implements this algorithm.
My take on simulating step 1 of the algorithm above, but is it actually correct? |
No comments:
Post a Comment