Sunday, June 21, 2020

Week #3: Getting GFAtoVCF to work

As promised last week, here's a first look at a VCF file produced by my program:


[Image: a VCF file produced by GFAtoVCF]

I was actually able to get it working quite early in the week, with a small caveat: it only worked when given one specific dfs-tree as input, namely the one produced by the Python version of GFAtoVCF. When a program only works with one specific input, something is fundamentally wrong with it; I just had to figure out what was wrong, and how to fix it.

The first thing I noticed was that different bubbles were being detected depending on the dfs-tree obtained by the visit. This had always confused me since, starting from the same graph, I expected to always find the same bubbles. More specifically, when starting from the dfs-tree produced by the Rust version, multiple bubbles of length 1 were being detected, which led to errors at the end of the program.



So, what I tried to do was find a way to obtain the same bubbles no matter the dfs-tree, i.e. to "compress" contiguous bubbles of length 1 into a single longer bubble. I never liked this approach, since it wasn't a real solution but more of a workaround.
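To make the workaround more concrete, here is a rough sketch of what "compressing" contiguous length-1 bubbles could look like. This is not the code GFAtoVCF uses: the `Bubble` struct, its `start`/`end` fields and `compress_unit_bubbles` are hypothetical, and the sketch assumes each bubble is identified by the positions of its start and end nodes, with the bubbles sorted by start.

```rust
/// Hypothetical bubble representation: positions of the start and end
/// nodes along a linear order (e.g. their distance from the root).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Bubble {
    start: usize,
    end: usize,
}

impl Bubble {
    fn length(&self) -> usize {
        self.end - self.start
    }
}

/// Merges runs of contiguous length-1 bubbles (one ends exactly where the
/// next one starts) into a single longer bubble. Longer bubbles are kept
/// as they are. Assumes `bubbles` is sorted by `start`.
fn compress_unit_bubbles(bubbles: &[Bubble]) -> Vec<Bubble> {
    let mut result: Vec<Bubble> = Vec::new();
    // True while the last bubble in `result` is made only of length-1 bubbles.
    let mut in_unit_run = false;

    for &b in bubbles {
        let is_unit = b.length() == 1;
        let extends_run = in_unit_run
            && is_unit
            && result.last().map_or(false, |prev| prev.end == b.start);

        if extends_run {
            // `extends_run` implies `result` is non-empty, so unwrap is safe.
            result.last_mut().unwrap().end = b.end;
        } else {
            result.push(b);
            in_unit_run = is_unit;
        }
    }
    result
}
```

For example, the unit bubbles (0, 1), (1, 2) and (2, 3) collapse into a single bubble (0, 3), while a longer bubble such as (5, 8) is left untouched. It hides the symptom, but says nothing about why the bubbles differ in the first place.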

Then, after talking it over for a while with Gianluca Della Vedova, I had another idea: what if the DFS itself was the problem, and we needed a different kind of visit? And what if that visit was a BFS? As you may recall from last week, we found out that both dfs-trees were actually correct, but some nodes had different distances from the root depending on the visit. With a BFS this can never happen, since every node is first reached along a shortest path from the root, so its distance doesn't depend on the order in which neighbors are explored.
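To illustrate why the kind of visit matters, here is a small sketch showing that the depth a DFS assigns to a node depends on the order in which neighbors are tried. It uses plain adjacency lists rather than HandleGraph, and `dfs_depths` and the example graph are hypothetical.

```rust
/// Recursive DFS from `root` that records each node's depth in the dfs-tree
/// (`None` if unreachable). `adj[u]` lists the successors of `u`, in the
/// order the visit will try them.
fn dfs_depths(adj: &[Vec<usize>], root: usize) -> Vec<Option<usize>> {
    fn visit(adj: &[Vec<usize>], u: usize, depth: usize, depths: &mut Vec<Option<usize>>) {
        depths[u] = Some(depth);
        for &v in &adj[u] {
            // Only unvisited nodes become children of `u` in the dfs-tree.
            if depths[v].is_none() {
                visit(adj, v, depth + 1, depths);
            }
        }
    }

    let mut depths = vec![None; adj.len()];
    visit(adj, root, 0, &mut depths);
    depths
}
```

On the graph with edges 0 → 1, 0 → 2 and 1 → 2, listing node 0's successors as [1, 2] puts node 2 at depth 2, while listing them as [2, 1] puts it at depth 1. Both are valid dfs-trees of the same graph.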

So I implemented a BFS in Rust, and everything started working correctly: a given node now always has the same distance from the root, no matter the order of the visit. Since I now have a better understanding of how HandleGraph works, it wasn't that hard to implement.
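Here is a minimal sketch of such a BFS on plain adjacency lists. It does not reproduce the HandleGraph API, and `bfs_distances` and `adj` are hypothetical names. Because a BFS reaches each node for the first time along a shortest path, the distance it records does not depend on the order in which neighbors are listed.

```rust
use std::collections::VecDeque;

/// Breadth-first visit from `root`, returning each node's distance (in
/// edges) from the root, or `None` if the node is unreachable.
fn bfs_distances(adj: &[Vec<usize>], root: usize) -> Vec<Option<usize>> {
    let mut dist = vec![None; adj.len()];
    let mut queue = VecDeque::new();
    dist[root] = Some(0);
    queue.push_back(root);

    while let Some(u) = queue.pop_front() {
        let d = dist[u].expect("queued nodes always have a distance");
        for &v in &adj[u] {
            if dist[v].is_none() {
                // First time `v` is reached: this is its shortest distance.
                dist[v] = Some(d + 1);
                queue.push_back(v);
            }
        }
    }
    dist
}
```

The BFS tree itself (which parent each node ends up with) can still depend on the neighbor order; only the distances are guaranteed to match. On a three-node graph with edges 0 → 1, 0 → 2 and 1 → 2, both orderings of node 0's successors give node 2 a distance of 1.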

Now only tests and in-depth documentation are missing. Expect a long blog post next week, where I'll explain how the program works in detail, and what's going to happen next!
