-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"matrix multiplication" statistic #1246
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1246 +/- ##
==========================================
+ Coverage 89.89% 93.39% +3.50%
==========================================
Files 30 28 -2
Lines 29043 27464 -1579
Branches 5672 1259 -4413
==========================================
- Hits 26107 25649 -458
- Misses 1667 1781 +114
+ Partials 1269 34 -1235
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 20 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
Very cool!
I can't think of a better one, but I'd be +1 on adding a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
I think we should add a genetic_relatedness_matrix
method as well as a convenience, but you probably had that in mind too?
The code all looks basically right to me, although I have scrutinised the detail. My only question really is about the "weight" term and whether this will be confusing. If it's the same W as in general stat (which I think it is?) then we're all good.
It is the same. So, maybe the argument should be called
Yeah, I guess? We still haven't gotten around to writing the |
Just been playing around with this and it seems like there might be some numerical stability issues. I tried @petrelharp 's code above with different seeds and occasionally get an AssertionError. Actually, the SUPER WEIRD thing is that sometimes it works with the same seed and sometimes it doesn't... i.e. I run exactly the same code twice and the first time it will be fine but the second time returns, e.g., B = [inf, inf ; inf, inf]. Has anyone seen anything like this before? This is on my MacBook btw.
It seems to work fine on rescomp (the Oxford BDI computing cluster, which runs linux). Though this is using python v3.7.10 compared to v3.8.6. Also, the K matrices are ever so slightly different between the two machines. Bizarre... I'll just develop on rescomp for now. |
OK, still some bizarre behaviour on rescomp. See the following code to perform matrix-vector multiplication using the same tree sequence as above. I am using the old code for comparison as
Returns:
|
Yep, I'm seeing that too. Bizarrely, only in an interactive shell,not at the command line. It must be that I've done something bad with memory management. I'm not sure how to track that down - can we run valgrind on the python tests? |
I'm also getting the problem when I run the above code in the command line via a script... Never used valgrind before but could look into it if that would help? |
No, Python gets in the way. We'd have to reproduce the error as a C test case to be able to use valgrind. I have a bit of time in the morning, I could take a look if someone sends me a script to reproduce? |
|
Thanks @petrelharp, this is a weird one. Reliably fails on the 10th seed, but not if we just run that seed on its own. I'll poke around, but it nearly has to be a memory management error. |
I think it's worthwhile putting in some tests to cover the remaining code paths now - this will have to be done later anyway, and it might unearth the problem we're having here. Stuff covering error paths in the Python C stuff is particularly important to exercise, subtle problems can manifest themselves in all sorts of unexpected ways and lots of time can be lost tracking them down. |
That sounds good. I'm pretty swamped right now, might you be able to have a go at the tests, @brieuclehmann? |
Yep, I'd be happy to! Please could you point me towards a relevant function/method that I could use as a template? |
Thanks @brieuclehmann but it's probably better if I put in a commit to test out the code paths I'm thinking about - the testing code is quite convoluted for this stuff, and it would be a good way for me to start getting up to speed on what the function is doing anyway. I'll have a go as soon as I can (hopefully tomorrow AM) |
I think that should do it - I think the issue was that the I don't understand how valgrind missed this though... |
Doh! Good catch. |
I think we should try to get this merged @petrelharp - do you know what's needed to get it in? We don't need to have it fully documented, but it would be good to get it in to main so we can experiment. |
It needs, at least:
The latter should be easy because we can just compare its output to what we'd get by matrix-multiplying the |
Do you think you could update this branch please @petrelharp? If you rebase and force push, then @brieuclehmann can open a PR against your fork with a some tests, and that should finish things up. As far as I can see it's mainly simple interface tests needed here now? |
Ping @petrelharp - I'd be happy to write some tests and more documentation in the coming days so that we can get this merged, but I think I'm not able to rebase the branch myself? |
Done! |
2994659
to
829431f
Compare
FYI I just pushed an update fixing the build problems. |
829431f
to
aae0a30
Compare
Would it be possible to rebase this onto main? I'd like to use (Sorry for dropping the ball on writing the tests & documentation for this - I will get round to it v soon!) |
Uh-oh, a segfault - ping @jeromekelleher? |
Looks like a simple compile error because of API changes we made for C 1.0. Should be fixed. |
Initialise total_weights to 0 and avoid an extra loop through arrays. Fix compile error
First pass at genetic_relatedness_weighted tests Full pass at tests for genetic_relatedness_weighted Update python/tskit/trees.py Co-authored-by: Peter Ralph <[email protected]> Add summary func to genetic_relatedness_weighted tests Fix summary function definition in docs
Closing this in favour of #2785 (which includes these commits) |
Here's a draft of the C code to compute covariances of aribtrary weighted sums of genotypes. Borrowing from @brieuclehmann's code, this should always be true:
(... and, it is, happy day!)
TODO: input checking, and tests. Tests should be straightforward, as we can just check for the property above, but hooking it in to the testing code will take some doing.
Also TODO:
weights
an argument togenetic_relatedness
, which would call one method or the other depending on whethersample_sets
orweights
were present. This could be a general pattern, even.