Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CQF is replaced by MQF #1859

Closed
wants to merge 17 commits into from
Closed

CQF is replaced by MQF #1859

wants to merge 17 commits into from

Conversation

shokrof
Copy link
Collaborator

@shokrof shokrof commented May 2, 2018

MQF, Mixed Quotient Filter, is a new variant of CQF. MQF has changed the counting eschme of the parent data structure by adding fixed size counters and using different encoding for the dynamic one.
This improvement makes MQF more memory efficient for a wider range of zipfian distributions. Also, MQF expanded the metadata component of the filter to enable tagging of its kmers. In addition to the basic functions provided by the original CQF, the new datastructure comes with an expanding list of kmer processing functions including functions for merging and resizing the filter, functions for comparing kmer sets, and functions for indexing and meta-analysis of multiple kmer groups. More details about these functions can be found here

  • Is it mergeable?
  • make test Did it pass the tests?
  • make clean diff-cover If it introduces new functionality in
    scripts/ is it tested? No new functionality is added
  • make format diff_pylint_report cppcheck doc pydocstyle Is it well
    formatted?
    Khmer source files are changed after running this command.
  • Did it change the command-line interface? Only backwards-compatible
    additions are allowed without a major version increment. Changing file
    formats also requires a major version number increment.
  • For substantial changes or changes to the command-line interface, is it
    documented in CHANGELOG.md? See keepachangelog
    for more details.
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Do the changes respect streaming IO? (Are they
    tested for streaming IO?)

@shokrof
Copy link
Collaborator Author

shokrof commented May 2, 2018

@standage can you help me with this error?
travis build succeeds on Linux but it fails on OSX.
Build link: https://travis-ci.org/dib-lab/khmer/jobs/373855996.
Error: /Users/travis/.travis/job_stages: line 166: shell_session_update: command not found

@ctb
Copy link
Member

ctb commented May 2, 2018 via email

@codecov-io
Copy link

codecov-io commented May 2, 2018

Codecov Report

Merging #1859 into master will increase coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #1859      +/-   ##
=========================================
+ Coverage    0.07%   0.07%   +<.01%     
=========================================
  Files          67      67              
  Lines        6679    6660      -19     
  Branches     2485    2467      -18     
=========================================
  Hits            5       5              
+ Misses       6674    6655      -19
Impacted Files Coverage Δ
src/oxli/storage.cc 0% <0%> (ø) ⬆️
include/oxli/storage.hh 0% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 599cc34...739d887. Read the comment docs.

@ctb
Copy link
Member

ctb commented May 7, 2018

This is now passing on Mac OS X; looks like the Python 3.4 ci that failed is due to a network timeout, nothing in khmer. Restarting to see.

@ctb
Copy link
Member

ctb commented May 7, 2018

All the tests pass!!

To me, the changes to the current khmer code look straightforward (relatively speaking :). Any objections to merging? @luizirber @standage @camillescott ?

Copy link
Member

@ctb ctb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shokrof
Copy link
Collaborator Author

shokrof commented May 7, 2018

@ctb
The problem is not solved. The last commit was an experimental one.
I am using travis to test my code because I don't have Mac machine. It will be great If anybody with MAC machine can help me solving this problem.

@ctb
Copy link
Member

ctb commented May 7, 2018 via email

@shokrof
Copy link
Collaborator Author

shokrof commented May 7, 2018

The problem occurs when c compiler(I think clang) links the binaries together to create cpp-demo program. It seems that the compiler cant find the c++ standard library.

Log : https://travis-ci.org/dib-lab/khmer/jobs/374119637

Linking Command:
c++ -O3 -fPIC -fno-omit-frame-pointer -Wall -Wstrict-null-sentinel -std=c++11 -I ../../include/ -I ../../third-party/seqan/core/include/ -I ../../third-party/smhasher/ -I ../../third-party/mqf/ -I ../../third-party/rollinghash -I ../../third-party/zlib/ -I ../../third-party/bzip2/ -DSEQAN_HAS_ZLIB=1 -DSEQAN_HAS_BZIP2=1 -O3 -fPIC -fno-omit-frame-pointer -Wall -DVERSION= -DNO_UNIQUE_RC=0 -lstdc++ -pthread -install_name /usr/local/lib/liboxli.dylib.1 -compatibility_version 1 -current_version 1 -shared -o liboxli.dylib.1 hashtable.o hashgraph.o hllcounter.o kmer_hash.o labelhash.o traversal.o read_aligner.o read_parsers.o subset.o kmer_filters.o assembler.o alphabets.o murmur3.o storage.o ../../third-party/zlib/adler32.lo ../../third-party/zlib/crc32.lo ../../third-party/zlib/deflate.lo ../../third-party/zlib/infback.lo ../../third-party/zlib/inffast.lo ../../third-party/zlib/inflate.lo ../../third-party/zlib/inftrees.lo ../../third-party/zlib/trees.lo ../../third-party/zlib/zutil.lo ../../third-party/zlib/compress.lo ../../third-party/zlib/uncompr.lo ../../third-party/zlib/gzclose.lo ../../third-party/zlib/gzlib.lo ../../third-party/zlib/gzread.lo ../../third-party/zlib/gzwrite.lo ../../third-party/bzip2/blocksort.o ../../third-party/bzip2/huffman.o ../../third-party/bzip2/crctable.o ../../third-party/bzip2/randtable.o ../../third-party/bzip2/compress.o ../../third-party/bzip2/decompress.o ../../third-party/bzip2/bzlib.o ../../third-party/mqf/gqf.o

Error:
Undefined symbols for architecture x86_64:

  • "std::ostream::seekp(std::fpos<__mbstate_t>)",
  • "std::string::_Rep::_M_destroy(std::allocator const&)"
  • "std::string::_Rep::_S_empty_rep_storage"
    and so on.

@standage
Copy link
Member

standage commented May 7, 2018

I've recently seen weird behavior like this on my Mac as well. My solution was to set environmental variables CC=gcc-7 and CXX=g++-7. This feels like a dirty hack and not a long term solution, but I couldn't find any other way to help Python compile and link C++ extensions correctly.

@shokrof
Copy link
Collaborator Author

shokrof commented May 7, 2018

@standage
I tried to add these environmental variables in .travis.yml in this commit. it fails because it cant find g++-7.
Error: FileNotFoundError: [Errno 2] No such file or directory: 'gcc-7': 'gcc-7'
Log.

I tried to set the variables to g++ and gcc. it fails to link the binaries.
Log

@ctb
Copy link
Member

ctb commented May 18, 2018

Hi folks, the build succeeds & all tests pass on my mac laptop.

@ctb
Copy link
Member

ctb commented May 18, 2018

whups, make cpp-demos does not work on my laptop.

@shokrof could you open a new pull request to experiment, rather than experimenting on this branch? thx. then once we've figured stuff out we can update this branch once and for all.

@ctb
Copy link
Member

ctb commented May 19, 2018

The fix in #1865 works on my Mac laptop; let's see if it works via CI and on Linux!

@ctb
Copy link
Member

ctb commented May 20, 2018

Update from #1865 - still haven't solved the problem(s), but I am now guessing that the problem is that the MQF code is being compiled as a '.c' file despite the fact that it is using C++ commands inside. This may be interfering with linking the C++ code into as a CPython extension.

@ctb
Copy link
Member

ctb commented May 21, 2018

I spent a lot of time poking around on this yesterday. Digging around and comparing MQF with CQF, I have a few observations that may be causing the problem:

  • the only difference between CQF and MQF code is that MQF code uses c++ code internally (see std:: stuff)
  • the compile time approaches and options between the two do not really vary at all
  • so I thought the problem might lie in there somewhere. Perhaps the best plan would be to either remove all the C++ code from MQF or fully upgrade it to C++ rather than C.

I've been able to reproduce the problem on my laptop now, and I also note an interesting feature: I can run make clean all cpp-demosand generate the error; but then cd third-party/mqf && make clean all; cd ../../; make cpp-demosworks!? There seems to be no variation in compile time arguments or anything for either mqf or cpp-demos between the first and second run, so it seems like an ordering problem of some sort.

Also, we don't use the shared library for linking the cpp-demos currently, so it's not clear to me why cpp-demos depends on sharedobj.

Anyway, that's it for now!

@shokrof
Copy link
Collaborator Author

shokrof commented May 22, 2018

@ctb
I removed the linking command responsible for creating the shared library liboxli.dylib(log). Travis created the static library and used it to compile the cpp examples. However, It fails to run the examples because it can't find the stdlibc++.

I searched how to include the stdlib in the static library but no luck.
I found an interesting link maybe it is the reason. Apple discourages the use of statically linked binaries on Macosx.. It also may explain the weird behavior that you mentioned.

Should we start focus on creating the shared library and use it with the cpp demos?
It is also good to mention that MQF is successfully compiled and linked in khmer binaries. QFCounttable uses mqf instead of cqf, and tests passed on Travis.

Have a nice day.

@drtamermansour drtamermansour mentioned this pull request Jun 19, 2018
5 tasks
@shokrof
Copy link
Collaborator Author

shokrof commented Jun 19, 2018

Bug is fixed in this pull request.
#1873

@shokrof shokrof closed this Jun 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants