-
Notifications
You must be signed in to change notification settings - Fork 82
Core Meeting notes
- Will Sharg be included in SeqAn3 (https://github.com/orgs/seqan/projects/13/views/1)? Conclusion: We chose 3. Sharg will not be part of seqan3, but SeqAn3 will reference it in the documentation.
- We will disable
alphabetical sorting
for the brief description of members in doxygen. (https://github.com/seqan/sharg-parser/issues/166) - The now unusable member functions
seqan3::sam_record::alignment
andseqan3::sam_record::offset
will get a static_assert instead of the deprecation because they are technically not deprecatable because they don't work anymore.
- Discussed https://github.com/seqan/seqan3/pull/3102. Enrico will do more digging regarding what the Standard actually wants.
- Enrico will create a PR to enable advanced doxygen search on vercel. Will be ported to docs.seqan.de once IT changes settings.
- Discussed https://github.com/orgs/seqan/projects/13/views/1 Will sharg be included in seqan3?
- Discussed changes to PRs
- accepted the Announcement text suggested below.
To the SeqAn Team,
in our last core meeting we talked about the review process that tends to take a lot of time (as we have all noticed on several occasions) and right now is again stalling us a little because our team is small. We have an idea how we could improve that we'd like to discuss with all of you next Monday.
Namely, we might want to drop the first review and instead do a single review as pair-reviewing with a Core member (on a voluntary basis). We will present some pros and cons and can add entirely different ideas as they pop up. We'd like to ask you to think about what works and doesn't work for you in the current process, and how your ideal approach would look like.
- Idea: We should reduce the number of reviewers to 1 single reviewer
- knowledge transfer is not as large as we wished
- alternatively, we should engage in more in pair programming and pair reviewing.
- find better tasks to engage more people
- We discussed sharg-parser issues (--version, synopsis, filetype output)
- @SGSSGene tests single header include (can we do a single header?)
- discussed IO API stdin cashing with version checking
- API sam_file, will keep ctor as is, since other functionality than the alignment requires this
- We discussed open seqan3 and sharg PRs, we closed the ones that are stale, these can be reopened if required. We also closed some that are superseded by new PRs.
- we want that the sharg parser can handle input_validator
-
and/dev/stdin
- we want that the sharg parser can handle output_validator
-
and/dev/stdout
- we need to make the seqan3 IO files compatible with the above
- we will drop the restriction that input_validator must be a
is_regular_file
to also allow FIFO, sockets etc. - the expectations and validator will be grouped in the documentation to avoid many classes on top level
Topic: I/O Let's fix the API. We have decided that:
- we want a record class per file that has members (no tuples anymore, no selected fields)
- we want that only specific types are modifyable, namely
- sequence_files: seq alphabet, qual alphabet
- sam/bam: seq alphabet, qual alphabet, sam_tag dictionary, cigar
- we want everything that is configurable to be configured via designated initialisers of the file class itself
- we will have a shallow record and deep record for each file
- the shallow record will be light weight will views where possible
- the deep record will have containers and be constructable form the shallow record
Topic: I/O
- first, use complete seqan2 for io, 2. make everything work, 3. reduce seqan2
- copy seqan2/include to contrib folder, putting seqan2 into new namespace.
- Taking a look at io-playground/api/alignment.md
- tunnel custom formats through seqan3 reader/writer (not important)
- talking about implementation details
- refining the alignment.md file, see io-playground
- How to integrate BCF? Not supported by SeqAn2, implemented in b.i.o.
- Use SeqAn2 (into contrib)
- Get API right
- Port to SeqAn3 afterwards
- Make a list of SeqAn3/SeqAn2/b.i.o. API (what, not how) for alignment, sequence, and variant files
- Decide what we want and check if feasible
- @eseiler check includes
- @smehringer alignment file API
- @SGSSGene variant file API
- whoever: sequence file API
- Looking at the io-playground stuff
- making a few optimizations for writing records
1. Topic: I/O - Using HTS-Lib
Contra:
- Profiling wird schwieriger
- 1 Monat einarbeitung
- Hannes wirkt dem abgelehnt gegenΓΌber
- selber bauen - linken
- HTS schwerer zu erweitern
Pro:
- Wir mΓΌssen nicht "selber" schreiben
- Performance vielleicht besser als selbst geschrieben
Resolution: we decided not to use HTS-Lib as a base for IO (single formats (e.g. CRAM) might be worth to use HTS-Lib, may as an extra Repo?)
2. Topic: File-Formats
- Fasta: highest importance
- Fastq: highest importance
- EMBL: neutral
- GenBank: neutral
- Indexed Fasta: a little important
- Sam/Bam: highest importance
- Cram: (please, no!!!)
- Indexed SAM: a litlte important
- Vcf/Bcf: high importance
- BED: neutral
- GFF/GTF: neutral
3. Topic: Hannes I/O design
- We need more intel, creating benchmarks
- checking against seqan2
- checking against hannes
- will take a look closly at records (why is seqan3 so much slower than seqan2? maybe chunks?)
- Timeplan: until the end of this month
- Write a baseline benchmark, how fast could we load fasta/fastq (only id/seq) if we use no seqan3 and skip all conversion steps.
4. Topic: sharg-parser in seqan3
- What to do with sharg-parser and namespace?
- Having sharg wrapped in seqan3 leads to a lot of maintanance and confusing app code and documentation
- For app developers has everything when cloning seqan3
Resolution: Keep sharg in its own namespace, will include as submodule in seqan3
5. Topic: CI adjustments
- enrico is giving some hints on improving the script (cmakeversion, threads, compiler, --use-bottle)
- Change clang format currently depends on PRs being on the current master to get the correct
changed files
names (to only run on changed files). This will be changed by rebasing a separate branch on current master automatically and take thechanged files
diff from there before running clang format. Dependency resolved. - sharg parser: To avoid 3 config classes we will only have one config class that needs to explicitly stated in the
parser.add_[...]
call. E.g.add_option(v, sharg::config{.short_id = 'a'})
. See considerations here: https://github.com/seqan/sharg-parser/issues/76#issuecomment-1140978457
- PR on new seqan3::ranges::to fixes some other issues, which will have to be touched again in the future.
- clang-format is assume. We decided to automatic generate for every PR an additional commit which formats the PR. Pro: no dev has to learn/use clang-format themself, always newest clang-format. Con: might give conflicts while pushing. We will see how this works and change our method if it doesn't work.
- Discussing some problems about the CI with documentation deployment. (enrico suggest workaround)
- CTD format support (via TDL): For now, we will ignore the help_page specific functions like
add_list_item
,... . Subcommands will create individual export-help nodes (e.g. yara indexer --export-help CTD). The user has to call --export-help for each submodule. -
views::join_with
andviews::zip
go into the [utility] folder because of their dependency on seqan3 detail functionality. Documentation will be kept to a minimum and the standard is being referenced.
- Since we already have GCC-12 support but need the upstream bugfix in range-v3 we will concentrate on becoming independent. See https://github.com/seqan/product_backlog/issues/124
- We will improve the dna4 complement functionality and provide a piece of documentation for dna4_simdifyable (https://github.com/seqan/seqan3/issues/1970#issuecomment-1098788971)
- Although
seqan3::floating_point
is slightly different thanstd::floating_point
>= GCC10, it wasn't API stable so we will deprecate the seqan3 concept - We will keep
seqan3::arithmetic
,seqan3::implicitly_convertible_to
,seqan3::explicitly_convertible_to
,seqan3::builtin_character
,seqan3::trivially_destructible
,seqan3::trivially_copyable
,seqan3::trivial
,seqan3::standard_layout
,seqan3::weakly_assignable_from
as their counter part does not exist in the stl (their usage should be checked) - All of the above will live in
/utility/concept.hpp
no mention of exposition only. -
weakly_equality_comparable_with
,weakly_ordered_with
and their traits will be probably moved directly to the alphabet module were they are needed. If not we will discuss the new location.
- We would like the cmake stuff that is currently maintained in seqan3, the app template and external modules like the sharg parser to be maintained ONLY in the app template. Afterwards all apps and libraries can pull the changes. @eseiler will test this with the app-template and the sharg-parser and discuss the reasoning and track progress here: https://github.com/seqan/product_backlog/issues/395
- After deciding on
seqan3::concatenated_sequences
last week, we did not consider this so we will not change the template parameter ofseqan3::concatenated_sequences
until other use-cases arise. - We will keep the
seqan3::concatenated_sequences::value_type
member because we need it for the random access iterator. # - Discussed the output_file_validator constructor, see https://github.com/seqan/sharg-parser/issues/63
- Decisions on API for
seqan3::concatenated_sequences
. See https://github.com/seqan/seqan3/pull/2947
- We will not provide a
seqan3/all.hpp
header (reasoning here: https://github.com/seqan/seqan3/issues/2936) - We will instead provide a script for header replacements (see issue https://github.com/seqan/product_backlog/issues/414)
- we want to support gcc-12 and will then drop gcc-9
- We will drop c++17 https://github.com/seqan/seqan3/pull/2915
- The sharg parser will get its own std module
- The variable
bgzf_thread_count
will not be static anymore (see https://github.com/seqan/seqan3/pull/2752) - We will probably use the b.i.o. library from hannes for vcf and bcf and maybe also the other formats at some point. New contributions should go to the bio library.
- We will work on dropping gcc 7 and 8.
- We will rip out the argument parser to it's own repo but include it back to seqan3 as a submodule.
- discussed benchmark, noted only std::forward_list related are issues are slow, and will not be solved for 3.1.0
- discussed https://github.com/seqan/seqan3/pull/2868 - Resolution: Using 2. solution (as in the PR), but using
static_assert
instead of arequire
. and restricting only onchar
. - discussed https://github.com/seqan/seqan3/pull/2869 - And all related I/O will not be part of 3.1.0
- discussed https://github.com/seqan/seqan3/pull/2841 - Simon will check and report today
- discussing SeqAn 2.5 Release - We want to release. https://github.com/seqan/seqan/issues/2469
- we are sad that we have to talk about windows: https://github.com/seqan/seqan/issues/2468
- discussed how to integrate all seqan developers
- we discussed how to approach the next big goals
- disussing how to deal with modules that are libraries, that are not header only (Big discussion for sometime else)
- Marcel is not part of the core team anymore but writes his thesis. He is also taken out of 2nd review but can be selected for certain PRs that needs his expert level π§―.
- Simon checks the benchmark regressions
- Enrico will release the SDSL as-is
- discussed https://github.com/seqan/seqan3/pull/2770
- discussed https://github.com/seqan/seqan3/pull/2769
- discussed https://github.com/seqan/seqan3/pull/2752
- We talked about file positioning and seeking to records, see Resolution
- We talked about the problem with
seqan3::alphabet_size
and the restriction thatuint32_t/uint64_t
(on 32bit / 64bit) aren't alphabets, see Discussion
- We talked about interleaved_bloom_filter, technical_binning_directory, see Resolution
- Re-introduce seqan3::literals, Issue
- Issue is considered a bug.
- Documentation issues due to moving namespaces (e.g. concept is still in core, but should be in utility), see Resolution.
- We talked about the task description of
bi_fm_index_specialisation
and found an error; we updated the task, see Resolution.
- charconv header, see Resolution
- argument parser, Issue is considered a bug
- external project, see Resolution
- fm_index cursor implementation, Simon Gottlieb wants to reduce that, see Issue for progress.
- API Stability: Ranges, see Resolution
- seqan3::test::tmp_filename interface and behaviour; see Resolution
- API Stability: What is High Priority and what is Low Priority (Issue)
- API Stability: What should be stable and what not.
- Call it
seqan3::option_spec::standard
for the default case. PR - Make
pipable_config_element
detail API and rename it toconfiguration_element_base
. Issue - Checking the inter-dependencies of our modules (see issue) showed that we have a few unwanted dependency's but nothing too important. Since we have a lot of other stuff to do we will post-pone any further work on this. For unused includes we will wait until we have clang support which can check this automatically.
- We specified the tasks for CentOS 7 Resolution
- ConcurrentQueue: We talked about the proposed "Patch" and came to the conclusion that we have to look at the problem with more people.
- Views: we talked about issue and resolved some points
- phred alphabet (see issue)
- We will keep the current naming of the quality alphabet (phred42/63/94)
- We need to describe the implemented phred scores (Sanger/Solexa formats) more clearly in the documentation
- We want a phred94 alphabet that can represent the full range of Sanger encoded phred scores.
- The phred94 alphabet should be the default when reading quality scores
- We will remove detail/all.hpp headers and public all.hpp should not include detail headers. We remove them without any deprecation notice. (related issue)
- If an existing all header includes partially from deprecated headers we add a compiler warning of the seqan version increases (see related issue)
- We will keep the generic
seqan3::operator|
etc for enums but remove the public documentation for it. Instead we will try to create an interface documentation that enums can link to (related issue). - Headers should not be prefixed with their submodule if they are in the corresponding folder anyway (related PR comment).
- We will partially support older compiler versions, and provide patches were work is easy but give no guarantee (related issue).
- We talked about some
_specialisation
concept names that are NOT pureseqan3::detail::template_specialisation_of
instances and are odd-balls. Issue
- We want to change schedule:
- Team Meeting Monday at 10:00 / Core Meeting right after that;
- A second Core Meeting Wednesday at 10:00
- We discussed multiple ways to make
seqan3::get
a CPO, see resolution - Remove
is_in_alphabet
- Replace
<typename>_specialisation<type instance>
concept byseqan3::detail::template_specialisation_of<<type instance>, <typename> >
- We decided to split utility / core
- Worked a lot on IO design google-doc
- SeqAn 3.1 Roadmap planning
- we discussed the roadmap for SeqAn 3.1
- we generated a list of items which might have an influence on the public API and therefore need to be completed before the release.
- the biggest items were stable versions of external dependencies as well as a potential I/O refactorisation
- @rrahn will provide an alternative design of the I/O including discussion of some design patterns on Wednesday 2020-10-28.
- the focus of the work towards 3.1 will be on fixing the public interfaces and not necessarily the implementations behind it, which can be added later through new releases.
-
CDash 3.0.0
- We will upgrade the CDASH to te newest version
- Check with the IT how to proceed
-
Compiler Workarounds
- We will keep the compiler workarounds and do not limit them to a specific compiler version
- Before the release, the workarounds are checked again with the new compiler versions and if they are ok we can remove them for the newer compiler versions
- Reason: users can use SeqAn with newer compilers even if it does not fix the workaround.
-
C++-20 bits header
- The STL has added bit support for C++20
- We adapt our functions to the ones of the standard
- Some regressions need to be fixed upstream with gcc
- member types of our configs The ctors have strong types for expressiveness but the member variables of the config (if not a std::variant) can have the normal types since the naming of the member variable already avoid ambiguities.
-
output_validator
The output validator should remain default constructible. The default for the
seqan3::file_output_open_mode
therefore was set tocreate_new
(warn if file already exists). -
Node access of the fm_index cursor
The cursor will get a member function
suffix_array_interval()
which returns a structseqan3::suffix_array_interval
. The members are still a half open intervalbegin_position
andend_position
. @eseiler will tests if this is possible and will always lead to valid intervals.
-
Names for file open options
- We agreed on a naming. (See PR)
- Maybe we can remove the template from the validator -> options can be part of the validator.
-
Check if option was set
- Do it as an enum (is_set_by_user, is_set_by_cmd, default). Check if implicit conversion to from
int
tobool
works forenum class
. -
parser.option_is_set
or ->parser.is_option_set
<-?
- Do it as an enum (is_set_by_user, is_set_by_cmd, default). Check if implicit conversion to from
-
Testing deprecated code
- Unit test: pragma push, ignore, pop - both hpp and cpp.
- Header test should complain.
-
Hierarchical caching
- Yes, please.
-
Node access
- Either we return a node or the three things individually.
- We will do individual getters since the cursor is actually an abstraction of the node.
- Public getters. Maybe call
depth
pattern_length
.
-
Nightlies for workarounds
- Before release: Keep workarounds for last two minor versions.
- But only officially support last minor version.
- Rather test library for major distributions.
-
State of MSA
- Svenja presented the current state of the MSA implementation.
- Return type should be generic ("function returns auto and has begin/end") - return
alignment_result_type
. - We did not come to a conclusion on whether we should rewrite the functions in SeqAn3 with SeqAn3 or SeqAn2 data structures.
- E.g. neighbour joining: SeqAn3-ise the SeqAn2 code or already adapt interface, data structures, etc...
- Main discussion point: Do interface changes / data structures already in the beginning or in the end?
-
Waterman-Eggert
- Can the algorithm go into the kernel?
- But it might just be a special traceback, in this case it should be possible.
- We will ask @rrahn.
-
FM index cursor
- Add getter.
- Do not add ordering.
- We need a new version of an XML of cppreference.com for our Doxygen. (Card)
- We decided to use GithubCI and @eseiler does a PR against
release-3.0.2
branch. (Card) - Rename the
scoring_scheme
concept toscoring_scheme_for
. (Card) - We want to integrate the Waterman-Eggert algorithm, but we haven't settled on a design, yet. (Card)
-
static code analysis
- Try it out.
- Don't invest much time.
-
search outparam
- evaluate proposals
- after release
-
sequence generator
- @rrahn and @marehr will decide
-
release
- do a release branch
-
view persist
- Design issue: This is not a standard conform view any more since it does not guarantee constant time destruction.
- Put it first into detail and declare public API deprecated.
- Make a changelog entry that this is deprecated and will be removed.
- Make deep const-ness in begin and end.
- On the long-term remove it completely. No one should use it any more.
-
file validator compression extension
- Generate a list of file extensions from the file formats that include the compression formats.
- Add a utility function that combines the compression formats with the file format extensions.
- Pass in the list of extensions to the file validator.
- For bam format we can explicitly omit the combination with other compression formats.
- We will start the MSA Project by trying to implement the interface and having seqan2 in the back doing all the work.
- We will remove
method_global
fromseqan3::align_cfg::edit
in https://github.com/seqan/seqan3/pull/1938 and rename it toseqan3::align_cfg::edit_scheme
(Card). - For now we will not introduce shortcuts for the free end gaps configuration (PR, Card)
-
Card free end gaps:
- Naming works as intended, https://godbolt.org/z/kMn_z_
-
Card LLVM charconv:
- Check licence compatibility Apache 2 and BSD 3
-
Card dna4 conversion:
- Feedback Sara: dna4 is good, dna5 not sure, consistency is probably better
- We had an open discussion about
to_rev_cursor()
(Former Resolution) where we wanted to wait for (external) input. We decided to removeto_fwd_cursor
andto_rev_cursor
(Card, updated Resolution). -
seqan3::align_cfg::parallel
(and search too) should warn (= exception) a user if nothread_count
was set before alignment computation when the configuration entries are validated (See Resolution) - Name
interleaved_bloom_filter.agent
->membership_agent
Resolution. - Rename
seqan3::views::trim
to::trim_quality
. Card
-
Technical binning directory
- Design issues: uncompressed vs compressed construction, how to handle template parameters and type deduction
- Possible alternatives: creator pattern -> compressed is constructed via static method of uncompressed, primary classes -> two class instances for uncompressed and compressed that reuse the same base class (detailed resolution)
- Actual design decision postponed (waiting for concrete use cases), putting it first in detail
-
Disabling ADL
- Some additional proposal (see comment)
- calling cbegin/cend within const qualified begin/end (card)
- constexpr declaration of views and view iterator (card)
- C++20 iterator traits obsolete: should be considered as legacy (card)
- C++20 iterator only equal comparison operator: wrap c++17 compatible operators in macro (card)
- Do not implement cbegin/cend any more (card)
- Use non-type template parameter in iterator definition for iterator and const-iterator, e.g.
basic_iterator<bool>
(card) - Use
base
member function to get the underlying base view, respectively base iterator if applicable (card)
-
- static thread_local buffer inside of function has 5-10% performance regression
- static thread_local buffer has weird semantics if we use single thread and multiple IBF instances
- possible alternative: pass second buffer to the bulk_contains function. (difficult in the usage)
- possible alternative: require the bulk_contains feature from the IBF and get a lightweight proxy (a search agent), which offers an efficient implementation of bulk_contains.
-
- call everything qualified
- calling detail functions qualified or make them function objects or named lambdas. Exceptions would be cases where we need different overloads of the same function (align_pairwise, search, ...).
- make public APIs either CPOs or function objects.
- prepare snippets for demonstrating the problem and the various solutions for discussion with the team.
- how to add documentation.
- how would compiler errors look like (function object, named lambda, qualified free function call)
- resolution
- call everything qualified
- Considering issue (https://github.com/seqan/seqan3/pull/1892), we will ask some people, if they don't need the
my_bi_fm_index_cursor.to_rev_cursor()
or see an immediate use-case, otherwise throw it from the API. (we will ask Sven and Simon Gene) - We want a new data structure
counting_vector
that get efficiently add aseqan3::binning_bitvector
by increasing each bin counter depending on the mask inseqan3::binning_bitvector
(see https://github.com/seqan/product_backlog/issues/77). We also need to document that accessing theseqan3::binning_bitvector
is currently inefficient, and feature requests are welcome if there is an actual use case. - The documentation of the concurrent access on the function
seqan3::interleaved_bloom_filter::bulk_contains
must be fixed (https://github.com/seqan/product_backlog/issues/78)
Make it private in the mean-time and add a new issue.
-
Issue IUPAC to dna4/5 behaviour:
- dna4 behaviour is fine as is and we will offer a cookbook entry for using a custom alphabet that converts to
A
by default. - Ask people what they would expect when converting an IUPAC character to dna5 (same behaviour as dna4 or convert to
N
?).
- dna4 behaviour is fine as is and we will offer a cookbook entry for using a custom alphabet that converts to
-
Issue Hidden friends in configuration objects:
- Free functions as function object: https://godbolt.org/z/LyacmH
- Function objects in detail namespace,
inline constexpr
instantiation in public; either CPO or function object depending on use case (get() -> CPO, search() -> function object). - Operators have the same problem, but an occurrence is very unlikely, thus we won't change them.
- Function objects in detail namespace,
- Or call everything qualified everywhere.
- Ask Hannes for his opinion / excerpt from his thesis.
- Free functions as function object: https://godbolt.org/z/LyacmH
-
Card Hidden friends in configuration objects:
-
cfg.get_or(alternative)
will be a member function (hidden from user) that checks whether there is a config element incfg
that has the same type as an alternative. -
remove
,exists/has_element
will become hidden friends (hidden from user). - If
exists/has_element
will become public it should become a unary type trait or inline bool constant instead of a static member function.
-
-
Card Alignment code on pairs:
- We want this, the code should be valid.
- We will explore possible solutions on how to achieve this.
-
Issue
pipeable_config_element
without value:- Add constructors. Follow-Up Issue
-
Card Iterator and sentinel naming:
- We would like
iterator
andsentinel
, with some base classbasic_{iterator,sentinel}<const / non-const>
.
- We would like
-
Card Make
empty_type
public:- It will stay private.
- Change return type of functions such as
query_id
toauto
and remark that calling it without proper config is UB. - Change
throw
tostatic_assert
. - TODO Follow-Up Issue TODO
- The alignment score type bug report is rather a user fail. We will therefore restrict the score type to signed integrals to avoid this in the future. We will add a static assert.
- The dna4 conversion bug report is not a bug but a documentation fail. We do need to update the documentation (ticket)! And we need to think about our dna4 conversion-default which is currently different than the dna5 conversion considering IUPAC characters (@smehringer). Either way we should provide a cookbook example of how to provide a different conversion-table-default (the easier conversion should be part of the cookbook). Card
- on_hit executor is accepted as proposed in https://github.com/seqan/seqan3/issues/345 with a
bulk_execute
.
-
Card PPC Debug benchmark timeout:
- We'll look into it.
-
Card SIMD nightly tests:
- Currently not, depends on proper infrastructure for nightlies.
- May want to use the SeqAn2 infrastructure.
-
Card Build matrix for nightlies:
- Depends on proper infrastructure for nightlies.
- Build gcc and doxygen from source and run tests.
- Best would be to build everything ... (sanitizer, compiler, release/debug, archs (PPC, ARM, LE/BE, X86, ...))
-
Card Movable search_result_range:
- Remove the iterator and adjust the test to first iterate for a few elements and then move.
-
Card Technical binning directory:
- Discussed. See Issue and Google Doc
-
Issue:
- We agreed on
sequence
,aligned_sequence
,writable_aligned_sequence
(as public concepts) - We agreed to implement a private
detail::pairwise_alignment
(uses thepair_like
concept) concept.
- We agreed on
-
Issue
debug_stream
was green lit to move to the next stage: prototype a solution for the described issues in a team of @marehr, core-team and non-core-team member; -
Card Should every public type be
debug-streamable
? No. Not a blocker for PRs unless it is required. Can be added if needed or requested on demand
-
Card: technical binning directory
- @eseiler will evaluate next week how the interface should look like: What is the concept behind that? What is the use case behind that? How is the big picture behind that.
- Then we can plan the work for the fifth iteration.
-
Card: sdsl index construction
- design macro-benchmark infrastructure for SeqAn3 - generate a ticket for this.
- make makro-benchmark for alignment and search.
- evaluate rank- and select-support structure for SeqAn and SeqAn3/SDSL.
- Can super-blocks solve the performance problem?
- EPR in SeqAn3/SDSL zu SeqAn (2-3x langsamer als SeqAn)
- Support Sven in evaluating the seqan2 index structures
- We only need new index construction mechanism if we want to support constructions of collections so we put it back until we have more results about the search performance.
-
Card:
make check
for the application template- Not all tests should be build but only a small subset to check general conformance and functionality.
- We don't need it right now -> put it back into backlog
- The application template is quite ready. @marehr has one more issue he want's to tackle and makes a ticket for this
- The app template can then be released in the first version and put into the SeqAn GitHub organisation.
-
Config discussion (see https://github.com/seqan/seqan3/issues/1580):
- The members of the selectors will be called:
selection
but it won't be publicly exposed but you can assign to the selector directly. - The name of class types for tags will be suffixed with
_tag
(e.g.seqan3::align_cfg::vectorise
->class seqan3::align_cfg::vectorise_tag{};
)name required/optional values? Default values of configs? - some public member names, see issue.
- selectors: can be default initialized (variant is valueless) and can be set later; The algorithm handles the valueless state by either throwing an exception (required configs) or by setting a meaningful default (optional configs).
- The members of the selectors will be called:
-
std::
features that already exists, but have a different meaning in c++20 will get a cpp20 namespace, e.g.std::cpp20::back_inserter
(Card). -
Rename
seqan3::dimension_v
will be renamed toseqan3::range_dimension_v
(Card). -
We will remove the
seqan3::detail::random_access_iterator
because of the name clash with a cpp20 concept. Data structures that use it should use theseqan3::detail::random_access_iterator_base
directly (Card).
- Decision on Wednesday. Think about possible naming and other open points. (Card)
- Work on coverage during next iteration, prototype/SPIKE. (Card)
- Track issues for supporting c++20 (gcc 10), make them available to be worked on.
- Add as good first issue and/or needs refinement in backlog. (Card)
- Put
seqan3/std
headers with the std headers in include list. Sort after suffix (without prefix seqan3/std). (Card) - We want to implement the behaviour. Next iteration. (Card)
- We want to implement the behaviour. Sometime later. @smehringer will present possible solutions. (Card)
- Since the reminders only work in slack we will not do it on an organisational scope. We can do it individually if we want or wait until gitter support is there. (Card)
- Where to place test binaries in the app-template? See resolution https://github.com/seqan/product_backlog/issues/44#issuecomment-613338666 (Card)
-
Card: views associated types
- Remove assoiated types from views but not from iterators or containers.
-
Card: algorithm result factory search aligment
- vote: We use a policy based algorithm implementation and have a policy that creates the alignment result and is friend of it
- vote: Use explicit members for the construction of the result object.
- vote: needs to have one template type for every stored argument of the result class.
- pass in callback to the alignment algorithm.
-
Card constexpr
- Use it whenever possible
- Add tests for constexpr expressions; If only available in future C++ standards document and test accordingly
-
Card
- Pass in a flag to indicate how to open a file for write: Overwrite, append, new
- Show the requirements for open a file in the help page message.
-
Card: safe range to borrowed range
- This is a good first issue
- Depends on some PR of the ranges library
-
Card C++-20 standard conformance
- Support for C++-20 for clang and gcc-10. Possibly in Q3. Evaluate after planning the current release.
-
Card: deprecate ranges and iterator shortcuts value, reference etc.
- move original shortcuts to detail and make facades for the detail stuff which then will be depreacted.
-
Card: span view
- use cv-unqualified version of
enable_view
for classes that are views. - span with static
size > 0
is not default constructible since it must ensure a size of 1 but can't own the memory for it. Onlyspan<t, 0>
or span<t, dynamic_extend>
are views, since semi-regularity is required for views which requires default constructibility.
- use cv-unqualified version of
-
Second round on
noexcept
. See issue https://github.com/seqan/product_backlog/issues/45 for a resolution -
If we want fancy testing behaviour, we will introduce our own macros. As a start
EXPECT_RANGE_EQ
will be very convenient. -
seqan3::search_result
Card:-
Names ( for the rest we plan this if we have an actual use case)
query_id()
,reference_id()
,reference_begin_position()
,index_cursor()
. -
Configuration
- Configuration with_text_positions:
query_id()
,reference_id()
,reference_begin_id()
- Configuration with_cursor:
query_id()
,index_cursor()
- Configuration with_text_positions:
- The
seqan3::search_result
will be built with the Builder/Factory Pattern and theseqan3::alignment_result
will be changed accordingly
-
Names ( for the rest we plan this if we have an actual use case)
-
noexcept
(provisionally):- Use
noexcept
whereever you can guarantee that the function never throws - Use
noexcept
if you cannot accept that the function throws (e.g. internal member function where you have a clear expectation). - If you cannot make assumptions because of a generic member (e.g. views) do not make function
noexcept
. - Try real hard to make the move constructor
noexcept
- Use
- We will do another macro benchmark and then unify the interface to only work on collections. Card, Issue
- We annotate serialisable types with
seqan3::cerealisable
. Card - We talked about some alignment pitfalls and internal cleanup:
viewable_range
's should be propagated fromalign_pairwise
down, don't re-instantiate align_result_selector in the algorithm (pass that type down) - phred42 should not have the same valid range as phred63 Card
- We decided to add CIGAR to default alignment record fields, so that reading and immediately writing out the same file works. Card
- ALIGNMENT will not be filled if the reference sequences are not set at compile time, but default constructed
- CIGAR will not be filled for BLAST, but default constructed
- We introduce a new macro
SEQAN3_STD_RANGES_NAMESPACE
as shown in the PR to reduce code. Card - Move c++20 things into the
std
namespace (e.g. moveseqan3::remove_cvref
tostd::remove_cvref
) Card - The independence of the argument parser is now blocked by 3 major things,
seqan3::debug_stream
(e.g.seqan3::detail::to_string
),File::file_extensions
(e.g.seqan3::detail::set_format
) and version checker. Card
-
Issue
The
/seqan3/core/debug_stream.hpp
shall include everything that is needed, s.t. every seqan3 type is printable. This means that thealigned_sequence_concept
must be included.
-
Card:
- make a
core/math.hpp
header and putseqan3::pow
there and document and test the functionality - make it type agnostic and use integer optimized power.
- make a
-
Card:
- Make cigar field default for the alignment file in/out
- Make alignment field optional
- Remove the dummy_sequence and dummy sequence alignment from alignment files
- Note: conversion of cigar to alignment should only be offered as interface of the alignment file
-
Card:
- always use
module/submodule/all.hpp
in tutorials - always use the specific header of some functionality within API snippets (don't use all.hpp here)
- Write tutorial page about why using all.hpp in tutorials (see card link above)
- Write tutorial about how library structure looks like (for user and contributor)
- always use
- IBF naming solved by qualified call. Card
- We double checked the automatic assignment. Card
- We decided to make
add_enum_bitwise_operators
's documentation private (\\!cond DEV
) and document related enums (\copydoc
?,\relates
?, TBD). Card Issue - Check performance (maybe an
if (text.size==1)
is better?). Make aware of rev/fwd index. Fix #1542.to_rev_iterator
(bi_fm) (indices of sequences in reverse order). Card - k-mer index, get first working version. In the end it should be templatised over the hash function object (how? Should probably type erase, because the rng_t is not interesting (only need semialphabet reference type)), and k's should be run time.
-
IBF: We will wrap the return value of
bulk_contains
and expose+document the functionality we need! -
We try to not expose API of dependencies, e.g. return an SDSL type.
-
We cleaned the webhooks. Card
-
SDSL
- Collections without overhead. Card
- Implicit sentinels (see branch)
- External SACAs? (SDSL does not handle collections)
- Conversion from dna_vector to sdsl::int_vector<8>
- int_vector
- Clean up int_vector test. Card
- Benchmark int_vector
- Do "non-overlapping" int_vector -> get EPR running
- BiFM Index
- What about the sampling of the 2nd index? Card
- Fix docs, e.g. Card
- Renaming
-
begin()
->cursor()
Card
-
- Optimize
path_label
for collections. Card - Check if
backward_search
really needs to be templatized. Card - Local buffers Card will be partially implemented by [#1445] (https://github.com/seqan/seqan3/pull/1445); we still need to add a buffer to the search.
- Rework configuration documentation. Card
- Make sure there are snippets. Card
- Move
search()
. Card - Remove concepts. Card
- ALways handle texts as collections. Card
- Collections without overhead. Card
- Guthub permissions:
- Make a Team group that contains every seqan member that does first reviews with write access (is needed to be able to write to the wiki pages).
- Enable automatic review assignment via Github for all members of the seqan team (without the second reviewers) if
seqan/team
is chosen for a review assignment. Assignment is done via the round-robin-principle. - Enable automatic review assignment via Github for
seqan/core
which assigns either rene and smheringer in case someone assigns core. - PRs now require 2 approvals from people with write access (
seqan/team
) before they can be safely merged.
- Design decisions for the seqan3::interleaved_bloom_filter were done by making it similar to the
std::unordered_multimap
.- The IBF is constructed with
seqan3::interleaved_bloom_filter{seqan3::bin_count, seqan3::bin_size, seqan3::hash_number_count}
. We decided forcount
(instead of number or size) for the amount of bins because the std::unordered_multiset calls itbucket_count
. We still call itbin
instead ofbucket
because we want to call the abstract solutionbinning_directory
and notbucketing_directory
. - A value (e.g. kmer or minimizer) will be inserted into the IBF with
ibf.emplace(value, seqan3::bin_index{0})
, analogously to the std::unordered_multiset.insert
will not be provided since we don't need it. - To retrieve the bitvector which is set to
1
if a bin contains the (kmer-)value or0
if not, we will usesdsl::bit_vector result = ibf.bulk_contains(value)
. This was chosen because the std::unordered_multiset usescontains
to indicate whether a key is contained in any of the buckets(/bins) but we directly output a bulk result with a boolean (bit) for every bin. - we will provide the getter:
ibf.bin_count
which returns the number of bins andibf.size()
which returns thebin_size
.
- The IBF is constructed with
-
GitHub permissions (Card)
- Make a group contributors and give it triage permissions
- Make a core group and give it maintain permissions
- @rrahn and @smehringer have additional admin permissions
-
Web page name for seqan docu (Card)
- Use docs.seqan.de/seqan as base but with redirect to index.html of top level
- Rename master and develop to 2-master and 2-develop
-
Leave Changelog structure empty (Card)
-
We will keep top-level CMakeLists.txt (Card)
-
Concept shims (Card)
- We don't want shims for instantiating concepts
- If we write a concept we need to test the concept with a separate tested interface/shim class.
- A concept should always get the template parameters it depends on from outside (e.g. output_range) and not define fixed types within the concept to test that it works for specific types.
- Avoid usage of
char * const
for string literals, if we want to support them usestring_view
(Card)- Use
string_view
for single query - If possible use
string_view
for multiple queries, if not throw away
- Use
-
fin{name, fields<field::ID>{}};
use new namefin{name, select_fields<field::ID>};
(Card) - We continued to discussed to reduce the hierarchy of certain configurations by making some of the arguments configurations. (Card)
- Deprecated uppercase seqan3::field names (IO) should be aliased to their lower case buddies. (Card)
- Tables in the documentation may span longer than 180 characters
- Memory mapping? Nice to have, plan for sprint.
- We discussed to reduce the hierarchy of certain configurations by making some of the arguments configurations. (Card)
- name it extract_half instead of extract_halve.
- provide cpack files: split documentation package from library package.
- release planning. create checklist
-
Concepts should be based on use case and not type requirements:
-
size()
is a feature ofrandom_access_container
. -
capacity()
is a feature of thecontiguous_container
. -
std::array
andstd::forward_list
are not modelling any container concept because they do not offerpush_back()
andinsert()
(both must be available). - We decided to only have a generic
back_insertable_with<C, T>
concept that checks ifc.push_back(t)
with aC c
and aT t
is valid. - We introduce concepts that model the std containers -
std_vector_like
,std_list_like
,std_deque_like
- to check our own types. They model the interfaces of the standard library. - Types can offer more interfaces in case it makes sense.
-
-
Rename
views::all
toviews::type_reduce
. -
We split the core module into core module and utility module: (Card)
- core is everything strongly coupled with our library, required by > 1 module, and is not stand-alone.
- utility contains features not related to biological use cases and could potentially live as stand-alone libraries outside of SeqAn.
- Here is a gist with an overview of the current state of core and the possible split.
- Naming argument_parser: SEQAN_INCLUDE_DIR -> SEQAN3_TEST_LICENSE_DIR Card
- What to do with get_display_name and typeid? Card
- get_display_name is (nearly) only used at runtime. We decided to favour the runtime version that uses typeid.
- is_in_alphabet_type uses it at compile time. char_predicate_base should make message generation at runtime.
- Open design decisions: Card
- arg_parser separate or not: yes we want that
- core VS utility: we create a list and decide the renaming next week.
- anything else?: We make a sprint for the search interface and alignment interface changes.
- Due to shortage of staff we canceled the meeting.
- We postponed the alignment interface discussion until @marehr is available. @rrahn briefly elaborated on the problem. Still gather use cases. Card
- We discussed Card (PR guidelines) during the retrospective.
- We want to change the error message in the alphabet after the include graph PR. Use
typeid
? Card. - Fix
char_type
tochar
, i.e. removechar_type
fromstream_type
. Streams for types besideschar
are not supported anyway. Card - We will make files default constructible in order to model a view. Card
- We will discuss the file column interface next time, @smehringer will have a look at the required changes.
- We further discussed Card.
- Enums shall be lower case. Card
- Alphabets can have another
char_type
thanchar
. In I/O, we don't enforce that thechar_type
of the alphabet is the same/convertible tochar
. We will statically assert for implicit convertibility and same size ofchar_type
of the alphabets of the ranges andchar
. Card - Instead of
async_output_buffer
we can usestd::basic_osync_stream
.
- We postponed the alignment interface discussion until @rrahn is available. Gather use cases to discuss. Card
- We will get rid of
using namespace
in tests. Card - New PRs are not allowed
using namespace
anymore, for old PRs follow the style of the files. Old files will be refactored consecutively. - We postpone Card until the committee decided about concepts of incomplete types and we thought about better names.
No meeting
- include what you use! Even if headers are included recursively, include what you use explicitly. In the tutorials we will enforce always includes headers for expressiveness, in the library we only enforce that the header tests pass.
- test accessor will be befriended below the member of interest. (see pr #1304 for an example)
- current: one pair for the end position of both sequences (inclusive) and one pair for front (inclusive). We want: one slice struct with four members (
aligned_region.first_begin_position
, ...) - std::string -> std::vector via
views::to_char<cigar>
,views::char_to<cigar>
(4 vs 1 votes) (Card) - seqan3::dynamic_bitset operator==: Use public interface (size() and raw_data()) and befriend all bitsets of different size
We discussed the todos before the 3.1 release:
- clean all interfaces and modules, including all.hpp and folder structures
- check all high-level API functions? Which can be removed.
- folder structure is fixed for 3.1 and should not be changed afterwards.
In general when a new module functionality is added and it is not sure if it is stable yet mark it experimental. Within the next minor release it should be made stable. On https://github.com/seqan/seqan3/projects/23#column-6871080 we gathered all the stories/tasks that need to be done before 3.1 release. Ad fixed date is not known, but we should strive for a point in spring 2020 so that we have 3.1 for the paper ready.
-
[[nodiscard]]
: We add it to all functions that are guaranteed not to have side effects and return non-void:- member functions that are const-qualified and have no parameters or only const-parameters
- free functions that only have const-qualified parameters
- functions with forwarding references are explicitly not part of this rule.
- rule for formatting:
- if everything fits in one line, put it in one line
- if not, put all special keywords (
static
,constexpr
,friend
,inline
and all attributes) in a single extra line -
[[nodiscard]]
implies we need to use trailingrequires
syntax. That's ok.
template <typename t>
[[nodiscard]] friend constexpr
bool operator+(my_long_type_my_long_type_my_long_type_my_long_type_my_long_type_my_long_type_ const a,
my_long_type_my_long_type_my_long_type_my_long_type_my_long_type_my_long_type_ const b)
-
temp filename path voodoo: Enrico and Marcel prepare examples of the different designs and post them to gitter
-
Either there is some consensus on the way forward this week, or this will be reevaluated when we do more refactoring at a later sprint.
-
Plan for next monday: Svenja and Rene figure out the plan and present it next Monday at 10am.
-
Final decision: next week All functions that take only const parameters / const qualified (do not change any state) and return something shall be declared
nodiscard
. All functions that may leak memory if the return value is ignored shall be declarednodiscard
. Card - We will introduce
views::move
. Note:std::ranges::move
algorithm also exists. Card - Argument parser CPO will be called
named_enumeration
. Card - enums shall be lower case. Card
- We decided that module specific views / ranges / decorators should be placed in their module. E.g.
alphabet/views/
contains all alphabet related views,alphabet/decorator
,alphabet/container
. Card
- We import ranges::v3::common_pair and implement our own version later Card
- We out-of-class define nested classes if they take more than one screen. (Most nested
some_range::iterator
must be defined out-of-class with this rule) Card - We will not introduce a new
SEQAN3_CONSTEXPR_ASSERT
macro to assert in constexpr context and do not introduce an assert macro that uses exceptions internally to provide stack traces. We can reevaluate a pretty assert if someone has time to do this. Card - Some decision on argument parser. Card, Card
-
Re-discussion about begin_ra of the gap-decorator. We will add random-access-functionality to the normal begin()/end() iterators in order to provide the most efficient interface to the user. The iterator tag will still be bidirectional. Begin_ra and end_ra are goind to be made private (private section in doxygen).
-
Do we want implicit conversion for alphabet_tuples? No but we do want extra member functions.
-
If we HAVE use a private member variable in a test, what do we do? We will make an accessor struct that will be befriended by the respective class
-
Module Responsibility:
- Alphabet: smehringer, eseiler
- Alignment: marehr, rrahn
- Argument Parser: smehringer, rrahn
- I/O: smehringer, rrahn
- Range: rrahn, marehr
- Search: eseiler, rrahn
- Core/std: all
- Infrastructure: marehr
-
Update library to new standards (before updating to the new range-v3 release):
- renaming of concepts
- move all of ranges c++20 namespace into std namespace with using inside an anonymous subnamespace
- change our subnamespace also to
views
and our directory too. The directory should be the same as the namespace.
-
The design fo the Binning directory is kept as is: The binning directory will not hash anything (such that the risk of inconsistent hash functions from filling and querying the binning directory is given to the developer. The BD will only perform setting and loop up of hashed values). The BD will not be able to be used with the search interface. But the dream index will wrap binning directories and then can be used with the search interface.
- We evaluated the current team structure:
core | application |
---|---|
@h-2 (2019) | @jwinkler (2022) |
@rrahn (2023) | @svnbgnk (2022) |
@marehr (2021) | @MitraDarja (2023) |
@smehringer (2023) | |
@eseiler (2022) |
-
We discussed the responsibilities for a membership in the core-team, the following list summarises the responsibilities
-
Responsibilities of core-team members:
- Must have a general overview over the SeqAn library
- Per module we have at least 2 core members
- Each core member has knowledge about at least 2 modules
- Must have knowledge about the library design
- Has a general responsibility/awareness for the team's procedures and communication, e.g. announces absences early
- Must be acquainted with the release procedure
- Strategy meeting
- Reviews discussion items before the meeting
- Properly described cards for things to discuss, e.g. links to background information, PRs, issues etc.
- Items for the next meeting should be added 2 workdays before the meeting the latest
- For discussion more than 50% of the core-team member must be present
- By marking a card with their name (@user) a member requests their presence for the discussion
- Votes for decisions are documented on the card
- Discussion can be postponed if
- the card is marked with a name and the respective member cannot take part in the discussion
- the vote has not reached a clear consensus
- it was interrupted since no agreement could be found in time (any core member can interrupt the discussion)
-
CPOs
- Short overview about customisation point objects and how they are implemented
- No consensus found on how to deal with overloads for third libraries using
seqan3::custom
infrastructure - The open question is whether the custom objects should be in the most granular form, i.e. for every customisation point or in a higher level grouping where the granularity is not defined, e.g. custom_alphabet for alphabet custom overloads?
- next week we will have larger meta-discussion about future roles and responsibilities in the core team
- Changelog is ordered by release (#) β Feature vs API vs bug (##) β module (####) β bullet-point list
- Guidelines for testing:
- enforce style guide
- give more weight to "element-of-least-surprise" than "don't-repeat-yourself"
- criterium should be understandability of the test
- care less about π then in library code
- refactorisation/clean-up/unification of tests can happen incrementally (not part of API)
- rename
is_collection
tolayout
in interface offm_index
- add alphabet type back to the fm_index template parameters
- factor out interface of containers into a CRTP mixin so we can save writing 10 inserts and member swap...
- our containers don't get range constructors
- release plan 3.0.1: feature-based release based on vectorisation and large scale interface fixes
- Questions from Chris: add alphabet back to fm_index template parameters? @eseiler will discuss details with Chris. We might change it back.
- Questions from Chris: EPR currently only supports alphabet sizes that fit perfectly into 64. This is a problem. We need a different bitvector.
- Hannes presents design for
view::async_buffer
, use concurrent_queue
- We decided that we do not need a bulk
insert_gap
for thegap_decorator
, but instead we make thegap_decorator
use a hint when inserting (resulting in constant time inserts). Card - We will try to fix the code coverage builds. Card, Issue
- We will keep the name
binning_directory
. Card - We postponed a decision about front and end coordinates in the alignment until we assessed whether this has any influence on performance. Card
- We decided that we want to exclude code lines from code coverage that are unhittable. Card
- google tests and other google libraries live the ABSEIL philosophy that means
master
is always stable and should be used. We try to regularly update after releases the google version. The next version update is planned to after 3.0.1 release. Card, Issue - We decided that we allow detail test, but that we exclude those from CODE_COVERAGE to see that we cover only public-API. Card, Issue
- We don't enforce to align
defaulted/deleted
in the comments on constructors. Card, Card, Card - We put a magic header on c++ files that don't have a c++ extension, s.t. they get automated highlighted in the editor. Card, Issue
- Let's wait for c++20 until we decide what needs to be done exactly for the 3.1.0 release
- Const iterability of our views will still be supported even if the concept is broken itself.
- Executors will be modelled as config elements that you can pass than alignment/search config element.
- Alignment result and Search result should always return a tuple of which sequence was aligned against which other / which sequences was found in which database sequence.
- input:
collection
-> n x n - input:
collection, collection
-> n x m - input:
collection<pair>
-> 1x1
- input:
- No tests for
::detail
anymore:- All detail code is there for some non-detail code
- Full code coverage should be reached with public interfaces (This will also detect dead code paths)
- Complex public interfaces need to be tested by small unit tests
- Exceptions:
- Detail functions that will most likely become public
- Stand alone-components that are used in multiple contexts (likely stuff from core)
- CMake version:
- minimum version for config is 3.4
- minimum version for all test is 3.7.
- we should add an extra test that only tests the config
- std::hash overloads for ranges should not be linear, see https://github.com/seqan/seqan3/issues/1122
- updated style-guide in regard to newlines in function bodies.