
Partition and ExtendableSquareMatrix data structures #399

Draft
Wants to merge 34 commits into base: devel

@kocotom (Collaborator) commented Mar 17, 2024

I have implemented two data structures:

  1. Partition, an efficient representation of a set partition with the ability to
    • match carrier-set elements with their corresponding blocks in constant time
    • split blocks
    • easily access former blocks that have been split and iterate through them efficiently
  2. ExtendableSquareMatrix, an abstract representation of a binary relation over a set, a matrix of counters, etc., with the ability to
    • assign a value to a matrix cell
    • read a value from a matrix cell
    • extend an n x n matrix to an (n+1) x (n+1) matrix (if n < capacity)
    • implement the inner representation of the matrix in various ways and choose the set/get/extend strategies depending on the context
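To illustrate the two core Partition operations listed above (constant-time lookup of an element's block, and splitting blocks), here is a minimal C++ sketch. The class SimplePartition and its methods are hypothetical, not the PR's API; this naive split rescans every block and does not remember former blocks, so it is much less efficient than the structure described here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a partition of the elements 0..n-1.
// block_of_[e] gives the block of element e in O(1);
// blocks_[b] lists the elements of block b.
class SimplePartition {
public:
    explicit SimplePartition(std::size_t n) : block_of_(n, 0), blocks_(1) {
        for (std::size_t e = 0; e < n; ++e) { blocks_[0].push_back(e); }
    }

    std::size_t block_of(std::size_t e) const { return block_of_[e]; }
    std::size_t num_blocks() const { return blocks_.size(); }
    const std::vector<std::size_t>& block(std::size_t b) const { return blocks_[b]; }

    // Splits every block that contains both marked and unmarked elements:
    // the unmarked part keeps the old block index, the marked part becomes a
    // new block. Blocks that are entirely marked or unmarked stay unchanged.
    void split(const std::vector<std::size_t>& marked) {
        std::vector<bool> is_marked(block_of_.size(), false);
        for (std::size_t e : marked) { is_marked[e] = true; }

        const std::size_t old_count = blocks_.size();  // ignore newly created blocks
        for (std::size_t b = 0; b < old_count; ++b) {
            std::vector<std::size_t> in, out;
            for (std::size_t e : blocks_[b]) { (is_marked[e] ? in : out).push_back(e); }
            if (in.empty() || out.empty()) { continue; }  // nothing to split

            blocks_[b] = out;
            for (std::size_t e : in) { block_of_[e] = blocks_.size(); }
            blocks_.push_back(in);
        }
    }

private:
    std::vector<std::size_t> block_of_;
    std::vector<std::vector<std::size_t>> blocks_;
};
```

For example, splitting the single block {0, 1, 2, 3, 4} by the marked set {1, 3} yields the blocks {0, 2, 4} and {1, 3}.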

I have also implemented three concrete representations of ExtendableSquareMatrix, which inherit from it and implement the virtual methods get, set and extend:

  1. CascadeSquareMatrix (I'm not sure whether the name is appropriate)
    • the matrix is represented as a single 1D vector whose layout follows a "cascade" traversal of the matrix, matching the way the matrix is iteratively extended
    • suitable for relations/matrices of counters over sets that are not huge
  2. DynamicSquareMatrix
    • the matrix is represented as a vector of vectors
  3. HashedSquareMatrix
    • the matrix is represented as an unordered hash map
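A "cascade" layout can be sketched as follows, assuming the cells are stored shell by shell: all cells with max(i, j) = m come right after the m x m prefix, first column m (rows 0..m-1), then row m (columns 0..m). This is one index formula with the described property; the PR's actual ordering may differ, and cascade_index and CascadeMatrixSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Position of cell (i, j) in the 1D vector under the assumed cascade order.
// Shell m (cells with max(i, j) == m) occupies indices m*m .. m*m + 2m.
inline std::size_t cascade_index(std::size_t i, std::size_t j) {
    const std::size_t m = (i > j) ? i : j;
    if (i < m) { return m * m + i; }  // column m, rows 0..m-1
    return m * m + m + j;             // row m, columns 0..m
}

template <typename T>
class CascadeMatrixSketch {
public:
    std::size_t size() const { return size_; }

    T get(std::size_t i, std::size_t j) const {
        assert(i < size_ && j < size_);
        return data_[cascade_index(i, j)];
    }

    void set(std::size_t i, std::size_t j, T value) {
        assert(i < size_ && j < size_);
        data_[cascade_index(i, j)] = value;
    }

    // n x n -> (n+1) x (n+1): existing cells keep their positions,
    // only the 2n+1 cells of the new shell are appended.
    void extend(T placeholder) {
        data_.resize(data_.size() + 2 * size_ + 1, placeholder);
        ++size_;
    }

private:
    std::size_t size_ = 0;
    std::vector<T> data_;
};
```

Because existing cells never move, extend never has to reshuffle data; it only appends the new shell (plus an occasional reallocation of the underlying vector).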

These data structures are implemented in the file partition-relation-pair.hh. Soon there will be a data structure PartitionRelationPair, which uses a combination of Partition and ExtendableSquareMatrix, but it is not finished yet, so I have not included it.

I have also added many tests for these data structures.

The code is extensively commented to ease understanding.

@kocotom kocotom requested a review from kilohsakul March 17, 2024 05:37

codecov bot commented Mar 17, 2024

Codecov Report

Attention: Patch coverage is 98.13433% with 5 lines in your changes missing coverage. Please review.

Project coverage is 73.90%. Comparing base (278a599) to head (9067a1f).
Report is 17 commits behind head on devel.

Current head 9067a1f differs from pull request most recent head f37352b

Please upload reports for the commit f37352b to get more accurate results.

Files                                             Patch %   Lines
include/mata/utils/extendable-square-matrix.hh    96.80%    0 Missing and 3 partials ⚠️
src/partition.cc                                  98.78%    0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #399      +/-   ##
==========================================
+ Coverage   72.14%   73.90%   +1.76%     
==========================================
  Files          30       33       +3     
  Lines        3712     3982     +270     
  Branches      847      887      +40     
==========================================
+ Hits         2678     2943     +265     
  Misses        738      738              
- Partials      296      301       +5     


@Adda0 (Collaborator) commented Mar 18, 2024

Thank you for the PR. I will be reviewing the PR throughout the week. Is that good enough for your needs?

@kocotom (Collaborator Author) commented Mar 19, 2024

@Adda0 Thanks for your reply; yes, that is OK!

I have just made a few small changes:

  • the functions isReflexive, isAntisymetric and isTransitive are now methods of the ExtendableSquareMatrix structure instead of free functions
  • I have added the function create, a factory function that creates an ExtendableSquareMatrix instance of a given type
  • I have added the function copy, which creates a deep copy of an ExtendableSquareMatrix according to its type
  • I have added a few tests
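For reference, the three relation properties checked by isReflexive, isAntisymetric and isTransitive can be computed as in the sketch below. It works on a plain std::vector<std::vector<bool>> with hypothetical snake_case names rather than on ExtendableSquareMatrix itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// rel[i][j] == true means that the pair (i, j) is in the relation R.
using Relation = std::vector<std::vector<bool>>;

// Reflexive: (i, i) is in R for every i.
bool is_reflexive(const Relation& rel) {
    for (std::size_t i = 0; i < rel.size(); ++i) {
        if (!rel[i][i]) { return false; }
    }
    return true;
}

// Antisymmetric: (i, j) and (j, i) both in R implies i == j.
bool is_antisymmetric(const Relation& rel) {
    for (std::size_t i = 0; i < rel.size(); ++i) {
        for (std::size_t j = 0; j < rel.size(); ++j) {
            if (i != j && rel[i][j] && rel[j][i]) { return false; }
        }
    }
    return true;
}

// Transitive: (i, k) and (k, j) in R implies (i, j) in R. Runs in O(n^3).
bool is_transitive(const Relation& rel) {
    const std::size_t n = rel.size();
    for (std::size_t i = 0; i < n; ++i) {
        assert(rel[i].size() == n);  // the matrix must be square
        for (std::size_t k = 0; k < n; ++k) {
            if (!rel[i][k]) { continue; }
            for (std::size_t j = 0; j < n; ++j) {
                if (rel[k][j] && !rel[i][j]) { return false; }
            }
        }
    }
    return true;
}
```

For example, the relation <= on {0, 1, 2} satisfies all three properties, while the symmetric relation {(0, 1), (1, 0)} satisfies none of them.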

@kocotom (Collaborator Author) commented Mar 20, 2024

I have just made a few more small changes:

  • I have implemented a custom copy constructor for the Partition structure
  • I have implemented a custom operator= for the Partition structure, which preserves information about the capacity of the partition vectors
  • I have added a few tests for the copy constructor and operator=
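A custom copy constructor and operator= are needed here because copying a std::vector only guarantees that the elements are copied, not that the reserved capacity is carried over. A minimal sketch of the pattern, using a hypothetical class CapacityAwareHolder with a single int vector in place of the partition vectors:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a structure whose vectors are reserved up front
// and whose copies should keep that reserved capacity.
class CapacityAwareHolder {
public:
    explicit CapacityAwareHolder(std::size_t capacity) { data_.reserve(capacity); }

    // Reserve first, then copy the elements; inserting into a vector whose
    // capacity is already large enough does not reallocate.
    CapacityAwareHolder(const CapacityAwareHolder& other) {
        data_.reserve(other.data_.capacity());
        data_.insert(data_.end(), other.data_.begin(), other.data_.end());
    }

    // Build the copy in a temporary and swap it in; swap exchanges the
    // buffers, so the reserved capacity comes along.
    CapacityAwareHolder& operator=(const CapacityAwareHolder& other) {
        if (this != &other) {
            CapacityAwareHolder copy(other);
            data_.swap(copy.data_);
        }
        return *this;
    }

    void push_back(int value) { data_.push_back(value); }
    int at(std::size_t i) const { return data_.at(i); }
    std::size_t size() const { return data_.size(); }
    std::size_t capacity() const { return data_.capacity(); }

private:
    std::vector<int> data_;
};
```

The copy-and-swap form of operator= also gives the strong exception guarantee: if the copy throws, the target object is left unchanged.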

@kocotom (Collaborator Author) commented Mar 21, 2024

I have just realized that I had passed several vectors to functions by value instead of by constant reference, even though the functions do not modify them, so I have fixed it. Namely:

  • StateBlocks partition = StateBlocks() changed to const StateBlocks& partition = StateBlocks() in the Partition constructor
  • std::vector<State> marked changed to const std::vector<State>& marked in the Partition::splitBlocks method
  • std::vector<State> states changed to const std::vector<State>& states in the Partition::inSameBlock method

@Adda0 (Collaborator) commented Mar 25, 2024

Hey. Sorry for the delay. Still working on it. Things got in the way.

@kocotom (Collaborator Author) commented Mar 26, 2024

I have just made one more small change. I have deleted the copy function, which created a deep copy of a given ExtendableSquareMatrix instance.
I have replaced it with a pure virtual method clone, which is part of the ExtendableSquareMatrix structure and which each subclass overrides. I think this is a better solution to the problem.
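The clone approach is the usual "virtual copy constructor" idiom. A minimal sketch, assuming clone() returns a std::unique_ptr (the PR may use a different return type); MatrixBase and VectorOfVectorsMatrix are hypothetical names loosely modelled on ExtendableSquareMatrix and DynamicSquareMatrix:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Base interface: callers holding only a base pointer can still copy the
// object without knowing its concrete type.
template <typename T>
class MatrixBase {
public:
    virtual ~MatrixBase() = default;
    virtual T get(std::size_t i, std::size_t j) const = 0;
    virtual void set(std::size_t i, std::size_t j, T value) = 0;
    virtual std::unique_ptr<MatrixBase<T>> clone() const = 0;
};

template <typename T>
class VectorOfVectorsMatrix : public MatrixBase<T> {
public:
    explicit VectorOfVectorsMatrix(std::size_t n) : rows_(n, std::vector<T>(n)) {}

    T get(std::size_t i, std::size_t j) const override { return rows_.at(i).at(j); }
    void set(std::size_t i, std::size_t j, T value) override { rows_.at(i).at(j) = value; }

    // The implicitly generated copy constructor copies rows_ element by
    // element, so the returned object is a deep copy.
    std::unique_ptr<MatrixBase<T>> clone() const override {
        return std::make_unique<VectorOfVectorsMatrix<T>>(*this);
    }

private:
    std::vector<std::vector<T>> rows_;
};
```

Compared with a free copy function that switches on the matrix type, each subclass knows how to copy itself, so adding a new representation does not require touching shared code.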

@Adda0 (Collaborator) left a comment

My initial thoughts. I did not review the feasibility of the solution, only the format and the overall structure.

tests/partition-relation-pair.cc (outdated, resolved)
include/mata/utils/partition-relation-pair.hh (outdated, resolved)
include/mata/utils/partition-relation-pair.hh (outdated, resolved)
include/mata/utils/partition-relation-pair.hh (outdated, resolved)
include/mata/utils/partition-relation-pair.hh (outdated, resolved)
include/mata/utils/partition-relation-pair.hh (outdated, resolved)
include/mata/utils/partition-relation-pair.hh (outdated, resolved)

Collaborator

I think it might be more readable if we had all class declarations first, and only then all class method definitions. But it might be OK as it is. Any opinions?

Collaborator Author

Thanks for your suggestion. I can do this if the others agree.
Regarding readability, I should mention that one more class will be added to this file in the future (namely PartitionRelationPair, which will combine Partition and ExtendableSquareMatrix). This means that several hundred more lines of code will be added to this file.

Collaborator

Then I would probably split it as much as possible into separate files, and separate declarations from definitions where that is not otherwise possible. But that depends on what the others have to say on this topic, too.

Collaborator Author

@Adda0 Thanks for your answer. I have just realized that the current approach cannot work. I had written everything in one header file, which contained

  • classes with plain method declarations inside class { ... }
  • definitions of class methods outside class { ... }
  • ordinary functions marked inline

However, I had misunderstood the rules: the methods were not implicitly inline, because they were not defined directly inside class { ... } (I thought that being class methods was enough for them to be inline, regardless of where they were defined). This violated the One Definition Rule (ODR) as soon as the header was included in more than one translation unit. Thus, I had to change the structure of the file.
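The ODR problem above appears with any header that defines a member function outside its class. A minimal sketch (the class Counter is hypothetical, imagined as the content of a header included by several .cc files) of the two ways to keep such a header ODR-safe:

```cpp
#include <cassert>
#include <cstddef>

class Counter {
public:
    // Defined inside the class body: implicitly inline, so identical
    // definitions in several translation units are allowed.
    std::size_t value() const { return value_; }

    // Only declared here; defined below, outside the class body.
    void increment();

private:
    std::size_t value_ = 0;
};

// Without `inline`, every translation unit including this header would emit
// its own external definition of Counter::increment, violating the ODR
// (typically reported by the linker as a "multiple definition" error).
inline void Counter::increment() { ++value_; }
```

Member functions of class templates are exempt from this particular problem, since the ODR allows their definitions to appear in several translation units.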

Since several of the methods are quite long, I have decided to split the file into multiple files. Currently (after my last commit), the structure is as follows:

  • include/mata/utils/partition.hh contains the declaration of the Partition class and its methods
  • include/mata/utils/extendable-square-matrix.hh contains the definitions of the ExtendableSquareMatrix class and its methods (and likewise for the subclasses CascadeSquareMatrix, DynamicSquareMatrix and HashedSquareMatrix)
    • The methods are quite short, so I kept them inside the class { ... } definitions and did not create an extendable-square-matrix.cc file
  • src/partition.cc contains definitions of the Partition class methods
  • tests/partition.cc contains tests which cover Partition class
  • tests/extendable-square-matrix.cc contains tests which cover ExtendableSquareMatrix class

Is this OK? I was not sure whether src/partition.cc is the right location, since there is no src/utils directory.

Collaborator

I would mirror the folder hierarchy for header files in source as well. Having src/utils/partition.cc seems reasonable to me.

Otherwise, I personally am very happy with this separation. Lukáš should have the last say, however.

include/mata/utils/partition-relation-pair.hh (outdated, resolved)
include/mata/utils/partition-relation-pair.hh (outdated, resolved)

@kocotom (Collaborator Author) commented Mar 27, 2024

Thanks for your suggestions. I have resolved several of them and will continue later today. The last fix caused a few failing checks, but I have not found the cause yet; I will inspect it more thoroughly later today.

@kocotom (Collaborator Author) commented Apr 15, 2024

I have made several changes to the inner logic of the Partition class to make the data structure easier to work with.

In the previous version, BlockItem, Block and Node were simple structures whose only purpose was to store various indices into the partition vectors. Corresponding structures were matched using several methods such as get_node_idx_from_block_item_idx.

In the current version, BlockItem, Block and Node are private classes defined inside the Partition class. They still store indices into the partition vectors, but they also provide methods that simplify working with the partition.

  • class BlockItem

    • attributes
      • idx_ - index of this item in the block_items_ vector
      • state_ - corresponding state
      • block_idx_ - index of the corresponding block
      • partition_ - const reference to the partition
    • methods
      • idx and state getters
      • block - returns a const reference to the corresponding block
      • node - returns a const reference to the corresponding node
      • repr - returns a const reference to the corresponding representative
  • class Block

    • attributes
      • idx_ - index of this block in the blocks_ vector
      • node_idx_ - index of the corresponding node
      • partition_ - const reference to the partition
    • methods
      • idx getter
      • node - returns a const reference to the corresponding node
      • repr - returns a const reference to the corresponding representative
      • first - returns a const reference to the first corresponding block item
      • last - returns a const reference to the last corresponding block item
      • begin and end - return const iterators for iterating through the block
  • class Node

    • attributes
      • idx_ - index of this node in the nodes_ vector
      • first_ - index of the first block item in the node
      • last_ - index of the last block item in the node
      • partition_ - const reference to the partition
    • methods
      • idx getter
      • repr - returns a const reference to the corresponding representative
      • first - returns a const reference to the first corresponding block item
      • last - returns a const reference to the last corresponding block item
      • begin and end - return const iterators for iterating through the node
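The index-plus-reference design of BlockItem, Block and Node can be sketched as below. This is a simplified, hypothetical TinyPartition with only block items and blocks (no nodes, no splitting, no iterators), and the nested classes are public here so that they can be used directly; it only illustrates how a view object resolves related data through a const reference to the owning partition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class TinyPartition {
public:
    // Builds the partition from explicitly listed, non-empty blocks of states.
    explicit TinyPartition(const std::vector<std::vector<std::size_t>>& blocks) {
        for (const auto& states : blocks) {
            assert(!states.empty());
            const std::size_t block_idx = block_first_.size();
            block_first_.push_back(item_state_.size());
            for (std::size_t s : states) {
                item_state_.push_back(s);
                item_block_.push_back(block_idx);
            }
            block_last_.push_back(item_state_.size() - 1);
        }
    }

    // View of one block item: its own index plus a const reference to the owner.
    class BlockItem {
    public:
        BlockItem(std::size_t idx, const TinyPartition& partition) : idx_(idx), partition_(partition) {}
        std::size_t idx() const { return idx_; }
        std::size_t state() const { return partition_.item_state_[idx_]; }
        std::size_t block_idx() const { return partition_.item_block_[idx_]; }
    private:
        std::size_t idx_;
        const TinyPartition& partition_;
    };

    // View of one block: resolves its first and last items through the owner.
    class Block {
    public:
        Block(std::size_t idx, const TinyPartition& partition) : idx_(idx), partition_(partition) {}
        std::size_t idx() const { return idx_; }
        BlockItem first() const { return BlockItem(partition_.block_first_[idx_], partition_); }
        BlockItem last() const { return BlockItem(partition_.block_last_[idx_], partition_); }
        std::size_t size() const { return last().idx() - first().idx() + 1; }
    private:
        std::size_t idx_;
        const TinyPartition& partition_;
    };

    Block block(std::size_t idx) const { return Block(idx, *this); }
    BlockItem item(std::size_t idx) const { return BlockItem(idx, *this); }

private:
    std::vector<std::size_t> item_state_;   // state stored in each block item
    std::vector<std::size_t> item_block_;   // block index of each block item
    std::vector<std::size_t> block_first_;  // index of the first item of each block
    std::vector<std::size_t> block_last_;   // index of the last item of each block
};
```

Since the views hold a reference to the partition, they must not outlive it; they are meant to be created on demand and discarded.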

The other methods were changed to reflect the current representation of the partition.

@Adda0 (Collaborator) commented Apr 16, 2024

This looks beautiful. I will review the changes during this week. The description above looks great, though.

@Adda0 (Collaborator) left a comment

The PR looks good to me overall. I have fixed a few typos and simplified some constructs, which will be added in the following commits.

Before merging, all debug prints should be removed from the tests.

When all of these issues are resolved, I think we can merge the PR.

Considering your time constraints, would you like me to resolve these issues so that you can focus on your own problems?

src/partition.cc (outdated)

Collaborator

This file should go to src/utils/partition.cc. Since we do not have a src/utils/ folder, I would say we should create one and move this file there.

*
**/

using MatrixType = enum MatrixType { None, Cascade, Dynamic, Hashed };

Collaborator

Use enum class. It is much safer (it prevents accidental implicit conversions). You already refer to the enumerators as if this were an enum class in the code (e.g. MatrixType::None), anyway.
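A minimal illustration of the difference: the static_asserts below compile only because the scoped MatrixType does not convert to int implicitly while an unscoped enum does. LegacyMatrixType and to_int are hypothetical names used only for the comparison.

```cpp
#include <cassert>
#include <type_traits>

// Scoped enumeration: the enumerators live in MatrixType's own scope and
// never convert to int implicitly.
enum class MatrixType { None, Cascade, Dynamic, Hashed };

// Unscoped enumeration for comparison: converts to int implicitly.
enum LegacyMatrixType { LegacyNone, LegacyCascade };

static_assert(!std::is_convertible<MatrixType, int>::value,
              "a scoped enum must be converted explicitly");
static_assert(std::is_convertible<LegacyMatrixType, int>::value,
              "an unscoped enum converts to int implicitly");

// When an integer is really needed, the conversion has to be spelled out.
inline int to_int(MatrixType type) { return static_cast<int>(type); }
```

With the scoped form, a line such as `int i = MatrixType::Cascade;` is rejected by the compiler instead of silently compiling.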


// type of the matrix which will be chosen as soon as the
// child class will be created
MatrixType m_type{MatrixType::None};

Collaborator

Suggested change
MatrixType m_type{MatrixType::None};
MatrixType type_{MatrixType::None};

If the m_ prefix only marks a member: we use <name>_ instead of m_<name> for private members.

* @param[in] j column of the matrix
* @param[in] value a value which will be assigned to the memory cell
*/
virtual void set(size_t i, size_t j, T value = T()) = 0;

Collaborator

Default arguments on a virtual method should be prohibited. It is a very error-prone feature of C++. See the SO discussion, for example.

For the sake of clarity, I think we should split this method into

void set(size_t i, size_t j);           // Assigns a value-initialized T().
void set(size_t i, size_t j, T value);  // Assigns the user-given value, passed by value.
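The pitfall is that default arguments are bound statically: the default value comes from the static type through which the function is called, while the function body is chosen from the dynamic type. A self-contained demonstration with hypothetical Base and Derived classes:

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int set(int value = 1) { return value; }
};

struct Derived : Base {
    // Same function, different default: legal, but confusing.
    int set(int value = 2) override { return value * 10; }
};

// Both calls run Derived::set, but with different default arguments.
inline int call_through_base(Base& b) { return b.set(); }        // uses default 1
inline int call_through_derived(Derived& d) { return d.set(); }  // uses default 2
```

For a Derived object d, call_through_base(d) returns 10 while call_through_derived(d) returns 20, even though the same override runs in both cases.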

* @param[in] placeholde value which will be assigned to the
* newly allocated memory cells
*/
virtual void extend(T placeholder = T()) = 0;

Collaborator

The same should be done for extend() as for set(). So:

void extend();               // Extends with value-initialized cells (T()).
void extend(T placeholder);  // Extends with a user-specified value.
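One way to implement the suggested split is sketched below, assuming the argument-less overloads become non-virtual wrappers that forward T() to the pure virtual ones, so the default lives in exactly one place. SquareMatrixInterface and FlatMatrix are hypothetical names. Note the using-declarations in the subclass: without them, overriding set(i, j, value) would hide the inherited set(i, j).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class SquareMatrixInterface {
public:
    virtual ~SquareMatrixInterface() = default;

    // Non-virtual convenience overloads: no default arguments needed.
    void set(std::size_t i, std::size_t j) { set(i, j, T()); }
    void extend() { extend(T()); }

    virtual void set(std::size_t i, std::size_t j, T value) = 0;
    virtual void extend(T placeholder) = 0;
};

template <typename T>
class FlatMatrix : public SquareMatrixInterface<T> {
public:
    // Re-expose the base overloads; otherwise the overrides below hide them.
    using SquareMatrixInterface<T>::set;
    using SquareMatrixInterface<T>::extend;

    void set(std::size_t i, std::size_t j, T value) override {
        assert(i < n_ && j < n_);
        cells_[i][j] = value;
    }

    // n x n -> (n+1) x (n+1): one new cell per existing row, then a new row.
    void extend(T placeholder) override {
        for (auto& row : cells_) { row.push_back(placeholder); }
        ++n_;
        cells_.emplace_back(n_, placeholder);
    }

    T get(std::size_t i, std::size_t j) const { return cells_.at(i).at(j); }
    std::size_t size() const { return n_; }

private:
    std::size_t n_ = 0;
    std::vector<std::vector<T>> cells_;
};
```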

* the default value of the type T.
* @brief changes the n x n matrix to the (n+1) x (n+1) matrix with
* copying of the existing data
* @param[in] placeholde value which will be assigned to the

Collaborator

There is no placeholder among the function arguments. Also, there is a typo (placeholde instead of placeholder).

Collaborator

Remove debug prints in the tests.

@Adda0 Adda0 marked this pull request as draft August 16, 2024 06:59

@Adda0 (Collaborator) commented Aug 16, 2024

Due to some performance issues with this approach, I am converting the PR to a draft until we decide to continue working on it. @kocotom, when you have free time again, feel free to let me know how you would like to proceed with this PR. No rush, though.
