You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Suggestions from @cur33 . These can (and should) be broken into separate issues as is appropriate, and handled separately.
Proposed changes to core library:
- Change Dataset::print to accept any ostream, default to cout, in order to
simplify addition of __str__ method to py_dataset.cpp
Design decisions:
- The method of struct initialization recommended by the Python/C API docs is
to use C99 designated initializers, which are very useful but not available
in C++. Instead, I followed this approach:
- for structs with sufficiently few attributes, like PyModuleDef, I used
a standard initialization approch by providing values for all members
(except trailing ones)
- for structs with many attributes, like PyTypeObject, I created
init_*_members() procedures that would set only the needed members,
which are called in the PyInit_fastkmeans() function
However, having seen other extension modules such as those in numpy which
simply provide 0 or NULL values for all unused struct members, I'm
starting to think that might just be simpler and more maintainable than
I'd originally thought.
- The Kmeans::initialize method needs an unsigned short * as an argument, so rather
than jumping through a lot of hoops to provide this simple functionality,
I made a simple Assignment class that wraps an unsigned short *. An
Assignment object is returned by the fastkmeans.init_centers* methods and
then passed to the [SomeKmeansType].initialize method.
- The Dataset type defined in py_dataset.cpp is technically a mapping type
in order to allow the `a_dataset[i, j]` syntax. The Assignment type is a
normal sequence type since it only needs a single value passed to the
bracket operator.
Possible future enhancements:
- It generally seems frowned upon to directly manipulate custom object
types, e.g. NaiveObject instances. Rather, to access specific struct
fields, one should provide macros or (usually inline) functions like
`NaiveObject_SOME_FIELD(PyObject *)` that will return the fields desired.
This way, a method that accepts a custom object instance does not need to
convert the PyObject * to a custom object *, and does not need to know
implementation details of the custom object type.
This should be easy to change because I only had to directly manipulate
Dataset and Assignment object pointers in one or two methods.
- Originally, and primarily because Python does not natively have the concept
of ``abstract base classes'', I followed the method of simply copying the
file for one Kmeans subclass and altering it for each other subclass.
However, I have since realized that the `NotImplementedError` could be
used to simulate base classes and allow for inheritance. This would make
the code base much more maintainable and easier to read. However, it might
be overkill since the project is not huge.
- Since the original structure of the project merely contained all the
source files in the `src` directory, I kept it that way, but now that the
number of files has increased quite a bit, it might make more sense to
move the Python module files into a separate directory. This shouldn't
create a problem for the Python module to find the header files of the C++
classes and functions, because the setup.py file can just be altered to
set the `include_dir` keyword argument to be something like '..' instead
of '.'.
The text was updated successfully, but these errors were encountered:
Suggestions from @cur33 . These can (and should) be broken into separate issues as is appropriate, and handled separately.
The text was updated successfully, but these errors were encountered: