Add extensions for use cases from DKPro Core and cTAKES to the CAS interface #83
Comments
Do you have a type system XML that I can use for that? |
You mean for testing purposes? I think it would be the job of the other library to provide the initializer. Although for DKPro, Cassis could also provide it directly :) |
The idea with the initializer is a bit more than just The idea is that the initializer patches the CAS instance with additional methods, e.g.
... and that we could e.g. have an for DKPro Core and another one for say cTAKES and both would patch the CAS with the same convenience methods but internally resorting to different select statements. The initializer would work like a visitor, e.g.
@jcklie such a thing works with Python, right? |
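For reference, patching a single instance with extra methods does work in Python. The following is only a minimal sketch of the idea discussed above, not code from this thread: `Cas` is a hypothetical stand-in (not the cassis class), `TokenInitializer` and `select_tokens` are invented names, and the two type names are used purely for illustration.

```python
import types


class Cas:
    """Hypothetical stand-in for a generic CAS; NOT the cassis class."""

    def __init__(self, annotations):
        # Each annotation is a (type_name, covered_text) pair.
        self._annotations = list(annotations)

    def select(self, type_name):
        return [text for name, text in self._annotations if name == type_name]


class TokenInitializer:
    """Hypothetical initializer: visits a CAS and patches in select_tokens()."""

    def __init__(self, token_type):
        self._token_type = token_type

    def visit(self, cas):
        token_type = self._token_type

        def select_tokens(self):
            # Same convenience method for every framework; only the
            # underlying select statement differs.
            return self.select(token_type)

        # Bind the function to this one instance; the Cas class is untouched.
        cas.select_tokens = types.MethodType(select_tokens, cas)
        return cas


# Illustrative type names, one initializer per framework.
DKPRO = TokenInitializer("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token")
CTAKES = TokenInitializer("org.apache.ctakes.typesystem.type.syntax.BaseToken")
```

A caller would run `DKPRO.visit(cas)` once and then call `cas.select_tokens()`. Because the method is attached at run time, static analysis cannot see it, which is the IDE concern raised later in this thread.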
We can provide these methods, but I am not sure about the implementation. My question was: is there an official DKPro type system XML that I can use, or can you provide me with some Java code to generate it, so that it stays in sync with DKPro? |
The "best" solution for this would probably be to use DKPro Meta :) |
Well, basically what you do is create a Maven Project which has a dependency on all
|
I would implement it by extending Cas; the constructor then loads the DKPro type system. Simple and not so magic. |
But then we'd end up having to import CAS from different libraries... |
I would add it to cassis, so it is |
For the moment, I don't feel very comfortable with this. I don't like the idea of the CAS becoming something new just because it contains certain types. The idea of the CAS is that it is a generic data structure. If we subclass it for a particular framework, I feel it goes against this idea. Actually, the strategy you have shown me OTR for the Pandas accessors looked nice. It makes very clear that there is one generic data structure and there are separately different ways of accessing it. |
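To make the accessor idea concrete, here is a rough sketch modelled on how pandas lets third parties attach namespaces to a DataFrame (`pandas.api.extensions.register_dataframe_accessor`). Everything in it is an assumption for illustration: `Cas` is a stand-in rather than the cassis class, and `register_cas_accessor`, `DkproAccessor` and `tokens` are hypothetical names.

```python
class Cas:
    """Hypothetical stand-in for a generic CAS; NOT the cassis class."""

    def __init__(self, annotations):
        # Each annotation is a (type_name, covered_text) pair.
        self._annotations = list(annotations)

    def select(self, type_name):
        return [text for name, text in self._annotations if name == type_name]


def register_cas_accessor(name):
    """Hypothetical decorator: expose an accessor class as Cas.<name>."""

    def decorator(accessor_cls):
        # The property wraps whichever Cas instance is being accessed.
        setattr(Cas, name, property(lambda cas: accessor_cls(cas)))
        return accessor_cls

    return decorator


@register_cas_accessor("dkpro")
class DkproAccessor:
    """Framework-specific view; the CAS itself stays generic."""

    TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"

    def __init__(self, cas):
        self._cas = cas

    def tokens(self):
        return self._cas.select(self.TOKEN)
```

With this in place, `cas.dkpro.tokens()` reads as "the DKPro view of a generic CAS": there is still only one data structure, and the framework-specific knowledge lives in the accessor.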
Is it ok if we implement this in cassis or should it be part of pydkpro? |
I understand that IDEs may not support auto-complete for such extensions. But I wonder if IDEs like PyCharm really only do static code analysis or also consider whether a method has actually been called somewhere before. E.g. if I call method |
PyCharm offers some auto-completion based on what was called before (the typing is limited then), and there are stub files where you can maybe add more information: https://mypy.readthedocs.io/en/latest/stubs.html . But it does not know that there is an extension, as the extension is added at run time (unless I just add it as a field to cassis and throw an error if it is not compatible). |
The idea of involving cassis came to me because I thought we should/could pass the type system "strategy" to the constructor, i.e. cassis would somehow have to understand the strategy and react to it. If we use a completely different mechanism which does not require cassis to be aware of the mechanism, it could be done elsewhere. A compromise between subtyping and adding dynamically might be a generic type (if such a thing is possible?), e.g.
|
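One way a generic type could look in Python is sketched below. It is a weaker variant than patching methods onto the CAS itself: the framework-specific methods live on an `ext` attribute whose type is the type parameter. All names (`Cas`, `Accessor`, `DkproAccessor`, `ext`) are hypothetical, and whether a given IDE completes `cas.ext.` correctly is an assumption that would need checking.

```python
from typing import Generic, List, Tuple, Type, TypeVar


class Accessor:
    """Hypothetical base class for framework-specific accessors."""

    def __init__(self, cas: "Cas") -> None:
        self._cas = cas


A = TypeVar("A", bound=Accessor)


class Cas(Generic[A]):
    """Hypothetical stand-in for a generic CAS; NOT the cassis class."""

    def __init__(self, annotations: List[Tuple[str, str]], accessor_cls: Type[A]) -> None:
        self._annotations = list(annotations)
        # A type checker can infer Cas[DkproAccessor] from the argument,
        # so `ext` is known to be a DkproAccessor.
        self.ext: A = accessor_cls(self)

    def select(self, type_name: str) -> List[str]:
        return [text for name, text in self._annotations if name == type_name]


class DkproAccessor(Accessor):
    TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"

    def tokens(self) -> List[str]:
        return self._cas.select(self.TOKEN)
```

Here `Cas([...], DkproAccessor).ext.tokens()` is visible to static analysis without subclassing the CAS per framework.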
I think we need features from Python 3.8 for that and even then I am unsure. So what we have now is:
|
I think this issue contains two things, the DKPro type system and extension. I will track the type system stuff in #9. |
I wrote a quick-and-dirty script to convert a type system XML to Python classes for the DKPro Core type system. One gets type hints for the wrapped CAS and the accessor, and does not need to redefine all CAS methods. The code basically is
I can write a decorator for init and |
I do not know whether I want to keep the type hints for the extension methods, but I like this way of defining extensions. |
So the IDE dynamically evaluates the DKProAccessor to discover the fields? |
We tell the IDE that |
How does the IDE know that e.g. Token has the field |
I generate Python code with the type descriptions from the XML. If you have a fixed type system, you can do that and check the generated Python code into your source control. I will push the code for that later; this issue should maybe focus on the extension only. |
Generating classes from the type system description - so a "pycasgen" - an equivalent of the "jcasgen" we have in Java which generates Java classes from the type system. Why not? :) I think such a "pycasgen" script could be part of cassis and projects like DKPro Core or cTAKES could pre-generate the classes and push them to pypi as separate packages. WDYT? |
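To illustrate what such a "pycasgen" could do, here is a minimal sketch that reads a UIMA type system description and emits Python dataclass source. It is not the script mentioned above: `generate_classes`, `PY_TYPES` and the example type `example.Token` are invented for illustration, supertypes are ignored, and feature names that clash with Python keywords are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by UIMA type system descriptor files.
NS = {"u": "http://uima.apache.org/resourceSpecifier"}

# Illustrative mapping from a few UIMA primitive ranges to Python types.
PY_TYPES = {
    "uima.cas.String": "str",
    "uima.cas.Integer": "int",
    "uima.cas.Float": "float",
    "uima.cas.Boolean": "bool",
}

# Made-up example descriptor with a single type and a single feature.
EXAMPLE_XML = """<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <types>
    <typeDescription>
      <name>example.Token</name>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>lemma</name>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
"""


def generate_classes(xml_text):
    """Return Python source with one dataclass per declared type."""
    root = ET.fromstring(xml_text)
    lines = ["from dataclasses import dataclass", "from typing import Any, Optional", ""]
    for type_desc in root.iterfind("u:types/u:typeDescription", NS):
        full_name = type_desc.findtext("u:name", namespaces=NS)
        class_name = full_name.rsplit(".", 1)[-1]
        lines += ["", "@dataclass", f"class {class_name}:"]
        # Unannotated, so it is a plain class attribute, not a dataclass field.
        lines.append(f'    TYPE_NAME = "{full_name}"')
        for feat in type_desc.iterfind("u:features/u:featureDescription", NS):
            feature = feat.findtext("u:name", namespaces=NS)
            range_type = feat.findtext("u:rangeTypeName", namespaces=NS)
            py_type = PY_TYPES.get(range_type, "Any")
            lines.append(f"    {feature}: Optional[{py_type}] = None")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_classes(EXAMPLE_XML))
```

The generated source could be checked into source control or packaged per type system version, which is what would give IDEs something static to analyse.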
We can do that. My question right now is where to put the extensions. I would like to have them in cassis itself, as they are related to CAS/XMI stuff. Also, I sometimes need them for my own code and don't want to install pydkpro just for the extensions and types. |
If by extensions you mean e.g. the generated types - I think these should be released separately and with the same version numbers as the corresponding DKPro Core / cTAKES / etc versions. They do not follow the same release cycle as cassis. |
I mean the dkpro/ctakes accessor and util functions that were requested. |
@zesch @aggarwalpiush WDYT? Type-system-specific accessors and Python classes generated from type systems should probably be kept together and have a release cycle mirroring that of the type system they mirror. Should we set them up as a separate project under DKPro right away (which I think would be nice, since we could already make use of them in INCEpTION)? Or keep them with your pipelining code later? |
Not sure I really understand the implications. Whatever works best on your side. |
I would create a new repository and Python package, dkpro-typeshed, where we add the extension methods and generated types to get a nice API. It would depend only on cassis, and pydkpro could then use it to make its own API nicer. We use a separate package in order to track the DKPro version and the types that are new or different in each version. |
Sounds good |
We have various DKPro projects and they all have different release cycles. I think the type system is generated for a particular version of a particular project. Thus having a single repo where all generated types are located doesn't seem sensible to me. We would always have to release all types at the same time and it would be impossible for users to choose a version combination they would care for. I think having a type companion repo for each DKPro project would make sense, e.g. |
This sounds like a lot of work and a maintenance nightmare. Right now it also works without generated types (type-unsafe in the same way that the raw Java CAS interface has no safety or type information). So I would just add the accessor, which returns the right FeatureStructures but gives no IDE support, i.e. changing
to
as a first step. |
What's a maintenance nightmare? |
Having a repo for each would mean setting up many repositories and PyPI packages. I would rather not do that right now. |
We only need to set up one for DKPro Core. I even thought about putting the generated Python classes directly into the "dkpro-core" repository along with all the Java stuff. But considering that the Python stuff is still "young", we might care to refine/release it more often than the Java stuff, so it might have a faster release cycle (e.g. "2.0.0, then 2.0.0.1 because we fix a bug in the code generator, then 2.0.0.2 because we fix another bug, etc."). |
I have added a repo here and you should all have proper access to it: https://github.com/dkpro/dkpro-core-python-api We can still rename it / move around things later if we decide to change anything. For now, we'll only create types for DKPro Core anyway. |
I will come back to this after the ACL deadline. |
It would be nice to be able to initialize a CAS with a certain type system, e.g.