I would like to be able to build datapackage pipelines that connect to many disparate JSON, XML and HTML data sources. Often this requires custom parsers for tabulator, but I then cannot easily re-use the stream_remote_resources code. I do not want to copy and paste a whole module just to change one line, so I resorted to the hack linked below of importing the dpp module at the bottom of the file.
In my case I had a JSON parser for the JSON API spec that also handled pagination. It follows a similar pattern to the SQL data parser, and I have a similar one for SPARQL endpoints.
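For context, here is a minimal standalone sketch of that pagination logic using requests. The key names (`data`, `attributes`, `links.next`) follow the JSON API spec, but the function itself is illustrative rather than the actual parser from my pipeline:

```python
import requests

def iter_jsonapi_rows(url):
    """Yield one dict of attributes per record, following the
    JSON API `links.next` link until no further page exists."""
    while url:
        page = requests.get(url).json()
        for record in page.get('data', []):
            yield record.get('attributes', {})
        url = page.get('links', {}).get('next')
```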
To frame the issue another way: the problem when reusing code from the dpp project is that the CSV dump code is class-based and can easily be overridden, whereas the stream_remote_resources module has import-time logic that makes re-use difficult.
If there is a desire to retain the simplicity of the functional approach for dpp modules, might it be possible to have a magic "run" function? This would retain backwards compatibility, but users who wanted to could move their import-time logic into the run function instead.
This would allow users to import specific functions and override others without a full class-based approach.
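For example, a processor written under the proposed convention might look like the sketch below. It assumes dpp's existing ingest/spew wrapper API; the `run()` hook itself is the new part and does not exist today:

```python
from datapackage_pipelines.wrapper import ingest, spew

def process(res_iter):
    # a reusable transformation that other modules could import freely
    for resource in res_iter:
        yield resource

def run():
    # logic that currently runs at import time moves in here,
    # so importing this module no longer has side effects
    parameters, datapackage, res_iter = ingest()
    spew(datapackage, process(res_iter))

if __name__ == '__main__':
    run()
```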
Currently I have to create a separate task in order to use a custom parser, like this:
https://github.com/strets123/frictionless-pres/blob/master/smdataproject/stream_remote_resources_custom.py
This breaks PEP 8's rule that imports go at the top of the file.
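For reference, the shape of that workaround is roughly the following sketch (the parser class is a placeholder, and I am assuming dpp's `lib` module path). The bottom-of-file import is the point: importing dpp's processor last triggers its import-time logic after the customisations are in place.

```python
# stream_remote_resources_custom.py (sketch)

class JSONAPIParser:
    """Placeholder for the custom paginating parser."""
    ...

# Register or monkey-patch the parser above, then import dpp's
# processor *last* so its import-time logic runs with the
# customisations applied. An import this far down the file is
# what violates PEP 8 (flake8 E402).
from datapackage_pipelines.lib import stream_remote_resources  # noqa: E402
```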