Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary packages, cleanups. #500

Draft
wants to merge 23 commits into
base: master
Choose a base branch
from
Draft

Binary packages, cleanups. #500

wants to merge 23 commits into from

Conversation

TkTech
Copy link
Contributor

@TkTech TkTech commented Feb 13, 2022

Didn't mean to make this PR against upstream just quite yet, ah well. WIP.

Changes:

  • Test Cleanups

    • Move tests outside of the module and into tests/
    • Replace imports from test_data with pytest fixtures, which is the pytest way of doing that.
    • Remove assert_np_arrays_equal and assert_np_arrays_not_equal, replacing them with the built-in numpy.testing.assert_array_equal
    • Some light PEP8ing
    • Skip polars tests if polars is not available. Remove polars from requirements_test.txt and manually install for tests workflow.
      • Polars is an annoying dependency to build in our compilation matrix, as it has binary wheels for just py3.6 and just a couple of platforms, which means we have to download rust and compile it python_versions * architecture times. We still test in tests.yml, but not on releases. They're working on packaging, this can be reversed when that's done.
    • Compile talib just once before running the test workflow jobs, then share it around.
  • Release

    • Added a release workflow which will compile binary wheels for py3.7-3.10 for a number of architectures.

Source and tests should be distinct.
This rule is blocking files in .github for example.
@TkTech TkTech marked this pull request as draft February 13, 2022 03:58
@TkTech TkTech force-pushed the ta_bundled branch 5 times, most recently from 0b45906 to 8722ece Compare February 13, 2022 04:36
It's stored as an artifact and passed into the test matrix jobs.
Use proper pytest fixtures for shared test data.
Remove two unnecessary array comparison functions.
@TkTech TkTech force-pushed the ta_bundled branch 2 times, most recently from c4b4e74 to 71ca307 Compare February 13, 2022 05:48
@mrjbq7
Copy link
Collaborator

mrjbq7 commented Feb 13, 2022

Cool! Let me know when you want me to review this.

@TkTech TkTech force-pushed the ta_bundled branch 7 times, most recently from 5f49bf3 to 3a6fbdf Compare February 13, 2022 09:12
@TkTech
Copy link
Contributor Author

TkTech commented Feb 13, 2022

Almost ready, just some fixes for m1 macs to go and to re-add the upload to pypi I removed while testing.

This builds binary packages for py3.7-310 on arm64-linux, arm64-m1, Power-linux,, x86_64 win/mac/linux.

@mrjbq7
Copy link
Collaborator

mrjbq7 commented Feb 13, 2022

That's pretty cool. What controls when it uploads? Is it tagged releases only?

@TkTech
Copy link
Contributor Author

TkTech commented Feb 13, 2022

I'm using my same workflow modified from pysimdjson. This pr is currently missing half the workflow because I don't want it to actually upload while I'm testing :)

When you push a tag that starts with v+number it'll make a release, push it to pypi, and create a release on GitHub adding all the build artifacts to it. Then you can just add a description.

@mrjbq7
Copy link
Collaborator

mrjbq7 commented Feb 13, 2022

So far my convention has been tagging, for example TA_Lib-0.4.24, maybe we can have it test for that

@mrjbq7
Copy link
Collaborator

mrjbq7 commented Feb 13, 2022

Btw, thank you for the contributions!

@TkTech
Copy link
Contributor Author

TkTech commented Feb 13, 2022

It's just a regex, easy change.

@TkTech
Copy link
Contributor Author

TkTech commented Feb 14, 2022

@mrjbq7 the tests look like they're finding some precision issues on i686:

 ============================= test session starts ==============================
  platform linux -- Python 3.7.12, pytest-7.0.1, pluggy-1.0.0
  rootdir: /project
  collected 40 items / 1 skipped / 39 selected
  
  ../project/tests/test_abstract.py ................                       [ 40%]
  ../project/tests/test_func.py ..........F...F..                          [ 82%]
  ../project/tests/test_pandas.py ..                                       [ 87%]
  ../project/tests/test_stream.py .....                                    [100%]
  
  =================================== FAILURES ===================================
  _________________________________ test_BBANDS __________________________________
  
  series = array([ 91.5 ,  94.81,  94.38,  95.09,  93.78,  94.62,  92.53,  92.75,
          90.31,  92.47,  96.12,  97.25,  98.5 , ... 109.25,
         107.  , 109.19, 110.  , 109.2 , 110.12, 108.  , 108.62, 109.75,
         109.81, 109.  , 108.75, 107.87])
  
      def test_BBANDS(series):
          upper, middle, lower = func.BBANDS(
              series,
              timeperiod=20,
              nbdevup=2.0,
              nbdevdn=2.0,
              matype=talib.MA_Type.EMA
          )
          i = np.where(~np.isnan(upper))[0][0]
          assert len(upper) == len(middle) == len(lower) == len(series)
          # assert abs(upper[i + 0] - 98.0734) < 1e-3
  >       assert abs(middle[i + 0] - 92.8910) < 1e-3
  E       assert 0.0010000000000047748 < 0.001
  E        +  where 0.0010000000000047748 = abs((92.89 - 92.891))
  
  /project/tests/test_func.py:116: AssertionError
  ___________________________________ test_RSI ___________________________________
  
      def test_RSI():
          a = np.array([0.00000024, 0.00000024, 0.00000024,
            0.00000024, 0.00000024, 0.00000023,
            0.00000024, 0.00000024, 0.00000024,
            0.00000024, 0.00000023, 0.00000024,
            0.00000023, 0.00000024, 0.00000023,
            0.00000024, 0.00000024, 0.00000023,
            0.00000023, 0.00000023], dtype='float64')
          result = func.RSI(a, 10)
          assert_array_equal(result, [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,0,0,0,0,0,0,0,0,0,0])
          result = func.RSI(a * 100000, 10)
  >       assert_array_equal(result, [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,33.333333333333329,51.351351351351347,39.491916859122398,51.84807024709005,42.25953803191981,52.101824405061215,52.101824405061215,43.043664867691085,43.043664867691085,43.043664867691085])
  E       AssertionError: 
  E       Arrays are not equal
  E       
  E       Mismatched elements: 6 / 20 (30%)
  E       Max absolute difference: 7.10542736e-15
  E       Max relative difference: 2.13162821e-16
  E        x: array([      nan,       nan,       nan,       nan,       nan,       nan,
  E                    nan,       nan,       nan,       nan, 33.333333, 51.351351,
  E              39.491917, 51.84807 , 42.259538, 52.101824, 52.101824, 43.043665,
  E              43.043665, 43.043665])
  E        y: array([      nan,       nan,       nan,       nan,       nan,       nan,
  E                    nan,       nan,       nan,       nan, 33.333333, 51.351351,
  E              39.491917, 51.84807 , 42.259538, 52.101824, 52.101824, 43.043665,
  E              43.043665, 43.043665])
  
  /project/tests/test_func.py:162: AssertionError
  =========================== short test summary info ============================
  FAILED ../project/tests/test_func.py::test_BBANDS - assert 0.0010000000000047...
  FAILED ../project/tests/test_func.py::test_RSI - AssertionError: 
  =================== 2 failed, 38 passed, 1 skipped in 0.56s ====================

@TkTech
Copy link
Contributor Author

TkTech commented May 20, 2022

@mrjbq7 This is 90% of the way there, and just needs a few more fixes.

This PR now bundles ta-lib (actually a git subtree from https://github.com/TkTech/ta-lib-src) which is unmodified except for updates for configure's database so it can compile on PineBook pros and other newer architectures. This means that all the installations steps in the readme are just poof, gone. Just do a python setup.py install anywhere with a C compiler ta-lib will work without any configuration.

The release action for building binary wheels from 3.8 to 3.10 is done and working, https://github.com/TkTech/ta-lib/actions/runs/2357646627. I've commented out support for ppc64le and aarch64 in release.yml, because while it works, it takes 4+ hours to build due to needing to build numpy, pandas, and other native dependencies under emulation. See the numpy issue linked in the release.yml for when numpy will provide binary wheels for these architectures, and then this can be uncommented. That said, installing ta-lib with pip works on these platforms, and when running on a real machine usually only takes a couple of minutes so I'm not super concerned with excluding them.

I've removed py3.7 from the workflows as both numpy and pandas have dropped support for these. There's no version of pandas that supports both 3.7 and 3.10, so you have to chose. That said, someone else can totally add support for more versions, they just need to add another build step to install older versions of numpy+pandas for those versions of Python.

What's left for this is:

  1. Now that we're done testing, re-add the actual upload step to PyPi to release.yml.
  2. The regular test.yml is getting a weird error where it can't import the C extension. This has been difficult to debug since it only occurs on the Github Action.

@mrjbq7
Copy link
Collaborator

mrjbq7 commented May 21, 2022

That seems great, I wonder if we can apply those fixes to https://github.com/ta-lib/ta-lib and make that our official source.

It's possible the current directory is wrong for the test.yml, i haven't looked into it specifically but if you try and import ta-lib from inside the ta-lib source directory but have not done a build in-place, then it will look at the current directory version of ta-lib not the one that was installed and built due to shadowing.

@TkTech
Copy link
Contributor Author

TkTech commented May 21, 2022

That seems great, I wonder if we can apply those fixes to https://github.com/ta-lib/ta-lib and make that our official source.

I'd like if there was one spot, but that repo definitely seemed to be abandoned. PRs and issues have been ignored for years.

It's possible the current directory is wrong for the test.yml, i haven't looked into it specifically but if you try and import ta-lib from inside the ta-lib source directory but have not done a build in-place, then it will look at the current directory version of ta-lib not the one that was installed and built due to shadowing.

AHHHH, you know what, I bet you this is exactly what's happening! pip just changed the behavior for build-in-place too and we update pip as the first step.

@mrjbq7
Copy link
Collaborator

mrjbq7 commented May 22, 2022

I'd like if there was one spot, but that repo definitely seemed to be abandoned. PRs and issues have been ignored for years.

We made this as a clone of the source forge repo, there was some idea of taking over maintenance, but that seemed overly ambitious.

@TkTech
Copy link
Contributor Author

TkTech commented May 22, 2022

I'd like if there was one spot, but that repo definitely seemed to be abandoned. PRs and issues have been ignored for years.

We made this as a clone of the source forge repo, there was some idea of taking over maintenance, but that seemed overly ambitious.

Ah I didn't see you on the contributor list when I took a peek. If you have commit access to it that's great, we can definitely use it instead.

@xmatthias
Copy link
Contributor

xmatthias commented Sep 8, 2022

any update on this? it'd be great to have ta-lib release binaries (mainly for windows) - especially since the prior source (https://www.lfd.uci.edu/~gohlke/pythonlibs/#ta-lib) seems to have been archived - and didn't update since June.

This makes every ta-lib update a pain for users - since installing on windows is currently really not fun.

@xmatthias
Copy link
Contributor

I've taken the liberty to play with this a bit, updating it with the current master, and testing the wheels.

The CI runs on current master are available here:
tests: https://github.com/xmatthias/ta-lib/actions/runs/3027067796
build: https://github.com/xmatthias/ta-lib/actions/runs/3027067795

As mentioned above already - the "release" step has to be updated to only run on tags - and an "upload" step that publishes the wheels to pypi (if that's something you're interested in in the first step - this could also be done manually initially).

I've manually tested the wheels on windows and linux, and can confirm they work fine.

I've added a "test wheels" step to CI too - which is using a clean environment, only installing the wheels - to ensure they're working on windows, mac and linux.
This will ensure that wheels about to be uploaded are actually built properly (i don't fully trust wheel building - therefore prefer to test them individually) - and is basically taking the approach used in pytables CI as well.

The "manual test" step here involves calling python -m talib.abstract. It'll fail if ta-lib binaries are not properly installed / available.
If there's a better test command we can and should obviously update to that - i'd however not run pytest - as test-files are not included in the release.

@mrjbq7 i'd make a PR out of that if you're interested - it'll include the work of this PR - but with the intend to have it merged ASAP - simplifying people's update and installation steps.

@mrjbq7
Copy link
Collaborator

mrjbq7 commented Sep 10, 2022 via email

@xmatthias
Copy link
Contributor

I love this idea, not sure about using this for releases, until I understand how to only have it release tags, and also how to download and test the intermediate artifacts myself?

You can download the artifacts from the build all the way at the bottom (see screenshot).

image

it's a zip file, which contains the source tarball (what you release currently) - as well as all wheels for different plattforms.
The best way to install these is by using pip - with pip install --no-index --find-links <unzippeddir> TA-Lib. (please note that the command assumes your env/virtualenv already has dependencies (mainly numpy) installed - as it specifies --no-index to avoid accidental installation from the package index).

not sure about using this for releases, until I understand how to only have it release tags

that's no problem, really - you'll always have the option to download the wheels first - inspect/test them - and then push them to pypi manually (basically with twine upload <wheeldir>.
The benefit comes from having all wheels available (also for platforms you don't directly own).

i think users will benefit from this by having the installation simplified with pip install ta-lib(no additional steps prior to that) - which will no longer depend on having the C library installed upfront - as well as reduced support - as i noticed many issues opened here are around "failed to install for whatever reason" ...

@TkTech
Copy link
Contributor Author

TkTech commented Sep 10, 2022

I've added a "test wheels" step to CI too - which is using a clean environment, only installing the wheels - to ensure they're working on windows, mac and linux.

If you're basing your changes off of mine, that's already what cibuildwheel is doing. It'll build the wheels, copy them into a new env, and run the tests before allowing it to pass.

@xmatthias
Copy link
Contributor

If you're basing your changes off of mine, that's already what cibuildwheel is doing. It'll build the wheels, copy them into a new env, and run the tests before allowing it to pass.

i am aware of that - but i don't fully trust that for libraries like ta-lib (Or pytables, for that matter).

the wheels would also work if ta-lib is installed system-wide - reusing the library from some place like /usr/lib or similar.
So if the build method doesn't install the C library in the environment, but in the system (someplace), they'd be available for the tests - while a "clean" installation on a fresh system would fail.

@mrjbq7
Copy link
Collaborator

mrjbq7 commented Sep 13, 2022

Do we want to bundle upstream or download from upstream directly, e.g., ta-lib.org?

Can we get a version of this that does everything but doesn't upload releases, and get that merged?

@TkTech
Copy link
Contributor Author

TkTech commented Sep 13, 2022

You need to bundle upstream - the reason I include a git subtree of my fork of ta-lib is for platform compatibility, and ta-lib.org will never see updates. The fork will build on a Pinebook Pro for example, whereas ta-lib.org will not.

@mrjbq7
Copy link
Collaborator

mrjbq7 commented Sep 13, 2022 via email

@TkTech
Copy link
Contributor Author

TkTech commented Sep 13, 2022

Want to collaborate in the ta-lib/ta-lib GitHub org? We can push those patches there, and maybe move this to ta-lib/ta-lib-python?On Sep 13, 2022, at 8:04 AM, Tyler Kennedy @.> wrote: You need to bundle upstream - the reason I include a git subtree of my fork of ta-lib is for platform compatibility, and ta-lib.org will never see updates. The fork will build on a Pinebook Pro for example, whereas ta-lib.org will not. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.>

Sure, I have no issue with this. I think we discussed this before and the reason I forked was that the repo seemed inactive.

the wheels would also work if ta-lib is installed system-wide - reusing the library from some place like /usr/lib or similar.
So if the build method doesn't install the C library in the environment, but in the system (someplace), they'd be available for the tests - while a "clean" installation on a fresh system would fail.

ta-lib should never be used from the system now, we explicitly bundle it. If there's a place where we're accidentally allowing a system ta-lib to get linked in we've got a bug. There are fixes for 32bit/64bit precision and platform support that the archaic system package manager versions will not have.

@xmatthias
Copy link
Contributor

xmatthias commented Sep 14, 2022

@TkTech What's the actual source for the "inlined", upstream ta-lib repository?

comparing it to https://github.com/TA-Lib/ta-lib (basically did a clone - and copied the contents of upstream over the clone) - i do see quite severe differences.
starting with the copyright being from 2007 instead of 2008 - which suggests it's missing at least commit 5a833ad57bbedb5fdd3c6fa62c7c25b36e9cf94e (a commit made mid 2008) - and potentially everything that went in on top of that.

i've now tried to use the upstream one (branch master) - the diff this caused can be found here:
xmatthias@8353ef7 (it's currently not building as i reverted all changes).
seems like the upstream version has more indicators - for example IMI - which is a file that was absent from the inlined source in this PR.

edit seems like there's actually been a 0.5.0 release of TA-Lib - and a 0.6.0 dev (which represents the emaster branch).
it's a decision point which version should be bundled ...

@TkTech
Copy link
Contributor Author

TkTech commented Oct 1, 2022

@xmatthias sorry, missed your reply. The upstream version I used is the official ta-lib release from https://ta-lib.org/. The version on github is an "unofficial" (but defacto official at this point) version by @mrjbq7 and friends (I believe)

@trufanov-nok
Copy link
Contributor

trufanov-nok commented Oct 1, 2022

Official https://ta-lib.org/ conatins a realease of 0.4.0 and official https://ta-lib.org/ 's project at sourceforge contains unreleased 0.6.0-dev. It's not about github or unofficial forks in any case.

Didn't read the whole thread - just leave here a note from my old findings from my old article about compatibility:

Deviations of TA-Lib 0.6.0-dev from TA-Lib 0.4.0
0.6.0-dev is more sensitive to the small values.
TA-lib is using TA_IS_ZERO() macro to compare floating point integers to zero. In 0.4.0 it assumes all integers which absolute value is smaller than 1e-8 to be zero. Since this commit the threshold was decreased to 1e-14. Dozen of technical indicators in TA-Lib are making decisions based on TA_IS_ZERO() result. Thus you may face with discrepancies in results on TA-Lib 0.4.0 and TA-Lib 0.6.0 or TA-Lib RT for such indicators. For example, if your input data contains long enough periods where values almost not changes.

Thet's a most important (may be only one - don't remember) difference in my opinion.

@xmatthias
Copy link
Contributor

i've (successfully) built wheels for all versions (0.5.0, 0.6.0-dev, 0.4.0 <- all from github, 0.4.0 from orig source)

i also agree with @trufanov-nok that the floating point fix should be included.
The project i maintain has a "install script" for ta-lib which applies this fix for linux installations to avoid problems.
It's been included in the unofficial windows wheels (linked on the readme), too - which can be tested with the new test-case (linked below)

a rough overview of this change can be found in this commit: xmatthias@6452708 (which also adds a testcase ensuring this fix is included).

I think which version should be used initially is a decision @mrjbq7 will have to take (please let us know your thoughts).

Personally, i'd recommend to go with a 0.4.0 version initially, with the float fix applied (don't care if it's the github one or the ta-lib.org one, really).
That way it'll be aligned to current behavior / install instructions as much as possible.

Future updates to 0.5.0 or 0.6.0 can then be made as future steps.

Copy link

@kk00karl kk00karl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2 改,其他过。 ta
-taller

@francipvb
Copy link

Hello all,

Another idea is to include the talib source as a git submodule. This way even source distributions can be built on the go by any user.

This approach would simplify the management of the package and the source versions can be pinned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants