Python 3 support #62

Open
tom-de-smedt opened this issue Dec 19, 2013 · 70 comments

Comments

@tom-de-smedt
Member

Pattern should start supporting Python 3. Looking at the amount of code, it is a non-trivial task and any help is much appreciated.

@pemistahl

Hi Tom,

I'm a graduate in computational linguistics and would like to contribute to Pattern. Can you be more explicit about how Pattern should support Python 3? That is, do you want to maintain two different branches in parallel, one for Python 2 and one for 3? Or do you want to have a single code base that works both with 2 and 3? In the latter case, a library such as six would be useful.

Let me know what you think.

Cheers,
Peter

@tom-de-smedt
Member Author

Hi Peter,

My goal would be to have a single code base that works with 2 and 3, but I have little experience with Python 3 so I don't know how feasible it is. In any case, the task is becoming more urgent so I will start looking into it more. I took a look at six which seems very useful. It's MIT-licensed so it could be included in Pattern.

Any help is appreciated! Let me know what you think.

Best,
Tom

@hayd

hayd commented Apr 16, 2014

👍 on a single codebase.

I think the first stage is to add Travis for testing (it looks like you're missing a requirements.txt file, so I'm unsure which deps are needed (?)). Travis will really help with conversion (and with ensuring it continues to work on multiple platforms).

  • Do you want to continue supporting python 2.5? (Travis no longer supports it, 2.5 usage is pretty low and not sure which deps still support it...)
  • Check which deps support python 3 (things in requirements.txt), fix them up if need be.
  • Once that's hooked up, you could run a modernizer/futurize (see http://python-future.org/automatic_conversion.html#automatic-conversion ) on your code (or use six) and see what happens...

Happy to help if you can pass a requirements.txt.

@waylonflinn

Got through the first two steps outlined by @hayd in this fork (repo has a requirements.txt and .travis.yml).

Some of the tests need to be excluded.

from test.py

# pattern.db tests require a valid username and password for MySQL.
# pattern.web tests require a working internet connection 
# and API license keys (see pattern.web.api.py) for Google and Yahoo API's.

Travis is just running python -m unittest discover -s test right now.

@waylonflinn

Ran futurize on the codebase. Here are some preliminary findings:

  1. Some of the bundled dependencies appear to have already been futurized (they contain from __future__ import and from future import statements). Now that we have pip and virtualenv does it make sense to unbundle these?
  2. Unicode is used extensively throughout the codebase. I used from __future__ import unicode_literals in several places (mostly for raw string literals), but this should probably be handled more carefully in the long term.
  3. Does it make sense to replace web.json.encoder with the standard library module? There was a section starting with the comment ## HACK: hand-optimized bytecode; turn globals into locals that I wasn't sure how to deal with and had to comment out.
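
On point 3, for comparison, here is a minimal sketch of what the standard library's json module does with Unicode text on Python 3. It only illustrates the stdlib behaviour; whether it can replace web.json.encoder depends on why that module was bundled, which this thread doesn't settle. The sample data is made up.

```python
import json

# Made-up sample data containing non-ASCII text.
record = {"word": "café", "tags": ["NN", "B-NP"]}

# ensure_ascii=False keeps non-ASCII characters readable
# instead of emitting \uXXXX escapes.
encoded = json.dumps(record, ensure_ascii=False, sort_keys=True)

# Round-trip back to Python objects.
decoded = json.loads(encoded)

print(encoded)
print(decoded == record)
```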

I'm a bit new to python, so any feedback is appreciated. This is a beautiful library and I'd love to see it get the unicode love from python 3.

@hayd

hayd commented Aug 28, 2014

My 2cents:

  • from __future__ and from future are perfectly fine.
  • there may be a reason for the byte hack (is it faster than std library?)
  • mysql tests are fine, you have to set up the db with travis (one line in .travis)
  • travis internet tests also fine, although they may fail sometimes... worry about that later (when/if it becomes a problem e.g. decorator to skip test if there is connection exception or something).
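
A rough sketch of the "decorator to skip test if there is a connection exception" idea, assuming Python 3 and plain unittest. The decorator name and the example test are hypothetical, not part of Pattern; the test body just simulates a network failure.

```python
import functools
import unittest


def skip_on_connection_error(test_func):
    """Turn network errors raised by a test into a skip (hypothetical helper)."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except (ConnectionError, TimeoutError) as exc:
            # unittest (and nose/pytest) report SkipTest as a skipped test.
            raise unittest.SkipTest("network unavailable: %s" % exc)
    return wrapper


class TestWebExample(unittest.TestCase):
    @skip_on_connection_error
    def test_download(self):
        # Stand-in for a real request; simulates being offline.
        raise ConnectionError("no route to host")
```

Real assertion failures still fail the test; only the listed exception types are converted into skips.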

Not sure what to do about API keys, was wondering what other modules e.g. pandas did for those parts... IIRC there may be keys you can use for testing of clipped results...

Perhaps it makes sense to create a PR for this and comment there, then you can comment on specific bits of code :) ... first pass tests then make pretty

@tom-de-smedt
Member Author

There's an "official" fork of Pattern with the specific aim of making it compatible with Python 3:
https://github.com/pattern3

The wiki has some more information:
https://github.com/pattern3/pattern/wiki

The compatibility update is supported by a grant from the Python Software Foundation. This money is to be divided among contributors. You can read the grant proposal here:
http://www.clips.ua.ac.be/media/Pattern-3-grant-proposal.pdf

The fork is initiated by myself, Waylon Flinn and David Branner. Everyone (Peter & hayd?) is welcome to join as admin of the project. As admin, you'll be able to edit anything so feel free to take initiative! (we do encourage pull requests, so we can keep track of who did what)

@hayd

hayd commented Oct 28, 2014

Happy to help with this, however when I tried (and trying again just now) running the tests I get a load of exceptions (python 2.7). I suspect this is just initial set up on my machine...

What do I need installed / setup to run the test suite (locally)?

Assuming fresh python install (or env) the following is failing:

git clone ...
cd pattern
python setup.py install  # this *ought* to install dependencies, but I don't think it does
nosetests  # this should sniff out and run all the tests, and does.

See the Travis run in the above fork: https://travis-ci.org/pinleague/pattern/builds/32799385 (this is the kind of thing that's failing, though that run is a couple of months old).

@hayd hayd mentioned this issue Oct 28, 2014
@tom-de-smedt
Member Author

Hi Andy,

My knowledge of Travis is zero, but different people including yourself have suggested it as a first step so I will examine it more closely. Looking at the output of the link you provided, these look like typical Python 2 vs 3 errors, e.g., using print stuff instead of print(stuff) and except Exception, e instead of except Exception as e. These are easy to fix, I previously used regular expressions to update them in the source code, but not yet in the unit tests. I'll look at updating the unit tests and push it to pattern3.

Best,
Tom
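
For readers wondering what a regex-based fix for `except Exception, e` could look like, here is a minimal sketch. It is an illustration only, not the expressions actually used on Pattern; the function name is made up, and it deliberately ignores the tuple form `except (A, B), e`, which tools like futurize or 2to3 handle more robustly.

```python
import re

# Matches "except SomeError, name:" at the start of a line (Python 2 syntax).
EXCEPT_COMMA = re.compile(
    r"^(\s*except\s+[\w.]+)\s*,\s*(\w+)\s*:", re.MULTILINE
)


def modernize_except(source):
    """Rewrite 'except X, e:' as 'except X as e:' (valid on 2.6+ and 3.x)."""
    return EXCEPT_COMMA.sub(r"\1 as \2:", source)


old = "try:\n    pass\nexcept Exception, e:\n    pass\n"
print(modernize_except(old))
```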

@hayd

hayd commented Oct 28, 2014

@tom-de-smedt Lots of stuff to migrate to python 3, but this can really only be done with confidence once tests pass (and at the moment I can't get them passing either locally or on travis on python 2.7!!!).

At the moment they (the python 2.7 tests) fail with errors from the bottom of this page: https://travis-ci.org/pinleague/pattern/jobs/32799386. Any ideas why?

@pemistahl

Hi Tom,

as I wrote at the beginning of this year, I'm still interested in contributing to pattern. However, I have not started yet because I didn't really know where to start. But now there exists a concrete plan and I would like to be part of it. I haven't written Python code for more than a year now but it should be easy for me to get into it again (I wrote a lot of Python code during my studies and I like the language very much). Last but not least, I have been out of the computational linguistics area since I started my current job a year ago, but it would be great to deal with that stuff again.

Some things are not yet clear to me:

  1. You wrote that the fork should be made compatible with Python 3.3 but 3.4 has been released already. Shouldn't it be compatible with 3.4 then?
  2. In the fork's wiki you wrote that the fork should be made compatible with both Python 2.7 and 3.3. But what's the point of creating a fork explicitly named pattern3 if it should still support Python 2.7? In my opinion, we can provide for a much cleaner and optimized code base if we completely drop 2.7 support. Then, the usage of libraries such as six would become obsolete. Of course, the downside of this approach is to maintain two separate code bases.

I cannot tell you yet which module I would prefer to work on. First, I need to take a look at the code again. I'm not sure though whether it's a good idea to have a lot of admins for the fork. Working with pull requests is much better anyway due to the reasons you mentioned.

@hayd

hayd commented Oct 28, 2014

This was partially my misunderstanding (!), just running nosetests ran the abstract test methods, which fail (at least that's part of it). cleaning these classes is probably a good thing to do anyways (they are in an "interesting" style... e.g. IMO the suite functions should go), I've cleaned up a little...

I had to capture a few actual test failures and some HTTP403Forbidden and HTTP404NotFounds. There are also a couple of proper errors (in python 2); for now I'm skipping those tests, but they really need looking at. I've labelled them FIXME in my branch (should I PR to pattern3 or here once passing?)...

As I said above, it's worth making sure that these tests pass reliably in python 2 before even attempting to migrate to python 3 (otherwise it's shooting in the dark). That said, I think the issues I've found (and labelled FIXME) are minor (or at least I'm hopeful that's the case if someone who understands the codebase can look at them!).

See hayd/pattern@c5d9c23...ce1fe81 (and on travis https://travis-ci.org/hayd/pattern/builds/39245044, unfortunately not quite passing python 2.6 and 2.7, I may have to skip/fix a couple more? Some tests seem flaky - especially those that compare e.g. to 0.771!).
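
If the flakiness comes from comparing floats for exact equality (an assumption; it could just as well be randomness inside the algorithm), a tolerance-based assertion is the usual fix. A minimal sketch with a made-up test class and a made-up score value:

```python
import unittest


class TestAccuracy(unittest.TestCase):
    def test_accuracy_close_to_reference(self):
        # Hypothetical score; a real test would compute it from a classifier.
        accuracy = 0.7712

        # Passes if the difference rounds to zero at 2 decimal places.
        self.assertAlmostEqual(accuracy, 0.771, places=2)

        # Alternative: state the allowed absolute difference explicitly.
        self.assertAlmostEqual(accuracy, 0.771, delta=0.005)
```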

Note1: This allows the test suite to be run by simply calling nosetests (or py.test).

Note2: I'm skipping the mysql tests atm, but that's no biggie to fix just an install in the yml (our objective is for no tests to be skipped on travis), the others are more important, but I'm afraid I need a patterns expert to look at the FIXMEs!!


Just to clarify the objectives here:

  • skip tests which fail (on python 2) and label them FIXME - this is mostly done
  • have travis running successfully (with some skips) on python 2
  • remove the skips (by fixing the bugs (?) for the FIXMEs)
  • once all tests are running on travis (and locally on tox), migration can begin safely

@hayd

hayd commented Oct 28, 2014

To answer @pemistahl: I don't think going fully py3 (and dropping support for py27) is a good option for a library... for the next decade! I would like to see a shared code base and drop support for python <= 2.5 (nearly every library is dropping python 2.5 support).

I'd really like to see pattern3 (once ready) merge upstream into pattern.

@pemistahl

@hayd OK, I get your point. I'm okay with that. It just reminds me again of how unhappy I am about the Python 3.* transition in general across the Python community.

Another question @tom-de-smedt : If working with pull requests is the preferred way for contribution, then why did you create the pattern3 fork? Anyone who wants to contribute would create their own fork anyway. Wouldn't it be sufficient to simply create a branch here in the main repo for this purpose?

@hayd

hayd commented Oct 29, 2014

I've submitted a couple of PRs to the pattern3 branch. I think it makes sense to fix that up and then merge back here (it's going to be easier to keep track of things if they are in separate repos, separate issues/PRs etc). I would strongly recommend downing tools for a short while (here on clips/pattern), hopefully only for a few weeks, and concentrating on the pattern3 branch/repo.

I'm "somewhat hopeful" it's not a massive job (famous last words). Once the python3 imports are working it should be clearer where the hit list is going to be (I suspect the toughest are the str/bytes handling).
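
To make the str/bytes point concrete, here is a small Python 3 sketch of the kind of boundary helpers such a port tends to need. The helper names are made up for illustration and are not taken from the pattern3 code; UTF-8 as the default encoding is also an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


# Bytes as they might arrive from a file or socket.
raw = "naïve café".encode("utf-8")
text = to_text(raw)

print(text)
print(to_bytes(text) == raw)
```

The idea is to decode once at the input boundary, work with text internally, and encode once on output.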

@hayd

hayd commented Oct 30, 2014

Just to update those following at home, last night I got python 3 running all tests without syntax or import errors (of course, half those tests are failing), python 2 is still passing all the tests (except those tests which failed before migration which are skipped).

pattern3#6

(It did require ripping out the bundled (vendorized) packages and making them dependencies - I think this is a good idea anyway... so, more "home-testing" in python 2 may be a good idea before this update is merged back into clips/pattern? esp. where there is poor coverage.)

This means there is a more obvious hitlist of things to do. For those who want to help I recommend (once this is merged), attempting to make all the tests pass on specific testing files you're interested in (e.g. for database):

$ nosetests test/test_db.py
$ nosetests test/test_db.py:TestClass
$ nosetests test/test_db.py:TestClass.test_method

$ nosetests test/test_db.py --pdb --pdb-fail  # drop in when there's a failure/exception

A more complete todo list issue: pattern3#5

I haven't really thought about how six fits here, IMO if it makes fixing a test easier then use it ?

@hnykda

hnykda commented Dec 9, 2014

Hello,

I'm looking forward to using Pattern with Python 3, because my work is written in it. I'm kind of confused about the current state of Python 3 support. This package is not installable (at least, not through pip - I'm getting Python 2 errors) and pattern3 doesn't contain all of the code base (at first sight).

By the way, Python 3 is getting more and more focus today and it's a very good idea to follow this trend. You embed a lot of packages, which is definitely not a good idea for the future (e.g. BeautifulSoup_v3.2.1 has not been supported for years).

@hayd

hayd commented Dec 9, 2014

@kotrfa pattern3/pattern isn't on pip yet (so not installable), the tests aren't passing for python 3 either so it's not ready for release yet - though quite a bit of work has been done. I think the plan is for this fork to become the pattern on pip (at least that's my understanding), and it'll support both python 2 and 3.

In pattern3/pattern I've ripped out a load of the vendorised deps (which is perhaps why it looks like the code base is so different), for example beautiful soup. The tests from clips/pattern are still all there and all pass (in python 2), so nothing was removed in this process (I claim).

If you'd like to help out, which would be fantastic, please clone pattern3/pattern and see if you can help with anything in the todo list (maybe pick a test file and get it passing in both python 2 and 3, perhaps the section you need in your work?). I have a few of the areas of the codebase passing already (in both python 2 and 3), IMO it's not a huge amount of work to go :) mostly fiddly unicode stuff, then we can get it out on pip...

@hnykda

hnykda commented Dec 10, 2014

Hello,

yeah - I was speaking about installing this fork, not Pattern3, which is, as you said, not available on pip.

I don't really need any part of Pattern currently - my work is almost done and I found Pattern too late, unfortunately. Nevertheless, maybe I could replace some parts of my current code using Pattern and simplify it. In that case, I would definitely like to help. But it doesn't seem likely I'll do it in the following weeks, since the end of the semester is coming.

You have done an amazing job by the way, thank you!

@hayd

hayd commented Dec 11, 2014

FYI all, I did a little over the last couple of days; now test_db and test_web are the only remaining test files failing on py3 (also test_examples, but that's IMO a special case). I don't think they should be too bad to fix... the main things are:

  • the web stuff has one infinite loop (when crawling), which I'm not sure how to debug (!)
  • the db stuff complains about already created tables (in the tests), hopefully this won't be too bad

Surprisingly these are py3 only failures (the py2 still passes)...

That said, there are some hacks - especially the unicode workflow - which could be cleaned up.

Edit: Too hasty in victory, I've nearly got vector working https://travis-ci.org/hayd/pattern/jobs/43751620

@hnykda

hnykda commented Dec 11, 2014

Thanks for the information! It is really promising. 👍

@hayd

hayd commented Dec 12, 2014

@tom-de-smedt actually the vector thing is a little weird: it looks like the vector test fails about 50% of the time on python 3, although it passes all the time on python 2 (from running the test 10 times on both). In a way it's good, since I think we've reached a place where expertise is needed! :) see pattern3#17

@Zearin

Zearin commented Jul 14, 2015

+1 for Python 3 support.

I realize the need to support a mature, powerful, and loyal community of legacy Python users, but Python 3 is only going to get more relevant with time, not less.

More importantly, Python 3 is just better. Its standard library organization is much cleaner, its syntax is more readable, and in many common cases it performs significantly better than Python 2 (speed and/or memory footprint).

That said, it’s often trickier to port to Python 3 than it “feels” like it should be. For a while, six has helped make this a little easier, but it only went so far.

To make the transition as painless as possible, I strongly recommend the Python-Future package. It is way more powerful than six; it has tools focused on automating as much of the transition as possible; and it has truly excellent documentation.

I believe it was mentioned earlier in this thread, but I just wanted to reiterate its awesomeness for anyone that might have missed it. Seriously—just browsing its documentation can evoke the inspiration to transition to a 2-3 compatible codebase.


I haven’t used Pattern yet, but it also has excellent documentation (great job!). Unfortunately, my current research is in Python 3. That’s how I found my way to this page. I hope Pattern gets to Python 3 soon!

Keep up the excellent work, and May The Source™ Be With You!

@hayd

hayd commented Jul 14, 2015

@Zearin I used future to do the majority of the heavy lifting in the python 3 port, see the pattern3 repo. Please do try it out.

@MarcosGinel

How could you define the "state" of the project for porting Pattern into Python 3?

I used it two years ago with Python 2.7 and it was awesome; now I'm going to work with Python 3 and I would love to use it (Pattern) again!

Thanks!

@legel

legel commented Aug 11, 2016

Greetings, we came across this from here, and I just noticed that while a lot of the build looks stable, support for Python 3.3 seems not to be working? At least that is how I would interpret the Travis CI page. Thanks.

@markus-beuckelmann
Collaborator

I realize I'm a bit behind on keeping people following this issue up to date with the latest progress! Google Summer of Code has been over for a couple of weeks now, and it has brought substantial progress (see full list of commits). We are now in a position where we have a version on the development branch that supports all modules except for pattern.server on both Python 2.7 and Python 3.5+. For people who want to find out more about the specifics and intermediate steps, go ahead and read my detailed GSoC reports on the Newsaudit blog (#1, #2, #3).

So now the plan is to smooth out the rough edges and release a new major version Pattern 3.0 within the next months. There is really only one known bug at the moment that is solely related to Python 3 and it only affects the information gain tree classifier IGTree in pattern.vector. Then there are a couple of issues like deprecated web APIs in pattern.web that should be addressed before the next release.

In the meantime, everybody feel free to check out the development branch and report any issues that may come along!

@transfluxus

Not sure how long this is going to take, but as soon as the page is up again maybe you can add some info stating that the development branch works with py3, what the restrictions are (what does not work yet) and how to install it, e.g.:
Go into your python3 environment!

git clone https://github.com/clips/pattern
cd pattern
git fetch
git checkout development
python setup.py install

might make a lot of people happy

@markus-beuckelmann
Collaborator

Sure, I will update the README.md on the master branch with some more information in the next days.

@jpfairbanks

@markus-beuckelmann Is there any update on this? Is the current advice to build the development branch if we need python 3 support?

@markus-beuckelmann
Collaborator

@jpfairbanks, yes, if you need Python 3 support right now you can check out the development branch: git clone -b development https://github.com/clips/pattern

@derNarr

derNarr commented Jan 4, 2018

On debian 9 mysql_init was missing, while trying to install the development branch of pattern under python3. After I installed the mariadb drop-in replacement for mysql_init with sudo apt-get install libmariadbclient-dev everything I tried worked as expected with the pattern module.

@masaguaro

I have installed pattern3 with pip in a conda virtual environment. I am working in Windows 8.1 64-bit. When I try to execute
from pattern3.en import tag
I get the following error:

Traceback (most recent call last):
  File "test_pattern.py", line 5, in <module>
    from pattern3.en import tag
  File "C:\Users\Rodolfo\Anaconda3\envs\flasky\lib\site-packages\pattern3\text\en\__init__.py", line 22, in <module>
    from pattern3.text import (
  File "C:\Users\Rodolfo\Anaconda3\envs\flasky\lib\site-packages\pattern3\text\__init__.py", line 28, in <module>
    from pattern3.text.tree import Tree, Text, Sentence, Slice, Chunk, PNPChunk, Chink, Word, table
  File "C:\Users\Rodolfo\Anaconda3\envs\flasky\lib\site-packages\pattern3\text\tree.py", line 37
    except:
         ^
IndentationError: expected an indented block

Any idea or help will be appreciated :)

@JanmajaySingh

JanmajaySingh commented Mar 6, 2018

@masaguaro I have the same issue while trying to use gensim lemmatization. Maybe a recent push gone wrong by a mix of tabs and spaces?

I'll open a new issue about this.

Update: the last commit to pattern3.text.tree.py seems to be 3 years ago.
Issue #217

@masaguaro

Thank you @JanmajaySingh. I posted the same question on Stack Overflow but got no answer. It seems that there is not much Python 3 support. Perhaps you are right, and it's just a mix of tabs and spaces, which shouldn't be difficult to fix (using Sublime, for example). Right now, I am doing some work with NLPTK, but I will keep your idea for future use.

@markus-beuckelmann
Collaborator

@masaguaro @JanmajaySingh (#217), you are using the deprecated pattern3 repository which contains a completely different code base that is not maintained anymore. There is a development branch here on clips/pattern with Python 3 support. You can clone it, git clone -b development https://github.com/clips/pattern and install with pip or conda. Let us know if there are any issues with the development branch...

@JanmajaySingh

@markus-beuckelmann Thanks! Issue #217 was closed.

@masaguaro

Hello @markus-beuckelmann @JanmajaySingh
I am working in Windows 8.1 64-bit, in a conda virtual environment.
I was trying to run pattern/examples/01-web/04-twitter.py and I had the following error:

Traceback (most recent call last):
  File "C:\Users\Desktop\pattern\examples\01-web\04-twitter.py", line 12, in <module>
    from pattern.db import Datasheet, pprint, pd
  File "C:\Users\Desktop\pattern\examples\01-web\..\..\pattern\db\__init__.py", line 1879, in <module>
    csvlib.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long
[Finished in 1.5s]

Any idea ? Thank you in advance.
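
A workaround often used for this error, as a sketch: on 64-bit Windows a C long is only 32 bits wide, so sys.maxsize does not fit and csv.field_size_limit raises OverflowError. The helper below (the name is made up, it is not part of Pattern) keeps halving the limit until the csv module accepts it. Whether this is how the development branch should fix it is an open question.

```python
import csv
import sys


def set_largest_csv_field_limit():
    """Set csv's field size limit to the largest value the platform accepts."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # Too big for a C long on this platform; try a smaller value.
            limit //= 2


print(set_largest_csv_field_limit())
```

On Linux and macOS the first attempt normally succeeds, so the behaviour there is unchanged.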

@ash-williams

Hey,

Is there any time frame on an official release of the python3 support? Or an idea of how close it is to being ready?

Thanks,
Ash

@tuxayo

tuxayo commented May 23, 2018

If enough people are interested, especially if it helps them in their work, maybe we can consider putting a bounty on this task? It would be nice if such work were paid :)
https://www.bountysource.com/issues/1685084-python-3-support

@JanmajaySingh

@zedrem @tuxayo Considering that the development branch was last updated 9 months ago, I guess the primary contributors have been busy.

The dev branch in its current form works without issues (at least for me). You can refer to @markus-beuckelmann 's comment (March 6).

@JanmajaySingh

@masaguaro I dunno if you could find a workaround to your issue, but I guess modifying the C source code to something like long long int might help. But it may break other modules in unexpected ways. I don't know any other details about your project though, you're better off asking on S/O.

@ash-williams

Yea thanks, I've read the thread and understand that you can use the tool from the development branch. For what I needed, I was happy to even use the pattern3 side-project that was set up initially and then discontinued.

However, I really like Pattern's article extraction tool and want to incorporate it into another tool that I'm building. As far as I'm aware (?) there is no way to do that with Pattern in its current condition. I'm guessing that you can't specify specific git branches in your requirements.txt, for example?

If anyone is aware of any similar article extraction tools, please let me know (but I'm conscious that it is off topic for this thread).

@septian-putra

septian-putra commented May 28, 2018

@zedrem For me, I can install it by running this

sudo apt-get install libmysqlclient-dev
git clone -b development https://github.com/clips/pattern
cd pattern/
sudo python3 setup.py install

@fabianhoward

@zedrem You can certainly specify commits in requirements.txt such as git+https://github.com/clips/pattern@ec95f97b2e34c2232e7c43ef1e34e3f0dea6654b

As @septiangilang says, on Ubuntu you will need libmysqlclient-dev as a requirement.

@tom-de-smedt
Member Author

A lot of work was done by @markus-beuckelmann during last year's GSoC. During this year's GSoC, @Xsardas1000 (Maksim Filim) is doing great work (Markus & I are mentoring). Check Max's progress here: https://github.com/clips/pattern/tree/devmodified

We should be able to get out an "overall stable" official release by the end of the month, if everything goes well.

If you notice things that don't work yet, please report them here. Better yet, if you want to help out, please let us know, we can give you some editing privileges and author credits to move things forward more quickly.

As a side note, the documentation needs to move to a new location too (e.g., www.pattern3.net). Let us know if you'd want to contribute some web development skills to this end.

Thanks for your patience, we're nearing a stable release of Pattern 3.

@tales-aparecida

Hi, I'm an undergrad student at Unicamp (Brazil) and I'm interested in helping with this repo. I thought about starting with code coverage and found that there's some duplicated code in server/__init__.py, db/__init__.py and others. Is this intentional?
