Connection object behavior within context managers #1294
Replies: 18 comments 2 replies
-
I was actually opposed to adding the context manager at all because it just confuses things. If I understand correctly, the functionality you are looking for can be accomplished without it like this:

    def somefunction():
        cnxn = pyodbc.connect('mydsn')
        cursor = cnxn.cursor()
        cursor.execute("delete something")
        cnxn.commit()

If everything is successful, the code reaches the bottom and commits. If an exception occurs, the commit is skipped.

CPython uses a reference-counting scheme plus a cycle-detecting garbage collector, so objects that go out of scope and aren't referenced elsewhere are deleted immediately, unlike garbage collection in Java, JavaScript, or Lisp. There is no way this is going to change any time soon, and an enormous amount of code takes advantage of it. (It is also what keeps Python programs from using so much memory, though it is slower than standard garbage collection.) I was careful to design the objects so cursors refer to connections but not vice versa, so there are no cycles.

With autocommit turned on, each statement is committed as it occurs, so there is no reason to use a context manager.

Fun fact: the cursor has a reference to the connection and a commit method specifically so you can use a cursor without needing to manage the connection object as well:

    def somefunction():
        cursor = pyodbc.connect('mydsn').cursor()
        cursor.execute("delete something")
        cursor.commit()
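The pattern above can be exercised end-to-end with the standard-library sqlite3 driver standing in for pyodbc (the items table and the db_path parameter are made-up names for illustration):

```python
import sqlite3

def somefunction(db_path):
    # Same shape as the pyodbc example above: the commit only happens
    # if execution reaches the bottom without an exception.
    cnxn = sqlite3.connect(db_path)
    cursor = cnxn.cursor()
    cursor.execute("DELETE FROM items")  # 'items' is a hypothetical table
    cnxn.commit()
```

If execute() raises, the commit is never reached, and the connection is cleaned up when it goes out of scope.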
-
I certainly agree that context managers can cause confusion because the functionality is deliberately hidden away behind the syntactic sugar. Hence, the onus should be on making the context manager functionality as simple and intuitive as possible. Right now that doesn't appear to be the case, especially to me. From what I can make out, the current functionality is simply a way of getting around setting the autocommit flag to True. In effect, use a context manager and all your SQL statements are going to be committed, regardless of whether you set the autocommit flag or not. Although this is convenient for programmers in many cases, it is not at all obvious. As I mentioned in my original comment, who would intuitively think that:

    with pyodbc.connect('mydsn') as cnxn:
        do_stuff

actually means:

    cnxn = pyodbc.connect('mydsn')
    do_stuff
    if not cnxn.autocommit:
        cnxn.commit()

Not me. Even when it's spelled out, it seems odd, and very different from the way context managers work with files, etc. At the very least, it might be worth putting this in the docs.
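The surprising expansion described above can be mimicked with a toy class; this is an illustration of the semantics as described in this thread, not pyodbc's actual implementation:

```python
class CommitOnExit:
    """Toy object with the commit-on-exit, no-close semantics
    described above (illustration only, not pyodbc's real code)."""
    def __init__(self, autocommit=False):
        self.autocommit = autocommit
        self.committed = False
        self.closed = False

    def commit(self):
        self.committed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Commit on a clean exit when autocommit is off; an exception
        # skips the commit, and the object is never closed here.
        if exc_type is None and not self.autocommit:
            self.commit()
        return False
```

Leaving such a `with` block leaves the object committed but still open, which is exactly the behavior being questioned.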
-
I think the behavior needs to stay as it is. As you pointed out, it would be consistent with other database libraries. It would also be consistent with early copies of PEP 343. Note the example there does exactly what everyone is doing.

I want to reiterate that I feel context managers are completely unnecessary here. I use pyodbc in multiple 24x7 servers in the financial industry and have never used a context manager with it, or with anything else like files. I gave examples of how you could write code without a context manager which is much clearer. (And that is what I meant about people getting confused - not the action of the context manager itself, but the fact that they are completely unnecessary in CPython today.) For example, this code from your post would never be written that way:

    with file('/tmp/myfile', 'w') as f:
        f.write('hello\n')

You would just do this:

    open('/tmp/myfile', 'w').write('hello\n')

You'll see code like this all the time. (I know your example was shortened and there are usually more lines written, but stick with me.) Objects in Python are deleted automatically and they all "do the right thing" when they are deleted. Files close. Sockets close. Database connections roll back and close. In a longer function with more lines, it would look like this:

    def f():
        fd = open('/tmp/myfile', 'w')
        fd.write('1\n')
        fd.write('2\n')

The file will be closed as soon as the function is exited, either normally at the bottom or if an exception occurs.

The only time a context manager could be useful is in a loop where you catch exceptions. In that case it can eliminate an explicit cleanup step:

    def f():
        while x > y:
            try:
                data = wait_for_data()
                with open('filename', 'w') as fd:
                    fd.write(data + '\n')
            except:
                logger.error('Error occurred', exc_info=True)

In this case, the file is guaranteed to be closed even when an exception is caught and the loop continues.

In my servers, connections are primarily used in two patterns. In "task loops", a connection is created and used by each task in the queue. When an error occurs, the error is logged and the connection is closed as part of the cleanup. A new one is allocated and the process continues. This type of code is too complex for a single with-block. The second type is a function that allocates a connection, uses it, then commits, just like the examples from my previous post. These are very clean and easy. Connections and cursors are closed automatically:

    def somefunction():
        cursor = pyodbc.connect('mydsn').cursor()
        cursor.execute("delete something")
        cursor.commit()

I recommend giving this a try in a couple of places and see if it works for you. It is one of the things I greatly prefer about Python over JavaScript and Java. Both of those languages could use context managers. (C++ and C# support object types on the stack that run destructors, so a simple "wrapper" object would be just as easy.)
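The scope-based cleanup being relied on here can be observed directly with a stand-in object. As the comment notes, this is CPython-specific behavior; TrackedFile is a made-up class for demonstration:

```python
class TrackedFile:
    """A stand-in for a file object that records when it is closed,
    to show CPython's reference-counting cleanup firing as soon as
    the function returns."""
    closed_log = []

    def __init__(self, name):
        self.name = name
        self.lines = []

    def write(self, s):
        self.lines.append(s)

    def close(self):
        TrackedFile.closed_log.append(self.name)

    def __del__(self):
        self.close()

def f():
    fd = TrackedFile("myfile")
    fd.write("1\n")
    fd.write("2\n")
    # no explicit close: the refcount hits zero on return

f()
assert TrackedFile.closed_log == ["myfile"]  # closed as soon as f() returned
```

On other Python implementations (PyPy, for instance) the `__del__` call may be deferred, which is the caveat raised later in this thread.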
-
I think I'm finally starting to understand what you mean about not needing context managers. As a database guy dealing with terabytes of data, and single database transactions that involve updating multiple tables with multiple SQL statements, which in turn add gigabytes to the database transaction log, database transactions are kinda important to me. The data integrity of my database depends on it. Yet database transactions are not explicitly part of PEP 249. Sure, they are implicitly there, with pyodbc.connect() and cnxn.commit()/close(), but there is no explicit PEP 249 equivalent of:

    BEGIN TRANSACTION
    UPDATE TABLE T1 SET ...
    UPDATE TABLE T2 SET ...
    UPDATE TABLE T3 SET ...
    END TRANSACTION

My assumption was that context managers would help manage this kind of behavior. Apparently this isn't the case. I was also concerned that cursors would maintain locks on tables or rows unless they were explicitly closed, and I didn't fully comprehend that cursors would definitely be closed (and the locks released) when they went out of scope. Again, locking of table rows is a big deal for me. Thank you for taking the time to give comprehensive answers to my questions, Michael. It's definitely appreciated.
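For what it's worth, the BEGIN...END bracket above maps onto the DB-API roughly like this, shown with the standard-library sqlite3 driver (table names are hypothetical): the transaction is opened implicitly by the first statement, and commit()/rollback() supply the closing boundary.

```python
import sqlite3

cnxn = sqlite3.connect(":memory:")
for name in ("t1", "t2", "t3"):
    cnxn.execute(f"CREATE TABLE {name} (v INTEGER)")
cnxn.commit()

# The transaction is opened implicitly by the first statement...
cnxn.execute("INSERT INTO t1 VALUES (1)")
cnxn.execute("INSERT INTO t2 VALUES (2)")
cnxn.execute("INSERT INTO t3 VALUES (3)")
cnxn.commit()        # ...and commit() is the only explicit boundary

# Work not yet committed can be undone with rollback():
cnxn.execute("INSERT INTO t1 VALUES (99)")
cnxn.rollback()
```

There is no statement that plays the role of BEGIN TRANSACTION; the commit/rollback calls alone delimit the unit of work.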
-
Hi, I know this is an old thread. Just wanted to say that the detailed answer was very helpful for me. I was worried that the connection might get closed on exiting the context. For implementing my connection pool, I wrote a small wrapper. Now, I do want/need all the things that the pyodbc connection's cleanup does. So, basically, I count on the connection being tidied up when it is eventually deleted. Thanks,
-
I would rather see connections/cursors closed by the context manager and commits invoked explicitly than connections/cursors not closed and commits made implicitly. The main expectation of context managers is resource management, and the current design fails to manage the connection/cursor. I've read that one might see it as a transaction context, not a connection/cursor one. But that doesn't make sense. One can look at a cursor as a transaction, but not at a connection. It also doesn't make sense to commit a transaction on error. If there is an issue with backward compatibility, then a separate method could be used for that matter. Like sconnect. And scursor. Meaning safe or secure.
-
I'd also like to weigh in on the unexpected behavior of pyodbc's context manager not closing connections.

    import pandas as pd, pyodbc
    with pyodbc.connect(connection_string) as con:
        test_df = pd.read_sql_query(query, con)
    con.closed  # False. pyodbc connection is not closed when leaving the context.

@mkleehammer I understand why you were opposed to adding a context manager in the first place, and I agree that adding this context manager certainly adds confusion. You mentioned that in your typical work, you just depend on the connection going out of scope and then rely on the invocation of the connection's destructor method to close the connection. I want to add that, if the connection object is created as part of a long-running job, then the connection object will not go out of scope and the connection will remain open. An example of this would be a connection created within a Jupyter notebook. Notebooks are very popular among data scientists, who also use SQL a lot.

Example: a data scientist might assume he is safe to use this in his notebook to retrieve data from his SQL database:

    import pandas as pd, pyodbc
    with pyodbc.connect(connection_string) as con:
        test_df = pd.read_sql_query(query, con)
    # (the data scientist continues working with the test_df results and forgets `con`,
    #  assuming the context manager will naturally close it, just like context managers typically do).
    # (the notebook keeps running and the connection is kept open until the server closes it. 😯)
    # (the database administrators get super angry/annoyed with our data scientist because
    #  he makes a lot of long-lived connections and never closes them. 😡)

Honestly, the expected behavior for a context manager is that it will take care of closing connections and handles, leaving the system in a "safe" state when the context is closed. Otherwise, what is the point of a context manager?

@mkleehammer would you ever be open to adding another context manager to pyodbc that actually closes the connection when leaving the context? I know it is fairly trivial for each user to implement, but it adds one more custom function or class that us data science users have to import (or copy/paste) into our notebooks whenever we want to use a SQL connection. And again, many users of pyodbc might not be aware that context-managed connections are not actually being closed when leaving the context.

Example of a user-implemented context manager:

    from contextlib import contextmanager

    @contextmanager
    def connect(connection_string):
        con = pyodbc.connect(connection_string)
        try:
            yield con
        finally:
            con.close()

    # Example usage:
    with connect(connection_string) as con:
        test_df = pd.read_sql_query(cmsobjectmodel_query, con)
    con.closed  # True. Connection is closed when leaving the context.

For the record, SQLAlchemy does close context-managed connections when leaving the context:

    import sqlalchemy, pandas as pd
    engine = sqlalchemy.create_engine(connection_string)
    with engine.connect() as con:
        test_df = pd.read_sql_query(query, con)
    con.closed  # True. SQLAlchemy closes context-managed connections when leaving the context.

However, adding SQLAlchemy to a project's dependencies just to get a "sane" connection context manager seems a bit bloated. With all that said, I want to just say a quick thank you for your work on pyodbc.
-
I ran into almost this exact issue, with my explicit
-
I'll add my voice here, after dealing with the same issue this week. I was surprised and misled by this behavior, as were multiple other veteran engineers on my team. The revelation by @vl85 about immediate GC finalization not being reliable appears to unrecoverably dismantle the justification provided up to that point for rejecting this suggestion. I expect context managers to work like RAII. The hazard of reasonable assumptions creating a production incident due to misleading behavior is a far greater concern than whatever benefit is perceived to accrue from the "convenience" of using it to autocommit without autocommit. Too much sugar rots your teeth. I find design precedent from other Python ODBC managers unpersuasive, and would apply the same objection there.
-
I am open to either adding a new context manager or even changing the behavior of the old one, if that is possible without breaking everything. I will say, this is the first time I've seen a very good example of when one could be useful - Jupyter Notebooks. In the past, the only way a connection could stay open forever is if it is a global variable or you have a function you literally never leave. In those cases, I don't think it should surprise someone if it is still connected. In a sense, Jupyter notebooks are nothing but a script where everything is a global variable. Therefore decisions that make sense in a larger program might not make sense there.

That said, I would not characterize GC finalization as being unreliable. I've purposely ensured the internals don't create cycles the GC system can see for this reason. Unless you create a chain by adding attributes to a connection that somehow point back to the same connection, it should be safe if you are sticking with CPython.

Now, what options are there? Anything else? Any votes?
-
The basic question seems to be whether the context manager on the Connection class should manage the connection (i.e. close it on exit) or manage a database transaction (i.e. commit on exit). Some people think the former, some the latter. Full disclosure: I'm very much in the former camp, which I think is more Pythonic. The current implementation of the context manager does the latter.

I agree it's a bit of a pain to add a custom context manager in code specifically to manage a pyodbc connection, but I just wanted to remind folks that there is a built-in context manager utility, contextlib.closing, that essentially does the job of closing the connection on exit. For example:

    from contextlib import closing
    import pyodbc

    with closing(pyodbc.connect('mydsn', autocommit=False)) as cnxn:
        crsr = cnxn.cursor()
        crsr.execute("UPDATE T1 SET ...")
        cnxn.commit()
    cnxn.closed  # True
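That closing() really does call close() even when the body raises can be seen with a minimal stand-in object (FakeConnection is a made-up class for demonstration):

```python
from contextlib import closing

class FakeConnection:
    """Minimal stand-in with a close() method, to show that
    contextlib.closing() calls close() even if the body raises."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

fake = FakeConnection()
try:
    with closing(fake):
        raise RuntimeError("query failed")
except RuntimeError:
    pass
assert fake.closed  # closed despite the exception
```

Note that closing() only closes; any commit still has to be issued explicitly inside the block, which matches the former-camp view above.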
-
Well it is called
-
Just a thought, but one possibility is to add a context manager to pyodbc specifically to encapsulate a database transaction. Something equivalent to:

    from contextlib import contextmanager

    @contextmanager
    def transaction(*args, **kwargs) -> Cursor:
        if kwargs.get('autocommit', False) is not False:
            raise ValueError('Cannot set autocommit in the transaction context manager')
        cnxn = connect(*args, **kwargs)
        crsr = cnxn.cursor()
        try:
            yield crsr
            cnxn.commit()
        finally:
            crsr.close()
            cnxn.close()  # includes a rollback under the hood

The parameters for this context manager would be the same as for the connect function:

    with pyodbc.transaction('mydsn') as crsr:
        crsr.execute("UPDATE T1 SET ...")
        crsr.execute("UPDATE T2 SET ...")

This would at least make it clear the context is a database transaction.
-
I've moved this from an issue to a discussion so it doesn't get lost.
-
I like the pyodbc.transaction (or txn? I'm pretty lazy at times) for those that want a commit context manager. In addition to the connection args, I would think it should work with an existing connection or cursor. That would be very flexible and work with most program designs.
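A variant of the proposed helper that reuses an existing DB-API connection might look something like this (a sketch only; this transaction function is hypothetical and is not part of pyodbc):

```python
from contextlib import contextmanager

@contextmanager
def transaction(cnxn):
    # Hypothetical variant of the proposed pyodbc.transaction() that
    # reuses an existing DB-API connection instead of opening one.
    crsr = cnxn.cursor()
    try:
        yield crsr
        cnxn.commit()
    except Exception:
        cnxn.rollback()
        raise
    finally:
        crsr.close()
```

Because the connection stays open for reuse, a failed block has to roll back explicitly rather than relying on close() to do it.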
-
I also agree with Keith's suggestion that it would be more Pythonic to have the connection close on exit. The second half would be to design a way to transition the behavior from commit to close. I don't know how many people even use the connection as a context manager. It may be a very small number. What about something like this:
-
I have a suspicion there is an element of misunderstanding about database transactions at the nexus between Python, SQL, and ODBC. Database developers are used to the idea of a database transaction as something you explicitly open, then you run some SQL queries, then you end the transaction with a commit. For example, in Transact-SQL:

    BEGIN TRANSACTION
    UPDATE my_table SET ... ;
    DELETE FROM my_other_table WHERE ... ;
    INSERT INTO my_third_table VALUES ... ;
    COMMIT TRANSACTION;

The problem is that neither PEP-249 nor the ODBC 3.0 API has any explicit operation to open a transaction. They both have operations to commit a transaction (PEP-249, ODBC), but they don't have anything equivalent to that crucial "BEGIN TRANSACTION" SQL statement. This is where I think a lot of confusion comes from. I know I certainly took a while to get my head around that, and it prompted me to write a pyodbc wiki article on the subject.

The issue is that, as far as PEP-249 and ODBC are concerned, transactions are never explicitly opened. Instead, a transaction is implicitly opened when a connection is made to the database (or after a commit/rollback). In other words, database transactions are effectively controlled via database connections (!). This is unintuitive to some database developers, including me, hence I can understand vl85's comment earlier in this discussion: "One can look at cursor as a transaction, but not at connection." Incorrect but understandable. When I started working with pyodbc 9 years ago, I too struggled with this at first.

But this is the reality. If you want to start a brand new database transaction isolated from any other previous SQL statements (i.e. "BEGIN TRANSACTION"), the cleanest way to do this is by starting a brand new connection to the database. Getting back to my transaction suggestion above - anyway, just my thoughts.
-
That's a good point @keitherskine, but I feel a lot of people would think it a bit too cumbersome to be forced to use a new connection. When using pyodbc on Windows with MSSQL in some large apps, I would use the ODBC connection pooling, which was fast enough. However, not every ODBC environment has great pooling. I think most functions will either receive a connection and have to trust it is clean, or they'll already be making their own connection and the point is moot. Fortunately, I think we can have the best of both worlds if we don't force a new connection: once we get to the point that the connection commits in the transaction context manager, those that want to reuse a connection can pass in an existing one. That would be very flexible and work with most program designs.
-
Python context managers for connection objects are typically used like this:

    with pyodbc.connect('mydsn') as cnxn:
        do_stuff

Right now in pyodbc 3.0.10, that seems to be the equivalent of:

    cnxn = pyodbc.connect('mydsn')
    do_stuff
    if not cnxn.autocommit:
        cnxn.commit()

Could we re-visit this functionality? I do have some reservations about that behavior, especially when autocommit is off. To elaborate on those points:

1. From my perspective, the point of a context manager is to make sure the context object is cleaned up when the context exits (typically by closing the object). I'm surprised pyodbc leaves the connection open, to be closed only when it is deleted.

2. Following on from 1), my understanding is that contexts should be essentially self-contained, and they should tidy themselves up during the exit process. In this case, for me that includes rolling back transactions if any exception has occurred. Instead, pyodbc leaves that up to surrounding code. Bear in mind, explicit rolling back would not be needed if the connection was closed; the database would do this anyway.

3. Looking at other Python ODBC managers, it appears I am going against the tide on this, but I still feel strongly that the context manager should never issue an implicit commit when autocommit is off:

   http://cx-oracle.readthedocs.org/en/latest/connection.html#Connection.__exit__
   https://docs.python.org/3.4/library/sqlite3.html#using-the-connection-as-a-context-manager
   http://initd.org/psycopg/docs/connection.html#connection.commit
   https://pythonhosted.org/oursql/api.html#cursor-context-manager
   https://dev.mysql.com/doc/connector-python/en/index.html
   http://sourceforge.net/p/adodbapi/code/ci/default/tree/adodbapi.py

My preference would be that the pyodbc connection context manager would be just like the file context manager, doing nothing more than the equivalent of this:
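The preferred, file-like behavior can be sketched as a toy class (an illustration of what a close-on-exit __exit__ would do, not pyodbc's actual API):

```python
class CloseOnExit:
    """Illustration of file-like context-manager semantics: __exit__
    simply closes the object, committing nothing."""
    def __init__(self):
        self.closed = False

    def close(self):
        # a real connection's close() would also roll back any
        # uncommitted work, as noted above
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress exceptions
```

With these semantics the object is closed whether the block succeeds or raises, exactly as a file would be.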
I realize this would be a change in behavior, but right now, the connection context manager has behavior that makes it essentially unusable for me when I want to use a connection with autocommit off.
Finally, pretty much all of what I have said here is equally applicable to pyodbc cursor objects too.