Tuesday, May 4, 2010

Refcounting and weirdness (in Python and Go)

Reading Cliff Click's blog is always insightful. Thanks to his blog, I realized that a refcounted language would have a weird enough memory model, if it allowed real multithreading. Better yet: CPython folks still think there may be some way to remove the GIL and keep refcounting with its "nice properties", but here, we discover why they would get fundamentally braindead semantics.

Hey, this discussion could even be relevant about Google's new Go language, since they plan to use a (somehow smart) reference counting, as shown in their roadmap. The below discussion however still refers to CPython, since I never studied the details of deferred reference counting schemes. And it seems that Recycler (the algorithm they want to implement) does not suffer of this problem, luckily. Still, it is probably still interesting to see the interplay of refcounting with other features. And this probably affects the reference counting scheme used in COM.

Let's suppose a version of CPython using refcounting for garbage collection, but without a Global Interpreter Lock, is produced somehow. I assume that a decently performing implementation would use atomic counters for refcounts, rather than protecting those counts with a lock. Let's call it "GIL-less refcounted CPython".
Now, clearing a reference in GIL-less refcounted CPython (i.e. setting it to null) would not be atomic. In fact, suppose T1 does this, in a situation where another thread T2 can access the pointer. T2 can read the pointer just before you clear it, and increment the refcount of the pointed-to object O1 just after T1 deleted O1 and its memory was reused for another object O2 (so no, you don't see a zero refcount, you just corrupt something unrelated). Possibly, those other threads may even try to modify O1 and end up modifying O2.

Quoting from its post:

"Why is ref-counting hard to use in managing concurrent* structures?"

The usual mistake is to put the count in the structure being managed. The failing scenario is where one thread gets a pointer to the ref-counted structure at about the same time as the last owner is lowering the count and thus preparing to delete. Timeline: T1 gets a ptr to the ref-counted structure. T2 has the last read-lock on the structure and is done with it. T2 lowers the count to zero (T1 still holds a ptr to the structure but is stalled in the OS). T2 frees the structure (T1s pointer is now stale). Some other thread calls malloc and gets the just-freed memory holding the structure. T1 wakes up and increments where the ref-count used to be (but is now in some other threads memory).

Since in CPython refcounts are embedded in objects (and it can't be done in other ways), this problem applies.
*I think he really means, in the title, lockless structures, otherwise you can just use a lock to clear the pointer and free the object atomically wrt. readers. And that's the way you'd solve this in CPython: protect any pointer reachable by other threads with a lock.

While the above reasoning just says "hey, if you keep refcounting, removing the GIL will give weird semantics", there is more to notice. Namely, the Python program would probably segfault, or experience undefined behavior. And that's not supposed to happen in a managed language, no matter how buggy is your script, unlike in C. I.e. Python is supposed to be a safe language.

Contrast this with the Java Memory Model. The JMM guarantees that reference updates are atomic even on 64bit platforms, so clearing a pointer is always atomic. Additionally, concurrency mistakes never cause totally undefined behavior.
This is inherited even by the Jython memory model, and in absence of documentation, Jython users know they can rely on this both on Jython and on CPython. Our GIL-less refcounted CPython would break this.

The same scenario could also happens when making the reference point to a different object, but doing this without a lock is not always safe. For instance, to make sure that "this.o = new C();" is safe, the field 'o' must be volatile, or the access must be locked, to ensure that the writes initializing the new object are visible before the write to the field 'o'.