(or: Re-entrant mutexes considered harmful)
A common method of protecting data from race conditions in an object-oriented system is to use a simple mutex. The pattern goes something like this:
- Call some access or modification method on an object
- On entry to the method, mutex (associated with the object) is obtained
- … method does its stuff
- Mutex is released
- Method returns
Java, for instance, provides built-in support for this particular sequence with the “synchronized” keyword. A method tagged as “synchronized” automatically obtains the mutex on entry and releases it on return, which removes a lot of the burden from the programmer who otherwise has to make sure that the mutex is properly released on every code path (including exceptions). This works fine in a lot of cases. Java’s mutexes are also re-entrant meaning that one synchronized method can call another one in the same object without causing instant deadlock, even though they share the same mutex (the mutex is associated with the object, not the method). However, “synchronized” is not suitable as a general purpose mutex, and in fact its re-entrancy can lead to bad design.
Consider a design which uses the listener pattern. That is, some class has a modifier method which modifies data and notifies a number of listeners of the change (via some listener interface). How could such a design be made thread-safe? it might be tempting to mark the modifier method as synchronized, and in many cases this would work fine. If a listener needs to access or modify data, it can do so, and the re-entrant nature of “synchronized” means no deadlock will occur. However, this approach is a bad design, for subtle reasons, which basically boil down to this: a mutex should not necessarily prevent data access or modification by threads which do not hold the mutex.
Huh, you may be thinking, what’s this guy smoking? That’s precisely what a mutex is for. But that’s not exactly what a mutex is for; a mutex is a mechanism to allow threads to co-ordinate access to data (or some resource) in order to avoid race conditions, but it is not specifically meant to limit resource access to the single thread holding the mutex. In particular, if the thread holding the mutex is waiting on a mutex held by a second thread, and the second thread is operating under the assumption that this is the case, then it should be fine for that thread to access the resource that is protected by the mutex – clearly there can be no race condition, since the only other thread that otherwise has any right to access the resource (i.e., the thread that holds the mutex) is blocked.
If you think this argument is absurd, consider two cases in Java where this situation can come about:
- An arbitrary thread needs to modify the data (the “model”) of a Swing component, and needs to do so synchronously, from within a listener callback which is called with a mutex protecting an object “O” held. To modify the model safely the listener needs to use EventQueue.invokeAndWait() or equivalent. The code invoked on the dispatch thread also attempts to access the object “O”. Bang – deadlock.
- Again, a listener callback is called with a mutex held. This time the listener calls a method on a remote object (in a second Java VM) via RMI. The remote invocation calls back in to the first VM and attempts to obtain the (supposedly re-entrant) mutex, however, due to the RMI implementation, is now running on a different thread in the first VM. Bang – deadlock.
The obvious workaround – invoking the listener callback(s) outside of a synchronized block, i.e. without the mutex held, does work; but it leaves the possibility that the data is modified between the modification event and the listener being notified, which can be undesirable. The best solution from a design point-of-view is to move the acquisition and release of the mutex outside of the responsibility of the object being protected by it. Thus the sequence at the start of the would be changed to:
- Acquire mutex
- Invoke method
- … method does its thing
- Method returns
- Release mutex
Also, it should be assumed (even enforced) that the mutex is not re-entrant, to allow for the case where the executing thread is not the one which actually acquired the mutex. This is more complicated, and places more burden on the programmer – which is not desirable, especially when dealing with concurrency – but it is a better overall design. For one thing, it allows a solution for the two problems outlined above; secondly, it separates concerns – why should an object worry about synchronising if it may or may not be used in a multi-threaded system?
Ideally there’d be some language support for this sort of design too. It would be nice if there was a way to tag that a specific object should only be touched when an associated mutex was held, and have static analysis to determine when such a rule was being broken. Unfortunately such things are not yet readily available/usable, at least not as far as I’m aware.
In conclusion: be careful when using “synchronized” or re-entrant mutexes; consider using non-re-entrant mutexes instead.