Concurrent programming was a major theme at Microsoft's recent Professional Developers Conference. We all know the reasoning: processors are no longer getting faster, but multiple processors are now commonplace. Even my desktop PC is a quad-core. However, having multiple processors is no guarantee that applications will run faster. For that to happen, the application has to be coded with concurrency in mind. Therefore, if we want to write fast applications, we have to learn concurrent programming.
Unfortunately concurrent programming is hard. Think deadlocks, synchronisation, non-determinant, race conditions, and so on. Fortunately, Microsoft is good at making hard things easy. The original Visual Basic transformed the coding of a graphical user interface from something intricate done in C or C++, to something anyone could do with a few clicks of the mouse.
Now Microsoft aims to do something similar for concurrency. The .NET Framework 4.0 incorporates the Parallel FX Library, which does a great job of simplifying multi-threaded development. The new Task Parallel Library is smart about how many processors it finds at runtime, and spins the right number of threads to take advantage of them. PLINQ lets you easily apply parallelism to LINQ queries. Daniel Moth does a great job of demonstrating the benefits in his session at PDC, which you can watch online; and if you work with .NET I highly recommend it.
The worrying aspect is that while Microsoft is making concurrency easy, it is not making it safe. When I was playing around with Visual Studio 2010 and .NET Framework 4.0, one of the first things I did was to look at my code for counting primes and to see how I could optimize it. I found I could do so easily by changing a For loop to use Parallel.For instead, one of the features of the new library. I got a nice speed boost on a two-core machine, not quite double, but nearly so. One snag: the result changed every time I ran it.
I soon figured out what was wrong. My code declares a numprimes variable, then increments it within the loop. Everything is fine when single-threaded, but if you have two threads running the loop in parallel, they might both increment the variable at nearly the same time. That means reading the value, incrementing it, and writing it back. Occasionally this happens:
Result: the value is one less than it should be. The fix is to lock the value first, or to find a better approach; this exact problem is discussed here (scroll down to Aggregation).
Bugs are nothing new in programming, but concurrency bugs are particularly tiresome. The code may work fine on some machines; in my example, the bug would not occur on a single-core PC. If your loop is a little less tight than a prime number calculator, it might be that the odds of the bug occurring at all are rather small. It could even pass all your unit tests. Deploy the application though, and you can bet that wrong results will soon crop up with the usual potential for calamity.
Haven't we already had, and survived, this problem with previous abstractions like BackgroundWorker, a class in .NET Framework 2.0 that makes it easy to push some code into a background thread? True to some extent, but BackgroundWorker is less dangerous because it is typically used more for keeping an application responsive, than to increase performance with parallel threads on a multi-core system.
The bottom line is that while I am full of admiration for the work Microsoft has done with the Parallel FX, I have a nagging worry that as more and less skilled programmers are encouraged to do this, we may be introducing a new wave of buggy code, as argued in detail by Edward Lee in his paper The Problem with Threads [PDF]. The hardware trend is real though, so I suspect greater concurrency in everyday business applications is coming whether we like it or not.
The other conclusion is that before using something even as apparently simple as Parallel.For, developers have a responsibility to learn about the pitfalls as well as the benefits.