Blaisorblade programming thoughts: Stereotypes, averages and benchmarks

While reasoning on the idea of stereotypes and how useful they are to understand a different culture, I realized that a stereotype, together with problems in using it, is at least as as bad as an average over a diverse set (here we'll ignore the fact that it's also a judgement by a culture on another culture, often presuming without reason that). And as we know,

And after realizing this, one sees connections with many issues in research and in Computer Science. For instance, averages of different benchmarks. It's often fine to benchmark two different algorithms on some data sets: if the algorithm domain is simple enough, the variation on different inputs is small, and the input can anyway be characterized by a small number of features. Think of sorting algorithms: the inputs you pass them can be random, Gaussian, ordered, reverse-ordered, and that's it more or less. OK, I'm oversimplifying a bit, I guess. But in other cases, the input space has a much higher number of dimensions.

And then, even worse, benchmarks about programming languages. Not only we have all these problems about sampling the input space, but in this case the input spaces of two language implementations, for different languages, are not the same, and they don't match in a meaningful way: they are not isomorphic, just like spoken languages. In other words, there is not a unique way of translating a program, and when you do, there's no way to make the pieces match one-by-one. There are infinite ways to solve a problem in any Turing-complete language (and a not-so-small number of reasonable alternative), and an intrinsically different infinity of ways for another language. And maybe, your translation is too slow because it's not appropriate for that language, and you should write the same program in a more idiomatic style.

The same concept is expressed here in a less abstract way.

And in this situation, not only we have actual benchmarks about programming languages, but even performance myths for or against Java, C#, Python, C, and so on. I.e., stereotypes, once again, about languages. And this time, mostly false.

For instance, we could talk of people thinking that Java is slow and Python is more lightweight, while it's actually the other way around, as long as we speak of throughput. What those people think is only true about startup latency, and only partially about memory consumption (Python has prompter garbage collection because of refcounting, but its object layout could benefit some huge improvement). And now in this example, we see that not only the input space is high-dimensional, but that even the performance space cannot be characterized by a single number. To compare memory consumption, we need to give a function of the used memory for a given object set, for Java and for Python. And the object sets are, again, not isomorphic! We're over.

Trying to sum up too much any result is going to give us lies. We can't help it; we should stop asking for simple answers to hard questions, and for silver bullets, and for a lot of other easy things, and go instead to work hard and enjoy the results.

Blaisorblade programming thoughts

Wednesday, June 9, 2010

Stereotypes, averages and benchmarks

No comments:

Post a Comment