
Hi,

I have two versions of the same function, one in functional style (as best I can) and one in imperative. The function finds the maximum of a given function by trying a discrete number of points.

When I time these functions with nx = 20000000, the functional version takes 1043 ms and the imperative version takes 859 ms.

My question is - is there a way that I can write the functional version so that it will show better performance?

let getMaxFunc f (minX, maxX) nx = 
    let dx = (maxX - minX)/float (nx - 1)
    
    //this function is tail recursive
    let rec findMax i cmax =
        let s = minX + (float i)*dx
        let a = f(s)
        let maxA = max cmax a
        match i with
        | _ when i = nx - 1 ->
            maxA
        | _ ->
            findMax (i+1) maxA
            
    findMax 0 0.0

let getMaxImp f (minX, maxX) nx y = 
    let dx = (maxX - minX)/float (nx - 1)
    
    let mutable maxA = 0.0;
    for i = 0 to nx - 1 do
        let s = minX + (float i)*dx
        let a = f(s, y)
        maxA <- max maxA a
        
    maxA

When I run in release without the debugger attached, it seems like the functional version wins by a little.

When I run in debug with the debugger attached, the match statement seems to be a little bit slower than an equivalent if-else statement.
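
For anyone comparing the match form with an if-else form, here is a minimal sketch of two alternative functional versions. The names getMaxFuncIf and getMaxSeq are hypothetical and not part of the original post. Both keep the original's starting maximum of 0.0. The Seq version is likely to be slower because of enumerator overhead, so it is worth measuring in a release build rather than assuming.

```fsharp
// Hypothetical variant of getMaxFunc: same algorithm, if/else in place of match.
let getMaxFuncIf (f: float -> float) (minX: float, maxX: float) (nx: int) =
    let dx = (maxX - minX) / float (nx - 1)

    // Tail recursive: the recursive call is the last thing the function does.
    let rec findMax i cmax =
        let maxA = max cmax (f (minX + float i * dx))
        if i = nx - 1 then maxA
        else findMax (i + 1) maxA

    findMax 0 0.0

// Hypothetical pipeline variant: generate the samples lazily and fold max over them.
let getMaxSeq (f: float -> float) (minX: float, maxX: float) (nx: int) =
    let dx = (maxX - minX) / float (nx - 1)
    Seq.init nx (fun i -> f (minX + float i * dx))
    |> Seq.fold max 0.0
```

Both take the same arguments as getMaxFunc, for example getMaxFuncIf sin (0.0, 3.0) 1000.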

By none on 3/11/2009 10:07 AM (permalink)

I copied your code into F# and ran the exact same test. For me the functional version took 1015 ms and the imperative version took 1031 ms. I ran each one 10 times and got different results each time, but on average they were almost identical.

The first thing I encountered though was that the f in the imperative version is not the same as the f in the functional version. They take different numbers of arguments. Are you testing with the same function? If the function you're using for f in the functional version was passed in by binding the other argument beforehand with a partial function application then I suppose there should be no difference, but you might also try running each one multiple times and making sure the average is what's off, because it was more or less identical for me.
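
A rough sketch of both suggestions, in case it helps: binding the extra argument up front, and averaging over several runs. The names g, gAt and averageMs are made up for illustration, and g is only a stand-in for whatever two-argument function was actually being tested.

```fsharp
open System.Diagnostics

// Hypothetical two-argument test function, standing in for the original f.
let g (x: float, y: float) = sin x * y

// Bind the second argument up front so the result has the one-argument
// shape that getMaxFunc expects.
let gAt (y: float) = fun (x: float) -> g (x, y)

// Hypothetical helper: run `work` `runs` times and return the mean
// elapsed time in milliseconds. Stopwatch accumulates across Start/Stop pairs.
let averageMs (runs: int) (work: unit -> 'a) =
    let sw = Stopwatch()
    for i = 1 to runs do
        sw.Start()
        work () |> ignore
        sw.Stop()
    float sw.ElapsedMilliseconds / float runs
```

With the functions from the original post in scope, a call would look like averageMs 10 (fun () -> getMaxFunc (gAt 2.0) (0.0, 1.0) 20000000). Timings only mean much in a release build without the debugger attached.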

By divisortheory on 3/11/2009 8:51 AM (permalink)

Thanks. Yes, I removed an argument before posting to simplify it. I meant to do this with both versions but forgot.

Anyway, my original timings were in a Visual Studio debug build, which is non-optimized I guess. I've now run it in a release build and I got the same results as you.

More generally, is there any reason to think that pure functional programming should be faster or slower than imperative, any particular situations where it excels? Without having much experience with it, it seems that the best it could do is be equal in speed to an equivalent imperative program, but perhaps there are optimizations that the compiler does that lead to some shortcuts.

By Pamphillian on 3/11/2009 9:49 AM (permalink)

Regarding whether or not functional languages can compete with imperative languages, I don't think F# is mature enough yet to be able to compete with functional (or imperative) languages that have been around for a really long time, but a good algorithm written in a functional language using a really good compiler can be light years faster than an "equivalent" imperative program, for lots of reasons (it can also be slower). Take a look at this page:

[link:shootout.alioth.debian.org]

Some of the benchmarks for Haskell GHC are right up there. In fact, the website originally had to redesign all of their tests because of Haskell GHC. It was beating even the best languages by orders of magnitude, like tenfold due to laziness. Laziness is one aspect that leads to higher performing code, another is parallelism. F# isn't purely functional, but in a purely functional language (where there are no side effects anywhere, no state variables) the compiler can reason very deeply about, and do complex data flow analysis on your code to prove certain things about it, leading to things like automatic parallelization of large portions of code, among other things. In an imperative language, when you're sharing data across multiple threads, you have to synchronize access to that data, which by definition means someone is going to be waiting idly to access the data when they could be doing something useful. If your program is completely stateless, there is no shared data, and nothing to synchronize. Things can automatically run on different cpus without the need to worry about it.
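
To make the "nothing to synchronize" point concrete for the max-finding function from the original question, here is a sketch. Note that this is explicit parallelism requested by the programmer through Array.Parallel.map, not something the compiler does automatically. The name getMaxPar and the chunks parameter are hypothetical, and the code assumes f is pure.

```fsharp
// Hypothetical parallel variant: split the nx sample points into `chunks`
// contiguous index ranges, take the max of each range independently, then combine.
// Assumes f is pure (no shared mutable state), so the chunks need no locking.
let getMaxPar (f: float -> float) (minX: float, maxX: float) (nx: int) (chunks: int) =
    let dx = (maxX - minX) / float (nx - 1)

    let chunkMax (c: int) =
        // Integer arithmetic partitions the indices 0 .. nx-1 into `chunks` ranges.
        let lo = int (int64 c * int64 nx / int64 chunks)
        let hi = int (int64 (c + 1) * int64 nx / int64 chunks) - 1
        let mutable m = 0.0   // same starting maximum as the original code
        for i = lo to hi do
            m <- max m (f (minX + float i * dx))
        m

    Array.init chunks id
    |> Array.Parallel.map chunkMax   // each chunk may run on a different core
    |> Array.max
```

The only mutable variable is local to each chunk, so no two threads ever touch the same data. Whether this actually beats the sequential loop depends on how expensive f is and how many cores are available.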

Also, one last example. Consider the case in an imperative language where you have a linked list and you want to make a copy of it for someone else to use. So you have A = [3, 4, 5, 6, 7, 8, 9, 10]. Someone makes a copy of it, deletes the first item, replaces it with its negative, and calls this new list B. So now there are 2 lists, A = [3, 4, 5, 6, 7, 8, 9, 10] and B = [-3, 4, 5, 6, 7, 8, 9, 10]. More than likely you've allocated 16 different nodes in memory if you're doing this in an imperative language. In F#, as in most other functional languages, there are only 9 different nodes allocated. There's the list K = [4, 5, 6, 7, 8, 9, 10], and then two other lists, [3] -> K and [-3] -> K. The tails use the same actual memory, and this is provably safe because the lists are immutable: there's no chance someone will ever end up with unexpected changes to the middle of the list, since it's not modifiable in the first place.
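
The sharing can be checked directly in F#. This is a small sketch using the same values as above; the lowercase names k, a, b and tailsShared are just for illustration.

```fsharp
let k = [4 .. 10]        // the shared tail: 7 nodes
let a = 3 :: k           // one new node in front of k
let b = -3 :: k          // one more new node in front of the same k

// Reference comparison, not structural comparison:
// both tails are the very same object as k, so no nodes were copied.
let tailsShared =
    LanguagePrimitives.PhysicalEquality (List.tail a) (List.tail b)
    && LanguagePrimitives.PhysicalEquality (List.tail a) k
```

That gives 7 + 1 + 1 = 9 nodes in total, matching the count above.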

Oh, one more example. Suppose you're trying to find all right triangles with integer sides whose perimeter is less than or equal to 750. Who knows how you'd do this in C or C++, it wouldn't be super difficult or anything, but on the other hand it wouldn't be trivial. Here's the code in Haskell.

[ (x,y,z) | x <- [1..750], y <- [x..750], z <- [y..750], x^2+y^2==z^2, x+y+z<=750 ]

(untested).

But look at how much work the compiler is doing for you. The more work the compiler does for you, the more freedom it has to pick the best possible way to do it.
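
Since the rest of the thread is in F#, here is a sketch of the same search as an F# list expression. The function name rightTriangles, the explicit ranges, and the x <= y < z ordering (which avoids listing mirrored duplicates) are my assumptions, not part of the one-liner above.

```fsharp
// Hypothetical helper: all (x, y, z) with x <= y < z, x^2 + y^2 = z^2
// and x + y + z <= maxPerimeter.
let rightTriangles (maxPerimeter: int) =
    [ for x in 1 .. maxPerimeter do
        for y in x .. maxPerimeter - x do
            // The upper bound on z already enforces the perimeter limit.
            for z in y + 1 .. maxPerimeter - x - y do
                if x * x + y * y = z * z then
                    yield (x, y, z) ]
```

For the problem as stated the call would be rightTriangles 750. This is a brute-force search, so it illustrates brevity rather than speed.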

By divisortheory on 3/11/2009 1:25 PM (permalink)

Some of the benchmarks for Haskell GHC are right up there. In fact, the website originally had to redesign all of their tests because of Haskell GHC. It was beating even the best languages by orders of magnitude, like tenfold due to laziness. Laziness is one aspect that leads to higher performing code, another is parallelism. F# isn't purely functional, but in a purely functional language (where there are no side effects anywhere, no state variables) the compiler can reason very deeply about, and do complex data flow analysis on your code to prove certain things about it, leading to things like automatic parallelization of large portions of code, among other things. In an imperative language, when you're sharing data across multiple threads, you have to synchronize access to that data, which by definition means someone is going to be waiting idly to access the data when they could be doing something useful. If your program is completely stateless, there is no shared data, and nothing to synchronize. Things can automatically run on different cpus without the need to worry about it.

That was a long list of serious misconceptions.

Firstly, Haskell looks artificially good on the shootout tests because they are flawed benchmarks. Specifically they are trivially reducible in ways that real software is not. Other benchmarks have already proven that Haskell cannot provide competitive performance. Moreover, you need someone with a PhD in writing optimizing Haskell compilers just to get close.

Secondly, Haskell looks artificially good on the shootout because Haskell's authors optimized the Haskell compiler and its libraries specifically for the shootout in order to make Haskell look artificially good.

Thirdly, laziness is extremely detrimental to performance because it makes time and space consumption wildly unpredictable. There are many practical examples of "laziness gone bad" in this context. One is a Burrows-Wheeler block sorting data compressor that was optimized for two weeks by expert members of the Haskell Cafe mailing list, only to be left 10,000x slower than a C alternative. Another is the Darcs version management software, which was the only software ever written in Haskell to garner a significant userbase but is now falling out of favor because it is unusably slow (even the developers of the Glasgow Haskell compiler itself dropped it because the alternative, written in Python, was an order of magnitude faster).

Fourthly, laziness seriously undermines parallelism because it introduces lazy thunks everywhere that can be mutated from unevaluated expressions to evaluated values simultaneously from multiple threads, i.e. it creates billions of race conditions around almost every subexpression evaluated in the entire lifetime of a program. To get an idea of how detrimental this is, just read the latest academic paper about GHC's new parallel garbage collector. Their technology is far behind .NET's concurrent GC and they are already reaching the limits of feasibility (they developed a concurrent GC but shelved it because it was unusably slow). Moreover, the hacks they had to use to get even today's poor performance completely undermine reliability. In particular, they document subtle bugs that introduce infinite memory leaks in the garbage collector (!).

Finally, automatic parallelism has been studied intensively and every single working implementation was a complete failure. The idea that purely functional programming magically makes parallelization trivial is just complete nonsense. Haskell provides combinators for informally guiding the compiler with regard to parallelism but it is a toy compared to the state-of-the-art wait-free work-stealing queues already shipping in Microsoft's Task Parallel Library.

Haskell does have some merits but performance and parallelism are certainly not among them.

By jdh30 on 3/28/2009 8:07 PM (permalink)

[link:shootout.alioth.debian.org]
Some of the benchmarks for Haskell GHC are right up there. In fact, the website originally had to redesign all of their tests because of Haskell GHC. It was beating even the best languages by orders of magnitude, like tenfold due to laziness.

No, that is not correct.

Rather than repeat myself, please read these relevant posts.

By Isaac Gouy on 3/11/2009 10:41 PM (permalink)

[link:shootout.alioth.debian.org]
Some of the benchmarks for Haskell GHC are right up there. In fact, the website originally had to redesign all of their tests because of Haskell GHC. It was beating even the best languages by orders of magnitude, like tenfold due to laziness.
No, that is not correct.
Rather than repeat myself, please read these relevant posts.

Well that's interesting, for reference my source was here (which is mentioned in that thread anyway, but here it is for reference). But all I really said is that it was "beating" other languages. If the means by which it was beating them is by throwing away unnecessary work that the other languages weren't throwing away, that still exactly highlights one of the advantages of functional languages. Anyway apparently this issue is a fairly tense one, which I had no idea about so I'll just drop it :)

I was really just trying to illustrate the point that functional languages are intelligent about dropping unnecessary work.
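
A small F# illustration of work being skipped, with the caveat that F# is strict by default and laziness here is opt-in through Seq, unlike Haskell where it is pervasive. The names evaluations, expensive, squares and firstBig are hypothetical.

```fsharp
// Counter recording how many times the function below actually runs.
let evaluations = ref 0

// Hypothetical stand-in for an expensive computation.
let expensive (i: int) =
    evaluations.Value <- evaluations.Value + 1
    i * i

// Describes 1,000,000 results, but Seq is lazy: nothing is computed yet.
let squares = Seq.init 1000000 expensive

// Seq.find stops at the first match, so only the leading elements are evaluated.
let firstBig = squares |> Seq.find (fun sq -> sq > 50)
```

After this runs, evaluations.Value is far below 1,000,000 because the search stops as soon as it reaches 64. An eager Array.init with the same arguments would have computed every element before the search even started.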

By divisortheory on 3/12/2009 8:15 AM (permalink)

[link:shootout.alioth.debian.org]
Some of the benchmarks for Haskell GHC are right up there. In fact, the website originally had to redesign all of their tests because of Haskell GHC. It was beating even the best languages by orders of magnitude, like tenfold due to laziness.
No, that is not correct.
Rather than repeat myself, please read these relevant posts.
Well that's interesting, for reference my source was here (which is mentioned in that thread anyway, but here it is for reference). But all I really said is that it was "beating" other languages. ...

Didn't you say "In fact, the website originally had to redesign all of their tests because of Haskell GHC."? :-)

Didn't you say " It was beating even the best languages by orders of magnitude, like tenfold due to laziness." ? :-)

The measurements are still available - the lazy Haskell program was slower than a Java program that set an initial heap, the lazy Haskell program took 3.07s compared to 3.79s for a C program that actually followed the rules.

You'll notice there are half a dozen C and C++ programs also considered not to have followed the rules which only took 0.1s - 0.6s.

By Isaac Gouy on 3/12/2009 9:54 AM (permalink)

You'll notice there are half a dozen C and C++ programs also considered not to have followed the rules which only took 0.1s - 0.6s.

I make mistakes too.

I copied the wrong column - there are half a dozen C and C++ programs also considered not to have followed the rules which only took 0.26s - 1.60s compared to 3.07s

By Isaac Gouy on 3/13/2009 11:14 AM (permalink)

I think it is true that, in general, it is easy in lazy languages to write code that you think will spin the processor. But, it will simply be optimized away.

My guess is that this kind of cruft bogs down many large scale imperative programs. An easy (and small) example is a stack of C functions that takes the same pointer, and every function checks

if( p == NULL )

all the way through the stack.

Thanks for the insight. ...too bad it was on reddit ;)

By none on 3/12/2009 7:39 AM (permalink)

Ahh, but in an imperative style, if you don't want a copy, you can add a node to the front of a linked list with just one node allocation. Similarly, you can add to the end of the linked list with one allocation as well. Whereas in a functional style, to add a node to the end of the list, even if you don't want a copy, you have to reallocate the entire list. So it's a two-way street.
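
A quick sketch of that asymmetry with F#'s immutable list (the names are only for illustration):

```fsharp
let xs = [1; 2; 3]

let front = 0 :: xs      // one new node; xs becomes the tail as-is
let back = xs @ [4]      // every node of xs is rebuilt so the last one can point at 4

// Reference comparisons: prepending reuses xs, appending cannot.
let frontShares = LanguagePrimitives.PhysicalEquality (List.tail front) xs   // true
let backShares = LanguagePrimitives.PhysicalEquality back xs                 // false
```

So prepending costs one allocation regardless of length, while appending costs one allocation per existing element plus one for the new element.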

By gneverov on 3/11/2009 3:35 PM (permalink)

Definitely a two-way street. If it were a one-way street there'd be provably no reason to use imperative languages :) Yes, you can always add a node to the front of a linked list in an imperative language, but the only point I was really trying to make is that in a functional setting the same memory is re-used by default, whereas in an imperative setting you have to actively think about whether or not it makes sense to use the same reference in multiple places, and oftentimes it doesn't.

By divisortheory on 3/11/2009 5:04 PM (permalink)
