3.3k
views24
comments

F# Excessive Consing Killing Performance

I am evaluating F# and am using a Puzzle solving app written in C# as a benchmark. I've translated the C# version into F# making only a few changes -- mostly to get rid of mutated variables. The data structures themselves are pretty much the same; i.e., linked lists, fixed length arrays. I also converted one c# class into a tuple since it was small and had no non data members.

What I'm seeing is that the F# version performs MUCH MUCH worse than the c# version. What appears to be an (the? underlying cause is the order of magnitude increase in GCs when the f# version runs. It's not clear (to me) what particular F# constructs are contributing to this, but the increase is so dramatic that I'm hoping I'm just doing something stupid.

So, my question is, how might I go about isolating the source of all this consing -- or am I simply doomed to take the performance hit so that I might enjoy the functional approach?

Thanks in advance,

Bill

Can you share the code?

If you do the same thing in F# as in C#, you should see roughly the same performance, so this does sound like a data structure/algorithm change. As someone else mentioned, if you switch from C# Lists to F# lists and do appends (@), you will be in trouble, this is a common perf error.

As for diagnosing, given that you have a performant 'reference algorithm' in the C# code, you may just be able to apply various functions to various size data and measure to see e.g. that in the C# version, function Foo is O(N^2) when passed a list of size N, whereas F# function Foo is O(N^3), and use that to narrow down the source of the problem to a particular function. All of the usual strategies for tracking down perf problems in C# should apply to F# as well.

By brianmcn on 11/23/2008 10:44 PM (permalink)

Brian

OK, I've culled the binaries out of the Zip files and am attaching the F# version to this message. C# version to follow in my next posting.

Bill

Can you share the code?
If you do the same thing in F# as in C#, you should see roughly the same performance, so this does sound like a data structure/algorithm change. As someone else mentioned, if you switch from C# Lists to F# lists and do appends (@), you will be in trouble, this is a common perf error.
As for diagnosing, given that you have a performant 'reference algorithm' in the C# code, you may just be able to apply various functions to various size data and measure to see e.g. that in the C# version, function Foo is O(N^2) when passed a list of size N, whereas F# function Foo is O(N^3), and use that to narrow down the source of the problem to a particular function. All of the usual strategies for tracking down perf problems in C# should apply to F# as well.

By Bill_Cohagan on 11/24/2008 6:07 PM (permalink)

Brian
Here's the C# version...

Bill

Brian
OK, I've culled the binaries out of the Zip files and am attaching the F# version to this message. C# version to follow in my next posting.
Bill
Can you share the code?
If you do the same thing in F# as in C#, you should see roughly the same performance, so this does sound like a data structure/algorithm change. As someone else mentioned, if you switch from C# Lists to F# lists and do appends (@), you will be in trouble, this is a common perf error.
As for diagnosing, given that you have a performant 'reference algorithm' in the C# code, you may just be able to apply various functions to various size data and measure to see e.g. that in the C# version, function Foo is O(N^2) when passed a list of size N, whereas F# function Foo is O(N^3), and use that to narrow down the source of the problem to a particular function. All of the usual strategies for tracking down perf problems in C# should apply to F# as well.

By Bill_Cohagan on 11/24/2008 6:13 PM (permalink)

---

By helium on 11/25/2008 8:07 AM (permalink)

Helium

Strangely your posting (see below) appears to be content free; however the email notification I got of your posting asked about my use of "match" in places where a simple "if" would have worked, so ...

The answer is simply ignorance. At the time I wrote that code I had the impression that all conditionals took a "match" form. Later, when I discovered the existence of "if", I also read that the two forms were "equivalent"; thus didn't bother to change back to if. I agree however that using "if" in these situation does make the code easier to scan. I'll just say in my defense that in my past I wrote a lot of Lisp and Scheme and my previous exposure to "cond" made the use of "match" seem not quite as awkward in these cases.

---

By Bill_Cohagan on 11/25/2008 9:43 AM (permalink)

Are there performance penalties associated with using match instead of if? I personally find match statements much easier to read than if statements. In the case where it's equivalent, i.e.

match n with
| 4 -> 0
| _ -> n

you would think the compiler would be smart enough to produce the exact same code as an if statement.

By divisortheory on 11/25/2008 10:30 AM (permalink)

Reflector is an excellent tool if you want to know what code the compiler produces.

let f n =
    match n with
    | 4 -> 0
    | _ -> n

public static int f(int n)
{
    switch (n)
    {
        case 4:
            return 0;
    }
    return n;
}

By on 11/25/2008 2:47 PM (permalink)

Here is one huge bit; I did not look for others.

// you had this, .RowColumns re-evaluates a Seq and ToArrays it every time
  type  Puzzle() =
    static member PuzzleSize = 4
    static member RowColumns = [| for r in 0 .. Puzzle.PuzzleSize - 1 do for c in 0 .. Puzzle.PuzzleSize - 1 -> (r, c) |]

// change to this, only evals once
  type  Puzzle() =
    static let rc = [| for r in 0 .. Puzzle.PuzzleSize - 1 do for c in 0 .. Puzzle.PuzzleSize - 1 -> (r, c) |]
    static member PuzzleSize = 4
    static member RowColumns = rc

BTW, I found this just by running in the debugger and hitting 'break all' a few times and noticing I was always inside Puzzle.RowColumns inside LowerBoundOnScore.

By brianmcn on 11/24/2008 7:15 PM (permalink)

Brian

I made the suggested change, not only for the PuzzleSize property, but for the other static properties as well. Making these changes brings the GC counts back into the same neighborhood as the C# version (in fact slightly better than the C# version.) Unfortunately the performance is still significantly worse in the F# version. I'm attaching my spreadsheet which shows the performance and GC activity for each version. The topmost F# version (revision 1759) reflects the described changes.

Since <i>LowerBoundOnScore<i> appears to be the hotspot I'm guessing that my recasting of that into a more functional form (using Seq.fold and a functional arg for the scoring strategy) may be introducing overhead relative to the original C# version (within which the 2 strategies are hard coded with some what-if-yes-but code logic.) Anyway, that'll be my next attempt at isolating the difference. Thanks again for the help in observing the "implicit get" problem. I could have stared at that code forever and wouldn't have discoverd that! Bill <quote> Here is one huge bit; I did not look for others. ... </quote>

By Bill_Cohagan on 11/26/2008 8:49 AM (permalink)

Brian

Thanks; however this is confusing to me. That RowColumns is reevaluated each time it is referenced implies that RowColumns is a function value, but it is typed (by the F# compiler) as (int * int) [] which is not a function. So, what am I missing?

BTW, LowerBoundOnScore is in fact the "hotspot" for this algorithm in the C# version so I'm not surprised that it appears to be in the F# as well. This doesn't address the consing however. Perhaps I've misunderstood something about RowDolumns though. I'll do some experimenting.

[Added later] I've tried your suggestion and it does help quite a bit -- so I remain confused. What am I missing on the static member evaluation?

Bill

Here is one huge bit; I did not look for others.

// you had this, .RowColumns re-evaluates a Seq and ToArrays it every time
  type  Puzzle() =
    static member PuzzleSize = 4
    static member RowColumns = [| for r in 0 .. Puzzle.PuzzleSize - 1 do for c in 0 .. Puzzle.PuzzleSize - 1 -> (r, c) |]

// change to this, only evals once
  type  Puzzle() =
    static let rc = [| for r in 0 .. Puzzle.PuzzleSize - 1 do for c in 0 .. Puzzle.PuzzleSize - 1 -> (r, c) |]
    static member PuzzleSize = 4
    static member RowColumns = rc

BTW, I found this just by running in the debugger and hitting 'break all' a few times and noticing I was always inside Puzzle.RowColumns inside LowerBoundOnScore.

By Bill_Cohagan on 11/24/2008 7:48 PM (permalink)

It's a property (with an implicit getter).

type Foo() =
    static member SomeProp =
        printfn "hi"
        42

let a = Foo.SomeProp         
let b = Foo.SomeProp         
let c = Foo.SomeProp         
printfn "done"

By brianmcn on 11/24/2008 7:53 PM (permalink)

Brian

Thanks for the explanation. Now I understand. I really missed the "implicit getter" thing entirely. I think I'm still going to have a large difference between the c# and f# versions, but this'll help a lot.

Thanks again,

Bill

It's a property (with an implicit getter).

type Foo() =
    static member SomeProp =
        printfn "hi"
        42

let a = Foo.SomeProp         
let b = Foo.SomeProp         
let c = Foo.SomeProp         
printfn "done"

By Bill_Cohagan on 11/24/2008 8:06 PM (permalink)

BTW, the "member x.Foo = .." syntax for writing properties is actually a shorthand for writing the following:

 
type A() =
  member x.Foo with get() =
    printfn "hello..."
    42

This should explain the behavior you're seeing!

By Tomas Petricek on 11/25/2008 3:29 AM (permalink)

Brian

OK, so here's the code. There are 2 zip files, each a complete folder structure from a VS2008 solution. The P1 version is the F# version. If you look in the performance.xlsx spreadsheet (in the P1 version) you'll see the comparative runtimes (in ms) and gc counts for each collector generation. The timing and gc differences are pretty dramatic, more than 10x.

OOPS! Apparently there's a 64K limit on uploads so I can't upload either of the Zip files. Each zip is ~310K in size. If you'll send a private message with your email address I can send 'em that way.

Regards,

Bill

Can you share the code?
...

By Bill_Cohagan on 11/24/2008 1:11 PM (permalink)

310k is pretty heavy for a bunch of compressed text files. Did you include any of the binaries VS generates when you build? output EXE, PDB, etc? Another thing that would be useful is to comment out certain areas of the code, or certain features of the algorithm, and watch how memory usage changes. Then you can pinpoint what part of the algorithm is causing it, and you'd only have to post a small code fragment that reproduces the problem.

By divisortheory on 11/24/2008 4:36 PM (permalink)

Good point. I'll check the zip files as they probably do contain some binaries. As for commenting out parts of the algorithm I'll think about that, but it's difficult because it's a tree search.

Bill

310k is pretty heavy for a bunch of compressed text files. Did you include any of the binaries VS generates when you build? output EXE, PDB, etc? Another thing that would be useful is to comment out certain areas of the code, or certain features of the algorithm, and watch how memory usage changes. Then you can pinpoint what part of the algorithm is causing it, and you'd only have to post a small code fragment that reproduces the problem.

By Bill_Cohagan on 11/24/2008 5:54 PM (permalink)

Brian

Yes I can share the code. Should I just post it as an attachment here or is it better to email it? Either is fine with me. BTW, I don't do any @, just ::

FYI, the puzzle in question is called "NPuzzle" and is the N x N (typically N = 4) puzzle consisting of NxN-1 numbered tiles in a frame. The object is to sort the tiles by sliding tiles into the empty space. The goal is to do the sort with a minimum number of such "moves". So, in the abstract this is just a (very large) game tree problem and one can (in theory) walk the entire tree and pick the shortest path from root to leaf as the solution. There are several refinements that limit the tree search and allow for much more efficient solution. These are in the form of "scoring" algorithms that allow you to get a lower bound on a solution from any node of the game tree.

Thanks,

Bill

Can you share the code?
If you do the same thing in F# as in C#, you should see roughly the same performance, so this does sound like a data structure/algorithm change. As someone else mentioned, if you switch from C# Lists to F# lists and do appends (@), you will be in trouble, this is a common perf error.
As for diagnosing, given that you have a performant 'reference algorithm' in the C# code, you may just be able to apply various functions to various size data and measure to see e.g. that in the C# version, function Foo is O(N^2) when passed a list of size N, whereas F# function Foo is O(N^3), and use that to narrow down the source of the problem to a particular function. All of the usual strategies for tracking down perf problems in C# should apply to F# as well.

By Bill_Cohagan on 11/24/2008 7:49 AM (permalink)

Yes you are perfectly right - the @ - Operator is a performance killer (as far as I know this is no big brainer - a C# List is build as a linked list so consing is a O(1) Operation - F# lists are of the form (head, tail) where tail is another list - so in order to cons a element you have to iterate through both lists again)

So either try avoiding the @ in performance crticial places (i.e. the inner loops etc.) by building up your lists with :: recursivly or switch in your algorithms to the C# System.Collection.Generic.List. Or think again about your algorithm and try identifing the right datastructures for it (for example when you use @ often a binary tree might be better suited - or a (Hash-)set that is using this structure).

Greetings.

By Carsten Koenig on 11/23/2008 9:43 PM (permalink)

I'm not sure to which type you are referring when you say a "C# List", but the list types in the .Net framework appear to be singly or doubly linked lists. The F# list is singly linked (I assume); thus the :: operator should be O(1) and should not involve any copying at all. It will of course cause a single (head,tail) pair to be allocated on the heap.

The append (@) operator will cause copying of the left argument; however my algorithm doesn't use the @ operator at all, just ::

Yes you are perfectly right - the @ - Operator is a performance killer (as far as I know this is no big brainer - a C# List is build as a linked list so consing is a O(1) Operation - F# lists are of the form (head, tail) where tail is another list - so in order to cons a element you have to iterate through both lists again)
...

By Bill_Cohagan on 11/24/2008 7:33 AM (permalink)

You're right, like I mentioned you can even prove to yourself that there's no copying going on with a cons (via List.Cons, or the :: operator) by using Object.ReferenceEquals. Are you using a deep recursion anywhere that is not tail recursive? It's hard to make conjectures with no code sample. I've done a number of comparisons myself and the performance has always been good with F#.

By divisortheory on 11/24/2008 7:47 AM (permalink)

Divisor

Yes, the algorithm does do deep recursion that is not tail recursive; however that is true of both the c# and the f# implementations. The algorithm is basically searching a "game tree"; thus the recursion. Deep recursion shouldn't by itself cause a lot of consing, right? Unless f# frames are heap allocated?

I'll be posting the code here some time today.

Bill

You're right, like I mentioned you can even prove to yourself that there's no copying going on with a cons (via List.Cons, or the :: operator) by using Object.ReferenceEquals. Are you using a deep recursion anywhere that is not tail recursive? It's hard to make conjectures with no code sample. I've done a number of comparisons myself and the performance has always been good with F#.

By Bill_Cohagan on 11/24/2008 7:53 AM (permalink)

Bill,
some data structures in F# are immutable (list<'a> for example). So every time you operate on these structures you need to create copies, a thing you can often avoid in C#.
So the overhead from copying data around could be a cause of performace degradation. Just a guess.
I haven't tried before on F#, but I think you could do some profiling with standard .NET tools. DotTrace Profiler is a great tool, but it ain't free.

Hope it helps
Mau

By mau on 11/23/2008 8:51 PM (permalink)

Thanks for the response; however I don't believe that the :: operator causes any copying. I would assume that it allocates storage for the new list head element, but it certainly does not copy the tail of the list. Why would it?

As for profiling, I wasn't aware that these tools gave you info on heap allocation and GC. Is that the case? That'd be good news!

Regards,

Bill

By Bill_Cohagan on 11/24/2008 7:15 AM (permalink)

While true, it's not quite as bad as it sounds. For example, try the following at the FSI prompt:

> let x = [1; 2; 3; 4; 5];;
val x : int list = [1; 2; 3; 4; 5]

> let y = 0 :: x;;
val y : int list = [0; 1; 2; 3; 4; 5]

> Object.ReferenceEquals((box x.Tail), (box y.Tail.Tail));;

(no F# compiler on the computer i'm typing from atm, but if the syntax is incorrect, it should be trivial to fix).

the result of the above expression will be true. So while while yes there's copying going on, the fact that data structures are immutable means that most of the time it's the reference that's being copied, not the entire contents of the object. If data structures were mutable, this would obviously have devastating results. But since they're not, it's ok. For some reason I doubt that excessive cons'ing is causing a serious performance problem, although I could always be wrong. Code samples would help though.

By divisortheory on 11/23/2008 9:26 PM (permalink)

Topic tags

Built with WebSharper

Home

Answers

Events

Courses

Groups and Conferences

Blogs

Jobs

Developers

Topic tags