
I realized that in my previous message I did not include
any code. Unfortunately the code base I've been using
is quite large and I can't extract any meaningful
portions to show here. What I've done instead is this:
I've created a little independent benchmark that attempts
to replicate the issue. I've written the code both in F#
and in SML, for comparison purposes. The F# code can be
found here, in a file named pmtest.fs:
[link:uploadingit.com]
The SML code, in a file pmtest.sml, can be found here:
[link:uploadingit.com]

The code simply declares a couple of moderately sized
recursive datatypes (10 constructors each). Each constructor
holds some text data along with a numeric code, in addition
to the recursive children. I then defined some mutually
recursive functions for traversing a given AST and producing
a new AST obtained just by reversing the text data of each
node. So this is very typical AST-manipulating code, the
kind that is pervasive in compilers and theorem provers.
I then defined a sample AST, again of moderate size, and
ran the traversal function on it 100,000 (10^5) times.
The results were not as bad as the results that I got
on my own code, but they were disappointing nevertheless:

F#: 20.5 seconds

SML-NJ: 10.4 seconds

MLton: 3.7 seconds

Compiling the code with fsc.exe (with optimization on, --optimize+)
barely makes a dent in the F# running time, shaving off
about 1 or 1.5 seconds at best.

My question is: What is it about this code that makes
MLton outperform F# by such a factor? Any suggestions
for improving the performance of the F# code?

Thanks again...
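
For readers without access to the linked files, the following is a minimal sketch of the kind of benchmark described above. It is not the original pmtest.fs: the type names, constructor names, and sample tree are all invented for illustration, and it uses three constructors per datatype rather than ten.

```fsharp
// Hypothetical stand-in for the benchmark described in the post;
// every name below is an assumption, not taken from pmtest.fs.

// Payload carried by every node: some text plus a numeric code.
type Info = { Text : string; Code : int }

// Two mutually recursive datatypes (three constructors each, not ten).
type Expr =
    | Var of Info
    | App of Info * Expr * Expr
    | Let of Info * Decl * Expr
and Decl =
    | Val of Info * Expr
    | Seq of Info * Decl * Decl
    | Empty of Info

// List-based string reversal, mirroring the SML definition
// fun rev(s) = implode(List.rev(explode(s)))
let explode (s : string) = List.ofSeq s
let implode (cl : char list) = System.String(Array.ofList cl)
let rev (s : string) = implode (List.rev (explode s))

let revInfo (i : Info) = { i with Text = rev i.Text }

// Mutually recursive traversals that rebuild the tree with reversed text.
let rec revExpr e =
    match e with
    | Var i -> Var (revInfo i)
    | App (i, f, x) -> App (revInfo i, revExpr f, revExpr x)
    | Let (i, d, b) -> Let (revInfo i, revDecl d, revExpr b)
and revDecl d =
    match d with
    | Val (i, e) -> Val (revInfo i, revExpr e)
    | Seq (i, a, b) -> Seq (revInfo i, revDecl a, revDecl b)
    | Empty i -> Empty (revInfo i)

// A small sample tree and a driver that repeats the traversal n times.
let sample =
    Let ({ Text = "let"; Code = 1 },
         Val ({ Text = "x"; Code = 2 }, Var { Text = "abc"; Code = 3 }),
         App ({ Text = "app"; Code = 4 },
              Var { Text = "f"; Code = 5 },
              Var { Text = "x"; Code = 6 }))

let run n =
    for i in 1 .. n do
        ignore (revExpr sample)
```

Calling `run 100000` would correspond to the iteration count quoted in the post, though timings from a tree this small should not be compared with the figures above.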

On my machine, the following changes give a good performance improvement: almost 2 times faster.

    let stringToCharArray ( str : string ) = str.ToCharArray ()
    
    let explode = stringToCharArray
    
    let implode ( cl : char [] ) = 
        let buf = new System.Text.StringBuilder ()
        let _ = buf.Append ( cl )
        buf.ToString ()    

    let rev s = implode ( Array.rev ( explode ( s ) ) )

By deex on 2/10/2011 3:41 AM (permalink)

Hi deex,

I'll take your word for that, but you're now comparing
apples and oranges, because the SML code uses lists of
characters, not arrays. The signatures of explode and
implode in SML are:

string -> char list
char list -> string

whereas the signatures of your respective
implementations are:

string -> char []
char [] -> string.

If you also do array-based string reversal in SML,
of course you'll speed up that code too. Again,
this is not about how to reverse strings efficiently.

By NewFSharpUser on 2/10/2011 6:01 AM (permalink)

In fact, you are comparing the specialized versions of `implode` and `explode` from SML with your own implementations in F#. For `explode` you have the dataflow `String.ToCharArray >> Array.toList`, and for `implode` you have `List.toArray >> StringBuilder.Append >> StringBuilder.ToString`. In total that is 3 temporary and unnecessary structures, with a big overhead in both memory and speed. A slow F# version is no big surprise in that case. Try doing array-based string reversal in SML and measure it; then it will be a fair comparison.

By deex on 2/10/2011 8:43 AM (permalink)

Yes, another way to look at it is that although string reversal isn't the stated goal, if replacing one implementation with another that is three times faster makes the whole program run about twice as fast (as the OP has stated), then that means that about 2/3 of the program time is spent in string reversal, so indeed we have basically been profiling string reversal. If string reversal is just a notional placeholder for some other operation, then why not just define

let rev = id

and be done with it? (Unless the code actually uses string reversal to get some real-world job done, of course.)
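
The estimate above can be checked with a little arithmetic. As a sketch, assume the run time splits into a part that is sped up and a part that is untouched; the function name `fractionSpent` is invented here for illustration:

```fsharp
// Fraction f of the original run time spent in one component, given that
// making that component `localSpeedup` times faster made the whole
// program `overallSpeedup` times faster. Solves
//   1 / overallSpeedup = (1 - f) + f / localSpeedup
// for f. Assumes the rest of the program is unaffected by the change.
let fractionSpent (overallSpeedup : float) (localSpeedup : float) =
    (1.0 - 1.0 / overallSpeedup) / (1.0 - 1.0 / localSpeedup)

// A 3x faster reversal yielding a 2x faster program.
let f = fractionSpent 2.0 3.0
```

With these inputs `f` comes out at roughly three quarters, in the same ballpark as the 2/3 quoted above, which if anything strengthens the point that the benchmark was mostly measuring string reversal.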

By anotheraccount on 2/10/2011 9:21 AM (permalink)

Hi anotheraccount,

You must have read my mind. I was just about to get rid
of this whole string reversal business because it was
really proving distracting. So I did go ahead and
defined rev to be the identity function, both in
the F# code and in the SML code. Since the initial
benchmark now completes much faster, I upped the
number of iterations from 100,000 to 500,000.
Respective results:

F#: 15.9 seconds

SML-NJ: 7.6 seconds

MLton: 2.3 seconds

Pretty much the same situation we had initially.

Btw, I was intrigued by your earlier suggestion
that records may incur an overhead due to .Net
class-related stuff. I might try making the
datatype constructors take simple tuples
as arguments, instead of records, just to see
if that makes a difference. [Of course, if
DUs themselves incur a significant overhead,
record arguments or not, then it's a lost cause.
I don't see how you can have efficient functional
programming without efficient DU's.]

By NewFSharpUser on 2/10/2011 9:50 AM (permalink)

Having F# be twice as slow (assuming you don't get better) isn't necessarily bad at all, if you balance this out with the fact the code is now part of the ".Net ecosystem". F# objects can be used from any other .Net language. If that doesn't bring you any advantages re: interoperability, customers, etc., and a stand-alone processor is fine for you, then maybe your code is good as-is - which is always nice.

If a real-world task takes 120ms instead of 60ms, then the user probably won't even notice. If it takes six hours instead of three - you leave work early either way. If it takes two minutes instead of one, that could be annoying - funny how that works :)

But F# is a very efficient language in most respects.

Not sure how the overhead of tuple creation and unpacking compares to your use of records.

However, it could be interesting to have an attribute to control the representation of records and DU's to use more efficient schemes if possible, such as structs and classes with mutable public fields for recs, and tag/union storage rather than classes for DU's.

By anotheraccount on 2/10/2011 11:12 AM (permalink)

I see what you're saying, but I'm not so sure this
applies in my case. As I said initially, this is
a benchmark I threw together quickly in an attempt
to replicate the problem in my own code. In my own
code, the F# version is about 4-5 times slower than
SML-NJ, at best (some parts slow down even more).
Also, keep in mind that anything written in SML-NJ
can be compiled by MLton for free; SML-NJ is used for
development and MLton cranks out the final executable
at the end. Given that MLton is anywhere from 2-4
to 10+ times faster than SML-NJ (depending on the code),
this means that the F# version would be anywhere from
10 to 40-50+ times slower than what I would get with SML.
That's too big a hit for any serious interpreter to take.
I'd love to continue the F# project, but unless I have
some revelation soon about what's causing the poor
performance on AST manipulation, I'll have to either
switch to OCaml or stick with SML. [ocamlopt, btw, according
to my tests so far, is (very) marginally better than
SML-NJ, but pretty significantly worse than MLton.]

By NewFSharpUser on 2/10/2011 12:25 PM (permalink)

Which .NET version are you using?

If you are not yet using .NET 4, you might benefit from an upgrade. See [link:flyingfrogblog.blogspot.com]

By wmeyer on 2/17/2011 3:27 PM (permalink)

The string reversal not being the point notwithstanding, I cannot reproduce your result that the code I provided was slower than your original code. (It really can't be, given all it's doing.) The code below shows it to be about 3x faster while producing ~15% of the garbage. (And initializing the StringBuilder with the size of the string speeds things up a little more, to ~700 ms for one million reps.)

let stringToCharArray(str:string) = str.ToCharArray()
let stringToCharList(str:string) = Array.toList(stringToCharArray(str))
let explode = stringToCharList
let implode(cl) = 
    let buf = new System.Text.StringBuilder()
    let _ = buf.Append(List.toArray(cl))
    buf.ToString()
let rev1(s) = implode(List.rev(explode(s)))

let rev2 (str:string) = 
 let sb = new System.Text.StringBuilder() 
 for i in str.Length-1 .. -1 .. 0 do
  ignore <| sb.Append( str.[ i])
 sb.ToString()

let repeat n f = 
    for i in 1 .. n do ignore (f())
let testStr = "A" + (new string('-',100)) + "Z" 

> repeat 1000000 (fun() -> rev1 testStr);;
Real: 00:00:02.528, CPU: 00:00:02.464, GC gen0: 1001, gen1: 1, gen2: 0

> repeat 1000000 (fun() -> rev2 testStr);;
Real: 00:00:00.851, CPU: 00:00:00.842, GC gen0: 152, gen1: 1, gen2: 0

I mention this because if you're seeing this make your code take "three times as long", then something else may be amiss in the timings.
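
The pre-sized variant mentioned in passing above is not shown in the post. A minimal sketch of what it might look like follows; the name `rev3` is made up here, and the only change from rev2 is passing the string length as the initial capacity to the StringBuilder constructor:

```fsharp
// Like rev2, but the StringBuilder is created with an initial capacity
// equal to the input length, so its buffer need not grow while appending.
let rev3 (str : string) =
    let sb = System.Text.StringBuilder(str.Length)
    for i in str.Length - 1 .. -1 .. 0 do
        sb.Append(str.[i]) |> ignore
    sb.ToString()
```

It can be timed the same way as the others, e.g. `repeat 1000000 (fun() -> rev3 testStr);;` in fsi.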

By anotheraccount on 2/10/2011 7:40 AM (permalink)

Reply to anotheraccount:

Thanks for your latest email. I think there may be a bit
of a misunderstanding here. If you notice my previous reply
to your original post, you'll see that it only refers to
your suggested definitions of implode and explode, *not*
to your ground-up definition of rev in terms of StringBuilder.
So the triple times were in reference to an implementation
that used your implode and explode but kept the same
definition of string reversal:

implode(List.rev(explode(s))).

This was done in order to keep the comparison with SML
fair, since that's how reversal is done in SML, by
imploding the reverse of the string's explosion.

When I use your ground-up reversal implementation
in terms of StringBuilder, I do get a drastic
speed-up, about twice as fast, if I recall correctly
from last night. But of course that's not particularly
relevant to the situation: Given how reversal is done
in SML, the same reversal algorithm must also be
used in F#: implode - List.rev - explode. It's
fine for the purposes of this comparison to give
better F# definitions of implode and explode,
but not so for string reversal. So there's nothing
amiss in the timings.

I hope this clears things up a bit - thanks.

By NewFSharpUser on 2/10/2011 8:13 AM (permalink)

Got it, thanks.
If the goal is to get your code running to your satisfaction in F#, then ISTM you should use idiomatic/efficient code in F# where possible -- efficient constructs always differ between languages.
The style of the clearest and fastest F# port may not be the best for your other languages either.
Because common operations like string reversal are done this way in language B, I'm sure they make sure these operations are very quick - but .Net provides a plethora of native string operations, so one is expected to go that route.
If we remove issues like string reversal and perhaps tail-calls (by using higher-order fn's to traverse your AST instead of recursion if possible, which may also look clearer) we may be able to see a 1:1 comparison of the actual AST xform.

DU's and Records do come with overhead of full .Net classes with properties (getter and setter fn's) rather than direct field accessors, so there may be some perf issues compared to some other languages - but I think you should be able to get the perf to where you want it. But if you're wanting an auto-magic port of the code maybe with some search-and-replace to get things going, maybe not.

By anotheraccount on 2/10/2011 8:38 AM (permalink)

I can't speak to the rest at a glance, but this certainly is the scariest string reversal I've ever seen :) - though I get where it's coming from:

 
let stringToCharArray(str:string) = str.ToCharArray()
let stringToCharList(str:string) = Array.toList(stringToCharArray(str))
let explode = stringToCharList
let implode(cl) = 
    let buf = new System.Text.StringBuilder()
    let _ = buf.Append(List.toArray(cl))
    buf.ToString()
let rev(s) = implode(List.rev(explode(s)))

More idiomatic versions of the above could be:


let implode str = str |> List.fold (fun a b -> a + string b) ""
let explode str = str |> Seq.toList

However, if you're going to break out StringBuilder anyway (and you're much better off with it than creating lists), try replacing all of the above with:

 
let rev (str:string) = 
 let sb = new System.Text.StringBuilder()
 for i in str.Length-1 .. -1 .. 0 do
  ignore <| sb.Append( str.[ i])
 sb.ToString()

That may or may not address the AST xform as a whole, but it couldn't hurt.

By anotheraccount on 2/9/2011 8:18 PM (permalink)

Thanks for the feedback.

Unfortunately, your suggested definitions of
implode and explode fare much worse - F#
takes 3 times as long with them (about 62
seconds on my laptop).

At any rate, the point is not to come up with
a better definition of string reversal. Obviously
there are more efficient implementations. The
point is to compare MLton and SML-NJ with F#
on how they handle recursing down the AST.
What they do on the AST nodes is completely
immaterial as long as it's the same in both
cases. And the SML code does use the same
definition of string reversal:

fun rev(s) = implode(List.rev(explode(s)))

Now, perhaps MLton and SML-NJ have more efficient
implementations of implode and explode than what
I've given here for F#, but I doubt that that's the gist
of the problem. For instance, here's a more efficient
implementation of explode in F#:

let explode(str:string) =
    let rec makeList(i,res) =
        if i < 0 then res else makeList(i-1,(str.[i])::res)
    makeList(str.Length - 1,[])

Using this version makes little difference, shaving off only
about 1 second of running time: I got 19.2 seconds vs. the
original 20.5 seconds. Perhaps a more efficient version
of implode might shave off another second (or even two).
But that's still a very long way from the 10.4 of SML-NJ,
let alone the 3.7 of MLton. So there's something else
going on here.

By NewFSharpUser on 2/9/2011 9:37 PM (permalink)

32bit tail calls are much faster than 64bit ones. Are you running this on a 64bit machine?

64bit architectures have a different calling convention which requires complex stack manipulation.

By Faisal Waris on 2/10/2011 7:42 AM (permalink)

On my machine a simple change to implode shaves about 15% off of the running time:

let implode(cl) = System.String(Array.ofList cl)

Off hand, I'm not sure what is causing the remainder of the difference in performance with SML-NJ (I don't think it's fair to compare F# against MLton, which is a whole program compiler). As always when characterizing performance, using a good profiler should give you an idea of where to start.

On another note, if you're using fsi, use the #time;; directive rather than your timing function (and you should probably be using the System.Diagnostics.Stopwatch class anyway rather than a DateTime value, although for operations which are this slow it won't make any difference).
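
As a rough illustration of the Stopwatch suggestion, here is a minimal sketch of a timing helper. The helper name `time` and the sample workload are made up for this example and do not come from the original benchmark:

```fsharp
open System.Diagnostics

// Hypothetical helper: runs f once and returns its result together
// with the elapsed wall-clock time in milliseconds.
let time (f : unit -> 'a) =
    let sw = Stopwatch.StartNew ()
    let result = f ()
    sw.Stop ()
    result, sw.ElapsedMilliseconds

// Example use with a trivial workload.
let sum, ms = time (fun () -> List.sum [1 .. 1000])
printfn "sum = %d, elapsed = %d ms" sum ms
```

Returning the result alongside the timing keeps the measured computation from being discarded by accident; for the benchmark itself, the traversal loop would go inside the lambda.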

By Keith Battocchi on 2/9/2011 10:03 PM (permalink)

Yes, as I said, the string reversal was not meant to be more than a micro-optimization; the fact that it turned out not to be even that is vexing. I find it hard to believe that the imperative StringBuilder code is less efficient than consing and joining and converting the char list - I'll have to run that test myself... (It could be that while it looks bad, they are all natively implemented and fare better, though they'd still create more garbage.)
Please keep this thread up to date w/your findings...

By anotheraccount on 2/9/2011 9:48 PM (permalink)
