It looks like there are a few things you can do to speed up the more functional version. First of all, in your inner for loop, you don't need to explicitly create a list; just use "for i in int64 n * int64 n..int64 n..int64 max" instead. This will avoid lots of allocations, and on my machine speeds up the overall code by roughly 30%.

To eke more performance out of the code, you'll need to make more dramatic changes. The main list comprehension appears to be what slows things down. The following refactoring runs much faster on my computer (but is still about 4 times slower than the imperative version):

let getPrimes1 max = 
  let primes = new BitArray(max+1, true)
  seq { 2 .. max } |>
  Seq.filter (fun n -> 
    if primes.[int n] then
      for i in int64 n * int64 n..int64 n..int64 max do primes.[int i] <- false
    primes.[int n])
By on 2/25/2009 8:49 AM ()

It looks like there are a few things you can do to speed up the more functional version ... on my machine speeds up the overall code by roughly 30% ... The following refactoring runs much faster on my computer (but still about 4 times slower than the imperative version):

let getPrimes1 max = 
  let primes = new BitArray(max+1, true)
  seq { 2 .. max } |>
  Seq.filter (fun n -> 
    if primes.[int n] then
      for i in int64 n * int64 n..int64 n..int64 max do primes.[int i] <- false
    primes.[int n])

If one were really worried about saving 30% of processing time, one would not use the packed BitArray object, which reduces memory use by a factor of eight at the cost of roughly a 350% increase in processing time; even two megabytes is immaterial on a modern personal computer. If one were even a little concerned, one would modify this implementation of the Sieve of Eratosthenes to scan only the odd numbers, for a reduction of half the memory requirements and some of the processing required to cull the even values, finally adding the value 2 as the only even prime. One would also cull only for primes up to the square root of the maximum value checked, since no culls are required above that point.

The above ideas are embodied as follows:

let GetOddPrimes max =
  let LSTNDX = (max - 3) / 2
  let LSTCULNDX = (int (sqrt (double max)) - 3) / 2
  let inline prime i = i + i + 3
  let inline strtndx i = ((prime i * prime i) - 3) >>> 1
  let buf = Array.create (LSTNDX + 1) true
  {0 .. LSTCULNDX}
    |> Seq.filter (fun i -> buf.[i])
    |> Seq.iter (fun j -> seq {strtndx j .. prime j .. LSTNDX} |> Seq.iter (fun j -> buf.[j] <- false))
  {0 .. LSTNDX} |> Seq.filter (fun i -> buf.[i]) |> Seq.map (fun i -> i + i + 3)

Of course, no current implementation of functional programming techniques is going to be as fast as imperative loops for this application, as the method calls required by the IEnumerable implementations add a lot of overhead (comparatively) in the required tight loops. This is especially bad in even the current CTP 2.0 version of F#, due to a very inefficient implementation of the Core run time library, which uses a succession of generalized virtual method calls to implement the required Range IEnumerator methods in order to keep the Range objects suitable for general use. Much optimization needs to be done in this area, as even using equivalent algorithms F# is about three times slower than the more highly optimized C#.
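Comparisons like these can be reproduced with a simple Stopwatch harness along the following lines. This is only a sketch: the `time` function and the trial sieve passed to it below are placeholders for whichever implementations you are comparing, and a serious benchmark would also warm up the JIT and take the best of several runs.

```fsharp
open System.Diagnostics

// Minimal timing harness (a sketch): times one full enumeration of a sieve
// function and reports the elapsed time. Returns the prime count so the
// result cannot be optimized away.
let time label (sieve: int -> seq<int>) limit =
    let sw = Stopwatch.StartNew()
    let count = sieve limit |> Seq.length   // force full enumeration
    sw.Stop()
    printfn "%s: %d primes in %d ms" label count sw.ElapsedMilliseconds
    count
```
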

For instance, the following C# method using an iterator is functionally identical to the above F# code but runs a full 20 times as fast (!!!):

IEnumerable<long> GetPrimes(int max) {
  bool[] nonprimes = new bool[(max - 3) / 2 + 1];
  int culllim = ((int)Math.Sqrt((double)max) - 3) >> 1;
  int lim = nonprimes.Length;
  for (int i = 0; i <= culllim; ++i)
    if (!nonprimes[i]) {
      int prime = i + i + 3;
      for (int j = (prime * prime - 3) >> 1; j < lim; j += prime)
        nonprimes[j] = true;
    }
  for (int i = 0; i < lim; ++i)
    if (!nonprimes[i])
      yield return i + i + 3;
}

Even the following F# code where some of the elegance of the functional style has been sacrificed for imperative loops as per the C# method is still over 8 times slower (!!!):

let GetOddPrimes(max) : seq<int> =
  let LSTNDX = (max - 3) / 2
  let LSTCULNDX = (int (sqrt (double max)) - 3) / 2
  let inline prime i = i + i + 3
  let inline strtndx i = ((prime i * prime i) - 3) >>> 1
  let buf = Array.zeroCreate (LSTNDX + 1)
  for i = 0 to LSTCULNDX do
    if not buf.[i] then
      let mutable j = strtndx i
      let p = prime i
      while j < buf.Length do
        buf.[j] <- true
        j <- j + p
  seq { for i = 0 to buf.Length - 1 do
          if not buf.[i] then
            yield prime i }

To show how inefficient the current sequence implementation is, the following is the same routine as above but the final sequence is implemented by a custom IEnumerable/IEnumerator, resulting in a speed that is just the same as doing the same job using C#:

//enum used internally
type internal State =
  | Reset = 0
  | Running = 1
  | Finished = 2

let GetOddPrimes(max) : seq<int> =
  let LSTNDX = (max - 3) / 2
  let LSTCULNDX = (int (sqrt (double max)) - 3) / 2
  let inline prime i = i + i + 3
  let inline strtndx i = ((prime i * prime i) - 3) >>> 1
  let buf = Array.zeroCreate (LSTNDX + 1)
  for i = 0 to LSTCULNDX do
    if not buf.[i] then
      let mutable j = strtndx i
      let p = prime i
      while j < buf.Length do
        buf.[j] <- true
        j <- j + p
  //define own sequence implemented using object expressions
  let seqi =
    let i = ref -1
    let st = ref State.Reset
    { new IEnumerator<int> with
        member this.Dispose() = ()
        member this.Current with get() = prime !i
      interface IEnumerator with
        member this.Reset() =
          i := -1
          st := State.Reset
        member this.MoveNext() =
          if !st = State.Running then
            i := !i + 1
            while !i <= LSTNDX && buf.[!i] do
              i := !i + 1
            if !i >= LSTNDX then
              st := State.Finished
            if !i > LSTNDX then
              false
            else
              true
          elif !st = State.Finished then false
          else // reset
            i := !i + 1
            while !i <= LSTNDX && buf.[!i] do
              i := !i + 1
            if !i > LSTNDX then
              st := State.Finished
              false
            else
              st := State.Running
              true
        member this.Current with get() = (prime !i) :> obj
    }
  let genseqnc =
    { new IEnumerable<int> with
        member this.GetEnumerator() = seqi
      interface IEnumerable with
        member this.GetEnumerator() = (seqi :> IEnumerator)
    }
  genseqnc

In addition, even in the imperative looping style, it seems that the current version of F# does not perform the optimization that eliminates array range checks when the loop variable is provably restricted to within the range of the array, costing a further factor of more than two in computational performance for highly optimized use.

In short, until the run time library is rewritten and more attention is paid to code optimization in the F# compiler, F# will be unsuitable for applications where high computing performance is a requirement unless one implements many of the run time objects themselves.

In other words, if one were really concerned with saving an extra 30% in computational time, they would not use F#, or at least in its current form without a lot of extra overhead code!

By on 3/4/2011 11:22 PM ()

The way you have written it, strtndx in the F# version isn't as efficient as its expanded C# counterpart (i + i + 3 is computed twice).

Regarding your comment that the F# compiler does not perform the optimization allowing the CLR to remove bounds checks: I would be surprised if the C# compiler and the F# compiler differ in this respect. It's not obvious that culllim is always smaller than lim.

By on 3/6/2011 12:50 AM ()

The way you have written it, strtndx in the F# version isn't as efficient as its expanded C# counterpart (i + i + 3 is computed twice).

"strtndx" pays little penalty in speed, as it and the "prime" function it uses are written as "inline", which means they are inserted directly into the code with no stack-based function call. Also, the compiler will internally cache a "closure" of the value of "prime" the first time it is computed, and that is what will be used from there on for a given value of "i". Even if this were not true, the time impact of computing "prime" twice for "strtndx" - one or two additional addition operations - is negligible for the whole routine, which spends the majority of its time inside the loop culling composite numbers from the array and the loop scanning the array for the prime values. The "strtndx" is calculated only once per prime number less than the square root of max, whereas the inner culling loop runs for about 2.25 times max iterations and the prime scanning and output loop runs one half of the max value times.

For this negligible gain in computational speed, you could expand "prime" and combine terms to get the following:

let inline strtndx i = (((i + 3) * i) <<< 1) + 3
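As a quick sanity check on that algebra (with prime i = 2i + 3, we have (prime i * prime i - 3) / 2 = 2i(i + 3) + 3), the two forms can be verified to agree; `strtndx2` below is just a hypothetical name for the combined form:

```fsharp
let inline prime i = i + i + 3
let inline strtndx i = ((prime i * prime i) - 3) >>> 1   // original form
let inline strtndx2 i = (((i + 3) * i) <<< 1) + 3        // expanded, terms combined

// the two forms agree for every index in a reasonable range
let agree = { 0 .. 10000 } |> Seq.forall (fun i -> strtndx i = strtndx2 i)
```
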

Regarding your comment that the F# compiler does not perform the optimization allowing the CLR to remove bounds checks: I would be surprised if the C# compiler and the F# compiler differ in this respect. It's not obvious that culllim is always smaller than lim.

You are correct that the loop that counts up to "culllim" will not have its range check optimized away, but that isn't the major time consuming loop of the routine. The loops firstly using "j" to cull the composite numbers from the "nonprime" array up to its maximum limit and secondly the final "i" loop scanning the "nonprimes" array for the remaining primes are where all the time is spent and those are optimized by the C# compiler to eliminate the range check when the correct form of specification is used. This was an early optimization which has been documented for C#.

Thus, they **do** in fact differ in that if one writes the loops as I did in my C# equivalent, with a "lim" limit set to exactly "buf.Length", the bounds check is removed (as it turns out, probably by the Just In Time - JIT - compiler). However, for either of the imperative loop forms in F# imperative code, the bounds check is still there, perhaps because the F# compiler doesn't output the IL code in a form that the JIT compiler can optimize. You can find much of this out for yourself by using a disassembler on the resulting code.

This isn't the biggest problem with the current version of F#, as if one is trying to write very much tight imperative code using F# they have probably picked the wrong language; F# is intended for the functional non-imperative style. Once one is using iterators by preference for clarity of style, the extra overhead of the range checks becomes relatively negligible compared to the overhead of using the IEnumerator interface, even when it is implemented as efficiently as C# does.

As I have shown in my latest edits for my post to which you refer here, if the standard run time library objects intended to supply the IEnumerator interface implementations were written more efficiently, F# would be just as fast as C# for anything other than for the tightest imperative loops. Even for most imperative loops, the overhead of range checking won't more than double the time per loop for extremely tight loops.

By on 3/6/2011 3:26 AM ()

A few comments.

The loops firstly using "j" to cull the composite numbers from the "nonprime" array up to its maximum limit and secondly the final "i" loop scanning the "nonprimes" array for the remaining primes are where all the time is spent and those are optimized by the C# compiler to eliminate the range check when the correct form of specification is used. This was an early optimization which has been documented for C#.

Thus, they **do** in fact differ in that if one writes the loops as I did in my C# equivalent with a "lim" limit set to exactly the "buf.Length", the bounds check is removed. However, for either of the imperative loop forms in F# imperative code, the bounds check is still there. You can find a lot of this for yourself by using the Red Gate Reflector program or disassembling the resulting code.

I may be misunderstanding what you are saying, but the C# compiler *can't* remove bounds checks (which are inserted by the CLR JIT compiler), so looking at the IL won't help you determine when they are present. It is true that loops going up to the last element in an array will have the bounds check removed (or rather, not inserted) by the JIT compiler, but this also happens for F# loops of the form

for i in 0 .. myArr.Length - 1 do ...

Unfortunately, the code generated for loops of the form

for i in 0 .. step .. myArr.Length - 1 do ...

is significantly worse.
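For reference, here are complete versions of the two loop shapes being compared, as small summing functions. Whether the bounds check is actually elided is a JIT detail, so the comments restate the claim under discussion rather than a guarantee:

```fsharp
// Sums an array with the loop shape that the JIT is said to recognize,
// allowing the array bounds check to be elided.
let sumAll (arr: int[]) =
    let mutable acc = 0
    for i in 0 .. arr.Length - 1 do
        acc <- acc + arr.[i]
    acc

// The stepped form compiles through a Range enumerator instead,
// which is the significantly worse code generation discussed above.
let sumEveryOther (arr: int[]) =
    let mutable acc = 0
    for i in 0 .. 2 .. arr.Length - 1 do
        acc <- acc + arr.[i]
    acc
```
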

This isn't the biggest problem with the current version of F#, as if one is trying to write very much tight imperative code using F# they have probably picked the wrong language; F# is intended for the functional non-imperative style.

I don't want to speak for the language designers, but I don't think that this is true. I believe that F# is intended to be a multi-paradigm language, and there is nothing wrong with using an imperative style within F# when performance requires it.

Once one is using iterators by preference for clarity of style, the extra overhead of the range checks becomes relatively negligible compared to the overhead of using the IEnumerator interface, even when it is implemented as efficiently as C# does.

As I have shown in my latest edits for my post to which you refer here, if the standard run time library objects intended to supply the IEnumerator interface implementations were written more efficiently, F# would be just as fast as C# for anything other than tight imperative loops. Even for most imperative loops, the overhead of range checking won't more than double the time per loop for extremely tight loops.

I agree that it's unfortunate that the F# IEnumerator implementation isn't more performant; perhaps that's something that could be addressed in a future version of the language. However, I think that your other point is quite relevant, too: if performance is critical, then use a different collection type such as an array or mutable list. For instance, getting rid of the seq {} block in the code in this thread in favor of a mutable list yields a huge speedup on my machine:

let primes max =
  let nonprimes = Array.zeroCreate((max - 3)/2 + 1)
  let culllim = int ((sqrt (double max) - 3.) / 2.)
  for i = 0 to culllim do
    if not nonprimes.[i] then
      let prime = i + i + 3
      (* for j in (prime * prime - 3) / 2 .. prime .. nonprimes.Length do
           nonprimes.[j] <- true *)
      let mutable j = (prime * prime - 3) / 2
      while j < nonprimes.Length do
        nonprimes.[j] <- true
        j <- j + prime
  let results = ResizeArray()
  for i in 0 .. nonprimes.Length - 1 do
    if not nonprimes.[i] then results.Add(i + i + 3)
  results
By on 3/6/2011 7:57 AM ()

A few comments.

The loops firstly using "j" to cull the composite numbers from the "nonprime" array up to its maximum limit and secondly the final "i" loop scanning the "nonprimes" array for the remaining primes are where all the time is spent and those are optimized by the C# compiler to eliminate the range check when the correct form of specification is used. This was an early optimization which has been documented for C#.

Thus, they **do** in fact differ in that if one writes the loops as I did in my C# equivalent with a "lim" limit set to exactly the "buf.Length", the bounds check is removed. However, for either of the imperative loop forms in F# imperative code, the bounds check is still there. You can find a lot of this for yourself by using the Red Gate Reflector program or disassembling the resulting code.

I may be misunderstanding what you are saying, but the C# compiler *can't* remove bounds checks (which are inserted by the CLR JIT compiler), so looking at the IL won't help you determine when they are present. It is true that loops going up to the last element in an array will have the bounds check removed (or rather, not inserted) by the JIT compiler, but this also happens for F# loops of the form

for i in 0 .. myArr.Length - 1 do ...

Unfortunately, the code generated for loops of the form

for i in 0 .. step .. myArr.Length - 1 do ...

is significantly worse.

You are correct; I misspoke - it's the JIT compiler that (sometimes) adds the array bounds checks, and I guess I determined that from disassembly of the JIT output code and not from the IL. What is true is that the imperative F# mutable while .. do loop isn't as efficient as the C# forms for doing the same thing, by about 25%. The second "for .. in" form you state is actually using the F# runtime library's version of a sequence, in this case a Range object, and thus has terrible relative performance, which is my gripe.

This isn't the biggest problem with the current version of F#, as if one is trying to write very much tight imperative code using F# they have probably picked the wrong language; F# is intended for the functional non-imperative style.

I don't want to speak for the language designers, but I don't think that this is true. I believe that F# is intended to be a multi-paradigm language, and there is nothing wrong with using an imperative style within F# when performance requires it.

You are correct that there is some capability in F# to write imperative code, but it is quite limited in flexibility: the basic for loop only steps up or down by incrementing and decrementing, with no continue, no break, etcetera - at least as yet. So yes, while one can use an imperative style where one must, the current language specification doesn't make it easy. That wouldn't matter so much if the built-in implementations of sequences, etcetera, were made more efficient.

Once one is using iterators by preference for clarity of style, the extra overhead of the range checks becomes relatively negligible compared to the overhead of using the IEnumerator interface, even when it is implemented as efficiently as C# does.

As I have shown in my latest edits for my post to which you refer here, if the standard run time library objects intended to supply the IEnumerator interface implementations were written more efficiently, F# would be just as fast as C# for anything other than tight imperative loops. Even for most imperative loops, the overhead of range checking won't more than double the time per loop for extremely tight loops.

I agree that it's unfortunate that the F# IEnumerator implementation isn't more performant; perhaps that's something that could be addressed in a future version of the language. However, I think that your other point is quite relevant, too: if performance is critical, then use a different collection type such as an array or mutable list. For instance, getting rid of the seq {} block in the code in this thread in favor of a mutable list yields a huge speedup on my machine:

let primes max =
  let nonprimes = Array.zeroCreate((max - 3)/2 + 1)
  let culllim = int ((sqrt (double max) - 3.) / 2.)
  for i = 0 to culllim do
    if not nonprimes.[i] then
      let prime = i + i + 3
      (* for j in (prime * prime - 3) / 2 .. prime .. nonprimes.Length do
           nonprimes.[j] <- true *)
      let mutable j = (prime * prime - 3) / 2
      while j < nonprimes.Length do
        nonprimes.[j] <- true
        j <- j + prime
  let results = ResizeArray()
  for i in 0 .. nonprimes.Length - 1 do
    if not nonprimes.[i] then results.Add(i + i + 3)
  results

Yes, this poor implementation of the sequence objects is one of the more serious problems for the current implementation of F#. I have submitted this improvement as a suggestion to Microsoft, to which they replied "We have an open suggestion for improving the performance of some of the functions in the Seq module. Your report will increase the priority of this work for our next release. We are now tracking this as 843498 in our database."; however, that was over eight months ago and so far there have been no really new releases addressing this issue.

As you show in your code snippet above and as I show in my self-implementation of IEnumerable/IEnumerator, **any** other implementations that avoid using the Core built-in sequences are much faster. The advantage of redefining Range sequence objects for oneself as compared to using most other types of collection objects is that sequences don't have any memory overhead and don't require that one generate the collection and then scan it (again) to output the results.

In order to exactly parallel the C# IEnumerable implementation, we would really like to use a sequence expression that behaves the same as the C# code loop containing the "yield return" statement, as follows:

  seq { for i = 0 to LSTNDX do if not nonprimes.[i] then yield i + i + 3 }

Unfortunately, the performance of this with the current CTP 2.0 version of F# absolutely stinks!!!

We would be perfectly happy to use the following filtered range statement if performance were reasonable:

  seq { 0 .. LSTNDX } |> Seq.filter (fun i -> not nonprimes.[i]) |> Seq.map (fun i -> i + i + 3)

But the performance of that stinks almost as bad!! Again, this is partly due to the lousy implementation of sequences and sequence support in the current version of F#, but also due to the fact that the sequence needs to be called for every possible index value and not just for the pre-filtered prime indexes, as can be done with the "yield return" form or with the first form above if the same code were generated.

If we want performance and style, we either have to sacrifice more memory for an existing real DotNet collection as you have done, or, for more efficiency in both computational speed and memory requirements, implement our own sequences. In the following code, I use nested object expressions to implement an IEnumerable/IEnumerator pair that has the same result as the "yield return" sequence-producing loop of the C# code, with about the same speed:

type internal State =
  | Reset = 0
  | Running = 1
  | Finished = 2

let GetOddPrimes max =
  let LSTNDX = (max - 3) / 2
  let LSTCULNDX = (int (sqrt (double max)) - 3) / 2
  let nonprimes = Array.zeroCreate (LSTNDX + 1)
  //create a yield sequence based on the nonprimes array using nested object expressions
  let yieldseq =
    let seqienum =
      let i = ref 0
      let st = ref State.Reset
      { new IEnumerator<int> with
          member this.Dispose() = ()
          member this.Current with get() = (!i <<< 1) + 3
        interface IEnumerator with
          member this.Reset() = i := 0; st := State.Reset
          member this.MoveNext() =
            if !st = State.Running then i := !i + 1
            if !st <> State.Finished then st := State.Running
            while !i <= LSTNDX && nonprimes.[!i] do i := !i + 1
            if !i > LSTNDX then st := State.Finished; false
            else true
          member this.Current with get() = (!i <<< 1) + 3 :> obj
      }
    { new IEnumerable<int> with
        member this.GetEnumerator() = seqienum
      interface IEnumerable with
        member this.GetEnumerator() = seqienum :> IEnumerator
    }
  for i = 0 to LSTCULNDX do
    if not nonprimes.[i] then
      let mutable j = (((i + 3) * i) <<< 1) + 3
      let p = i + i + 3
      while j < nonprimes.Length do
        nonprimes.[j] <- true
        j <- j + p
// Equivalent to the C# IEnumerable solution, but performance absolutely stinks at 45 times slower!!!
//  seq { for i = 0 to LSTNDX do if not nonprimes.[i] then yield i + i + 3 }
// The way one might also like to do it, but performance stinks almost as bad at about 20 times slower!!
//  seq { 0 .. LSTNDX } |> Seq.filter (fun i -> not nonprimes.[i]) |> Seq.map (fun i -> i + i + 3)
  yieldseq.GetEnumerator().Reset() // not really required if only running once
  yieldseq // quite high performance from the implementing object expressions

Note that if one wants to call this repetitively, one requires the "...Reset" before enumeration, since only one enumerator object is created and reused as necessary. The above code runs at about the same speed as the equivalent C# code, although that is still approximately twice as slow as the imperative approach due to iterator method call overheads. Note that the current (2010) C# compiler is very efficient at generating iterators using the yield statement, which should be equivalent to the seq expression with the yield keyword in F# if implemented correctly. As it is, the F# version is approximately 45 times slower than the iterator object created automatically by the C# compiler or that I have created by hand here!!!

Also note that although the F# language designers recommend preceding a sequence expression with the "seq" keyword as above, to be consistent with other forms of computation expressions (it is enforced for the "yield" type expression for sequences), the current compiler generates an additional MakeSeq call that adds a redundant level of indirection in calling the resulting sequence and slightly slows the implementation even further. Upon further thinking about this, I realize that the seq expression is a modified special version of computation expressions, but that this botched implementation does even more stacked virtual method calls than plain computation expressions and is thus completely unsuitable for use with sequences in fast iterative loops.

No one would have any problems with the performance of F# if the run time library and compiler were optimized to run at the speed (or slightly better, with slightly more loop optimizations) of the above code, as they should be able to do.

[EDIT-ADD:] Analysis of why these implementations are so slow:

A/ For the ordinary sequence processing chain, as per:

  seq { 0 .. LSTNDX } |> Seq.filter (fun i -> not nonprimes.[i]) |> Seq.map (fun i -> i + i + 3)

Other than adding one extra unnecessary level of indirection, as mentioned above, for the first "seq" (which isn't currently required), this implementation suffers from two time-consuming problems, as follows:

1. The inefficient built-in implementation of the range { 0 .. LSTNDX }: this can be replaced with a custom implementation of an IEnumerable/IEnumerator object for a time saving of almost 300%, from about 35 times slower than an imperative loop doing the same to about 10 times slower.

2. The rest of the extra time is spent in manipulating the call stack for the levels of indirection required to call the nested levels of functions for Seq.filter and especially Seq.map, which takes about 75% of the remaining extra time.

B/ For the sequence expression, as per:

  seq { for i = 0 to LSTNDX do if not nonprimes.[i] then yield i + i + 3 }

This is converted by the F# compiler to a "computation expression" equivalent of the following:

  { 0 .. LSTNDX } |> Seq.collect (fun i -> if not nonprimes.[i] then Seq.singleton (i + i + 3) else Seq.empty)

which is so very slow for the following reasons:

1. The inefficient implementation of the { 0 .. LSTNDX } range sequence, as per the above, with some alleviation of the performance burden possible through a custom version. However, the following takes the majority of the time.

2. The extremely long extra processing for the Seq.collect of the multiple singleton sequences generated by the test condition, as implied by the use of the F# "yield" keyword. This is extremely inefficient compared to the C# "yield return" statement, which just causes the C# compiler to insert the continuation code directly into the MoveNext() method of the resulting iterator implementing the IEnumerable/IEnumerator pair. Just this Seq.collect processing takes approximately 20 times the time required to do the job imperatively. In order to have performance competitive with the C# "yield return" statement, the F# compiler would need to automatically emit equivalent code, which saves time on several fronts for this type of application in that the condition branching logic is inside the MoveNext() method, meaning that there are fewer calls to the method, which greatly reduces the method calling overhead.

Conclusion: if one requires C#-style iterator (sequence, in F#) performance, then one avoids the yield keyword and does a custom IEnumerable/IEnumerator object implementation. For the ultimate in speed in tight loops, one avoids the functional style altogether in favour of the imperative style.

By on 3/6/2011 6:51 PM ()

Well, "awful" is always a subjective assessment, but what I can say for sure is that it doesn't look "functional". It looks like a solution you'd code in C# or C, translated into F#. Which, while not ideal, is not always bad. That's why F# gives you the ability to write imperative code, because sometimes it's just the best way to do things.

That said, if you want to find a way to make it more functional, one thing you could do is find a way to introduce a recursion in order to replace the two loops. A first attempt at this might look like this:

let getPrimes max =
    let primes = new BitArray(max+1, true)
    let max = max |> uint64

    let rec next_prime n results =
        if n > max then results
        elif primes.[int n] then
            let start = n * n
            if start < max then
                let i = ref (int start)
                while uint64 !i < max do
                    primes.[!i] <- false
                    i := !i + int n
            next_prime (n+1UL) (n::results)
        else next_prime (n+1UL) results

    next_prime 2UL []

(Warning: this code may not compile; I don't have a compiler I can test it on where I am.) But this is already better, because we removed some state (the results list) and changed it to an immutable list data type. Instead of "results.Add n" we only need "n::results", which sticks n onto the front of the list in an immutable fashion. Since the list is immutable, we obviously can't declare it at the top-level scope: with no way to modify it, there would be no way to pass information about new primes to each successive recursive call. So instead we add it as an argument to the recursive call, and each time through, if the number is prime we recurse with n::results, and if it's not prime, we just pass on the original results.

Now to get rid of the ref variable in the inner while loop. Really, all you have here is a for loop that a) starts at "start", b) has a step value of "n", and c) stops once the index exceeds max. F# provides a range expression that does exactly this:

for i in start .. n .. max do
    primes.[int i] <- false

Using a few more simplifications, we can now convert the code to:

let getPrimes max =
    let primes = new BitArray(max+1, true)
    let max = max |> uint64

    let rec next_prime n results =
        if n > max then results
        else if primes.[int n] then
            for i in n * n .. n .. max do
                primes.[int i] <- false
            next_prime (n + 1UL) (n :: results)
        else next_prime (n + 1UL) results

    next_prime 2UL []

The BitArray probably offers the best performance you're going to get, especially in terms of memory usage, but if you really want to take it an extra step, try changing this to an immutable data type as well. Perhaps you could use another list here, and then use the List.filter function.
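As a sketch of that last suggestion (hypothetical and untested; the name getPrimesFiltered is mine), one could keep the mutable BitArray just for the culling phase and then recover the primes in a single List.filter pass. This also culls only up to the square root of max, since no composite above that point has an unmarked smallest factor:

```fsharp
open System.Collections

// Sketch only: mutable BitArray for culling, List.filter for the result.
let getPrimesFiltered max =
    let primes = new BitArray(max + 1, true)
    let limit = int (sqrt (float max))
    for n in 2 .. limit do
        if primes.[n] then
            for i in n * n .. n .. max do
                primes.[i] <- false
    // Single immutable pass: keep every n whose bit survived the culling.
    [2 .. max] |> List.filter (fun n -> primes.[n])
```

This returns the primes in ascending order for free, whereas the n::results accumulator above builds them largest-first.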

By on 2/25/2009 8:06 AM ()

Thanks for a comprehensive answer. I've tried your code, it works and is fast. It is much more 'functional programming friendly' than my second attempt (I accept that the sieve is best solved imperatively and that we should probably keep the mutable BitArray).

However, I don't quite see what has gone wrong with my first attempt. I have an almost identical solution written by a friend in C#, and it has very good performance. So my follow-on question is: why does the C# below perform well while the F# below does not?

public IEnumerable<long> GetPrimes(int max)
{
    var nonprimes = new bool[max + 1];
    for (long i = 2; i <= max; i++)
    {
        if (nonprimes[i] == false)
        {
            for (var j = i * i; j <= max; j += i)
            {
                nonprimes[j] = true;
            }
            yield return i;
        }
    }
}


let getPrimes1 max =
    let primes = new BitArray(max+1, true)
    [ for n in [2..max] do
        if primes.[int n] then
            for i in [int64 n * int64 n .. int64 n .. int64 max] do primes.[int i] <- false
            yield n ]

By on 2/25/2009 8:40 AM ()

C# iterator blocks are compiled into state machines, while F# sequences are composed from library components and lambda functions. The C# approach yields better performance, in particular for nested sequence expressions.

When iteration speed really matters, you are better off staying away from sequences (IEnumerables), since iterating over them costs about 1-2 allocations per iteration loop and 2 interface calls per element. See Joe Duffy's blog post for some numbers.
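To illustrate the point with a sketch (hypothetical; the name getPrimesArray is mine, and it mirrors the C# version above rather than any code from this thread): keeping the hot loops on a plain bool array avoids the per-element IEnumerable machinery entirely, paying for enumeration only once when the caller walks the final result:

```fsharp
// Sketch: the culling path touches only a bool array and a mutable index,
// with no seq/list allocation per iteration.
let getPrimesArray max =
    let nonprimes = Array.zeroCreate<bool> (max + 1)
    let results = ResizeArray<int>()
    for i in 2 .. max do
        if not nonprimes.[i] then
            // int64 arithmetic so i*i cannot overflow for large max
            let mutable j = int64 i * int64 i
            while j <= int64 max do
                nonprimes.[int j] <- true
                j <- j + int64 i
            results.Add i
    results
```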

By on 2/25/2009 9:28 AM ()

(Note that in the <i>next</i> release of F#, F# 'seq' blocks will get compiled into state machines, hurray.)

By on 2/25/2009 9:51 AM ()

You guys are my heroes. [Edited: Removed question about generalization to computation expressions, which only makes limited sense.]

By on 2/25/2009 10:06 AM ()

Thanks to all who responded to this thread. The speed and quality of support on HubFS has proved excellent. You all get a mention in Martin's writeup here [link:blogs.msdn.com]

By on 3/4/2009 11:24 PM ()