Unfortunately, afaik F# does not support user-defined compiler rewrite rules like Haskell. Therefore, unless Don & co. find time to provide optimizations using this technique, we are not likely to see them implemented in F#.

By the way, that was a great paper: I get how List/Seq/Array code could be made significantly faster with these techniques. I've just added most of the Seq operators to Jscript Enumerator (for work), as well as implemented most of Array/List on Jscript Array, so this was a very timely article. Thanks!

By on 6/28/2008 7:29 PM ()

That's not entirely true. I believe F#'s annotations feature should give us some of the things we would need. But the question was posed to find out not only whether we could implement stream fusion but whether it is needed. Does the F# compiler suffer from the problem of generating a lot of intermediate structures to support the list/seq/array manipulation done in F#? If it does, then I think a good feature request would be to enable stream fusion at some level for pure functions.

By on 6/29/2008 11:03 AM ()

Does the F# compiler suffer from the problem of generating a lot of intermediate structures to support a lot of the list/seq/array manipulation done in F#? If it does then I think a good feature request would be to enable stream fusion at some level for pure functions.

Not sure about allocating intermediate structures, but the following code processes each stage of the pipeline independently: first all elements are mapped, then all elements are filtered, then all elements go through reduce_left. From what I read of Stream Fusion, a fusing compiler should be able to iterate over the elements once and apply a fused set of functions to the stream, saving both iteration overhead and intermediate structures.

#light

let test lst =
    lst
    |> List.map (fun x -> x * 2)
    |> List.filter (fun x -> x < 100)
    |> List.reduce_left (+)

test [1;2;3;4;5;6]

Stream Fusion is cool stuff, and a good read... even I could understand it!

-- Jay

By on 6/30/2008 2:01 PM ()

The only way to make that code better would be to do the following

#light

let test (xs : seq<int>) =
    xs
    |> Seq.map (fun x -> x * 2)
    |> Seq.filter (fun x -> x < 100)
    |> Seq.fold (+) 0

test (seq { for i in 1..6 do yield i })

in which case I imagine each step in the pipeline should process the first item from the sequence, then the second, and so forth. The problem is that filter could consume more than one item from the pipeline before emitting anything for the rest of the pipeline to make use of. You can do essentially the same thing with lazy lists if they are described using LazyList.unfold; that should, at least in my mind, work the same way as a Stream. It still suffers from the filter problem I outlined above, which Streams are not susceptible to since they have a Skip node. Also, LazyLists need to memoize their results once the head of the rest is calculated, something Streams also don't do.
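To make the Skip point concrete, here is a minimal F# sketch of the paper's stream representation (the type and function names here are mine, not the paper's): filter emits a Skip step instead of looping internally, so its stepper stays non-recursive, and the single recursive loop lives in the consumer.

```fsharp
// Step: one advance of a stream. Skip lets filter reject an element
// without recursing; the consumer simply asks for the next step.
type Step<'a, 's> =
    | Done
    | Yield of 'a * 's
    | Skip of 's

// A stream pairs a stepper function with its current state.
type Stream<'a, 's> = Stream of ('s -> Step<'a, 's>) * 's

let ofList (xs : 'a list) =
    let step = function
        | []      -> Done
        | y :: ys -> Yield (y, ys)
    Stream (step, xs)

// Non-recursive stepper: a rejected element becomes Skip, not a loop.
let sfilter p (Stream (step, s0)) =
    let step' s =
        match step s with
        | Done          -> Done
        | Skip s'       -> Skip s'
        | Yield (x, s') -> if p x then Yield (x, s') else Skip s'
    Stream (step', s0)

// The one recursive loop lives in the consuming fold.
let sfold f acc (Stream (step, s0)) =
    let rec loop acc s =
        match step s with
        | Done          -> acc
        | Skip s'       -> loop acc s'
        | Yield (x, s') -> loop (f acc x) s'
    loop acc s0
```

For example, `sfold (+) 0 (sfilter (fun x -> x % 2 = 0) (ofList [1..6]))` traverses the list exactly once, with the skipped odd elements never causing filter itself to loop.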

The other thing to note: the only rewrite rule I saw in the entire paper was

forall s. stream(unstream s) -> s

Aside from that I didn't see any other rewrite rules, at least for the basics shown in the paper. If you needed to build something more complicated, where that rewrite rule wouldn't eliminate all the intermediate Step constructor allocations, then you would need a full rewrite facility. I would like to have a good enough understanding of annotations to see if they could be used to implement such a rewrite-rule facility; until then I guess I can just sit back and dream.

By on 6/30/2008 8:24 PM ()

I remain unconvinced by geneverov's reply: unless I am mistaken, idiomatic F# could still benefit quite a bit from applying techniques from the Stream Fusion paper.

Let's look at the native x86 code generated for the following, rewritten from the example above so that it compiles on F# CTP (1.9.6.2).

#light

let test (xs : int seq) =
    System.Diagnostics.Debugger.Break()
    xs
    |> Seq.map (fun x -> x * 2)
    |> Seq.filter (fun x -> x < 100)
    |> Seq.reduce (+)

let v = test ( seq {for i in 1..6 do yield i} )
printfn "%i" v

Stepping through the optimized code shows the following disassembly for the map call:

let test (xs : int seq) =
    System.Diagnostics.Debugger.Break()
    xs
    |> Seq.map (fun x -> x * 2)
00000000  push        esi  
00000001  push        eax  
00000002  mov         dword ptr [esp],ecx 
00000005  mov         esi,edx 
00000007  cmp         dword ptr ds:[000D2E08h],0 
0000000e  je          00000015 
00000010  call        79D47B47 
>00000015  nop              
00000016  mov         eax,esi 
00000018  add         eax,eax 
0000001a  pop         ecx  
0000001b  pop         esi  
0000001c  ret 

Stepping again shows the following code for the filter:

let test (xs : int seq) =
    System.Diagnostics.Debugger.Break()
    xs
    |> Seq.map (fun x -> x * 2)
    |> Seq.filter (fun x -> x < 100)
00000000  push        esi  
00000001  push        eax  
00000002  mov         dword ptr [esp],ecx 
00000005  mov         esi,edx 
00000007  cmp         dword ptr ds:[000D2E08h],0 
0000000e  je          00000015 
00000010  call        79D47B17 
>00000015  nop              
00000016  cmp         esi,64h 
00000019  setl        al   
0000001c  movzx       eax,al 
0000001f  pop         ecx  
00000020  pop         esi  
00000021  ret              

And, finally, the reduce:

let test (xs : int seq) =
    System.Diagnostics.Debugger.Break()
    xs
    |> Seq.map (fun x -> x * 2)
    |> Seq.filter (fun x -> x < 100)
    |> Seq.reduce (+)
00000000  push        esi  
00000001  push        eax  
00000002  mov         dword ptr [esp],ecx 
00000005  mov         esi,edx 
00000007  cmp         dword ptr ds:[000D2E08h],0 
0000000e  je          00000015 
00000010  call        79D47A8F 
00000015  nop              
00000016  mov         eax,dword ptr [esp+0Ch] 
0000001a  add         eax,esi 
0000001c  pop         ecx  
0000001d  pop         esi  
0000001e  ret         4    

This appears to clearly depict overhead for each independent stage in the pipeline. I think the CLR does some magic by pushing most of the sequence onto the stack up front, then unwinding the stack as it calculates each stage, leaving some intermediate values in registers. Nonetheless, unnecessary overhead (compared to ideal native code) clearly exists here.

IMHO, the core concept of the Stream Fusion paper is to transform lists into streams (similar to sequences), yes, but then to <i>also</i> combine (fuse) all the pipeline stages into a <i>single</i> function call that is optimized as a whole. This is possible because each stream "element" is normalized into essentially an "element option", as in Some(element) or None, allowing filtering to be reduced to an if guard. By fusing the pipeline, it should be possible to eliminate the setup for two of the three calls on each loop iteration, including a number of pushes, pops, and calls. I suspect that inlining and standard optimization could further improve the pipeline's performance. Finally, while I did not directly witness superfluous objects being created in this simple example, the fact that each pipeline stage appeared independent leads me to suspect that there are optimizations, and attendant performance gains, to be had here too for a Stream Fusing compiler. -- Jay
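For comparison, here is a hand-fused version of that pipeline, my sketch of roughly what a fusing compiler could produce: one loop, the filter reduced to an if guard, and no per-stage call setup. (It seeds the accumulator with 0 rather than the first element as Seq.reduce does, which is equivalent for addition.)

```fsharp
// Hand-fused map/filter/reduce: a single traversal of the source.
let testFused (xs : int seq) =
    let mutable acc = 0
    for x in xs do
        let y = x * 2          // map stage, inlined
        if y < 100 then        // filter stage, now just a guard
            acc <- acc + y     // reduce stage
    acc
```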

By on 9/21/2008 6:05 PM ()

IIRC, if you have something like this

map f $ filter g $ xs

the technique in the paper does not (at least not in general) fuse the definitions of f and g.

The usefulness of the paper in Haskell is that it eliminates intermediate constructions of Cons cells.

Compositions of simple seq functions in F# (e.g., Seq.map, Seq.filter) do not cause intermediate data construction because the methods MoveNext and get_Current are similarly composed.
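That composition is easy to observe: nothing in the pipeline below allocates an intermediate collection, or runs at all, until the fold enumerates it (the mutable counter is my own instrumentation, not part of any library API).

```fsharp
// Count how many times the mapping function actually runs.
let mutable calls = 0

let pipeline =
    [1..6]
    |> Seq.map (fun x -> calls <- calls + 1; x * 2)
    |> Seq.filter (fun x -> x < 100)

// Still 0 here: Seq.map/Seq.filter only wrapped the enumerator.
let callsBeforeFold = calls

let total = Seq.fold (+) 0 pipeline
// During the fold, each element flowed through both stages exactly once.
```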

By on 9/23/2008 7:55 AM ()

The paper presents a technique for eliminating intermediate results in a composition of list functions by transforming list functions into stream functions. For the most part, streams from this paper are seq/IEnumerables in F#. Since F# programmers commonly and directly program in terms of streams, having the compiler transform and optimize list code in terms of streams is a little extravagant for the complexity it adds.

Even if your underlying data structure is an array or list you can still process it as a stream in F# through sub-typing, hence avoiding intermediate results.

let test xs =
    xs
    |> Seq.map (fun x -> x * 2)
    |> Seq.filter (fun x -> x < 100)
    |> Seq.fold (+) 0
test [| 1..6 |]

The compiler optimizations that make this paper work are not primarily user-defined rewrite rules, but standard general-purpose optimizations performed internally by GHC. It is not really feasible for FSC to perform many of these optimizations because F# is not a pure language.

IEnumerable's MoveNext and Current are analogous to the stepper functions in this paper. The purpose of Skip is to make the stepper functions non-recursive, enabling inlining optimizations. For IEnumerable, MoveNext can be recursive because we can't optimize it in the same way anyway. Additionally, the combination of MoveNext and Current avoids the need to construct an intermediate Step value.
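A minimal sketch of that recursive advance (the function name is mine): a filtering enumerator's MoveNext keeps pulling from its source until it finds a match or the source ends, rather than handing back a Skip step.

```fsharp
open System.Collections.Generic

// MoveNext for a filtered view: recurse past non-matching elements.
let rec moveNextFiltered (p : 'a -> bool) (e : IEnumerator<'a>) =
    if not (e.MoveNext ()) then false      // source exhausted
    elif p e.Current then true             // e.Current is the next match
    else moveNextFiltered p e              // skip it and keep pulling
```

With an enumerator over [1..6] and an even-number predicate, the first call advances straight past 1 and leaves Current at 2, all inside one MoveNext.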

So it's a good idea for Haskell, but not so much for F#. Essentially it's the same old trade-off between imperative and functional: functional makes code easier to analyse for optimizations; imperative means you can just make the code faster yourself.

By on 7/3/2008 1:49 AM ()

Thanks for the explanation.

By on 7/3/2008 6:16 AM ()

Thanks for the explanation.

By on 7/3/2008 5:28 AM ()
IntelliFactory Offices Copyright (c) 2011-2012 IntelliFactory. All rights reserved.