It's been years since I've messed with NESL, but IIRC the flattening transforms are necessary for performance by reducing the constant factors (which allowed the interpreted VCODE+CVL to perform as well as Fortran+MPI) but don't affect the complexity metrics. But I'm not sure that the standard F# libraries are well suited for this type of data-parallel programming.

As you mentioned, the NESL qsort translates pretty straightforwardly to F#:

1
2
3
4
5
6
7
8
9
let r = new System.Random();
let rec qsort (s : int[]) =
    if s.Length < 2 then s
    else let p = s.[r.Next(s.Length)] in
         let les = Array.filter ((>)p) s
         let eqs = Array.filter ((=)p) s
         let ges = Array.filter ((<)p) s
         let svs = [qsort les ; qsort ges] in
            Array.append svs.[0] <| Array.append eqs svs.[1]

Using the Async and PSeq modules the NESL code can be straightforwardly translated as:

1
2
3
4
5
6
7
8
9
10
11
let rec pqsort (s : int[]) =
    if s.Length < 2 then s
    else let p = s.[r.Next(s.Length)] in
         let parts = Async.Parallel [async {return PSeq.filter ((>)p) s |> PSeq.toArray} ;
                                     async {return PSeq.filter ((=)p) s |> PSeq.toArray} ;
                                     async {return PSeq.filter ((<)p) s |> PSeq.toArray}]
                     |> Async.RunSynchronously
         let les, eqs, ges = parts.[0], parts.[1], parts.[2]
         let svs = Async.RunSynchronously (Async.Parallel [async {return pqsort les} ;
                                                           async {return pqsort ges}])
         Array.append svs.[0] <| Array.append eqs svs.[1]

This should be pretty close to the NESL version's parallelism, if quite a bit wordier. It does make the various flavors of parallelism explicit: parallel filters, the three filters themselves can be executed in parallel, and the recursive sorts can be done in parallel. But it's really impressively slow - when I run this on an array of 100 elements it takes ~ 15 seconds to complete on a 2.26 ghz core duo 2 w/ 4g RAM. And I'm pretty sure that the lack of flattening isn't the source of the slowness. #time shows the following:

1
Real: 00:00:15.028, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0

The flattening transformations would have changed the code into a version that did the filtering and sorting without needing to create the temporary arrays, but there's no GC going on so it's not like this unflattened version is consing itself to death or anything. Now I'm not an F# wizard, so maybe I'm doing something catastrophically wrong with my translation. But it's pretty disappointing to say the least.

By on 8/9/2010 8:42 PM ()

thanks so much for your help. This gets me started.

I'll look in to Parallel Link too.

Dave

By on 8/10/2010 2:20 PM ()

I haven't looked deeply, but for starters I would check out the Parallel LINQ stuff in the PowerPack. PSeq.map and PSeq.filter are some of the needed operations.

By on 8/9/2010 12:59 PM ()
IntelliFactory Offices Copyright (c) 2011-2012 IntelliFactory. All rights reserved.
Home | Products | Consulting | Trainings | Blogs | Jobs | Contact Us | Terms of Use | Privacy Policy | Cookie Policy
Built with WebSharper