3.0k
views8
comments

Problem with .NET Tuples, Please Talk Me Down

I love tuples; I got used to using them in C++ and I’m glad they are being added to .NET.

If implemented as a native value type, tuples can be as efficient as the underlying types they store. However, in .NET they are implemented as reference types, requiring heap allocation, function calls for member access, etc.

Now, I understand how an orthogonal, language-independent implementation as a reference type has distinct advantages. I don’t dispute this; it is an acceptable trade-off.

Here is my problem: why implement F# tuples as this reference type, instead of as a language construct? I worry that by damaging the efficiency of tuples in this way, it will impede their use in designs where the only relevant factor should be the language, and not concerns about allocation and access efficiency.

But I’m willing to be talked down from this. Why should I use tuples without worrying about efficiency.

-Neil

Tuples in F# are tuples in .Net.

Once upon a time a number of months ago, the F# team did an experiment where the F# compiler was changed to implement tuples as a value type rather than a reference type in the code it generated (I don't recall if this was for all tuples or just for small ones, e.g. 2- and 3-tuples). We then tried running this modified compiler on a large code base (the compiler itself) to get a feel for the performance impact. The result: no noteworthy change. The moral is that it is by no means a foregone conclusion that structs are more performant than references in a given application. There's an awful lot happening in modern runtimes that makes it not always easy to guess/predict what's best. (And there will probably never be a free lunch.)

That said, the F# compiler might choose to remove some tupling as an optimization when it does not affect the observable program behavior. And of course, you may find that defining your own 2-field struct is a useful micro-optimization in certain cases (e.g. perhaps in an inner loop that's doing lots of allocation and where memory pressure/GC is a perf issue). In such cases various language features make it pretty easy to explicitly do this optimization with only minor changes to code (e.g. see below).

As always with perf "optimization", be sure to measure! The results are often unintuitive.

 

let t = (3,"foo")
let (a,b) = t
printfn "%A %A" a b

// If I need a struct as a perf micro-optimization, here's one
type T<'a,'b> = struct
    val public First : 'a
    val public Second : 'b
    public new(x,y) = {First=x; Second=y}
    end
// along with an active pattern
let (|T|) (t:T<'a,'b>) = (t.First,t.Second)

// so that I can just add the 'T' prefix to my struct-tuples
let t2 = T(3,"foo")
let (T(a2,b2)) = t2
printfn "%A %A" a2 b2

By brianmcn on 1/11/2010 8:51 AM (permalink)

Tuples in F# are tuples in .Net.
Once upon a time a number of months ago, the F# team did an experiment where the F# compiler was changed to implement tuples as a value type rather than a reference type in the code it generated (I don't recall if this was for all tuples or just for small ones, e.g. 2- and 3-tuples). We then tried running this modified compiler on a large code base (the compiler itself) to get a feel for the performance impact. The result: no noteworthy change. The moral is that it is by no means a foregone conclusion that structs are more performant than references in a given application. There's an awful lot happening in modern runtimes that makes it not always easy to guess/predict what's best. (And there will probably never be a free lunch.)
That said, the F# compiler might choose to remove some tupling as an optimization when it does not affect the observable program behavior. And of course, you may find that defining your own 2-field struct is a useful micro-optimization in certain cases (e.g. perhaps in an inner loop that's doing lots of allocation and where memory pressure/GC is a perf issue). In such cases various language features make it pretty easy to explicitly do this optimization with only minor changes to code (e.g. see below).
As always with perf "optimization", be sure to measure! The results are often unintuitive.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
 

let t = (3,"foo")
let (a,b) = t
printfn "%A %A" a b

// If I need a struct as a perf micro-optimization, here's one
type T<'a,'b> = struct
    val public First : 'a
    val public Second : 'b
    public new(x,y) = {First=x; Second=y}
    end
// along with an active pattern
let (|T|) (t:T<'a,'b>) = (t.First,t.Second)

// so that I can just add the 'T' prefix to my struct-tuples
let t2 = T(3,"foo")
let (T(a2,b2)) = t2
printfn "%A %A" a2 b2

The active pattern version comes with a massive performance hit that will defeat the purpose of using a struct in this case.

By Associat0r on 4/21/2010 9:28 AM (permalink)

Since this thread has gotten bumped, here's an update to the OP:

Since I penned the original post, I've had several months of experience with F#, including looking over the IL generated for various F# constructs, etc. I now feel most of my original angst was unfounded; the compiler does a very good job of optimizing tupled function calls in many commonly used constructs.

-Neil

By TechNeilogy on 4/22/2010 3:16 PM (permalink)

Passing tuples isn't an issue; the F# compiler just turns that into a standard argument list. Problems arise when you try to return a tuple. In this case, returning structs is *much* faster.

Here's a scenario with a particle engine. Some crude (but realistic) benchmark results. I can share code if need be.

I've listed the last numbers before the framerate became consistently below 60fps. Tested on a quad core in Release mode, using PSeq on a ResizeArray of particles.

With tuples for the integration result and gravity calculation result:
Particles: 147967
GC: (486,2,0) // Gen 0, 1, 2 GC counts per second
Framerate: 60

With a struct for the integration result:
Particles: 149967
GC: (219,2,0)
Framerate: 60

With a struct for the gravity result as well as integration result:
Particles: 271967
GC: (12,2,0)
Framerate: 59

Investigating these numbers in CLRProfiler, it turns out that System.Tuple is indeed by far the largest cause of memory pressure. Using the inline keyword didn't make much of a difference.

While I understand that language implementors have to deal with some minor confusion when some tuples are value types and some are reference types, this seems significant enough a performance impediment that pairs and triples should have been structs.

I'm not sure I agree with benchmarking *just* a compiler and concluding that tuples should be reference types. Considering math and physics applications are common targets for F#, this is kind of disappointing.

Could we maybe have an opt-in struct tuple type in a future version -- or maybe optimization for tuple returns?

By reinux on 8/29/2011 6:14 AM (permalink)

Once upon a time a number of months ago, the F# team did an experiment where the F# compiler was changed to implement tuples as a value type rather than a reference type in the code it generated (I don't recall if this was for all tuples or just for small ones, e.g. 2- and 3-tuples). We then tried running this modified compiler on a large code base (the compiler itself) to get a feel for the performance impact. The result: no noteworthy change. The moral is that it is by no means a foregone conclusion that structs are more performant than references in a given application.

When discussing the relative performance of reference and value types on .NET, it's important to point out that the code generated by the JIT for structs is far from optimal. While the quality of the generated code improved with the introduction of the "scalar replacement optimization" for small structs in the .NET 3.5 SP1 (only on the 32-bit CLR) and with the improved inlining in .NET 4, the JIT optimizer still has some severe limitations regarding code involving structs. Hence, if one doesn't obtain the expected speed-up from replacing a reference type with a struct, it's sometimes not because heap allocations on .NET are so fast, but because the code generated for structs is so slow.

Probably the number one issue affecting the performance of struct-tuples is the current JIT's inability to eliminate write barrier calls for structs with reference fields returned from functions. Although rarely mentioned in discussions on GC performance, write barrier calls make up a significant part of the cost of allocation on the CLR. Every time a reference type field is assigned a value, the JIT emits a write barrier call. This is sometimes even true for reference type fields of structs allocated on the stack, most notably in case a struct is the return value of a function. The C# micro-benchmark attached below demonstrates how this affects the performance of a typical use case for tuples. The function returning a struct tuple with three nulls is about three times slower than the function returning three zeros. If you compare these with the timings for the class tuple, you can see that, on average, the write barrier calls are almost as expensive as the heap allocation (incl. collection) itself.

Most uses of tuples in the F# compiler code base probably involve reference type elements. Hence, I'd guess that if you don't see any measurable speedup from using structs for tuples, it's probably because the JIT isn't as smart about eliminating write barrier calls as it should be. You will get different results if you test the struct-tuples with numerical code.

using System;
using System.Runtime.CompilerServices;

namespace Test {
class Program {
    struct Tuple<T1,T2,T3> {
        public T1 Item1;
        public T2 Item2;
        public T3 Item3;

        public Tuple(T1 item1, T2 item2, T3 item3) {
            Item1 = item1;
            Item2 = item2;
            Item3 = item3;
        }
    }

    // struct tuple - value type fields
    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    static Tuple<int, int,int> Test1() {
        return new Tuple<int,int,int>(0, 0, 0);
    }

    // struct tuple - reference type fields
    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    static Tuple<string, string, string> Test2() {
        // the JIT emits write barrier calls emits write barrier calls even for null assignments
        return new Tuple<string,string, string>(null, null, null);
    }

    // class tuple - value type fields
    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    static System.Tuple<int, int,int> Test3() {
        return new System.Tuple<int,int,int>(0, 0, 0);
    }

    // class tuple - reference type fields
    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    static System.Tuple<string, string, string> Test4() {
        // the JIT emits no write barrier calls for null assignments, so we use empty strings
        return new System.Tuple<string,string, string>("", "", "");
    }

    static int TestLoop1() {
        var n = 0;
        for (int i = 0; i < 40000000; ++i) {
            var p1 = Test1();
            n = p1.Item3;
        }
        return n;
    }

    static string TestLoop2() {
        var n = "";
        for (int i = 0; i < 40000000; ++i) {
            var p = Test2();
            n = p.Item3;
        }
        return n;
    }

    static int TestLoop3() {
        var n = 0;
        for (int i = 0; i < 40000000; ++i) {
            var p = Test3();
            n = p.Item3;
        }
        return n;
    }

    static string TestLoop4() {
        var n = "";
        for (int i = 0; i < 40000000; ++i) {
            var p = Test4();
            n = p.Item3;
        }
        return n;
    }

    static int Main(string[] args) {
        var sw = new System.Diagnostics.Stopwatch();
        long ticks1, ticks2, ticks3, ticks4;

        TestLoop2(); // warmup

        sw.Start(); TestLoop1(); sw.Stop();
        ticks1 = sw.ElapsedTicks;
        sw.Reset();

        sw.Start(); TestLoop2(); sw.Stop();
        ticks2 = sw.ElapsedTicks;
        sw.Reset();

        sw.Start(); TestLoop3(); sw.Stop();
        ticks3 = sw.ElapsedTicks;
        sw.Reset();

        sw.Start(); TestLoop4(); sw.Stop();
        ticks4 = sw.ElapsedTicks;
        sw.Reset();

        System.Console.WriteLine("Relative performance:");
        System.Console.WriteLine("struct tuple, value type fields: {0}", 1.0);
        System.Console.WriteLine("struct tuple, reference type fields: {0}", (double)ticks2/ticks1);
        System.Console.WriteLine("class tuple, value type fields: {0}", (double)ticks3/ticks1);
        System.Console.WriteLine("class tuple, reference type fields: {0}", (double)ticks4/ticks1);
        // Relative timings for the 32-bit JIT on an old AMD Opteron (I get similar results on an Intel i-7):
        // struct tuple, value type fields: 1
        // struct tuple, reference type fields: 3.02
        // class tuple, value type fields: 3.19
        // class tuple, reference type fields: 4.83
        // If you want to run the timings on the 64-bit CLR, you should replace the ints with IntPtrs for fairness.
        return 0;
    }
}
}

By Stephan on 1/12/2010 3:57 AM (permalink)

Thanks Stephan; that may explain some of the things I've seen while benchmarking.

I'm not well-versed on garbage collection so I took the time to do some follow-up reading on write barriers -- fascinating!

-Neil

By TechNeilogy on 1/13/2010 7:07 AM (permalink)

Thanks for the replies and the design rationale. That’s the kind of thing I was looking for.

I agree with the observations about the efficiency of .NET reference types. I am continually amazed at the runtime performance even when benchmarked against hand-optimized native C++. I probably should do some F# tuple benchmarks of my own; I’m a “show-me” kind of guy, lol.

In the meantime, here’s the compilation that sparked my concern:

//000007: Console.WriteLine(f0 1 2 3)
  IL_0000:  nop
  IL_0001:  ldc.i4.5
  IL_0002:  call void [mscorlib]System.Console::WriteLine(int32)
//000008: Console.WriteLine(f1(1,(2,3)))
  IL_0007:  ldc.i4.1
  IL_0008:  ldc.i4.2
  IL_0009:  ldc.i4.3
  IL_000a:  newobj instance void class [FSharp.Core]System.Tuple`2<int32,int32>::.ctor(!0,                                                                                          !1)
  IL_000f:  call       int32 Program::f1(int32,
                                         class [FSharp.Core]System.Tuple`2<int32,int32>)
  IL_0014:  call       void [mscorlib]System.Console::WriteLine(int32)
(For some reason, I can't get <code> tags to work.)

The compiler pre-computes the first call (sans tuple) to a constant. This is the kind of behavior typical of modern compilers. The second call is not so optimized, because it would require an internal knowledge of the tuple type (there might be side effects, etc.)

It seems like a safe WAG that the second is less efficient, even when loaded by the JIT. But, as you pointed out, this efficiency might be small given the rational of a consistent implementation of tuples across the platform.

I agree with the observation about structs, it's just that tuples are so handy! They mirror many problems quite nicely.

-Neil

By TechNeilogy on 1/11/2010 9:42 AM (permalink)

Here's the design rationale:

[link:msdn.microsoft.com]

Apparently the performance benefits weren't what they were though to be. I remember doing some mini benchmarks myself and being surprised at how fast the reference types were -- was much different than I expected. Also remember the F# compiler doesn't always construct a tuple just because you have a tuple in code, so that helps in many cases, I'm sure. There was a discussion and a nice explanation by Don Syme on the MSR Fsharp mailing list, but I can't seem to find it now.

Of course there must be scenarios where you'll need a value type -- in that case use a struct?

-Michael

By Michael Giagnocavo on 1/11/2010 8:50 AM (permalink)

Topic tags

Built with WebSharper

Home

Answers

Events

Courses

Groups and Conferences

Blogs

Jobs

Developers

Topic tags