
Does anyone miss Unicode in the CTP? Icosahedron example...

Hi,

I love the expressiveness of Unicode over ASCII and use it all the time. Since the CTP, we have lost much of the ability to use Unicode, and I very much hope full Unicode support is reinstated. A simple example: I wanted to implement a nice, neat plus-minus operator:

let inline (±) x = seq { yield -x; yield x}

An icosahedron could then be implemented so:

open List

let r = 1.0
let z = seq [0.0] // a seq, so that every tuple component below has type seq<float>

let √ = sqrt // Also doesn't work any more :(

///The golden ratio
let φ  = (1.0 + √ 5.0) / 2.0
let rφ = r * φ

///Returns a list of all combinations of points for x, y and z dimensions
let CombineGeometry3D (xs, ys, zs) =
    [ for x in xs do
        for y in ys do
            for z in zs do
                yield (x, y, z) ]

///An icosahedron centred around (0,0,0)
let icosahedron =
   [ z,   ±r,  ±rφ ;
     ±r,  ±rφ, z   ;
     ±rφ, z,   ±r  ]
   |> map CombineGeometry3D
   |> concat

Alas, my current implementation is not as succinct.
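For comparison, here is a minimal ASCII-only sketch of the same construction, assuming a current F# compiler. The plain function `pm` (a hypothetical name) stands in for the ± operator. It returns a list rather than a `seq` so that every tuple component has the same type. This is not the poster's actual implementation.

```fsharp
/// Hypothetical stand-in for the (±) operator: returns [-x; x]
let pm (x: float) = [ -x; x ]

let r = 1.0
let z = [ 0.0 ]

/// The golden ratio
let phi = (1.0 + sqrt 5.0) / 2.0
let rPhi = r * phi

/// Returns a list of all combinations of points for the x, y and z dimensions
let combineGeometry3D (xs: float list, ys: float list, zs: float list) =
    [ for x in xs do
        for y in ys do
            for z in zs do
                yield (x, y, z) ]

/// The 12 vertices of an icosahedron centred on (0, 0, 0):
/// the cyclic permutations of (0, ±r, ±rφ)
let icosahedron =
    [ z,       pm r,    pm rPhi ;
      pm r,    pm rPhi, z       ;
      pm rPhi, z,       pm r    ]
    |> List.map combineGeometry3D
    |> List.concat

printfn "%d vertices" (List.length icosahedron) // prints "12 vertices"
```

Each row contributes 1 × 2 × 2 = 4 points, and every vertex lies at distance sqrt(r² + (rφ)²) from the origin.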

Any further ideas about why Unicode was dropped from the CTP, or about its pros and cons, would be very useful.

regards,

Danny

Hi,

A good integration of Unicode requires some work. For each character, the F# developers need to decide whether it is an identifier character (é should be treated like a letter, as should your Greek letters), a prefix operator (like your ±), an infix operator or perhaps a postfix operator. There are also issues with precedence and associativity. As far as I know, the versions before the CTP were far from perfect in their Unicode handling.

If I remember correctly, it's planned to reintroduce Unicode later.

Laurent.

By LLB on 3/17/2009 4:44 PM

MichaelGG - thanks for digging in here.

Unicode characters are readily available from:

Start -> Run -> charmap

or:

Programs -> Accessories -> System Tools -> Character Map

Once a Unicode character has been used, it is a simple task to copy and paste it around. For write-once, read-many code, carefully used Unicode can significantly enhance readability.

I would hope that, at the very least, the Unicode category Sm (Symbol, math) would be enabled for our favourite mathematical language.

Laurent - great to hear it's to be reintroduced! The burden is now on us to let the F# team know that we'd like to use Unicode. Personally, I'd like the team to be bold in opening up Unicode categories, but I do understand that careful consideration needs to be given to integration with the .NET platform.

regards,

Danny

By DannyAsher on 3/18/2009 4:51 AM

Hi Danny

Michael's and Laurent's explanations are correct. Unicode is supported and permitted in identifiers, with characters classified along the same lines as in the C# spec.

To allow operators such as +/-, the most important issue to solve would be a full categorization of default fixities (infix, prefix, etc.) for the various math symbols. This is not a simple matter. We'd be interested to hear proposals in this regard; the scheme used before the CTP was very ad hoc.

Kind regards

don

By dsyme on 3/20/2009 8:57 PM

I don't think you will get consensus on Unicode beyond basic categorization as letters, numbers, and symbols.

If you want declarative statements in a domain including symbolic math and logic, Unicode is a must.

It seems obvious only when I think it through. Logic, Physics, Math, and Computer Science use symbols differently. Then consider localization. Then consider it in Greece.

The only possibility I see is to do what Prolog did thirty years ago:

:- op(1000, xfy, tensor).
:- op(100, xf, degrees).
:- op(200, fx, phi).

Just substitute the actual symbols, in quotes, for the names, or force programmers to look up the hexadecimal codes.
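To make the Prolog idea concrete, here is a hypothetical F# sketch of such a file-level operator table; nothing like it exists in the F# compiler. It decodes the standard Prolog operator specifiers (xfx, xfy, yfx, fx, fy, xf, yf) into fixity and associativity. The symbols ⊗, ° and φ are assumed stand-ins for tensor, degrees and phi.

```fsharp
/// Associativity of an infix operator
type Assoc =
    | Left
    | Right
    | NonAssoc

/// Fixity of an operator; `nestable` is true when the operator may apply
/// directly to an operand of the same priority (Prolog's 'y' position)
type Fixity =
    | Infix of Assoc
    | Prefix of nestable: bool
    | Postfix of nestable: bool

/// Decodes a Prolog operator specifier: 'f' marks the operator, 'x' an operand
/// of strictly lower priority, 'y' an operand of lower or equal priority
let decodeSpecifier spec =
    match spec with
    | "xfx" -> Infix NonAssoc
    | "xfy" -> Infix Right
    | "yfx" -> Infix Left
    | "fx" -> Prefix false
    | "fy" -> Prefix true
    | "xf" -> Postfix false
    | "yf" -> Postfix true
    | other -> failwithf "unknown operator specifier: %s" other

/// Hypothetical directives, mirroring Prolog's op(Priority, Specifier, Name).
/// In Prolog, a lower priority number binds more tightly.
let directives =
    [ (1000, "xfy", "⊗")   // tensor
      (100, "xf", "°")     // degrees
      (200, "fx", "φ") ]   // phi

/// Lookup table from operator symbol to (priority, fixity)
let operatorTable =
    directives
    |> List.map (fun (priority, spec, symbol) -> symbol, (priority, decodeSpecifier spec))
    |> Map.ofList

/// True when operator a binds more tightly than operator b
let bindsTighter a b =
    fst operatorTable.[a] < fst operatorTable.[b]
```

A parser would consult `operatorTable` when it meets one of these symbols, which is one way to answer the fixity question without a single global categorization.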

I don't know, maybe I'm missing your point. Do you really think it's realistic to split all the Unicode Math Symbols and Other Symbols into a nice grammar without creating spurious controversy and taking time away from more important pursuits?

At the head of a file you could have directives. Any identifier could be an operator, but you could draw the line at letters and numbers for identifiers, and symbols only for operators.

If you want, you could make it XAML.
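The letters-and-numbers versus symbols split can be sketched with .NET's `Char.GetUnicodeCategory`. This is a hypothetical classifier for the proposal above, not F#'s actual lexer rules. The `CharClass` type and the exact grouping of categories are assumptions.

```fsharp
open System
open System.Globalization

/// Character classes under the proposal (hypothetical)
type CharClass =
    | IdentifierChar
    | OperatorChar
    | OtherChar

/// Letters and numbers may appear in identifiers; symbols only in operators
let classify (c: char) =
    match Char.GetUnicodeCategory c with
    | UnicodeCategory.UppercaseLetter
    | UnicodeCategory.LowercaseLetter
    | UnicodeCategory.TitlecaseLetter
    | UnicodeCategory.ModifierLetter
    | UnicodeCategory.OtherLetter
    | UnicodeCategory.DecimalDigitNumber
    | UnicodeCategory.LetterNumber
    | UnicodeCategory.OtherNumber -> IdentifierChar
    | UnicodeCategory.MathSymbol
    | UnicodeCategory.CurrencySymbol
    | UnicodeCategory.ModifierSymbol
    | UnicodeCategory.OtherSymbol -> OperatorChar
    | _ -> OtherChar
```

Under this rule φ would be usable in identifiers and ± or √ only in operators. It still says nothing about fixity, which is the harder part of the problem.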

By Ryan94114 on 7/25/2009 11:34 PM

Hi Don,

After reading section 4.1, Operator Names, of the language spec, I amended the +/- operator to:

let inline (~+-) x = seq { yield -x; yield x }

let icosahedron =
   [ z,    +-r,  +-rφ ;
     +-r,  +-rφ, z    ;
     +-rφ, z,    +-r  ]
   |> map CombineGeometry3D
   |> concat

which works well.

I'd still love to be able to use ±, as well as the common √, ∫, ∏, •, ≤, ≥ and € symbols.
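For reference, a short sketch that reports the Unicode general category .NET assigns to each of these symbols. This is relevant to the earlier suggestion of enabling category Sm.

```fsharp
open System
open System.Globalization

/// The symbols requested above
let requested = [ '±'; '√'; '∫'; '∏'; '•'; '≤'; '≥'; '€' ]

/// Each symbol paired with its Unicode general category
let categories =
    requested |> List.map (fun c -> c, Char.GetUnicodeCategory c)

for (c, cat) in categories do
    printfn "%c U+%04X %A" c (int c) cat
```

±, √, ∫, ∏, ≤ and ≥ are MathSymbol (Sm). However, • is OtherPunctuation (Po) and € is CurrencySymbol (Sc). Enabling Sm alone would therefore not cover the whole list.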

Obviously, full Unicode support would require a significant design process (further marking out F# as the natural programming language for mathematics, finance and engineering), but perhaps simply supporting the common symbols in the Lucida Console and Consolas fonts would put a stake in the ground and whet appetites for natural mathematical expression.

Given that an operator's fixity can be determined when it is defined, would it be possible to require that fixity be declared when using Unicode characters from outside the small set currently supported?

Would it be possible to declare whether a token is an operator or an identifier?

I'm rapidly reaching the limits of my understanding, but certainly know what modes of expression I'd like to enable.

regards,

Danny

By DannyAsher on 3/30/2009 10:58 AM

I thought they said they took Unicode out of the language to make it more accessible in general, since most people don't know how to get a lot of the symbols used. They must have also changed identifier rules in the process?

Unicode identifiers still work, but which characters are allowed appears to be governed by their Unicode category, and the rules are different from what you're saying. I don't know the F# compiler, but lex.mll in the source seems to describe them. (I'm not sure this is right, but experimentally it seems to hold true.) For instance, digits are defined as the Unicode category "Nd" (line 82). So you can write this:

let zero = ٠   // U+0660, Arabic-Indic Digit Zero (it doesn't show up in my browser)

And it "works", but it doesn't compile, since int parsing fails on it. You can suffix it with I to make it a bigint, and then it compiles fine (but crashes at runtime, since it still can't be parsed). I guess only [0-9] are really supported as digits :).
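This behaviour can be reproduced with plain .NET calls. U+0660 is a decimal digit (category Nd) with numeric value 0, so a lexer rule based on `\Nd` accepts it. However, `Int32.TryParse` only accepts the ASCII digits 0-9. It is an assumption that the compiler's literal parsing behaves like this routine. The sketch is only consistent with what the post observed.

```fsharp
open System
open System.Globalization

let arabicIndicZero = '\u0660' // ARABIC-INDIC DIGIT ZERO

// Unicode considers it a decimal digit (Nd) with value 0...
let category = Char.GetUnicodeCategory arabicIndicZero
let numericValue = Char.GetNumericValue arabicIndicZero

// ...but .NET integer parsing only recognises the ASCII digits '0'-'9'
let parsedOk, _ =
    Int32.TryParse(string arabicIndicZero, NumberStyles.Integer, CultureInfo.InvariantCulture)

printfn "%A %g %b" category numericValue parsedOk
```

This should report DecimalDigitNumber, a numeric value of 0, and `false` for the parse.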

Identifiers must start with a letter:

let letter = '\Lu' | '\Ll' | '\Lt' | '\Lm' | '\Lo' | '\Nl'

They can also be followed by digits and a few other categories (combining forms and other stuff I don't know how to type). That rules out the identifiers you want. Operator characters are even more constrained: just the ones listed in the spec.
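Those identifier rules can be sketched in F# with `Char.GetUnicodeCategory`. The start categories are the `letter` rule quoted above, plus '_'. The follow-on categories (Nd, Pc, Mn, Mc, Cf and the apostrophe) are an assumption modelled on C#-style identifier rules, so the real compiler may differ in details. The function names are hypothetical.

```fsharp
open System
open System.Globalization

/// Start characters: Lu, Ll, Lt, Lm, Lo, Nl (the `letter` rule), plus '_'
let isIdentStart (c: char) =
    if c = '_' then true
    else
        match Char.GetUnicodeCategory c with
        | UnicodeCategory.UppercaseLetter      // Lu
        | UnicodeCategory.LowercaseLetter      // Ll
        | UnicodeCategory.TitlecaseLetter      // Lt
        | UnicodeCategory.ModifierLetter       // Lm
        | UnicodeCategory.OtherLetter          // Lo
        | UnicodeCategory.LetterNumber -> true // Nl
        | _ -> false

/// Later characters may also be digits (Nd), connectors (Pc),
/// combining marks (Mn, Mc), formatting characters (Cf) or an apostrophe
let isIdentPart (c: char) =
    if isIdentStart c || c = '\'' then true
    else
        match Char.GetUnicodeCategory c with
        | UnicodeCategory.DecimalDigitNumber
        | UnicodeCategory.ConnectorPunctuation
        | UnicodeCategory.NonSpacingMark
        | UnicodeCategory.SpacingCombiningMark
        | UnicodeCategory.Format -> true
        | _ -> false

/// True when the whole string is a valid identifier under these rules
let isIdentifier (s: string) =
    s.Length > 0 && isIdentStart s.[0] && Seq.forall isIdentPart (s.Substring(1))
```

Under these rules `rφ` is a valid identifier, because φ is a lowercase letter. `√` and `±r` are rejected, because √ and ± are math symbols (Sm).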

So it's quite Unicode capable :). Just not the categories you want. (It seems to follow rules similar to C#'s, I think.) Maybe it was just too much work to specify and get working properly in a commercial product, for the benefit it provides?

By Michael Giagnocavo on 3/17/2009 2:29 PM

I grabbed the 1.9.4.19 source and the rules are very different; you should check it out to see what all changed. From what I can tell, almost any UTF-8 sequence was allowed, even at the start of an identifier.

By Michael Giagnocavo on 3/17/2009 3:04 PM

Whoa! I didn't know that was ever possible.

I'd love to be able to use math symbols in my code. It would be extremely cool to use the language toolbar with a "math" language (in a similar way to pinyin for Chinese).

My concerns would be that:

1) the Mono and MS compilers handle Unicode properly, and

2) interfaces to other .NET languages would work reasonably.

By none on 3/17/2009 2:20 PM
