Chatting with Don Syme about the F# compiler
In 2018, my team at G-Research had a fruitful meeting with Don Syme about how to contribute to the F# compiler. I made rough notes at the time. They may be out of date by now.
Organisational structure
- Suggestions are for debate; Don goes through every three months or so, and approves or rejects or leaves open.
- RFCs are for things which are approved in principle - but there’s no guarantee that anyone will commit time to them.
- Once they’re approved, they kick around the RFC folder until they’re implemented, at which point they get bucketed into the appropriate historical record of F# version.
RFCs act as primary documentation, since the technical specification of F# is pretty hard to read. It should therefore record drawbacks and alternatives that were considered.
The process for getting a feature to “approved in principle” is simply to ask Don.
Features must be complete when they’re pushed - no partial features that will be extended later. (This is so that we resolve feature interactions up front.)
Why does F# not have functors?
F# doesn’t have a functor system because the process of taking functors out of code once they’re in is very hard. Don wants to add features in such a way that it’s not too hard to unwind a decision to start using them.
Working on the compiler
Sidetrack: he said that when working on the compiler, all the tools basically needed to be the same versions; is that some kind of design decision? Is it going to become less true? I didn’t ask this.
You can always push stuff up to a branch on the FSharp repo to run the CI and verify that existing F# code won’t be broken by your addition to the language.
Basically everything happens in the typechecker. “Pretty much every time we try to do things earlier, we regret it.” This is because of the language machinery, which is wedded heavily to the actual code rather than to any other representation.
Specific notes on adding syntactic sugar for fun x -> x.Prop
During typechecking we would expand the _.Prop
syntax.
Would want to expand to fun (obj : ^T) -> (^T : (member Prop : 'U) obj)
.
This would add a constraint during typechecking, that would be solved automatically.
But this approach doesn’t extend to .Method
instead of .Property
- since generics can’t be placed on that member.
A naive find-replace of “replace .Prop
syntactic sugar with fun x -> x.Prop
” fails because of the example of .Method(2)
- this can’t easily be converted to fun x -> x.Method(2)
.
There is no ability to have “or”-based constraints.
Implementation strategy for the “replace .x
with fun i -> i.x
” feature idea:
- Find how
_.S
is parsed currently (hint: it’s not). - Alter parser so that
_.S
is actually parsed to an alternative. - Look in the typechecker where that AST gets resolved.
- Expand the typechecker and try again.
(Keep track of where our operator’s syntax may legitimately be used, by checking the
pars.fsy
parse file.)
Pitfall to remember: the artificial name x
in fun x -> x.S
needs to not be reported in debug symbols etc.
Specifically during typechecking, Don hasn’t made the decision whether it would be typecheck-then-replace, or replace-then-typecheck.
The F# parser
Useful flags in tooling exist.
fsc --tokenize
, which tells us that_.Length
e.g. doesn’t have an expression form: the tokeniser just seesUNDERSCORE
.fsc --parseonly
to run only the parsing stage.
Specifically to see what happens with our proposed underscore syntax, go to pars.fsy
to see where the underscore parse token appears, and lex.fsl
to see what it lexes to.
We discover that _
is a keyword in the language (see lexhelp.fs
), and it’s declared in pars.fsy
.
Find usages of it in pars.fsy
.
Seems reasonable to parse _
as an identifier (alternative: create a whole new syntax node).
The parser defines a grammar with various types of expression. Atomic (e.g. 1
, true
, (blah)
, 1.0
); other expressions (1+1
, etc).
Atomic expressions have known-in-advance delimitation in the code - e.g. bracketed expressions.
Things which will be parsed along similar code paths to our proposed underscore syntax: base.blah
, global.System.Blah
.
How the typechecker works
The typechecker will first typecheck the thing on the left, and then has a chain of qualifications afterwards (i.e. the type gets refined as we shove more on the right).
E.g. _.Name.Name2.Name3
is typechecked as “_
” and then refined with “.Name
”, and so on.
Type-checking and name resolution are interleaved.
At the end, after typecheck, FindUnsolved.fs
infers the rest of the types.
How to go about implementing applicative computation expression builders
This is issue #579. Editor’s note: this is now implemented as of F# 5.
Tomas Petricek had a wide-ranging change (“joinads”) which would have encompassed this feature plus many more, but it was rejected, because it tried to change the semantics of pattern matching.
Question of whether a builder should have a flag to indicate “I follow the applicative laws” - e.g. for optimising when you could use “map” rather than “bind then return”. Answer: yes, this would be good - for backwards compat, even if for no other reason. This would probably be a new attribute on the map.
An annoyance: not being able to have a crate returning unit
We could go ahead and fix this by using the void
return type on type C () = member x.M() : unit
, and using the unit
return type on everything else.
(Our current workaround is to define a type FakeUnit
which we return instead of unit
.)