-
Notifications
You must be signed in to change notification settings - Fork 4
(feat) Type Inference #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
connortsui20
approved these changes
Apr 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have time to go through the whole algorithm so just doing a quick look over some of the code and tests everything looks fine. I have other concerns about Rust-related things but we have more important things to worry about.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
This PR brings in the long awaited type inference to the DSL. This not only makes rules more ergonomic to write, but it is also needed for correctness in the optimizer.
The engine needs to know what types are
Logical
orPhysical
, as they behave differently in theengine
. Thus, all expressions must be resolved to a type.Furthermore, type inference is also used to verify if field accesses and function calls are valid (as these are type-dependent checks).
Some examples:
Strategy
The type inference works in three phases.
In
from_ast
, during the initialAST
->HIR<TypedSpan>
transformation, we create and add all implicit and explicit type information from the program (e.g. Literals like1
or"hello"
, function annotations, etc.). For types that areUnknown
, we generate a new ID and assign the type to eitherDescending
(for unknown closure parameters and map keys), orAscending
(for all the rest) — more details about these modes in (3).Constraints
are generated in thegenerate.rs
file, which also performs scope-checking now (for convenience, as both code paths were extremely similar). Constraints indicatesubtype
relationship,field_accesses
, and functioncalls
.For example:
let a: Logical = expr
generates the constraint
Logical :> typeof(expr)
During type inference, the method refines unknown types to satisfy subtyping constraints according to their variance:
UnknownAsc
type is encountered as a parent, it is updated to the least upper bound (LUB) of itself and the child type. These types start atNothing
and ascend up the type hierarchy as needed.UnknownDesc
type is encountered as a child, it is updated to the greatest lower bound (GLB) of itself and the parent type. These types start atUniverse
and descend down the type hierarchy as needed.UnknownAsc
type is encountered as a child, its resolved type is checked against the parent type.UnknownDesc
type is encountered as a parent, its resolved type is checked against the child type.This refinement process happens iteratively until the system reaches a stable state where no more unknown types can be refined. At that point, if any constraints remain unsatisfied, the solver reports the most relevant type error.
The key insight of this algorithm is that it makes monotonic progress - each refinement step either:
By tracking whether any types changed during each iteration and continuing until we reach a fixed point, we ensure all types are resolved as completely as possible before reporting any errors.
Limitations
While the algorithm is theoretically correct, it has the following limitations:
Its run time appears to be quadratic. This is not a problem for small programs but might become a compilation bottleneck in the future. Excellent heuristics exist to optimize the order in which constraints get applied.
As commented in the code, when resolving the constraint:
UnknownDesc <: UnknownAsc
We could either dump the left type to
Nothing
, or pump the right type toUniverse
. However, since pumping/dumping types cannot be undone later on (to avoid exponential run-time), it is possible that we over-dump/pump a specific type. A solution would be to ignore these constraints until all other constraints that can be safely applied have run out. This would result in much better empirical type inference.Concat
as the keys are contra-variantlet map: {Animal : I64} = {Dog : 3} ++ {Cat : 2}
would fail under the current type checker.All these above points may be solved by adding specific constraints for each of these, like we did for
field_access
andcall
.span
handling from theparser
is the correct way forward.Testing
Given how hard (and bloated!) it is to test each point in isolation, we simply test whether fully written-out programs pass the type checker or not in
solver.rs
.Error reporting has been tested manually for a variety of programs, and is expected to improve as we start writing rules.
Future work
Focus will be put on the final
HIR
compilation process, which needs to correctly encode identifiedLogical
/Physical
types and reject ambiguous (albeit correctly) inferred types (i.e.Nothing
orUniverse
). It is indeed better practice to enforce type annotations in these scenarios.Note to Reviewers
Don't read the diff, just read the entire
analyzer/types
directory.