(feat) Type Inference #79

Merged
merged 28 commits into from
Apr 24, 2025

Collaborator

@AlSchlo AlSchlo commented Apr 21, 2025

Overview

This PR brings the long-awaited type inference to the DSL. It not only makes rules more ergonomic to write but is also needed for correctness in the optimizer.

The engine needs to know which types are Logical and which are Physical, since it treats them differently. Thus, every expression must resolve to a type.

Type inference is also used to verify that field accesses and function calls are valid, as these checks are type-dependent.

Some examples:

[4 example screenshots; images not preserved]

Strategy

The type inference works in three phases.

  1. In from_ast, during the initial AST -> HIR<TypedSpan> transformation, we record all implicit and explicit type information from the program (e.g. literals like 1 or "hello", function annotations, etc.). Each Unknown type gets a fresh ID and is marked either Descending (for unknown closure parameters and map keys) or Ascending (for everything else); these modes are detailed in (3).

  2. Constraints are generated in generate.rs, which now also performs scope checking (for convenience, since the two code paths were extremely similar). Constraints express subtype relationships, field accesses, and function calls.

For example:

let a: Logical = expr
generates the constraint
Logical :> typeof(expr)

  3. The last step to resolve the unknown types is to use a solver (credit to @AarryaSaraf for the base algorithm idea). It works as follows:
// Pseudocode for the constraint solver
function resolve():
    anyChanged = true
    lastError = null

    // Keep iterating until we reach a fixed point (no more changes)
    while anyChanged:
        anyChanged = false
        lastError = null
        
        // Check each constraint and try to refine unknown types
        for each constraint in constraints:
            result = checkConstraint(constraint)
            
            if result is Ok(changed):
                anyChanged |= changed  // Track if any types were refined
            else if result is Err(error, changed):
                anyChanged |= changed  // Still track changes
                lastError = error      // Remember the last error
        
    // After reaching fixed point, return error if any constraint failed
    return lastError ? Err(lastError) : Ok()
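
The loop above could be written in Rust roughly as follows. This is a minimal sketch, not the code in solver.rs: the names `resolve`, `CheckResult`, and the toy `raise` constraint are hypothetical, and a generic `check` callback stands in for the real constraint checker.

```rust
/// Outcome of checking one constraint (hypothetical shape):
/// `Ok(changed)` or `Err((message, changed))`.
type CheckResult = Result<bool, (String, bool)>;

/// Runs `check` over every constraint until a full pass changes nothing,
/// then reports the last error seen during that final pass, if any.
fn resolve<C, S>(
    constraints: &[C],
    state: &mut S,
    mut check: impl FnMut(&C, &mut S) -> CheckResult,
) -> Result<(), String> {
    loop {
        let mut any_changed = false;
        let mut last_error = None;

        for constraint in constraints {
            match check(constraint, state) {
                // Track whether any unknown was refined.
                Ok(changed) => any_changed |= changed,
                // A failing constraint may still have refined something.
                Err((error, changed)) => {
                    any_changed |= changed;
                    last_error = Some(error);
                }
            }
        }

        // Fixed point reached: no constraint changed the state in this pass.
        if !any_changed {
            return last_error.map_or(Ok(()), Err);
        }
    }
}

/// Toy constraint `(from, to)` used only for illustration:
/// slot `to` must be at least slot `from`, and is raised if it is not.
fn raise(&(from, to): &(usize, usize), slots: &mut Vec<u32>) -> CheckResult {
    if slots[to] < slots[from] {
        slots[to] = slots[from];
        Ok(true)
    } else {
        Ok(false)
    }
}
```

Calling `resolve(&constraints, &mut slots, raise)` with constraints listed in an unfavourable order needs several passes before the values stop changing, which illustrates why the order of constraints affects run time.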

During type inference, the solver refines unknown types to satisfy subtyping constraints according to their variance:

  • When an UnknownAsc type is encountered as a parent, it is updated to the least upper bound (LUB) of itself and the child type. These types start at Nothing and ascend up the type hierarchy as needed.
  • When an UnknownDesc type is encountered as a child, it is updated to the greatest lower bound (GLB) of itself and the parent type. These types start at Universe and descend down the type hierarchy as needed.
  • When an UnknownAsc type is encountered as a child, its resolved type is checked against the parent type.
  • When an UnknownDesc type is encountered as a parent, its resolved type is checked against the child type.

This refinement process happens iteratively until the system reaches a stable state where no more unknown types can be refined. At that point, if any constraints remain unsatisfied, the solver reports the most relevant type error.
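
The four refinement rules can be illustrated on a toy lattice. The sketch below is an assumption-laden illustration, not the PR's implementation: `Ty`, `Term`, `Bounds`, and the flat lattice (only `Nothing`, `I64`, `Bool`, `Universe`) are hypothetical. When both sides are unknown, it arbitrarily lets the ascending-parent rule win.

```rust
/// Toy flat lattice: `Nothing` is below every type, `Universe` above every type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty { Nothing, I64, Bool, Universe }

fn is_subtype(child: Ty, parent: Ty) -> bool {
    child == parent || child == Ty::Nothing || parent == Ty::Universe
}

/// Least upper bound in the flat lattice.
fn lub(a: Ty, b: Ty) -> Ty {
    if is_subtype(a, b) { b } else if is_subtype(b, a) { a } else { Ty::Universe }
}

/// Greatest lower bound in the flat lattice.
fn glb(a: Ty, b: Ty) -> Ty {
    if is_subtype(a, b) { a } else if is_subtype(b, a) { b } else { Ty::Nothing }
}

/// One side of a constraint: a concrete type or an unknown, identified by index.
#[derive(Clone, Copy)]
enum Term { Known(Ty), Asc(usize), Desc(usize) }

/// Current bounds of all unknowns.
struct Bounds { asc: Vec<Ty>, desc: Vec<Ty> }

impl Bounds {
    /// Ascending unknowns start at `Nothing`, descending ones at `Universe`.
    fn new(n_asc: usize, n_desc: usize) -> Self {
        Bounds { asc: vec![Ty::Nothing; n_asc], desc: vec![Ty::Universe; n_desc] }
    }

    fn resolved(&self, term: Term) -> Ty {
        match term {
            Term::Known(ty) => ty,
            Term::Asc(i) => self.asc[i],
            Term::Desc(i) => self.desc[i],
        }
    }

    /// Applies `child <: parent`, returning `Ok(changed)` or an error message.
    fn check(&mut self, child: Term, parent: Term) -> Result<bool, String> {
        let (c, p) = (self.resolved(child), self.resolved(parent));
        match (child, parent) {
            // Ascending unknown as parent: raise it to the LUB with the child.
            // (Assumption: this arm also handles Desc <: Asc.)
            (_, Term::Asc(i)) => {
                let refined = lub(p, c);
                self.asc[i] = refined;
                Ok(refined != p)
            }
            // Descending unknown as child: lower it to the GLB with the parent.
            (Term::Desc(i), _) => {
                let refined = glb(c, p);
                self.desc[i] = refined;
                Ok(refined != c)
            }
            // Remaining cases only check the currently resolved types.
            _ if is_subtype(c, p) => Ok(false),
            _ => Err(format!("{c:?} is not a subtype of {p:?}")),
        }
    }
}
```

For example, applying `I64 <: Asc(0)` raises that unknown from `Nothing` to `I64`, and a later `Asc(0) <: Bool` is then only checked, which fails in this lattice.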

The key insight of this algorithm is that it makes monotonic progress: each refinement step does one of the following:

  • Successfully resolves a constraint
  • Refines an unknown type to be more specific
  • Identifies a type error

By tracking whether any types changed during each iteration and continuing until we reach a fixed point, we ensure all types are resolved as completely as possible before reporting any errors.

Limitations

While the algorithm is theoretically correct, it has the following limitations:

  1. Its run time appears to be quadratic. This is not a problem for small programs but might become a compilation bottleneck in the future. Good heuristics exist for ordering the application of constraints, which could reduce this cost.

  2. As commented in the code, when resolving the constraint

UnknownDesc <: UnknownAsc

we could either dump the left type to Nothing or pump the right type to Universe. However, since pumping and dumping cannot be undone later (to avoid exponential run time), we may over-dump or over-pump a type. One solution would be to defer these constraints until every constraint that can be applied safely has been exhausted, which should give much better inference results in practice.

  3. We postpone as future work (Finalize Type Inference #81):
  • Map concat: because map keys are contravariant, let map: {Animal : I64} = {Dog : 3} ++ {Cat : 2} fails under the current type checker.
  • List pattern matching is still broken.
  • Generic functions are not yet supported.

All of the above may be solved by adding a dedicated constraint for each, as we did for field_access and call.

  4. Some error messages are still confusing, although a lot of effort has already gone into improving them (e.g. see the examples above). There is no silver bullet here: improving the span handling in the parser is probably the right way forward.
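
For the second limitation above, the deferral idea could be sketched as a scheduling step that separates the ambiguous constraints from the rest. This is only an illustration of the proposal, not existing code: `Side`, `Constraint`, `is_ambiguous`, and `schedule` are hypothetical names, and each constraint is reduced to the kind of its two sides.

```rust
/// Kind of one side of a constraint (hypothetical simplification).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Side { Known, UnknownAsc, UnknownDesc }

/// A `child <: parent` constraint, reduced to the kind of each side.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Constraint { child: Side, parent: Side }

/// `UnknownDesc <: UnknownAsc` is the case where either side could be moved.
fn is_ambiguous(c: &Constraint) -> bool {
    c.child == Side::UnknownDesc && c.parent == Side::UnknownAsc
}

/// Splits constraints into `(safe, deferred)`, preserving their relative order.
/// A driver would run `safe` to a fixed point before touching `deferred`.
fn schedule(constraints: &[Constraint]) -> (Vec<Constraint>, Vec<Constraint>) {
    constraints.iter().copied().partition(|c| !is_ambiguous(c))
}
```

Under this scheme the unknowns on both sides of a deferred constraint have already been refined by every safe constraint, so the eventual pump or dump starts from tighter bounds.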

Testing

Given how hard (and bloated!) it is to test each point in isolation, the tests in solver.rs simply check whether fully written-out programs pass the type checker.

Error reporting has been tested manually for a variety of programs, and is expected to improve as we start writing rules.

Future work

Focus will now shift to the final HIR compilation process, which needs to correctly encode the identified Logical / Physical types and reject types that are correctly inferred but ambiguous (i.e. Nothing or Universe). It is better practice to require type annotations in these scenarios.

Note to Reviewers

Don't read the diff, just read the entire analyzer/types directory.

@AlSchlo AlSchlo marked this pull request as ready for review April 24, 2025 00:08
@AlSchlo AlSchlo changed the title Alexis/type infer 3 (feat) Type Inference Apr 24, 2025
@codecov-commenter

codecov-commenter commented Apr 24, 2025

Codecov Report

Attention: Patch coverage is 88.79593% with 308 lines in your changes missing coverage. Please review.

Project coverage is 88.3%. Comparing base (70dbd27) to head (6c6659a).

Files with missing lines                   Patch %   Lines
optd/src/dsl/analyzer/errors.rs            8.9%      112 Missing ⚠️
optd/src/dsl/analyzer/types/registry.rs    55.9%     63 Missing ⚠️
optd/src/dsl/analyzer/types/solver.rs      88.0%     53 Missing ⚠️
optd/src/dsl/analyzer/types/generate.rs    81.4%     29 Missing ⚠️
optd/src/dsl/analyzer/types/glb.rs         95.0%     20 Missing ⚠️
optd/src/dsl/analyzer/types/lub.rs         96.9%     14 Missing ⚠️
optd/src/dsl/analyzer/types/subtype.rs     98.3%     13 Missing ⚠️
optd/src/dsl/analyzer/from_ast/expr.rs     89.6%     3 Missing ⚠️
optd/src/dsl/compile.rs                    90.9%     1 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
optd/src/dsl/analyzer/context.rs 89.8% <100.0%> (+0.7%) ⬆️
optd/src/dsl/analyzer/from_ast/converter.rs 97.5% <100.0%> (ø)
optd/src/dsl/analyzer/from_ast/pattern.rs 94.5% <100.0%> (+0.2%) ⬆️
optd/src/dsl/analyzer/from_ast/types.rs 97.6% <100.0%> (+1.5%) ⬆️
optd/src/dsl/analyzer/hir.rs 77.5% <ø> (ø)
optd/src/dsl/analyzer/semantic_checks/adt_check.rs 98.2% <100.0%> (ø)
optd/src/dsl/parser/expr.rs 81.4% <100.0%> (+0.1%) ⬆️
optd/src/dsl/utils/span.rs 78.9% <100.0%> (+7.5%) ⬆️
optd/src/dsl/compile.rs 65.4% <90.9%> (+65.4%) ⬆️
optd/src/dsl/analyzer/from_ast/expr.rs 94.5% <89.6%> (+<0.1%) ⬆️
... and 7 more

... and 1 file with indirect coverage changes

@AlSchlo AlSchlo requested review from connortsui20 and SarveshOO7 and removed request for connortsui20 April 24, 2025 19:47
@AlSchlo AlSchlo self-assigned this Apr 24, 2025
Member

@connortsui20 connortsui20 left a comment

I don't have time to go through the whole algorithm, so I just took a quick look over some of the code and tests; everything looks fine. I have other concerns about Rust-related things, but we have more important things to worry about.

@AlSchlo AlSchlo merged commit d4898a8 into main Apr 24, 2025
12 checks passed
@AlSchlo AlSchlo deleted the alexis/type-infer-3 branch April 24, 2025 23:11