SymmetricalDataSecurity: Typing Is Hard

Saturday, August 8, 2020

Typing Is Hard

Type Checking and Type Inference

Type checking is the process of taking a given program in some programming language and working out whether all variables and expressions have the correct types, i.e. strings are assigned to strings, arithmetic expressions involve only numbers, etc. Some languages also offer type inference, tasking the compiler with the task of figuring out correct types on its own. Depending on a language’s features the type checking and type inference problems range from trivial to undecidable.

Common terms

Completeness

A type checker is complete if it can check every correctly typed program.

Soundness

A type checker is sound if it only accepts correctly typed programs.

Decidability

A decision problem is decidable if for any input we can compute whether the input satifies the problem in finite time. Examples of decidable problems include primality testing and boolean satisfiability. The halting problem for example is undecidable: We cannot check whether a program runs infinitely long in finite time. We are interested in the type checking and type inference problems for programming languages: Given some input program, does it type check? And given some program, which types should we assign to the expression such that it typechecks?

Hindley-Milner Type System

The Hindley-Milner (HM) type system is a type system for the simple typed lambda calculus with parametric polymorphism, which is used as a base for many functional languages. Type-checking for HM is decidable and efficient, its complexity is linear in the size of the program. Type inference is more difficult than type checking, since the compiler has to solve the type constraints incurred for expressions in the program. It is decidable for the HM type system, however the problem itself is PSPACE-hard and EXPTIME-complete, meaning that in the worst case it needs at least a polynomial amount of extra space and exponential time relative to the input size. Fortunately the type inference algorithm has linear when the nesting depth of polymorphic variables is bounded, as is the case for most applications. There exist many type inference algorithms, the best known one is the so-called Algorithm W. The Hindley-Milner type system since many functional programming languages are based on (variaions of) it.

How Hard is Type-Checking Your Favorite Language?

Below I’ve compiled a list of languages and how hard type checking/type inference is in these languages. If you find a mistake or are missing a language please file an issue with your language, its type checking complexity and optimally a resource that supports your claim. I do not claim completeness nor correctness of the properties shown here, it is mainly an amalgamation of blog posts I’ve been collecting.

C++

undecidable, C++ Templates are Turing Complete, Todd L. Veldhuizen (2003), even its grammar is undecidable

C#

unsound, undecidable, there is an excellent SO answer by Eric Lippert on this topic. Other fun things include a SAT solver using the C# type-checker

Elm

decidable, uses Hindley-Milner, but currently unsound, due to an interesting compiler bug: (String.length " ") ^ (-1) : Int

F#

undecidable, GitHub user cannorin implemented the untyped lambda calculus in F#

Go

decidable, since type inference is only used for variable initialization

Haskell

1998 standard, no extensions: decidable, variant of HM

2010 standard, no extensions: decidable, restriction of System F

with sufficient extensions: undecidable, as type checking in System F is undecidable. There exist Turing Machines in Haskell types. A simpler variant is to implement the SKI calculus.

Idris

decidable, surprisingly. Idris has dependent types, but at compile time it will only evaluate expressions which it knows to be total (terminating and covering all inputs).

Java

undecidable, because Java Generics are Turing complete. Java 8 or later is unsound, as shown by Amin and Tate (2015)

OCaml

undecidable, since we can cause the type checker to loop infinitely

ML

decidable, uses Hindley-Milner

Rust

undecidable, both type inference and type checking since it allows rank-3-types. There’s also a Smallfuck interpreter implemented in traits.

Scala

undecidable, since it admits a type-level SKI calculus, unsound, as shown by Amin and Tate (2015). Scala 2.13.3 (newest as of writing this) also exhibits the same problem.

Swift

undecidable, as proven here by reduction to the word problem for finitely generated groups

TypeScript

undecidable. TypeScript’s type system was proven Turing complete until they disallowed self-referential types. However Robbie Ostrow wrote a program checking the Collatz conjecture, and as the generalized form of the Collatz conjecture is undecidable, the TypeScript type system is undecidable as well.

Zig

undecidable, since evaluation of recursive functions at compile time is possible, thus requiring the compiler to solve the halting problem.

FAQ

Where are Python, Bash, etc.?

Type checking and type inference work primarily for statically typed languages. While there exist extensions to some dynamic languages imbuing them with static type checking these are not part of the language, and the complexity depends not on the language but the extension.

What about unsafe casts?

Some languages offer explicit unchecked casts, which are accepted by the type checker and may potentially fail at runtime. Examples are casting from Object to some subclass in C# and Java, casting values of type interface{} in Go or using unsafeCoerce in Haskell. I’ve chosen not to account for such casts/coercions since unchecked downcasting is an inherently unsafe operation not covered by type checker guarantees.

I’ve spotted a mistake/imprecision, what should I do?

Great work! Please report it on the official issue tracker detailing what is wrong and I will try to fix it as soon as possible.

from Hacker News https://ift.tt/31ji3qv

SymmetricalDataSecurity