

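Summary: When Sylvain Henry tried to use GHC's internal facilities, he was dismayed to find the internals disorganized, with no clear boundaries between components. He and several collaborators eventually wrote the paper Modularizing GHC, which sets out to explain what turned GHC, a large project written in a functional language, into a program that is full of state, coupled everywhere, and short on composability, and to consider how GHC could be refactored.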

Section 3, Some design defects in GHC, covers the poor design decisions the authors dug out of GHC's code base; Sections 4 and 5 redesign GHC along domain-driven lines.

Starting with GHC 9.6, GHC is gradually being refactored along the lines the paper proposes.

3.1 Shotgun parsing

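Shotgun parsing is an anti-pattern in which parsing and input validation are mixed together, the opposite of the well-known type-driven-development slogan "parse, don't validate". A typical example is Backpack [16], GHC's mixin package system. The Backpack design called for changing the definition of the Module type, but the implementers chose to encode the new case with special values instead, on the grounds that changing the definition would turn some existing functions into partial functions and require a large amount of rewriting. The result is functions like this one: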

loadInterface :: SDoc -> Module -> WhereFrom -> IfM lcl (MaybeErr SDoc ModIface)
loadInterface doc_str mod from
  | isHoleModule mod
  -- Hole modules get special treatment
  = ...

The paper's verdict: "it is saddening to see this kind of code in the flagship Haskell codebase".

Note: as of 2023/6/2, the Module type is defined as type Module = GenModule Unit, and GenModule is defined as

data GenModule unit = Module
   { moduleUnit :: !unit       -- ^ Unit the module belongs to
   , moduleName :: !ModuleName -- ^ Module name (e.g. A.B.C)
   }
   deriving (Eq,Ord,Data,Functor)

The isHoleModule function has changed as well; its type is now GenModule (GenUnit u) -> Bool.
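The alternative that "parse, don't validate" points to can be sketched as follows. This is a minimal illustration, not GHC's actual definition: ModuleRef, ConcreteModule, HoleModule, and loadInterfaceSketch are hypothetical names, and units and module names are reduced to strings. Holes get their own constructor, so callers handle them by pattern matching rather than by remembering to call a predicate.

```haskell
-- Hypothetical, simplified stand-ins; not GHC's real types.
type UnitName   = String
type ModuleName = String

-- A hole is a distinct constructor instead of a special sentinel value.
data ModuleRef
  = ConcreteModule UnitName ModuleName  -- an ordinary module in a known unit
  | HoleModule ModuleName               -- a Backpack-style hole, not yet filled
  deriving (Eq, Show)

-- The case split is explicit; with -Wincomplete-patterns the compiler
-- reports any function that forgets the hole case.
loadInterfaceSketch :: ModuleRef -> Either String String
loadInterfaceSketch ref = case ref of
  HoleModule name          -> Left ("hole module needs special treatment: " ++ name)
  ConcreteModule unit name -> Right (unit ++ ":" ++ name)
```

The trade-off named in the text still applies: functions that only make sense for concrete modules must now either take a narrower type or handle the hole case, which is the rewriting work the implementers wanted to avoid.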

3.2 Command-line flags (DynFlags)

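In GHC, command-line options are represented by one enormous record whose type is named DynFlags. The name first appeared in commit 35fb1e38. Flags that have to be passed explicitly and may change during compilation are called dynamic flags; the rest are static flags. In principle static flags should be stored in global variables, but at the time the paper was written there were still three exceptions: -dppr-debug, -dno-debug-output, -fno-state-hack.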

3.2.1 Layering Issues


-- | Creates a 'Literal' of type @Int#@
mkLitInt :: DynFlags -> Integer -> Literal

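There is a reason for this: the size of Int# is determined by the target architecture. And where does the target architecture information live? Naturally, GHC reads it from a configuration file into some DynFlags.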


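As one can imagine, every other function that calls mkLitInt then has to take a DynFlags parameter as well (this code needs to be cross-platform), which certainly does not help modularity.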
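One way to loosen this coupling is to pass only what the function needs. The sketch below is hypothetical and heavily simplified (Platform, WordSize, Literal, intRange, and mkLitIntSketch are stand-ins defined here, not GHC's definitions): the literal constructor depends on the target word size alone, so its callers no longer need the whole DynFlags record. Returning Maybe for out-of-range values is a choice made for this sketch.

```haskell
-- Hypothetical, simplified stand-ins; not GHC's real types.
data WordSize = W32 | W64 deriving (Eq, Show)

-- Only the target information that literal construction actually needs.
newtype Platform = Platform { platformWordSize :: WordSize }

newtype Literal = LitInt Integer deriving (Eq, Show)

-- Inclusive range of a signed machine integer on the given platform.
intRange :: Platform -> (Integer, Integer)
intRange p = case platformWordSize p of
  W32 -> (negate (2 ^ (31 :: Int)), 2 ^ (31 :: Int) - 1)
  W64 -> (negate (2 ^ (63 :: Int)), 2 ^ (63 :: Int) - 1)

-- Build an Int#-like literal, rejecting values that do not fit the target.
mkLitIntSketch :: Platform -> Integer -> Maybe Literal
mkLitIntSketch p n
  | lo <= n && n <= hi = Just (LitInt n)
  | otherwise          = Nothing
  where
    (lo, hi) = intRange p
```

With this shape, a caller that only builds literals can be tested with a two-line Platform value instead of a fully initialized flags record.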

3.2.2 Shotgun parsing DynFlags

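Many functions whose behavior depends on command-line flags query their DynFlags argument with predicates. Once the call chain gets deep enough, it becomes hard to tell which function actually uses DynFlags. So a comment like the following is hardly surprising: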

 , hscs_iface_dflags :: !DynFlags
 -- ^ Generate final iface using this DynFlags.
 -- FIXME (osa): I don’t understand why this is necessary,
 -- but I spent almost two days trying to figure this out
 -- and I couldn’t .. perhaps someone who understands this
 -- code better will remove this later.
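
One remedy, in the spirit of "parse, don't validate", is to inspect the flags once at the boundary and hand the inner function a small dedicated record. The following sketch is hypothetical: DynFlagsSketch, IfaceConfig, initIfaceConfig, and renderIfaceHeader are invented for illustration, and the option name "ignore-pragmas" is made up.

```haskell
-- Hypothetical stand-in for the big flags record; not GHC's DynFlags.
data DynFlagsSketch = DynFlagsSketch
  { optLevel       :: Int
  , enabledOptions :: [String]
  , targetWordSize :: Int
  }

-- Exactly what the interface-generation step of this sketch needs.
data IfaceConfig = IfaceConfig
  { ifaceKeepPragmas :: Bool
  , ifaceWordSize    :: Int
  } deriving (Eq, Show)

-- All flag inspection happens here, once, at the boundary.
initIfaceConfig :: DynFlagsSketch -> IfaceConfig
initIfaceConfig dflags = IfaceConfig
  { ifaceKeepPragmas = optLevel dflags > 0
                       && "ignore-pragmas" `notElem` enabledOptions dflags
  , ifaceWordSize    = targetWordSize dflags
  }

-- The worker's signature now documents its real dependencies.
renderIfaceHeader :: IfaceConfig -> String
renderIfaceHeader cfg =
  "word-size=" ++ show (ifaceWordSize cfg)
    ++ (if ifaceKeepPragmas cfg then " pragmas=kept" else " pragmas=dropped")
```

Because renderIfaceHeader never sees the full record, a reader can tell from its type alone which settings influence it.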



3.2.3 When immutable becomes mutable


data Env gbl lcl
  = Env {
         env_top :: !HscEnv, -- Top-level stuff that never changes


3.2.4 Why not make DynFlags implicit?

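In 2012, the idea of wrapping DynFlags in a Reader monad was added to GHC for all kinds of pretty-printing; this is the SDoc type. The original intent may have been "out of sight, out of mind", but the result was that GHC became even more stateful.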



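The paper's authors put considerable effort into correcting this design ("spent countless hours of work to revert this"). For example, they designed a typeclass called OutputableP to supply the context needed when printing (such as the target architecture), and later discovered that a very similar typeclass called PlatformOutputable had been removed back in 2012.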
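The idea behind such a class can be sketched as below. This is a loose model rather than GHC's definition: the method name pdoc, the String result (in place of SDoc), and the Platform and WordLit types are assumptions of this sketch. The point is that the context a printer needs appears in its type instead of being read from an ambient DynFlags.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

-- Hypothetical, simplified stand-ins; not GHC's real definitions.
newtype Platform = Platform { platformWordSizeInBytes :: Int }

-- Printing that needs context of type env takes it as an explicit argument.
class OutputableP env a where
  pdoc :: env -> a -> String

-- A machine-word literal whose rendering depends on the target platform.
newtype WordLit = WordLit Integer

instance OutputableP Platform WordLit where
  pdoc platform (WordLit n) =
    show n ++ " :: Word" ++ show (8 * platformWordSizeInBytes platform)
```

For instance, pdoc (Platform 8) (WordLit 42) renders the literal for a 64-bit target, and the same value can be printed for a 32-bit target by passing Platform 4, with no global state involved.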

3.2.5 The genesis of a global mutable DynFlags variable

Some functions that use SDoc have no way of obtaining a DynFlags, for example certain tracing functions used in a "static context".


commit ab50c9c527d19f4df7ee6742b6d79c855d57c9b8
Date:  Tue Jun 12 18:52:05 2012 +0100
   Pass DynFlags down to showSDoc
-- tracingDynFlags is a hack, necessary because we need to be
-- able to show SDocs when tracing, but we don’t always have
-- DynFlags available. Do not use it if you can help it.
-- It will not reflect options set by the commandline flags,
-- it may have the wrong target platform, etc. Currently it
-- just panics if you try to use it.
tracingDynFlags :: DynFlags
tracingDynFlags = panic "tracingDynFlags used"


The paper's authors put it amusingly: "this got fixed on the very same day".

commit 37f9861ff65552c2bb6a85c3b27e0228275bc0b6
Date:   Tue Jun 12 23:29:53 2012 +0100
   Make tracingDynFlags slightly more defined

   In particular, fields like flags are now set to the default,
   so at least they will work to some extent.

-- Do not use tracingDynFlags!
-- tracingDynFlags is a hack, necessary because we need to be
-- able to show SDocs when tracing, but we don’t always have
-- DynFlags available. Do not use it if you can help it.
-- It will not reflect options set by the commandline flags,
-- and all fields may be either wrong or undefined.
tracingDynFlags :: DynFlags
tracingDynFlags = defaultDynFlags tracingSettings
   where tracingSettings = panic "Settings not defined in


commit cfb038de5df3fd2521987c143b3e5257d5d20055
Date: Fri Jul 20 19:10:14 2012 +0100

Make tracingSettings have just enough information to get debug output printed
I suspect I have done the wrong thing; I hope someone can improve.

{-# OPTIONS_GHC -fno-warn-missing-fields #-}
-- So that tracingSettings works properly

tracingDynFlags :: DynFlags
tracingDynFlags = defaultDynFlags tracingSettings

tracingSettings :: Settings
tracingSettings = Settings { sTargetPlatform = tracingPlatform }

tracingPlatform :: Platform
tracingPlatform = Platform { platformWordSize = 4, platformOS = OSUnknown }

Unsurprisingly, someone else got caught in the crossfire: #7304 (arm-linux: Missing field in record construction DynFlags.sPlatformConstants).


3.2.6 When immutable really becomes mutable: GHCi


$ ghc-8.10.5 --interactive
GHCi, version 8.10.5: https://www.haskell.org/ghc/ :? for help
> :set -fexternal-interpreter
> 1
ghc: ghc-iserv terminated (-11) <-- segmentation fault
Leaving GHCi.

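What makes matters harder is that there are two kinds of flags: those entered interactively and those passed in the source via {-# OPTIONS_GHC ... #-}. They are expected to coexist peacefully, but things often do not work out that way.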

3.3 Top-level session state (HscEnv)


newtype Ghc a = Ghc { unGhc :: Session -> IO a }

-- | The Session is a handle to the complete state of a
-- compilation session. A compilation session consists of
-- a set of modules constituting the current program or
-- library, the context for interactive evaluation, and
-- various caches.

data Session = Session !(IORef HscEnv)

-- | HscEnv is like 'Session', except that some of the fields are
-- immutable.

-- An HscEnv is used to compile a single module from plain Haskell
-- source code (after preprocessing) to either C, assembly or C--.
-- It’s also used to store the dynamic linker state to allow for
-- multiple linkers in the same address space. Things like the
-- module graph don’t change during a single compilation.

-- Historical note: "hsc" used to be the name of the compiler
-- binary, when there was a separate driver and compiler.
-- To compile a single module, the driver would invoke hsc on
-- the source code... so nowadays we think of hsc as the layer
-- of the compiler that deals with compiling a single module.

data HscEnv = HscEnv
    { hsc_dflags :: DynFlags
    -- ^ The dynamic flag settings
    , hsc_IC :: InteractiveContext
    -- ^ The context for evaluating interactive statements
    , ...

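The comment is out of date: since 2005, Hsc has stood for the mutable state of the global GHC session (which may or may not be interactive). The hsc_IC :: InteractiveContext field holds GHCi's state, which in turn contains a DynFlags of GHCi's own.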
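A stripped-down model of this arrangement shows why the "immutable" environment is mutable in practice. Everything here is a hypothetical reduction: HscEnv has only two stand-in fields, and newSession, readSession, and modifySession are names chosen for this sketch rather than GHC's API.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical, drastically reduced environment; not GHC's HscEnv.
data HscEnv = HscEnv
  { hsc_dflags  :: [String]  -- stand-in for DynFlags
  , hsc_targets :: [String]  -- stand-in for the rest of the session state
  } deriving (Eq, Show)

-- The session is a mutable reference to the whole environment.
data Session = Session !(IORef HscEnv)

newSession :: HscEnv -> IO Session
newSession env = Session <$> newIORef env

readSession :: Session -> IO HscEnv
readSession (Session ref) = readIORef ref

-- Any code holding the Session can replace fields that other code
-- assumes to be fixed for the duration of a compilation.
modifySession :: Session -> (HscEnv -> HscEnv) -> IO ()
modifySession (Session ref) f = modifyIORef' ref f
```

Two reads of the same session can therefore return different flags, which is the kind of surprise the GHCi examples in this section illustrate.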

3.3.1 HscEnv’s DynFlags


{-# OPTIONS_GHC -static #-}
module Test where

main :: IO ()
main = putStrLn "Hello World"

$ ghc-9.2 Test.hs -dynamic
[1 of 1] Compiling Test
  error: Bad interface file: .../base-
  mismatched interface file profile tag (wanted "", got "dyn")

3.3.2 HscEnv’s caches

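HscEnv is used as a global state store. For example, it contains several caches: module interfaces read from disk (the external package state, EPS) and module interfaces generated during the session (the home package table, HPT).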

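The most immediate problem is this: with only one environment, how can modules belonging to different environments be told apart in cross-platform or multi-target compilation (e.g. host vs target, profiling vs non-profiling, dynamic vs non-dynamic)?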



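If the first module read in carries the -fignore-interface-pragmas flag (or is compiled with -O0, which turns the flag on implicitly), the interface file is only partially loaded into the cache (reportedly for performance reasons), and the next module that reads that interface then gets incomplete interface information, even if that module does not use the flag.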


When too much is cached, it leaks, reaching modules that were never supposed to know about it.



This is that classic hard problem of computer science: cache invalidation.
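The confusion between environments comes down to what the cache is keyed on. The sketch below contrasts the two options; it is a hypothetical model (ModuleName, Iface, Way, and all function names are defined here for illustration) and uses association lists only to stay dependency-free.

```haskell
-- Hypothetical, simplified stand-ins; not GHC's real types.
type ModuleName = String
type Iface      = String

data Way = Vanilla | Profiled | Dynamic deriving (Eq, Show)

-- Keyed by module name only: inserting for another way replaces the entry.
type NaiveCache = [(ModuleName, Iface)]

-- Keyed by module name and way: entries for different ways coexist.
type WayCache = [((ModuleName, Way), Iface)]

insertNaive :: ModuleName -> Iface -> NaiveCache -> NaiveCache
insertNaive m i cache = (m, i) : filter ((/= m) . fst) cache

insertWay :: ModuleName -> Way -> Iface -> WayCache -> WayCache
insertWay m w i cache = ((m, w), i) : filter ((/= (m, w)) . fst) cache

lookupWay :: ModuleName -> Way -> WayCache -> Maybe Iface
lookupWay m w = lookup (m, w)
```

With the naive key, whichever interface was loaded last is what every later consumer sees; with the compound key, a profiled and a vanilla interface for the same module can be cached side by side.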


3.3.3 Code reuse

Just like DynFlags, HscEnv has spread all over GHC's code base, which badly hampers reuse of GHC's subcomponents (type-checker, renamer, desugarer (HsToCore), Core optimizer, most code generators).


byteCodeGen :: HscEnv
            -> Module
            -> [StgTopBinding]
            -> [TyCon]
            -> Maybe ModBreaks
            -> IO CompiledByteCode

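There is no documentation, but the type signature looks readable enough: Module is the target module; [StgTopBinding] is the list of the module's top-level bindings in STG form (STG being the "spineless, tagless G-machine"); [TyCon] is the list of type constructors in the module; Maybe ModBreaks looks a little odd and is probably some data about breakpoints.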

But why is an HscEnv needed? Why is the result wrapped in the IO monad? It looks as though HscEnv provides



3.4 Interpreter


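The interpreter supports both bytecode and native code. The latter is harder, not only because platforms differ but also because GHC has several ABIs on the same platform (e.g. with profiling enabled or not, dynamically linked or not, etc.).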

3.4.1 Internal interpreter

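GHC used to have only one interpreter, called the internal interpreter. It runs native code through so-called "run-time linking" (a feature implemented by GHC's RTS, and therefore dependent on GHC).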

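This approach requires the native code's ABI to match that of the GHC currently in use, so the internal interpreter can do nothing with native code produced by cross-compilation or built in other ways such as profiling.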

3.4.2 Avoiding the use of the interpreter


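Suppose we already have a GHC (ghc-stage0) that produces old_abi object code, and we use it to compile a new GHC (ghc-stage1) that produces new_abi object code. Clearly ghc-stage1 cannot use the internal interpreter, because the object code it emits has a different ABI from its own. However, since GHC's own code uses no feature that requires the internal interpreter, stage1 can be used to compile stage2. Stage2 can use the internal interpreter, and it is also the binary that is normally distributed.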

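Sadly, for cross-compilation this problem is essentially unsolvable; at the time the paper was finished, cross-compiling GHC distributions still shipped stage1: https://gitlab.haskell.org/ghc/ghc/-/issues/19174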

cross-compilers don’t support compiler plugins : https://gitlab.haskell.org/ghc/ghc/-/issues/14335

3.4.3 Working around “ways”


Some options are fixed at build time (e.g. tables_next_to_code); others, which can be configured at run time, are called "ways".

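When GHC compiles object code with ways different from those it was itself built with, it is in effect doing a kind of cross-compilation, so during that process it cannot use the internal interpreter, and Template Haskell becomes unavailable as well.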

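GHC's solution is to produce two sets of object code: one compatible with its own ABI, for the internal interpreter, and one built the way the user asked for, as the real compilation output. The two differ in file extension (this applies to object files, interface files, and archives); for example, an object file built with dynamic linking plus profiling has the suffix .dyn_p.o.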

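Here is the catch: not only object code but also interface files come in two kinds, yet HscEnv's caches have no mechanism for telling the two kinds of interface file apart; they all end up in the same place. Quoting the wiki directly: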

The way this is done currently is inherently unsafe, because we use the profiled .hi files with the unprofiled object files, and hope that the two are in sync.

This leads to problems such as #15492 (Plugin recompilation check fails when profiling is enabled).

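Another problem is that locally installed packages do not necessarily provide object files for every possible combination, in which case they have to be compiled, sometimes making compile time and disk usage explode. So this strategy has inherent drawbacks of its own.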

#15394 — GHC doesn’t come with dynamic object files/libraries compiled with profiling

3.4.4 -dynamic-too

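To avoid doubling compile time by building both kinds of object code, yet another hack was invented: -dynamic-too. With this flag, GHC behaves like a multi-target compiler and emits statically and dynamically linked object files at the same time.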


-dynamic-too is buggy, slow, and has an ugly implementation



-- #8180 - when using TemplateHaskell, switch on -dynamic-too so
-- the linker can correctly load the object files. This isn’t
-- necessary when using -fexternal-interpreter.
dflags1 = if hostIsDynamic && internalInterpreter &&
             not isDynWay && not isProfWay && needsLinker
          then gopt_set lcl_dflags Opt_BuildDynamicToo
          else lcl_dflags
-- #16331 - when no "internal interpreter" is available but we
-- need to process some TemplateHaskell or QuasiQuotes, we
-- automatically turn on -fexternal-interpreter.
dflags2 = if not internalInterpreter && needsLinker
          then gopt_set dflags1 Opt_ExternalInterpreter
          else dflags1

3.4.5 External interpreter

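Since the internal interpreter lacks extensibility by design, why not build an external interpreter (iserv)? An external interpreter is flexible; all it needs is a communication protocol between the compiler and the interpreter.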


The result: #14335 (Plugins don't work with -fexternal-interpreter)



3.5 Plugins and Hooks

Initially there was only one kind of plugin (custom Core-to-Core passes); over time it grew and diversified into many kinds (type-checker, renamer, interface loader, Template Haskell splice modifier).





3.6 Template Haskell


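In addition, users cannot make TH use version A of a package while GHC uses version B, because GHC's single module environment cannot tell the two apart.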

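TH supports running IO actions, which may not be entirely appropriate when the external interpreter is used, because the external interpreter may be unable to access the files it needs (e.g. due to sandboxing, remote execution, or execution in a VM).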

3.7 The Driver

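In GHC the driver coordinates other compilers and linkers ("is responsible for orchestrating other compilers and linkers"; in the compiler directory it goes by the name main). It was later extended to support multi-module compilation (--make) and interactive use (GHCi). Its core type is HscEnv.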



It isn’t independent of GHC-the-program command-line interface


newHscEnv :: DynFlags -> IO HscEnv

It isn’t self-consistent


(1) passing a valid DynFlags value is difficult as its “settings” field has to be properly setup. Most users probably rely on initGhcMonad :: GhcMonad m => Maybe FilePath -> m () in the GHC top-level module or duplicate its code to avoid dealing with the GhcMonad abstraction.

(2) the HscEnv created by this function is useless for most purposes because several fields (unit env, interpreter…) have to be properly initialized, which can only be done with setSessionDynFlags :: GhcMonad m => DynFlags -> m () (or the similar setProgramDynFlags), also in the GHC module. Or by duplicating their code.

Documentation is often missing, outdated, or incomplete

(1) The GHC.Driver.Main module is documented as the "main API for compiling Haskell code", yet it cannot be used on its own at all.

(2) Hardly any function states in its comments which fields of HscEnv it actually uses.

The interface is inherently unsafe

The only way to adjust an HscEnv is to set DynFlags, but API users really have no idea what they are supposed to change.

It isn’t full-featured

Some ghc-lib clients (e.g. HLS, Haddock) have to reimplement a complex driver of their own, often duplicating large parts of the original one.