Thank you for working on this. I found Okhsl to be of great utility.
Okay. I updated mold to v2.0.0. I added `"-Z", "time-passes"` to get link times and ran cargo with `--timings` to get CPU utilization graphs.
Tested on two projects of mine (the one from yesterday is "X").
Link times are the best of 3-4 runs, changing only whitespace in `main.rs` between runs.
lto="fat" | lld | mold |
---|---|---|
Project X (cu=1) | 105.923 | 106.380 |
Project X (cu=8) | 103.512 | 103.513 |
Project S (cu=1) | 94.290 | 94.969 |
Project S (cu=8) | 100.118 | 100.449 |
Observations (`lto="fat"`): As expected, multi-core utilization is low. Using `codegen-units` larger than 1 may even cause a regression in link time. The choice of linker between `lld` and `mold` appears to be of no significance.
lto="thin" | lld | mold |
---|---|---|
Project X (cu=1) | 46.596 | 47.118 |
Project X (cu=8) | 34.167 | 33.839 |
Project X (cu=16) | 36.296 | 36.621 |
Project S (cu=1) | 41.817 | 41.404 |
Project S (cu=8) | 32.062 | 32.162 |
Project S (cu=16) | 35.780 | 36.074 |
Observations (`lto="thin"`): Here, we see parallel `LLVM_lto_optimize` runs kicking in. I also tested with `codegen-units=16`. In that case, the number of parallel `LLVM_lto_optimize` runs was so large that the synchronization overhead caused a regression on my humble workstation, which has an Intel i7-7700K processor (only 4 physical, 8 logical cores). The cu=16 results will probably look different on a more powerful setup. Still, the choice of linker between `lld` and `mold` appears to be of no significance.
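To put rough numbers on the `codegen-units` effect under thin LTO, here is a minimal sketch that derives speedups from the `lld` column of the table above. The helper name `speedup` and the dictionary layout are purely illustrative. The times are assumed to be in seconds, though the ratios do not depend on the unit.

```python
# lld link times for lto="thin", copied from the table above
# (assumed to be seconds; the ratios below are unit-free either way).
thin_lld = {
    "Project X": {1: 46.596, 8: 34.167, 16: 36.296},
    "Project S": {1: 41.817, 8: 32.062, 16: 35.780},
}


def speedup(times, baseline_cu, cu):
    """Return how many times faster the link is at `cu` than at `baseline_cu`."""
    return times[baseline_cu] / times[cu]


for project, times in thin_lld.items():
    for cu in (8, 16):
        print(f"{project}: cu={cu} links {speedup(times, 1, cu):.2f}x as fast as cu=1")
    # A ratio below 1.0 means cu=16 is slower than cu=8 (the regression noted above).
    print(f"{project}: cu=16 relative to cu=8 -> {speedup(times, 8, 16):.2f}x")
```

On these numbers cu=8 is the fastest setting for both projects, and cu=16 falls back below it while still beating cu=1.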
lto=false | lld | mold |
---|---|---|
Project X (cu=1) | 29.160 | 29.231 |
Project X (cu=8) | 8.130 | 8.293 |
Project X (cu=16) | 7.076 | 6.953 |
Project S (cu=1) | 11.996 | 12.069 |
Project S (cu=8) | 4.418 | 4.462 |
Project S (cu=16) | 4.357 | 4.455 |
Observations (`lto=false`): Here, `codegen-units` becomes the dominant factor, with no heavy `LLVM_lto_optimize` runs involved. Going above `codegen-units=8` does not hurt link time. Still, the choice of linker between `lld` and `mold` appears to be of no significance.
lto="off" | lld | mold |
---|---|---|
Project X (cu=1) | 29.109 | 29.201 |
Project X (cu=8) | 5.896 | 6.117 |
Project X (cu=16) | 3.479 | 3.637 |
Project S (cu=1) | 11.732 | 11.742 |
Project S (cu=8) | 2.354 | 2.355 |
Project S (cu=16) | 1.517 | 1.499 |
Observations (`lto="off"`): Same observations as `lto=false`. Still, the choice of linker between `lld` and `mold` appears to be of no significance.
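As a rough cross-check of the "no significance" reading, the sketch below computes the relative gap between `mold` and `lld` for every row of the four link-time tables above. The names `link_times` and `mold_vs_lld_percent` are illustrative, and the data is copied from the tables as-is.

```python
# (lld, mold) link times copied from the four tables above, keyed by lto setting.
link_times = {
    "fat": {
        "X cu=1": (105.923, 106.380), "X cu=8": (103.512, 103.513),
        "S cu=1": (94.290, 94.969), "S cu=8": (100.118, 100.449),
    },
    "thin": {
        "X cu=1": (46.596, 47.118), "X cu=8": (34.167, 33.839),
        "X cu=16": (36.296, 36.621), "S cu=1": (41.817, 41.404),
        "S cu=8": (32.062, 32.162), "S cu=16": (35.780, 36.074),
    },
    "false": {
        "X cu=1": (29.160, 29.231), "X cu=8": (8.130, 8.293),
        "X cu=16": (7.076, 6.953), "S cu=1": (11.996, 12.069),
        "S cu=8": (4.418, 4.462), "S cu=16": (4.357, 4.455),
    },
    "off": {
        "X cu=1": (29.109, 29.201), "X cu=8": (5.896, 6.117),
        "X cu=16": (3.479, 3.637), "S cu=1": (11.732, 11.742),
        "S cu=8": (2.354, 2.355), "S cu=16": (1.517, 1.499),
    },
}


def mold_vs_lld_percent(lld, mold):
    """Signed gap of mold relative to lld in percent (positive means mold was slower)."""
    return (mold - lld) / lld * 100.0


diffs = {
    (lto, row): mold_vs_lld_percent(lld, mold)
    for lto, rows in link_times.items()
    for row, (lld, mold) in rows.items()
}

# Find the row where the two linkers disagree the most, in relative terms.
worst = max(diffs, key=lambda key: abs(diffs[key]))
print(f"largest gap: lto={worst[0]}, {worst[1]}: {diffs[worst]:+.2f}%")
```

Every gap in this data stays under 5%, and the largest relative gaps fall on the shortest links, where a tenth of a second is already a few percent.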
Debug builds link in under 0.4 seconds.
`codegen-units=1`, `debug=true`, varying `lto`
lto = "fat"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 2:31 | 90.8207MiB | 7.3374MiB |
`["-Z", "gcc-ld=lld"]` | 2:31 | 91.9731MiB | 7.3332MiB |
`linker = "clang"` | 2:32 | 90.8207MiB | 7.3375MiB |
`linker = "clang"; fuse-ld="mold"` | 2:31 | 92.1107MiB | 7.3334MiB |
lto = "thin"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:33 | 96.9630MiB | 8.1695MiB |
`["-Z", "gcc-ld=lld"]` | 1:32 | 98.3889MiB | 8.1777MiB |
`linker = "clang"` | 1:33 | 96.9631MiB | 8.1695MiB |
`linker = "clang"; fuse-ld="mold"` | 1:32 | 98.6903MiB | 8.1797MiB |
lto = false
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:32 | 113.5656MiB | 8.0601MiB |
`["-Z", "gcc-ld=lld"]` | 1:30 | 115.1210MiB | 8.1122MiB |
`linker = "clang"` | 1:32 | 113.5656MiB | 8.0602MiB |
`linker = "clang"; fuse-ld="mold"` | 1:31 | 115.4679MiB | 8.0663MiB |
lto = "off"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:33 | 113.5666MiB | 8.0601MiB |
`["-Z", "gcc-ld=lld"]` | 1:31 | 115.1231MiB | 8.1122MiB |
`linker = "clang"` | 1:32 | 113.5667MiB | 8.0602MiB |
`linker = "clang"; fuse-ld="mold"` | 1:31 | 115.4697MiB | 8.0662MiB |
`codegen-units=8`, `debug=true`, varying `lto`
lto = "fat"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 2:21 | 104.9842MiB | 7.6304MiB |
`["-Z", "gcc-ld=lld"]` | 2:19 | 106.1436MiB | 7.6264MiB |
`linker = "clang"` | 2:21 | 104.9882MiB | 7.6344MiB |
`linker = "clang"; fuse-ld="mold"` | 2:19 | 106.2864MiB | 7.6325MiB |
lto = "thin"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:12 | 134.1112MiB | 9.0445MiB |
`["-Z", "gcc-ld=lld"]` | 1:09 | 136.1897MiB | 9.0660MiB |
`linker = "clang"` | 1:12 | 134.1113MiB | 9.0446MiB |
`linker = "clang"; fuse-ld="mold"` | 1:09 | 136.4466MiB | 9.0494MiB |
lto = false
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:14 | 158.1049MiB | 9.0328MiB |
`["-Z", "gcc-ld=lld"]` | 1:11 | 159.9998MiB | 9.1129MiB |
`linker = "clang"` | 1:14 | 158.1050MiB | 9.0328MiB |
`linker = "clang"; fuse-ld="mold"` | 1:12 | 160.3123MiB | 9.0428MiB |
lto = "off"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 0:57 | 145.9463MiB | 9.4586MiB |
`["-Z", "gcc-ld=lld"]` | 0:54 | 148.6021MiB | 9.6001MiB |
`linker = "clang"` | 0:57 | 145.9464MiB | 9.4587MiB |
`linker = "clang"; fuse-ld="mold"` | 0:55 | 148.8842MiB | 9.4668MiB |
`mold` appears to be similar to, but not faster than, `lld`.
With the caveat that this is not a proper benchmark since:
- I didn't measure link time alone.
- I didn't bother running each case multiple times picking the fastest run (since I perceived the differences to be insignificant).
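For a quick look at how small the clean-build differences are, the sketch below converts the `M:SS` times from the `codegen-units=8` tables above into seconds and subtracts the `["-Z", "gcc-ld=lld"]` row from the `fuse-ld="mold"` row. The helper `to_seconds` and the dictionary name are illustrative only. Given the caveats above (single runs, whole-second resolution), treat the result as indicative at best.

```python
def to_seconds(clock):
    """Convert an 'M:SS' string such as '2:19' to whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Clean build times for codegen-units=8, debug=true, copied from the tables above:
# (["-Z", "gcc-ld=lld"] row, linker = "clang"; fuse-ld="mold" row).
clean_builds_cu8 = {
    "fat": ("2:19", "2:19"),
    "thin": ("1:09", "1:09"),
    "false": ("1:11", "1:12"),
    "off": ("0:54", "0:55"),
}

# Positive delta means the mold build took longer than the lld build.
deltas = {
    lto: to_seconds(mold) - to_seconds(lld)
    for lto, (lld, mold) in clean_builds_cu8.items()
}

for lto, delta in deltas.items():
    print(f"lto={lto}: mold - lld = {delta:+d}s")
```

A difference of at most one second on builds of roughly one to two minutes is well within what single, unrepeated runs can resolve.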
And a side note: `lto = false` appears to be practically useless.
Correct.
Who told you that?