### What does this PR do?
reduce number of zombie build processes that can happen in CI or when
building locally
### How did you verify your code works?
`add` no longer locks a mutex, and `finish` no longer locks a mutex
except for the last task. This could meaningfully improve performance in
cases where we spawn a large number of tasks on a thread pool. This
change doesn't alter the semantics of the type, unlike the standard
library's `WaitGroup`, which also uses atomics but has to be explicitly
reset.
(For internal tracking: fixes ENG-19722)
---------
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
### What does this PR do?
lets you get useful info out of this script even if you set the number
of attempts too high and you don't want to wait
### How did you verify your code works?
local testing
(I cancelled CI because this script is not used anywhere in CI so it
wouldn't tell us anything useful)
### What does this PR do?
- for these kinds of aborts which we test in CI, introduce a feature
flag to suppress core dumps and crash reporting only from that abort,
and set the flag when running the test:
- libuv stub functions
- Node-API abort (used in particular when calling illegal functions
during finalizers)
- passing `process.kill` its own PID
- core dumps are suppressed with `setrlimit`, and crash reporting with
the new `suppress_reporting` field. these suppressions are only engaged
right before crashing, so we won't ignore new kinds of crashes that come
up in these tests.
- for the test bindings used to test the crash handler in
`run-crash-handler.test.ts`, disables core dumps but does not disable
crash reporting (because crashes get reported to a server that the test
is running to make sure they are reported)
- fixes a panic when printing source code around an error containing
`\n\r`
- updates the code where we clone vendor tests to checkout the right tag
- adds `vendor/elysia/test/path/plugin.test.ts` to
no-validate-exceptions
- this failure was exposed by starting to test the version of elysia we
have been intending to test. the crash trace suggests it may be fixed by
#21307.
- makes dumping core or uploading a crash report count as a failing test
- this ensures we don't realize a crash has occurred if it happened in a
subprocess and the main test doesn't adequately check the exit code. to
spawn a subprocess you expect to fail, prefer `expect(code).toBe(1)`
over `expect(code).not.toBe(0)`. if you really expect multiple possible
erroneous exit codes, you might try `expect(signal).toBeNull()` to still
disallow crashes.
### How did you verify your code works?
Running affected tests on a Linux machine with core dumps set up and
checking no new ones appear.
https://buildkite.com/bun/bun/builds/21465 has no core dumps.
I haven't checked all uses of tryTakeException but this bug is probably
not the only one.
Caught by running fuzzy-wuzzy with debug logging enabled. It tried to
print the exception. Updates fuzzy-wuzzy to have improved logging that
can tell you what was last executed before a crash.
---------
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
### What does this PR do?
Closes#13012
On Linux, when any Bun process spawned by `runner.node.mjs` crashes, we
run GDB in batch mode to print a backtrace from the core file.
And on all platforms, we run a mini `bun.report` server which collects
crashes reported by any Bun process executed during the tests, and after
each test `runner.node.mjs` fetches and prints any new crashes from the
server.
<details>
<summary>example 1</summary>
```
#0 crash_handler.crash () at crash_handler.zig:1513
#1 0x0000000002cf4020 in crash_handler.crashHandler (reason=..., error_return_trace=0x0, begin_addr=...) at crash_handler.zig:479
#2 0x0000000002cefe25 in crash_handler.handleSegfaultPosix (sig=<optimized out>, info=<optimized out>) at crash_handler.zig:800
#3 0x00000000045a1124 in WTF::jscSignalHandler (sig=11, info=0x7ffe044e30b0, ucontext=0x0) at vendor/WebKit/Source/WTF/wtf/threads/Signals.cpp:548
#4 <signal handler called>
#5 JSC::JSCell::type (this=0x0) at vendor/WebKit/Source/JavaScriptCore/runtime/JSCellInlines.h:137
#6 JSC::JSObject::getOwnNonIndexPropertySlot (this=0x150bc914fe18, vm=..., structure=0x150a0102de50, propertyName=..., slot=...) at vendor/WebKit/Source/JavaScriptCore/runtime/JSObject.h:1348
#7 JSC::JSObject::getPropertySlot<false> (this=0x150bc914fe18, globalObject=0x150b864e0088, propertyName=..., slot=...) at vendor/WebKit/Source/JavaScriptCore/runtime/JSObject.h:1433
#8 JSC::JSValue::getPropertySlot (this=0x7ffe044e4880, globalObject=0x150b864e0088, propertyName=..., slot=...) at vendor/WebKit/Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1108
#9 JSC::JSValue::get (this=0x7ffe044e4880, globalObject=0x150b864e0088, propertyName=..., slot=...) at vendor/WebKit/Source/JavaScriptCore/runtime/JSCJSValueInlines.h:1065
#10 JSC::LLInt::performLLIntGetByID (bytecodeIndex=..., codeBlock=0x150b861e7740, globalObject=0x150b864e0088, baseValue=..., ident=..., metadata=...) at vendor/WebKit/Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:878
#11 0x0000000004d7b055 in llint_slow_path_get_by_id (callFrame=0x7ffe044e4ab0, pc=0x150bc92ea0e7) at vendor/WebKit/Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:946
#12 0x0000000003dd6042 in llint_op_get_by_id ()
#13 0x0000000000000000 in ?? ()
```
</details>
<details>
<summary>example 2</summary>
```
#0 crash_handler.crash () at crash_handler.zig:1513
#1 0x0000000002c5db80 in crash_handler.crashHandler (reason=..., error_return_trace=0x0, begin_addr=...) at crash_handler.zig:479
#2 0x0000000002c59f60 in crash_handler.handleSegfaultPosix (sig=<optimized out>, info=<optimized out>) at crash_handler.zig:800
#3 0x00000000042ecc88 in WTF::jscSignalHandler (sig=11, info=0xfffff60141b0, ucontext=0xfffff6014230) at vendor/WebKit/Source/WTF/wtf/threads/Signals.cpp:548
#4 <signal handler called>
#5 bun.js.api.FFIObject.Reader.u8 (globalObject=0x4000554e0088) at /var/lib/buildkite-agent/builds/ip-172-31-75-92/bun/bun/src/bun.js/api/FFIObject.zig:65
#6 bun.js.jsc.host_fn.toJSHostCall__anon_1711576 (globalThis=0x4000554e0088, args=...) at /var/lib/buildkite-agent/builds/ip-172-31-75-92/bun/bun/src/bun.js/jsc/host_fn.zig:97
#7 bun.js.jsc.host_fn.DOMCall("Reader"[0..6],bun.js.api.FFIObject.Reader,"u8"[0..2],.{ .reads = .{ ... }, .writes = .{ ... } }).slowpath (globalObject=0x4000554e0088, thisValue=70370172175040, arguments_ptr=0xfffff6015460, arguments_len=1) at /var/lib/buildkite-agent/builds/ip-172-31-75-92/bun/bun/src/bun.js/jsc/host_fn.zig:490
#8 0x000040003419003c in ?? ()
#9 0x0000400055173440 in ?? ()
```
</details>
I used GDB instead of LLDB (as the branch name suggests) because it
seems to produce more useful stack traces with musl libc.
- [x] on linux, use gdb to print from core dump of main bun process
crashed
- [x] on linux, use gdb to print from all new core dumps (so including
bun subprocesses spawned by the test that crashed)
- [x] on all platforms, use a mini bun.report server to print a
self-reported trace (depends on oven-sh/bun.report#15; for now our
package.json points to a commit on the branch of that repo)
- [x] fix trying to fetch stack traces too early on windows
- [x] use output groups so the traces show up alongside the log for the
specific test instead of having to find it in the logs from the entire
run
- [x] get oven-sh/bun.report#15 merged, and point to a bun.report commit
on the main branch instead of the PR branch in package.json
### How did you verify your code works?
Manually, and in CI with a crashing test.
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>