test suite is slow because it cannot run in parallel#20094
Draft
jmr wants to merge 1 commit into
Draft
Conversation
1454912 to
20f2db5
Compare
Problem: The test suite takes over 17 minutes serially and cannot
be run with make -j because tests share the testdir/
working directory and create temporary files with
well-known names (X*, test.log, messages, vimcmd).
Solution: Create per-test isolated working directories under
workdir/ using symlink trees that mirror the real
directory layout, keeping temp files isolated while
preserving all relative path assumptions.
(Jesse Rosenstock)
Each test runs in workdir/$test/testdir/ where the symlink tree
preserves relative-path depth: ../ resolves to src/ contents and
../../ resolves to the vim root. The layout is:
workdir/ symlinks to vim root (runtime/, ...)
workdir/$test/ symlinks to src/ contents
workdir/$test/testdir/ symlinks to testdir/ contents, test cwd
Level-1 symlinks are created once in the nolog target rather than
per-test, because ln -s follows existing symlinks to directories
(creating e.g. runtime/runtime in the repo root) and there is no
race-free way to create them from parallel test processes. A
setup_workdir.sh POSIX shell script creates the per-test levels
2 and 3.
The Makefile avoids GNU Make extensions to stay POSIX-compatible:
setup_workdir.sh replaces a define/call macro, shell TESTDIR=`pwd`
replaces $(CURDIR), and recursive $(MAKE) replaces order-only
prerequisites (|).
Tests that navigate directory structure (test_stacktrace.vim,
test_python3.vim) are adapted for the extra workdir/ level.
An alternative approach would eliminate the symlink tree entirely
by rewriting all ~127 relative path references (../ and ../../)
across 35 test files to use absolute paths via g:testdir. This
was rejected because: (1) it touches many files unrelated to the
parallelism fix, creating review noise and merge conflicts;
(2) tab-completion and fnamemodify tests genuinely test relative
path resolution against real directory structure, requiring
per-test fixture directories anyway; (3) future test authors
must remember to avoid ../ in favor of g:testdir, an easy
regression with no enforcement mechanism; and (4) the symlink
approach preserves the natural relative-path idiom that existing
tests use, keeping the diff focused on infrastructure.
On a 48-core machine, wall clock time drops from 17m32s to ~50s.
This supersedes 6146f33 / vim#20082 (patch 9.2.0410: test suite races when
run with parallel make) which added .NOTPARALLEL as a workaround for the
same race conditions.
related: vim#20082
Signed-off-by: Jesse Rosenstock <jmr@google.com>
Co-authored-by: Gemini <noreply@google.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem: The test suite takes over 17 minutes serially and cannot
be run with make -j because tests share the testdir/
working directory and create temporary files with
well-known names (X*, test.log, messages, vimcmd).
Solution: Create per-test isolated working directories under
workdir/ using symlink trees that mirror the real
directory layout, keeping temp files isolated while
preserving all relative path assumptions.
(Jesse Rosenstock)
Each test runs in workdir/$test/testdir/ where the symlink tree preserves relative-path depth: ../ resolves to src/ contents and ../../ resolves to the vim root. The layout is:
workdir/ symlinks to vim root (runtime/, ...)
workdir/$test/ symlinks to src/ contents
workdir/$test/testdir/ symlinks to testdir/ contents, test cwd
Level-1 symlinks are created once in the nolog target rather than per-test, because ln -s follows existing symlinks to directories (creating e.g. runtime/runtime in the repo root) and there is no race-free way to create them from parallel test processes. A setup_workdir.sh POSIX shell script creates the per-test levels 2 and 3.
The Makefile avoids GNU Make extensions to stay POSIX-compatible: setup_workdir.sh replaces a define/call macro, shell TESTDIR=$(CURDIR), and recursive $ (MAKE) replaces order-only prerequisites (|).
pwdreplacesThe test_stacktrace.vim Filepath() function is updated to use getcwd() for Xscript files since they are created in the workdir cwd, while runtest.vim is resolved from the original testdir.
An alternative approach would eliminate the symlink tree entirely by rewriting all ~127 relative path references (../ and ../../) across 35 test files to use absolute paths via g:testdir. This was rejected because: (1) it touches many files unrelated to the parallelism fix, creating review noise and merge conflicts; (2) tab-completion and fnamemodify tests genuinely test relative path resolution against real directory structure, requiring per-test fixture directories anyway; (3) future test authors must remember to avoid ../ in favor of g:testdir, an easy regression with no enforcement mechanism; and (4) the symlink approach preserves the natural relative-path idiom that existing tests use, keeping the diff focused on infrastructure.
On a 48-core machine, wall clock time drops from 17m32s to ~50s.
This supersedes 6146f33 / #20082 (patch 9.2.0410: test suite races when run with parallel make) which added .NOTPARALLEL as a workaround for the same race conditions.
related: #20082