fix: missing logs on ci timeout (dragonflydb#3452)
The environment variables exported when the regression tests time out do not work properly, so the if statement in the action step "Print last log on timeout" fails to read and upload the log directory recorded in /tmp/last_test_log_dir.txt. A second problem is the job-level timeout (timeout-minutes), which kills the whole job/matrix before the upload log step has a chance to run. To avoid that, the workflow needs manual timeouts, similar to what the regression tests action already does.

* remove print last log on timeout action step
* copy the logs on timeout directly within the timeout step
* replace global timeout on CI workflow with timeout command per step


---------

Signed-off-by: kostas <[email protected]>
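
The approach described in the commit message can be sketched in Python. The actual change is implemented in bash inside the workflow files, so this is only an illustration and not the project's code: `run_with_timeout` and its parameters are hypothetical names, and the sketch relies on `subprocess.run(..., timeout=...)` instead of the GNU `timeout` command. It returns 124 on timeout to mirror the exit code that `timeout` reports, and it moves the directory named in the pointer file into the failed-logs folder within the same step, so no later step has to run.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def run_with_timeout(cmd, seconds, pointer_file, failed_dir):
    """Run cmd with a time limit (hypothetical helper, not part of the commit).

    Returns the command's exit code. If the limit is exceeded, the directory
    named in pointer_file is moved into failed_dir and 124 is returned, which
    is the exit code GNU `timeout` uses for a timed-out command.
    """
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        pointer = Path(pointer_file)
        if pointer.exists():
            log_dir = Path(pointer.read_text().strip())
            if log_dir.is_dir():
                # The folder may not exist yet if no earlier test failed.
                Path(failed_dir).mkdir(parents=True, exist_ok=True)
                shutil.move(str(log_dir), str(Path(failed_dir) / log_dir.name))
        return 124


if __name__ == "__main__":
    # Paths mirror the ones in the diff; the command line is only an example.
    code = run_with_timeout(
        [sys.executable, "-c", "print('tests would run here')"],
        seconds=60,
        pointer_file="/tmp/last_test_log_dir.txt",
        failed_dir="/tmp/failed/",
    )
    sys.exit(1 if code == 124 else code)
```

Doing the copy inside the step that detects the timeout avoids depending on step outputs or environment variables being visible to a follow-up step.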
kostasrim authored Aug 6, 2024
1 parent 420046a commit ff716bb
Showing 3 changed files with 19 additions and 32 deletions.
27 changes: 6 additions & 21 deletions .github/actions/regression-tests/action.yml
@@ -41,7 +41,12 @@ runs:
  # timeout returns 124 if we exceeded the timeout duration
  if [[ $code -eq 124 ]]; then
  echo "TIMEDOUT=1">> "$GITHUB_OUTPUT";
+ echo "🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑"
+ echo "🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 TESTS TIMEDOUT 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑"
+ echo "🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑"
+ # Copy the last log file because we timedout and pytest did not copy it over
+ # the /tmp/failed/ folder
+ cat /tmp/last_test_log_dir.txt | xargs -I {} mv {}/ /tmp/failed/
  exit 1
  fi
@@ -50,26 +55,6 @@ runs:
  exit 1
  fi
- - name: Print last log on timeout
- if: failure()
- shell: bash
- env:
- TIMEDOUT_STEP_1: ${{ steps.first.outputs.TIMEDOUT }}
- TIMEDOUT_STEP_2: ${{ steps.second.outputs.TIMEDOUT }}
- run: |
- if [[ "${{ env.TIMEDOUT_STEP_1 }}" -eq 1 ]] || [[ "${{ env.TIMEDOUT_STEP_2 }}" -eq 1 ]]; then
- echo "🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑"
- echo "🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 TESTS TIMEDOUT 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑"
- echo "🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑 🛑"
- # It could be the case that the first test failed and the folder was not created. We need mkdir
- # therefore so plz do not remove
- mkdir /tmp/failed
- # Copy over the logs of the test that timedout. We need this because the exception/failure
- # handlers do not run when the shell command TIMEOUT sends a SIGTERM to terminate the pytest process.
- cat /tmp/last_test_log_dir.txt | xargs -I {} cp -r {}/ /tmp/failed/
- fi
  - name: Send notification on failure
  if: failure() && github.ref == 'refs/heads/main'
  shell: bash
18 changes: 8 additions & 10 deletions .github/workflows/ci.yml
@@ -48,7 +48,6 @@ jobs:
  cxx_flags: ""

  runs-on: ubuntu-latest
- timeout-minutes: 60
  env:
  SCCACHE_GHA_ENABLED: "true"
  SCCACHE_CACHE_SIZE: 6G
@@ -119,7 +118,6 @@ jobs:
  df -h
  echo "-----------------------------"
  ninja src/all
  - name: PostFail
  if: failure()
  run: |
@@ -130,7 +128,7 @@ jobs:
  run: |
  cd ${GITHUB_WORKSPACE}/build
  echo Run ctest -V -L DFLY
- GLOG_alsologtostderr=1 GLOG_vmodule=rdb_load=1,rdb_save=1,snapshot=1 ctest -V -L DFLY
+ GLOG_alsologtostderr=1 GLOG_vmodule=rdb_load=1,rdb_save=1,snapshot=1 timeout 20m ctest -V -L DFLY
  echo "Running tests with --force_epoll"
@@ -143,20 +141,20 @@ jobs:
  EOF
  gdb -ix ./init.gdb --batch -ex r --args ./dragonfly_test --force_epoll
- FLAGS_force_epoll=true GLOG_vmodule=rdb_load=1,rdb_save=1,snapshot=1 ctest -V -L DFLY
+ FLAGS_force_epoll=true GLOG_vmodule=rdb_load=1,rdb_save=1,snapshot=1 timeout 20m ctest -V -L DFLY
  echo "Finished running tests with --force_epoll"
  echo "Running tests with --cluster_mode=emulated"
- FLAGS_cluster_mode=emulated ctest -V -L DFLY
+ FLAGS_cluster_mode=emulated timeout 20m ctest -V -L DFLY
  echo "Running tests with both --cluster_mode=emulated & --lock_on_hashtags"
- FLAGS_cluster_mode=emulated FLAGS_lock_on_hashtags=true ctest -V -L DFLY
+ FLAGS_cluster_mode=emulated FLAGS_lock_on_hashtags=true timeout 20m ctest -V -L DFLY
- ./dragonfly_test
- ./multi_test --multi_exec_mode=1
- ./multi_test --multi_exec_mode=3
- ./json_family_test --jsonpathv2=false
+ timeout 5m ./dragonfly_test
+ timeout 5m ./multi_test --multi_exec_mode=1
+ timeout 5m ./multi_test --multi_exec_mode=3
+ timeout 5m ./json_family_test --jsonpathv2=false
  - name: Upload unit logs on failure
  if: failure()
6 changes: 5 additions & 1 deletion tests/dragonfly/conftest.py
@@ -30,6 +30,7 @@
  DATABASE_INDEX = 0
  BASE_LOG_DIR = "/tmp/dragonfly_logs/"
  FAILED_PATH = "/tmp/failed/"
+ LAST_LOGS = "/tmp/last_test_log_dir.txt"


  # runs on pytest start
@@ -56,7 +57,7 @@ def pytest_runtest_setup(item):
  item.log_dir = test_dir

  # needs for action.yml to get logs if timedout is happen for test
- last_logs = open("/tmp/last_test_log_dir.txt", "w")
+ last_logs = open(LAST_LOGS, "w")
  last_logs.write(test_dir)
  last_logs.close()

@@ -378,6 +379,9 @@ def copy_failed_logs(log_dir, report):
  logging.error(f"🪵🪵🪵🪵🪵🪵 {file} 🪵🪵🪵🪵🪵🪵")
  shutil.copy(file, test_failed_path)

+ # Clean up
+ os.remove(LAST_LOGS)
+

  # tests results we get on the "call" state
  # but we can not copy logs until "teardown" state because the server isn't stoped
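
The pointer-file lifecycle touched by the conftest.py hunks (written before each test, removed once the failed logs have been copied) could look roughly like the sketch below. The helper names `record_last_log_dir` and `clear_last_log_dir` are hypothetical and do not appear in the repository. Unlike the `os.remove(LAST_LOGS)` call in the diff, this version tolerates a pointer file that is already gone, which is an assumption about desired behavior and not something the commit does.

```python
import os

LAST_LOGS = "/tmp/last_test_log_dir.txt"  # same path as in the diff


def record_last_log_dir(test_dir, pointer_file=LAST_LOGS):
    """Remember which log directory belongs to the test that is about to run."""
    # A context manager closes the file even if the write fails.
    with open(pointer_file, "w") as f:
        f.write(test_dir)


def clear_last_log_dir(pointer_file=LAST_LOGS):
    """Drop the pointer once the logs were copied by the normal failure path."""
    try:
        os.remove(pointer_file)
    except FileNotFoundError:
        # Assumption: a missing pointer file is not an error.
        pass
```

If the test run is killed by a timeout, `clear_last_log_dir` never runs, so the pointer file still names the directory of the test that was in flight.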