Troubleshooting
General simkube-ci-action Debugging
If you are running simkube-ci-action and experiencing failures, review the common issues below. Inspect the
GitHub Action logs for both the launch-runner and run-simulation custom actions, and match any specific error
messages against the sections below. If you are experiencing issues on the runner, it can be helpful to
temporarily set keep-alive on the runner to prevent auto-termination.
Keep Runner Alive
Set keep-alive to true in the launch-runner action. This keeps the runner alive after the job completes, regardless
of whether the job succeeds or fails:
jobs:
  launch-runner:
    runs-on: ubuntu-latest
    steps:
      - name: Setup SimKube GitHub Action runner
        uses: acrlabs/simkube-ci-action/actions/launch-runner@main
        with:
          instance-type: m6a.large
          aws-region: us-west-2
          subnet-id: subnet-xxxx
          security-group-ids: sg-xxxx
          simkube-runner-pat: ${{ secrets.SIMKUBE_RUNNER_PAT }}
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          keep-alive: true # Turn on keep-alive by adding it here
Warning
When using keep-alive, be sure to manually delete your EC2 instance when you are done debugging, as it
will not self-terminate!
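Once debugging is finished, the kept-alive instance can be terminated from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that are allowed to terminate the instance. `terminate_runner` is a hypothetical helper name and is not part of simkube-ci-action:

```shell
# Hypothetical helper (not part of simkube-ci-action): terminate a kept-alive
# runner instance and wait until AWS reports it as terminated.
# Usage: terminate_runner <instance-id> <region>
terminate_runner() {
  local instance_id="$1" region="$2"

  if [ -z "$instance_id" ] || [ -z "$region" ]; then
    echo "usage: terminate_runner <instance-id> <region>" >&2
    return 2
  fi

  # Request termination, then block until the instance reaches "terminated".
  aws ec2 terminate-instances --instance-ids "$instance_id" --region "$region" \
    && aws ec2 wait instance-terminated --instance-ids "$instance_id" --region "$region"
}
```

For example, `terminate_runner <instance-id> us-west-2` would terminate the instance in the region used in the workflow above.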
To connect to your runner, first push an SSH key to it using ec2-instance-connect:
aws ec2-instance-connect send-ssh-public-key \
--instance-id <instance-id> \
--region <region> \
--instance-os-user ubuntu \
--ssh-public-key file://~/path/to/your/public/key.pub \
--profile <your-profile>
Note
send-ssh-public-key creates a temporary key that lasts for 60 seconds, so connect via SSH immediately after sending your key.
Connect to the instance via SSH:
ssh ubuntu@<instance-public-ip>
If you see “Permission Denied”, re-run the ec2-instance-connect command above and SSH into the instance immediately
afterward. Your key is stored only temporarily and expires after 60 seconds.
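Because the key expires so quickly, it can help to push the key and open the SSH session in a single step. The sketch below is one way to do that, under these assumptions: `connect_to_runner` is a hypothetical helper name, the instance has a public IP address, your default AWS CLI profile is used (add `--profile` if you need another one), and the private key sits next to the public key with the same name minus the `.pub` suffix:

```shell
# Hypothetical helper: push a temporary SSH key and connect before it expires.
# Usage: connect_to_runner <instance-id> <region> <path-to-public-key>
connect_to_runner() {
  local instance_id="$1" region="$2" pubkey="$3" public_ip

  # Look up the public IP address of the instance.
  public_ip="$(aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --region "$region" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text)" || return 1

  if [ -z "$public_ip" ] || [ "$public_ip" = "None" ]; then
    echo "no public IP found for $instance_id" >&2
    return 1
  fi

  # Push the key; it is only valid for 60 seconds.
  aws ec2-instance-connect send-ssh-public-key \
    --instance-id "$instance_id" \
    --region "$region" \
    --instance-os-user ubuntu \
    --ssh-public-key "file://$pubkey" >/dev/null || return 1

  # Connect right away. Assumption: the private key path is the public key
  # path without the .pub suffix.
  ssh -i "${pubkey%.pub}" "ubuntu@$public_ip"
}
```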
Launch Runner Issues
Issues in the launch-runner action typically cause the step to exit with a non-zero exit code. When this happens,
the runner does not launch in EC2 and the workflow does not proceed to the run-simulation step. Look for specific
error messages in the launch-runner GitHub Action logs, then use the sections below to address them.
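If you have saved the launch-runner log to a local file, a small script can point you at the relevant section. This is only a sketch: `triage_launch_log` is a hypothetical helper, and it matches only the error strings named on this page, so an empty result does not mean the log is clean:

```shell
# Hypothetical helper: scan a saved launch-runner log for the error strings
# covered on this page and print the matching section name(s).
# Usage: triage_launch_log <path-to-log-file>
triage_launch_log() {
  local log="$1" rc=1

  if grep -q 'InvalidAMIID' "$log"; then
    echo "Invalid AMI"; rc=0
  fi
  if grep -q 'AuthFailure' "$log"; then
    echo "AWS AuthFailure"; rc=0
  fi
  if grep -q -e 'UnauthorizedOperation' -e 'AccessDenied' "$log"; then
    echo "AWS UnauthorizedOperation & AccessDenied"; rc=0
  fi
  if grep -q -e 'Bad credentials' -e 'Failed to generate runner token' "$log"; then
    echo "GitHub PAT Issues"; rc=0
  fi

  # Non-zero when nothing matched, so callers can fall back to manual review.
  return "$rc"
}
```

One way to obtain such a file, assuming the GitHub CLI is installed, is `gh run view <run-id> --log > launch.log`.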
Invalid AMI
- InvalidAMIID.Malformed indicates a malformed AMI ID
- InvalidAMIID.NotFound indicates the AMI is invalid or is not accessible
Marketplace AMIs must be subscribed to via Marketplace to be visible. Confirm you’re using the correct region and AMI ID combination (AMI IDs are region-specific identifiers). To check AMI visibility, test your AMI and region combination:
aws ec2 describe-images --image-ids <ami-id> --region <region>
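The same check can be wrapped so that it yields a simple yes/no answer in a script. A minimal sketch, assuming the AWS CLI is configured with the same credentials the action uses; `ami_visible` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the AMI is visible in the given region.
# Usage: ami_visible <ami-id> <region>
ami_visible() {
  local ami_id="$1" region="$2" found

  # --query narrows the response to the image ID; the CLI prints "None"
  # when the result list is empty and exits non-zero on API errors.
  found="$(aws ec2 describe-images \
    --image-ids "$ami_id" \
    --region "$region" \
    --query 'Images[0].ImageId' \
    --output text 2>/dev/null)" || return 1

  [ "$found" = "$ami_id" ]
}
```

For example: `ami_visible <ami-id> <region> && echo "visible" || echo "not visible in this region"`.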
AWS AuthFailure
AuthFailure means the provided AWS access credentials were invalid. Check that the correct secrets are being passed to
aws-access-key-id and aws-secret-access-key. See Usage for required AWS secrets.
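To confirm that a key pair is accepted by AWS at all, you can test it locally before updating the repository secrets. The sketch below assumes you have exported the same values as the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables; `check_aws_credentials` is a hypothetical helper name:

```shell
# Hypothetical helper: ask AWS STS which identity the current credentials map to.
# Reads credentials from the environment (or the default AWS CLI configuration).
check_aws_credentials() {
  local arn

  if arn="$(aws sts get-caller-identity --query 'Arn' --output text 2>/dev/null)"; then
    echo "credentials OK: $arn"
  else
    echo "credentials rejected: check AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY" >&2
    return 1
  fi
}
```

A successful result only shows that the credentials are valid; it says nothing about whether they carry the permissions the action needs.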
AWS UnauthorizedOperation & AccessDenied
UnauthorizedOperation and AccessDenied errors indicate that the credentials are valid but the associated IAM identity
does not have sufficient permissions to perform the action. See Usage for the required permissions or our
example IAM policy.
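One way to probe permissions without creating any resources is an EC2 dry run. The sketch below assumes that launching the runner involves an `ec2:RunInstances` call (this page does not list the exact API calls, so treat that as an assumption) and checks only that one permission; `can_launch_instance` is a hypothetical helper name:

```shell
# Hypothetical helper: dry-run an EC2 launch to test permissions.
# Nothing is created; with --dry-run the error code carries the answer.
# Usage: can_launch_instance <ami-id> <instance-type> <subnet-id> <sg-id> <region>
can_launch_instance() {
  local ami_id="$1" instance_type="$2" subnet_id="$3" sg_id="$4" region="$5" output

  # A dry run always exits non-zero, so ignore the exit status and
  # inspect the message instead.
  output="$(aws ec2 run-instances --dry-run \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --region "$region" 2>&1)" || true

  case "$output" in
    *DryRunOperation*)       echo "allowed" ;;
    *UnauthorizedOperation*) echo "denied: missing permission for RunInstances" >&2; return 1 ;;
    *)                       echo "inconclusive: $output" >&2; return 2 ;;
  esac
}
```

Other permissions listed in Usage are not covered by this check, so an "allowed" result is necessary but not sufficient.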
GitHub PAT Issues
Bad credentials or Failed to generate runner token errors during the Setup SimKube GitHub Action runner step both
indicate issues with the GitHub PAT. Expired PATs, insufficient PAT scope, and invalid PATs will all produce these
errors. Ensure the repo calling the action has the PAT secret configured correctly as $SIMKUBE_RUNNER_PAT, and that
the PAT is unexpired and has sufficient scope. Details on how to set up the PAT can be found in the Usage
documentation. Additional documentation on GitHub PATs can be found
here.
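A quick way to tell an invalid or expired PAT apart from a scope problem is to call the GitHub REST API with it directly. This is a sketch, assuming `curl` is available; `check_github_pat` is a hypothetical helper name, and the token is passed as an argument purely for illustration:

```shell
# Hypothetical helper: ask the GitHub REST API whether a PAT is accepted.
# Usage: check_github_pat <token>
check_github_pat() {
  local token="$1" http_code

  # -w prints only the HTTP status code; the response body is discarded.
  http_code="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $token" \
    -H "Accept: application/vnd.github+json" \
    https://api.github.com/user)"

  case "$http_code" in
    200) echo "PAT accepted" ;;
    401) echo "PAT rejected: expired, revoked, or mistyped" >&2; return 1 ;;
    *)   echo "unexpected HTTP status: $http_code" >&2; return 2 ;;
  esac
}
```

A 200 response shows only that the token is valid. It does not prove the token has the scope needed to generate a runner token, so a Failed to generate runner token error with an accepted PAT points toward scope.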
Run Simulation Issues
In this case the runner launches and registers, but run-simulation fails with a non-zero exit code. A failure summary
appears in the run-simulation GitHub Action logs. It dumps cluster state and common debugging aids, each of which can
be expanded to see its output. The failure summary includes the following sections:
- Simulation Status
- Simulation YAML
- sk-ctrl logs
- sk-driver logs
- All nodes in the simkube namespace
- All pods in the simkube namespace
- Recent events (tail=20)
- Get all
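If you have kept the runner alive and connected over SSH, you can gather similar information by hand. The sketch below covers only the generic Kubernetes parts of the summary and assumes `kubectl` on the runner already points at the simulation cluster; `collect_sim_debug` is a hypothetical helper name, not the script the action itself runs:

```shell
# Hypothetical helper: print cluster state similar to parts of the failure summary.
# Usage: collect_sim_debug [namespace]   (namespace defaults to "simkube")
collect_sim_debug() {
  local ns="${1:-simkube}"

  echo "== Nodes =="
  kubectl get nodes -o wide

  echo "== Pods in $ns =="
  kubectl get pods -n "$ns" -o wide

  echo "== Recent events in $ns (last 20) =="
  kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp | tail -n 20

  echo "== All resources in $ns =="
  kubectl get all -n "$ns"
}
```

With the pod names from this listing, `kubectl logs -n simkube <pod-name>` retrieves the sk-ctrl and sk-driver logs.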
Incompatible Driver Versions
For simkube-ci-action, we recommend using the default driver by not specifying one in run-simulation. If you
specify an incompatible driver version, sk-driver will fail to start or fail mid-replay.
Hook File Errors
SimKube requires a hooks file to run the simulation. run-simulation uses a default
hooks configuration file.
If you are running custom hooks and see “Error: error reading hook path/to/your/hooks”, the hook file was not found. Check the path to your hooks file:
ls path/to/your/hooks
Trace File Errors
A simulation requires a trace artifact to replay.
Trace File Not Found
If the simulation fails with the message “ERROR: Trace file not found”, an incorrect file path was specified. For more information on traces, visit the traces section of the docs. Check the path to your trace and ensure it is a valid trace file:
ls path/to/your/trace
Could Not Parse Trace File
If the simulation fails with “Error: could not parse trace file”, the file at the specified path could not be read as a valid trace.
You can validate your trace file by using the skctl CLI utility:
skctl validate check path/to/your/trace
Any trace validation errors will be listed. If the list is empty your trace artifact is ready for replay.
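Both trace checks can be combined into a single pre-flight step that you run before triggering the workflow. A minimal sketch, assuming skctl is installed locally; `preflight_trace` is a hypothetical helper name:

```shell
# Hypothetical helper: fail fast if the trace is missing, then validate it.
# Usage: preflight_trace <path-to-trace>
preflight_trace() {
  local trace="$1"

  if [ ! -f "$trace" ]; then
    echo "trace file not found: $trace" >&2
    return 1
  fi

  # Hand the content checks to skctl and pass its exit status through.
  # Read the listed validation errors rather than relying on the status alone.
  skctl validate check "$trace"
}
```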
Still stuck? How can I get support?
- Open an issue in the SimKube GitHub repo
- Message us in the SimKube Slack Channel
- Contact us directly