
[Metrics] Install and configure Prometheus server on SkyPilot cluster (cloud as infra source) #5928

Draft: wants to merge 29 commits into base metrics-exporters


rohansonecha
Collaborator

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-checks on commit) or run bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

cblmemo and others added 28 commits June 5, 2025 12:53
* UI: add copy buttons to entrypoint/YAML.

* lint

* Revert "lint"

This reverts commit 9272e31.

* lint
* avoid showing cluster yaml for controllers

* format
…5905)

* refactor

* revert change

* pipeline update

* fix

* update name

* bug fix
* Nebius VM network tier

---------

Co-authored-by: Maknee <henry@assemblesys.com>
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
* Volume mounting for SSH Node Pools

* make cloud optional arg

* Comments

* minor docs
* avoid showing cluster yaml for controllers

* format

* use user yaml

* format

* format

* Minor UX changes

* Duration fixes.

* revert j2 template removal

---------

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
return empty list when no config is loaded
…owed_clouds` (#5729)

* implement

* format

* wip overhaul will break

* prelim

* done
* support setting private workspace and add users to it

* fixes for permission

* Fix private workspace

* format

* update code and fix some issues

* address comments

* Add private badge and avoid checks for no allowed users

* Add user hash to user table

* fix logging

* fix ut and remove useless code

* format

* Add users view in private workspace

* remove useless code

* format

* ignore dashboard modules in dockerignore

* Use user email

* format

* fix manifest

* fix user info overriding

* remove duplicate code

* Fix server user

* only get workspace a user has access to

* Add icon for user role

* Fix the workspace checking user setting

* adjust avatar size

* format

* filter jobs for private workspaces

* fix user fetching

* fix the user role in sidebar

* ui fixes

* Add todos

* minor fix

* only check specific cloud

* fix active job check

* Fix user name resolution and add docs

* fix message

* Add unit test for workspace user resolution

* add unit test

* type

* update code based on latest update

* fix interface

* use config override and avoid skipping workspace test

* revert smoke test

* Add unit test for user resources visibility

* format

* minor

* Add note for auth proxy

* fixes

* format

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
* add serverside check to update config

* Revert "add serverside check to update config"

This reverts commit 0da7f03.

* serverside check
* Yield worker process while waiting for retry

Signed-off-by: Aylei <rayingecho@gmail.com>

* Fix UT

Signed-off-by: Aylei <rayingecho@gmail.com>

* Fix UT

Signed-off-by: Aylei <rayingecho@gmail.com>

* Address review comments

Signed-off-by: Aylei <rayingecho@gmail.com>

* Update sky/server/requests/process.py

Co-authored-by: Christopher Cooper <cooperc@assemblesys.com>

---------

Signed-off-by: Aylei <rayingecho@gmail.com>
Co-authored-by: Christopher Cooper <cooperc@assemblesys.com>
rohansonecha self-assigned this Jun 9, 2025
rohansonecha force-pushed the cloud-head-prometheus-srv branch from b9b98b1 to 7d9f63c June 9, 2025 18:07