feat(catalog): Add RunPod data fetcher #5930
Open
Adds a new data fetcher script to automatically generate RunPod instance catalog data. The script queries the RunPod API to fetch GPU types, pricing, and availability information.
Verified all previously existing Accelerator Names are in the newly generated CSV.
Additionally, includes support for previously missing GPUs:
Includes all available GPU quantities (the previous catalog did not include every permutation).
The script requires a RunPod API key with read access and generates a CSV file compatible with SkyPilot's catalog system.
For each GPU, a CSV catalog entry is created for every region (from a hardcoded list of regions) and for every available quantity. For example, the API returns availableGpuCounts like [1, 2, 3, 4] to indicate that you can request an instance with up to 4 GPUs. Thus, the GPU is listed in the catalog 100 times: 4 GPU quantity options × 25 regions.
Verification
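As a quick sanity check on the row count, the per-GPU expansion described above can be sketched as follows. This is a minimal illustration, not the fetcher's actual code: the function name expand_catalog_rows, the placeholder region names, and the column names AcceleratorName, AcceleratorCount, and Region are assumptions made for the example.

```python
from itertools import product


def expand_catalog_rows(gpu_name, available_gpu_counts, regions):
    """Return one catalog row per (quantity, region) pair for a single GPU.

    Hypothetical helper: the real script may build its rows differently.
    """
    return [
        {
            "AcceleratorName": gpu_name,   # assumed column name
            "AcceleratorCount": count,     # assumed column name
            "Region": region,              # assumed column name
        }
        for count, region in product(available_gpu_counts, regions)
    ]


# Placeholder region names; the script uses its own hardcoded list of 25.
regions = [f"region-{i}" for i in range(25)]

# availableGpuCounts as in the example above.
rows = expand_catalog_rows("L40", [1, 2, 3, 4], regions)
print(len(rows))  # 4 quantities x 25 regions = 100
```

The point of the sketch is only that the number of rows per GPU is the product of the quantity options and the region count, so a GPU with fewer entries in availableGpuCounts yields proportionally fewer rows.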
For testing, I locally generated the catalog CSV:
Then I compared all accelerator names in the generated CSV to the ones in the existing v7 catalog CSV:
Thus, all originally available accelerators are still available (with unchanged names), plus three more (RTX2000-Ada, RTX5090, RTXPRO6000).
For a final gut check, I manually compared lines for some specific quantity-GPU-region tuples.
Besides the expected fluctuation of prices, there are some differences in the vCPU and MemoryGiB columns. It's unclear to me why they are different. To the best of my knowledge, the values coming from the API and used by this script are correct.
Example difference:
Previously, L40 was listed as having 64 vCPUs and 192 GiB RAM. I confirmed in the RunPod API and UI that the vCPUs and memory for an instance with 4x L40s match the newly generated values:
Similar differences can be seen in other GPUs, like the A100.
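For reviewers who want to repeat the accelerator-name check, a comparison along these lines should work. It is a sketch under assumptions: the column name AcceleratorName is assumed, the helper names are hypothetical, and the two inline CSV strings are toy data standing in for the existing and newly generated catalog files.

```python
import csv
import io


def accelerator_names(csv_text, column="AcceleratorName"):
    """Collect the distinct non-empty accelerator names from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row[column]}


def compare_names(old_csv_text, new_csv_text):
    """Return (names missing from the new CSV, names added by the new CSV)."""
    old = accelerator_names(old_csv_text)
    new = accelerator_names(new_csv_text)
    return sorted(old - new), sorted(new - old)


# Toy inputs for illustration only; not real catalog contents.
old_csv = "AcceleratorName,AcceleratorCount\nL40,4\nA100,1\n"
new_csv = "AcceleratorName,AcceleratorCount\nL40,4\nA100,1\nRTX5090,1\n"

missing, added = compare_names(old_csv, new_csv)
print("missing from new catalog:", missing)
print("added in new catalog:", added)
```

An empty "missing" list corresponds to the claim that every previously existing accelerator name is still present. To run it against real files, read each CSV with open(path).read() and pass the text in.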
Tested (run the relevant ones):
bash format.sh
/smoke-test (CI) or pytest tests/test_smoke.py (local)
/smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
/quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)