Commit 7403338 by Simon van Dyk — parent: f98bde0 — "Add: cost calc explanation"
README.md
CHANGED
```diff
@@ -56,6 +56,10 @@ Its lightweight architecture enables fast, large-scale extraction of forecasts,
 <img src="https://huggingface.co/NOSIBLE/prediction-v1.1-base/resolve/main/plots/results.png"/>
 <p>
 
+Cost per 1M tokens for the LLMs was calculated as a weighted average of input and output token costs using a 10:1 ratio (10× input cost + 1× output cost, divided by 11), based on pricing from OpenRouter. This ratio reflects the input-to-output token split of the prompt we used to label our dataset.
+
+For the NOSIBLE model, we conservatively used the cost of Qwen-8B on OpenRouter with a 100:1 ratio, since the model produces a single output token when used as described in this guide. Even so, our model is still the cheapest option.
+
 ## Class token mapping.
 
 Because this is a classification model built on [**Qwen3-0.6B**](https://huggingface.co/Qwen/Qwen3-0.6B), we mapped the `prediction` and `not-prediction` classes onto tokens. This is the mapping we chose.
```
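The blended-cost formula from the added paragraphs can be sketched as a small helper. The prices below are hypothetical placeholders, not actual OpenRouter rates; only the weighting scheme (N× input cost + 1× output cost, divided by N+1) comes from the text.

```python
def blended_cost_per_1m(input_cost: float, output_cost: float, ratio: int = 10) -> float:
    """Weighted-average cost per 1M tokens, assuming `ratio` input tokens
    per output token. With ratio=10: (10 * input + 1 * output) / 11."""
    return (ratio * input_cost + output_cost) / (ratio + 1)


# Hypothetical prices in USD per 1M tokens (placeholders, not real quotes):
llm_cost = blended_cost_per_1m(0.50, 1.50)              # 10:1 labeling-prompt ratio
nosible_cost = blended_cost_per_1m(0.10, 0.30, ratio=100)  # 100:1, single output token
print(f"LLM: ${llm_cost:.4f}, NOSIBLE: ${nosible_cost:.4f}")
```

With a 100:1 ratio the output price contributes almost nothing, which is why a single-output-token classifier is priced nearly at its input rate.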