Text Classification
Transformers
PyTorch
TensorBoard
mpnet
Generated from Trainer
text-embeddings-inference
Instructions to use mtyrrell/CPU_Conditional_Classifier with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mtyrrell/CPU_Conditional_Classifier with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="mtyrrell/CPU_Conditional_Classifier")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("mtyrrell/CPU_Conditional_Classifier")
    model = AutoModelForSequenceClassification.from_pretrained("mtyrrell/CPU_Conditional_Classifier")
- Notebooks
- Google Colab
- Kaggle
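As a rough illustration of how the pipeline snippet above might be applied to extracted passages, the following sketch classifies a small batch and tallies the predicted labels. The helper names (`classify_passages`, `tally_labels`, `main`), the example passages, and the 0.5 score cut-off are assumptions made for illustration. The label strings actually returned depend on the model's configuration and may not literally be 'Conditional' and 'Unconditional'.

```python
from collections import Counter
from typing import Dict, List

MODEL_ID = "mtyrrell/CPU_Conditional_Classifier"


def classify_passages(passages: List[str]) -> List[Dict]:
    """Run the text-classification pipeline over a list of passages."""
    # Imported lazily so tally_labels can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-classification", model=MODEL_ID)
    # truncation=True guards against passages longer than the model's input limit.
    return pipe(passages, truncation=True)


def tally_labels(predictions: List[Dict], min_score: float = 0.5) -> Counter:
    """Count predicted labels, skipping predictions scored below min_score.

    The min_score cut-off is a hypothetical choice, not part of the model card.
    """
    return Counter(p["label"] for p in predictions if p["score"] >= min_score)


def main() -> None:
    # Hypothetical passages, not taken from the training or test data.
    passages = [
        "The country will reduce emissions by 20% by 2030 using domestic resources.",
        "A further 15% reduction is possible subject to international financial support.",
    ]
    predictions = classify_passages(passages)
    for text, pred in zip(passages, predictions):
        print(f"{pred['label']} ({pred['score']:.2f}): {text}")
    print(tally_labels(predictions))
```

Calling `main()` downloads the model on first use. `tally_labels` works on any list of `{"label": ..., "score": ...}` dictionaries, which is the shape the text-classification pipeline returns for a list of inputs.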
Update README.md
README.md
CHANGED
@@ -34,15 +34,22 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+The model is a binary text classifier built on 'sentence-transformers/all-mpnet-base-v2' and fine-tuned on text sourced from national climate policy documents.
 
 ## Intended uses & limitations
 
-
+The classifier assigns a class of 'Unconditional' or 'Conditional' to denote the strength of commitments as portrayed in passages extracted from the documents. It is intended for climate policy researchers and analysts seeking to automate the review of lengthy, non-standardized PDF documents in order to produce summaries and reports.
+
+Due to inconsistencies in the training data, the classifier's performance leaves room for improvement. The classifier exhibits reasonably good training metrics (F1 ~ 0.85), balanced between precise identification of true positive classifications (precision ~ 0.85) and a wide net to capture as many true positives as possible (recall ~ 0.85). When tested on real-world, unseen test data, the performance was suboptimal for a binary classifier (F1 ~ 0.5). However, testing was based on a small out-of-sample dataset containing its own inconsistencies. Therefore, classification may prove more robust in practice.
+
 
 ## Training and evaluation data
 
-
+The dataset comprises data from two sources:
+- [ClimateWatch NDC Sector data](https://www.climatewatchdata.org/data-explorer/historical-emissions?historical-emissions-data-sources=climate-watch&historical-emissions-gases=all-ghg&historical-emissions-regions=All%20Selected&historical-emissions-sectors=total-including-lucf%2Ctotal-including-lucf&page=1)
+- [IKI TraCS Climate Strategies for Transport Tracker](https://changing-transport.org/wp-content/uploads/20220722_Tracker_Database.xlsx) implemented by GIZ and funded by the International Climate Initiative (IKI) of the German Federal Ministry for Economic Affairs and Climate Action (BMWK).
+
+From the first source, we take
 
 ## Training procedure
 
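The precision, recall, and F1 figures quoted in the added "Intended uses & limitations" text are related by the standard harmonic-mean formula. The short sketch below only checks that the approximate training figures are mutually consistent. The function name `f1_from_precision_recall` is chosen for illustration, and the inputs are the rounded values from the text, not recomputed results.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Conventionally F1 is taken as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate training metrics quoted in the README text.
training_f1 = f1_from_precision_recall(0.85, 0.85)
print(f"{training_f1:.2f}")  # prints 0.85
```

When precision and recall are equal, F1 equals that shared value, which matches the reported training F1 of roughly 0.85.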