--- license: cc datasets: - clarin-pl/poquad language: - pl base_model: - radlab/polish-qa-v2 pipeline_tag: question-answering library_name: transformers tags: - qa - poquad - quant - bitsandbytes --- ### Model Overview - **Model name**: `radlab/polish-qa-v2-bnb` - **Developer**: [radlab.dev](https://radlab.dev) - **Model type**: Extractive Question‑Answering (QA) - **Base model**: `radlab/polish-qa-v` (`sdadas/polish-roberta-large-v2` fine‑tuned for QA) - **Quantization**: 8‑bit inference‑only quantization via **bitsandbytes** (`load_in_8bit=True`, double‑quantization enabled, `qa_outputs` excluded from quantization) - **Maximum context size**: 512 tokens ### Intended Use This model is designed for **extractive QA** on Polish text. Given a question and a context passage, it returns the most relevant span of the context as the answer. This model is bnb-quantized version of `radlab/polish-qa-v2` model. ### Limitations - The model works best with contexts up to 512 tokens. Longer passages should be truncated or split. - 8‑bit quantization reduces memory usage and inference latency but may introduce a slight drop in accuracy compared with the full‑precision model. - Only suitable for inference; it cannot be further fine‑tuned while kept in 8‑bit mode. ### How to Use ```python from transformers import pipeline model_path = "radlab/polish-qa-v2-bnb" qa = pipeline( "question-answering", model=model_path, ) question = "Co będzie w budowanym obiekcie?" context = """Pozwolenie na budowę zostało wydane w marcu. Pierwsze prace przygotowawcze na terenie przy ul. Wojska Polskiego już się rozpoczęły. Działkę ogrodzono, pojawił się również monitoring, a także kontenery dla pracowników budowy. Na ten moment nie jest znana lista sklepów, które pojawią się w nowym pasażu handlowym.""" result = qa( question=question, context=context.replace("\n", " ") ) print(result) ``` **Sample output** ```json { "score": 0.32568359375, "start": 259, "end": 268, "answer": "sklepów," } ``` ### Technical Details - **Quantization strategy**: `BitsAndBytesStrategy` (8‑bit, double‑quant, `qa_outputs` excluded). - **Loading code (for reference)** ```python from transformers import AutoConfig, BitsAndBytesConfig, AutoModelForQuestionAnswering config = AutoConfig.from_pretrained(original_path) bnb_cfg = BitsAndBytesConfig( load_in_8bit=True, bnb_8bit_use_double_quant=True, bnb_8bit_excluded_modules=["qa_outputs"], ) model = AutoModelForQuestionAnswering.from_pretrained( original_path, config=config, quantization_config=bnb_cfg, device_map="auto", ) ```