Instructions to use InternScience/Agents-K1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use InternScience/Agents-K1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="InternScience/Agents-K1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("InternScience/Agents-K1")
model = AutoModelForCausalLM.from_pretrained("InternScience/Agents-K1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use InternScience/Agents-K1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "InternScience/Agents-K1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "InternScience/Agents-K1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
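The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="InternScience/Agents-K1",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
#   print(chat("What is the capital of France?"))
```

For the SGLang server described further down, passing `base_url="http://localhost:30000"` should be the only change needed.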

Use Docker

docker model run hf.co/InternScience/Agents-K1

SGLang

How to use InternScience/Agents-K1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "InternScience/Agents-K1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "InternScience/Agents-K1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "InternScience/Agents-K1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "InternScience/Agents-K1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use InternScience/Agents-K1 with Docker Model Runner:
```
docker model run hf.co/InternScience/Agents-K1
```

Agents-K1

The knowledge-extraction model in Agents-K1 is a 4B-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 with GRPO (Group Relative Policy Optimization) on an information-extraction corpus drawn from IEPile. It targets Named Entity Recognition (NER) and Relation Extraction (RE) in English scientific and general-domain text.

The model produces structured JSON extractions with explicit step-by-step reasoning, enabling its use as a building block in downstream knowledge-graph construction, citation linking, and multi-hop QA pipelines.

Highlights

- +3.3 absolute F1 points (0.5317 → 0.5647) averaged over 10 NER/RE benchmarks vs. the Qwen3-4B-Instruct-2507 base model, with gains on every dataset evaluated (including held-out CrossNER domains).
- Trained with rule-based rewards (format + JSON validity + entity/relation F1); no human preference data is required.
- Outputs follow a strict <think>…</think><answer>…</answer> schema, making reasoning auditable and JSON parsing reliable.

Intended use

Designed as an extraction backbone for:

- Scientific-literature mining (entities/relations in biomedicine, chemistry, CS, etc.)
- Knowledge-graph construction
- Pre-processing for retrieval / multi-hop QA systems

Not intended for general-purpose chat — it has been specialized for structured extraction.

Usage

The model uses the same chat template as Qwen3-4B-Instruct and expects a schema-driven user prompt. The reply will contain a <think> block followed by an <answer> block with a JSON object.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InternScience/Agents-K1"
tok   = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

system = (
    "You are an expert in information extraction. Given a task instruction "
    "with schema definitions and input text, extract the required information.\n\n"
    "You should think step by step about the extraction task, then provide "
    "your answer in JSON format.\n\n"
    "Format your response as:\n"
    "<think>\nYour step-by-step reasoning...\n</think>\n"
    "<answer>\nYour JSON extraction result here\n</answer>"
)

user = (
    "You are an expert in named entity recognition. Please extract entities "
    "that match the schema definition from the input. Return an empty list if "
    "the entity type does not exist. Please respond in the format of a JSON "
    "dictionary.\n\n"
    'Entity types to extract: ["person", "organization", "location"]\n\n'
    "Input text: Marie Curie worked at the University of Paris.\n\n"
    "Please think step by step and respond in the following format:\n"
    "<think>\nYour reasoning process...\n</think>\n"
    "<answer>\nYour JSON extraction result\n</answer>"
)

messages = [{"role": "system", "content": system},
            {"role": "user",   "content": user}]
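# return_dict=True yields input_ids plus attention_mask, which generate() takes as keyword arguments
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))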
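Downstream code usually needs the JSON object rather than the raw reply. The following is a minimal parsing sketch, not part of the released code: `parse_reply` is a hypothetical helper, and the sample reply is hand-written to match the documented <think>/<answer> layout, assuming the NER answer maps each entity type to a list of mentions.

```python
import json
import re

THINK_RE = re.compile(r"<think>\s*(.*?)\s*</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def parse_reply(reply):
    """Return (reasoning, extraction); extraction is None without a valid JSON answer."""
    think = THINK_RE.search(reply)
    answer = ANSWER_RE.search(reply)
    reasoning = think.group(1) if think else None
    if answer is None:
        return reasoning, None
    try:
        return reasoning, json.loads(answer.group(1))
    except json.JSONDecodeError:
        return reasoning, None


# Hand-written sample in the documented format (illustrative, not real model output)
sample = (
    "<think>\nMarie Curie is a person; the University of Paris is an organization.\n</think>\n"
    '<answer>\n{"person": ["Marie Curie"], "organization": ["University of Paris"], '
    '"location": []}\n</answer>'
)
reasoning, extraction = parse_reply(sample)
print(extraction["person"])  # ['Marie Curie']
```

Returning `None` instead of raising keeps a batch job running when an individual reply is malformed; callers can count the `None` results to monitor format compliance.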

For RE, replace the "Entity types to extract: [...]" line in the user prompt with "Relation types to extract: [...]" and use a relation-extraction instruction. The output is then a JSON dict mapping each relation type to a list of {head, tail} pairs.
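As an illustration of the RE variant, the sketch below assembles a user prompt by mirroring the NER prompt shown earlier. The instruction wording, the helper name `build_re_prompt`, and the relation types are assumptions made for this example; the exact RE instruction used in training is not given here.

```python
import json


def build_re_prompt(text, relation_types):
    """Assemble an RE user prompt (instruction wording is an assumption)."""
    return (
        "You are an expert in relation extraction. Please extract relation "
        "triples that match the schema definition from the input. Return an "
        "empty list if the relation type does not exist. Please respond in the "
        "format of a JSON dictionary.\n\n"
        f"Relation types to extract: {json.dumps(relation_types)}\n\n"
        f"Input text: {text}\n\n"
        "Please think step by step and respond in the following format:\n"
        "<think>\nYour reasoning process...\n</think>\n"
        "<answer>\nYour JSON extraction result\n</answer>"
    )


prompt = build_re_prompt(
    "Marie Curie worked at the University of Paris.",
    ["work for", "located in"],  # hypothetical relation types
)

# Shape of the <answer> payload: relation type -> list of {head, tail} pairs.
# The values are illustrative, not actual model output.
expected_shape = {
    "work for": [{"head": "Marie Curie", "tail": "University of Paris"}],
    "located in": [],
}
```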

Training data

Training data comes from IEPile, restricted to:

English NER and RE tasks
22 source datasets, mixing scientific (SciERC, GENIA_NER, BC5CDR, BC2GM, BC4CHEMD, AnatEM, NCBI) and general-domain (CoNLL2003, conll04, FabNER, MultiNERD, NYT11, kbp37, …) corpora

Split	Size	Notes
Train	14,400	90/10 split, seed=42; each source capped to balance the mix
Validation	1,600

70% of samples have non-empty gold labels; 30% are empty-label cases (to prevent the model from defaulting to non-empty outputs).
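The split sizes follow directly from a 90/10 partition of 16,000 samples. A seeded split along these lines would reproduce them; this is a sketch under assumptions, since the actual preprocessing script, the per-source cap, and the sampling order are not given here. The toy pool and `split_train_val` are hypothetical.

```python
import random


def split_train_val(samples, val_fraction=0.1, seed=42):
    """Shuffle with a fixed seed, then carve off the validation fraction."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy pool of 16,000 samples in which 30% carry empty gold labels
pool = [{"id": i, "labels": [] if i % 10 < 3 else ["x"]} for i in range(16000)]
train, val = split_train_val(pool)
print(len(train), len(val))  # 14400 1600
```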

Training procedure

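Algorithm: GRPO, a PPO variant that drops the learned critic and instead estimates advantages by normalizing rewards within each group of sampled responses; implemented in veRL.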
Reward ∈ [0, 1]:
- format reward: 0.1 · 𝟙[has <think>] + 0.1 · 𝟙[has <answer>]
- JSON validity: 0.1 · 𝟙[valid JSON dict] (or 0.05 for non-dict valid JSON)
- task F1: 0.7 · F1(pred, gold) — entity-set F1 for NER, triple-set F1 for RE
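The components above sum to at most 0.1 + 0.1 + 0.1 + 0.7 = 1.0. One way to compute such a reward is sketched below. It is not the training code: the tag checks, the flattening of the JSON into (type, mention) or (type, head, tail) items, and the convention that an empty prediction against empty gold scores F1 = 1 are all assumptions.

```python
import json
import re

ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def to_items(parsed):
    """Flatten {type: [...]} into a set of hashable items.

    NER values are mention strings -> (type, mention);
    RE values are {"head", "tail"} dicts -> (type, head, tail).
    """
    items = set()
    for label, values in parsed.items():
        if not isinstance(values, list):
            continue
        for v in values:
            if isinstance(v, dict):
                items.add((label, str(v.get("head")), str(v.get("tail"))))
            else:
                items.add((label, str(v)))
    return items


def set_f1(pred, gold):
    if not pred and not gold:
        return 1.0  # assumption: correctly predicting "nothing" earns full credit
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def reward(reply, gold):
    """Score a reply against a gold dict using the weights listed above."""
    score = 0.0
    if re.search(r"<think>.*?</think>", reply, re.DOTALL):
        score += 0.1  # format: has <think>
    answer = ANSWER_RE.search(reply)
    if answer is None:
        return score
    score += 0.1  # format: has <answer>
    try:
        parsed = json.loads(answer.group(1))
    except json.JSONDecodeError:
        return score
    if not isinstance(parsed, dict):
        return score + 0.05  # valid JSON, but not a dict
    return score + 0.1 + 0.7 * set_f1(to_items(parsed), to_items(gold))


gold = {"person": ["Marie Curie"], "organization": ["University of Paris"], "location": []}
perfect = "<think>ok</think><answer>" + json.dumps(gold) + "</answer>"
partial = '<think>ok</think><answer>{"person": ["Marie Curie"]}</answer>'
print(round(reward(perfect, gold), 4), round(reward(partial, gold), 4))
```

In the partial case one of two gold entities is found with no false positives, so F1 = 2/3 and the reward is 0.3 + 0.7 · 2/3 ≈ 0.767.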

Evaluation

Reported numbers are micro-F1 on each benchmark's official test split, using the same prompt template as training. Gains are base → Agents-K1 (GRPO).

Dataset	Task	n	Base F1	Agents-K1 F1	Δ
CoNLL2003	NER	3,184	0.6547	0.7007	+0.046
NCBI-Disease	NER	937	0.6737	0.7340	+0.060
BC5CDR	NER	4,788	0.7126	0.7494	+0.037
CrossNER AI (held-out)	NER	430	0.4862	0.5400	+0.054
CrossNER Literature (held-out)	NER	416	0.5462	0.5736	+0.027
CrossNER Music (held-out)	NER	457	0.5791	0.6050	+0.026
CrossNER Politics (held-out)	NER	650	0.6611	0.6855	+0.024
CrossNER Science (held-out)	NER	532	0.5928	0.6132	+0.020
SciERC	NER	397	0.1166	0.1270	+0.010
conll04	RE	287	0.2933	0.3181	+0.025
Average			0.5317	0.5647	+0.033

All 10 benchmarks improve, including the 5 CrossNER domains that are not in the training mix, which is evidence of generalization rather than mere fitting to in-distribution sources.
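For readers reproducing the numbers, micro-F1 pools true positives, false positives, and false negatives over all test examples before computing precision and recall. The sketch below shows that computation on toy data. Exact-match comparison of (type, mention) pairs is an assumption, since the evaluation script's matching rules are not given here.

```python
def micro_f1(pred_sets, gold_sets):
    """Micro-averaged F1: pool TP/FP/FN across examples, then compute F1 once."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two toy examples: one hit, one spurious entity, two misses
preds = [
    {("person", "Marie Curie"), ("location", "Paris")},
    set(),
]
golds = [
    {("person", "Marie Curie"), ("organization", "University of Paris")},
    {("person", "Pierre Curie")},
]
score = micro_f1(preds, golds)  # precision 1/2, recall 1/3
```

Because counts are pooled, examples with many entities weigh more than sparse ones, which differs from averaging per-example F1.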

Limitations

Schema-driven prompting required. Free-form questions will likely return malformed JSON; always supply explicit entity / relation type lists.

License

Released under the Apache-2.0 license, following the upstream Qwen3-4B-Instruct-2507 license. Users must also comply with the licenses of the IEPile component datasets when using this model in derivative works.
