Below is a guide to the various model formats found on Hugging Face and how to load and run inference with each one, organized by file type, with concrete code examples:
PyTorch
Safetensors
TensorFlow / Keras
Flax / JAX
ONNX
TensorFlow.js
GGUF (llama.cpp)
Tokenizer
✅ 1. PyTorch (.bin, .pt, .pth)
Using Hugging Face Transformers:
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("model_dir/")
model = AutoModel.from_pretrained("model_dir/")
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)
The model_dir/ directory needs to contain pytorch_model.bin, config.json, and the tokenizer files.
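For a standalone .pt / .pth checkpoint that does not follow the Transformers directory layout, a minimal plain-PyTorch sketch looks like the following (MyModel is a placeholder for your own model class, not something provided by a library):

import torch

model = MyModel()  # placeholder: your own nn.Module definition
state_dict = torch.load("model.pth", map_location="cpu")
# Note: some checkpoints store the whole pickled model object instead of a state_dict
model.load_state_dict(state_dict)
model.eval()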
✅ 2. Safetensors (.safetensors)
Equivalent to PyTorch weights, but safer (no pickle-based code execution on load). Transformers detects the format automatically, so no special handling is needed.
from transformers import AutoModel
model = AutoModel.from_pretrained("model_dir/")
Loading with pure PyTorch:
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")
model = MyModel()  # MyModel: your own nn.Module definition
model.load_state_dict(state_dict)
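Saving works the same way in reverse via save_file from safetensors.torch (note that safetensors may refuse to serialize tensors that share memory, e.g. tied weights, unless you clone them first):

from safetensors.torch import save_file

save_file(model.state_dict(), "model.safetensors")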
✅ 3. TensorFlow / Keras (.h5, .pb, .ckpt)
Keras .h5:
from tensorflow.keras.models import load_model
model = load_model("model.h5")
result = model.predict(input_data)
SavedModel (saved_model.pb):
import tensorflow as tf
model = tf.saved_model.load("saved_model_directory/")
result = model(input_tensor)
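If the directory is a Transformers-style TensorFlow checkpoint (tf_model.h5 + config.json), the high-level API also works, assuming TensorFlow is installed:

from transformers import TFAutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model_dir/")
model = TFAutoModel.from_pretrained("model_dir/")

inputs = tokenizer("Hello world", return_tensors="tf")
outputs = model(**inputs)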
✅ 4. Flax / JAX (.msgpack)
Use the Flax model interface in transformers:
from transformers import FlaxAutoModel
model = FlaxAutoModel.from_pretrained("model_dir/")
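Extending the snippet above into a minimal end-to-end sketch: Flax models consume NumPy arrays, so the tokenizer is called with return_tensors="np" (last_hidden_state assumes a base encoder model):

from transformers import AutoTokenizer, FlaxAutoModel

tokenizer = AutoTokenizer.from_pretrained("model_dir/")
model = FlaxAutoModel.from_pretrained("model_dir/")

inputs = tokenizer("Hello world", return_tensors="np")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)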
✅ 5. ONNX (.onnx)
Inference with onnxruntime:
import onnxruntime
import numpy as np
session = onnxruntime.InferenceSession("model.onnx")
inputs = {"input_ids": np.array([[101, 102]], dtype=np.int64)}  # names/dtypes must match the exported graph
outputs = session.run(None, inputs)
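The required input names depend on how the model was exported (many transformer exports also expect attention_mask). A sketch that builds the feed dict from a Hugging Face tokenizer and the graph's declared inputs, assuming the ONNX file was exported from the checkpoint in model_dir/:

import onnxruntime
from transformers import AutoTokenizer

session = onnxruntime.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("model_dir/")

print([inp.name for inp in session.get_inputs()])  # the input names the graph actually expects

enc = tokenizer("Hello world", return_tensors="np")
feed = {inp.name: enc[inp.name] for inp in session.get_inputs() if inp.name in enc}
outputs = session.run(None, feed)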
✅ 6. TensorFlow.js (model.json + .bin)
Runs in the browser (or Node.js) with the tfjs library: the model.json manifest plus its .bin weight shards are loaded with tf.loadLayersModel or tf.loadGraphModel, depending on how the model was converted.
✅ 7. GGUF (.gguf)
Used for large language models in the llama.cpp project:
CLI inference (newer llama.cpp builds name the binary llama-cli instead of main):
./main -m model.gguf -p "Hello, how are you?"
Or use llama-cpp-python:
from llama_cpp import Llama
llm = Llama(model_path="model.gguf")
output = llm("Hello world", max_tokens=50)
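The call returns an OpenAI-style completion dict. A slightly fuller sketch (the n_ctx value and the stop list are illustrative choices, not requirements):

from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=2048)  # n_ctx: context window size
output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["\n"])
print(output["choices"][0]["text"])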
✅ 8. Tokenizer
Both the Hugging Face transformers and tokenizers libraries can load them automatically:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("model_dir/")
inputs = tokenizer("some text", return_tensors="pt")
Supported files include: tokenizer.json, vocab.txt, merges.txt, sentencepiece.model, etc.
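With the standalone tokenizers library, a tokenizer.json file can also be loaded directly (the path shown is illustrative):

from tokenizers import Tokenizer

tok = Tokenizer.from_file("model_dir/tokenizer.json")
enc = tok.encode("Hello world")
print(enc.tokens, enc.ids)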