An Overview of Hugging Face Model Formats
Below is a guide to loading and running inference with the various model formats found on Hugging Face, organized by file type, with concrete code examples for each:
- PyTorch
- Safetensors
- TensorFlow / Keras
- Flax / JAX
- ONNX
- TensorFlow.js
- GGUF (llama.cpp)
- Tokenizers
✅ 1. PyTorch (.bin, .pt, .pth)
Using Hugging Face Transformers:
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model_dir/")
model = AutoModel.from_pretrained("model_dir/")

inputs = tokenizer("Hello world", return_tensors="pt")  # "pt" -> PyTorch tensors
outputs = model(**inputs)
```
The `model_dir/` directory must contain `pytorch_model.bin`, `config.json`, and the tokenizer files.
✅ 2. Safetensors (.safetensors)
Equivalent to PyTorch weights but safer: the format contains no pickled code, so loading it cannot execute arbitrary code. Transformers detects it automatically; no special handling is needed.
```python
from transformers import AutoModel

model = AutoModel.from_pretrained("model_dir/")  # picks up model.safetensors automatically
```
Loading directly with PyTorch:
```python
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")
model = MyModel()  # MyModel: your own nn.Module matching the checkpoint
model.load_state_dict(state_dict)
```
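To make the "safer" point concrete: a safetensors file is just an 8-byte little-endian header length, a JSON header, and raw tensor bytes, so its header can be inspected with the standard library alone, with no unpickling involved. A minimal sketch (the tensor name `"w"` and its shape are invented for illustration):

```python
import json
import struct

def build_safetensors(header: dict, data: bytes) -> bytes:
    """Serialize a minimal safetensors blob: u64 header length + JSON header + raw bytes."""
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

def read_header(blob: bytes) -> dict:
    """Parse just the JSON header -- no pickle, no code execution."""
    (n,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + n])

# A fake one-tensor file: four float32 zeros under the made-up name "w".
header = {"w": {"dtype": "F32", "shape": [4], "data_offsets": [0, 16]}}
blob = build_safetensors(header, b"\x00" * 16)
print(read_header(blob)["w"]["shape"])  # [4]
```

This is why a malicious `.safetensors` file can at worst waste disk space, whereas a malicious `.bin` (pickle) file can run code at load time.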
✅ 3. TensorFlow / Keras (.h5, .pb, .ckpt)
Keras .h5:
```python
from tensorflow.keras.models import load_model

model = load_model("model.h5")
result = model.predict(input_data)  # input_data: a NumPy array shaped for the model
```
SavedModel (saved_model.pb):
```python
import tensorflow as tf

model = tf.saved_model.load("saved_model_directory/")
result = model(input_tensor)  # input_tensor: a tf.Tensor matching the model's signature
```
✅ 4. Flax / JAX (.msgpack)
Use the Flax model interface from transformers:
```python
from transformers import FlaxAutoModel

model = FlaxAutoModel.from_pretrained("model_dir/")
```
✅ 5. ONNX (.onnx)
Inference with onnxruntime:
```python
import onnxruntime
import numpy as np

session = onnxruntime.InferenceSession("model.onnx")
# Input names and dtypes depend on how the model was exported; list them with
# [i.name for i in session.get_inputs()]. Token IDs are typically int64.
inputs = {"input_ids": np.array([[101, 102]], dtype=np.int64)}
outputs = session.run(None, inputs)  # None -> return all outputs
```
✅ 6. TensorFlow.js (model.json + .bin)
Runs in the browser:
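The browser-side code is not shown above; a minimal sketch using the tfjs library might look like the following (the model URL is a placeholder for wherever you host `model.json` and its `.bin` shards, and the input shape is illustrative):

```javascript
import * as tf from '@tensorflow/tfjs';

// Placeholder URL -- point this at your own hosted model.json.
const model = await tf.loadGraphModel('https://example.com/model/model.json');

// Run inference on a dummy input (shape depends on your model).
const input = tf.tensor([[1, 2, 3, 4]]);
const output = model.predict(input);
output.print();
```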
✅ 7. GGUF (.gguf)
Used for large language models with the llama.cpp project.
CLI inference:
```bash
./main -m model.gguf -p "Hello, how are you?"
```
Or with llama-cpp-python:
```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")
output = llm("Hello world", max_tokens=50)
```
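GGUF files begin with a small self-describing preamble (magic bytes, version, tensor count, metadata key-value count), so a download can be sanity-checked with the standard library before handing it to llama.cpp. A sketch, with the field layout taken from the GGUF format and the concrete values invented for the example:

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(blob: bytes) -> dict:
    """Parse the fixed-size GGUF preamble: magic, version, tensor count, metadata KV count."""
    if blob[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", blob, 4)
    return {"version": version, "tensors": tensor_count, "metadata_kvs": kv_count}

# A fake header for illustration: version 3, 2 tensors, 5 metadata entries.
fake = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(fake))  # {'version': 3, 'tensors': 2, 'metadata_kvs': 5}
```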
✅ 8. Tokenizers
Hugging Face's transformers and tokenizers libraries both load tokenizer files automatically:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model_dir/")
inputs = tokenizer("some text", return_tensors="pt")
```
Supported file formats include `tokenizer.json`, `vocab.txt`, `merges.txt`, `sentencepiece.model`, and others.
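To illustrate what a `merges.txt` file encodes, here is a toy BPE merge step in pure Python; the merge rules below are invented for the example and are not from any real tokenizer:

```python
def apply_merges(tokens, merges):
    """Greedily apply BPE merge rules in priority order.

    tokens: a list of symbols, e.g. the characters of a word.
    merges: an ordered list of (left, right) pairs, as listed line-by-line in merges.txt.
    """
    for left, right in merges:
        merged = []
        i = 0
        while i < len(tokens):
            # Merge adjacent (left, right) occurrences into one symbol.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

# Invented rules: merge "l"+"l" first, then "ll"+"o".
print(apply_merges(list("hello"), [("l", "l"), ("ll", "o")]))  # ['h', 'e', 'llo']
```

Real tokenizers apply the highest-priority applicable merge repeatedly rather than a single pass per rule, but the pair-merging idea is the same.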
🧠 Summary
| File type | Loading method |
|---|---|
| `.bin` | `transformers.AutoModel.from_pretrained()` |
| `.safetensors` | Same as above (auto-detected), or load manually with `safetensors` |
| `.h5` | `tf.keras.models.load_model()` |
| `.pb` | `tf.saved_model.load()` |
| `.onnx` | `onnxruntime.InferenceSession()` |
| `model.json` | `tf.loadGraphModel()` in the browser |
| `.gguf` | llama.cpp or llama-cpp-python |
| Tokenizer files | `AutoTokenizer.from_pretrained()` |