Abstract
Post-training quantization has become essential for deploying Large Language Models (LLMs) on resource-constrained hardware. This project explores the use of the Posit number system as a quantization target for LLMs, investigating its potential as an alternative to conventional formats such as INT8, FP8, and NVFP4. A range of open-source LLMs will be quantized and systematically compared on standardized evaluation benchmarks.
Tasks
- Survey existing quantization methods and the Posit number format; identify relevant open-source LLMs as quantization targets.
- Implement Posit-based quantization and apply it across selected LLMs.
- Evaluate and compare quantized models on standardized benchmarks against established quantization formats (INT8, FP8, NVFP4).
- Analyze results and derive insights on the viability of Posit quantization for efficient LLM inference.
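
For orientation, the implementation task could start from a small reference decoder like the one below. This is a minimal pure-Python sketch under stated assumptions, not the project's prescribed method: the function names `decode_posit`, `posit_values`, and `quantize_group`, the default posit(8, es=2) configuration, the per-group absmax scaling with `target=1.0`, and nearest-by-value rounding (instead of the round-to-nearest-even rule on bit patterns used in the Posit Standard) are all illustrative choices.

```python
import bisect


def decode_posit(bits: int, n: int = 8, es: int = 2) -> float:
    """Decode an n-bit posit bit pattern with `es` exponent bits to a float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR (not a real)
    negative = (bits >> (n - 1)) & 1
    if negative:
        bits = (-bits) & mask  # negative posits are stored in two's complement
    body = bits & ((1 << (n - 1)) - 1)  # the n-1 bits after the sign

    # Regime: a run of identical bits, ended by the opposite bit (or the end).
    pos = n - 2
    first = (body >> pos) & 1
    run = 0
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    pos -= 1  # skip the terminating regime bit, if present

    # Exponent: up to `es` bits; bits cut off by the word end count as zeros.
    exp = 0
    for _ in range(es):
        exp <<= 1
        if pos >= 0:
            exp |= (body >> pos) & 1
            pos -= 1

    # Fraction: whatever bits remain, with an implicit leading 1.
    frac_bits = max(pos + 1, 0)
    frac = body & ((1 << frac_bits) - 1)
    mantissa = 1.0 + frac / (1 << frac_bits)
    value = mantissa * 2.0 ** (k * (1 << es) + exp)
    return -value if negative else value


def posit_values(n: int = 8, es: int = 2) -> list:
    """Sorted list of all finite values representable as posit(n, es)."""
    nar = 1 << (n - 1)
    return sorted(decode_posit(b, n, es) for b in range(1 << n) if b != nar)


def quantize_group(xs, table, target: float = 1.0):
    """Fake-quantize one group of weights to the nearest posit value.

    The group is scaled so that its absolute maximum maps to `target`
    (an assumed choice), rounded to the nearest entry of `table`, and
    scaled back to the original range.
    """
    absmax = max(abs(x) for x in xs)
    if absmax == 0.0:
        return [0.0 for _ in xs]
    scale = absmax / target
    out = []
    for x in xs:
        y = x / scale
        i = bisect.bisect_left(table, y)
        candidates = table[max(i - 1, 0): i + 1]  # neighbours around y
        q = min(candidates, key=lambda v: abs(v - y))
        out.append(q * scale)
    return out
```

A call such as `quantize_group(weights, posit_values(8, 2))` returns dequantized ("fake-quantized") values, which is convenient for accuracy studies in software. Because posits have their highest precision near 1.0, the `target` used for scaling is a tunable design choice that would itself need to be evaluated.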
Requirements
- Background in deep learning and familiarity with transformer-based LLMs.
- Experience with Python and PyTorch is required; experience with HuggingFace Transformers is expected.
- Understanding of quantization fundamentals (PTQ, calibration, per-group scaling).
- Interest in low-level numeric formats and hardware-aware machine learning research.
How to apply
- Please send an email to yue.wu(at)fau.de
- Include a short motivation letter, your CV, and the transcript of your current degree program.
