At Datalab, we train state-of-the-art language models that read documents with human-level accuracy and power the next generation of AI products, workflows, and research.
Our models (Chandra, Surya, and Marker) have become the backbone of document intelligence, with more than 50,000 GitHub stars and adoption across tier 1 AI research labs, Fortune 500 enterprises, and government agencies.
We've grown to 7-figure ARR with ~7x growth in 2025, driven by a lean, senior team that operates with high autonomy and deep technical ownership.
We're backed by founding members of OpenAI, FAIR, and Hugging Face. We move fast, ship often, and we're hiring builders who do the same.
View and apply to our open roles here!
Interested, but don’t see your role? Email us at [email protected]
We believe in the simplest possible technology that gets the job done. Our stack is FastAPI (Python) for the backend and frontend, with some light HTMX and JS sprinkled in. We use Postgres and Redis, and deploy to Render. For our API, training, and inference, we mostly use Python and PyTorch.
We prioritize clarity and maintainability over trends, and we ship code that thousands of developers and enterprises rely on every day.