I’m James, Head of Engineering at Elicit (I’ve also led/advised ML & Engineering teams at Outschool, Spring, and Square). Posting here because I’m excited to be hiring Elicit’s first Data Engineer.
Elicit is an AI research assistant built for professional researchers and high-stakes decision makers. Instead of just producing LLM slop in a chat bubble, Elicit helps users break down hard questions, gather evidence from scientific/academic sources, and reason through uncertainty.
The initial focus for our first data engineer will be to own and improve the ETLs we use to process a couple hundred million research papers. We evaluated Dask for this purpose but concluded it wasn’t a great fit; I’d love to be convinced that this was a mistake. We’re looking for someone who can help define an architecture that works right now, is easy to operate, and is easy to build on top of.
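For context, the kind of workload we mean is a partitioned map/filter pipeline over paper metadata. Here's a minimal sketch of that shape using `dask.bag` — the field names and cleaning step are purely illustrative, not our actual schema or pipeline:

```python
# Illustrative sketch only: a tiny paper-processing ETL in dask.bag.
# Field names ("id", "abstract", etc.) are hypothetical, not Elicit's schema.
import dask.bag as db

papers = [
    {"id": 1, "title": "A", "abstract": "Transformer Models for Retrieval "},
    {"id": 2, "title": "B", "abstract": ""},  # missing abstract, dropped below
]

def clean(paper):
    # Normalize the abstract field; real pipelines do far more than this.
    return {**paper, "abstract": paper["abstract"].strip().lower()}

result = (
    db.from_sequence(papers, npartitions=2)  # in production: partitions read from storage
    .map(clean)
    .filter(lambda p: p["abstract"])  # keep only papers with a usable abstract
    .compute()
)
print(result)
```

At real scale the question is less about this API and more about operability: how partitions map onto storage, how failures and retries behave, and how easy it is to layer new transforms on top — which is exactly the architecture discussion we'd want this hire to lead.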
Moving on from the current ETLs, you’ll figure out how we can scale our data platform to ingest other structured and unstructured documents, spreadsheets and presentations, and rich media like audio and video. We also need your help with ML-adjacent tasks like preparing datasets for fine-tuning our models.
Big picture, your work will help accelerate data ingestion, enable secure enterprise integrations, extend Elicit to work over any kind of data corpus, and more.
We’re a small Series A company (16 people so far) building AI systems that push the boundaries of how LLMs can be useful. We need engineers who care about making AI more useful, trustworthy, and aligned with how people actually make tough decisions.
If you’re interested in applying your Dask expertise to help build AI systems that accelerate scientific progress, please check out the job description—I’m happy to answer any questions or discuss our stack!
(Apologies if I should have categorised / tagged this differently: the “How to use the Dask Discourse” meta post indicates that job postings are OK to share, but doesn’t say if there’s a particular place to put them).