<p>Paperpile runs on data at scale, with a literature database of 250M+ academic papers and a growing body of user data accumulated over more than a decade. You'll work across the systems that ingest, process, store, and serve this data reliably: building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs.</p><h2>Requirements</h2><ul><li>Strong backend engineering background with experience building and operating data-heavy systems in production.</li><li>Experience deploying and operating services on AWS.</li><li>Experience designing and maintaining data ingestion pipelines that handle messy, heterogeneous sources. Comfortable with web scraping and working with third-party data sources and APIs.</li><li>Familiarity with Node.js and TypeScript. It's fine if you come from a different background, such as Java or Python, but you should be comfortable working in this environment.</li><li>High standards for data quality. You think carefully about correctness, deduplication, and consistency.</li><li>Solid understanding of full-text search systems, including indexing strategy, relevance tuning, and query optimization.</li><li>Proficient in building reliable REST APIs.</li></ul><h2>More useful experience</h2><ul><li>Familiarity with academic publishing formats and data sources (PubMed, Crossref, arXiv…).</li><li>Experience with PDF processing pipelines (extraction, transformation, storage, and delivery at scale).</li><li>Experience with LLM-based document processing or ML pipelines for extracting structured data from unstructured text.</li><li>Large-scale web crawling and scraping.</li></ul><p><strong>Compensation</strong></p><ul><li>Base compensation of €60,000–€90,000, depending on your level of experience.</li><li>Bonus/equity program.</li></ul>