Responsibilities
- Design and build a data architecture that transforms raw and processed omics data into harmonized, AI-consumable layers
- Build and optimize ETL/ELT pipelines that produce denormalized views, pre-computed aggregations, embedding-ready text representations, and feature stores optimized for AI consumption
- Implement data quality monitoring, automated profiling, and validation checks across harmonization layers
- Create versioned, reproducible data snapshots that support model training, evaluation, and audit requirements in a regulated environment
- Partner with teams to extend harmonization patterns as modalities expand beyond genomics and proteomics into spatial transcriptomics, Perturb-Seq, single-cell, and digital pathology
- Design and maintain a semantic layer over multi-omics databases that enables AI systems to query them
- Create schema documentation: table descriptions, column-level annotations, relationship mappings, business logic rules, and domain-specific constraints
- Develop gold-standard question/SQL pairs for major databases, working with computational biologists and Generative AI Engineers, for use as training data, few-shot examples, and evaluation benchmarks
- Build and maintain a data dictionary and ontology mapping layer that translates scientific terms (gene names, pathways, assay types) to physical data storage
- Build and manage vector embedding pipelines for scientific documents, study metadata, and structured data descriptions to power RAG retrieval
- Build integration pipelines that connect heterogeneous sources (omics databases, internal publications, ELNs, assay results, clinical annotations) into a unified, queryable layer
- Develop and enforce metadata standards so that new sources are AI-accessible from ingestion
- Design data products for multiple consumption patterns: direct SQL, ML training feeds, and semantic interfaces for LLM tools

Qualifications
- BS in Computer Science, Data Engineering, Bioinformatics, or a related field with 8 years of data engineering experience, or an MS with 5 years of data engineering experience
- Demonstrated expertise building data pipelines, ETL/ELT workflows, and data products that serve downstream AI/ML systems

Additional Skills/Preferences
- PhD in a data-related field
- Strong SQL and experience with complex relational schemas (hundreds of tables, multi-level joins, domain conventions)
- Experience with lakehouse platforms (Databricks, Snowflake, or equivalent)
- Experience with dbt, Spark, Airflow, or similar orchestration/transformation frameworks
- Proficiency in Python for data processing and pipeline development
- Cloud data platform experience (AWS preferred: Redshift, Athena, Glue, S3, etc.)
- Familiarity with vector databases, embedding pipelines, or semantic layer tooling
- Strong communication skills with both engineers and scientists
- Biomedical/scientific data experience (omics: RNA-seq, proteomics, GWAS; clinical data; LIMS)
- Experience in pharma/biotech/life sciences
- Familiarity with biomedical ontologies/controlled vocabularies (Gene Ontology, MeSH, ChEBI, HGNC)
- Experience building AI/ML-serving data products (feature stores, training datasets, evaluation benchmarks, semantic annotations for text-to-SQL)
- Knowledge of data governance in regulated industries (lineage, access controls, versioning, auditability)
- Experience with knowledge graph technologies (Neo4j, Amazon Neptune, RDF/SPARQL) or graph data modeling
- Deep Databricks ecosystem experience (Unity Catalog, Delta Lake, MLflow, Databricks SQL)
- Experience designing architectures that bridge Nextflow/R/Bioconductor workflows with lakehouse consumption patterns

Benefits (as stated)
- Company bonus (for eligible full-time equivalent employees)
- Comprehensive benefit program: eligibility for a company-sponsored 401(k), pension, vacation, medical/dental/vision/prescription coverage, flexible benefits (e.g., healthcare and/or dependent day care FSA), life insurance/death benefits, time off/leave of absence benefits, and well-being benefits (e.g., employee assistance program, fitness benefits, employee clubs/activities)

Advisor - Scientific Data Engineer
SCORPION THERAPEUTICS
Oaxaca de Juárez, Oaxaca de Juárez
Posted 7 days ago