Job Description
Data Scientist - Gen AI
McKinney, TX (Onsite)
Long Term Contract
Job Details
- Develop and implement enterprise-level Gen AI models and tools for business problem-solving.
- Collaborate with customers to identify and address business challenges through data-driven solutions.
- Build evaluation frameworks to measure LLM efficacy and dataset quality, and to guide product development.
- Research emerging trends and technologies in Gen AI and related data science areas.
- Perform exploratory data analysis to identify key features. Prepare and present findings to corporate leadership.
- Develop use cases for prioritized business problems. Devise an approach and solution. Identify and collect required datasets and establish milestones and metrics for project success.
- Develop and maintain RAG pipelines, and optimize information retrieval for latency.
- Communicate insights to both technical and non-technical stakeholders.
- Conduct descriptive statistical analysis to reveal trends and patterns in customer data.
- Build, validate, and implement predictive models in collaboration with business owners.
- Design pilots, experiments, or surveys to derive insights for solving business problems.
- Interpret complex statistical results in easy-to-understand language.
- Convert statistical findings into actionable business plans.
- Present analytics or modeling results to both technical and non-technical stakeholders.
Qualifications:
- Minimum bachelor's degree in quantitative fields (mathematics, statistics, data science, physics, computer science, engineering, etc.); master's or Ph.D. preferred.
- Experience solving business problems in the construction and distribution industry.
- 5+ years of experience in data science and machine learning development (2+ years with a master's degree).
- 2+ years of experience in Natural Language Processing (NLP). Experience with Large Language Models (LLMs) would be a plus.
- Proficient with machine learning technology stacks, including their tools, frameworks, and notebook environments.
- Proven expertise in building machine/deep learning models using common frameworks such as PyTorch, TensorFlow, Keras, Scikit-learn, XGBoost, etc.
- Solid understanding of, and preferably working knowledge in, the Gen AI tech stack: LangChain, LlamaIndex, ReAct prompting, prompt engineering, vector databases, chunking, RAG pipelines, and optimization of information retrieval techniques.
- Proficient in Python and experienced with machine learning and NLP.
- Working knowledge of data lake tech stacks (such as Snowflake, BigQuery, Databricks) and vector databases.
- 5+ years of experience in data querying languages, scripting languages, or statistical/mathematical software.
- Extensive experience with statistical models and their application in data science.
- Ability to translate efficacy measurements of data science models into tangible business impact metrics.
- Proven knowledge of ML/AI platforms and workflows.
- Experience with data preprocessing techniques for big data containing text and tabular data, including feature engineering, dimensionality reduction, and normalization.
- Familiarity and hands-on experience with advanced ML models, including GPT-3/4, T5, Claude, and BERT.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: 10220884
- Position Id: 8343771
Job Tags
Contract work