Job Description
A client in Austin, Texas is seeking an Applied AI Architect with deep experience bridging LLM/SLM model research and enterprise productization. You will lead the technical direction, from model architecture selection, fine-tuning, and optimization to deployment and observability, shaping the next generation of agentic AI for cybersecurity. This role demands both foundational knowledge and production practicality: designing and validating novel approaches, then translating them into reliable, scalable solutions deployed in the client's platform.
What You Will Be Doing:
- Drive research-to-production of LLM/SLM systems — from design and fine-tuning to evaluation, deployment, and continual adaptation in enterprise agent workflows.
- Lead technical choices — determine when to apply context engineering, prompt tuning, continued pretraining, supervised fine-tuning, reasoning fine-tuning, LoRA, or RL.
- Architect high-performance inference and serving using vLLM, NVIDIA NIM, Triton, CUDA, or other optimized frameworks.
- Integrate reinforcement learning frameworks (veRL, SkyRL, PyTorch, Ray RLlib) to enhance reasoning, adaptability, and agent feedback loops.
- Develop and operationalize AI Ops pipelines — build benchmarks and metrics for model evaluation, observability, drift detection, and lifecycle automation.
- Advance agent interoperability using A2A (Agent-to-Agent) or MCP (Model Context Protocol) for large-scale coordination.
- Collaborate with cybersecurity researchers to embed threat reasoning, anomaly detection, and defensive logic directly into model behavior.
- Publish, document, and codify reusable AI blueprints for hybrid (cloud + on-prem) deployments and future research acceleration.
- Proven end-to-end experience bringing LLM/SLM research into production — from fine-tuning and inference optimization to evaluation and AI Ops integration. Excellent knowledge of at least one of the following:
- Deep understanding of data-model-infrastructure trade-offs and optimization under real business constraints.
- Hands-on experience fine-tuning LLMs using frameworks such as LLaMA Factory, NeMo, and PEFT (e.g., LoRA).
- Strong knowledge of GPU-accelerated inference (e.g., vLLM, NIM, Triton, CUDA, NCCL, PyTorch/XLA).
- Familiarity with AI Ops toolchains (e.g., Weights & Biases, MLflow, Ray Serve).
- Proficiency in Python and experience building containerized AI microservices (e.g., Docker, Kubernetes, Ray).
- 8+ years of software engineering or research engineering experience, including the most recent 3 years focused on applied AI/ML and deploying LLM/SLM systems in production at enterprise scale.
- Proven experience as a senior technical lead or architect, driving end-to-end design, roadmap decisions, and productization of AI systems.
- Deep expertise in cloud-native architecture across AWS, Azure, or GCP.
- Experience mentoring senior engineers, reviewing technical designs, and establishing engineering best practices.
- Demonstrated success in building scalable infrastructure and launching LLM/SLM-based features and agent systems within enterprise platforms.
- Expertise in quantization, distillation, or GPU profiling to lower inference cost.
- Clear conceptual understanding of when to fine-tune, prompt-engineer, or use RLHF, with evidence of having applied each effectively.
- Familiarity with agentic frameworks (LangChain, AWS Strands, AutoGen, etc.).
- Deep understanding of A2A/MCP protocols for interoperable multi-agent systems.
- Research-driven yet delivery-focused — capable of balancing innovation with practical deployment.
- Data- and results-oriented — every hypothesis must be measurable.
- Ownership mentality — from exploration and experiment to evaluation, optimization, and monitoring.
- Passionate about turning AI research into defensible, intelligent, and proactive cybersecurity systems.

