About Me
I’m Tuan, a seasoned AI Software Engineer with a deep passion for MLOps/LLMOps. My skill set covers efficient, cost-saving prompting with OpenAI models, Gemini, and open-source models such as LLaMA and Mistral. I have strong skills in algorithms and data structures, in fine-tuning LLMs with LoRA and with RLHF using multi-objective metrics, and a robust understanding of multi-model serving, including concurrent model inference, sequence batching, and dynamic batching accelerated by Triton and TensorRT. I also accelerate model serving on GPUs and TPUs, and I have extensive experience in containerization, orchestration, CI/CD, and monitoring with Docker, Kubernetes, Terraform, Jenkins, Nginx, Grafana, and Prometheus.
https://github.com/ngtranminhtuan/LLMOPS
https://github.com/ngtranminhtuan/GPT
https://github.com/ngtranminhtuan/llm_serve
At the core of my expertise lies the ability to architect and implement MLOps (CI/CD) pipelines that continuously improve AI service quality, covering development, training, evaluation, scalable deployment, and monitoring. This lets me roll out SOTA model updates quickly, in one click. My proficiency in UI development with both Flutter and ReactJS allows me to build intuitive, responsive user interfaces that enhance the end-user experience. Below is my RAG document manager tool for querying documents (PDF, images via OCR, DOCX, PowerPoint, CSV, Excel…), built with Flutter (or ReactJS), LangChain, Ragas, and LLaMA 7B.
I write meticulous test cases with PyTest to keep code reliable and robust. I also specialize in developing high-concurrency APIs with FastAPI, which reflects my ability to handle complex, scalable systems designed for efficiency and speed. My approach to model optimization is methodical: I use quantization and pruning to reduce resource consumption while preserving model performance. As the YouTube link below shows, I can deploy 8 models on hardware with limited compute.
https://www.youtube.com/watch?v=j5Uauq2g7HU&t=46s