NVIDIA GTC Paris Workshop Teaches Developers to Enhance Open Source LLMs for Domain-Specialized and Truly Multilingual AI
NVIDIA's Deep Learning Institute (DLI) is addressing a critical gap in the field of artificial intelligence by offering a new hands-on workshop, "Adding New Knowledge to LLMs," at the NVIDIA GTC Paris conference. The workshop aims to equip developers with the skills needed to transform open-source large language models (LLMs) into highly capable, domain-specialized, and genuinely multilingual AI assets. This initiative comes at a time when multilingual AI is increasingly crucial for businesses operating globally, yet existing LLMs often fall short in accurately handling specialized knowledge, niche technical domains, and the linguistic and cultural diversity required in global operations.

The Importance of Multilingual Model Evaluation

Model evaluation is essential for ensuring that AI systems perform effectively and reliably. For LLMs, which depend heavily on natural language processing, multilingual evaluation is particularly vital. Many models, including popular ones like Llama 2, are trained predominantly on English data, with other languages sometimes accounting for less than 5% of the training corpus. This imbalance can lead to inaccurate and biased results, creating significant deployment issues.

Furthermore, the lack of a shared, homogeneous dataset for all 24 EU languages and their local variants makes it challenging to compare benchmark scores. Machine-translated benchmarks often introduce unnatural phrasing that skews test results. Discriminative tasks, such as multiple-choice questions, dominate existing benchmarks, leaving generative tasks such as summarization and open-ended QA underrepresented despite their importance in real-world applications. Surface-level metrics like BLEU and ROUGE can penalize valid word order variations, further complicating evaluation. True fluency involves multiple dimensions, including grammar, vocabulary, cultural competence, domain knowledge, discourse, bias, time relevance, dialectal variation, script handling, and long-form consistency, and current tests often fail to cover these aspects adequately.

Challenges in Multilingual Model Training and Evaluation

The workshop at GTC Paris tackles several major challenges in training and evaluating multilingual AI models:

- Fragmented Benchmarks: There is no single, comprehensive dataset covering all EU languages and their variants, making it difficult to standardize evaluation metrics.
- Translation Artifacts: Machine-translated benchmarks often introduce errors that can lead to misleading performance scores.
- Task Imbalance: Generative tasks, which are more common in practical applications, are underrepresented in current evaluations.
- Metric Pitfalls: Simplistic metrics can overlook valid linguistic variations and amplify biases; a short illustration of this pitfall follows the workshop task list below.
- Comprehensive Proficiency: Achieving true multilingual fluency requires addressing numerous dimensions simultaneously, something current evaluation methods rarely do.

Workshop Details: Adding New Knowledge to LLMs

"Adding New Knowledge to LLMs" is a full-day, instructor-led workshop designed to empower developers with the tools and skills needed to customize LLMs for specific domains and languages. The workshop consists of four key tasks:

- Systematic Evaluation and Dataset Creation: Participants will learn to create custom evaluation benchmarks using NVIDIA NeMo Evaluator. They will identify the limitations of an LLM in understanding specialized domain concepts and in its performance across various languages, defining metrics that capture the nuances of their specific use cases.
- Advanced Data Curation: Using NeMo Curator, attendees will implement data cleaning and preparation pipelines to assemble high-quality datasets tailored to their needs, including sourcing niche data and handling the complexities of multiple languages, scripts, and cultural contexts (a minimal cleaning sketch follows this list).
- Targeted Knowledge Injection: The workshop covers techniques for infusing an LLM with new knowledge and capabilities, enhancing its expertise and global reach (see the adapter-tuning sketch after this list).
- Model Optimization for Domain and Language: Attendees will apply advanced optimization techniques using NVIDIA NeMo Model Optimizer and NVIDIA TensorRT-LLM. These methods aim to reduce inference costs and improve operational efficiency while maintaining high performance on specialized tasks and preserving robust capabilities across targeted languages, including low-resource ones.
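To make the metric pitfall noted above concrete, the minimal Python sketch below (using NLTK's BLEU implementation, purely as an illustration and not part of the workshop material) scores two equally correct German sentences against a single reference. The sentence that copies the reference word order scores near 1.0, while an equally valid reordering is penalized simply because its n-grams no longer align.

```python
# Illustrative only: shows how a surface-level n-gram metric (BLEU)
# penalizes a perfectly valid German word-order variation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
reference = ["gestern habe ich das buch gelesen".split()]  # "Yesterday I read the book"

verbatim = "gestern habe ich das buch gelesen".split()   # same word order as the reference
reordered = "ich habe das buch gestern gelesen".split()  # equally correct, different order

print(sentence_bleu(reference, verbatim, smoothing_function=smooth))   # ~1.0
print(sentence_bleu(reference, reordered, smoothing_function=smooth))  # much lower, despite being fluent German
```

This kind of gap is one reason custom, use-case-specific metrics, as covered in the evaluation task above, matter for multilingual assessment.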
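For a sense of what the data curation task involves, here is a minimal, library-free Python sketch of the kind of cleaning steps such a pipeline typically chains together: Unicode normalization, length filtering, and exact deduplication. It is a simplified illustration and does not use the NeMo Curator API; production pipelines add language identification, quality filtering, fuzzy deduplication, and PII handling on top of steps like these.

```python
import hashlib
import unicodedata

def clean_corpus(docs, min_chars=200):
    """Toy cleaning pass: Unicode normalization, length filtering,
    and exact deduplication (illustrative, not NeMo Curator)."""
    seen = set()
    for doc in docs:
        text = unicodedata.normalize("NFC", doc).strip()
        if len(text) < min_chars:      # drop fragments and boilerplate
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:             # remove exact duplicates
            continue
        seen.add(digest)
        yield text

# cleaned = list(clean_corpus(raw_documents))  # raw_documents: any iterable of strings
```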
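The knowledge-injection task is described above in tool-agnostic terms; one widely used technique for adding domain or language data without retraining all of a model's weights is low-rank adaptation (LoRA). The sketch below uses Hugging Face Transformers and PEFT purely to illustrate that idea (the model name is a placeholder); the workshop itself teaches these techniques with NVIDIA NeMo tooling.

```python
# Illustration of LoRA-style knowledge injection with Hugging Face PEFT.
# Not the workshop's recipe; the model name below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # any open-weight causal LM you have access to
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach small low-rank adapter matrices to the attention projections;
# only these adapters are trained, while the base weights stay frozen.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# The adapted model can then be fine-tuned on curated domain- or
# language-specific text with any standard causal-LM training loop.
```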
Upon completing the workshop, participants will be able to develop, deploy, and operate AI systems that are not only tailored to their domain requirements but also genuinely multilingual, providing more accurate, relevant, and culturally resonant experiences to a diverse global audience.

Real-World Impact of Advancing Multilingual AI

NVIDIA collaborates with organizations worldwide to improve datasets and models for robust multilingual capabilities. For instance, partnerships with the Barcelona Supercomputing Centre have resulted in significant improvements in language-specific task accuracy. Another collaboration, the EuroLLM project, has developed the EuroLLM 9B Instruct model, which supports all 24 official EU languages and excels in tasks such as question answering, summarization, and translation. These efforts underscore the importance of multilingual AI in delivering more inclusive and effective AI solutions.

Industry Insights and Company Profiles

Industry insiders hail NVIDIA's workshop as a crucial step toward democratizing AI by enabling developers to adapt and optimize models for a wide range of languages and domains. This democratization is particularly important in regions like Europe, where linguistic diversity is high. Organizations such as the Barcelona Supercomputing Centre and the EuroLLM project, known for their pioneering work in AI and computational linguistics, are collaborating with NVIDIA to advance these goals. By participating in this workshop, developers can join these leading organizations in shaping the future of multilingual AI.

For those eager to dive deeper, there are additional GTC Paris sessions to consider:

- Sovereign AI in Practice: Building, Evaluating, and Scaling Multilingual LLMs [CWEP1103]: NVIDIA experts discuss enriching language models with new knowledge, expanding capabilities in specialized domains, and adapting to new languages and cultures.
- Building and Customizing AI Models for European Applications: From Foundation to Fine-Tuning [GP1046]: A panel discussion featuring insights from leading European model builders and practical applications from companies like ThinkDeep.

To secure a spot in the "Adding New Knowledge to LLMs" workshop and other sessions, reserve your seat at GTC Paris today.