A Simple Guide to Deploying Generative AI with NVIDIA NIM | NVIDIA Technical Blog
We see a majority of respondents reporting AI-related revenue increases within each business function using AI. And looking ahead, more than two-thirds expect their organizations to increase their AI investment over the next three years. AI high performers are expected to conduct much higher levels of reskilling than other companies are. Respondents at these organizations are over three times more likely than others to say their organizations will reskill more than 30 percent of their workforces over the next three years as a result of AI adoption. Compared with 2023, respondents are much more likely to be using gen AI at work and even more likely to be using gen AI both at work and in their personal lives (Exhibit 4).
LoRA stands for Low-Rank Adaptation, a technique that makes it easier to fine-tune Stable Diffusion on new concepts, such as characters or styles. In simpler terms, it lets you customize the look of your AI-generated art. LoRA models are small, and their reduced file size makes them practical for users who keep extensive collections.
We measure the violation rates of each model as evaluated by human graders on this evaluation set, with a lower number being desirable. Both the on-device and server models are robust when faced with adversarial prompts, achieving violation rates lower than open-source and commercial models. We compare our models with both open-source models (Phi-3, Gemma, Mistral, DBRX) and commercial models of comparable size (GPT-3.5-Turbo, GPT-4-Turbo)1.
- Imagine you have a big language model that knows a lot about language and can understand and generate sentences.
- In the last several years, there have been major breakthroughs in how we achieve better performance in language models, from scaling their size to reducing the amount of data required for certain tasks.
- Unstructured data holds valuable information about codebases, organizational best practices, and customer feedback.
- However, these models can sometimes miss the mark because they have not been customized or fine-tuned with additional data that carries detailed domain knowledge.
A style LoRA is usually trained on art by a specific artist, giving you access to that artist’s signature style in your own work. Style LoRAs can be used for anything from stylizing reference images to creating original artwork in the same style. LoRA does not increase inference latency: once fine-tuning is done, you can simply
update the pre-trained weights \(\Theta\) by adding their respective updates \(\Delta \Theta \approx \Delta \Phi\), where \(\Phi\) denotes the LoRA parameters. It also makes it simpler to deploy multiple task-specific models on top of one large model,
as \(|\Delta \Phi|\) is much smaller than \(|\Delta \Theta|\).
The underlying assumption is that the weight updates needed to adapt the model are not complex (they have a low intrinsic rank), so
we shouldn’t need a full-rank update in order to get good results.

The latest version of MLPerf Training includes a fine-tuning test, which applies LoRA to the Llama 2 70B model developed by Meta. By improving performance on the same GPUs, you can either train models with similar computational requirements in less time and at lower cost, or train more computationally intensive models in a similar time at a similar cost. Redirecting power from the L2 cache to the streaming multiprocessors resulted in a higher GPU operating frequency within the same power budget and improved end-to-end performance by 4%. For more information about the boost slider, including how to list all possible values, run nvidia-smi boost-slider --help. In this round of MLPerf Training, NVIDIA more than tripled its submission scale to 11,616 H100 GPUs and more than tripled performance, cutting the time to train to 3.4 minutes and delivering near-linear performance scaling.
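To make “near-linear scaling” concrete, one common way to quantify it is scaling efficiency: the speedup achieved divided by the factor by which the GPU count grew. The sketch below is illustrative only; the function name and the GPU counts and times in the usage line are hypothetical, not MLPerf results.

```python
def scaling_efficiency(gpus_before: int, minutes_before: float,
                       gpus_after: int, minutes_after: float) -> float:
    """Fraction of ideal (linear) speedup achieved when scaling out.

    1.0 means perfectly linear scaling: N times the GPUs gives N times the speed.
    """
    speedup = minutes_before / minutes_after
    scale_up = gpus_after / gpus_before
    return speedup / scale_up


# Hypothetical inputs, not MLPerf results: 3x the GPUs, time drops from 12 to 4.2 minutes.
eff = scaling_efficiency(gpus_before=1000, minutes_before=12.0,
                         gpus_after=3000, minutes_after=4.2)
print(f"scaling efficiency: {eff:.2f}")
```

Values close to 1.0 are what “near-linear” means in practice; communication overhead usually keeps real systems somewhat below it.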
What are the Applications of Generative AI?
The jury is still out on that question, with the betas having only dropped Monday, but the company has since revealed some of what makes its approach to generative AI different. Many of the most prominent companies in the space take a “bigger is better” approach to their models. The goal of these systems is to serve as a kind of one-stop shop to the world’s information. In addition, NVIDIA is making the capabilities of RTX Remix Toolkit accessible via a REST API, allowing modders to livelink RTX Remix to digital content creation tools such as Blender, modding tools such as Hammer and generative AI apps such as ComfyUI. NVIDIA is also providing an SDK for RTX Remix Runtime to allow modders to deploy RTX Remix’s renderer into other applications and games beyond DirectX 8 and 9 classics.
For details, please refer to Jacob Stern’s comprehensive guide to memory usage in PyTorch. Language models are already out there helping people; you see them show up with Smart Compose and Smart Reply in Gmail, for instance. To understand how LoRA works, let’s look at how the weights in a model are organized. Think of the model as a massive group of big spreadsheets, or matrices, filled with numbers. Fine-tuning is an integral part of model training because it allows a model to adapt to a particular type of application or project. Before seeing why it is necessary, we need to understand what fine-tuning is.
Fine-tuning the entire model can be computationally expensive, especially for huge models with millions or billions of parameters. LoRA reduces the computational cost by working with low-rank matrices, making fine-tuning more feasible in resource-constrained environments. To put it simply, a full-rank weight matrix has linearly independent rows and columns, so none of its structure is redundant and it cannot be written more compactly. Each parameter in the matrix contributes to the model’s ability to learn and represent complex patterns in data. This makes the matrix very expressive, but also potentially large and computationally intensive to train, as it may contain many parameters.
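To make “full rank” concrete: a random dense d × d matrix almost surely has rank d, while the product of a d × r and an r × d matrix has rank at most r, even though it has the same d × d shape. A minimal NumPy check with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4

# A dense random d x d matrix is (almost surely) full rank.
W_full = rng.standard_normal((d, d))

# The product of a d x r and an r x d matrix has rank at most r,
# yet it still has the full d x d shape.
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, d))
W_low = B @ A

print(np.linalg.matrix_rank(W_full))  # 64
print(np.linalg.matrix_rank(W_low))   # 4
print(W_full.size, B.size + A.size)   # 4096 512
```

The low-rank matrix is described by 512 numbers instead of 4,096, which is the compression LoRA exploits for weight updates.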
The workload is extremely demanding and is a good test of large-scale LLM training performance, which stresses the compute, networking, and software efficiency of an accelerated computing platform. The second new test focuses on graph neural network (GNN) training, based on an implementation of RGAT (relational graph attention network). GNNs are being applied to many domains, including drug discovery, fraud detection, and recommendation systems.
LoRA (Low-Rank Adaptation) is a technique for fine-tuning deep learning models that reduces the number of trainable parameters and enables efficient task switching. In this blog post we will talk about the key ideas behind LoRA with a very minimal PyTorch example. First introduced by Microsoft in the original LoRA paper, LoRA is a technique that makes language models more efficient to adapt to different tasks.
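The core idea fits in a few lines. The sketch below uses NumPy rather than PyTorch so it stays dependency-light; it is not the post’s original code. It assumes the initialization and alpha / r scaling described in the original LoRA paper: A starts random, B starts at zero, so training begins from exactly the pre-trained behavior. The class name LoRALinear and all sizes are hypothetical.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer W0 plus a trainable low-rank update B @ A.

    Forward pass: h = W0 x + (alpha / r) * B A x
    """

    def __init__(self, W0: np.ndarray, r: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = W0.shape
        rng = np.random.default_rng(seed)
        self.W0 = W0                                               # frozen pre-trained weights
        self.A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)    # trainable, random init (scale assumed)
        self.B = np.zeros((d_out, r))                              # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the low-rank path never forms a d_out x d_in matrix.
        return x @ self.W0.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_parameters(self) -> int:
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W0 = rng.standard_normal((512, 512))
layer = LoRALinear(W0, r=8)
x = rng.standard_normal((2, 512))

# Because B starts at zero, the adapted layer initially matches the base layer exactly.
print(np.allclose(layer(x), x @ W0.T))        # True
print(layer.trainable_parameters(), W0.size)  # 8192 262144
```

Only A and B would be handed to an optimizer; W0 stays as loaded.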
As of today, there are about 1,000 Dreambooth models registered in the Dreambooth Concepts Library, and probably many more not registered in the library. Before training, freeze the original LLM and set only the LoRA parameters to be trainable. The following figure shows the downstream tasks used for GPT-1, which include common NLP tasks such as classification, hypothesis testing, similarity comparison, and multiple-choice questions.
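Freezing means the optimizer never touches W0. The NumPy sketch below makes that explicit with hand-derived gradients for a toy least-squares task instead of autograd; the task, sizes, learning rate, and step count are all arbitrary assumptions. Only A and B are updated, and W0 is bit-for-bit unchanged at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out, r = 256, 16, 16, 2

W0 = rng.standard_normal((d_out, d_in))        # frozen "pre-trained" weights
W0_before = W0.copy()

# Hypothetical downstream task: targets come from W0 plus an unknown rank-r change.
delta_true = 0.3 * rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
X = rng.standard_normal((n, d_in))
Y = X @ (W0 + delta_true).T

# Only these two small matrices are trainable.
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
B = np.zeros((d_out, r))


def loss_and_grads(A, B):
    """Mean squared-error loss and its gradients w.r.t. A and B only (none for W0)."""
    Z = X @ A.T                      # (n, r): input projected into the low-rank space
    H = X @ W0.T + Z @ B.T           # (n, d_out): frozen path + low-rank path
    G = (H - Y) / n                  # dLoss/dH for loss = 0.5 * sum((H - Y)**2) / n
    loss = 0.5 * np.sum((H - Y) ** 2) / n
    dB = G.T @ Z                     # (d_out, r)
    dA = (G @ B).T @ X               # (r, d_in)
    return loss, dA, dB


lr = 0.02
first_loss, _, _ = loss_and_grads(A, B)
for _ in range(500):
    loss, dA, dB = loss_and_grads(A, B)
    A -= lr * dA
    B -= lr * dB

print(f"loss: {first_loss:.3f} -> {loss:.3f}")
print("W0 unchanged:", np.array_equal(W0, W0_before))
```

In a framework such as PyTorch the same effect comes from excluding W0 from the optimizer (or disabling its gradient), rather than from hand-written gradients.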
We used a combination of data parallelism, tensor parallelism, sequence parallelism, and Fully Sharded Data Parallel (FSDP) to scale training along multiple dimensions such as data, model, and sequence length. The concept behind LoRA is that, since an LLM is applicable to many different tasks, the model will have different neurons or features that handle different tasks. If we can find, among its many features, the ones suited to a downstream task and strengthen them, we can achieve better results on that task.
LLM tuning is a specialized process that takes a pre-trained language model and customizes it for specific tasks or domains. It leverages the general language understanding, acquired by the model during its initial training phase, and adapts it to more specialized requirements. The advantage of LLM tuning is that it does not require re-training the entire model, so, at least in principle, it should be much simpler and less computationally intensive than training a new LLM. Opening up to third-party models like OpenAI’s ChatGPT makes sense when considering the limited focus of Apple’s models.
Let’s say you want to use this language model for different tasks, like summarizing articles or answering questions. The problem is that the model is so big and has so many parameters that it becomes difficult and expensive to use for each task separately. First, it is important to understand how much GPU memory will be used during model training.
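A back-of-the-envelope estimate shows why memory is the first thing to check. The sketch assumes the common mixed-precision Adam accounting of 16 bytes per trainable parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments), 2 bytes per frozen fp16/bf16 parameter, and ignores activations and framework overhead. The function name, the 7B model size, and the ~20M adapter parameters are hypothetical.

```python
def training_memory_gb(total_params: float, trainable_params: float,
                       bytes_per_frozen: int = 2, bytes_per_trainable: int = 16) -> float:
    """Rough weights + gradients + optimizer-state memory in GB, ignoring activations.

    Assumes fp16/bf16 frozen weights (2 bytes/param) and mixed-precision Adam for
    trainable params: fp16 weight + fp16 grad + fp32 master copy + two fp32 moments
    = 2 + 2 + 4 + 4 + 4 = 16 bytes/param.
    """
    frozen = total_params - trainable_params
    total_bytes = frozen * bytes_per_frozen + trainable_params * bytes_per_trainable
    return total_bytes / 1e9


n = 7e9  # hypothetical 7B-parameter model
print(f"full fine-tuning: {training_memory_gb(n, trainable_params=n):.1f} GB")
print(f"LoRA (~20M adapter params, assumed): {training_memory_gb(n, trainable_params=20e6):.1f} GB")
```

Under these assumptions, most of the saving comes from not storing gradients and optimizer state for the frozen weights.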
With the Core Spotlight framework, developers can donate content they want to make searchable via Spotlight. Coming soon to Blackmagic Design’s DaVinci Resolve and Wondershare Filmora video editing software, RTX Video will enable video editors to upscale lower-quality video files to 4K resolution, as well as convert standard dynamic range source files into HDR. In addition, the free VLC media player will soon add RTX Video HDR to its existing super-resolution capability. Last year, NVIDIA made RTX Remix Runtime open source, allowing modders to expand game compatibility and advance rendering capabilities. Last year, NVIDIA introduced RTX acceleration using TensorRT for one of the most popular Stable Diffusion user interfaces, Automatic1111. Starting this week, RTX will also accelerate the highly popular ComfyUI, delivering up to a 60% improvement in performance over the currently shipping version, and 7x faster performance compared with the MacBook Pro M3 Max.
Instead of training a new model from scratch for a specific task, you adapt or fine-tune the pre-trained model by modifying its parameters to better suit the new task. The pre-trained model’s existing knowledge, represented in its learned parameters, serves as a valuable starting point. The fine-tuning process makes small changes to this knowledge to make it more relevant and accurate for its new purpose.
Bard is powered by a large language model, which is a type of machine learning model that has become known for its ability to generate natural-sounding language. That’s why you often hear it described interchangeably as “generative AI.” As with any new technology, it’s normal for people to have lots of questions — like what exactly generative AI even is. When fine-tuning a pre-trained model on a new task, there is a risk that the model might overfit the new data and lose some of the knowledge it gained during pre-training.
Being able to generate objects with custom designs gives you the freedom to experiment and explore different visuals until you find the perfect one for your project. Through continued optimization of the NVIDIA software stack, customers get more performance per GPU, which reduces the cost to train, and the ability to efficiently scale to larger numbers of GPUs to train even more demanding models. In the submission with 512 H100 GPUs, we improved end-to-end performance by redirecting power from the L2 cache memory on each H100 GPU to the streaming multiprocessor (SM), which houses, among other units, NVIDIA Hopper fourth-generation Tensor Cores. This was done by setting a ratio using a boost slider managed by the NVIDIA Management Library (NVML). MLPerf incorporates an LLM pretraining benchmark based on GPT-3 175B, a 175-billion-parameter LLM developed by OpenAI.
By using low-rank matrices to update specific parameters, the approach drastically cuts down the number of parameters that need to be trained. This reduction is crucial for practical applications, as fully retraining LLMs like GPT-3 is beyond the resource capabilities of most organizations. These lightweight, easy-to-use adaptors can be applied to pre-trained models, enhancing control and precision over desired concepts in a single inference pass with minimal entanglement. Concept Sliders also enable the editing of visual concepts not covered by textual descriptions, a feature distinguishing them from text-prompt-based editing methods. While image-based customization methods can effectively add tokens for image-based concepts, they are difficult to use for editing images. Concept Sliders, on the other hand, allow end users to provide a small number of paired images defining a desired concept.
Fine-tuning requires loading and working with these massive models, which puts a heavy burden on GPU memory. It also typically involves reading and processing a lot of data, which can become a bottleneck in the GPU’s processing pipeline: loading large datasets from storage into GPU memory can be slow, especially for very large models.
Why Apple is taking a small-model approach to generative AI
Open-source repositories like Civitai and Hugging Face host a plethora of LoRA models. We evaluate our models’ writing ability on our internal summarization and composition benchmarks, consisting of a variety of writing instructions. These results do not refer to our feature-specific adapter for summarization (seen in Figure 3), nor do we have an adapter focused on composition. As part of responsible development, we identified and evaluated specific risks inherent to summarization.
By incorporating guidance terms during inference, the method mitigates the limited compositionality of diffusion frameworks, and the sliders can be used to steer generation away from unsafe concepts. Instead of training the whole model again for each task, LoRA freezes the pre-trained model and adds smaller trainable matrices to each model layer. These matrices help the model adapt to different tasks without changing all of its parameters. In recent years, Large Language Models (LLMs), also known as foundation models, have been trained on large datasets with a massive number of parameters, such as GPT-3 (175B parameters). The emergence of ChatGPT also shows how well LLMs generalize, as they perform well on common problems.
The online survey was in the field from February 22 to March 5, 2024, and garnered responses from 1,363 participants representing the full range of regions, industries, company sizes, functional specialties, and tenures. Of those respondents, 981 said their organizations had adopted AI in at least one business function, and 878 said their organizations were regularly using gen AI in at least one function. To adjust for differences in response rates, the data are weighted by the contribution of each respondent’s nation to global GDP. Respondents most often report that their organizations required one to four months from the start of a project to put gen AI into production, though the time it takes varies by business function (Exhibit 10).
We’re asking for feedback on a proposed Acceptable Use Policy update to address the use of synthetic and manipulated media tools for non-consensual intimate imagery and disinformation while protecting valuable research. Microsoft offers the open-source LoRA (Low-Rank Adaptation of Large Language Models) project on GitHub, which can be a useful tool for fine-tuning LLMs. With LoRA, it is now possible to publish a single 3.29 MB file to allow others to use your fine-tuned model. The experiments only evaluated adding LoRA modules to the attention block, comparing which of its weight matrices (Q, K, V, or O) gave the best results while keeping the parameter count fixed.
We find that our models are preferred by human graders over most comparable competitor models. On this benchmark, our on-device model, with ~3B parameters, outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B. Our server model compares favorably to DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo while being highly efficient. Our focus is on delivering generative models that can enable users to communicate, work, express themselves, and get things done across their Apple products. When benchmarking our models, we focus on human evaluation as we find that these results are highly correlated to user experience in our products. We conducted performance evaluations on both feature-specific adapters and the foundation models.
Project G-Assist, a GeForce AI Assistant
AI assistants are set to transform gaming and in-app experiences — from offering gaming strategies and analyzing multiplayer replays to assisting with complex creative workflows. COMPUTEX—NVIDIA today announced new NVIDIA RTX™ technology to power AI assistants and digital humans running on new GeForce RTX™ AI laptops. AI high performers are much more likely than others to use AI in product and service development. Conversely, respondents are less likely than they were last year to say their organizations consider workforce and labor displacement to be relevant risks and are not increasing efforts to mitigate them. Respondents to the latest survey are more likely than they were last year to say their organizations consider inaccuracy and IP infringement to be relevant to their use of gen AI, and about half continue to view cybersecurity as a risk (Exhibit 7).
These matrices are much smaller, which translates into lower computational overhead when tweaking the model’s parameters. However, while LLMs have tremendous potential, they require huge computing resources to train, meaning that only a small group of technology giants and research groups are able to build their own LLMs. One of the prime use cases suggested by the MIT team is the ability to collate relevant information from these small, task-specific datasets. Tasks include useful robot actions like pounding in a nail and flipping things with a spatula. Apple’s bespoke approach to foundational models allows the system to be tailored specifically to the user experience.
The sliders then generalize this concept and automatically apply it to other images, aiming to enhance realism and fix distortions such as those in hands. Most current text-to-image diffusion models rely on direct text prompt modification to control image attributes. While this approach allows image generation, it is not optimal, as changing the prompt can drastically alter the image’s structure. Another approach used by these frameworks involves post-hoc techniques, which invert the diffusion process and modify cross-attentions to edit visual concepts. However, post-hoc techniques have limitations: they support only a limited number of simultaneous edits and require an individual inference pass for each new concept. Additionally, they can introduce conceptual entanglement if not engineered carefully.
The forward diffusion process gradually adds noise to the data, moving it from an organized state to pure Gaussian noise. The primary aim of a diffusion model is to reverse this process: it samples random Gaussian noise and gradually denoises it to generate an image. In practice, diffusion models are trained to predict the true noise that was added to an input, given the noisy input along with additional inputs such as conditioning and the timestep.
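The forward process has a closed form that jumps straight to any noise level, which is how training pairs for the noise-prediction objective are usually produced. A minimal NumPy sketch, assuming the linear beta schedule from the DDPM paper (1e-4 to 0.02 over 1,000 steps); the 8 × 8 array stands in for a normalized image, and q_sample is a hypothetical helper name.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear schedule (assumed, as in DDPM)
alpha_bars = np.cumprod(1.0 - betas)      # fraction of signal variance kept after t steps


def q_sample(x0: np.ndarray, t: int, rng: np.random.Generator):
    """Jump straight to step t: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps                       # eps is the "true noise" a denoiser learns to predict


rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(8, 8))      # stand-in for a normalized image
x_early, _ = q_sample(x0, 10, rng)        # still mostly signal
x_late, _ = q_sample(x0, T - 1, rng)      # almost pure Gaussian noise
print(alpha_bars[10], alpha_bars[T - 1])  # close to 1 early, close to 0 at the end
```

A denoiser is then trained to recover eps from (x_t, t, conditioning), which is the noise-prediction setup described above.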
Looking ahead to the next three years, respondents predict that the adoption of AI will reshape many roles in the workforce. Nearly four in ten respondents reporting AI adoption expect more than 20 percent of their companies’ workforces will be reskilled, whereas 8 percent of respondents say the size of their workforces will decrease by more than 20 percent. The researchers at Microsoft and Beihang have released an open-source implementation of MoRA, which is compatible with LoRA. This can turn out to be an important tool for enterprise applications that want to add new knowledge to base models. With just a few steps, LoRA models can supercharge your Automatic1111 workflow, opening a world of possibilities for your projects. To further evaluate our models, we use the Instruction-Following Eval (IFEval) benchmark to compare their instruction-following capabilities with models of comparable size.
For one, they’re embedded in systems with filters for biased information, inappropriate language, and other questionable content. Plus, they don’t need fine-tuning, a specialized skill set requiring dedicated people and teams. There are also many resources online for generating or training models using Colab or personal computers. Recently, the Stable Diffusion community has open-sourced many projects and provided GUI interfaces, allowing even non-programmers to train high-quality generative AI models. In stark contrast, Loora’s AI is built, trained, and optimized specifically for personalized English learning.
Additionally, we use an interactive model latency and power analysis tool, Talaria, to better guide the bit rate selection for each operation. We also utilize activation quantization and embedding quantization, and have developed an approach to enable efficient Key-Value (KV) cache update on our neural engines. As can be seen in the figure above, Concept Sliders facilitate precise editing of the desired attributes during image generation while maintaining the overall structure of the image.
Full-rank weight matrices are common in deep learning models, where their high dimensionality allows the model to capture a wide range of features and relationships in the data. However, they can also be computationally expensive to work with, both in terms of training and inference. So, instead of updating one big full-rank weight matrix directly, LoRA keeps it frozen and learns the update as the product of two much smaller, low-rank matrices.
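The savings from using two smaller matrices are easy to quantify. For a weight matrix of shape d_out × d_in, a full update trains d_out · d_in values, while LoRA trains d_out · r + r · d_in. A small helper (hypothetical name) applied to a 4096 × 4096 matrix, a size chosen only for illustration:

```python
def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full update of a d_out x d_in matrix vs. LoRA factors.

    LoRA factors are B (d_out x r) and A (r x d_in).
    """
    full = d_out * d_in
    lora = d_out * r + r * d_in
    return full, lora


for r in (1, 4, 8, 64):
    full, lora = lora_param_counts(4096, 4096, r)
    print(f"r={r:>2}: full={full:,}  lora={lora:,}  ratio={lora / full:.4%}")
```

The LoRA count grows linearly in r, so even generous ranks stay a small fraction of the full matrix.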
As we’ve discussed, one of the major advantages of LoRA is that you get excellent results by training orders of magnitude fewer weights than the original model contains. We designed an inference process that allows loading the additional weights on top of the unmodified Stable Diffusion model weights. We train these models on large volumes of text so they better understand which word is likely to come next. One way (but not the only way) to improve a language model is by giving it more “reading,” or training it on more data, kind of like how we learn from the materials we study.
Windows Copilot Runtime to Add GPU Acceleration for Local PC SLMs
Microsoft and NVIDIA are collaborating to help developers bring new generative AI capabilities to their Windows native and web apps. Interest in generative AI has also brightened the spotlight on a broader set of AI capabilities. For the past six years, AI adoption by respondents’ organizations has hovered at about 50 percent. By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks. Classifier-free guidance methods have been shown to enhance the quality of generated images and boost text-image alignment.
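Classifier-free guidance combines two noise predictions from the same denoiser, one conditioned on the prompt and one unconditioned. A minimal sketch, assuming the form used by common Stable Diffusion implementations, eps_uncond + s * (eps_cond - eps_uncond) with guidance scale s; the arrays are toy stand-ins for real model outputs.

```python
import numpy as np


def classifier_free_guidance(eps_uncond: np.ndarray, eps_cond: np.ndarray,
                             scale: float) -> np.ndarray:
    """Extrapolate from the unconditional toward the conditional noise prediction.

    scale = 0 -> unconditional, scale = 1 -> conditional, scale > 1 -> stronger prompt adherence.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)


# Toy noise predictions standing in for two denoiser passes (with and without the prompt).
u = np.array([0.0, 1.0])
c = np.array([1.0, 1.0])
guided = classifier_free_guidance(u, c, scale=7.5)
print(guided)  # [7.5 1. ]
```

Components where the two predictions agree are left alone; only the prompt-dependent difference is amplified.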
LoRA models are small add-ons that apply changes to standard Stable Diffusion checkpoint models, so their files are typically a few megabytes, much smaller than full checkpoint files. LoRA offers a good trade-off between file size and training power, making it an attractive solution for users who have an extensive collection of models. The first new MLPerf Training test measures how quickly Llama 2 70B can be fine-tuned using the popular low-rank adaptation (LoRA) technique. LLM fine-tuning enables enterprises to customize LLMs using their proprietary data to improve response quality for specific use cases. As an example, generative AI models like LLMs can be fine-tuned to create tailored personal assistants, improved language translation, and more. LoRA adapters are being generated by developers and the broader AI community to create custom experiences, and consumers can choose the ones that match their preferences.
Startups will receive up to $1 million each in AWS credits to help them build, train, test, and launch their generative AI solutions. They will also have access to industry experts, technology, and technical sessions from NVIDIA, the program’s presenting partner, and be invited to join the NVIDIA Inception program, designed to nurture cutting-edge startups. Researchers from Microsoft and Beihang University have introduced a new technique for fine-tuning large language models (LLMs) at a fraction of the cost it usually takes.
Here you’ll find a wealth of practical technical insights and expert advice to help you bring AI and visual intelligence into your products without flying blind. Enhancing contextualization and customization has always been a driving force in the realm of user experience. While generative artificial intelligence (AI) has already demonstrated its transformative potential, there remains ample opportunity for further advancements. This Earth Day, we discuss how tech and open source are helping two organizations combat the effects of a changing climate. GitHub Copilot increases efficiency for our engineers by allowing us to automate repetitive tasks, stay focused, and more. Here’s how SAST tools combine generative AI with code scanning to help you deliver features faster and keep vulnerabilities out of code.
I did not attempt to optimize the hyperparameters, so feel free to try it out yourself! Sayak did another run on a T4 (16 GB of RAM) and shared his final model along with a demo Space that uses it. Full model fine-tuning of Stable Diffusion used to be slow and difficult, and that’s part of the reason why lighter-weight methods such as Dreambooth or Textual Inversion have become so popular. In order for users to share their fine-tuned or dreamboothed models, they had to share a full copy of the final model. Other users who wanted to try them out had to download the fine-tuned weights into their favorite UI, adding up to massive combined storage and download costs.
The NVIDIA platform continues to deliver even more performance through invention across the entire stack, including new chips and systems. Generative AI models have a variety of uses, such as helping write computer code, crafting stories, composing music, generating images, producing videos, and more. And, as these models continue to grow in size and are trained on even more data, they are producing even higher-quality outputs. Beyond enabling fine-tuned language vision models (LVMs) for different artistic styles, the LoRA technique is broadly applicable to any AI model.
The framework creates these sliders using sample images or a set of prompts, thus establishing directions for both textual and visual concepts. As previously mentioned, current text-to-image diffusion frameworks often struggle to control visual concepts and attributes in generated images, leading to unsatisfactory results. Moreover, many of these models find it challenging to modulate continuous attributes, further contributing to unsatisfactory outputs. Concept Sliders may help mitigate these issues, empowering content creators and end-users with enhanced control over the image generation process and addressing challenges faced by current frameworks. LoRA reduces the computational resources required for fine-tuning large language models.
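Concept Sliders are built as low-rank adaptors, and the slider value scales the learned low-rank edit at inference time. The sketch below assumes the simplest form of that idea, W0 + strength · BA on a single linear layer with random stand-in factors; it is not the Concept Sliders training procedure, which learns the direction from prompts or paired images. The function name slider_forward is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 1
W0 = rng.standard_normal((d, d))            # frozen base weights
B = 0.1 * rng.standard_normal((d, r))       # hypothetical trained slider factors
A = 0.1 * rng.standard_normal((r, d))


def slider_forward(x: np.ndarray, strength: float) -> np.ndarray:
    """Base output plus the low-rank edit scaled by the slider value.

    strength = 0 leaves the base model untouched; negative values push the
    attribute in the opposite direction.
    """
    return x @ W0.T + strength * ((x @ A.T) @ B.T)


x = rng.standard_normal((1, d))
for s in (-2.0, 0.0, 1.0, 2.0):
    edit = slider_forward(x, s) - slider_forward(x, 0.0)
    print(f"strength {s:+.1f}: size of edit {np.linalg.norm(edit):.4f}")
```

Because the edit is linear in the slider value, the attribute can be dialed continuously rather than toggled through prompt changes.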
Small matrices help the model adjust to a variety of applications while keeping all the original parameters unchanged. When we fine-tune a model with LoRA, training changes only these smaller matrices and leaves the initial weight matrix and its parameters untouched. Pre-training a model is typically a labor-intensive, time-consuming, and expensive endeavor. For instance, an AI will not be able to generate a story in a certain style if it has not previously been trained on texts in that style. Consider a weight matrix \(W_0\) of size \(d \times d\), which is kept unchanged during training. LoRA represents the update to it as \(\Delta W = BA\), where \(B \in \mathbb{R}^{d \times r}\), \(A \in \mathbb{R}^{r \times d}\), and the rank \(r \ll d\), so the adapted layer computes \(h = W_0 x + BAx\).
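Because the adapted layer is linear in its weights, a trained low-rank update B A (with B of shape d × r and A of shape r × d) can be folded into the frozen d × d matrix W0 before deployment, which is why LoRA adds no inference latency. A minimal NumPy check of that claim, with random stand-ins for trained factors and arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4
W0 = rng.standard_normal((d, d))          # frozen d x d pre-trained weight
B = 0.01 * rng.standard_normal((d, r))    # trained LoRA factors (random stand-ins here)
A = 0.01 * rng.standard_normal((r, d))

x = rng.standard_normal((10, d))

# During training: two paths, the frozen one and the low-rank one.
h_unmerged = x @ W0.T + (x @ A.T) @ B.T

# For deployment: fold the update into a single matrix, W = W0 + B A.
W_merged = W0 + B @ A
h_merged = x @ W_merged.T

print(np.allclose(h_unmerged, h_merged))  # True: same outputs, one matmul per layer

# Unmerge (subtract B A) to recover W0 before switching to a different adapter.
W_restored = W_merged - B @ A
print(np.allclose(W_restored, W0))        # True
```

Merging trades flexibility for speed: a merged model serves one task at full speed, while unmerged adapters can be swapped per request.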
With PEFT methods becoming increasingly popular in the enterprise, MoRA can become an important addition to the growing toolset of LLM application developers. Our foundation models are trained on Apple’s AXLearn framework, an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs.
Instead of completely retraining or fine-tuning a model from scratch, which might result in a loss of valuable pre-trained knowledge, LoRA allows you to adapt the model while minimizing the loss of this knowledge. But, as mentioned previously, full-parameter fine-tuning can be resource-intensive and expensive. That makes it less accessible to smaller companies, startups, and individuals who may have more limited budgets, computational resources, and access to large datasets. During fine-tuning, the model adjusts its parameters using gradients computed by backpropagation. Backpropagation, short for “backward propagation of errors,” is an algorithm used to compute the gradients that are then used to update the model’s parameters.
First ACE PC NIM Debuts
NVIDIA ACE technology for powering digital humans is now coming to RTX AI PCs and workstations with NVIDIA NIM — inference microservices that enable developers to reduce deployment times from weeks to minutes. ACE NIM microservices deliver high-quality inference running locally on devices for natural language understanding, speech synthesis, facial animation and more. Project G-Assist takes voice or text inputs from the player, along with contextual information from the game screen, and runs the data through AI vision models. These models enhance the contextual awareness and app-specific understanding of a large language model (LLM) linked to a game knowledge database, and then generate a tailored response delivered as text or speech. The online survey was in the field April 11 to 21, 2023, and garnered responses from 1,684 participants representing the full range of regions, industries, company sizes, functional specialties, and tenures.
- We look forward to sharing more information soon on our broader family of generative models, including language, diffusion, and coding models.
- Open-source models are often fine-tuned on infrastructure from large cloud providers, such as AWS, Google Cloud, or Microsoft Azure.
- In order to obtain better results in tasks like chatting or question answering, these models can be further ‘fine-tuned’ or adapted on domain-specific data.
- With every passing day, we get something new, be it a new LLM like Mistral-7B, a framework like Langchain or LlamaIndex, or fine-tuning techniques.
The use of Concept Sliders can not only result in more realistic-looking hands, but has also shown potential to improve the overall realism of the images generated by the framework. Concept Sliders also identify a single low-rank parameter direction that shifts images away from common distortion issues, and the results are demonstrated in the following image. LoRA models are available in various open-source repositories, including Civitai and Hugging Face. The best thing about these models is their size: most LoRA models don’t exceed a few megabytes, making them incredibly lightweight and easy to work with. Object LoRAs are an invaluable tool not only for artists but also for game developers, web designers, and other creative professionals who need to create assets efficiently.
The model’s weights are updated primarily for the new task, and it might lose its knowledge of previous tasks. LoRA can potentially mitigate catastrophic forgetting by keeping pre-trained weights frozen so they are not changed during fine-tuning. It focuses on the optimal use of computational resources, such as CPU processing power, GPU capabilities, and memory by decreasing the count of parameters that should be adjusted or trained. LoRA is designed to be resource-efficient, making it a more viable option for organizations with limited computational resources and smaller budgets.
This is a broad category of LoRA models that are used to generate objects such as furniture, plants or even vehicles. Of course the type of items you can create with these models depends on the specific model you’re using and the prompt you provide. Concept LoRA is a special kind of LoRA that was trained on a specific concept or idea. These models usually aim to conceptualize something specific that’d be harder to achieve with simply just prompt engineering. For example, this type of LoRA could be trained on a specific emotion, action, or a very specific item.
NVIDIA RTX Remix is a modding platform for remastering classic DirectX 8 and DirectX 9 games with full ray tracing, NVIDIA DLSS 3.5 and physically accurate materials. RTX Remix includes a runtime renderer and the RTX Remix Toolkit app, which facilitates the modding of game assets and materials. These AI capabilities will be accelerated by NVIDIA RTX GPUs, as well as AI accelerators from other hardware vendors, providing end users with fast, responsive AI experiences across the breadth of the Windows ecosystem.