Introducing Fine-Tuning and Evaluation of LLMs in BigQuery

1. BigQuery lets you analyze data with large language models such as Gemini 1.0 Pro, Gemini 1.0 Pro Vision, and text-bison.
2. Model fine-tuning in BigQuery is needed when additional customization is required, such as defining specific behavior or incorporating new information.
3. Supervised fine-tuning in BigQuery uses a dataset of input text (prompts) and ideal output text (labels) to teach the model the behavior implied by the examples, improving its performance.

BigQuery offers the ability to analyze data using large language models (LLMs) such as Gemini 1.0 Pro, Gemini 1.0 Pro Vision, and text-bison hosted in Vertex AI. With prompt engineering alone, these models handle tasks like text summarization and sentiment analysis well. In some cases, however, customization through model fine-tuning is necessary to define specific behaviors or response styles, or to incorporate new information.
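In practice, this works by creating a remote model in BigQuery that points at a Vertex AI-hosted LLM and then querying it with ML.GENERATE_TEXT. A minimal sketch, assuming a dataset named `mydataset`, a Cloud resource connection named `us.my_vertex_connection`, and a hypothetical `product_reviews` table:

```sql
-- Create a remote model over a Vertex AI-hosted LLM (names are illustrative).
CREATE OR REPLACE MODEL `mydataset.gemini_pro`
  REMOTE WITH CONNECTION `us.my_vertex_connection`
  OPTIONS (ENDPOINT = 'gemini-1.0-pro');

-- Prompt-engineered sentiment analysis: the input query must expose a column named "prompt".
SELECT
  prompt,
  ml_generate_text_llm_result AS sentiment
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_pro`,
  (
    SELECT CONCAT(
      'Classify the sentiment of this review as positive or negative: ',
      review_text) AS prompt
    FROM `mydataset.product_reviews`
  ),
  STRUCT(0.0 AS temperature, TRUE AS flatten_json_output)
);
```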

The newly announced feature is supervised fine-tuning in BigQuery, which customizes an LLM by training it on a dataset of input text (prompt) and expected output text (label). The model learns to mimic the behavior or task implied by the provided examples.
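Concretely, the training data is just a table or query result with those two columns. A hypothetical two-row illustration of the expected shape:

```sql
-- Supervised fine-tuning expects an input column (prompt) and an ideal-output column (label).
SELECT 'Summarize in one sentence: The quarterly report shows revenue grew 8 percent...' AS prompt,
       'Revenue grew 8 percent in the quarter.' AS label
UNION ALL
SELECT 'Summarize in one sentence: The service outage lasted two hours and affected...' AS prompt,
       'A two-hour outage affected a subset of users.' AS label;
```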

To demonstrate model fine-tuning, a classification problem on a medical transcription dataset is used as an example. Training and evaluation tables are created in BigQuery, and the ML.GENERATE_TEXT function is used to assess how well the model classifies transcripts into predefined categories.
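A sketch of what those tables and a baseline run might look like, assuming a source table with transcription, medical_specialty, and split columns (all names here are placeholders, not the dataset's actual schema):

```sql
-- Build training and evaluation tables with the required prompt/label columns.
CREATE OR REPLACE TABLE `mydataset.medical_train` AS
SELECT
  CONCAT('Classify the following medical transcript into one specialty, ',
         'answering with the specialty name only: ', transcription) AS prompt,
  medical_specialty AS label
FROM `mydataset.medical_transcriptions`
WHERE split = 'TRAIN';

CREATE OR REPLACE TABLE `mydataset.medical_eval` AS
SELECT
  CONCAT('Classify the following medical transcript into one specialty, ',
         'answering with the specialty name only: ', transcription) AS prompt,
  medical_specialty AS label
FROM `mydataset.medical_transcriptions`
WHERE split = 'EVAL';

-- Baseline: classify the evaluation set with the un-tuned remote model.
SELECT
  label AS expected,
  ml_generate_text_llm_result AS predicted
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_pro`,
  TABLE `mydataset.medical_eval`,
  STRUCT(0.0 AS temperature, TRUE AS flatten_json_output)
);
```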

Model fine-tuning in BigQuery is done with a CREATE MODEL statement that points at training data containing prompt and label columns. Under the hood, BigQuery uses Low-Rank Adaptation (LoRA), a parameter-efficient tuning technique. The fine-tuned model is served from its own remote endpoint, separate from the baseline model's, so the two can be evaluated and compared.
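Under the same assumptions about dataset and connection names, the tuning step might look like the following; the endpoint and option values are illustrative, and the set of supported options depends on the base model (see the BigQuery ML documentation):

```sql
-- Fine-tune a base LLM on the prompt/label training table (names and values are illustrative).
CREATE OR REPLACE MODEL `mydataset.medical_tuned_llm`
  REMOTE WITH CONNECTION `us.my_vertex_connection`
  OPTIONS (
    ENDPOINT = 'gemini-1.0-pro-002',   -- base model to tune
    MAX_ITERATIONS = 300,              -- number of tuning steps
    DATA_SPLIT_METHOD = 'NO_SPLIT'     -- use every row of the query below for training
  )
AS
SELECT prompt, label
FROM `mydataset.medical_train`;
```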

The evaluation metrics for the fine-tuned model show improved performance over the baseline. Per-category F1 scores indicate an overall gain in classification accuracy. The fine-tuned model is then ready for inference through the same ML.GENERATE_TEXT function, delivering better results without any additional infrastructure to manage. Further details, including which models support fine-tuning, are available in the documentation.
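For example, ML.EVALUATE can score the tuned model (and the baseline) against the held-out table, and ML.GENERATE_TEXT then serves predictions from it, again using illustrative names:

```sql
-- Compute task metrics (such as F1) for the fine-tuned model on the evaluation table.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.medical_tuned_llm`,
  TABLE `mydataset.medical_eval`,
  STRUCT('classification' AS task_type)
);

-- Inference with the fine-tuned model works exactly like inference with the base model.
SELECT
  prompt,
  ml_generate_text_llm_result AS predicted_specialty
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.medical_tuned_llm`,
  TABLE `mydataset.medical_eval`,
  STRUCT(0.0 AS temperature, TRUE AS flatten_json_output)
);
```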

Source link