Technique for assessing the reliability of foundation models developed at MIT

MIT and the MIT-IBM Watson AI Lab develop a technique to assess the reliability of foundation models before they are applied to specific tasks

Researchers from MIT and the MIT-IBM Watson AI Lab have developed a new technique to assess the reliability of foundation models before applying them to specific tasks, using an algorithm that measures the consistency of the models' representations. The approach can help reduce errors in safety-critical situations and enables better model selection without the need to test on real data.

Photo by: Domagoj Skledar / own archive

Researchers at MIT and the MIT-IBM Watson AI Lab have developed a technique to evaluate the reliability of foundation models before applying them to a specific task. They achieve this by analyzing a set of foundation models that slightly differ from each other. The algorithm assesses the consistency of the representations each model learns about the same test data. If the representations are consistent, the model is considered reliable.

When the researchers compared their technique to state-of-the-art baseline methods, they found that it was better at capturing the reliability of foundation models across a variety of downstream classification tasks.

The technique lets users decide whether to apply a model in a particular setting without the need to test it on real data. This is especially useful when such data may be inaccessible due to privacy concerns, as with healthcare datasets. In addition, the technique can rank models by their reliability scores, letting users choose the best one for their task.

"All models can make mistakes, but models that know when they are wrong are more useful. The problem of quantifying uncertainty or reliability is more challenging for these foundation models because their abstract representations are difficult to compare. Our method allows you to quantify how reliable a model's representation is for any given input," says senior author Navid Azizan, a professor at MIT and a member of the Laboratory for Information and Decision Systems (LIDS).

He was joined on the paper by lead author Young-Jin Park, a PhD student at LIDS; Hao Wang, a research scientist at the MIT-IBM Watson AI Lab; and Shervin Ardeshir, a senior research scientist at Netflix. The work will be presented at the Conference on Uncertainty in Artificial Intelligence.

Measuring Consensus
Traditional machine learning models are trained to perform a specific task. These models typically give a concrete prediction based on the input. For example, a model might say whether a particular image contains a cat or a dog. In this case, reliability assessment can be as simple as checking the final prediction.
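
For a conventional task-specific classifier, a simple reliability proxy is the probability the model assigns to its predicted class. The short Python sketch below illustrates the idea with made-up softmax outputs for a hypothetical cat-vs-dog classifier; nothing here comes from the paper itself:

```python
import numpy as np

# Hypothetical softmax outputs of a conventional cat-vs-dog classifier
# for three input images; the numbers are invented for illustration.
probs = np.array([
    [0.98, 0.02],  # confidently "cat"
    [0.55, 0.45],  # borderline case
    [0.10, 0.90],  # confidently "dog"
])

# Reliability check for a task-specific model: look at the final
# prediction and the probability assigned to it.
predictions = probs.argmax(axis=1)
confidence = probs.max(axis=1)

for pred, conf in zip(predictions, confidence):
    label = "cat" if pred == 0 else "dog"
    print(f"predicted {label} with confidence {conf:.2f}")
```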

But foundation models are different. They are pre-trained on broad, general-purpose data, in a setting where their creators do not know all the tasks the models will be applied to. Users adapt them to their specific tasks only after training.

To evaluate the reliability of foundation models, the researchers used an ensemble approach by training several models that share many characteristics but differ slightly.

"Our idea is like measuring consensus. If all these foundation models give consistent representations for any data in our dataset, then we can say that the model is reliable," says Park.
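
To make the setup concrete, here is a minimal Python sketch of such an ensemble. The encoders are toy random-feature networks that differ only in their random seed; they stand in for the paper's slightly differing foundation models and are purely illustrative, not the authors' training pipeline:

```python
import numpy as np

def make_encoder(seed: int, dim_in: int = 32, dim_out: int = 8):
    """Stand-in for one ensemble member: a tiny fixed random-feature
    encoder whose weights depend only on the seed."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(dim_in, 64))
    w2 = rng.normal(size=(64, dim_out))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2  # ReLU features

# Models that "share many characteristics but differ slightly": here
# the only difference between ensemble members is the random seed.
encoders = [make_encoder(seed) for seed in range(5)]

# The same test batch is passed through every ensemble member.
x_test = np.random.default_rng(99).normal(size=(10, 32))
representations = [enc(x_test) for enc in encoders]

# Each member returns only a vector of numbers per input; the five
# 8-dimensional spaces cannot be compared entry by entry.
print([r.shape for r in representations])
```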

But they ran into a problem: how do you compare abstract representations?
"These models only give us a vector, composed of some numbers, so we can't compare them easily," he adds.

They solved the problem using an idea called neighborhood consistency.

For their approach, the researchers prepare a set of reliable reference points that each model in the ensemble encodes alongside the test data. Then, for each model, they look at which reference points lie close to that model's representation of a given test point.

By checking how consistent those neighboring points are across the ensemble, they can assess the model's reliability.
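
The sketch below shows one way such a neighborhood-consistency score could be computed, assuming every model has encoded the same reference points and test points into its own space. The per-point score used here, the average overlap of k-nearest-neighbor sets across ensemble members, is an illustrative assumption rather than the paper's exact formula:

```python
import numpy as np

def knn_indices(ref_reps, test_reps, k):
    """Indices of the k reference points nearest to each test representation."""
    dists = np.linalg.norm(test_reps[:, None, :] - ref_reps[None, :, :], axis=-1)
    return np.argsort(dists, axis=1)[:, :k]

def neighborhood_consistency(ref_reps_per_model, test_reps_per_model, k=10):
    """Per-test-point score: mean pairwise overlap of k-nearest-neighbor
    sets across ensemble members (1.0 means identical neighborhoods)."""
    neighbors = [knn_indices(r, t, k)
                 for r, t in zip(ref_reps_per_model, test_reps_per_model)]
    n_models, n_test = len(neighbors), neighbors[0].shape[0]
    scores = np.zeros(n_test)
    n_pairs = 0
    for i in range(n_models):
        for j in range(i + 1, n_models):
            for t in range(n_test):
                shared = len(set(neighbors[i][t]) & set(neighbors[j][t]))
                scores[t] += shared / k
            n_pairs += 1
    return scores / n_pairs

# Toy stand-ins: three "models", each encoding the same 100 reference
# points and 10 test points into its own 8-dimensional space.
rng = np.random.default_rng(0)
ref_reps_per_model = [rng.normal(size=(100, 8)) for _ in range(3)]
test_reps_per_model = [rng.normal(size=(10, 8)) for _ in range(3)]
print(neighborhood_consistency(ref_reps_per_model, test_reps_per_model))
```

A score near 1.0 means every ensemble member surrounds a test point with the same reference points, which the method reads as a sign of reliability; with the unrelated random representations above, the scores sit near the chance level of k divided by the number of reference points.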

Aligning Representations
Foundation models map data points into what is known as a representation space. One way to think about this space is as a sphere. Each model maps similar data points to the same place in its sphere, so images of cats go to one place, and images of dogs to another.

But each model would map animals differently in its sphere, so while cats might be grouped near the South Pole of one sphere, another model might map cats somewhere in the Northern Hemisphere.

The researchers use neighboring points as anchors to align the spheres so the representations can be compared. If a data point's neighbors are consistent across the multiple representations, then one can be confident in the model's reliability for that point.
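
One standard tool for aligning two representation spaces through shared anchor points is the orthogonal Procrustes problem, sketched below with SciPy. This is a common alignment technique shown here for illustration, under the assumption of shared anchors; it is not necessarily the paper's exact alignment procedure:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(1)

# Two "spheres": model B represents the same 50 anchor points as model A,
# but rotated, plus a little noise. The anchors are assumed to be shared.
anchors_a = rng.normal(size=(50, 8))
true_rotation = np.linalg.qr(rng.normal(size=(8, 8)))[0]
anchors_b = anchors_a @ true_rotation + 0.01 * rng.normal(size=(50, 8))

# Orthogonal Procrustes finds the orthogonal map that best carries A's
# anchors onto B's, so the two spaces can be compared point by point.
rotation, _ = orthogonal_procrustes(anchors_a, anchors_b)
print(np.allclose(rotation, true_rotation, atol=0.1))  # True: map recovered

# A new test point from model A, expressed in model B's space.
test_a = rng.normal(size=(1, 8))
test_a_in_b = test_a @ rotation
```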

When they tested this approach on a wide range of classification tasks, they found it much more consistent than baseline methods. It also wasn't thrown off by challenging test points that tripped up other methods.

Moreover, their approach can be used to assess reliability for any input data, so it can evaluate how well the model works for a specific type of individual, such as a patient with certain characteristics.

"Even if all models have average performance, from an individual perspective, you will prefer the one that works best for that individual," says Wang.

One limitation comes from the need to train an ensemble of foundation models, which is computationally expensive. In the future, they plan to find more efficient ways to build multiple models, possibly using small perturbations of a single model.

"With the current trend of using foundation models for their representations to support various tasks — from fine-tuning to generation with retrieval-augmented approaches — the topic of quantifying uncertainty at the representation level is becoming increasingly important but challenging, as the representations themselves lack grounding. Instead, it's about how the representations of different inputs are related to each other, an idea that this work neatly encapsulates through the proposed neighborhood consistency score," says Marco Pavone, associate professor in the Department of Aeronautics and Astronautics at Stanford University, who was not involved in this work. "This is a promising step towards high-quality uncertainty quantification for representation models, and I am excited to see future extensions that can function without the need for model ensembling to truly enable this approach in foundation-sized models."

This work was partially funded by the MIT-IBM Watson AI Lab, MathWorks, and Amazon.

Creation time: 17 July, 2024

AI Lara Teč
