Microsoft releases MS MARCO, a dataset for training AI systems

Laurent Giret



A little more than a year ago, Microsoft shipped Windows 10, the first mainstream desktop OS to feature a digital assistant, Cortana. Microsoft is betting big on artificial intelligence to improve its products, and the results of that work can already be seen in several of the company's popular products, such as its chat bots, Skype Translator, Microsoft Translator, Microsoft Pix and more.

But coming back to Cortana: if you use the digital assistant frequently, you're probably aware that it has some limitations. Cortana can't always answer sophisticated questions, and in those cases it simply points you to a set of search engine results. Fortunately, Microsoft is working to address these shortcomings. Today, the company announced the release of MS MARCO (Microsoft MAchine Reading COmprehension), a dataset designed to help artificial intelligence researchers build tools that can answer questions as well as real people.

The dataset contains 100,000 questions and answers based on anonymized queries from Bing and Cortana. Li Deng, partner research manager of Microsoft's Deep Learning Technology Center, explained that "our dataset is designed not only using real-world data but also removing such constraints so that the new-generation deep learning models can understand the data first before they answer questions."

Just as it did with its Cognitive Toolkit earlier this year, Microsoft is making MS MARCO free for researchers to use, hoping that it will lead to more partnerships and technology breakthroughs in the coming years. It will likely be a few more years before we see machines that can actually think like humans, but Rangan Majumder, a partner group program manager in Microsoft's Bing search engine division, explained that today's announcement is a step in that direction. "In order to move towards artificial general intelligence, we need to take a step towards being able to read a document and understand it as well as a person," he added. The MS MARCO dataset is available to download for free on its dedicated website.