Baidu Restricts Google and Bing from Accessing Content Amid AI Data Needs

In a move that highlights the growing importance of data in the artificial intelligence (AI) era, Chinese internet giant Baidu has moved to restrict Google's and Microsoft's Bing search engines from accessing its content.

The Beijing-based company recently updated the robots.txt file for Baidu Baike, its Wikipedia-like online encyclopedia. A robots.txt file tells automated crawlers which parts of a site they may fetch, and well-behaved crawlers honor it. The change, implemented on August 8, effectively blocks Google's and Microsoft's search engine crawlers from accessing and indexing content on the platform.
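To illustrate how per-crawler blocking works, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules are hypothetical and written only for illustration: they are not a copy of Baidu Baike's actual robots.txt, and the example URL is made up. "Googlebot" and "Bingbot" are the user-agent tokens used by Google's and Microsoft's crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, NOT Baidu Baike's real robots.txt:
# block Google's and Bing's crawlers entirely, allow every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""

# Parse the rules from a string instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical encyclopedia entry URL.
url = "https://baike.baidu.com/item/example"

for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    # can_fetch() reports whether a compliant crawler with this token may fetch the URL.
    print(agent, parser.can_fetch(agent, url))
# Googlebot False
# Bingbot False
# SomeOtherBot True
```

Note that robots.txt is a convention rather than an access control: it only stops crawlers that choose to honor it. Major search engines do honor it, which is why a change like Baidu's can remove a site from their indexes over time.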

Baidu Baike, which has nearly 30 million entries, is a vast repository of information. By comparison, the Chinese-language version of Wikipedia has only 1.43 million entries, roughly a twentieth as many. This gap in scale is what makes Baidu's decision significant.

The move comes at a time when AI developers are increasingly seeking large datasets to train their models. Many companies are striking deals with content publishers to gain access to quality information for their generative AI projects.

For instance, OpenAI recently partnered with Time magazine, gaining access to over a century's worth of archived content. Similarly, Reddit has entered into a multimillion-dollar agreement with Google, allowing the tech giant to scrape data from its platform for AI training purposes.

Baidu's decision to restrict access to its content aligns with a growing trend among tech companies to protect their data assets. Last year, Microsoft reportedly threatened to cut off access to its internet search data for rival companies using it to train chatbots and other AI services.

As of Friday, some Baidu Baike entries were still appearing in Google and Bing search results, likely due to older cached content. However, the long-term impact of this restriction could be significant for these search engines and their AI development efforts.

This development highlights the increasing value placed on data in the AI industry and the strategic moves companies are making to control and monetize their information assets. As the race for AI supremacy continues, access to vast and diverse datasets is becoming a critical factor in gaining a competitive edge.
