Disallow Oembed Media Links
When Drupal's media system handles oEmbed content (such as YouTube or Vimeo videos), it creates internal URLs at paths like /media/oembed that serve as intermediary endpoints. These URLs are not intended to be indexed by search engines and can create duplicate content issues or pollute search results if they are crawled.
The Problem
oEmbed media URLs typically follow patterns such as:

/media/oembed?url=...

These endpoints serve the embedded content for rendering within your pages but do not contain meaningful page content on their own. If search engines index these URLs, they may:
Create duplicate content signals
Waste crawl budget on non-content pages
Appear as low-quality pages in search results
Solution 1: Using robots.txt
The simplest approach is to add a disallow rule to your site's robots.txt file to prevent search engine crawlers from accessing oEmbed paths.
Add the following lines to your robots.txt file:
# Disallow oEmbed media links
Disallow: /media/oembed

If your site uses the RobotsTxt module for managing robots.txt through the admin interface, navigate to Configuration > Search and metadata > Robots.txt and add the disallow rule there.
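Note that a Disallow rule only takes effect inside a User-agent block. Drupal ships with a stock robots.txt that already contains a User-agent: * section, so the rule belongs there; the surrounding lines below are illustrative, not your site's exact file:

```txt
# robots.txt (excerpt)
User-agent: *
Crawl-delay: 10
# ... existing Drupal rules ...

# Disallow oEmbed media links
Disallow: /media/oembed
```

Because the path match is a prefix, this single rule covers every /media/oembed?url=... variant regardless of the query string.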
Solution 2: Using Rabbit Hole Module
The Rabbit Hole module provides more granular control over how entity pages behave. It can be configured to prevent direct access to media entity pages entirely, redirecting visitors or returning a 403/404 response instead.
To configure Rabbit Hole for media entities:
Install and enable the Rabbit Hole module if it is not already enabled.
Navigate to Configuration > Content authoring > Rabbit Hole settings.
Configure the behavior for Media entities.
Set the default action to Page not found or Page redirect to prevent direct access to media entity pages.
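On a Composer-managed site, the install step above is typically done from the command line; a sketch assuming Composer and Drush are available (project and module names match the drupal.org project rabbit_hole):

```shell
# Download the module and its dependencies with Composer
composer require drupal/rabbit_hole

# Enable the module with Drush
drush en rabbit_hole -y

# Export the resulting configuration so the behavior settings
# are captured in version control
drush config:export -y
```

After enabling, the per-entity-type behavior settings appear in the admin UI as described in the steps above.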
Solution 3: Using Metatag noindex
You can also use the Metatag module to add a noindex meta tag to media entity pages:
Navigate to Configuration > Search and metadata > Metatag.
Edit the defaults for the Media entity type.
Under the Advanced section, set Robots to include noindex, nofollow.
Save the configuration.
This tells search engines not to index these pages even if they discover them through crawling.
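Once saved, the setting lands in your exported configuration. The sketch below shows roughly what the relevant export might contain; the config name (metatag.metatag_defaults.media) and key layout are assumptions based on how the Metatag module stores its defaults entities, so verify against your own site's export:

```yaml
# File: metatag.metatag_defaults.media.yml (excerpt, structure assumed)
id: media
label: Media
tags:
  robots: 'noindex, nofollow'
```

Keeping this in exported config ensures the noindex directive survives deployments rather than living only in the database.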
Recommended Approach
For most Varbase sites, combining the robots.txt disallow rule with the Metatag noindex approach provides the most robust protection. The robots.txt rule prevents crawlers from wasting crawl budget, while the noindex meta tag serves as a fallback if a crawler reaches the page through another path.