
Multiple sitemaps and Google



gaydemon
04-05-2008, 02:00 AM
I've reached the 50,000 page limit on a single sitemap file for Google and have to split it into several sitemaps and a sitemap_index file.

My question is, do you still submit all the different sitemaps to Google, or only the sitemap_index file?

I can't seem to find anything about it on the Google sitemap page.

Also, does it make sense to split them up in a certain way? I've got 100,000+ pages and files (60,000 HTML pages and 40,000 JPGs) spread over 3 sitemaps now. Not sure if it makes any difference how it's organized.

Gaystoryman
04-05-2008, 08:34 AM
If I remember right, over 50K should be in a gzip or something with an index?

Gaystoryman
04-05-2008, 08:50 AM
I'd say you need to submit the 'index' sitemap. Google's docs say:

Sitemaps should be no larger than 10MB (10,485,760 bytes) in length when uncompressed and can contain a maximum of 50,000 URLs. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 10MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a file size of 10MB.
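For what it's worth, the limits quoted above can be enforced with a few lines of script. This is only a rough sketch in Python, not Google's generator: the function names and the example.com URLs are made up for illustration, and it assumes the sitemaps.org 0.9 namespace.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000              # per-file URL limit quoted above
MAX_BYTES = 10 * 1024 * 1024   # 10MB uncompressed limit quoted above


def chunk_urls(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render_sitemap(urls):
    """Return one sitemap file as UTF-8 encoded XML bytes."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines).encode("utf-8")


def build_sitemaps(urls):
    """Split `urls` into as many sitemap files as the limits require."""
    files = [render_sitemap(chunk) for chunk in chunk_urls(urls)]
    for data in files:
        if len(data) > MAX_BYTES:
            # Very long URLs can hit the size limit before the URL limit.
            raise ValueError("sitemap exceeds 10MB; use a smaller chunk size")
    return files
```

With 100,000+ URLs this gives three files, which matches the three sitemaps mentioned in the first post.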



You can list the updated URLs in a small number of Sitemaps that change frequently and then use the lastmod tag in your Sitemap index file to identify those Sitemap files. Search engines can then incrementally crawl only the changed Sitemaps.
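A sitemap index with lastmod entries, as described in that quote, could be generated along these lines. Again just a sketch: `render_sitemap_index` is a made-up name, the file URLs are placeholders, and the dates are passed in as plain YYYY-MM-DD strings.

```python
from xml.sax.saxutils import escape


def render_sitemap_index(entries):
    """Build a sitemap index from (sitemap_url, lastmod) pairs.

    `lastmod` is expected as a 'YYYY-MM-DD' string (an assumption of
    this sketch; the protocol also allows full W3C datetime values).
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc, lastmod in entries:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(loc)}</loc>")
        lines.append(f"    <lastmod>{escape(lastmod)}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical layout: one rarely changing file, one that changes often.
index_xml = render_sitemap_index([
    ("http://www.example.com/sitemap_archive.xml", "2008-01-15"),
    ("http://www.example.com/sitemap_recent.xml", "2008-04-05"),
])
```

The idea is that only sitemap_recent.xml gets a new lastmod when pages change, so a crawler has no reason to re-read the archive file.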

The way I read it, it's a lot more work when you've got a ton of pages, and they do recommend gzip for the larger maps. They also seem to suggest breaking it down into more specific groups: video, code, news, etc.
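On the gzip part, compressing a sitemap is a one-liner with Python's standard gzip module. A small sketch, with the size check based on the "10MB when uncompressed" wording quoted earlier (`gzip_sitemap` is a made-up helper name):

```python
import gzip

MAX_BYTES = 10 * 1024 * 1024  # the limit applies to the uncompressed file


def gzip_sitemap(xml_bytes):
    """Return the gzip-compressed form of an already rendered sitemap."""
    if len(xml_bytes) > MAX_BYTES:
        # Compression does not help here: the limit is checked before gzip.
        raise ValueError("sitemap is over 10MB uncompressed; split it first")
    return gzip.compress(xml_bytes)
```

The result would typically be saved with a .gz suffix (for example sitemap1.xml.gz) and that name listed in the index file.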

http://www.google.com/support/webmasters/bin/answer.py?answer=40318

It does seem like you use a different schema for an index of sitemaps than for the sitemaps themselves.

https://www.google.com/webmasters/tools/docs/en/protocol.html#sitemapValidation

hth

gaydemon
04-05-2008, 12:14 PM
Yes, I managed to find out a bit more now.

There are 2 different types of sitemap files. A sitemap_index.xml only refers to other sitemap files, which means you only submit that index.xml and not the rest.

But the problem with that is you can't split them up in a way that makes sense without having to submit them as separate sitemaps and NOT use the index.xml.

So I opted for letting Google's sitemap script do it for me; it splits it up and creates a new sitemap_index.xml for me:

http://www.gaydemon.com/sitemap_index.xml

Also learned that you can include the location of your sitemap(s) in your robots.txt file, so I've done that as well.
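The robots.txt entry is just a line of the form "Sitemap: <full URL>". If you want to script it, something like the following would add the line without duplicating it on repeat runs. The helper name and the example.com URL are placeholders, not anything from an actual tool.

```python
def add_sitemap_lines(robots_txt, sitemap_urls):
    """Return robots.txt content with a 'Sitemap:' line for each new URL."""
    # Collect sitemap URLs that are already listed, so reruns are harmless.
    existing = {
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.lower().startswith("sitemap:")
    }
    new_lines = [f"Sitemap: {url}" for url in sitemap_urls if url not in existing]
    if not new_lines:
        return robots_txt
    body = robots_txt.rstrip("\n")
    parts = ([body] if body else []) + new_lines
    return "\n".join(parts) + "\n"


robots = "User-agent: *\nDisallow:\n"
robots = add_sitemap_lines(robots, ["http://www.example.com/sitemap_index.xml"])
```

Pointing the line at the index file covers all the child sitemaps in one go.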

Gaystoryman
04-05-2008, 12:29 PM
Noticed that when I added mine for the wordpress blogs... sweet... :cool:

rawTOP
04-06-2008, 09:40 AM
The problem with using a master sitemap index to point to other sitemaps is that you lose the handy error reporting in Google Webmaster Tools (GWT), since the only errors they report are in the sitemap you submit (in this case the index). You don't get error reporting for the sitemap's 'children'. If you submit each sitemap separately you'll get error reporting, which is one of the main benefits of using sitemaps in the first place.
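If you do go with the index only, a local check of the child sitemaps can catch the most basic problems. This is no substitute for the reporting described above (it knows nothing about crawl errors); it is only a sketch that assumes the sitemaps.org 0.9 namespace and a `fetch` function you supply yourself to return each file's XML.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000


def child_locations(index_xml):
    """Return the <loc> of every child sitemap listed in an index file."""
    root = ET.fromstring(index_xml)
    return [(el.text or "").strip() for el in root.findall("sm:sitemap/sm:loc", NS)]


def check_children(index_xml, fetch):
    """Map each problematic child sitemap URL to a short description.

    `fetch` is a caller-supplied function taking a URL and returning
    that sitemap's XML (how it is downloaded or read is up to you).
    """
    problems = {}
    for loc in child_locations(index_xml):
        try:
            root = ET.fromstring(fetch(loc))
        except ET.ParseError as exc:
            problems[loc] = f"not well-formed XML: {exc}"
            continue
        count = len(root.findall("sm:url/sm:loc", NS))
        if count == 0:
            problems[loc] = "no <url> entries found"
        elif count > MAX_URLS:
            problems[loc] = f"{count} URLs, over the {MAX_URLS} limit"
    return problems
```

An empty result only means the files parse and stay under the URL limit, nothing more.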

gaydemon
04-06-2008, 09:53 AM
Good point, I didn't actually think of that.

However, I am using Google's own script (https://www.google.com/webmasters/tools/docs/en/sitemap-generator.html) (I think), which also reports errors before submitting, so hopefully it should be OK.

I'd rather submit several sitemaps, but MSN doesn't accept more than one, which is why it seems easier to just use a sitemap index file.


rawTOP
04-06-2008, 11:27 AM
On a corporate site I run where I use sitemaps, I submit all of them to Google and Yahoo!, and then just the index to MSN and in the robots.txt file... On my gay sites I just submit the RSS feeds as sitemaps. That does the trick and gets things picked up quickly...

So you can do both. If you have dynamic pages, they can change over time and errors can crop up...

gaydemon
04-06-2008, 12:39 PM
Yes, I noticed how difficult it was to do PHP pages with sitemaps. A real mess.

I do like it nice and tidy, so hopefully the sitemap_index will do it. Otherwise I'll probably do several, like you do.

