How to Get Your PHP Pages Indexed on Google

Part 1 of the SEO for PHP Series

Finally, an article that will explain the problem and answer the question that has perplexed and eluded web developers and SEO experts for years. The question: How do I get my PHP pages indexed on Google and other search engines?

The Myth Exposed

First, I need to point out that there has always been a myth surrounding Google’s indexing (or non-indexing) of PHP pages, specifically dynamically generated pages. I suppose my mysterious title to this entry doesn’t help the matter, because Google is easily able to index both dynamically generated PHP pages and PHP pages with dynamically generated content (read further down for an explanation). There are some good practices that webmasters should follow to make sure their site’s pages are clearly readable, but there’s not really a big mystery. I’ll explain in later posts in this series how you can make your PHP site Google ready. In the meantime, Google (as always) is smart enough to sort through most of the garbage that people sometimes post to the web. So why write an article, you ask? Because there are so many people who still believe this myth, and it’s about time someone pointed out that Google has already set the record straight.

It’s easy to see how such a myth could be perpetuated over time. Google did have an excerpt from their Webmaster Guide back in 2006 that said Google had a more difficult time indexing pages with a query string in the URL (for those of you not familiar with URL parts, read this entry by Google’s Matt Cutts). Google once warned:

“Don’t use “&id=” as a parameter in your URLs, as we don’t include these pages in our index.”

Ah ha! The spark that started the fire. So, Google really did have a hard time at some point indexing PHP pages? Well, not so fast, Mr. or Ms. Webmaster Pants. They didn’t say anything about dynamically generated pages, or even PHP for that matter. They simply said it’s best not to use certain parameters in your URLs’ query strings. Over the years, that advice has been exaggerated and expanded into the idea that Google is wholly incapable of indexing pages processed by PHP. This notion was false then, and is false now. In fact, Google has since removed that line from the Webmaster Guide and has stated clearly that it’s OK to have dynamically generated pages.

People who actually still believe that PHP cannot be indexed by Google have obviously never performed this query:

inurl:.php

What? That’s an outrage! Google doesn’t index PHP pages! Especially not that first one that is dynamically generated and contains “?id=” in the query string! OK, now I’ll remove my tongue from my cheek so I can finish up here.

A Technical Explanation

Alright, here’s where we’ve got to get a little technical. Don’t be afraid, it’s not too bad. After all, it was the people who weren’t willing to explore a little bit of the technical stuff that perpetuated this myth in the first place. Be a part of the solution and keep reading!

First, there is a real difference between dynamically generated pages and pages with dynamically generated content. It is often the case that dynamically generated pages will also have dynamically generated content, but not all pages with dynamically generated content have to be themselves dynamically generated. Confused? Let me explain.

Pages With Dynamic Content

Let’s say I have a file called faqs.php that physically exists on my server and contains PHP code that performs a certain function. Maybe the code accesses a database, obtains a list of FAQs, and prints them out to the screen. This is an example of a non-dynamically generated page that dynamically generates content. In this case, the faqs.php script is only ever going to deliver FAQs to the user. It will probably never show a terms of use, contact, or about page. It’s for FAQs.
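
To make that concrete, here is a minimal sketch of what such a faqs.php might look like. Everything in it is an assumption for illustration: the function names fetch_faqs and render_faqs are made up, and the hard-coded array stands in for the database query, which a real script would more likely run through PDO or mysqli.

```php
<?php
// faqs.php: a hypothetical sketch of a fixed page with dynamic content.

/**
 * Stand-in for the database lookup. The rows are invented sample data;
 * a real script would probably fetch them with PDO or mysqli.
 */
function fetch_faqs(): array
{
    return [
        ['question' => 'Does Google index PHP pages?', 'answer' => 'Yes.'],
        ['question' => 'Do I need a .html extension?', 'answer' => 'No.'],
    ];
}

/** Builds the plain HTML that the browser or spider actually receives. */
function render_faqs(array $faqs): string
{
    $html = "<h1>FAQs</h1>\n<dl>\n";
    foreach ($faqs as $faq) {
        // Escape the text so stored content cannot break the markup.
        $html .= '  <dt>' . htmlspecialchars($faq['question']) . "</dt>\n";
        $html .= '  <dd>' . htmlspecialchars($faq['answer']) . "</dd>\n";
    }
    return $html . "</dl>\n";
}

echo render_faqs(fetch_faqs());
```

Whatever happens inside fetch_faqs, the response is ordinary HTML by the time it leaves the server, which is all a spider ever sees.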

Google has never had a problem indexing these types of pages, even though they are created using PHP. The fact that PHP is involved is a moot point because the content is assembled on the server before it is delivered to the browser or search engine spider. The spiders don’t care what happens behind the scenes to deliver my page, especially when I have this simple URL: http://www.mysite.com/faqs.php. For all intents and purposes, this page is the exact same thing as a static HTML page delivered from the server to the spider or browser (the only difference is the .php extension in the filename). Even the file extension is a non-issue considering the fact that a webmaster can set up Apache to execute and deliver PHP pages with the .html extension, which would totally mask the fact that the page content has been delivered dynamically.

Dynamically Generated Pages

Now, let’s imagine I have a file called index.php, and it has a query string appended to it so it looks like this: index.php?page=faqs. This script might be producing what people would call a dynamically generated page. In this case it also delivers FAQ content to the screen, just like our previous example. The difference with this index.php page is that it’s probably not specific to just FAQs. It’s probably also used to deliver the terms of use, contact us, and about pages. In theory (but not in good practice), you could have an entire website of thousands of pages that come from one single PHP script.
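
A rough sketch of such a script follows. It is only an illustration under stated assumptions: the resolve_page function, the page list, the placeholder text, and the choice of faqs as the default page are all hypothetical, and a real site would likely pull its content from a database or template files.

```php
<?php
// index.php: a hypothetical sketch of one script serving many pages.

/** Maps the "page" query-string value to content. All entries are invented. */
function resolve_page(string $slug): array
{
    $pages = [
        'faqs'    => ['title' => 'FAQs',         'body' => 'Answers to common questions.'],
        'terms'   => ['title' => 'Terms of Use', 'body' => 'The rules for using this site.'],
        'contact' => ['title' => 'Contact Us',   'body' => 'How to reach us.'],
        'about'   => ['title' => 'About',        'body' => 'Who we are.'],
    ];

    // Only known values are accepted; anything else becomes a 404.
    if (!isset($pages[$slug])) {
        return ['status' => 404, 'title' => 'Not Found', 'body' => 'No such page.'];
    }
    return ['status' => 200] + $pages[$slug];
}

// index.php?page=faqs arrives as $_GET['page'] === 'faqs'.
// Falling back to 'faqs' when the parameter is missing is an arbitrary choice.
$slug = isset($_GET['page']) && is_string($_GET['page']) ? $_GET['page'] : 'faqs';
$page = resolve_page($slug);

http_response_code($page['status']);
echo '<h1>' . htmlspecialchars($page['title']) . "</h1>\n";
echo '<p>' . htmlspecialchars($page['body']) . "</p>\n";
```

Here the same file answers index.php?page=faqs, index.php?page=terms, and so on, so the query string is the only thing that distinguishes one page from another.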

These are the types of pages that Google has warned about in the past. While they’ve never had problems with the types of pages described previously, you can see how Google may have had a hard time with these pages in the past, or simply refused to keep track of and index them, especially if multiple name-value pairs exist in the query string.

Short and Simple Answers

OK, your head may be spinning after all that rigmarole. Let’s look at how Google themselves answer the question: “Does Google index dynamic pages?” The short and simple answer is yes.

Well, would you look at that! Google themselves set the record straight once and for all. I don’t necessarily like their suggestion of making static pages with the same content as your dynamic pages (if that were an option, there’d be little reason to use dynamic pages in the first place). Regardless, Google essentially says to all you wild and wacky webmasters with equally wacky URLs, “Go ahead and make them dynamic pages! We’ll round up and corral your content free of charge.”

So Google has caught up to the Wild Wild Web in this regard, and they’ve done so just as a new frontier is emerging: one of pretty URLs and simple-to-index PHP sites and pages, where dynamic pages and dynamic content are still the staple, but with a new 2.0-friendly flair. Check back soon for the next article, where I’ll explain how to create a simple PHP site that implements some of these newfangled things and employs some suggestions from Google Webmaster Central.


10 Comments

  1. Adam says

    I was always taught that every dynamic URL should be rewritten using mod_rewrite, because some engines may dislike the question marks and ampersands that denote arguments. Apart from that, though, having SEF URLs is really beneficial in the SERPs!

  2. Peter Ehat says

    Yeah, dynamic URLs should be re-written. I agree that doing so is best practice. Sites that have a bunch of crazy stuff in the URL are just not very cute. It is, however, not a requisite to getting your page indexed.

  3. Layne says

    I’m having a problem that led me to your article. Most of the pages on my site contain dynamic content (PHP) with .htm extensions. Google has no problem indexing them. However, I have two pages that use PHP for form processing, such that the form calls itself when submitted to check for errors before passing it to the form handler. No query strings are used, only the $_POST variable. Google seems to have a problem indexing these pages. Perhaps it’s because the page is dynamic, such that the page has the exact same URL before and after the form is submitted, but the content is different. Could Google be following the “Submit”, and therefore be getting multiple versions of the same page? Maybe if I restrict Google from following the “Submit”, the page will always look the same to Google and it won’t have a problem. Any insight?

  4. Shiva says

    Hi Layne,

    I do not think that search engine crawlers are sophisticated enough to figure out that a page can be submitted, then submit it and fetch the resulting content from the site. They look for query strings and any URLs specified in image src attributes or links, check for rel=nofollow, and that’s all.

    I do have a suggestion though: once you create a sitemap.xml, list that page and set its change frequency to hourly, so that the robot can understand that it is a page that changes frequently.

    Hth,
    Shiva

  5. Peter Ehat says

    @Layne

    Having your form’s action calling the page it’s on is just fine. I would however have your processing script redirect the results to a new page (look here for more info http://www.php.net/header). You can use session variables to propagate info that you need to display on the final page.

    The reason I prefer a redirection is because you’ll be able to have a separate page that can be indexed (fixing the issue you’re asking about) as well as doing away with duplicate submissions. Your current setup will resubmit the form POST data again if someone hits refresh after submitting the first time (I just don’t like having to write code to guard against unintentional duplicate submissions).

  6. Jan says

    I am not sure if my website contains more than 20 static URLs (no ?this=that), but I am sure that my website contains more than 4000 URLs with ?something=blablabla. And there is no problem with traffic, there is no problem with rankings in Google and there is no problem at all.

    It all depends on what your website offers to its visitors, not how it’s made. There is really no need to avoid variables and values within URLs.

  7. says

    @Jan – it’s true that Google is getting a lot better at indexing dynamic URLs like on your site, but I think Peter’s point is that you can improve beyond just getting the pages indexed. The structure, format, URLs, etc. of the site certainly DO play a role in how well a site is indexed. Clean URLs are more user-friendly, more likely to get linked, and tend to perform better in the SERPs.

  8. Peter says

    @Dave

    Regarding the Google statement, my guess is that “some rich media files” refers to Flash content, and “dynamic pages” refers to JavaScript (I say that because so many people use AJAX these days to get pieces of content on their pages). Neither Flash content nor JavaScript-generated content is indexed by Google.
