For anyone running their own website, one of the terms that comes at you fairly early on is “Search Engine Optimization”. This is a lovely term to basically say, “How do I get my website to show up higher in Google’s rankings?”. There are commercial companies that offer packages, individuals who offer tweaking services, etc…everyone wants to sell you tricks and tips on SEO.
Most of them are, umm, well, worthless. While some are outright scammers, some are just worthless because most of what they do is something you end up doing for yourself (my day-job boss's regular comment is that if you hire a consultant to tell you the time, they'll borrow your watch first — as you'll see below, an SEO consultant will quickly ask you for info to do the job for you). And it is a bit of a competitive crapshoot anyway. Certainly if anyone tries to GUARANTEE you a specific SEO result, run the other way — they're scammers. No one can guarantee a result, they can only offer ways to LIKELY improve your results — your actual mileage will vary. And most of those suggestions fall into the categories below.
Background
For quick background, here’s what you need to know about how Google works. Google’s search engine process is not one giant element, it is actually made up of a lot of little pieces.
The first big piece to know about is that Google likes to crawl around the Web, sending out little robot searchers that jump from link to link to link, searching out text. These are called spiders (get it? Web, crawling, spiders? Who says giant companies can’t be whimsical?). The spiders don’t care how pretty your site is, they just look for text. And while they are constantly out there 24/7, they can’t read the entire internet in a single day. In fact, if your site doesn’t already have a lot of traffic, they might only visit it every couple of months to see what’s new. And since the spiders can only read text, your fantastic trailer, image, etc. are all lost on the bot. It comes, it reads, it leaves.
Well, actually, it comes, it reads, it creates an index of your content, and then it leaves. Google takes that indexed page and puts it into their massive database from which search results are generated. (No, Virginia, Google does not search the entire web every time you press Search…When you ask for a search, Google runs its algorithm against the database and tries to calculate which pages are the most relevant to your request. In other words, Virginia, it compares your search string against the INDEX it created of millions of pages. Including the index of your own webpage.)
If a page has the exact same search string that Virginia looked for, it gets some points. If it has similar keywords embedded in it, more points. If it is a popular site, more points. If it is a highly rich content site with lots of articles, more points. If it has a LOT of pages that match that search, more points. And so on — points for this, points for that, points for something else, all added up into a nice little relevancy score. Those pages with the highest relevancy score are presented first (note that Google also offers paid advertising at the top of the page — one of the ways they do that is give those paid advertisers a huge relevancy score so they appear first!). I’ll expand below on the various parts that can get you more points, these are just the basics for introduction.
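To make the "points" idea concrete, here's a toy Python sketch of additive relevancy scoring. The real algorithm is secret, so every point value and page field below is invented purely for illustration:

```python
# Toy additive relevancy scoring. Google's actual algorithm is secret;
# all point values and page fields below are invented for illustration.
def relevancy_score(page, query):
    score = 0
    if query.lower() in page["text"].lower():   # exact search-string match
        score += 10
    for word in query.lower().split():          # similar keywords embedded
        if word in page["keywords"]:
            score += 3
    score += page["popularity"]                 # popular site bonus
    score += min(page["matching_pages"], 20)    # lots of matching pages

    return score

fan_page = {"text": "Lindsay Lohan fan page", "keywords": ["lindsay", "lohan"],
            "popularity": 5, "matching_pages": 3}
print(relevancy_score(fan_page, "lindsay lohan"))  # 10 + 3 + 3 + 5 + 3 = 24
```

Pages are then presented in descending score order; a paid advertiser simply gets handed a score big enough to land on top.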
Now, if you're jumping ahead in your thinking, you'll quickly come to a realization — "If I know the algorithm, I can ensure that I get the most points! Ka-ching!". No kidding, Sherlock…that's why Google doesn't tell anyone what the algorithm is. Which, going back to the scammers at the top, means that if anyone tells you they can guarantee a "first-page score", for example, they can't — they have educated guesses, but your site is still going to be ranked in comparison with other sites. And while you may think you have the best fan page ever on Lindsay Lohan, the thousands of other LiLo sites may outrank you with their content. Oh, and just an aside — Google, Yahoo, Bing, etc.? They don't all use the same algorithms. That would be too easy. 🙂
Time for a Royal Wedding
The bride and groom of SEO are keywords and content. And the secret is they eloped long before April 29th!
In the grand old days of the internet, keywords were the most important thing ever. If you got your keywords right (and embedded them in your web page as "META TAGS", hidden keywords that described what your page was about), you could get a lot of points and show up nice and high. Now there are more and more sites out there, so keywords are still important, but the META TAG alone doesn't carry the day anymore. Keywords also don't necessarily narrow things down much if your phrase is common — if one of your sets of keywords is "Lindsay Lohan", note that there are 49M hits for that phrase in Google's database. Not all of those have LiLo as a formal keyword, but it doesn't help you much if everyone else gets the same points as you do for having a LiLo reference. Still, if you provide keywords to the crawlers, they'll read those first, and generally follow your suggestion by adding them to the index they create. In other words, you're helping the index ACCURATELY reflect what's on your page, increasing the likelihood of the right people coming to your page. So, telling the spiders which keywords to use can take computerized guesswork out of the equation. And, FYI, with the volume of sites and content out there, three-word phrases tend to be more unique than more common two- or one-word phrases (for example, "Lindsay" has 110M hits, "Lindsay Lohan" has 49M hits, "Lindsay Lohan's awards" only 22 — not 22M, just 22…obviously that number would be higher if you included traffic citations as an award!!). A small related trick is to put a sitemap on your site — like keywords, it tells the spider bots how to navigate your site properly and makes their job easier (thus improving the chances of them properly indexing your content).
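To see what "providing keywords to the crawlers" looks like in practice, here's a minimal Python sketch of a spider pulling the declared keywords out of a page's META tag. The tag names are standard HTML; the page content and class name are made up for the example:

```python
from html.parser import HTMLParser

# Minimal sketch of a spider reading the keywords a page declares
# about itself in <meta name="keywords" content="...">.
class KeywordSpider(HTMLParser):
    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "keywords":
            # Keywords are declared as one comma-separated string
            self.keywords = [k.strip() for k in attrs.get("content", "").split(",")]

page = '<head><meta name="keywords" content="mystery novels, book reviews, marketing"></head>'
spider = KeywordSpider()
spider.feed(page)
print(spider.keywords)  # ['mystery novels', 'book reviews', 'marketing']
```

A real crawler does far more than this, of course, but the principle is the same: your declared keywords are the first hint it gets about what the page is about.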
I'll digress for a moment to bring up two things. First, what is missing from this story is a lot of old-style web design and how things have changed. As noted above, early on you could "goose" your rankings higher by using a LOT of keywords. Now, most of the engines will read a really long list of keywords as a sign of lack of focus rather than a help in targeting your index, potentially diluting the impact of your suggested keywords on the spiders. Yes, you need them, but don't assume that if 5 keywords are good, 300 are even better. Second, if you are running your own site or blog, there are lots of plugins that can automate your SEO somewhat…for example, I run WordPress on my site. I could add extra keywords to each and every post, but that is REALLY time-consuming — after all, I'm already adding relevant tags and categories to every post (in a sense, those ARE my keywords). So, one of the things I have is an auto-SEO plugin that adds the tags and categories as the main keywords for every post that I write and publish. I don't add 100 tags per post, I keep it manageable, and usually only add one category too. If, in addition to blog posts, you also have a few static pages where the content doesn't change very often, you can probably spend a bit more time working out exactly which keywords best describe those pages, and use those. After that, unless you have a major update, you may not need to change them. And FYI, Yahoo and Bing really put a lot of emphasis on keywords — meaning on this element alone, they might give you more "points" than Google or others.
Going back to the most important pieces, the second element (the "groom") is your actual content on your page. If all you do is throw up pictures, and very little text, then guess what? The spiders come by, they have nothing to read, and they move on. Nothing to see here, folks, and likely low relevancy in search results. You need content, and preferably original content. If all you do is repost the same things as everyone else, kind of like sharing stuff on Facebook, then the spiders will yawn at you, as there is nothing "unique" to index. Yep, your page will go in the index, but not with a lot of "oomph" behind it. One of the best examples out there for SEO on content is the Huffington Post. Ignoring their actual content, their approach to generating that content is hugely successful — lots of people writing short little articles highlighting the best and the latest news, often repackaged from elsewhere but appearing unique, and they use the top search terms from Google as their keywords. If Google shows that people are searching for "jennifer aniston's fight with brad", the Huffington Post is really good at quickly creating an article on that subject and using that search term as keywords. So, HuffPo often shows up as one of the top news aggregator sites in Google's results, with the bot visiting regularly to grab new content — not necessarily the "best" content, but the most "relevant".
The Rest of the Wedding Party
Now here’s the real kicker…almost all the other possibilities for getting higher rankings in search engines are derived from those first two pieces.
For example, let's look at one area — keyword density — that is slightly less important in recent years and is now even a double-edged sword. If you remember that the spiders don't rate quality, they just index content, then you want to give them a lot of similar content to index. And if you use a few words over and over (like "Marketing Mystery Novels"), then the spider thinks, "Okay, that's an important phrase" and assumes the article really is about marketing mystery novels. More so than if you used the phrases marketing cozies, marketing noir, and marketing thrillers throughout your post, as the spider doesn't know you mean the same thing. A human reader likes diversity; the spiders like repetition. However, old-school designers figured this out and started over-stuffing their pages with the same keywords (i.e. over and over and over). As a result, some search engines started to DEDUCT relevancy points, or even skip your site altogether, if they think you're just trying to game the algorithm with keyword stuffing. Repeat where it makes sense, but don't overdo it.
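If you want to see the density idea as plain arithmetic, here's a rough Python sketch. The single-digit/double-digit thresholds are rules of thumb from density tools, not any official number:

```python
# Rough keyword-density check: what percentage of the words on a page
# belong to your target phrase? (Rule of thumb only, not a Google spec.)
def keyword_density(text, phrase):
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    # Count non-overlapping-style sliding-window matches of the phrase
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits * n / len(words)

sample = ("marketing mystery novels is hard so here are some tips "
          "on marketing mystery novels for new authors today")
print(round(keyword_density(sample, "marketing mystery novels"), 1))  # 33.3
```

At 33%, that sample paragraph would be deep in stuffing territory; spread over a real-length article, the same two mentions would land comfortably in the single digits.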
Another feature of keywords is that the spiders like to see them in special or particular places and you can get the equivalent of bonus relevancy points if your site has them. For example, if you have them in:
- the page title (both actually on the page and in the hidden TITLE tag);
- headings (using the H1, H2, H3… HTML tags); and
- first few paragraphs,
then you will get more points than if it is buried in your last paragraph or just as tags at the end. And what gets you the MOST points? Having that phrase in your URL!
So, if someone searches for polywogg, my www.polywogg.ca domain name will get me a lot of "relevancy" points — for example, on the Canadian Google site, I come up first. Great for me, if someone is looking for "polywogg" specifically. But I also write book reviews — if someone searches for that, I am nowhere near the top. You can also get extra points if it is in your page name…so if you're running a website that has a webpage called www.joeauthor.com/marketing-mystery-novels.htm, then that will get you lots of extra relevancy points. Far more than if the same page was called MMN.htm (the spider doesn't know what that means). This also relates to sites that have dynamic URLs (i.e. it says www.joeauthor.com/?page=index&ui=72) — a spider doesn't get any useful info out of that, and so no points for you! (As an aside, some spiders won't even process those dynamic URLs, they'll just skip them and move on, though more and more of them do.) Heck, you can even get some extra points if you use bolding and italicizing strategically to show off some keywords throughout the page.
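A hypothetical way to picture those placement bonuses, in Python. The weights here are entirely invented; the real weighting is part of the secret algorithm:

```python
# Invented bonus weights: the same phrase is worth more in some places
# than in others. Real engines keep their actual weights secret.
BONUS = {"url": 10, "title": 8, "heading": 5, "first_paragraph": 3, "body": 1}

def placement_score(phrase, page_parts):
    """page_parts maps a location name ('title', 'body', ...) to its text."""
    phrase = phrase.lower()
    return sum(weight for part, weight in BONUS.items()
               if phrase in page_parts.get(part, "").lower())

page = {"title": "Marketing Mystery Novels 101",
        "heading": "Why market at all?",
        "body": "Some tips on marketing mystery novels."}
print(placement_score("marketing mystery novels", page))  # 8 + 1 = 9
```

Same phrase, same page; move it from the last paragraph into the title or URL and the score climbs, which is exactly the effect the bullets above describe.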
One area that relates to your content is also a double-edged sword — links. In an ideal world, you would have a single link on your website to your books on Amazon’s store, while thousands of fans on thousands of sites would have links to your site. This would tell Google when it indexed the web that, “Hey, a lot of people find your site really useful and are linking to it.” If so, you get more relevancy points. But some unscrupulous people out there figured out how to game the algorithm — they started setting up fake websites that had nothing but links on them. So, you come to these people and say “Hey, I want to sell Gucci purses”, and they say “Great, give me $x, and I’ll get you into the top 10 searches.” And you say, “Oh, wait, I should tell you that they’re really knock-offs.” Response? “Great, give me $x and I’ll get you into the top 10 searches.” They don’t care what you’re selling, they’re just playing with spider results.
Here’s what they do — they go to all their fake sites, and link to your page with the term “Gucci purses”. Next time the spiders go by, they notice that your site is linked to a LOT and think, “Hmmm, must be popular”. So your relevancy points would go up. Except it is like stuffing the ballot box during elections — they’re not real votes.
Macy's got dinged by Google for this recently, claiming, "Oh, we didn't know what the advertising company was doing", because their advertising company had sub-contracted SEO out to a third party, blah blah blah, and suddenly server farms were running fake sites with lots of links to Macy's. In Europe, the courts are actually starting to ding companies for this behaviour, particularly if some of the links are based on references to COMPETITORS' NAMES! And in NYC, there was a just-a-little-too-brilliant-for-his-own-good entrepreneur who had done this for his own business selling fake sunglasses. An equally enterprising reporter found him out and got him to provide details in an interview on his SEO success — he regularly showed up AHEAD of the actual companies he was pirating! Google read the news article and basically said, "Bad dog, no search results for you!", and the next day his ranking went from first or second to 20 pages down. Play fair or get dinged…Google controls the algorithm — if they think you're playing games, they'll penalize you fast. Small sites probably don't have to worry, but why risk it?
So, coming back to SEO, what does Google do with the legitimate LINKING info? They basically look at:
- a combination of the number of inbound links (popularity i.e. how many good sites link to you — scammers linking to you is NOT helpful!);
- number of outbound links (how many good sites you link to, or if your site is nothing but links);
- the ratio between them (i.e. are you a content originator or are you just copying what other people do?); and,
- what the HTML code that actually forms the link looks like (or rather, what the link text actually says — preferably it says your keyword, though you won't be happy if, like bad tags on Amazon, someone links to you with false text!).
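Those four bullets boil down to simple counting over the link graph. A toy Python version over a made-up set of links (all the site names are invented):

```python
# Toy link analysis over a made-up "web": each entry is
# (linking site, linked-to site, anchor text).
links = [
    ("fansite1.com", "joeauthor.com", "marketing mystery novels"),
    ("fansite2.com", "joeauthor.com", "great mystery marketing tips"),
    ("joeauthor.com", "amazon.com", "buy my book"),
]

def link_stats(site):
    inbound = [l for l in links if l[1] == site]    # popularity signal
    outbound = [l for l in links if l[0] == site]   # link farm, or originator?
    return {"inbound": len(inbound),
            "outbound": len(outbound),
            "anchor_texts": [text for _, _, text in inbound]}

print(link_stats("joeauthor.com"))
```

Real engines weight link *quality* heavily, not just counts — which is exactly why the fake-link-farm trick described above eventually backfires.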
Continuing on the content front, updated material will also attract spiders. There are lots of technical explanations for this, but the easiest way to think about it is a lot like any visitor coming to your site. If the spider comes the first time, looks at everything, and goes away with an index, great — that is what it was supposed to do. If, however, it comes back in a month and nothing has changed, it isn't going to schedule you for a follow-up right away. On the other hand, if your site is DRASTICALLY different, it might bump you up to another visit in three weeks, then two, then one, then daily, then hourly (like the Huffington Post). So, when the question comes, "website or blog?", the real SEO answer is, "Yes, both, either one — just make sure you have some dynamic content somewhere, even if it is just a news page, so that the spiders come to visit." Equally though, don't delete your old content unless it is really no longer relevant — Google's spiders need something to read, they don't like empty sites. And, just for fun? Google gives extra points the longer your site has been around.
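That revisit pattern is essentially adaptive back-off, and can be sketched in a few lines of Python. The intervals and bounds are invented; real crawlers use far more signals than a simple "changed or not":

```python
# Adaptive revisit scheduling sketch: a changed site gets crawled
# sooner, an unchanged site gets crawled later. Numbers are invented.
def next_interval(days, changed, min_days=1, max_days=60):
    if changed:
        return max(min_days, days // 2)   # come back sooner
    return min(max_days, days * 2)        # back off

interval = 30
for changed in (True, True, False):
    interval = next_interval(interval, changed)
    print(interval)  # 15, then 7, then 14
```

Keep changing and the interval shrinks toward daily visits; go quiet and it stretches back out toward the maximum.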
What the heck do you do with this info?
Basically, you can ensure you use keywords and dynamic, rich content on your site. After that, there are tools out on the web in various places to "test" your setup. If you go on Google and search for a few key phrases (see below), you'll find — surprise! — the most relevant ones that Google sees in its database:
- "keyword suggestions" — usually you will enter some keywords that you plan to use, and it will suggest others you might want to think about…this isn't a lot different than peeking at Amazon's top tags;
- "keyword density tool" — this is to double-check you didn't go overboard on the keyword stuffing…it will come up as a percentage figure, where single digits (around 5%) are generally okay, but double digits (10%+) will actually hurt;
- search through web directories (different from search engines, these are more like manually created indexes of sites by category) like http://dmoz.org or http://dir.yahoo.com to see if you can find some quality sites that you might be able to entice to link to you, and while you're there, add your own site to the directory…and don't forget to have friends with quality websites link to you as well; and,
- look at the links on other sites to see how they link to you (often your stats engine or comment page tracks where the links come from).
Some things you might want to avoid that may be great for humans, not so great for spiders:
- using image buttons for navigation;
- using trailers or pictures to convey content (for example, a series of picture “slides” don’t get indexed) without having a text option too;
- using “frames” in your design, as the spiders can’t handle it well, although few designers use this anymore — most websites “create” a full page instead; and,
- using javascript or dynamic coding in your HTML pages (there are tricks to tell the spiders to ignore these things, but that is a lot more advanced).
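To tie that list together, here's a toy Python spider showing *why* those items are invisible: it keeps only the plain text and throws away scripts and images entirely (a big simplification of real crawlers, but the gist is the same):

```python
from html.parser import HTMLParser

# Toy text extractor: roughly what a text-only spider "sees".
# Script contents and images contribute nothing to the index.
class TextSpider(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.text.append(data.strip())

spider = TextSpider()
spider.feed('<p>Welcome readers!</p><script>fancyMenu();</script>'
            '<img src="trailer.png"><p>New book out now.</p>')
print(spider.text)  # ['Welcome readers!', 'New book out now.']
```

The fancy menu script and the trailer image vanish completely; only the two text paragraphs make it into the index. That is the whole argument for always having a text alternative.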
I hope that is helpful…I freely admit that I mainly did this research for myself, trying to figure out what would work and what wouldn’t. But if others can pull from it and use it, more power to you! And if I missed some obvious things, happy to be let in on the secrets…