|Learn about HTML TAGS - a precursor|
This page is a precursor to all my other HTML lesson pages, for people intending to learn web page design (tags, tag attributes, etc.).
As with all of my free tutorial pages, even if you think you have seen this material on tags and HTML elsewhere, read it all anyway, as I include comments and experiences that are often overlooked.
A listing of Common Basic Tags
can be accessed by returning to the HTML Code Lab. See separate
tutorials for a detailed list of HTML 3.2 Tags, Forms,
Tables, Image Maps, Frames.
|Tags and Browsers|
An HTML page is created as a simple text file (indeed, a published survey showed that the program most used for page creation and editing, at about 50%+, is "MS Windows Notepad", which is similar to the Macintosh "Simple Text" text editor).
When a browser requests a page from a server, the server sends the text document to the viewer's computer, where it is "parsed" by the browser. In other words, the browser reads the text file and looks for familiar instructions called TAGS (per inbuilt references), which it then uses to lay out the page and to know where to obtain any objects that are to appear on it.
All tags must be enclosed in left and right arrows (<>) and have to be spelled correctly for the browser to recognise them. Otherwise the incorrect code could appear on the page with the normal text and graphics. When referring to tags, we assume the arrows are included. The tag text can be upper or lowercase.
Many tags have attributes that can be included within the tag, e.g. <p> can also appear as <p align="center">, and many also require a closing tag, which is a copy of the tag without the attributes but with the "/" character, e.g. <p align="center"> ...... </p>. The only spaces allowed within a tag are the separators between attributes. Attribute values should be enclosed in double quotes.
Not all tags have to combine with a closing tag (A few tags will not even work with a closing tag).
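As a small illustration of the points above, the following sketch shows a tag with an attribute and its closing tag, next to two tags that take no closing tag. The text content is made up for the example.

```html
<!-- A paragraph tag with one attribute, followed by its closing tag -->
<p align="center">This paragraph is centred on the page.</p>

<!-- Tags that take no closing tag: a line break and a horizontal rule -->
First line<br>
Second line
<hr>
```

Note that the attribute value is spelled "center"; a browser will not recognise "centre".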
Tags may include attributes, although most are not mandatory. It takes a lot of study and regular use to learn them all, but a printed reference list should overcome the problem.
Tags must be nested correctly. Refer to the basic tags shown below.
<html> and </html> appear at the top and bottom.
<head> and </head> appear within the <html> tags, as do the <body> and </body> tags.
The <head> and <body> tags also have other tags that open and close within them.
This is called nesting and the patterns must be strictly adhered to.
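The nesting pattern described above can be sketched as a minimal page. The title and body text here are placeholders, not taken from any real page.

```html
<html>
  <head>
    <!-- Tags that describe the page open and close inside <head> -->
    <title>My First Page</title>
  </head>
  <body>
    <!-- Tags that lay out the visible page open and close inside <body> -->
    <h1>Welcome</h1>
    <p>Some <b>bold</b> text inside a paragraph.</p>
  </body>
</html>
```

Each tag closes before the tag that contains it closes: </b> comes before </p>, and </body> comes before </html>.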
Even if a WYSIWYG editor is used, we often have to decipher our own, and others', code, or alter it for capabilities not included in the editor. (Using a WYSIWYG editor cannot be considered an excuse for not learning how to read and write HTML code once we venture into complex page layout and objects.)
Suggestion: When surfing the web, view the source code from within the browser (usually via the View menu). This will speed up your learning and occasionally introduce a few tricks.
|REMARKS (the 'Comment' option)|
As with most high-level languages, the author can include "Comments" if there is a need to make remarks about the code etc. Comments are ignored and will not appear on the page.
1. this is hobbledygook
2. <this is hobbledygook>
3. <!-- this is hobbledygook -->
Above are three numbered lines of code. The first will be parsed as default text and will appear on the page. The second line is enclosed in arrows (like a tag), but the tag is unrecognisable, so the whole line will be ignored. The third line uses the standard "Comment" notation, with an exclamation mark as the first character after the left arrow. Note that everything between the left and right arrows will be ignored.
The last lines show how a comment tag can be used to completely hide other objects. The arrows do not have to appear on the same line, but they do have to be there, and nested correctly if they are within
Try this code in a text editor (without the line numbers). Save and load the results into a browser. Delete the comment tag arrows and view again - experiment.
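To experiment with hiding objects, a sketch along the following lines could be used. The paragraph text and the image file name "picture.gif" are hypothetical.

```html
<p>This text appears on the page.</p>
<!--
<p>This paragraph is hidden from the page.</p>
<img src="picture.gif" alt="A hidden picture">
-->
<p>This text also appears on the page.</p>
```

Deleting the two comment lines (the opening "<!--" and the closing "-->") should make the hidden paragraph and image appear.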
Because of broad acceptance, the <font>...</font> tag has been retained as an HTML standard.
Attributes for this tag include color="#RRGGBB", size="?" and
The face= option can be used to add a little variety to a web page. The default browser font on most computers is a serifed font (Times style), which can become quite boring if used on many pages.
If a sans serif font (no serifs) is selected it is usually Arial, because this font is installed with MS Windows. However, on Macintosh computers that do not have Arial, all the carefully laid out text would appear, and lay out, very differently.
To overcome this, recent browsers can read multiple 'face=' values and search the surfer's computer for any of the fonts listed. For example, if Arial is not available, a Mac will probably have Helvetica (default install) or at least a system font called Geneva. If none of these names is available then the browser will again default to a serifed style, but at least you have improved the odds markedly.
(The major W3C members are working on ways to allow the use of many font styles in the future)
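A font list of the kind described above might look like the following sketch. The size and colour values are arbitrary choices for the example.

```html
<!-- The browser tries Arial first, then Helvetica, then Geneva -->
<font face="Arial, Helvetica, Geneva" size="3" color="#000080">
  This text is shown in the first listed font that the browser finds installed.
</font>
```

The names are separated by commas inside a single pair of quotes, in order of preference.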
|Tags outside the <body>|
A text file that only includes the <body></body> tag, and objects within it, will work with most modern browsers. However, tags that appear outside <body> can be very important, and we will look at some of them so that you can understand why they appear in HTML code.
It is always good policy to include the <html> tag at the top (and bottom) of the code. This tag tells the browser that all of the included information is indeed an HTML described page. A few older browsers require this statement.
The <head> tag tells the browser, and the reader, that all of the information included has a special meaning separate from the described page itself. Tags nested within can include <meta> (various attributes) and <title> (very important) - see below. Other tags found within the <head></head> tag are BASE, LINK, RANGE, STYLE, and ISINDEX (only STYLE and BASE are used very much today).
There is no single definition for the <meta> tag, although it is a recognised tag. I have listed just a few common ones above and others will be found as you view the source code of sites that you visit. Mostly, Meta Tags contain information for other servers to access.
example: My Home page includes one especially for my RSACi site rating that can be accessed or read by their server. Otherwise it means nothing.
The majority of Internet Service Providers do not have access to the WWW backbone (direct connection to other countries' telephone systems), so they use the services of another ISP that does, but this usually results in a slower service. To speed things up and use less line time (bandwidth), the major ISP often caches all the files from the secondary ISP, and these cached copies are what are supplied to the surfers. To make this work, and to try to force any updates to be saved in the cache, many pages supply a date when the current page will expire and need to be refreshed. The major ISP then does not have to connect to the other server all the time, only on the expiry dates (in principle anyway!).
A simple <meta> command (shown above) sets this up.
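The command referred to is not reproduced in this text. As an assumption of what such a tag commonly looks like, an expiry date can be written with the http-equiv form of <meta>. The date below is purely hypothetical, and whether a given cache honours it is up to that cache.

```html
<head>
  <!-- Hypothetical expiry date: caches may treat the page as stale after this time -->
  <meta http-equiv="Expires" content="Sat, 31 Jan 1998 12:00:00 GMT">
</head>
```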
<meta name="description" content="???">
<meta name="keywords" content="???, ?????, ????">
Two very important Meta Tags use name="description" and name="keywords" (above). Part of a Web Publisher's job, besides designing (the fun part), is posting the site/page information into the major Search Engine databases in the hope of surfers finding the site by 'searching'. The "description" content should appear on the 'search results' page and hopefully will attract the interest of the person performing the search.
But the more important of these two Meta Tags is "keywords", whose content will hopefully match the searching surfer's input. Multiple "keywords" content words are allowed, and sometimes a little imagination has to be applied.
My 'HTML Links' page lists an excellent site that checks nominated URLs and makes suggestions for improving the search chances. Realise that pages are best 'manually' submitted to the databases. Once submitted, the search engine's "robot" will access your site and trace all pages via your hyperlinks. One cannot totally rely on the 'Search Robots' to find and categorise site pages, and for some search engines it may take several months to get to you! Every major search engine works in a different way, with different priorities and indexing methods. To prepare a page for searching, you need to do a lot of studying on the subject; news groups and regular newsletters are a good source.
The <title> tag serves three important purposes.
1. Firstly, the browser will display the text entered between the opening and closing tags on the main bar at the top of the browser window.
2. Secondly, this same text is what surfers will see listed if they Bookmark the page or save the URL as a Favourite.
3. Thirdly, a successful 'Search Results' page will display the text as a title above (or next to) the "description-content" information described above.
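Putting the <title> tag together with the "description" and "keywords" Meta Tags, a page head might be sketched as follows. All of the text values are invented for illustration.

```html
<head>
  <!-- Shown on the browser's title bar, in Bookmarks/Favourites and on search results -->
  <title>Garden Sheds - Plans and Building Tips</title>
  <!-- Short summary that a search engine may show under the title -->
  <meta name="description" content="Free plans and step by step tips for building a garden shed.">
  <!-- Words a searching surfer might type, separated by commas -->
  <meta name="keywords" content="garden shed, shed plans, woodwork, DIY">
</head>
```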
|Changing Page Names ? What you MUST do!|
Wanting to change the names of any pages after you have successfully added them to the search engines presents quite a dilemma. You must leave your old page name installed for four or five months at least. And the old trick of duplicating pages with different names to try to increase the search engine ratings is fast coming to a disastrous end too; in fact you face the possibility of being permanently banned from some of the major search engines until you change your
A sad but understandable fact because of the "desperates" that try to Spam the engines with repeated multiple submissions. And it has happened to innocent webmasters too!
The answer is simple and takes little work. As an example, I wanted to change many page name extensions over to .shtml so I could place Server Side Include (SSI) cgi tags into those pages.
The following code was used to replace the contents of the original files AFTER I had duplicated them with the new name extensions. The major search engine robots accept and will follow the redirect Meta tag, <meta HTTP-EQUIV="Refresh" CONTENT=".....>, so any clicks from a search page list on my old page name will automatically redirect the browser to the new page; the Search Engine is happy, hopefully the visitor understands and is happy, and so am I!
The CONTENT= attribute includes the number of seconds until the new page is called (1) and the URL of that new page. Note that you can use relative paths, e.g. '../newpage.shtml', and that within the quotes (") the wait time and URL are separated by a semicolon plus a space (; ). The separate 'Frames' tutorial includes active "meta refresh" tags used for a very different purpose.
It is also wise to include a hyperlink on this page because there are still many browsers that do not handle auto redirects. Naturally you need to replace the page name with your own in each case.
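The author's original redirect code is not reproduced in this text. A minimal sketch of such a replacement page, assuming a new page called "newpage.shtml" (a placeholder name), could look like this:

```html
<html>
<head>
  <title>This page has moved</title>
  <!-- Wait 1 second, then load the new page; "newpage.shtml" is a placeholder -->
  <meta http-equiv="Refresh" content="1; URL=newpage.shtml">
</head>
<body>
  <!-- Fallback hyperlink for browsers that do not handle auto redirects -->
  <p>This page has moved. If your browser does not redirect you automatically,
  please <a href="newpage.shtml">follow this link to the new page</a>.</p>
</body>
</html>
```

Both the refresh URL and the hyperlink need to point at the same new page name.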
comment: The major search engine robots do not like frames and refuse to trace links from master Frame Sets. The "Search Engine Preparation" page explains the proven basics that will get you well into the search engine indexes, including how to get around the frames problem.
|The Important DUMMY PAGE ?? or how do you get a page when you don't ask for it!|
When you first set up a site, your Hosting service will (should) advise you that the main or 'Home' page, in your root directory, should be named index.html or default.html etc. If the home page is named anything other than what the server is set up for, then when a visitor arrives at the site using the domain URL and not the full URL, they will be handed all the site files on a platter, so to speak, with an FTP style listing of all the site's file and directory names. Then they can just click on any file and download it, even if you have not actually published the files for viewing by hyperlinks.
a FULL URL that includes a source file name:
Therefore, the server is set up to automatically send the default file, if it exists, when a full URL is not used. That sounds logical and most web site owners know this. BUT what is often overlooked is the placement of dummy 'index.htm' pages in ALL directories other than the root directory, to hide any unused or perhaps personal pages AND images!
To help you understand this I have set up
such a directory that you can access, via a new window, by using this "short
Because I use my cgi site counter on most pages (you can get it free in the "Scripts Lab"), I want my home page name separated from the dummy pages. That is why I have suggested index.html and index.htm in my comments above - same name, different extension.
All you have to do is create a simple response page, named as required by your server, and copy it into every other directory you either have or create in the future. If you have not realised it yet, a further option is to include the Meta Refresh tag in the dummy page, as explained in the topic above, making the refresh URL point to the primary page in the directory incorrectly accessed. Simple but very effective!
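A dummy page of this kind can be very small. The sketch below assumes it is saved as index.htm in a sub-directory and that the home page sits one level up as index.html; adjust both names to suit your own server.

```html
<html>
<head>
  <title>Nothing to see here</title>
</head>
<body>
  <!-- Shown in place of the server's directory listing -->
  <p>There is no page to display in this directory.</p>
  <p><a href="../index.html">Return to the home page</a></p>
</body>
</html>
```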
|Which HTML Version ? The Nightmare|
Buying a WYSIWYG HTML Editor late in 1997 will initially ease the task of web page design considerably. There are many self-proclaimed "Purists" around who will not touch them with a forty foot pole and insist that the only way to create web pages is by typing out all of the text by hand. In between, there are some excellent software packages that make life a little easier by enabling the placement of tags and attributes by clicking on a supplied list, and generally automating the text entry method.
However, I believe the hardest decision is one where we have to take into account not only which browser we should support (the surfers' browsers, not our own) but which version of the HTML language, and then whose version of a version. Confused? I will try to explain.
Currently under review is HTML version 4 produced by the W3C (use the HTML Links page to access their site). Over the years the various versions came along, but just a few years ago someone designed our first WYSIWYG browser that was not text-only. Suddenly everyone wanted one, and a system of information transmission that had been around for boffins, scientists and professors for many years seemed to appear overnight. Driven by the sudden success, a race was on to make lots of money supplying the world with new Browsers. Bill Gates made one of his rare 'big mistakes' and brushed the whole idea off as a fad, but Netscape correctly went for it.
When Netscape finally produced a good/useful browser, HTML version 3.0 was being mooted and was eventually proposed as the next standard. Netscape responded by producing a browser that would accept HTML 3.0 and pushed it onto the market before HTML 3.0 was ratified. Overnight success, except that version 3.0 failed, was rejected by the W3C and a new proposal called HTML 3.2 emerged.
Microsoft were behind and did not rush into using a proposed protocol, so their equivalent to Netscape complied with the accepted HTML version. Unfortunately perhaps, both major software producers added a few tags of their own to HTML, and Netscape had created the need for "Plugins", which has produced a whole new industry. Although Internet Explorer is recognised as the better and often faster browser (better cache access etc.), Netscape still maintains the larger market share.
From time to time I insert an up-to-date copy of my log analyser's "Browser and O/S" report page. You can view it here and return by your Back Arrow.
So which browser, and what browser version, do we assume the world will view our pages with? A browser that accepts most of HTML 2.0, or 'maverick HTML 3.0', or HTML 3.2? A site may not be very successful if arrogance wins out and we use all the whizz-bang toys, stopping many millions of people around the world from viewing our pages properly. Examples are attributes and toys that allow text to scroll across the screen on one browser (MS I.E.) but not another, that make text blink on and off, or that perhaps disastrously lay out Tables very differently between browsers because Netscape added a few tricks of their own. The use of Frames is quite common, but try viewing Borderless Frames on Netscape 2.2, or viewing a design on MS I.E. that includes unfamiliar table attributes.
The better web HTML tutorials list tags with version implementation notes plus Netscape-only warnings.
It is acknowledged that there are many millions of surfers who cannot use the latest browsers or just don't want to bother updating because they use the Internet for fast information access, and have no interest in the pretty toys. If the pages you create are for a business (commercial), then they need to be viewed by all to be successful. Many web designers create beautiful sites adhering only to HTML 2 specs.
(Finally, "Alt=" is an attribute that can be added to a bitmap <img src=.....> tag. It allows the inclusion of a short description that appears when a browser cannot display bitmaps. Surveys have found that a very large number of people do, and want to, surf the web with images switched off.)
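As a brief sketch of the alt attribute in use, the image file name, description and dimensions below are invented for the example.

```html
<!-- The alt text is displayed when the browser cannot, or is set not to, show the image -->
<img src="logo.gif" alt="Company logo" width="120" height="60">
```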
|pages Designed & Published - Ron F Woolley|
|©1997 '98. Last Revised: Friday, 31 October 2003 22:04|