www.dtp-aus.com


Learn about HTML TAGS - a precursor
 
This page is supplied as a precursor to all my other HTML lesson pages for people intending to learn web page design (tags and tag attributes, etc).

As with all of my free tutorial pages, if you think you have seen it elsewhere about tags and html in general - read it all anyway, as I might include comments and experiences that are often overlooked.

A listing of Common Basic Tags can be accessed by returning to the HTML Code Lab. See separate tutorials for a detailed list of HTML 3.2 Tags, Forms, Tables, Image Maps, Frames.
 

 Tags and Browsers
An HTML page is created as a simple text file (indeed a published survey showed that the program most used for page creation and editing - about 50%+ - is "MS Windows Notepad" which is similar to the Macintosh "Simple Text" text editor).

When a browser requests a page from a server, the server sends the text document to the viewer's computer, where it is "parsed" by the browser. In other words, the browser reads the text file and looks for familiar instructions called TAGS (per inbuilt references), which it then uses to lay out the page and to find out where to obtain any objects that are to appear on it.
  

<HTML>
<HEAD>
<META>
<TITLE>
<BODY>
<H6>
<HR>
<STRONG>
<EM>
etc.
All tags must be enclosed in left and right arrows (<>) and have to be spelled correctly for the browser to recognise them. Otherwise the incorrect code could appear on the page with the normal text and graphics. When referring to tags, we assume the arrows are included. The text can be upper or lowercase.

Many tags have multiple attributes that can be included within the tag, e.g. <p> can also appear as <p align="center">. Many also require a closing tag, which is a copy of the tag without the attributes but including the "/" character, e.g. <p align="center"> ...... </p>. Note that attribute values use the American spelling "center". The only spaces allowed within a tag are the separators between attributes. Attribute values should be enclosed in double quotes.

Not all tags have to combine with a closing tag (A few tags will not even work with a closing tag).
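
As a small illustration of the points above, the following sketch shows a tag that carries an attribute and needs a closing tag, next to two tags that are used on their own. The text content is made up for the example.

```html
<!-- <p> takes an attribute and is closed with </p> -->
<p align="center">This paragraph is centred on the page.</p>

<!-- <hr> and <br> stand alone and have no closing tag -->
<hr>
First line<br>
Second line
```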

<BODY
 bgcolor="#ffffff"
 background="Pic?"
 text="#000000"
 link="#0000ff"
 vlink="#800080"
 alink="#ff0000"
>
Tags may include attributes although most are not mandatory. It takes a lot of study and regular use to learn them all, but a printed reference list should overcome the problem.

The examples shown above are attributes that can be included in the <body> tag. In order, they are: the background colour of the whole page; the name of a background image tiled over the whole page; the colour of all body text (unless otherwise coloured); the colour of hyperlinked text; the colour of hyperlinks that have been visited; and the colour of an active hyperlink as the viewer clicks on it. An onload="???" attribute can also be used to auto-start included JavaScript code.
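
A filled-in version might look like the sketch below. The image name backdrop.gif and the function name startUp() are hypothetical placeholders, not names from this site; startUp() would have to be defined in a <script> section elsewhere on the page.

```html
<!-- backdrop.gif and startUp() are hypothetical placeholders -->
<body
 bgcolor="#ffffff"
 background="backdrop.gif"
 text="#000000"
 link="#0000ff"
 vlink="#800080"
 alink="#ff0000"
 onload="startUp()"
>
<p>Page content goes here.</p>
</body>
```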


 NESTING!
Tags must be nested correctly. Refer to the basic tags shown below.
  
<HTML>
  <HEAD>
    <META>
    <TITLE>...</TITLE>
  </HEAD>

  <BODY>
    <H6>....</H6>
    <HR>
  </BODY>
</HTML>
• <html> and </html> appear at the top and bottom.
• <head> and </head> appear within the <html> tags, as do the <body> and </body> tags.
• The <head> and <body> tags also have other tags that open and close within them.

This is called nesting and the patterns must be strictly adhered to.
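
The same rule applies to tags inside the body. A minimal sketch, using made-up text, of one correctly nested pair and one incorrectly nested pair:

```html
<!-- Correct: <em> opens inside <strong> and closes before it -->
<p><strong>Very <em>important</em> text</strong></p>

<!-- Incorrect: the closing tags overlap, so <em> is still open
     when <strong> closes. Browsers may guess at what was meant,
     but the result is not reliable. -->
<p><strong>Very <em>important</strong> text</em></p>
```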

Even if a WYSIWYG editor is used, we often have to decipher our own code, and that of others, or alter it for capabilities not included in the editor used. (Using a WYSIWYG editor cannot be considered an excuse for not learning how to read and write HTML code once we venture into complex page layout and objects.)

Suggestion: When surfing the web, view the source code from within the browser (usually via the View menu). This will speed up your learning and occasionally introduce a few tricks.


 REMARKS (the 'Comment' option)
As with most high level languages, the author can include "Comments" if there is a need to make remarks about the code etc. Comments are ignored and will not appear on the page.
  
1. this is hobbledygook
2. <this is hobbledygook>
3. <!-- this is hobbledygook -->
Above are three numbered lines of code. The first will be parsed as default text and will appear on the page. The second line is enclosed in arrows (a tag), but the tag is unrecognisable, so the whole line will be ignored. The third line uses the standard "Comment" notation, which opens with an exclamation mark and two hyphens (<!--) and closes with two hyphens (-->). Note that everything between the opening and closing arrows will be ignored.
<!--
Simple Text
<hr>
<h2>A Heading</h2>
-->
The last lines show how a comment tag can be used to completely hide other objects. The opening and closing arrows do not have to appear on the same line, but they do have to be there, and nested correctly if they are within other tags.

Try this code in a text editor (without the line numbers). Save and load the results into a browser. Delete the comment tag arrows and view again - experiment.


 Naming FONTS
Because of broad acceptance, the <font>...</font> tag has been retained as an HTML standard. Attributes for this tag include color="#RRGGBB", size="?" and face="fontname".
  
<font
  color="#FF0000"
  size="5"
  face="Arial">
</font>
The face= option can be used to add a little variety to a web page. The default browser font on most computers is a Serifed font (Times style) and can become quite boring if used on many pages.

If a Sans Serif font (no serifs) is selected it is usually Arial, because this font is installed with MS Windows. However, what about the Macintosh computers that do not have Arial? All the carefully laid out text would appear and lay out very differently.

<font
face="Arial,Helvetica,Geneva">
</font>
To overcome this, recent browsers can read multiple 'face=' values and search the surfer's computer for any of the fonts listed. For example, if Arial is not available, then a Mac will probably have Helvetica (default install) or at least a system font called Geneva. If none of these names is available then the browser will again default to a serifed style, but at least you have improved the odds markedly.

(The major W3C members are working on ways to allow the use of many font styles in the future)


 Tags outside the <body>
A text file that only includes the <body></body> tag, and objects within it, will work with most modern browsers. However, tags that appear outside <body> can be very important and we will look at some of them so that you can understand why they appear in html code.
  
<html>....</html>
It is always good policy to include the <html> tag at the top (and its closing tag at the bottom) of the code. This tag tells the browser that all of the included information is indeed an HTML described page. A few older browsers require this statement.
<head>....</head>
The <head> tag tells the browser, and the reader, that all of the information included has a special meaning separate to the described page itself. Tags nested within can include <meta> (various attributes) and <title> (very important) - see below. Other tags found within the <head></head> tag are BASE, LINK, RANGE, STYLE, and ISINDEX (only style and base are used very much today).
<meta>

<meta http-equiv="Expires" content="when">
notifies caching servers when to refresh the page
<meta name="Author" content="who">
use obvious - copyright or development team member info
<meta http-equiv="Pragma" content="no-cache">
notifies browsers and servers not to save in cache, and always access original file
<meta http-equiv="Refresh" content="4; URL=link.url">
content = seconds - refreshes the page with URL (active samples in the Frames tutorial)

There is no single definition for the <meta> tag although it is a recognised tag. I have listed just a few common ones above and others will be found as you view the source code of sites that you visit. Mostly, Meta Tags contain information for other servers to access.

example: My Home page includes one especially for my RSACi site rating that can be accessed or read by their server. Otherwise it means nothing.

 
The majority of Internet Service Providers do not have access to the WWW backbone (direct connection to other countries' telephone systems), so they use the services of another ISP that does, but this usually results in a slower service. To speed things up and use less line time (bandwidth), the major ISP often caches all the files from the secondary ISP, and these cached copies are what are supplied to the surfers. To make this work, and to try to force any updates to be saved in the cache, many pages supply a date when the current page will expire and need to be refreshed. The major ISP then does not have to connect to the other server all the time, only on the expiry dates (in principle anyway!).

A simple <meta> command (shown above) sets this up.
(you must re-publish such pages with the next expiry date just before this happens otherwise any changes will not be available and your pages may be deleted from the cache - check with your service provider if you do not know if their server is cached by a primary server or not)
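
As a sketch of what the "when" value can look like: the expiry date is normally written in the same format as dates in HTTP headers, in Greenwich Mean Time. The date below is only a placeholder chosen for the example.

```html
<!-- Placeholder date: replace with the real expiry date of your page -->
<meta http-equiv="Expires" content="Mon, 01 Dec 1997 00:00:00 GMT">
```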
  

<meta name="description" content="???">

<meta name="keywords" content="???, ?????, ????">

Two very important Meta Tags use the names "description" and "keywords" (above). Part of a Web Publisher's job, besides designing (the fun part), is posting the site/page information into the major Search Engine databases in the hope of surfers finding the site by 'searching'. The "description content" should appear on the 'search results' page and hopefully will attract the interest of the person performing the search.
  
But the most important of these two Meta Tags uses the "keywords" option that will hopefully match the 'searching surfers' input. Multiple "keywords-content" words are allowed and sometimes a little imagination has to be applied.
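
For illustration only, here is how the two tags might be filled in for an imaginary page about wombats. The wording and the keywords are invented for the example.

```html
<!-- Hypothetical values for an imaginary page -->
<meta name="description" content="A beginner's guide to keeping wombats, with photographs and feeding tips.">
<meta name="keywords" content="wombat, wombats, marsupial, australian animals, wombat care">
```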

My 'HTML Links' page lists an excellent site that checks nominated URLs and makes suggestions for improving the search chances. Realise that pages are best 'manually' submitted to the databases. Once submitted, the search engine's "robot" will access your site and trace all pages via your hyperlinks. One cannot totally rely on the 'Search Robots' to find and categorise site pages, and for some search engines it may take several months to get to you! Every major search engine works in a different way, with different priorities and indexing methods. To prepare a page for searching, you need to do a lot of studying on the subject; news groups and regular newsletters are a good source.
  

<title>....</title>

The <title> tag serves three important purposes.

1. Firstly the browser will display the text entered between the opening and closing tag, on the main bar at the top of the browser window.
2. Secondly, this same text is what surfers will see listed if they Bookmark the page or save the URL as a Favourite.
3. Thirdly, a successful 'Search Results' page will display the text as a title above (or next to) the "description-content" information described above.
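
A short sketch of where the tag sits; the title text is invented for the example.

```html
<html>
<head>
<!-- Hypothetical title text -->
<title>Keeping Wombats - A Beginner's Guide</title>
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
```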
  
<script language="???"><!--
function .......
//-->
</script>
If, and only if, the surfer's browser is JavaScript Enabled then any scripting code included between the <script> tags will be parsed, but by a separate interpreter within the browser.
When viewing and learning from other web pages, the <script> tag is often encountered. JavaScript (or JScript) coding is not a part of the HTML protocol but the <script> tag has been included in HTML for some time. Often certain script 'functions' are placed above the <body> tag, while the main code sections appear within the 'body' section.
  
At this point note the normal HTML "Remark" tags <!--....--> shown in the simple code above. Should a browser that is "JavaScript Challenged"! come across the script code, it will not recognise the <script> tag and will skip over it, and because the code between <script> and </script> sits inside an HTML comment, that code will be ignored without error. If JavaScript is enabled however, the interpreter treats the opening <!-- line as a comment of its own, and the closing --> is hidden from it by the JavaScript comment characters (//) in front of it. (A viewer could end up with a page full of script if this important option is not included.)
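
Putting those pieces together, a minimal sketch might look like this. The function name showGreeting() and its message are hypothetical, and language="JavaScript" is assumed as the value for the ??? shown above.

```html
<html>
<head>
<title>Script Hiding Example</title>
<script language="JavaScript"><!--
// Hypothetical function, defined above the <body> tag
function showGreeting() {
  alert("Welcome to the page");
}
//-->
</script>
</head>
<body onload="showGreeting()">
<p>This text appears whether or not JavaScript is available.</p>
</body>
</html>
```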


Changing Page Names ? What you MUST do!
Wanting to change the names of any pages after you have successfully added them to the search engines presents quite a dilemma. You must leave your old page name installed for four or five months at least. And the old trick of duplicating pages with different names to try and increase the search engine ratings is fast coming to a disastrous end too; in fact you face the possibility of being permanently banned from some of the major search engines until you change your domain name.

A sad but understandable fact because of the "desperates" that try to Spam the engines with repeated multiple submissions. And it has happened to innocent webmasters too!

The answer is simple and takes little work. As an example, I wanted to change many page name extensions over to .shtml so I could place Server Side Include (SSI) cgi tags into those pages.

The following code was used to replace the contents of the original files AFTER I had duplicated them with the new name extensions. The major search engine robots accept and will follow the redirect Meta tag, <meta HTTP-EQUIV="Refresh" CONTENT=".....">, so any clicks from a search page list on my old page name will automatically redirect the browser to the new page; the Search Engine is happy, hopefully the visitor understands and is happy, and so am I!

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta HTTP-EQUIV="Refresh" CONTENT="1; URL=http://dtp-aus.com/newpage.shtml">

<title>New Page Name Redirection</title>
</head>

<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080" text="#003366" alink="#FF0000">

<p align="center">
<font face="arial,helvetica,geneva">
  Redirection to <a href="http://dtp-aus.com/newpage.shtml">New Page Name</a>
</font></p>
</body>
</html>

The CONTENT= attribute includes the number of seconds until the new page is called (1) and the URL of that new page. Note that you can use relative paths, e.g. '../newpage.shtml', and that within the quotes (") the wait time and URL are separated by a semicolon plus space (; ). The separate 'Frames' tutorial includes active "meta refresh" tags used for a very different purpose.
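
A sketch of the relative-path form mentioned above. The file name newpage.shtml is taken from the example; the '../' prefix assumes the new page sits one directory above the old one.

```html
<!-- Wait 1 second, then load newpage.shtml from the parent directory -->
<meta http-equiv="Refresh" content="1; URL=../newpage.shtml">
```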

It is also wise to include a hyperlink on this page because there are still many browsers that do not handle auto redirects. Naturally you need to replace the page name with your own in each case.

comment: The major search engine robots do not like frames and refuse to trace links from master Frame Sets. The "Search Engine Preparation" page explains the proven basics that will get you well into the search engine indexes, including how to get around the frames problem.


The Important DUMMY PAGE ?? or how do you get a page when you don't ask for it!
When you first set up a site, your Hosting service will (should) advise you that the main or 'Home' page, in your root directory, should be named index.html or default.html etc. If the home page is named anything other than what the server is set up for, then when a visitor arrives at the site using the domain URL and not the full URL, they will be handed all the site files on a platter, so to speak, with an FTP style listing of all the site's file and directory names. They can then click on any file and download it, even if you have not actually published the files for viewing by hyperlinks.

• a FULL URL that includes a source file name:
http://www.dtp-aus.com/dummydir/wombats.html
• a SHORT URL that does not include a source file name:
http://www.dtp-aus.com/dummydir/

Therefore, the server is set up to automatically send the default file, if it exists, when a full URL is not used. That sounds logical and most web site owners know this. BUT what is often overlooked is the placement of dummy 'index.htm' pages in ALL directories other than the root directory, to hide any unused or perhaps personal pages AND images!

To help you understand this I have set up such a directory that you can access, via a new window, by using this "short URL":
http://www.dtp-aus.com/dummydir/.
Then you can access my main 'images' directory with another "short URL", again via a new window that you can also close to return here:
http://www.dtp-aus.com/images/.

Because I use my cgi site counter on most pages (you can get it free in the "Scripts Lab"), I want my home page name separated from the dummy pages. That is why I have suggested index.html and index.htm in my comments above - same name, different extension.

All you have to do is create a simple response page, named as required by your server, and copy it into every other directory you either have or create in the future. If you have not realised it yet, a further option is to include in the dummy page, the Meta Refresh tag as explained in the topic above, making the refresh hyperlink point to the primary page in the directory incorrectly accessed.
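
Such a dummy page could be sketched as follows. The target name mainpage.html is a hypothetical placeholder for the primary page of the directory, and the file itself would be saved under whatever default name your server expects.

```html
<html>
<head>
<!-- mainpage.html is a hypothetical name for the directory's primary page -->
<meta http-equiv="Refresh" content="1; URL=mainpage.html">
<title>Directory Redirection</title>
</head>
<body bgcolor="#FFFFFF">
<p align="center">
  This directory cannot be listed. Continue to the
  <a href="mainpage.html">main page</a>.
</p>
</body>
</html>
```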

Simple but very effective!


 Which HTML Version ? The Nightmare
Buying a WYSIWYG HTML Editor late in 1997 will initially ease the task of web page design considerably. There are many self proclaimed "Purists" around that will not touch them with a forty foot pole and insist that the only way to create web pages is by typing out all of the text by hand. In between there are some excellent software packages that make life a little easier by enabling the placement of tags and attributes by clicking on a supplied list, and generally automating the text entry method.

However, I believe the hardest decision is one where we have to take into account not only which browser we should support (the surfers' browsers, not our own) but which version of the HTML language, and then whose version of a version. Confused? I will try to explain.

Currently under review is HTML version 4 produced by the W3C (use the HTML Links page to access their site). Over the years the various versions came along, but just a few years ago someone designed our first non text-only based WYSIWYG browser. Suddenly everyone wanted one, and a system of information transmission that had been around for boffins, scientists and professors for many years seemed to appear overnight. Driven by the sudden success, a race was on to make lots of money supplying the world with new Browsers. Bill Gates made one of his rare 'big mistakes' and brushed the whole idea off as a fad, but Netscape correctly went for it.

When Netscape finally produced a good/useful browser, HTML version 3.0 was being mooted and was eventually proposed as the next standard. Netscape responded by producing a browser that would accept HTML 3.0 and pushed it onto the market before HTML 3.0 was ratified. Overnight success, except that version 3.0 failed, was rejected by the W3C and a new proposal called HTML 3.2 emerged.

Microsoft were behind and did not rush into using a proposed protocol, so their equivalent to Netscape complied with the accepted HTML version. Unfortunately perhaps, both major software producers added a few tags of their own to HTML, and Netscape had created the need for "Plugins", which has produced a whole new industry. Although Internet Explorer is recognised as the better and often faster browser (better cache access etc.), Netscape still maintains the larger market share.

From time to time I insert an up-to-date copy of my log analyser's "Browser and O/S" report page. You can view it here and return by your Back Arrow.

So which browser, and which browser version, do we assume the world will view our pages with? A browser that accepts most of HTML 2.0, or 'maverick HTML 3.0', or HTML 3.2? A site may not be very successful if arrogance wins out and we use all the whizz-bang toys, stopping many millions of people around the world from viewing our pages properly. Examples are attributes and toys that allow text to scroll across the screen on one browser (MS I.E.) but not another, that make text blink on and off, or that perhaps disastrously lay out Tables very differently between browsers because Netscape added a few tricks of their own. The use of Frames is quite common, but try viewing Borderless Frames on Netscape 2.2, or viewing a design on MS I.E. that includes unfamiliar table attributes.

The better web HTML tutorials list tags with version implementation notes plus Netscape-only warnings.

Initially use your design skills to create unique pages that include few if any tricks and toys. Although a lot of extra work, a method found on many sites is to offer 'Frames' and 'Non Frames' alternatives, 'Complex Tables' and 'No Tables' alternatives etc., and if you progress to Java or JavaScript understand that only the most recent of the major browsers can handle it.

It is acknowledged that there are many millions of surfers who cannot use the latest browsers or just don't want to bother updating because they use the Internet for fast information access, and have no interest in the pretty toys. If the pages you create are for a business (commercial), then they need to be viewed by all to be successful. Many web designers create beautiful sites adhering only to HTML 2 specs.

(Finally, "Alt=" is an attribute that can be added to a bitmap <img src=.....> tag. It allows the inclusion of a short description that appears when a browser cannot display bitmaps. Surveys have found that a very large number of people do, and want to, surf the web with images switched off.)
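
For example (the file name and description are invented, and the width and height values are assumptions about the image):

```html
<!-- Hypothetical image file and description -->
<img src="wombat.gif" alt="Photograph of a wombat beside its burrow" width="200" height="150">
```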





pages Designed & Published - Ron F Woolley
e-mail 1997 '98. Last Revised:  Friday, 31 October 2003 22:04