Concatenate HTML Files
Description
This script combines a number of HTML files into one. The beginning of the
first file (up to and including <body ...>) is used for all
the files since only their bodies are concatenated. An optional divider
followed by the label of a file is used between files.
Note that:
- the original files must conform to HTML conventions; if necessary, use
htmlfix first to correct major problems
- <body ...> and </body> must each be on a line of their own; any other
information on these lines will be lost
- in anchors, href="..." and name="..." must not be split across lines
- any material after </body> (such as HTML comments) will be lost
- the script might get confused by a symbolic directory index link or
references to files in remote directories (though it does its best)
- if you move the concatenated HTML file, remember to move any other local
files (e.g. images) to the same relative location (e.g. the same directory)
- for use with a frame-based collection of files, exclude the frameset
definition file from the list of inputs and probably start with a
contents file
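The concatenation described above can be sketched as follows. This is a minimal illustration in Python, not the script's own Perl code; the names split_html and concat_html, the divider string and the way the label is written are all assumptions made for the example. It relies on the constraint that <body ...> and </body> each sit on a line of their own.

```python
import re

# Assumption: <body ...> and </body> each occupy a whole line, as the notes require.
BODY_OPEN = re.compile(r"^\s*<body\b[^>]*>\s*$", re.IGNORECASE)
BODY_CLOSE = re.compile(r"^\s*</body\s*>\s*$", re.IGNORECASE)


def split_html(text):
    """Return (head_lines, body_lines); head runs up to and including <body ...>."""
    head, body = [], []
    state = "head"
    for line in text.splitlines():
        if state == "head":
            head.append(line)
            if BODY_OPEN.match(line):
                state = "body"
        elif state == "body":
            if BODY_CLOSE.match(line):
                state = "done"  # everything from </body> onwards is dropped
            else:
                body.append(line)
    return head, body


def concat_html(docs, divider=None):
    """docs is a list of (label, html_text) pairs; divider is an optional HTML string."""
    out = []
    for i, (label, text) in enumerate(docs):
        head, body = split_html(text)
        if i == 0:
            out.extend(head)  # only the first file contributes its beginning
        elif divider is not None:
            out.append(divider)  # hypothetical divider, then the file's label
            out.append(label)
        out.extend(body)
    out.extend(["</body>", "</html>"])
    return "\n".join(out) + "\n"
```

For instance, concat_html([("a.html", a_text), ("b.html", b_text)], divider="<hr>") keeps the head of a.html, drops the head of b.html, and discards any trailing comments after </body> in either file.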
Options
The command line options are:
-d
    print divider between concatenated files
-h
    print usage help
-o file
    name the output file (if this file also appears in the input list,
    e.g. because *.html was given, it is ignored as an input)
-s
    sort input files into case-insensitive alphabetical order (putting the
    index file first if necessary, and removing the file it points to from
    the inputs if it is a symbolic link)
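The ordering described for -s might look like the sketch below. It is written in Python for illustration only and is not taken from the script; the function name sort_inputs and the default index filename index.html are assumptions (the real index filename is configurable in the script), and paths are taken relative to the current directory.

```python
import os


def sort_inputs(names, index_name="index.html"):
    """Sort case-insensitively, then move the index file (if listed) to the front."""
    ordered = sorted(names, key=str.lower)
    if index_name in ordered:
        ordered.remove(index_name)
        if os.path.islink(index_name):
            # The index is a symbolic link: drop the file it points to,
            # so the same content is not concatenated twice.
            target = os.path.realpath(index_name)
            ordered = [n for n in ordered if os.path.realpath(n) != target]
        ordered.insert(0, index_name)
    return ordered
```

Sorting with key=str.lower keeps the original spelling of each name while ignoring case for the comparison.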
Usage
Run on one or more HTML files. Warning messages are sent to standard error.
Examples of usage are:
htmlcat -o some.html def.html res.html
    concatenate def.html and res.html to some.html
htmlcat -d -o all.html *.html
    concatenate all HTML files to all.html with dividers between them
htmlcat -s -o out.html *.html
    sort then concatenate all HTML files to out.html
htmlcat *.html > /tmp/all.html
    concatenate all HTML files to standard output (here /tmp/all.html);
    for this method, do not create the concatenated file in the same
    directory or the script will run indefinitely on its own output!
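When the output has to live in the same directory as the inputs, one way to avoid feeding the result back in is to build the input list explicitly and leave the output file out. The helper below is a hypothetical Python sketch of that idea, not part of htmlcat; the name inputs_excluding is an assumption.

```python
import os


def inputs_excluding(candidates, output):
    """Return the candidate paths that do not refer to the output file."""
    out_real = os.path.realpath(output)
    # Compare resolved paths so that "all.html" and "./all.html" count as the same file.
    return [c for c in candidates if os.path.realpath(c) != out_real]
```

The resulting list can then be passed to the script in place of a bare *.html pattern.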
The only things likely to need changing for installation are the directory
index filename and the nature of a file divider (see the customise
subroutine in the code). Change the first line of the script according to
where Perl is located. Although tested with Perl 5, the
script may work with only minor changes for Perl 4.
Licence
htmlcat is free software, distributed under the GNU General Public License
Version 2. You may re-distribute this software provided you preserve this
README file. The contents of this package may be used freely for
non-commercial purposes provided this README file and copyright notices are
retained. Copyright remains with the author. No warranties are given as to
the accuracy or suitability of this package.
History
First public version: Ken Turner <kjt@cs.stir.ac.uk>, 21st November 1998