Concatenate HTML Files
See the download page to obtain this
program
Description
This script combines a number of HTML files into one. The beginning of
the first file (up to and including <body ...>) is used
for all the files since only their bodies are concatenated. An optional
divider followed by the label of a file is used between files.
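The merging described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Perl code of htmlcat itself: the names `split_html` and `concat_bodies` are hypothetical, the divider text is whatever the caller supplies, and the sketch assumes that `<body ...>` and `</body>` each sit on a line of their own. It does not attempt the cross-reference handling that the real script performs.

```python
import re

# Assumption: each body tag occupies a whole line, as the limitations require.
BODY_OPEN = re.compile(r"^\s*<body\b[^>]*>\s*$", re.IGNORECASE)
BODY_CLOSE = re.compile(r"^\s*</body\s*>\s*$", re.IGNORECASE)


def split_html(lines):
    """Return (head, body), where head ends with the <body ...> line."""
    head, body = [], []
    state = "head"
    for line in lines:
        if state == "head":
            head.append(line)
            if BODY_OPEN.match(line):
                state = "body"
        elif state == "body":
            if BODY_CLOSE.match(line):
                state = "tail"  # anything after </body> is dropped
            else:
                body.append(line)
    return head, body


def concat_bodies(documents, divider=None):
    """documents: list of (label, text) pairs; returns one HTML string."""
    out = []
    for position, (label, text) in enumerate(documents):
        head, body = split_html(text.splitlines())
        if position == 0:
            out.extend(head)  # only the first file's beginning is kept
        elif divider is not None:
            out.append(divider)  # optional divider, then the file's label
            out.append(label)
        out.extend(body)
    out.append("</body>")
    out.append("</html>")
    return "\n".join(out) + "\n"
```

Calling `concat_bodies([("a.html", text_a), ("b.html", text_b)], divider="<hr>")` would keep the beginning of `a.html` and place `<hr>` and the label `b.html` before the second body.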
Note the following limitations. Some of these are fixable, but the author has not worked on the code for a long time.
-
Run the script from the highest level directory in which HTML files are to
be concatenated. If a parent directory of this is used, the
cross-references may be wrong.
-
The code has been developed and tested in Unix-like environments (various
flavours of Unix and Cygwin on Windows). Use on MS Windows may cause
problems as follows. Drive prefixes should be avoided for files as they
will be embedded in anchors and so will not work correctly. Using
backslashes in file names will cause problems as forward slashes are used
in the generated references to files in child directories.
-
The code relies on the calling shell to expand wildcard filenames like
'*.html'. This is automatic in a Unix shell, but does not happen
at a DOS prompt. For the latter it is therefore necessary to list files
explicitly.
-
The original files must conform to HTML conventions. If necessary use
htmlfix first to correct major
problems.
-
<body ...> and </body> must be on a
line of their own. Any other information on these lines will be lost.
-
In anchors, href="..." and
name="..." must not be split across lines.
-
Any material after "</body>" (such as HTML comments) will
be lost.
-
The script might get confused by a symbolic directory index link or
references to files in remote directories (though it does its best).
-
If the concatenated HTML file is moved, remember to move any other local
files (e.g. images) to the same relative location (e.g. the same
directory).
-
For use with a frame-based collection of files, exclude the
frameset definition file from the list of inputs and
probably start with a contents file.
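Some of the limitations above can be checked before running htmlcat. The sketch below is a hypothetical pre-flight check in Python; it is not part of htmlcat, and the function name `check_body_lines` is invented for illustration. It reports lines where `<body ...>` or `</body>` shares a line with other text, and non-blank material after `</body>` other than a closing `</html>`, on the assumption that these are the cases in which content would be lost.

```python
import re

BODY_TAG = re.compile(r"</?body\b[^>]*>", re.IGNORECASE)
HTML_CLOSE = re.compile(r"^\s*</html\s*>\s*$", re.IGNORECASE)


def check_body_lines(text):
    """Return a list of (line_number, message) warnings for one HTML file."""
    warnings = []
    after_body = False
    for number, line in enumerate(text.splitlines(), start=1):
        match = BODY_TAG.search(line)
        if match:
            # Anything besides the tag itself on this line would be lost.
            rest = (line[:match.start()] + line[match.end():]).strip()
            if rest:
                warnings.append((number, "body tag shares a line with other text"))
            if match.group(0).startswith("</"):
                after_body = True
        elif after_body and line.strip() and not HTML_CLOSE.match(line):
            warnings.append((number, "material after </body> would be lost"))
    return warnings
```

An empty result means none of these particular problems was found; it says nothing about the other limitations, such as split href or name attributes.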
Options
The command line options are:
- -d
- print divider between concatenated files
- -h
- print usage as help
- -o file
-
name the output file (if this file also appears in the input list,
e.g. because *.html was given, it is ignored as an input)
- -s
-
sort input files into case-insensitive alphabetical order (putting the
index file first if necessary, and removing the file it points to from
the inputs if it is a symbolic link)
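The ordering that -s describes can be sketched as below. This is an illustration in Python, not htmlcat's own code: the function name `sort_inputs` is hypothetical, `index.html` is an assumed directory index filename (the real one is set inside the script), and the symbolic-link handling is one interpretation of the description above.

```python
import os


def sort_inputs(files, index_name="index.html"):
    """Sort case-insensitively with the index file first.

    If the index is a symbolic link, the file it points to is removed
    so that the same content is not concatenated twice.
    """
    files = list(files)
    if index_name in files and os.path.islink(index_name):
        target = os.path.normpath(
            os.path.join(os.path.dirname(index_name), os.readlink(index_name))
        )
        files = [f for f in files if os.path.normpath(f) != target]
    ordered = sorted(files, key=str.lower)
    if index_name in ordered:
        ordered.remove(index_name)
        ordered.insert(0, index_name)  # index file always leads
    return ordered
```

Sorting on `str.lower` is a simple stand-in for case-insensitive ordering; the script's exact collation may differ.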
Usage
Run on one or more HTML files. Warning messages are sent to standard
error. Examples of usage are:
- htmlcat -o some.html def.html res.html
-
concatenate def.html and res.html to
some.html
- htmlcat -d -o all.html *.html
-
concatenate all HTML files to all.html with dividers between
them
- htmlcat -s -o out.html *.html
-
sort then concatenate all HTML files to out.html
- htmlcat *.html > /tmp/all.html
-
concatenate all HTML files to standard output (here
/tmp/all.html); for this method, do not create a
concatenated file in the same directory or the script will run
indefinitely on its own output!
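Where the shell does not expand wildcards (the DOS prompt case mentioned in the limitations), the file list can be built by a small wrapper instead. The following is a sketch in Python under stated assumptions: `build_command` is a hypothetical helper, the command name `htmlcat` is taken from the examples above, and only the -d and -o options documented here are used. It also leaves the output file out of the inputs, so the result never feeds on itself.

```python
import glob


def build_command(pattern, output, divider=False):
    """Expand a wildcard ourselves and build an htmlcat argument list."""
    # Sorted for a repeatable order; the output file is excluded from the inputs.
    inputs = sorted(f for f in glob.glob(pattern) if f != output)
    if not inputs:
        raise ValueError("no files match " + pattern)
    command = ["htmlcat"]
    if divider:
        command.append("-d")
    command += ["-o", output] + inputs
    return command
```

The returned list could be passed to `subprocess.run`. On a system where the script is not directly executable, the list would probably need to start with the Perl interpreter and the path to the script; that detail is an assumption and depends on the installation.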
The only things likely to need changing for installation are the directory
index filename and the nature of a file divider (see customise
subroutine in the code). Change the first line of the script according to
where Perl is located. Although tested with Perl5,
the script may work with only minor changes for Perl4.
Licence
htmlcat is free software, distributed under the GNU General Public
License Version 2. You may re-distribute this software provided you
preserve this README file. The contents of this package may be used
freely for non-commercial purposes provided this README file and
copyright notices are retained. Copyright remains with the author. No
warranties are given as to the accuracy or suitability of this package.
History
First public version: Ken Turner, 21st November 1998
Last Update: 13th May 2010
URL: https://www.cs.stir.ac.uk/~kjt/software/web/htmlcat.html