Fitz, at one time you expressed interest in having a script that checks the formatting of IRAF's help files. I wrote a script that helps detect and prevent bad markup. It parses tags and reports the total number of open tags, the total number of close tags, their "sum" (technically the difference, open minus close), and the maximum and minimum nesting depth. It can be run on many files at once. The script is written for gawk. It only checks one pair of tags at a time, and it doesn't tell you [i:15f3cf2d84]where[/i:15f3cf2d84] any problems it detects are. I have another script that does that, but it's not very good yet. The parsing script is fairly robust but occasionally reports false positives, for example when a filename just happens to contain the tag characters. The script doesn't "know" troff; it just looks for tags and counts them, so it is more generally useful and also works on HTML files and so on. Here is what a typical run looks like for IRAF help files:[code:1:15f3cf2d84]
jquinn@aries>gawk -f parse.awk -v starttag=.ls -v endtag=.le *.hlp
access.hlp opentags= 1 closetags= 1 sum= 0 maxdepth= 1 mindepth= 0
back.hlp opentags= 0 closetags= 0 sum= 0 maxdepth= 0 mindepth= 0
chdir.hlp opentags= 1 closetags= 0 sum= 1 maxdepth= 1 mindepth= 0
WARNING: Possibly too many open tags.
clear.hlp opentags= 0 closetags= 0 sum= 0 maxdepth= 0 mindepth= 0
cl.hlp opentags= 14 closetags= 14 sum= 0 maxdepth= 1 mindepth= 0
commands.hlp opentags= 4 closetags= 4 sum= 0 maxdepth= 1 mindepth= 0
cursors.hlp opentags= 0 closetags= 0 sum= 0 maxdepth= 0 mindepth= 0
decls.hlp opentags= 13 closetags= 13 sum= 0 maxdepth= 3 mindepth= 0
WARNING: Check tag nesting.
defpac.hlp opentags= 3 closetags= 3 sum= 0 maxdepth= 1 mindepth= 0
dparam.hlp opentags= 1 closetags= 1 sum= 0 maxdepth= 1 mindepth= 0
error.hlp opentags= 2 closetags= 1 sum= 1 maxdepth= 1 mindepth= 0
WARNING: Possibly too many open tags.
flprcache.hlp opentags= 1 closetags= 1 sum= 0 maxdepth= 1 mindepth= 0
for.hlp opentags= 4 closetags= 4 sum= 0 maxdepth= 1 mindepth= 0
fprint.hlp opentags= 4 closetags= 4 sum= 0 maxdepth= 1 mindepth= 0

Searched for starttag=".ls" and endtag=".le".
[/code:1:15f3cf2d84]Only a handful of tag pairs are needed to check most of IRAF's help files. Generally, I move into a "doc" directory and run the following four commands:[code:1:15f3cf2d84]
gawk -f parse.awk -v starttag=.help -v endtag=.endhelp *.hlp
gawk -f parse.awk -v starttag=.ls -v endtag=.le *.hlp
gawk -f parse.awk -v starttag=.nf -v endtag=.fi *.hlp
gawk -f parse.awk -v starttag=\fI -v endtag=\fR *.hlp
[/code:1:15f3cf2d84]As always in UNIX, you have to be careful about special characters in a command being interpreted by the shell. If your starttag or endtag contains any, you may need escapes and shell quoting. For this reason, the last line of the script's output tells the user exactly which strings were used for the start and end tags. The last example above has potential shell quoting problems because of the backslash. If you run into trouble, one way to protect the tags in tcsh looks like this:[code:1:15f3cf2d84]
gawk -f parse.awk -v starttag="\\fI" -v endtag="\\fR" *.hlp
[/code:1:15f3cf2d84]Jason

PS Fitz, I have another batch of spelling fixes on the way shortly. It will take about two more batches before the spelling fix project is completed. I will correct the tagging in those. I already corrected the tagging in the previous tarballs I sent to you using this script, so the IRAF half of the help files should have pretty good tag structure.

Here is the full "parse.awk" source:
[code:1:15f3cf2d84]
#gawk -f parse.awk -v starttag=value1 -v endtag=value2
#original author Jason L. Quinn
#version 1.01 04-June-2008
#----added maxfilenamesize for prettier output
#----changed output formatting slightly
#USER NOTE: getting past the shell for certain characters may require quoting and escaping.
BEGIN {
nf=(ARGC-1)
if ( nf==0 )
error("Usage: gawk -f parse.awk -v starttag=value -v endtag=value filename(s)")
if ( starttag=="" || endtag=="" )
error("ERROR: You must set the starttag and endtag variables.")
if ( starttag==endtag )
error("ERROR: starttag and endtag were identical.")
maxfilenamesize=0
#This just gets the length of the longest inputfile name for pretty output
for ( kf=1; kf<=nf; kf++ )
{
file=ARGV[kf]
if ( length(file)>maxfilenamesize )
maxfilenamesize=length(file)
}
#Read in the input files.
for ( kf=1; kf<=nf; kf++ )
{
n=0
numopen=0
numclose=0
maxdepth=0
mindepth=0
file=ARGV[kf]
while ( (getline<file) > 0 )
{
for( i=1;i<=length($0);i++ )
{
if ( substr($0,i,length(starttag))==starttag )
{
n++
numopen++
if ( n> maxdepth )
maxdepth=n
}
if ( substr($0,i,length(endtag))==endtag )
{
n--
numclose++
if ( n < mindepth )
mindepth=n
}
}
}
printf("%"maxfilenamesize"s opentags=%3g closetags=%3g sum=%3g maxdepth=%3g mindepth=%3g",file,numopen,numclose,n,maxdepth,mindepth)
if ( n<0 || n>0 || maxdepth>1 || mindepth<0 )
printf("\n%"maxfilenamesize"s WARNING: ","")
if ( n<0 )
printf "Possibly too many close tags. "
if ( n>0 )
printf "Possibly too many open tags. "
if ( maxdepth>1 || mindepth<0 )
printf "Check tag nesting."
printf "\n"
}
}
END {
printf("\nSearched for starttag=\"%s\" and endtag=\"%s\".\n",starttag,endtag)
}
function error(msg)
{
print msg
exit 1
}
[/code:1:15f3cf2d84]
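Since parse.awk only counts tags, here is a rough sketch of how the same substring scan could also report [i:15f3cf2d84]where[/i:15f3cf2d84] the depth goes wrong. This is not the other script mentioned above, just an illustration under some assumptions: the function name locate_tags is made up, it is written for a Bourne-style shell (sh or bash, not tcsh), and it sticks to plain POSIX awk features, so gawk should run it as well. It remembers the line number of each open tag on a small stack, so a tag left open at the end of a file can be traced back to the line where it started.

```shell
# locate_tags STARTTAG ENDTAG FILE...
# Hypothetical helper (not part of parse.awk): prints file:line for each
# suspicious tag instead of only counting tags.
locate_tags() {
    lt_start=$1
    lt_end=$2
    shift 2
    awk -v starttag="$lt_start" -v endtag="$lt_end" '
    # Report every open tag still on the stack when a file ends.
    function finish(name,    d) {
        for (d = 1; d <= depth; d++)
            printf "%s:%d: open tag never closed\n", name, openline[d]
    }
    # First line of each file: flush the previous file, reset the depth.
    FNR == 1 {
        if (NR > 1)
            finish(prevfile)
        depth = 0
        prevfile = FILENAME
    }
    {
        # Same character-by-character substring scan as parse.awk.
        for (i = 1; i <= length($0); i++) {
            if (substr($0, i, length(starttag)) == starttag) {
                depth++
                openline[depth] = FNR
                if (depth > 1)
                    printf "%s:%d: nested open tag (depth %d)\n", FILENAME, FNR, depth
            }
            if (substr($0, i, length(endtag)) == endtag) {
                if (depth == 0)
                    printf "%s:%d: close tag with no matching open tag\n", FILENAME, FNR
                else
                    depth--
            }
        }
    }
    END {
        if (NR > 0)
            finish(prevfile)
    }
    ' "$@"
}
```

It would be called as, for example, locate_tags .ls .le *.hlp. The same caveats about shell quoting and backslashes apply as for parse.awk, because the tags are still passed with -v. Unlike parse.awk, a stray close tag is reported and then ignored rather than driving the depth negative. The awk program between the single quotes could also be saved to its own file and run with gawk -f, the same way as parse.awk.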