 gawk script to detect problems in IRAF help files
   
Jason Quinn
 06/04/2008 04:51PM (Read 3076 times)  
Fitz, at one time you expressed interest in having a script that checks the formatting of IRAF's help files. I wrote a script that helps detect and prevent bad markup. It parses tags and reports the total number of open tags, the total number of close tags, their "sum" (technically the difference, open minus close), and the maximum and minimum nesting depth. It can be run on a bunch of files at once. The script is written for gawk. It only checks one pair of tags at a time, and it doesn't tell you [i:15f3cf2d84]where[/i:15f3cf2d84] any problems it detects are. I have another script that does that, but it's not very good yet. The parsing script is fairly robust but occasionally reports false positives, such as when a filename just happens to have the tag characters in it. The script doesn't "know" troff; it just looks for tags and counts them, so it is more generally useful and also works on HTML files and so on. Here is what a typical run looks like for IRAF help files:[code:1:15f3cf2d84]
jquinn@aries>gawk -f parse.awk -v starttag=.ls -v endtag=.le *.hlp
access.hlp opentags= 1 closetags= 1 sum= 0 maxdepth= 1 mindepth= 0
back.hlp opentags= 0 closetags= 0 sum= 0 maxdepth= 0 mindepth= 0
chdir.hlp opentags= 1 closetags= 0 sum= 1 maxdepth= 1 mindepth= 0
WARNING: Possibly too many open tags.
clear.hlp opentags= 0 closetags= 0 sum= 0 maxdepth= 0 mindepth= 0
cl.hlp opentags= 14 closetags= 14 sum= 0 maxdepth= 1 mindepth= 0
commands.hlp opentags= 4 closetags= 4 sum= 0 maxdepth= 1 mindepth= 0
cursors.hlp opentags= 0 closetags= 0 sum= 0 maxdepth= 0 mindepth= 0
decls.hlp opentags= 13 closetags= 13 sum= 0 maxdepth= 3 mindepth= 0
WARNING: Check tag nesting.
defpac.hlp opentags= 3 closetags= 3 sum= 0 maxdepth= 1 mindepth= 0
dparam.hlp opentags= 1 closetags= 1 sum= 0 maxdepth= 1 mindepth= 0
error.hlp opentags= 2 closetags= 1 sum= 1 maxdepth= 1 mindepth= 0
WARNING: Possibly too many open tags.
flprcache.hlp opentags= 1 closetags= 1 sum= 0 maxdepth= 1 mindepth= 0
for.hlp opentags= 4 closetags= 4 sum= 0 maxdepth= 1 mindepth= 0
fprint.hlp opentags= 4 closetags= 4 sum= 0 maxdepth= 1 mindepth= 0

Searched for starttag=".ls" and endtag=".le".
[/code:1:15f3cf2d84]There are only a handful of tag pairs needed to check most of IRAF's help files. Generally, I move into a "doc" directory and run the following four commands:[code:1:15f3cf2d84]
gawk -f parse.awk -v starttag=.help -v endtag=.endhelp *.hlp
gawk -f parse.awk -v starttag=.ls -v endtag=.le *.hlp
gawk -f parse.awk -v starttag=.nf -v endtag=.fi *.hlp
gawk -f parse.awk -v starttag=\fI -v endtag=\fR *.hlp
[/code:1:15f3cf2d84]As always in UNIX, you have to be very careful about special characters being interpreted by the shell. If your starttag or endtag contains any of these, you may need to use escapes and shell quoting. Because of this issue, the last line of the script output tells the user exactly which strings were used for the start and end tags. The last example above has potential shell quoting problems because of the backslash. If you run into trouble, one way to protect the tags in tcsh would look like this:[code:1:15f3cf2d84]
gawk -f parse.awk -v starttag="\\fI" -v endtag="\\fR" *.hlp
[/code:1:15f3cf2d84]Jason

PS Fitz, I have another batch of spelling fixes on its way shortly. It will take about 2 more batches before the spelling fix project is completed. I will correct the tagging in those. I already corrected the tagging in the previous tarballs I sent to you using this script, so the IRAF half of the help files should have pretty good tag structure.

Here is the full "parse.awk" source:
[code:1:15f3cf2d84]
#gawk -f parse.awk -v starttag=value1 -v endtag=value2
#original author Jason L. Quinn
#version 1.01 04-June-2008
#----added maxfilenamesize for prettier output
#----changed output formatting slightly
#USER NOTE: getting past the shell for certain characters may require quoting and escaping.
BEGIN {
    nf=(ARGC-1)
    if ( nf==0 )
        error("Usage: gawk -f parse.awk -v starttag=value -v endtag=value filename(s)")
    if ( starttag=="" || endtag=="" )
        error("ERROR: You must set the starttag and endtag variables.")
    if ( starttag==endtag )
        error("ERROR: starttag and endtag were identical.")

    #This just gets the length of the longest input file name for pretty output
    maxfilenamesize=0
    for ( kf=1; kf<=nf; kf++ )
    {
        file=ARGV[kf]
        if ( length(file)>maxfilenamesize )
            maxfilenamesize=length(file)
    }

    #Read in the input files.
    for ( kf=1; kf<=nf; kf++ )
    {
        n=0    #current depth: open tags minus close tags seen so far
        numopen=0
        numclose=0
        maxdepth=0
        mindepth=0
        file=ARGV[kf]
        while ( (getline<file) > 0 )
        {
            #Test every character position on the line for either tag.
            for( i=1;i<=length($0);i++ )
            {
                if ( substr($0,i,length(starttag))==starttag )
                {
                    n++
                    numopen++
                    if ( n > maxdepth )
                        maxdepth=n
                }
                if ( substr($0,i,length(endtag))==endtag )
                {
                    n--
                    numclose++
                    #Track the lowest depth reached (was compared against maxdepth by mistake).
                    if ( n < mindepth )
                        mindepth=n
                }
            }
        }
        close(file)
        printf("%"maxfilenamesize"s opentags=%3g closetags=%3g sum=%3g maxdepth=%3g mindepth=%3g",file,numopen,numclose,n,maxdepth,mindepth)
        if ( n<0 || n>0 || maxdepth>1 || mindepth<0 )
            printf("\n%"maxfilenamesize"s WARNING: ","")
        if ( n<0 )
            printf "Possibly too many close tags. "
        if ( n>0 )
            printf "Possibly too many open tags. "
        if ( maxdepth>1 || mindepth<0 )
            printf "Check tag nesting."
        printf "\n"
    }
}

END {
    printf("\nSearched for starttag=\"%s\" and endtag=\"%s\".\n",starttag,endtag)
}

function error(msg)
{
    print msg
    exit 1
}
[/code:1:15f3cf2d84]
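
The script above reports counts per file but not where a problem sits. What follows is only a minimal sketch of one way to get line numbers; it is not the other script mentioned in the post. The function name locate_tags and the environment variable names STARTTAG and ENDTAG are made up for this illustration. It keeps a stack of the line numbers of open tags, so unlike parse.awk it never lets the depth go negative: a stray close tag is reported and then ignored. It calls plain awk and sticks to POSIX features, on the assumption that gawk behaves the same way here.

```shell
#!/bin/sh
# locate_tags START END FILE...   (hypothetical helper, not part of parse.awk)
# Prints "file:line: message" for each close tag that has no open tag
# and for each open tag that is still unclosed at the end of its file.
# Example: locate_tags .ls .le *.hlp
locate_tags() {
    start=$1 end=$2
    shift 2
    # The tags travel through the environment, so awk does not apply
    # backslash-escape processing to them as it would for -v assignments.
    STARTTAG=$start ENDTAG=$end awk '
    # Report every open tag left on the stack, then empty the stack.
    function report_unclosed(   k) {
        for (k = 1; k <= depth; k++)
            printf "%s:%d: open tag \"%s\" is never closed\n", prev, openline[k], s
        depth = 0
    }
    BEGIN { s = ENVIRON["STARTTAG"]; e = ENVIRON["ENDTAG"] }
    # A new file begins: flush what the previous file left open.
    FNR == 1 { report_unclosed(); prev = FILENAME }
    {
        # Test every character position, as parse.awk does.
        for (i = 1; i <= length($0); i++) {
            if (substr($0, i, length(s)) == s)
                openline[++depth] = FNR      # remember where this tag opened
            if (substr($0, i, length(e)) == e) {
                if (depth > 0)
                    depth--                  # matches the most recent open tag
                else
                    printf "%s:%d: close tag \"%s\" has no open tag\n", FILENAME, FNR, e
            }
        }
    }
    END { report_unclosed() }
    ' "$@"
}
```

Because the tags are not passed with -v, a tag such as \fI only needs single quotes, for example locate_tags '\fI' '\fR' *.hlp. Like parse.awk, this matches raw strings and can report false positives.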

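The quoting advice in the post is given for tcsh. Under sh or bash the same double-quoted form behaves differently, because the shell reduces "\\" to "\" and awk then applies its own escape processing to -v assignments. The sketch below is a way to see what string awk actually ends up with before running the real check. The helper name show_tag is hypothetical, and it calls plain awk on the assumption that gawk treats -v assignments the same way.

```shell
#!/bin/sh
# show_tag VALUE   (hypothetical helper)
# Prints the string awk holds after the shell, and then the -v assignment,
# have each processed VALUE.
show_tag() {
    awk -v tag="$1" 'BEGIN { print tag }'
}

show_tag '\\fI'    # single quotes: awk receives \\fI and reduces it to \fI
show_tag "\\fI"    # double quotes in sh/bash: awk receives \fI and turns \f into a form feed
show_tag \fI       # unquoted: the shell drops the backslash and awk receives fI
```

Under these assumptions, the single-quoted form is the one to use with parse.awk in sh or bash: -v starttag='\\fI' -v endtag='\\fR'.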
 