String Palindrome Program In Cobol: Learn the Basics and Beyond

vedzischlitunijusg
Aug 13, 2023

Approach: The string has to be a palindrome, but that alone is not enough. Every character in the string must also be mirror-symmetric, so that its reflection looks the same as the original. The symmetric characters are AHIMOTUVWXY.
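The approach above can be sketched in a few lines of Python. This is only an illustration, not a COBOL program: the function name is_mirror_palindrome and the sample words are made up for this example, and treating the empty string as a match is an assumption.

```python
# Letters that look the same in a mirror (the list given in the text above).
MIRROR_SYMMETRIC = set("AHIMOTUVWXY")


def is_mirror_palindrome(text: str) -> bool:
    """Return True if text reads the same when reflected in a mirror.

    Two conditions must hold: every character is mirror-symmetric,
    and the string equals its own reverse. The empty string passes
    both checks (an assumption made for this sketch).
    """
    if not all(ch in MIRROR_SYMMETRIC for ch in text):
        return False
    return text == text[::-1]


if __name__ == "__main__":
    # Hypothetical sample words, chosen to exercise both conditions.
    for word in ["TOOT", "AHA", "ABA", "TOM"]:
        print(word, is_mirror_palindrome(word))
```

"ABA" fails because B is not a symmetric letter, and "TOM" fails because it uses only symmetric letters but is not a palindrome.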

Last modified: Fri Nov 27 09:44:51 2020

Table of Contents

What is a Regular Expression?
The Structure of a Regular Expression
The Anchor Characters: ^ and $
Matching a character with a character set
Match any character with .
Specifying a Range of Characters with [...]
Exceptions in a character set
Repeating character sets with *
Matching a specific number of sets with \{ and \}
Matching words with \< and \>
Backreferences - Remembering patterns with \(, \) and \1
Potential Problems
Extended Regular Expressions
POSIX character sets
Perl Extensions
Thanks

Regular Expressions and Extended Pattern Matching

Bruce Barnett

Note that this was written in 1991, before Linux. In the 1980's, it was common for different utilities to support different sets of regular expression features. ed(1) was different from sed(1), which was different from vi(1), etc. Note that Sun went through every utility and forced each one to use one of two distinct regular expression libraries - regular or extended. I wrote this tutorial for Sun users, and some of the commands discussed are now obsolete. On Linux and other UNIX systems, you might find that some of these features are not implemented. Your mileage may vary.

Copyright 1991 Bruce Barnett & General Electric Company
Copyright 2001, 2008, 2013 Bruce Barnett
All Rights reserved

Original version written in 1994 and published in the Sun Observer

What is a Regular Expression?

A regular expression is a set of characters that specify a pattern. The term "regular" has nothing to do with a high-fiber diet. It comes from a term used to describe grammars and formal languages.

Regular expressions are used when you want to search for specific lines of text containing a particular pattern. Most of the UNIX utilities operate on ASCII files a line at a time. Regular expressions search for patterns on a single line, and not for patterns that start on one line and end on another.

It is simple to search for a specific word or string of characters.
Almost every editor on every computer system can do this. Regular expressions are more powerful and flexible. You can search for words of a certain size. You can search for a word with four or more vowels that ends with an "s". Numbers, punctuation characters, you name it, a regular expression can find it. What happens once the program you are using finds it is another matter. Some just search for the pattern. Others print out the line containing the pattern. Editors can replace the string with a new pattern. It all depends on the utility.

Regular expressions confuse people because they look a lot like the file matching patterns the shell uses. They even act the same way--almost. The square brackets are similar, and the shell's asterisk acts similar to, but not identical to, the asterisk in a regular expression. In particular, the Bourne shell, C shell, find, and cpio use file name matching patterns and not regular expressions.

Remember that shell meta-characters are expanded before the shell passes the arguments to the program. To prevent this expansion, the special characters in a regular expression must be quoted when passed as an option from the shell. You already know how to do this because I covered this topic in last month's tutorial.

The Structure of a Regular Expression

There are three important parts to a regular expression. Anchors are used to specify the position of the pattern in relation to a line of text. Character Sets match one or more characters in a single position. Modifiers specify how many times the previous character set is repeated. A simple example that demonstrates all three parts is the regular expression "^#*". The up arrow is an anchor that indicates the beginning of the line. The character "#" is a simple character set that matches the single character "#".
The asterisk is a modifier. In a regular expression it specifies that the previous character set can appear any number of times, including zero. This is a useless regular expression, as you will see shortly.

There are also two types of regular expressions: the "Basic" regular expression, and the "extended" regular expression. A few utilities like awk and egrep use the extended expression. Most use the "basic" regular expression. From now on, if I talk about a "regular expression," it describes a feature in both types.

Here is a table of the Solaris (around 1991) commands that allow you to specify regular expressions:

    Utility    Regular Expression Type
    vi         Basic
    sed        Basic
    grep       Basic
    csplit     Basic
    dbx        Basic
    dbxtool    Basic
    more       Basic
    ed         Basic
    expr       Basic
    lex        Basic
    pg         Basic
    nl         Basic
    rdist      Basic
    awk        Extended
    nawk       Extended
    egrep      Extended
    EMACS      EMACS Regular Expressions
    PERL       PERL Regular Expressions

The Anchor Characters: ^ and $

Most UNIX text facilities are line oriented. Searching for patterns that span several lines is not easy to do. You see, the end of line character is not included in the block of text that is searched. It is a separator. Regular expressions examine the text between the separators. If you want to search for a pattern that is at one end or the other, you use anchors. The character "^" is the starting anchor, and the character "$" is the end anchor. The regular expression "^A" will match all lines that start with a capital A. The expression "A$" will match all lines that end with the capital A. If the anchor characters are not used at the proper end of the pattern, then they no longer act as anchors. That is, the "^" is only an anchor if it is the first character in a regular expression. The "$" is only an anchor if it is the last character. The expression "$1" does not have an anchor. Neither does "1^".
If you need to match a "^" at the beginning of the line, or a "$" at the end of a line, you must escape the special characters with a backslash. Here is a summary:

    Pattern    Matches
    ^A         "A" at the beginning of a line
    A$         "A" at the end of a line
    A^         "A^" anywhere on a line
    $A         "$A" anywhere on a line
    ^^         "^" at the beginning of a line
    $$         "$" at the end of a line

The use of "^" and "$" as indicators of the beginning or end of a line is a convention other utilities use. The vi editor uses these two characters as commands to go to the beginning or end of a line. The C shell uses "!^" to specify the first argument of the previous line, and "!$" is the last argument on the previous line.

It is one of those choices that other utilities go along with to maintain consistency. For instance, "$" can refer to the last line of a file when using ed and sed. Cat -e marks end of lines with a "$". You might see it in other programs as well.

Matching a character with a character set

The simplest character set is a character. The regular expression "the" contains three character sets: "t," "h" and "e". It will match any line with the string "the" inside it. This would also match the word "other". To prevent this, put spaces before and after the pattern: " the ". You can combine the string with an anchor. The pattern "^From: " will match the lines of a mail message that identify the sender. Use this pattern with grep to print every address in your incoming mail box:

    grep '^From: ' /usr/spool/mail/$USER

Some characters have a special meaning in regular expressions. If you want to search for such a character, escape it with a backslash.

Match any character with .

The character "." is one of those special meta-characters.
By itself it will match any character, except the end-of-line character. The pattern that will match a line with a single character is

    ^.$

Specifying a Range of Characters with [...]

If you want to match specific characters, you can use the square brackets to identify the exact characters you are searching for. The pattern that will match any line of text that contains exactly one number is

    ^[0123456789]$

This is verbose. You can use the hyphen between two characters to specify a range:

    ^[0-9]$

You can intermix explicit characters with character ranges. This pattern will match a single character that is a letter, number, or underscore:

    [A-Za-z0-9_]

Character sets can be combined by placing them next to each other. If you wanted to search for a word that

    Started with a capital letter "T".
    Was the first word on a line
    The second letter was a lower case letter
    Was exactly three letters long, and
    The third letter was a vowel

the regular expression would be "^T[a-z][aeiou] ".

Exceptions in a character set

You can easily search for all characters except those in square brackets by putting a "^" as the first character after the "[". To match all characters except vowels use "[^aeiou]".

Like the anchors in places that can't be considered an anchor, the characters "]" and "-" do not have a special meaning if they directly follow "[". Here are some examples:

    Regular Expression    Matches
    []                    The characters "[]"
    [0]                   The character "0"
    [0-9]                 Any number
    [^0-9]                Any character other than a number
    [-0-9]                Any number or a "-"
    [0-9-]                Any number or a "-"
    [^-0-9]               Any character except a number or a "-"
    []0-9]                Any number or a "]"
    [0-9]]                Any number followed by a "]"
    [0-9-z]               Any number, or any character between "9" and "z"
    [0-9\-a\]]            Any number, or a "-", a "a", or a "]"

Repeating character sets with *

The third part of a regular expression is the modifier. It is used to specify how many times you expect to see the previous character set.
The special character "*" matches zero or more copies. That is, the regular expression "0*" matches zero or more zeros, while the expression "[0-9]*" matches zero or more numbers.

This explains why the pattern "^#*" is useless, as it matches any number of "#'s" at the beginning of the line, including zero. Therefore this will match every line, because every line starts with zero or more "#'s".

At first glance, it might seem that starting the count at zero is stupid. Not so. Looking for an unknown number of characters is very important. Suppose you wanted to look for a number at the beginning of a line, and there may or may not be spaces before the number. Just use "^ *" to match zero or more spaces at the beginning of the line. If you need to match one or more, just repeat the character set. That is, "[0-9]*" matches zero or more numbers, and "[0-9][0-9]*" matches one or more numbers.

Matching a specific number of sets with \{ and \}

You can continue the above technique if you want to specify a minimum number of character sets. You cannot specify a maximum number of sets with the "*" modifier. There is a special pattern you can use to specify the minimum and maximum number of repeats. This is done by putting those two numbers between "\{" and "\}".

The backslashes deserve a special discussion. Normally a backslash turns off the special meaning for a character. A period is matched by a "\." and an asterisk is matched by a "\*".

If a backslash is placed before a "<," ">," "{," "}," "(," ")," or before a digit, the backslash turns on a special meaning. This was done because these special functions were added late in the life of regular expressions. Changing the meaning of "{" would have broken old expressions. This is a horrible crime punishable by a year of hard labor writing COBOL programs. Instead, adding a backslash added functionality without breaking old programs.
Rather than complain about the asymmetry, view it as evolution.

Having convinced you that "\{" isn't a plot to confuse you, an example is in order. The regular expression to match 4, 5, 6, 7 or 8 lower case letters is

    [a-z]\{4,8\}

Any numbers between 0 and 255 can be used. The second number may be omitted, which removes the upper limit. If the comma and the second number are omitted, the pattern must be duplicated the exact number of times specified by the first number.

You must remember that modifiers like "*" and "\{1,5\}" only act as modifiers if they follow a character set. If they were at the beginning of a pattern, they would not be a modifier. Here is a list of examples, and the exceptions:

    Regular Expression    Matches
    *                     Any line with an asterisk
    \*                    Any line with an asterisk
    \\                    Any line with a backslash
    ^*                    Any line starting with an asterisk
    ^A*                   Any line
    ^A\*                  Any line starting with an "A*"
    ^AA*                  Any line if it starts with one "A"
    ^AA*B                 Any line starting with one or more "A"'s followed by a "B"
    ^A\{4,8\}B            Any line starting with 4, 5, 6, 7 or 8 "A"'s followed by a "B"
    ^A\{4,\}B             Any line starting with 4 or more "A"'s followed by a "B"
    ^A\{4\}B              Any line starting with "AAAAB"
    \{4,8\}               Any line with "{4,8}"
    A{4,8}                Any line with "A{4,8}"

Matching words with \< and \>

Searching for a word isn't quite as simple as it at first appears. The string "the" will match the word "other". You can put spaces before and after the letters and use this regular expression: " the ". However, this does not match words at the beginning or end of the line. And it does not match the case where there is a punctuation mark after the word.

There is an easy solution. The characters "\<" and "\>" are similar to the "^" and "$" anchors, as they don't occupy a position of a character. They do "anchor" the expression between them so that it only matches on a word boundary. The pattern to search for the word "the" would be "\<the\>".
The character before the "t" must be either a new line character, or anything except a letter, number, or underscore. The character after the "e" must also be a character other than a number, letter, or underscore, or it could be the end of line character.

Backreferences - Remembering patterns with \(, \) and \1

Another pattern that requires a special mechanism is searching for repeated words. The expression "[a-z][a-z]" will match any two lower case letters. If you wanted to search for lines that had two adjoining identical letters, the above pattern wouldn't help. You need a way of remembering what you found, and seeing if the same pattern occurred again. You can mark part of a pattern using "\(" and "\)". You can recall the remembered pattern with "\" followed by a single digit. Therefore, to search for two identical letters, use "\([a-z]\)\1". You can have 9 different remembered patterns. Each occurrence of "\(" starts a new pattern. The regular expression that would match a 5 letter palindrome (e.g. "radar") would be

    \([a-z]\)\([a-z]\)[a-z]\2\1

Potential Problems

That completes the discussion of the basic regular expression. Before I discuss the extensions the extended expressions offer, I wanted to mention two potential problem areas.

The "\<" and "\>" characters were introduced in the vi editor. The other programs didn't have this ability at that time. Also the "\{min,max\}" modifier is new and earlier utilities didn't have this ability. This made it difficult for the novice user of regular expressions, because it seemed each utility had a different convention. Sun has retrofitted the newest regular expression library to all of their programs, so they all have the same ability. If you try to use these newer features on other vendors' machines, you might find they don't work the same way.

The other potential point of confusion is the extent of the pattern matches. Regular expressions match the longest possible pattern. That is, the regular expression

    A.*B

matches "AAB" as well as "AAAABBBBABCCCCBBBAAAB".
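As a rough illustration of backreferences and longest-match behaviour, here is a small sketch using Python's re module. Python is used only for demonstration: its syntax writes groups as ( ) where the basic regular expressions described here use \( \), and the helper name first_match and the sample strings are invented for this example.

```python
import re

# Python writes groups as ( ... ); basic regular expressions write \( ... \).
# The backreferences \1 and \2 recall what the first and second group matched.
DOUBLE_LETTER = re.compile(r"([a-z])\1")
PALINDROME_5 = re.compile(r"([a-z])([a-z])[a-z]\2\1")

# ".*" is greedy: it extends the match as far as it can.
LONGEST = re.compile(r"A.*B")


def first_match(pattern, line):
    """Return the text of the first match in line, or None if there is none."""
    found = pattern.search(line)
    return found.group(0) if found else None


if __name__ == "__main__":
    print(first_match(DOUBLE_LETTER, "letter"))       # two identical letters
    print(first_match(PALINDROME_5, "a radar dish"))  # 5 letter palindrome
    print(first_match(PALINDROME_5, "hello"))         # no palindrome here
    print(first_match(LONGEST, "xxAByyBzz"))          # runs to the last "B"
```

In the last call the match does not stop at the first "B"; it continues to the last "B" on the line, which is the longest-match behaviour described above.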
This doesn't cause many problems using grep, because an oversight in a regular expression will just match more lines than desired. If you use sed, and your patterns get carried away, you may end up deleting more than you wanted to.

Extended Regular Expressions

Two programs use the extended regular expressions: egrep and awk. With these extensions, those special characters preceded by a backslash no longer have the special meaning: "\{", "\}", "\<", "\>", "\(", "\)" as well as the "\digit". There is a very good reason for this, which I will delay explaining to build up suspense.

The character "?" matches 0 or 1 instances of the character set before, and the character "+" matches one or more copies of the character set. You can't use the \{ and \} in the extended regular expressions, but if you could, you might consider the "?" to be the same as "\{0,1\}" and the "+" to be the same as "\{1,\}".

By now, you are wondering why the extended regular expressions are even worth using. Except for two abbreviations, there are no advantages, and a lot of disadvantages. Therefore, examples would be useful.

The three important characters in the expanded regular expressions are "(", "|", and ")". Together, they let you match a choice of patterns. As an example, you can use egrep to print all From: and Subject: lines from your incoming mail:

    egrep '^(From|Subject): ' /usr/spool/mail/$USER

All lines starting with "From:" or "Subject:" will be printed. There is no easy way to do this with the basic regular expressions. You could try "^[FS][ru][ob][mj]e*c*t*: " and hope you don't have any lines that start with "Sromeet:".
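The difference between the alternation and the character-set workaround can be checked with a short sketch. It uses Python's re module, whose ( | ) syntax behaves like the extended expressions here; the function name header_lines and the sample mail lines are made up for illustration.

```python
import re

# Alternation: the line must start with exactly "From: " or "Subject: ".
HEADER = re.compile(r"^(From|Subject): ")

# The basic-expression workaround from the text, which is too permissive.
LOOSE = re.compile(r"^[FS][ru][ob][mj]e*c*t*: ")


def header_lines(lines, pattern=HEADER):
    """Return the lines that the given pattern matches."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    # Hypothetical sample data standing in for a mail box.
    mail = [
        "From: alice@example.com",
        "To: bob@example.com",
        "Subject: lunch",
        "Sromeet: oops",
    ]
    print(header_lines(mail))          # only the From: and Subject: lines
    print(header_lines(mail, LOOSE))   # also lets "Sromeet: oops" through
```

The alternation accepts only the two intended headers, while the character-set version also accepts "Sromeet:", as the text warns.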
Extended expressions don't have the "\<" and "\>" characters. You can compensate by using the alternation mechanism. Matching the word "the" in the beginning, middle, end of a sentence, or end of a line can be done with the extended regular expression:

    (^| )the([^a-z]|$)

There are two choices before the word, a space or the beginning of a line. After the word, there must be something besides a lower case letter or else the end of the line. One extra bonus with extended regular expressions is the ability to use the "*," "+," and "?" modifiers after a "(...)" grouping. The following will match "a simple problem," "an easy problem," as well as "a problem".

    egrep "a[n]? (simple |easy )?problem" data

Note the space after both "simple" and "easy".

I promised to explain why the backslash characters don't work in extended regular expressions. Well, perhaps the "\{...\}" and "\<...\>" could be added to the extended expressions. These are the newest addition to the regular expression family. They could be added, but this might confuse people if those characters are added and the "\(...\)" are not. And there is no way to add that functionality to the extended expressions without changing the current usage. Do you see why? It's quite simple.
If "(" has a special meaning, then "\(" must be the ordinary character. This is the opposite of the Basic regular expressions, where "(" is ordinary, and "\(" is special. The usage of the parentheses is incompatible, and any change could break old programs.

If the extended expression used "(...|...)" as regular characters, and "\(...\|...\)" for specifying alternate patterns, then it is possible to have one set of regular expressions that has full functionality. This is exactly what GNU emacs does, by the way.

The rest of this is random notes.

    Regular Expression    Class       Type             Meaning
    .                     all         Character Set    A single character (except newline)
    ^                     all         Anchor           Beginning of line
    $                     all         Anchor           End of line
    [...]                 all         Character Set    Range of characters
    *                     all         Modifier         Zero or more duplicates
    \<                    Basic       Anchor           Beginning of word
    \>                    Basic       Anchor           End of word
    \(..\)                Basic       Backreference    Remembers pattern
    \1..\9                Basic       Reference        Recalls pattern
    +                     Extended    Modifier         One or more duplicates
    ?                     Extended    Modifier         Zero or one duplicate
    \{M,N\}               Extended    Modifier         M to N duplicates
    (...|...)             Extended    Anchor           Shows alternation
    \(...\|...\)          EMACS       Anchor           Shows alternation
    \w                    EMACS       Character set    Matches a letter in a word
    \W                    EMACS       Character set    Opposite of \w

POSIX character sets

POSIX added newer and more portable ways to search for character sets. Instead of using [a-zA-Z] you can replace 'a-zA-Z' with [:alpha:], or to be more complete, replace [a-zA-Z] with [[:alpha:]]. The advantage is that this will match international character sets.
You can mix the old style and new POSIX styles, such as

    grep '[1-9[:alpha:]]'

Here is the full list:

    Character Group    Meaning
    [:alnum:]          Alphanumeric
    [:cntrl:]          Control character
    [:lower:]          Lower case character
    [:space:]          Whitespace
    [:alpha:]          Alphabetic
    [:digit:]          Digit
    [:print:]          Printable character
    [:upper:]          Upper case character
    [:blank:]          Space and tab
    [:graph:]          Printable and visible characters
    [:punct:]          Punctuation
    [:xdigit:]         Hexadecimal digit

Note that some people use [[:alpha:]] as a notation, but the outer '[...]' specifies a character set.

Perl Extensions

    Regular Expression    Type             Meaning
    \t                    Character Set    tab
    \n                    Character Set    newline
    \r                    Character Set    return
    \f                    Character Set    form
    \a                    Character Set    alarm
    \e                    Character Set    escape
    \033                  Character Set    octal
    \x1B                  Character Set    hex
    \c[                   Character Set    control
    \l                    Character Set    lowercase
    \u                    Character Set    uppercase
    \L                    Character Set    lowercase
    \U                    Character Set    uppercase
    \E                    Character Set    end
    \Q                    Character Set    quote
    \w                    Character Set    Match a "word" character
    \W                    Character Set    Match a non-word character
    \s                    Character Set    Match a whitespace character
    \S                    Character Set    Match a non-whitespace character
    \d                    Character Set    Match a digit character
    \D                    Character Set    Match a non-digit character
    \b                    Anchor           Match a word boundary
    \B                    Anchor           Match a non-(word boundary)
    \A                    Anchor           Match only at beginning of string
    \Z                    Anchor           Match only at EOS, or before newline
    \z                    Anchor           Match only at end of string
    \G                    Anchor           Match only where previous m//g left off

Example of PERL Extended, multi-line regular expression

    m \( ( # Start group
    [^()]+ # anything but '(' or ')'
    x

Thanks

Thanks to the following who spotted some errors:

    Charuhas Mehendale
    Rounak Jain
    Peter Renzland
    Karl Eric Wenzel
    Axel Schulze
    Dennis Deters
    Bryan Bergert
    Brad Coanwood
    Michael Siegel

This document was translated by troff2html v0.21 on June 27, 2001.
