Question : Bash scripts: how to strip these html tags
Hi experts,
Here is an HTML file. How can I print the HTML tags and "common words" such as "in", "the", and "a" into one file, "tag_file.txt", and print everything else into another file, "content_file.txt"?
I am using Bash scripts and I am new to Bash. Any help is highly appreciated!
For example, in the code snippet below, all the HTML tags and common words such as "the" should be printed to "tag_file.txt", whereas the rest should be printed to "content_file.txt".
Code Snippet:
3019337
story
the element is +
-
Answer : Bash scripts: how to strip these html tags
Do you mean like this?
tr ' >' '\n' < file | grep '<\|\<\(in\|the\|a\)\>' > tag_file.txt
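Note that the one-liner above only writes "tag_file.txt", and because it translates ">" into a newline, each tag loses its closing bracket. If both files are needed, a minimal sketch along the following lines might work. It assumes GNU grep (for the -o option), that no tag spans more than one line, and that matching common words case-insensitively is acceptable; the function name split_html is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: split an HTML file into (tags + common words) and (everything else).
# Assumptions: GNU grep, no tag spans more than one line.
# split_html and tokenize are hypothetical helper names.

# Print each complete tag, or each whitespace-delimited word, on its own line.
tokenize() {
    grep -oE '<[^>]*>|[^<[:space:]]+' "$1"
}

split_html() {
    local input=$1                    # HTML file to read
    local pattern='^<|^(in|the|a)$'   # a tag, or exactly one common word

    # Tokens matching the pattern go to tag_file.txt ...
    tokenize "$input" | grep -iE  "$pattern" > tag_file.txt
    # ... and all remaining tokens go to content_file.txt.
    tokenize "$input" | grep -ivE "$pattern" > content_file.txt
}
```

It could be called as: split_html input.html (where input.html stands for your own file). More common words can be added to the (in|the|a) group, separated by "|".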