Question : Bash scripts: how to strip these html tags
Hi experts,
Here is an HTML file. How can I print the HTML tags and "common words" such as "in", "the", and "a" to one file, "tag_file.txt", and print everything else to another file, "content_file.txt"?
I am using Bash scripts and am new to Bash. Any help is highly appreciated!
For example, in the code snippet below, all the HTML tags and common words such as "the" should be printed to "tag_file.txt", whereas the rest should be printed to "content_file.txt".
Code Snippet:
3019337
story
the element is +
-
Answer : Bash scripts: how to strip these html tags
Do you mean like this?
tr ' >' '\n' < file | grep '<\|\<\(in\|the\|a\)\>' > tag_file.txt
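The one-liner above writes only "tag_file.txt", and because tr turns ">" into a newline, the tags lose their closing ">". A slightly longer sketch that fills both files might look like the following. The function name split_html, its optional output-file arguments, and the word list are assumptions made for illustration. It also assumes that no tag spans more than one line and that common words are separated by whitespace (a word with punctuation attached, such as "the,", would be treated as content).

```shell
#!/usr/bin/env bash
# Sketch only: split_html, its arguments, and the word list are illustrative
# assumptions, not part of the original answer.

split_html() {
    local input=$1
    local tags_out=${2:-tag_file.txt}
    local content_out=${3:-content_file.txt}

    # Start with empty output files so stale results never linger.
    : > "$tags_out"
    : > "$content_out"

    awk -v tags="$tags_out" -v content="$content_out" '
        BEGIN {
            # Assumed list of "common words"; extend as needed.
            n = split("in the a", list, " ")
            for (i = 1; i <= n; i++) common[list[i]] = 1
        }
        function emit_words(text,    count, k, w) {
            # Split plain text on whitespace and route each word to a file.
            count = split(text, w, /[ \t]+/)
            for (k = 1; k <= count; k++) {
                if (w[k] == "") continue
                if (tolower(w[k]) in common) print w[k] > tags
                else print w[k] > content
            }
        }
        {
            line = $0
            # Peel off one tag at a time; text before it is split into words.
            while (match(line, /<[^>]*>/)) {
                emit_words(substr(line, 1, RSTART - 1))
                print substr(line, RSTART, RLENGTH) > tags
                line = substr(line, RSTART + RLENGTH)
            }
            emit_words(line)
        }
    ' "$input"
}
```

After sourcing the script, calling split_html page.html (where page.html stands in for your own file) writes one tag or common word per line to "tag_file.txt" and one remaining word per line to "content_file.txt" in the current directory.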