Question : Bash scripts: how to strip these html tags
Hi experts,
Here is an HTML file. How can I print the HTML tags and "common words" such as "in", "the", and "a" into one file, "tag_file.txt", and print everything else into another file, "content_file.txt"?
I am using Bash scripts and am new to Bash. Any help is highly appreciated!
For example, in the code snippet below, all the HTML tags and common words such as "the" should be printed to "tag_file.txt", whereas the rest should be printed to "content_file.txt".
Code Snippet:
3019337
story
the element is +
-
Answer : Bash scripts: how to strip these html tags
Do you mean like this?
tr ' >' '\n' < file | grep '<\|\<\(in\|the\|a\)\>' > tag_file.txt
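Note that the one-liner above only writes tag_file.txt, and because tr turns every ">" into a newline, the tags lose their closing ">". If both files are needed, something along these lines might work. It is only a sketch: the function name split_html is made up for illustration, it assumes a grep that supports -o (GNU and BSD grep do), and it assumes no tag is split across several lines.

```shell
#!/usr/bin/env bash

# split_html is a hypothetical helper name, not part of the original answer.
# It writes tag_file.txt and content_file.txt into the current directory.
split_html() {
    local input=$1 tokens
    tokens=$(mktemp) || return 1

    # Put every complete tag, and every run of non-space text between
    # tags, on a line of its own.
    grep -oE '<[^>]*>|[^<[:space:]]+' "$input" > "$tokens"

    # Lines that are entirely a tag or one of the common words in/the/a.
    grep -xE  '<.*>|in|the|a' "$tokens" > tag_file.txt

    # Everything else.
    grep -vxE '<.*>|in|the|a' "$tokens" > content_file.txt

    rm -f "$tokens"
}
```

Call it as "split_html file". The match is case-sensitive, so "The" would end up in content_file.txt; adding -i to the last two grep commands should change that.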