This is a quick and dirty example of using a regular expression to remove XML tags from an an XML file.
Suppose we have the following XML, sample.xml:
<note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
What we want to do is use Python to strip out the XML element tags, so that we're left with something like this:
Tove Jani Reminder Don't forget me this weekend!
Here's how to do it:
import re text = re.sub('<[^<]+>', "", open("sample.xml").read()) with open("output.txt", "w") as f: f.write(text)