Recently, I needed to convert some MathML in article metadata from SCOAP3 to LaTeX. Most institutional repositories escape XML entities, so MathML doesn't render correctly. I tried the Wiris API, but it is very slow and gives errors on most long formulas.
Finally, I found Yaroshevich's XSL stylesheet (mmltex.xsl), which works without problems.
Example Python code:
import re

import lxml.etree as ET


def to_latex(text):
    """Replace MathML in text with its LaTeX representation."""
    # Remove TeX code ($$...$$) already present in the text.
    text = re.sub(r"(\$\$.*?\$\$)", " ", text)
    # Load the stylesheet once instead of on every loop iteration.
    transform = ET.XSLT(ET.parse("mmltex/mmltex.xsl"))
    # Find MathML fragments and replace each with its LaTeX representation.
    mml_codes = re.findall(r"(<math.*?</math>)", text)
    for mml_code in mml_codes:
        # Required: declare the MathML namespace on the root element.
        mml_ns = mml_code.replace('<math>', '<math xmlns="http://www.w3.org/1998/Math/MathML">')
        mml_dom = ET.fromstring(mml_ns)
        mmldom = transform(mml_dom)
        # Convert the transformed result, not the input DOM, to a string.
        latex_code = str(mmldom)
        text = text.replace(mml_code, latex_code)
    return text
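One thing to watch: the replace('<math>', ...) step only matches a bare <math> tag, so a tag that carries attributes, such as <math display="block">, would be left without the namespace. Here is a minimal sketch of one way around that. The helper name add_mathml_namespace is hypothetical and not part of the original code, and I am assuming the opening tag is written on one line without a namespace prefix.

```python
import re

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Matches the opening <math ...> tag and captures its attributes, if any.
_MATH_OPEN = re.compile(r"<math\b([^>]*)>")


def add_mathml_namespace(mml_code):
    """Hypothetical helper: add the MathML xmlns to the opening <math> tag."""
    def _inject(match):
        attrs = match.group(1)
        # Leave the tag alone if a default namespace is already declared.
        if "xmlns=" in attrs:
            return match.group(0)
        return '<math xmlns="%s"%s>' % (MATHML_NS, attrs)

    # Only the first (outermost) opening tag needs the declaration.
    return _MATH_OPEN.sub(_inject, mml_code, count=1)
```

Inside to_latex, this could stand in for the replace call: mml_ns = add_mathml_namespace(mml_code).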
Top comments (2)
Hello Furkan, ignore my previous comment; the code works fine apart from one small mistake. Where you transform the document (mmldom = transform(mml_dom)), the result is assigned to mmldom, but the string conversion was given the input variable instead (latex_code = str(mml_dom)). It should be latex_code = str(mmldom).