I'm hitting some font/tofu problems using the POI library to render a PNG image from an XSLFSlide, and it looks like a latin/ea/cs precedence issue (or lack thereof). Specifically, in POI 3.15 the XSLFTextRun.getFontFamily() method appears to look *ONLY* at the <latin typeface="foo"> element and to ignore the <ea> and <cs> typeface elements. This is a problem for some PPTX files I'm converting, which have 'ambiguous' typeface declarations.

To wit, a pptx slide (from the pptx zip archive: ppt/slides/slide1.xml):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:cSld>
...yadda...
<p:txBody>

<a:p>
<a:pPr>
<a:lnSpc>
<a:spcPct val="100000"/>
</a:lnSpc>
</a:pPr>
<a:r>
<a:rPr lang="en-US" sz="2800">
<a:solidFill>
<a:srgbClr val="44546a"/>
</a:solidFill>
<a:latin typeface="Comic Sans MS"/>
<a:ea typeface="宋体"/>
</a:rPr>
<a:t>B</a:t>
</a:r>
<a:r>
<a:rPr lang="en-US" sz="2800">
<a:solidFill>
<a:srgbClr val="44546a"/>
</a:solidFill>
<a:latin typeface="Comic Sans MS"/>
<a:ea typeface="宋体"/>
</a:rPr>
<a:t>.书对桌面的压力与桌面对书的支持力</a:t>
</a:r>
<a:endParaRPr/>
</a:p>
</p:txBody>
...etc...
Here the <a:t> text of the second <a:r> text run is definitely NOT Comic Sans, even though that run specifies both <a:latin> and <a:ea> elements. In this case the font family should be "宋体", but, as mentioned above, XSLFTextRun.getFontFamily() only considers the <a:latin> element.

For completeness, I'm using the POI library to render the PNG essentially like this:

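org.apache.poi.xslf.usermodel.XSLFSlide slide = ...
// BufferedImage lives in java.awt.image, not java.awt
java.awt.image.BufferedImage image = new java.awt.image.BufferedImage(getWidth(), getHeight(), java.awt.image.BufferedImage.TYPE_INT_ARGB);
java.awt.Graphics2D graphics2D = image.createGraphics();
slide.draw(graphics2D);
// release the graphics context once drawing is done
graphics2D.dispose();
javax.imageio.ImageIO.write(image, "PNG", new java.io.File("/tmp/sad_panda"));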
Doing some digging on the Microsoft forums, I ran across this gem: https://social.msdn.microsoft.com/Fo...=os_binaryfile

which eventually states:
>>I guess this is the question. How do you figure out weither it's latin/ea/cs from the characers? Is it just looking for a particular set of codepages based on all the text, some of the text, or something else? Is it mulitple codepages?

...We use UNICODE sub ranges + some Windows APIs to decide this....
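
To make the "Unicode sub-ranges" idea concrete, here is a minimal Java sketch of how a text run could be split by script so that each piece maps to the latin, ea, or cs slot. The class ScriptRuns, the Slot enum, and the script-to-slot mapping are all hypothetical. The MSDN reply does not say which ranges PowerPoint actually uses, and nothing here is POI API. Characters whose script is not listed (including punctuation, which Java reports as COMMON) simply fall into the latin slot, which is a simplification.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

final class ScriptRuns {
    /** Font slots, named after the DrawingML latin / ea / cs elements. */
    enum Slot { LATIN, EAST_ASIAN, COMPLEX }

    /** Assumed mapping from Unicode script to font slot (illustrative, not exhaustive). */
    static Slot slotFor(int codePoint) {
        switch (UnicodeScript.of(codePoint)) {
            case HAN: case HIRAGANA: case KATAKANA: case HANGUL: case BOPOMOFO:
                return Slot.EAST_ASIAN;
            case ARABIC: case HEBREW: case THAI: case DEVANAGARI:
                return Slot.COMPLEX;
            default:
                return Slot.LATIN;
        }
    }

    /** Splits text into maximal pieces sharing one slot, formatted as "SLOT:text". */
    static List<String> split(String text) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Slot currentSlot = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Slot slot = slotFor(cp);
            if (currentSlot != null && slot != currentSlot) {
                // Slot changed: close the piece collected so far.
                runs.add(currentSlot + ":" + current);
                current.setLength(0);
            }
            currentSlot = slot;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (currentSlot != null) {
            runs.add(currentSlot + ":" + current);
        }
        return runs;
    }

    public static void main(String[] args) {
        // "B" followed by two Han characters (U+4E66, U+5BF9) from the slide text.
        System.out.println(split("B\u4e66\u5bf9"));
    }
}
```

With a split like this, the Han pieces of the second run would be looked up against the <a:ea> typeface and only the remainder against <a:latin>.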

Perhaps I'm reading too much into that MSDN post, but it definitely reads to me like there needs to be more logic around the POI 'XSLFTextRun.getFontFamily()' call. At a minimum it needs to look for the 'ea' or 'cs' elements if 'latin' is not present, but preferably there should be some sort of java.awt.Font.canDisplay() or java.awt.Font.canDisplayUpTo() call to ensure the text run's text can be rendered using a specified font.
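
A rough sketch of the canDisplayUpTo() idea, assuming the typefaces declared on the run are tried in order (latin, then ea, then cs) and the first one that can render the whole text wins. FontFallback, firstUsable and awtCanRender are hypothetical names for illustration, not POI API. One caveat to be aware of: new Font(name, ...) does not fail for an unknown name, AWT quietly substitutes a default font, so this check alone cannot tell "installed and has the glyphs" apart from "not installed, default font has the glyphs".

```java
import java.awt.Font;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class FontFallback {
    /** Returns the first non-null typeface accepted by canRender, in list order. */
    static Optional<String> firstUsable(List<String> typefaces, Predicate<String> canRender) {
        for (String name : typefaces) {
            if (name != null && canRender.test(name)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    /** AWT-backed check: true when the named font reports a glyph for every character of text. */
    static Predicate<String> awtCanRender(String text) {
        return name -> {
            // The size is irrelevant for glyph coverage; 12pt is an arbitrary choice.
            Font font = new Font(name, Font.PLAIN, 12);
            // canDisplayUpTo returns -1 when the whole string is displayable.
            return font.canDisplayUpTo(text) == -1;
        };
    }

    public static void main(String[] args) {
        String text = "\u4e66\u5bf9"; // two Han characters from the slide text
        // Typefaces as declared on the run: latin first, then ea ("\u5b8b\u4f53" is the ea typeface).
        List<String> declared = Arrays.asList("Comic Sans MS", "\u5b8b\u4f53");
        // Fall back to a logical font when no declared typeface can render the text.
        System.out.println(firstUsable(declared, awtCanRender(text)).orElse(Font.SANS_SERIF));
    }
}
```

Keeping the selection logic separate from the AWT check makes it easy to swap in a stricter predicate, for example one that also verifies the font is really installed.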

To throw some gas on the fire, the pptx file renders correctly (i.e., sans tofu) in LibreOffice, but I haven't gone digging through that codebase to see how it determines the correct font.

rfc