
 
wlwest82
Topic Author
Posts: 2
Joined: Fri Feb 07, 2014 9:49 am

XML Orphaned Text

Fri Feb 07, 2014 10:06 am

I'm having trouble getting some text out of parsed XML. For the most part, everything I need is visible as I walk through the XML, but in the following instance I can't get to the text. Let's say I have an object Parser (type roXMLElement) and a string xml that contains the following:
<xml>
    <tag1 class="class1">
        <a href="http://www.google.com">Google!</a>
        Some more text goes here
    </tag1>
</xml>

After parsing this with Parser.Parse(xml), Parser.tag1@class returns "class1", Parser.tag1.a@href returns "http://www.google.com", and Parser.tag1.a.getText() returns "Google!". The problem is getting to "Some more text goes here". I thought Parser.tag1.getText() would return it, but it returns an empty string. Am I doing this correctly? Is there a way to get to that text?
 
joetesta
Posts: 790
Joined: Wed Apr 20, 2011 11:48 am

Re: XML Orphaned Text

Tue Feb 11, 2014 4:55 pm

I bet you can get it if you put another wrapper around it; assuming you need to maintain html compatibility I'd use <span>

<xml>
    <tag1 class="class1">
        <a href="http://www.google.com">Google!</a>
        <span>Some more text goes here</span>
    </tag1>
</xml>


and see if you can get it with Parser.tag1.span.getText()
aspiring
 
EnTerr
** Valued Community Member **
Posts: 3834
Joined: Sun Jan 02, 2011 2:41 am

Re: XML Orphaned Text

Tue Feb 11, 2014 7:04 pm

joetesta wrote:
I bet you can get it if you put another wrapper around it; assuming you need to maintain html compatibility I'd use <span>

No, the question was how to work with the XML as given, not to mangle it server-side. The example is well-formed XML (I fed it to a validator just to be sure), so it should parse and all of its content should be available. The question is: how do we reach that "Some more text goes here" via the roXML* APIs?

I subscribed to the topic the other day, thinking there was an obvious answer I could learn from. XML elements may contain text, other elements, or a mixture of text and elements in any order; that much I know. But how do we eat it? Time for a lifeline: Ask-the-Expert
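Whether a parser keeps such mixed content is easy to check off the Roku. A minimal sketch using Python's standard xml.etree.ElementTree (not roXML, and only an assumption about how a conforming parser behaves): it parses the question's XML, which would raise ET.ParseError if it were not well-formed, and lists every text fragment in document order.

```python
import xml.etree.ElementTree as ET

# The XML from the original question.
QUESTION_XML = """<xml>
    <tag1 class="class1">
        <a href="http://www.google.com">Google!</a>
        Some more text goes here
    </tag1>
</xml>"""

def all_text(xml_string):
    """Parse xml_string (raises ET.ParseError if it is not well-formed)
    and return every non-blank text fragment in document order, trimmed."""
    root = ET.fromstring(xml_string)
    return [t.strip() for t in root.itertext() if t.strip()]

print(all_text(QUESTION_XML))
```

Both "Google!" and "Some more text goes here" come back, so the text is in the parsed tree; the open question is only how roXML exposes it.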
 
TheEndless
** Valued Community Member **
Posts: 9231
Joined: Mon Oct 04, 2004 10:15 am
Location: US

Re: XML Orphaned Text

Tue Feb 11, 2014 7:23 pm

I don't think there's any way to get orphaned text. I tested this, and while it parses successfully, when you output it with xml.GenXml(False), the orphaned text is gone, so it seems the parser is losing it.
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)
 
EnTerr
** Valued Community Member **
Posts: 3834
Joined: Sun Jan 02, 2011 2:41 am

XML Offal Text

Tue Feb 11, 2014 7:58 pm

I just checked how this would be done in Python: you ask for the .tail of the <a> element, like so:
>>> import xml.etree.ElementTree as et
>>> e = et.fromstring('<xml> <tag1 class="class1"> <a href="http://www.google.com">Google!</a> Some more text goes here </tag1> </xml>')
>>> e
<Element xml at 442968>
>>> e[0]
<Element tag1 at 4427d8>
>>> e[0][0]
<Element a at 4429b8>
>>> e[0][0].tail
' Some more text goes here '
>>> et.tostring(e[0][0])
'<a href="http://www.google.com">Google!</a> Some more text goes here '

Simplifying the BRS example even more:
BrightScript Debugger> x = CreateObject("roXMLElement")
BrightScript Debugger> x.parse("<xml> foo <tag> bar </tag> qux </xml>")
BrightScript Debugger> ? x.genXML(false)
<xml><tag> bar </tag></xml>
genXML should have reconstituted (more or less) the original, but it seems "foo" and "qux" have been lost in translation, even "foo", which should have been the getText() of <xml>. Bugs?

PS. In the sample Python libraries, "foo" becomes the .text of <xml/> and "qux" the .tail of <tag/>; everything is preserved and accessible.
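The PS can be checked directly. A short sketch with Python's xml.etree.ElementTree on the simplified example above: " foo " lands in the .text of <xml>, " qux " in the .tail of <tag>, and serializing the tree gives back the original string, which is roughly what genXML(false) was expected to do.

```python
import xml.etree.ElementTree as ET

SOURCE = "<xml> foo <tag> bar </tag> qux </xml>"

root = ET.fromstring(SOURCE)
tag = root[0]

print(repr(root.text))  # ' foo ' : text before the first child
print(repr(tag.text))   # ' bar ' : text inside <tag>
print(repr(tag.tail))   # ' qux ' : text after </tag>, still inside <xml>

# Nothing was dropped, so serializing reproduces the source string.
print(ET.tostring(root, encoding="unicode"))
```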

PPS. I can think of an alternative representation too, in which <xml/> has 3 children: [0] is the string "foo" (or an element with an empty getName and "foo" as getText), [1] is <tag> as usual, and [2] is the "qux" text (a string or another element with an empty name). ifXMLElement.getText() would then have to be redefined as "returns the first text contained in the element". This is less hacky and closer to the spirit of XML, but it likely requires more changes in the parser and may surprise existing BRS code that relies rigidly on the sequence it gets from getChildElements().
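The representation in the PPS (text fragments interleaved with child elements as siblings) can be derived mechanically from the .text/.tail model. A sketch in Python; child_nodes is a hypothetical helper name, not a roXML or ElementTree API, and whitespace-only fragments are kept so nothing is lost.

```python
import xml.etree.ElementTree as ET

def child_nodes(elem):
    """Hypothetical helper: elem's direct content in document order,
    text fragments as str and child elements as Element objects."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes

root = ET.fromstring("<xml> foo <tag> bar </tag> qux </xml>")
for node in child_nodes(root):
    print(repr(node) if isinstance(node, str) else "<element %s>" % node.tag)
```

Under this model, "the first text contained in the element" is simply the first str in the list, and code that wants only elements can filter them out.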
Last edited by EnTerr on Sun Feb 16, 2014 1:06 pm, edited 1 time in total.
 
joetesta
Posts: 790
Joined: Wed Apr 20, 2011 11:48 am

Re: XML Orphaned Text

Tue Feb 11, 2014 10:15 pm

EnTerr wrote:
joetesta wrote:
I bet you can get it if you put another wrapper around it; assuming you need to maintain html compatibility I'd use <span>

No, the question was how to work with the XML as given, not to mangle it server-side.


It may be well formed XML and I may not have answered the question, but if you need this to happen now, I bet double or nothing my solution works.
Last edited by joetesta on Tue Feb 11, 2014 10:23 pm, edited 2 times in total.
 
gonzotek
** Valued Community Member **
Posts: 2206
Joined: Thu May 06, 2010 12:40 pm

Re: XML Orphaned Text

Tue Feb 11, 2014 10:21 pm

joetesta wrote:
EnTerr wrote:
joetesta wrote:
I bet you can get it if you put another wrapper around it; assuming you need to maintain html compatibility I'd use <span>

No, the question was how to work with the XML as given, not to mangle it server-side.


It may be well formed XML and I may not have answered the question, but if you need this to happen now, I bet double or nothing my solution works.

Sure, if you have control over the server side. What if the XML is coming from some embedded device or other server you have no control over?
Remoku.tv - A free web app for Roku Remote Control!
Want to control your Roku from nearly any phone, computer or tablet? Get started at http://help.remoku.tv
by Apps4TV - Applications for television and beyond: http://www.apps4tv.com
 
wlwest82
Topic Author
Posts: 2
Joined: Fri Feb 07, 2014 9:49 am

Re: XML Orphaned Text

Tue Feb 11, 2014 10:23 pm

joetesta wrote:
I bet you can get it if you put another wrapper around it; assuming you need to maintain html compatibility I'd use <span>

<xml>
    <tag1 class="class1">
        <a href="http://www.google.com">Google!</a>
        <span>Some more text goes here</span>
    </tag1>
</xml>


and see if you can get it with Parser.tag1.span.getText()


I ended up working around my problem using essentially this approach. In my actual XML there was a <br /> tag after the </a> that I didn't need, so I used a roRegex to find all <br /> instances and replace them with </tag1><tag1 class="text">. After the ReplaceAll, my XML looked like this (minus all the nice formatting):
<xml>
    <tag1 class="class1">
        <a href="http://www.google.com">Google!</a>
    </tag1>
    <tag1 class="text">
        Some more text goes here
    </tag1>
</xml>


I was then able to get to the text I needed with Parser.tag1[1].getText(). I would still be interested in how this is supposed to be done using the XML API.
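For comparison, the same string-level workaround sketched in Python, with re.sub standing in for roRegex's ReplaceAll. The input is an assumed reconstruction of the real feed (the question's XML with a <br /> right after </a>); like the original trick, it only works because splitting tag1 at the <br /> still leaves well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of the real feed: the question's XML plus a <br /> after </a>.
RAW = """<xml>
    <tag1 class="class1">
        <a href="http://www.google.com">Google!</a><br />
        Some more text goes here
    </tag1>
</xml>"""

# Close the current <tag1> at each <br /> and open a new one, so the
# trailing text becomes the .text of its own element.
patched = re.sub(r"<br\s*/?>", '</tag1><tag1 class="text">', RAW)

root = ET.fromstring(patched)
tags = root.findall("tag1")
print(tags[1].get("class"), "->", tags[1].text.strip())
```

The rewrite is brittle: any feed where <br /> appears somewhere other than directly inside tag1 would produce mismatched tags.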

PS: I don't have control over the server-side output. In a bit of an "Aha!" moment, I realized I could apply the approach joetesta suggested on my end.
Last edited by wlwest82 on Tue Feb 11, 2014 10:28 pm, edited 1 time in total.
 
joetesta
Posts: 790
Joined: Wed Apr 20, 2011 11:48 am

Re: XML Orphaned Text

Tue Feb 11, 2014 10:24 pm

gonzotek wrote:
joetesta wrote:
It may be well formed XML and I may not have answered the question, but if you need this to happen now, I bet double or nothing my solution works.

Sure, if you have control over the server side. What if the XML is coming from some embedded device or other server you have no control over?


Then you'd be up a creek. Fortunately for wlwest82 that wasn't the case :)
 
EnTerr
** Valued Community Member **
Posts: 3834
Joined: Sun Jan 02, 2011 2:41 am

Re: XML Offal Text

Sat Feb 15, 2014 1:56 pm

EnTerr wrote:
... Simplifying the BRS example even more:
BrightScript Debugger> x = CreateObject("roXMLElement")
BrightScript Debugger> x.parse("<xml> foo <tag> bar </tag> qux </xml>")
BrightScript Debugger> ? x.genXML(false)
<xml><tag> bar </tag></xml>
genXML should have reconstituted (more or less) the original, but it seems "foo" and "qux" have been lost in translation, even "foo", which should have been the getText() of <xml>. Bugs?

Would somebody with a Roku* name please respond: how are such texts handled by roXML?

Here is another example snippet (http://www.xmlnews.org/docs/xml-basics.html#elements):
<p><person>Tony Blair</person> is <function>Prime Minister</function> of <location><country>Great Britain</country></location></p>


The texts in question are " is " and " of ". Where are they after roXMLElement.parse()?
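For reference, here is where a parser that keeps mixed content puts those two strings. A sketch with Python's xml.etree.ElementTree (not roXML): each in-between text is the .tail of the element it follows, and joining all text fragments in order rebuilds the sentence.

```python
import xml.etree.ElementTree as ET

P = ("<p><person>Tony Blair</person> is <function>Prime Minister</function>"
     " of <location><country>Great Britain</country></location></p>")

p = ET.fromstring(P)
for child in p:
    # .tail holds the text between this child's end tag and what follows it.
    print(child.tag, repr(child.tail))

# Every text fragment, in document order, joined back together.
print("".join(p.itertext()))
```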

PS. For real-life examples, try the news industry's NITF format, e.g. http://www.iptc.org/std/NITF/3.2/exampl ... ishing.xml (the <body.content> element).
 
EnTerr
** Valued Community Member **
Posts: 3834
Joined: Sun Jan 02, 2011 2:41 am

Re: XML Offal Text

Wed Feb 19, 2014 2:44 pm

bump!
Here is yet another trivial example; will it blend?
http://oreilly.com/catalog/learnxml/cha ... ml#anatomy
 
EnTerr
** Valued Community Member **
Posts: 3834
Joined: Sun Jan 02, 2011 2:41 am

Re: XML Offal Text

Sun Jun 29, 2014 12:31 pm

Update: I just noticed that a new method has sprung up, ifXMLElement.GetChildNodes(), and it seems designed to address the issue from this thread.

Other changes have been made to roXMLElement too: parsing seems to have been improved so that such additional texts no longer get lost, and getText() now appears to concatenate them. An example is worth a thousand words:
BrightScript Debugger> x = CreateObject("roXMLElement")
BrightScript Debugger> x.parse("<xml> foo <tag> bar </tag> qux </xml>")
BrightScript Debugger> ? x.genXML(false)    'before, the output of this was not right
<xml> foo <tag> bar </tag> qux </xml>
BrightScript Debugger> ? x.getText()
 foo  qux
BrightScript Debugger> ? x.getChildElements()
<Component: roXMLElement>

BrightScript Debugger> ? x.getChildNodes()
 foo
<Component: roXMLElement>
 qux
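The new getText() result can be mimicked in Python for comparison: concatenate the element's own text fragments (its .text plus each child's .tail) and skip the text nested inside children. own_text is a hypothetical helper; the debugger output above does not make clear whether roXML keeps the trailing space, so this version simply keeps all whitespace.

```python
import xml.etree.ElementTree as ET

def own_text(elem):
    """Hypothetical helper: text directly inside elem, i.e. its .text plus
    the .tail of each child. Text nested inside the children is excluded."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts)

root = ET.fromstring("<xml> foo <tag> bar </tag> qux </xml>")
print(repr(own_text(root)))
```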

It's good news. My hopes that somebody in RokuCo is listening are rekindled!

Unfortunately, nobody thought to drop a note here to let us know about the change, nor was it mentioned in the Release Notes.
Which firmware does the new method work in? Shouldn't this be documented?
