Ticket #62 (closed enhancement: wontfix)

Opened 4 years ago

Last modified 3 years ago

StreamListener: parse_stream: halt processing of the stream part way through

Reported by: dcgrigsby
Owned by: ser
Priority: normal
Milestone: 3.1.6
Component: Stream
Version: 3.1.3
Severity: normal
Keywords:
Cc: jmcauley
Ruby version: 1.8.2
Operating system: Linux

Description

I'd like to be able to signal parse_stream to halt processing of the stream part way through.

Perhaps the methods could return a false value to indicate that the parser ought not continue.

Thanks! REXML is fantastic. It's a pleasure to program with.

Change History

Changed 4 years ago by dcgrigsby

I was thinking: I could probably just raise an exception from my StreamListener and then handle it up in the code that called parse_stream, couldn't I?
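A minimal sketch of that idea, assuming the exception propagates out of parse_stream unchanged (which depends on the parser not catching it along the way). The StopParsing class and the FirstSectionListener class are hypothetical names invented for this illustration; only REXML::StreamListener and REXML::Document.parse_stream come from REXML itself.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical exception class, used only to signal an early stop.
class StopParsing < StandardError; end

# Hypothetical listener: records the id of the first fragment section,
# then asks for parsing to stop by raising.
class FirstSectionListener
  include REXML::StreamListener

  attr_reader :id, :seen

  def initialize
    @seen = []
  end

  def tag_start(name, attrs)
    @seen << name
    return unless name == 'section' && attrs['type'] == 'fragment'

    @id = attrs['id']
    raise StopParsing
  end
end

xml = <<~XML
  <doc>
    <section type="fragment" id="s1"/>
    <section type="fragment" id="s2"/>
  </doc>
XML

listener = FirstSectionListener.new
begin
  REXML::Document.parse_stream(xml, listener)
rescue StopParsing
  # Expected path: the listener asked to stop early.
end
```

Using a dedicated exception class keeps the rescue narrow, so genuine parse errors are not swallowed by the caller.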

Changed 4 years ago by ser

  • status changed from new to assigned

Perhaps. The exception may be caught in the parsing code somewhere, but that's a good idea. I'll also add a "nicer" option via the API.

Changed 4 years ago by ser

John McAuley came up with an intriguing solution: allow the listener methods (or blocks) to return false, and stop when the parser sees it. To make this backward compatible, it must be false (not nil).

This works better for the StreamParser than SAX2, because SAX2 uses blocks, and there are problems with using return in procs, but it is still manageable, if awkward. It also avoids the whole encapsulation problem; again, this is only a problem with SAX2, in that the break function would have to be on the parser, and the blocks would have to have access to the parser to call break, which is sort of ugly. With StreamParser, we could just define a stop? on the Listener API and have the parser check that on each Listener. On the other hand, that's more API for the users to have to implement, and it is much easier to just allow them to return false.

The more I think about it, the better it sounds. I don't think it'll incur any significant additional penalty over using break, and it seems to be cleaner.
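To illustrate the problem with return in procs mentioned above, here is a small sketch in plain Ruby. The register helper is a hypothetical stand-in for a SAX2-style parser that stores a handler block; it is not part of REXML. A return inside a block belongs to the method that created the block, so once that method has finished, calling the block raises LocalJumpError. Using next instead ends only the block call and supplies its value.

```ruby
# Hypothetical stand-in for a SAX2-style parser that stores a handler block.
def register(&block)
  block
end

# The block's `return` belongs to this method, which has already finished
# by the time the stored block is called.
def handler_using_return
  register do |name|
    return false if name == 'section'
    true
  end
end

# `next` ends only the block call and supplies its value.
def handler_using_next
  register do |name|
    next false if name == 'section'
    true
  end
end

returning = handler_using_return
outcome =
  begin
    returning.call('section')
  rescue LocalJumpError
    :local_jump_error
  end

nexting = handler_using_next
stop_signal = nexting.call('section')
keep_going = nexting.call('para')
```

This is why a return-false convention is awkward but still manageable for block-based handlers: the block has to produce false as its value rather than return it.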

Changed 4 years ago by jmcauley

  • cc jmcauley added

Changing the behaviour of the return value would probably break a lot of people's code because they didn't need to care what their handlers returned before. So it could be pot luck what they return.

For example, this code works OK if the parser ignores the return value but quits prematurely if the first name found is not 'section'.

def tag_start name, attrs
  if name=='section' 
    @id = attrs['id'] if attrs['type']=='fragment'
  end
end

If I want it to quit the second it gets the section, but not before :-), then I have to change the code to ensure that I return the correct value:

def tag_start name, attrs
  if name=='section' 
    @id = attrs['id'] if attrs['type']=='fragment'
    return false
  end

  return true
end

Personally I don't care as I haven't written enough code to worry about it but other people's mileage may vary.

Maybe the parser could be invoked or constructed with a flag that tells it if it should examine the return value. Or just change the behaviour, document the change, and go for bust.

It's your call :-)

Changed 4 years ago by ser

It is possible that the last call in code could result in false, but I'm not convinced that this would be common. It isn't in the example you provide:

def tag_start name, attrs
  if name=='section' 
    @id = attrs['id'] if attrs['type']=='fragment'
  end
end

If name does not equal 'section', tag_start() evaluates to nil, not false. The REXML parser code would be something like:

  # got a tag start ... 
  stop_parsing = false
  for listener in @listeners
    stop_parsing |= ( listener.tag_start( name, attrs ) == false )
  end
  # exit if stop_parsing == true

This would allow all listeners to get the event and quit parsing before the next event. The only time this would have unwanted effects is in listener code that evaluates to false -- that is, the last statement evaluates to false. I'm not sure that's common.
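A runnable sketch of that loop, outside REXML. The dispatch method is a hypothetical stand-in for the parser's event loop, and both listener classes are invented for illustration. It shows that the original tag_start never evaluates to false, so it never stops the parse, while a listener that explicitly evaluates to false stops it after every listener has seen the current event.

```ruby
# Hypothetical listener: the tag_start from the example above, unchanged.
class IdCollector
  attr_reader :id

  def tag_start(name, attrs)
    if name == 'section'
      @id = attrs['id'] if attrs['type'] == 'fragment'
    end
  end
end

# Hypothetical listener that evaluates to false on 'section'.
class StopAtSection
  def tag_start(name, _attrs)
    name != 'section'
  end
end

# Hypothetical stand-in for the parser loop: every listener receives the
# event, and the loop exits before the next event if any returned false.
def dispatch(events, listeners)
  delivered = []
  events.each do |name, attrs|
    stop_parsing = false
    listeners.each do |listener|
      stop_parsing |= (listener.tag_start(name, attrs) == false)
    end
    delivered << name
    break if stop_parsing
  end
  delivered
end

events = [
  ['doc', {}],
  ['para', {}],
  ['section', { 'type' => 'fragment', 'id' => 's1' }],
  ['para', {}]
]

# nil and non-false values never stop the loop.
only_collector = dispatch(events, [IdCollector.new])

# The collector is listed second, yet still receives the 'section' event.
collector = IdCollector.new
with_stopper = dispatch(events, [StopAtSection.new, collector])
```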

Changed 4 years ago by jmcauley

I see what you mean about nil.

What does it mean when one of many listeners returns false?

Does it mean 'I don't want to listen anymore'? or does it mean 'nobody wants to listen anymore'?

Should the distinction be made between ending the parse 'session' and ending a listener's involvement in the parse session?
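One way to picture the second reading, as a sketch only: a listener that returns false is dropped, and the session ends once nobody is left listening. The dispatch_until_unheard method and the lambdas are hypothetical; nothing like this exists in REXML.

```ruby
# Hypothetical policy: false means "I don't want to listen anymore".
# The session ends only when no listeners remain.
def dispatch_until_unheard(events, listeners)
  active = listeners.dup
  delivered = []
  events.each do |event|
    break if active.empty?

    delivered << event
    active.reject! { |listener| listener.call(event) == false }
  end
  delivered
end

first_log = []
all_log = []
stop_after_b = lambda { |e| first_log << e; e != 'b' }
hear_all = lambda { |e| all_log << e; nil }

# One listener drops out at 'b'; the other keeps the session alive.
shared = dispatch_until_unheard(%w[a b c], [stop_after_b, hear_all])

solo_log = []
solo = lambda { |e| solo_log << e; e != 'b' }

# With a single listener, dropping out also ends the session.
alone = dispatch_until_unheard(%w[a b c], [solo])
```

Under the first reading ('nobody wants to listen anymore'), the loop would instead break as soon as any listener returned false.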

Changed 3 years ago by ser

  • status changed from assigned to closed
  • resolution set to wontfix

After giving this much thought, I've determined that the best solution is to not implement a special stopping mechanism. My reasons are these:

  1. Adding a stopping mechanism would introduce an additional branch to each parse event, which would have a negative impact on parsing speed.
  2. Wanting to stop parsing is probably the unusual case, rather than the common case.
  3. It doesn't make much sense to slow the parser for the common case, when a solution (throwing an exception to stop parsing) currently exists.
  4. Java's SAX parser behaves the same way: if you want to break parsing, you throw an exception; there is no other mechanism. While this fact doesn't influence me much, it does mean that there's some additional orthogonality between the Java and Ruby parsers.

If anybody strongly objects, I can re-open this.
