JavaScript: Extract all text from possibly nested <span>s on a webpage

I have a webpage with a variety of text snippets enclosed in <span class="x"></span> tags. I'd like to generate an ordered list of each such snippet. Straightforward enough.

The wrinkle: It frequently occurs that there are additional <span class="x"> tags nested inside the outer ones, which I don't care about. Essentially, I want a list of every string that is within at least one <span class="x"> tag, but any additional nested such tags should be ignored and discarded.

Here is some example HTML:

<p>
  Outer text. <span class="x">Inside a single span.</span> Back to outer text once more. <span class="x"><span class="x">Inside two spans</span> or just one</span>. Perhaps a <span class="x">single span contains <span class="x">several</span> 
  <span class="x">nests</span>  <span class="x">within <span class="x">it</span>
  </span>!</span>
</p>
<span class="x">Maybe there's a span out here.</span><span class="x">(Or two.)</span>
<p>
  <table>
    <tr>
      <td>
        <span class="x">Or <span class="x">in</span><span class="x">here</span></span>.
      </td>
    </tr>
  </table>
</p>
<p>
  <span>No.</span>  <span>Still no, but<span class="x">yes</span>.</span>
</p>

along with my desired output:

[ "Inside a single span.",
  "Inside two spans or just one",
  "single span contains several nests within it!",
  "Maybe there's a span out here.",
  "(Or two.)",
  "Or inhere",
  "yes" ]

Specific features of this example I'd like to call attention to:

  • The outermost span can occur at any depth within the larger HTML document.
  • The spans can be nested arbitrarily deep. (Though in practice I haven't found any instances with more than 3 or 4 layers so far)
  • There may or may not be whitespace between neighboring outer spans; I'd like their contents parsed as separate strings either way.
  • Span tags without class "x" are not desired.
  • There may or may not be whitespace between neighboring inner tags; I'd like to preserve this as-is.
  • I do not anticipate any <span class="x"> tag containing any HTML tags other than additional nested <span class="x"> tags.

I would be happy with a JavaScript + jQuery solution, or a Python3 + BeautifulSoup solution, or something else entirely if it is sufficiently better suited to the task at hand than either of those.

Answer:1

You can get a complete list of the text in JavaScript with a simple jQuery statement:

$("span.x").map(function(e) {return $(this).text() == "" ? null : $(this).text()})

It's up to you how to use it.

Answer:2

First get the topmost spans with class x by checking that they don't have a parent with class x. Then get the innerText of these.

var topMost = $('span.x').filter(function() {
  return !$(this).parents('.x').length;
});

var texts = topMost.map(function() {
  return this.innerText;
});

console.log(texts);
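
For anyone on the Python side, the same "outermost spans only" idea can be sketched without jQuery. This is a rough sketch rather than part of the answer above: it assumes only the standard library's html.parser, and the names OuterSpanText and outer_span_texts are made up for illustration. It counts how many span.x tags are currently open, collects text while that count is above zero, and emits one string each time the count drops back to zero.

```python
from html.parser import HTMLParser


class OuterSpanText(HTMLParser):
    """Collect the text of each outermost <span class="x"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one bool per open <span>: True if it has class "x"
        self.depth = 0     # number of currently open class-"x" spans
        self.parts = []    # text fragments of the current outermost span
        self.results = []  # one finished string per outermost span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        is_x = "x" in classes
        self.stack.append(is_x)
        if is_x:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag != "span" or not self.stack:
            return
        if self.stack.pop():
            self.depth -= 1
            if self.depth == 0:
                # The outermost class-"x" span just closed: emit its text.
                self.results.append("".join(self.parts))
                self.parts = []

    def handle_data(self, data):
        # Keep text only while inside at least one class-"x" span.
        if self.depth > 0:
            self.parts.append(data)


def outer_span_texts(html):
    parser = OuterSpanText()
    parser.feed(html)
    parser.close()
    return parser.results


sample = (
    '<p>Outer. <span class="x">Inside a single span.</span> more '
    '<span class="x"><span class="x">Inside two spans</span> or just one</span>.</p>'
    '<span>No.</span> <span>Still no, but<span class="x">yes</span>.</span>'
)
print(outer_span_texts(sample))
```

Because the parser only counts span tags, neighboring outer spans come out as separate strings whether or not there is whitespace between them, and whitespace inside a span is kept as-is.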

Answer:3

Replacing the inner span tags with empty strings should do the job:

var st = [];
$("span.x").each(function() {
    // Use global regexes so every nested tag is stripped, not just the first one
    st.push($(this).html().replace(/<span class="x">/g, '').replace(/<\/span>/g, ''));
});

console.log(st);

This is a bit dirty, but you get the idea.

Answer:4

Try:

$('span.x').each(function(index, el) {
  // Text of the first child node only
  console.log(el.childNodes[0].textContent);
});

or

$('span.x').each(function(index, el) {
  // Full text of the span, nested spans included
  console.log($(el).text());
});

This is, of course, a jQuery example. It logs the text of each of your spans to the console.

Use this snippet to build your ordered list.

Answer:5

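Not as elegant as the other solutions...

from bs4 import BeautifulSoup

# html is assumed to hold the page source as a string
soup = BeautifulSoup(html, 'html.parser')

spans = soup.find_all('span', {'class':'x'})

# Gather every element nested inside any class-x span
children = []
for span in spans:
    children.extend(span.findChildren())

children = [child.text for child in children]

# Keep only the spans whose text is not the text of a nested child
results = [span.text for span in spans if span.text not in children]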
Answer:6

JS solution:

function detect(elem, rettext = false) {
  var answer = [];
  // Loop through the child nodes (i must be local, since detect() recurses)
  for (let i = 0; i < elem.childNodes.length; i++) {
    const e = elem.childNodes[i];
    if (e.nodeType == 3 && rettext) {
      // Text node directly inside an x element, so add its text
      answer.push(e.textContent);
    } else if ((" " + e.className + " ").replace(/[\n\t]/g, " ").indexOf(" x ") > -1) {
      // e has class x, so join everything inside it into one string
      answer.push(detect(e, true).join(""));
    } else {
      // Not x, so recurse and append whatever is found inside
      const a = detect(e);
      for (let b = 0; b < a.length; b++) {
        answer.push(a[b]);
      }
    }
  }
  return answer;
}

// Run once the window has loaded
window.onload = () => {
  const theansweris = detect(document.body);
  console.log(theansweris);
};

This function loops through all elements of the HTML tree. If an element has class x, everything found inside it is joined into one string, with its direct text nodes included.

Note: This uses ES6. If you don't know what that is, please leave a comment and I'll explain it.
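
The same recursive walk can also be sketched in Python. This is only an illustration under a loud assumption: the markup has to be well-formed XML so that the standard library's xml.etree.ElementTree can parse it, which real-world HTML often is not. The helper names is_x_span and collect are hypothetical. The walk stops descending as soon as it reaches a span with class x and takes all of the text inside it as one string.

```python
import xml.etree.ElementTree as ET


def is_x_span(elem):
    """True if elem is a <span> whose class list contains "x"."""
    return elem.tag == "span" and "x" in elem.get("class", "").split()


def collect(elem, results):
    """Recursively gather the text of each outermost class-x span."""
    if is_x_span(elem):
        # itertext() yields all text inside elem, nested spans included,
        # but not the text that follows elem's closing tag.
        results.append("".join(elem.itertext()))
        return results
    for child in elem:
        collect(child, results)
    return results


# Assumption: a well-formed fragment wrapped in a single root element.
markup = (
    '<div>'
    '<p>Outer text. <span class="x">Or <span class="x">in</span>'
    '<span class="x">here</span></span>.</p>'
    '<p><span>No.</span> <span>Still no, but<span class="x">yes</span>.</span></p>'
    '</div>'
)
print(collect(ET.fromstring(markup), []))
```

Stopping at the first class-x span plays the role of the rettext flag in the JavaScript version: nested x spans are never visited separately, so they cannot produce duplicate entries.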
