Jsoup Examples

This guide walks through several jsoup examples: getting the title, the links, the images and the meta data of a URL or an HTML document, and reading form parameters.

Get title of URL

    Document doc = Jsoup.connect("http://www.Adglob.in").get();
    String title = doc.title();

Let's see a jsoup example that prints the title of a URL, e.g. www.Adglob.in. The Jsoup.connect() method creates a connection to the URL, and get() fetches the page and returns a Document object. The Document class provides the title() method, which returns the title of the document.

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class FirstJsoupExample {
        public static void main(String[] args) throws IOException {
            // Fetch the page and parse it into a Document
            Document doc = Jsoup.connect("http://www.Adglob.in").get();
            String title = doc.title();
            System.out.println("title is: " + title);
        }
    }

Output:

title is: Adglob - A Solution of all Technology
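To try title() without a network connection, the same call can be made on an HTML string held in memory. This is only a minimal sketch, assuming jsoup is on the classpath; the class name TitleFromString and the sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TitleFromString {
    // Parses an HTML string and returns the text of its <title> element.
    // title() gives an empty string when the document has no <title>.
    static String extractTitle(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Sample Page</title></head><body></body></html>";
        System.out.println("title is: " + extractTitle(html));
    }
}
```

Keeping the parsing in a small static method makes it easy to check against fixed input before pointing it at a live site.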

Get title from HTML file

    Document doc = Jsoup.parse(new File("e:\\register.html"), "utf-8"); // assuming register.html is on the E: drive
    String title = doc.title();

In this example, we get the title of an HTML page stored in a file. To do so, we call the Jsoup.parse() method, which returns a Document. The title() method of the Document class returns the title of the HTML document.

    import java.io.File;
    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class JsoupPrintTitlefromHtml {
        public static void main(String[] args) throws IOException {
            // Parse the local file, decoding it as UTF-8
            Document doc = Jsoup.parse(new File("e:\\register.html"), "utf-8");
            String title = doc.title();
            System.out.println("title is: " + title);
        }
    }

Output:

title is: Register Please
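The path e:\\register.html only exists on a machine with that drive and file. As a rough, portable variant, the sketch below writes some HTML to a temporary file and parses that file with the same Jsoup.parse(File, String) call. The class name TitleFromTempFile and the helper titleOf() are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TitleFromTempFile {
    // Writes the HTML to a temporary file, parses the file, and returns its title.
    static String titleOf(String html) {
        try {
            Path tmp = Files.createTempFile("sample", ".html");
            try {
                Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
                Document doc = Jsoup.parse(tmp.toFile(), "UTF-8");
                return doc.title();
            } finally {
                // Clean up the temporary file whether or not parsing succeeded
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Register Please</title></head><body></body></html>";
        System.out.println("title is: " + titleOf(html));
    }
}
```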

Get total links of URL

    Document doc = Jsoup.connect("http://www.Adglob.in").get();
    Elements links = doc.select("a[href]");
    for (Element link : links) {
        System.out.println("\nlink : " + link.attr("href"));
        System.out.println("text : " + link.text());
    }

In this example, we print all the links of a URL. To do so, we call the select() method of the Document class, which returns an Elements object. Elements is a list of Element objects that can be traversed with a for-each loop. The Element class provides the attr() and text() methods, which return the target and the text of each link.

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupPrintLinks {
        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.connect("http://www.Adglob.in").get();
            // Select every <a> element that has an href attribute
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                System.out.println("\nlink : " + link.attr("href"));
                System.out.println("text : " + link.text());
            }
        }
    }

Output:

link : http://www.Adglob.in/contribute-us
text : Contribute Us

link : http://www.Adglob.in/asknewquestion.jsp
text : Ask Question

link : http://www.Adglob.in/login.jsp
text : login

.....
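The program above prints each href exactly as it is written in the page, so relative links stay relative, and it never prints a count. The following sketch shows one possible way to get both the number of links and their absolute form, using absUrl() together with a base URI. The class name AbsoluteLinks, the sample markup and the example.com base URI are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class AbsoluteLinks {
    // Returns the target of every a[href], resolved against baseUri.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Elements links = doc.select("a[href]");
        List<String> urls = new ArrayList<>();
        for (Element link : links) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href='/login.jsp'>login</a>"
                + "<a href='http://example.com/faq'>FAQ</a>"
                + "<a>anchor without href</a>";
        List<String> urls = absoluteLinks(html, "http://example.com/");
        System.out.println("total links : " + urls.size());
        for (String url : urls) {
            System.out.println("link : " + url);
        }
    }
}
```

When the document comes from Jsoup.connect(...).get(), the base URI is already set to the fetched URL, so absUrl("href") can be called without passing one explicitly.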

Get meta information of URL

    Document doc = Jsoup.connect("http://www.Adglob.in").get();
    String keywords = doc.select("meta[name=keywords]").first().attr("content");
    System.out.println("Meta keyword : " + keywords);
    String description = doc.select("meta[name=description]").get(0).attr("content");
    System.out.println("Meta description : " + description);

In this example, we print the meta keywords and description of a URL. To do so, we call select() on the Document, first() or get() on the returned Elements, and attr() on the selected Element.

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class JsoupPrintMetadata {
        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.connect("http://www.Adglob.in").get();

            // first() and get(0) both pick the first matching <meta> element
            String keywords = doc.select("meta[name=keywords]").first().attr("content");
            System.out.println("Meta keyword : " + keywords);
            String description = doc.select("meta[name=description]").get(0).attr("content");
            System.out.println("Meta description : " + description);
        }
    }

Output:

Meta keyword : jsoup, chapter, beginners, professionals, introduction, example, java, html, parser
Meta description : Jsoup chapter for beginners and professionals provides html parsing facility in java with examples of printing title, links, images, form elements from url.
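Not every page declares these meta tags. When nothing matches, first() returns null and get(0) throws an IndexOutOfBoundsException, so the program above would stop with an exception. A defensive variant might look like the sketch below, which relies on selectFirst() (available in recent jsoup releases). The class name SafeMeta and the helper metaContent() are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SafeMeta {
    // Returns the content of <meta name="..."> or "" when the tag is absent.
    static String metaContent(Document doc, String name) {
        Element meta = doc.selectFirst("meta[name=" + name + "]");
        return meta == null ? "" : meta.attr("content");
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name='description' content='A sample page'>"
                + "</head><body></body></html>";
        Document doc = Jsoup.parse(html);
        System.out.println("Meta description : " + metaContent(doc, "description"));
        // No keywords tag in the sample, so this prints an empty value
        System.out.println("Meta keyword : " + metaContent(doc, "keywords"));
    }
}
```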

Get total images of URL

    Document doc = Jsoup.connect("http://www.Adglob.in").get();
    Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
    for (Element image : images) {
        System.out.println("src : " + image.attr("src"));
        System.out.println("height : " + image.attr("height"));
        System.out.println("width : " + image.attr("width"));
        System.out.println("alt : " + image.attr("alt"));
    }

In this example, we print all the images of a URL. To do so, we call the select() method with "img[src~=(?i)\\.(png|jpe?g|gif)]" as the parameter, so that only png, jpeg or gif images are printed. The (?i) flag makes the match on the file extension case-insensitive.

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupPrintImages {
        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.connect("http://www.Adglob.in").get();
            // Keep only <img> elements whose src contains .png, .jpg, .jpeg or .gif
            Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
            for (Element image : images) {
                System.out.println("src : " + image.attr("src"));
                System.out.println("height : " + image.attr("height"));
                System.out.println("width : " + image.attr("width"));
                System.out.println("alt : " + image.attr("alt"));
            }
        }
    }

Output:

src : http://www.Adglob.in/images/social/r.png
height : 
width : 
alt : RSS Feed
src : http://www.Adglob.in/images/social/m.png
height : 
width : 
alt : Subscribe to Get Email Alerts
src : http://www.Adglob.in/images/social/f.png
height : 
width : 
alt : Facebook Page
src : http://www.Adglob.in/images/social/g.png
height : 
width : 
alt : Google Page
src : http://www.Adglob.in/images/social/t.png
height : 
width : 
alt : Twitter Page
src : images/logo/javahome.png
height : 
width : 
alt : Java chapter
src : images/logo/javascripthome.png
height : 
width : 
alt : JavaScript chapter
src : images/logo/sqlhome.png
height : 
width : 
alt : SQL chapter
src : images/logo/androidhome.png
height : 
width : 
alt : Android chapter
src : images/logo/clanguagehome.png
height : 
width : 
alt : C Language chapter
src : images/logo/html-chapter.png
height : 
width : 
alt : html chapter
src : images/logo/pythonhome.png
height : 
width : 
alt : Python chapter
src : images/logo/ajaxhome.png
height : 
width : 
alt : AJAX chapter
src : images/logo/cloudhome.png
height : 
width : 
alt : Cloud chapter
src : images/logo/javahome.png
height : 
width : 
alt : Core Java chapter
src : images/logo/javahome.png
height : 
width : 
alt : Java Servlet chapter
src : images/logo/jsphome.png
height : 
width : 
alt : Java JSP chapter
src : images/logo/javahome.png
height : 
width : 
alt : EJB chapter
src : images/logo/javahome.png
height : 
width : 
alt : JAXB chapter
src : images/logo/strutshome.png
height : 
width : 
alt : Struts chapter
src : images/logo/hibernatehome.png
height : 
width : 
alt : Hibernate chapter
src : images/logo/springhome.png
height : 
width : 
alt : Spring chapter
src : images/logo/javahome.png
height : 
width : 
alt : Java Mail chapter
src : images/logo/javahome.png
height : 
width : 
alt : Java Design Pattern chapter
src : images/logo/javahome.png
height : 
width : 
alt : JUnit chapter
src : images/logo/strutshome.png
height : 
width : 
alt : Maven chapter
src : images/logo/interviewhome.png
height : 
width : 
alt : Interview Questions
src : images/logo/projecthome.png
height : 
width : 
alt : Free Projects
src : images/logo/forumhome3.png
height : 
width : 
alt : Forum chapter
src : images/logo/quizhome.png
height : 
width : 
alt : Online quiz
src : images/logo/javacompiler.png
height : 
width : 
alt : Online java compiler
src : images/sonoo9.jpg
height : 
width : 
alt : sonoo jaiswal
src : http://www.Adglob.in/images/social/rss1.png
height : 
width : 
alt : RSS Feed
src : http://www.Adglob.in/images/social/mail1.png
height : 
width : 
alt : Subscribe to Get Email Alerts
src : http://www.Adglob.in/images/social/facebook1.jpg
height : 
width : 
alt : Facebook Page
src : http://www.Adglob.in/images/social/google1.png
height : 
width : 
alt : Google Page
src : http://www.Adglob.in/images/social/twitter1.png
height : 
width : 
alt : Twitter Page
src : http://www.Adglob.in/images/social/blog.png
height : 
width : 
alt : Blog Page
src : http://images.dmca.com/Badges/dmca_protected_sml_120c.png?ID=e8b533d5-7356-47f5-820b-72c890f03a4e
height : 
width : 
alt : DMCA.com
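To see what the image selector accepts and rejects, it can be run against a small hand-written snippet instead of a live page. This is just an illustrative sketch; the class name ImageFilter and the sample file names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ImageFilter {
    // Returns the src of every <img> whose src matches the png/jpeg/gif pattern.
    static List<String> imageSources(String html) {
        Document doc = Jsoup.parse(html);
        Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
        List<String> sources = new ArrayList<>();
        for (Element image : images) {
            sources.add(image.attr("src"));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<img src='a.PNG'><img src='b.jpeg'>"
                + "<img src='c.svg'><img alt='no src'>";
        List<String> sources = imageSources(html);
        System.out.println("total images : " + sources.size());
        for (String src : sources) {
            System.out.println("src : " + src);
        }
    }
}
```

Here a.PNG and b.jpeg are selected, while c.svg and the image without a src attribute are skipped. Note that the pattern is not anchored to the end of the value, so a src such as photo.png.html would also match.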

Get form parameters

    Document doc = Jsoup.parse(new File("e:\\register.html"), "utf-8");
    Element loginform = doc.getElementById("registerform");

    Elements inputElements = loginform.getElementsByTag("input");
    for (Element inputElement : inputElements) {
        String key = inputElement.attr("name");
        String value = inputElement.attr("value");
        System.out.println("Param name: " + key + " \nParam value: " + value);
    }

In this example, we print the form parameters, that is, each parameter name and its value. To do so, we call the getElementById() method of the Document class and the getElementsByTag() method of the Element class.

register.html

    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>Register Please</title>
    </head>
    <body>
    <form id="registerform" action="register.jsp" method="post">
    Name:<input type="text" name="name" value="sonoo"/><br/>
    Password:<input type="password" name="password" value="sj"/><br/>
    Email:<input type="email" name="email" value="sonoojaiswal1987@gmail.com"/><br/>
    <input name="submitbutton" type="submit" value="register"/>
    </form>
    </body>
    </html>

JsoupPrintFormParameters.java

    import java.io.File;
    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupPrintFormParameters {
        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.parse(new File("e:\\register.html"), "utf-8");
            // Look up the form by its id attribute
            Element loginform = doc.getElementById("registerform");

            // Collect every <input> inside the form
            Elements inputElements = loginform.getElementsByTag("input");
            for (Element inputElement : inputElements) {
                String key = inputElement.attr("name");
                String value = inputElement.attr("value");
                System.out.println("Param name: " + key + " \nParam value: " + value);
            }
        }
    }

Output:

Param name: name 
Param value: sonoo
Param name: password 
Param value: sj
Param name: email 
Param value: sonoojaiswal1987@gmail.com
Param name: submitbutton 
Param value: register
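If the parameters are needed as data rather than printed text, they can be collected into a map. The sketch below is one way to do that, assuming the form is looked up by id as above; it also returns an empty map when no element has that id, since getElementById() returns null in that case. The class name FormParams and the helper params() are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FormParams {
    // Maps each named <input> inside the form to its value, in document order.
    static Map<String, String> params(String html, String formId) {
        Document doc = Jsoup.parse(html);
        Element form = doc.getElementById(formId);
        Map<String, String> result = new LinkedHashMap<>();
        if (form == null) {
            return result; // no element with that id
        }
        for (Element input : form.select("input[name]")) {
            // val() reads the value attribute of an input element
            result.put(input.attr("name"), input.val());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<form id='registerform'>"
                + "<input type='text' name='name' value='sonoo'/>"
                + "<input type='password' name='password' value='sj'/>"
                + "</form>";
        for (Map.Entry<String, String> e : params(html, "registerform").entrySet()) {
            System.out.println("Param name: " + e.getKey() + " \nParam value: " + e.getValue());
        }
    }
}
```

A LinkedHashMap keeps the parameters in the order they appear in the form, which matches the order of the printed output in the example above.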
