<?xml version="1.0" encoding="UTF-8"?>        <rss version="2.0"
             xmlns:content="http://purl.org/rss/1.0/modules/content/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
                          >
            <channel>
                <title>Comments on: Parsing Web Pages</title>
                <atom:link href="http://www.ussysadmin.com/2009/10/parsing-web-pages/feed/" rel="self" type="application/rss+xml" />
                <link>http://www.ussysadmin.com/2009/10/parsing-web-pages/</link>
                <description>System Administration Weblog</description>
                <lastBuildDate>Sat, 12 Jun 2010 21:28:54 +0000</lastBuildDate>
                <sy:updatePeriod>hourly</sy:updatePeriod>
                <sy:updateFrequency>1</sy:updateFrequency>
                <generator>http://wordpress.org/?v=3.3.1</generator>
                                        <item>
                            <title>By: Jon</title>
                            <link>http://www.ussysadmin.com/2009/10/parsing-web-pages/comment-page-1/#comment-457</link>
                            <dc:creator>Jon</dc:creator>
                            <pubDate>Mon, 26 Oct 2009 06:00:59 +0000</pubDate>
                            <guid isPermaLink="false">http://www.ussysadmin.com/?p=137#comment-457</guid>
                                                            <description>I work for a company that is trying to scrape several sites to keep its pricing competitive with everyone else online. I spent a good month learning the programs out there and trying them until I found one that actually worked. I am very new to &lt;strong&gt;web scraping&lt;/strong&gt;, but not only did I find something that worked, it blew me away. These guys have a great vision: the system is open enough to have the data harvested and then sent back to the user via REST, FTP, or email. Try mozenda.com for your web scraper; I think you will be very happy. I wish I had read this before spending so much time trying the other ones.

In full disclosure, I have to say that I invested in this company (Mozenda Inc.) after trying their tool. The downside is that they charge a monthly fee.

Hope this helps,
Jon</description>
                                <content:encoded><![CDATA[<p>I work for a company that is trying to scrape several sites to keep its pricing competitive with everyone else online. I spent a good month learning the programs out there and trying them until I found one that actually worked. I am very new to <strong>web scraping</strong>, but not only did I find something that worked, it blew me away. These guys have a great vision: the system is open enough to have the data harvested and then sent back to the user via REST, FTP, or email. Try mozenda.com for your web scraper; I think you will be very happy. I wish I had read this before spending so much time trying the other ones.</p>
<p>In full disclosure, I have to say that I invested in this company (Mozenda Inc.) after trying their tool. The downside is that they charge a monthly fee.</p>
<p>Hope this helps,<br />
Jon</p>
]]></content:encoded>
                                                    </item>
                                            <item>
                            <title>By: rsmith</title>
                            <link>http://www.ussysadmin.com/2009/10/parsing-web-pages/comment-page-1/#comment-450</link>
                            <dc:creator>rsmith</dc:creator>
                            <pubDate>Sun, 18 Oct 2009 19:15:35 +0000</pubDate>
                            <guid isPermaLink="false">http://www.ussysadmin.com/?p=137#comment-450</guid>
                                                            <description>&lt;p&gt;You can also parse with Python. Parsing web pages using Python, which is newer than Perl (though slightly older than PHP), can give you fine-grained control over the output you are looking for.&lt;/p&gt;
&lt;p&gt;There are a number of Python tools that can help parse data structures out of a web page, log file, or general text file. To name just a few:&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;a target=&quot;_blank&quot; href=&quot;http://www.lava.net/~newsham/pyggy/&quot; rel=&quot;nofollow&quot;&gt;PyGgy&lt;/a&gt;&lt;br&gt;
&lt;/b&gt;PyGgy is a Python package for generating parsers and lexers in Python. The PyGgy distribution contains two tools: &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;PyLly - (Pronounced &quot;pile-ey&quot;) A lexer generator that generates DFA tables for lexing tokens. &lt;/li&gt;
	&lt;li&gt;PyGgy - (Pronounced &quot;piggy&quot;) A parser generator that generates SLR tables for a GLR parsing engine.
	&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;&lt;a target=&quot;_blank&quot; href=&quot;http://pyparsing.wikispaces.com/&quot; rel=&quot;nofollow&quot;&gt;Pyparsing&lt;/a&gt;&lt;/b&gt;&lt;br&gt;
pyparsing is a general parsing module for Python. Grammars are implemented directly in the client code using parsing objects, instead of externally, as with lex/yacc-type tools. Includes simple examples for parsing SQL, CORBA IDL, and 4-function math.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;a target=&quot;_blank&quot; href=&quot;http://github.com/brehaut/picoparse&quot; rel=&quot;nofollow&quot;&gt;picoparse&lt;/a&gt;&lt;/b&gt;&lt;br&gt;
Picoparse is a very small parser / scanner library for Python. It is built to make constructing parsers straightforward, without the complications that regular expressions bring to the table.&lt;/p&gt;
&lt;p&gt;Parsing web pages with Python does require a pretty good understanding of the language, but if you already know it, getting results should not be a big hassle.&lt;/p&gt;
</description>
                                <content:encoded><![CDATA[<p>You can also parse with Python. Parsing web pages using Python, which is newer than Perl (though slightly older than PHP), can give you fine-grained control over the output you are looking for.</p>
<p>There are a number of Python tools that can help parse data structures out of a web page, log file, or general text file. To name just a few:</p>
<p><b><a target="_blank" href="http://www.lava.net/~newsham/pyggy/" rel="nofollow">PyGgy</a><br />
</b>PyGgy is a Python package for generating parsers and lexers in Python. The PyGgy distribution contains two tools: </p>
<ul>
<li>PyLly &#8211; (Pronounced &quot;pile-ey&quot;) A lexer generator that generates DFA tables for lexing tokens. </li>
<li>PyGgy &#8211; (Pronounced &quot;piggy&quot;) A parser generator that generates SLR tables for a GLR parsing engine.
	</li>
</ul>
<p><b><a target="_blank" href="http://pyparsing.wikispaces.com/" rel="nofollow">Pyparsing</a></b><br />
pyparsing is a general parsing module for Python. Grammars are implemented directly in the client code using parsing objects, instead of externally, as with lex/yacc-type tools. Includes simple examples for parsing SQL, CORBA IDL, and 4-function math.</p>
<p><b><a target="_blank" href="http://github.com/brehaut/picoparse" rel="nofollow">picoparse</a></b><br />
Picoparse is a very small parser / scanner library for Python. It is built to make constructing parsers straightforward, without the complications that regular expressions bring to the table.</p>
<p>Parsing web pages with Python does require a pretty good understanding of the language, but if you already know it, getting results should not be a big hassle.</p>
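<p>To give a feel for what this looks like in practice, here is a minimal sketch that pulls the links out of a page. It assumes only Python 3 and the standard library's html.parser module rather than any of the packages listed above, and the names LinkCollector, extract_links, and SAMPLE are made up for illustration.</p>

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the link currently being read, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; anchors without href are skipped
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears while we are inside a link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, standing in for a downloaded page
SAMPLE = '<p>See <a href="http://example.com/a">first</a> and <a href="/b"><b>second</b></a>.</p>'

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

<p>For a real page you would download the HTML first and pass the decoded text to extract_links. An event-driven parser like this copes with messy markup better than regular expressions usually do.</p>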
]]></content:encoded>
                                                    </item>
                                            <item>
                            <title>By: frankg</title>
                            <link>http://www.ussysadmin.com/2009/10/parsing-web-pages/comment-page-1/#comment-449</link>
                            <dc:creator>frankg</dc:creator>
                            <pubDate>Sun, 18 Oct 2009 18:32:05 +0000</pubDate>
                            <guid isPermaLink="false">http://www.ussysadmin.com/?p=137#comment-449</guid>
                                                            <description>I looked at that web parsing program you mentioned and it is a huge time saver. In the past I have used the Java-based HTML Parser library from SourceForge. It worked very well for me, but you must know or understand Java. Here is a quote from their webpage: &quot;HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.&quot;

For your readers to review the HTML Parser program, here&#039;s the link. 
http://htmlparser.sourceforge.net/</description>
                                <content:encoded><![CDATA[<p>I looked at that web parsing program you mentioned and it is a huge time saver. In the past I have used the Java-based HTML Parser library from SourceForge. It worked very well for me, but you must know or understand Java. Here is a quote from their webpage: &#8220;HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.&#8221;</p>
<p>For your readers to review the HTML Parser program, here&#8217;s the link.<br />
<a href="http://htmlparser.sourceforge.net/" rel="nofollow">http://htmlparser.sourceforge.net/</a></p>
]]></content:encoded>
                                                    </item>
                    
                            </channel>
        </rss>
        
