<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Open Government and PDF</title>
	<atom:link href="http://shebanator.com/2009/11/02/open-government-and-pdf/feed/" rel="self" type="application/rss+xml" />
	<link>http://shebanator.com/2009/11/02/open-government-and-pdf/</link>
	<description>Thoughts on Dynamic Languages, Web Apps, and more</description>
	<lastBuildDate>Wed, 04 Nov 2009 18:42:52 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: bowerbird</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2951</link>
		<dc:creator><![CDATA[bowerbird]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 18:42:52 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2951</guid>
		<description><![CDATA[oh, and by the way, it&#039;s not just the government sector
which is using .pdf in a careless and stupid manner...

the academic world -- another place where reusability
should triumph -- has been equally and sadly myopic.

the business segment has likewise been thoughtless...

even, believe it or not, the book-scanning operations
like the internet archive haven&#039;t been smart enough to
see the advantages of flexible, powerful light markup.
(amazingly, peter brantley, the director now, often puts 
out their position papers in a very sloppy form of .pdf.)

so this failure to adopt an intelligent archival format has
been very broad, reaching across a variety of segments.

-bowerbird]]></description>
		<content:encoded><![CDATA[<p>oh, and by the way, it&#8217;s not just the government sector<br />
which is using .pdf in a careless and stupid manner&#8230;</p>
<p>the academic world &#8212; another place where reusability<br />
should triumph &#8212; has been equally and sadly myopic.</p>
<p>the business segment has likewise been thoughtless&#8230;</p>
<p>even, believe it or not, the book-scanning operations<br />
like the internet archive haven&#8217;t been smart enough to<br />
see the advantages of flexible, powerful light markup.<br />
(amazingly, peter brantley, the director now, often puts<br />
out their position papers in a very sloppy form of .pdf.)</p>
<p>so this failure to adopt an intelligent archival format has<br />
been very broad, reaching across a variety of segments.</p>
<p>-bowerbird</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Twitter Trackbacks for Open Government and PDF « Shebanator [shebanator.com] on Topsy.com</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2950</link>
		<dc:creator><![CDATA[Twitter Trackbacks for Open Government and PDF « Shebanator [shebanator.com] on Topsy.com]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 17:18:32 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2950</guid>
		<description><![CDATA[[...] Open Government and PDF « Shebanator  shebanator.com/2009/11/02/open-government-and-pdf &#8211; view page &#8211; cached  Daring Fireball just linked to two equally foolish articles about how PDFs are “bad” for Open Government. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Open Government and PDF « Shebanator  shebanator.com/2009/11/02/open-government-and-pdf &ndash; view page &ndash; cached  Daring Fireball just linked to two equally foolish articles about how PDFs are “bad” for Open Government. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Evan</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2949</link>
		<dc:creator><![CDATA[Evan]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 14:10:20 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2949</guid>
		<description><![CDATA[I don&#039;t know anyone who&#039;s proposing to publish *only* in XML or HTML; rather, that an open, machine-readable format be the baseline, rather than a closed, impenetrable format like PDF. Centralizing around releasing large data sets in PDF format makes it vastly harder for anyone to access or analyze the data. (For example, I&#039;d rather they released an Excel spreadsheet than a PDF table, even though Excel is proprietary and PDF is ISO. Converting a PDF table into something I can use in any analytical software is generally a manual process.) We shouldn&#039;t have to settle for our government obscuring data from us behind an opaque format.]]></description>
		<content:encoded><![CDATA[<p>I don&#8217;t know anyone who&#8217;s proposing to publish *only* in XML or HTML; rather, that an open, machine-readable format be the baseline, rather than a closed, impenetrable format like PDF. Centralizing around releasing large data sets in PDF format makes it vastly harder for anyone to access or analyze the data. (For example, I&#8217;d rather they released an Excel spreadsheet than a PDF table, even though Excel is proprietary and PDF is ISO. Converting a PDF table into something I can use in any analytical software is generally a manual process.) We shouldn&#8217;t have to settle for our government obscuring data from us behind an opaque format.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adobe's Flash and PDF are not suitable formats for official documents &#171; maclalala:link</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2948</link>
		<dc:creator><![CDATA[Adobe's Flash and PDF are not suitable formats for official documents &#171; maclalala:link]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 12:40:15 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2948</guid>
		<description><![CDATA[[...] Open Government and PDF &#124; Shebanator The issue at hand is not whether governments should pick HTML or PDF. The issue at hand is whether governments are capable of publishing information at all. Show me an HTML creation tool that creates high quality, standards conformant markup from a Word document or any of the zillions of editing tools that government employees use. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Open Government and PDF | Shebanator The issue at hand is not whether governments should pick HTML or PDF. The issue at hand is whether governments are capable of publishing information at all. Show me an HTML creation tool that creates high quality, standards conformant markup from a Word document or any of the zillions of editing tools that government employees use. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Symphonious &#187; Conversion for the Web</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2947</link>
		<dc:creator><![CDATA[Symphonious &#187; Conversion for the Web]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 11:22:16 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2947</guid>
		<description><![CDATA[[...] Andrew Shebanow in Open Government and PDF:   The issue at hand is not whether governments should pick HTML or PDF. The issue at hand is whether governments are capable of publishing information at all. Show me an HTML creation tool that creates high quality, standards conformant markup from a Word document or any of the zillions of editing tools that government employees use. Now add in all the tools used by people who submit documents to the government. And all the versions of those tools released in the last 20 years. Now make sure that the HTML/XML works correctly even when the user doesn’t have the right browser or the right fonts installed. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Andrew Shebanow in Open Government and PDF:   The issue at hand is not whether governments should pick HTML or PDF. The issue at hand is whether governments are capable of publishing information at all. Show me an HTML creation tool that creates high quality, standards conformant markup from a Word document or any of the zillions of editing tools that government employees use. Now add in all the tools used by people who submit documents to the government. And all the versions of those tools released in the last 20 years. Now make sure that the HTML/XML works correctly even when the user doesn’t have the right browser or the right fonts installed. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bowerbird</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2946</link>
		<dc:creator><![CDATA[bowerbird]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 10:51:19 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2946</guid>
		<description><![CDATA[ok, first, the &quot;standards&quot; perspective is a non-starter.
.pdf is a standard, and so is .xml, and they both suck.

but .pdf sucks for a worse reason.  .pdf sucks because
-- even though it&#039;s easy to put content _into_ a .pdf --
it&#039;s very difficult to scrape it back out in a systematic way.
that&#039;s why .pdf is called &quot;the roach motel format&quot;, because
content can go in, but it cannot come out.  and that sucks.
because it&#039;s important that government data can come out.

.xml sucks too.  and forget what i said above, because
.xml might even suck worse than .pdf.  that&#039;s because
it&#039;s difficult to apply the .xml markup in the first place,
and then it&#039;s also often difficult to scrape it back out...
and even if you _can_ scrape the content back out again,
you have to repeat the difficult step of reapplying .xml
if you want to make the content useful in its next round.

what&#039;s needed is a simple straight-out plain-text format
which can serve as a &quot;master format&quot; that can generate
.html (for the web) _and_ .pdf (for those who prefer it).
(with extra credit if a straight copy from either of the
output formats gives you the same plain-text &quot;master&quot;,
such that you could do infinite &quot;round-tripping&quot; of it.)

it&#039;s not hard to invent a light-markup format to do this...

i&#039;ve done it myself -- something called &quot;z.m.l.&quot;, short for
&quot;zen markup language&quot; -- and it works astonishingly well.
(you don&#039;t just get &quot;serviceable&quot; output in .html and .pdf,
you get _powerful_ documents that have lots of features.)

or -- ironic, being that you&#039;re riffing off daring fireball --
you could use gruber&#039;s &quot;markdown&quot;, a fairly similar beast.

the funny thing is that -- since the format is &quot;plain-text&quot;
in nature -- it&#039;s dirt-simple for people to learn and use it,
and surprisingly easy to code apps like authoring-tools...
and, for conversion routines, you don&#039;t have .xslt difficulty.

-bowerbird

p.s.  plus end-users can still keep using their old tools,
because all that old software can produce plain-text files.]]></description>
		<content:encoded><![CDATA[<p>ok, first, the &#8220;standards&#8221; perspective is a non-starter.<br />
.pdf is a standard, and so is .xml, and they both suck.</p>
<p>but .pdf sucks for a worse reason.  .pdf sucks because<br />
&#8211; even though it&#8217;s easy to put content _into_ a .pdf &#8211;<br />
it&#8217;s very difficult to scrape it back out in a systematic way.<br />
that&#8217;s why .pdf is called &#8220;the roach motel format&#8221;, because<br />
content can go in, but it cannot come out.  and that sucks.<br />
because it&#8217;s important that government data can come out.</p>
<p>.xml sucks too.  and forget what i said above, because<br />
.xml might even suck worse than .pdf.  that&#8217;s because<br />
it&#8217;s difficult to apply the .xml markup in the first place,<br />
and then it&#8217;s also often difficult to scrape it back out&#8230;<br />
and even if you _can_ scrape the content back out again,<br />
you have to repeat the difficult step of reapplying .xml<br />
if you want to make the content useful in its next round.</p>
<p>what&#8217;s needed is a simple straight-out plain-text format<br />
which can serve as a &#8220;master format&#8221; that can generate<br />
.html (for the web) _and_ .pdf (for those who prefer it).<br />
(with extra credit if a straight copy from either of the<br />
output formats gives you the same plain-text &#8220;master&#8221;,<br />
such that you could do infinite &#8220;round-tripping&#8221; of it.)</p>
<p>it&#8217;s not hard to invent a light-markup format to do this&#8230;</p>
<p>i&#8217;ve done it myself &#8212; something called &#8220;z.m.l.&#8221;, short for<br />
&#8220;zen markup language&#8221; &#8212; and it works astonishingly well.<br />
(you don&#8217;t just get &#8220;serviceable&#8221; output in .html and .pdf,<br />
you get _powerful_ documents that have lots of features.)</p>
<p>or &#8212; ironic, being that you&#8217;re riffing off daring fireball &#8212;<br />
you could use gruber&#8217;s &#8220;markdown&#8221;, a fairly similar beast.</p>
<p>the funny thing is that &#8212; since the format is &#8220;plain-text&#8221;<br />
in nature &#8212; it&#8217;s dirt-simple for people to learn and use it,<br />
and surprisingly easy to code apps like authoring-tools&#8230;<br />
and, for conversion routines, you don&#8217;t have .xslt difficulty.</p>
<p>-bowerbird</p>
<p>p.s.  plus end-users can still keep using their old tools,<br />
because all that old software can produce plain-text files.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Travis</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2944</link>
		<dc:creator><![CDATA[Travis]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 02:22:31 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2944</guid>
		<description><![CDATA[Er, what? Because both Word and Pages use XML as an internal format, XML is somehow a human-readable format? Uh-huh. Try feeding either Word or Pages some random XML file, and see what happens?

XML is not a universal *format*. XML is a universal *language* that can be used to define data formats - but you need to know how to interpret it. Average humans are generally very bad at interpreting raw XML. Programs that use XML generally only know how to interpret the formats they define, and maybe a few other publicly-defined formats; they can&#039;t interpret the language well enough to understand every random format. Heck, Filemaker Pro deals with data in a far broader way than Word or Pages, and imports/exports XML - but try sending it a random XML file without an XSLT stylesheet to translate it, and see how far you get!

This falls prey to the same fallacy that many open source advocates have - as long as the code is open (document is in XML) everything is somehow OK. It&#039;s only OK for those who have the skill and knowledge to work with the raw data, and that is a tiny tiny TINY fraction of the population. Documents in &quot;XML&quot; are utterly useless to 99% of the public; to be actually useful, they need to be in a defined XML format that most people can read. ODF sort of counts, in that anyone skilled enough to download, install and use Open Office can read it; but a fair number of people simply aren&#039;t. (And I, and probably a fair number of other people, view forcing someone to download and use the entire OO suite just to read government documents to be an unfair imposition; it&#039;s a lot more to download and a lot more work to set up and use than even Acrobat, let alone the simple PDF readers like Preview.)

As for the fantasy that if you mandate it, they will come... that worked real well for ODF, didn&#039;t it? Oh, wait. Microsoft leapt to the call - no, they put on a crash program to develop their own incompatible XML format in OOXML, which, last I heard, still contains big binary lumps that make it just as hard for other programs to interpret as the old binary Office formats were. *And* started a massive lobbying/astroturfing campaign to both reverse the ODF decision and establish their own format instead, turning it into a nasty political fight. &quot;[If] you mandate clean HTML or a particular XML version Word will be very quickly updated to produce that format&quot;? Uh-huh. Riiiiiiight.

I&#039;m more of an Adobe-loather than an Adobe-lover these days, but I have to admit PDF is the best tool for distributing human-readable documents. I&#039;ve spent my time down in the trenches too; in my case, it was managing the library of Material Safety Data Sheets for all the products our company distributed. I was there when we still got mostly paper copies from the manufacturers, and we had to scan them to get them in digital format; I&#039;ve dealt with manufacturers that sent them in ASCII text or Word documents; I&#039;ve dealt with them in PDF. As a tool for storing them and forwarding them on to customers, who would either print them out or read them on the computer, PDF was by far the best tool, bar none.]]></description>
		<content:encoded><![CDATA[<p>Er, what? Because both Word and Pages use XML as an internal format, XML is somehow a human-readable format? Uh-huh. Try feeding either Word or Pages some random XML file, and see what happens?</p>
<p>XML is not a universal *format*. XML is a universal *language* that can be used to define data formats &#8211; but you need to know how to interpret it. Average humans are generally very bad at interpreting raw XML. Programs that use XML generally only know how to interpret the formats they define, and maybe a few other publicly-defined formats; they can&#8217;t interpret the language well enough to understand every random format. Heck, Filemaker Pro deals with data in a far broader way than Word or Pages, and imports/exports XML &#8211; but try sending it a random XML file without an XSLT stylesheet to translate it, and see how far you get!</p>
<p>This falls prey to the same fallacy that many open source advocates have &#8211; as long as the code is open (document is in XML) everything is somehow OK. It&#8217;s only OK for those who have the skill and knowledge to work with the raw data, and that is a tiny tiny TINY fraction of the population. Documents in &#8220;XML&#8221; are utterly useless to 99% of the public; to be actually useful, they need to be in a defined XML format that most people can read. ODF sort of counts, in that anyone skilled enough to download, install and use Open Office can read it; but a fair number of people simply aren&#8217;t. (And I, and probably a fair number of other people, view forcing someone to download and use the entire OO suite just to read government documents to be an unfair imposition; it&#8217;s a lot more to download and a lot more work to set up and use than even Acrobat, let alone the simple PDF readers like Preview.)</p>
<p>As for the fantasy that if you mandate it, they will come&#8230; that worked real well for ODF, didn&#8217;t it? Oh, wait. Microsoft leapt to the call &#8211; no, they put on a crash program to develop their own incompatible XML format in OOXML, which, last I heard, still contains big binary lumps that make it just as hard for other programs to interpret as the old binary Office formats were. *And* started a massive lobbying/astroturfing campaign to both reverse the ODF decision and establish their own format instead, turning it into a nasty political fight. &#8220;[If] you mandate clean HTML or a particular XML version Word will be very quickly updated to produce that format&#8221;? Uh-huh. Riiiiiiight.</p>
<p>I&#8217;m more of an Adobe-loather than an Adobe-lover these days, but I have to admit PDF is the best tool for distributing human-readable documents. I&#8217;ve spent my time down in the trenches too; in my case, it was managing the library of Material Safety Data Sheets for all the products our company distributed. I was there when we still got mostly paper copies from the manufacturers, and we had to scan them to get them in digital format; I&#8217;ve dealt with manufacturers that sent them in ASCII text or Word documents; I&#8217;ve dealt with them in PDF. As a tool for storing them and forwarding them on to customers, who would either print them out or read them on the computer, PDF was by far the best tool, bar none.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bangersandmash</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2943</link>
		<dc:creator><![CDATA[bangersandmash]]></dc:creator>
		<pubDate>Wed, 04 Nov 2009 02:07:05 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2943</guid>
		<description><![CDATA[Interesting piece.  While I&#039;m not invested in either side of the debate I would like to point out that many PDFs are simply page images. These cannot be indexed in search engines and are thus harder to find.  You are right that documents should be easy to read and create by humans, but it is equally important that the documents be findable using tools that people already know. This is where PDF struggles, IMHO.]]></description>
		<content:encoded><![CDATA[<p>Interesting piece.  While I&#8217;m not invested in either side of the debate I would like to point out that many PDFs are simply page images. These cannot be indexed in search engines and are thus harder to find.  You are right that documents should be easy to read and create by humans, but it is equally important that the documents be findable using tools that people already know. This is where PDF struggles, IMHO.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Shebanow</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2942</link>
		<dc:creator><![CDATA[Andrew Shebanow]]></dc:creator>
		<pubDate>Tue, 03 Nov 2009 20:41:44 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2942</guid>
		<description><![CDATA[I have nothing against HTML and readily admit that it is technically superior in many aspects as a choice for government data. I would love it if governments chose to publish their documents in multiple formats (PDF, HTML, XML, etc.). But I&#039;m also painfully aware of the needs of the non-technical users (the producers and consumers), and what is being proposed (publishing &lt;em&gt;solely&lt;/em&gt; in XML or HTML) is completely and utterly impractical in that context.]]></description>
		<content:encoded><![CDATA[<p>I have nothing against HTML and readily admit that it is technically superior in many aspects as a choice for government data. I would love it if governments chose to publish their documents in multiple formats (PDF, HTML, XML, etc.). But I&#8217;m also painfully aware of the needs of the non-technical users (the producers and consumers), and what is being proposed (publishing <em>solely</em> in XML or HTML) is completely and utterly impractical in that context.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gxm</title>
		<link>http://shebanator.com/2009/11/02/open-government-and-pdf/comment-page-1/#comment-2941</link>
		<dc:creator><![CDATA[gxm]]></dc:creator>
		<pubDate>Tue, 03 Nov 2009 20:25:43 +0000</pubDate>
		<guid isPermaLink="false">http://shebanator.com/?p=274#comment-2941</guid>
		<description><![CDATA[I think you might be missing the point. PDF is a dead end; it is a presentation-focussed universal container format that has a lot of support, but the data is locked in once the PDF is baked. Yes, there are ways to make PDFs accessible and ways to extract information from scanned PDFs, but the broad base means it is a stupid format and should be treated as such. As citizens of a democracy we own &#039;our&#039; government&#039;s data and should be able to easily analyse and manipulate it. As the UK MPs&#039; expenses scandal has recently illustrated, pretending to be &#039;open&#039; by publishing scanned, redacted PDFs is not &#039;open&#039; at all, and is an exercise in obfuscation. It isn&#039;t about the standards geeks; it is about whether you want to be able to easily examine and extract this information in future years. HTML is an application of SGML. PDF is a derivative of PostScript. One approach values information, the other presentation. What do you think is more important to capture in a government archive? What it says or how it looked?]]></description>
		<content:encoded><![CDATA[<p>I think you might be missing the point. PDF is a dead end; it is a presentation-focussed universal container format that has a lot of support, but the data is locked in once the PDF is baked. Yes, there are ways to make PDFs accessible and ways to extract information from scanned PDFs, but the broad base means it is a stupid format and should be treated as such. As citizens of a democracy we own &#8216;our&#8217; government&#8217;s data and should be able to easily analyse and manipulate it. As the UK MPs&#8217; expenses scandal has recently illustrated, pretending to be &#8216;open&#8217; by publishing scanned, redacted PDFs is not &#8216;open&#8217; at all, and is an exercise in obfuscation. It isn&#8217;t about the standards geeks; it is about whether you want to be able to easily examine and extract this information in future years. HTML is an application of SGML. PDF is a derivative of PostScript. One approach values information, the other presentation. What do you think is more important to capture in a government archive? What it says or how it looked?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

