From 4125d8cb23df69cc3ba908da497d71729bc8ad96 Mon Sep 17 00:00:00 2001 From: Tamar Levine Date: Tue, 24 Dec 2024 13:02:55 +0000 Subject: [PATCH] Remove tlevine. --- blog/_posts/2013-12-25-parsing-pdfs.md | 340 ------------------ blog/_posts/2014-01-16-newsletter.md | 3 - blog/_posts/2014-06-05-newsletter.md | 8 - ...f-tools-extract-text-and-data-from-pdfs.md | 2 - img/people/tlevine.jpg | Bin 7047 -> 0 bytes members/_posts/2013-12-19-tlevine.md | 13 - 6 files changed, 366 deletions(-) delete mode 100644 blog/_posts/2013-12-25-parsing-pdfs.md delete mode 100644 img/people/tlevine.jpg delete mode 100644 members/_posts/2013-12-19-tlevine.md diff --git a/blog/_posts/2013-12-25-parsing-pdfs.md b/blog/_posts/2013-12-25-parsing-pdfs.md deleted file mode 100644 index e7a9cc8f..00000000 --- a/blog/_posts/2013-12-25-parsing-pdfs.md +++ /dev/null @@ -1,340 +0,0 @@ ---- -title: How I parse PDF files -author: Thomas Levine -username: tlevine ---- -Much of the world's data are stored in portable document format (PDF) files. -This is not my preferred storage or presentation format, so I often convert -such files into databases, graphs, or [spreadsheets](http://csvsoundsystem.com). -I sort of follow this decision process. - -1. Do we need to read the file contents at all? -2. Do we only need to extract the text and/or images? -3. Do we care about the layout of the file? - -## Example PDFs -I'll show a few different approaches to parsing and analyzing -[these](https://github.com/tlevine/scott-documents) PDF files. -Different approaches make sense depending on the question you ask. - -These files are public notices of applications for permits to dredge or fill -wetlands. The Army Corps of Engineers posts these notices so that the public -may comment on the notices before the Corps approves them; people are thus -able to voice concerns about whether these permits would fall within the rules -about what sorts of construction is permissible. - -Theses files are -[downloaded daily](https://github.com/tlevine/scott/tree/master/reader) -from the [New Orleans Army Corps of Engineers website](http://www2.mvn.usace.army.mil/ops/regulatory/publicnotices.asp?ShowLocationOrder=False) -and renamed according to the permit application and the date of download. -They once fed into that the Gulf Restoration Network in their efforts to -protect the wetlands from reckless destruction. - -## If I don't need the file contents -Basic things like file size, file name and modification date might be useful -in some contexts. In the case of PDFs, file size will give you an idea of how -many/much of the PDFs are text and how many/much are images. - -Let's [plot a histogram](https://github.com/dzerbino/ascii_plots/blob/master/hist) -of the file sizes. I'm running this from the root of the documents repository, -and I cleaned up the output a tiny bit. - - $ ls --block-size=K -Hs */public_notice.pdf | sed 's/[^0-9 ].*//' | hist 5 - 15 | 2 | ** - 20 | 55 | ******************************************************************************** - 25 | 4 | ***** - 30 | 4 | ***** - 35 | 11 | **************** - 40 | 4 | ***** - 45 | 2 | ** - 50 | 2 | ** - 60 | 1 | * - 75 | 1 | * - 80 | 1 | * - 95 | 1 | * - 100 | 2 | ** - 120 | 1 | * - 125 | 2 | ** - 135 | 1 | * - 145 | 3 | **** - 150 | 6 | ******** - 155 | 4 | ***** - 160 | 8 | *********** - 165 | 3 | **** - 170 | 6 | ******** - 175 | 7 | ********** - 180 | 24 | ********************************** - 185 | 11 | **************** - 190 | 6 | ******** - 195 | 4 | ***** - 200 | 23 | ********************************* - 205 | 7 | ********** - 210 | 7 | ********** - 215 | 3 | **** - 220 | 3 | **** - 225 | 1 | * - 230 | 1 | * - 235 | 1 | * - 240 | 2 | ** - 245 | 2 | ** - 250 | 1 | * - 255 | 3 | **** - 265 | 1 | * - 280 | 1 | * - 460 | 1 | * - 545 | 1 | * - 585 | 1 | * - 740 | 1 | * - 860 | 2 | ** - 885 | 1 | * - 915 | 1 | * - 920 | 1 | * - 945 | 1 | * - 950 | 1 | * - 980 | 1 | * - 2000 | 1 | * - 2240 | 1 | * - 2335 | 1 | * - 7420 | 1 | * - TOTAL| 248 | - -The histogram shows us two modes. The smaller mode, around 20 kb, corresponds to -files with no images (PDF export from Microsoft Word), and the larger mode -corresponds to files with images (scans of print-outs of the Microsoft Word -documents). It looks like about 80 are just text and the other 170 are scans. - -This isn't a real histogram, but if we'd used a real one with an interval scale, -the outliers would be more obvious. Let's cut off the distribution at 400 kb -and look more closely at the unusually large documents that are above that -cutoff. - -What's in that 7 mb file? Well let's find it. - - $ ls --block-size=K -Hs */public_notice.pdf | grep '742.K' - 7424K MVN-2010-1080-WLL_MVN-2010-1032-WLLB/public_notice.pdf - -You can see it [here](https://github.com/tlevine/scott-documents/raw/master/MVN-2010-1080-WLL_MVN-2010-1032-WLLB/public_notice-2012-08-09.pdf). -It's not a typical public notice; rather, it is a series of scanned documents -related to a permit transfer request. Interesting. - -Next, how are two large files within 5 kb of each other? - - $ ls --block-size=K -Hs */public_notice.pdf | grep 860K - 860K MVN-2012-006152-WII/public_notice.pdf - 860K MVN-2012-1797-CU/public_notice.pdf - -Those are here - -* [MVN-2012-006152-WII](https://github.com/tlevine/scott-documents/raw/master/MVN-2012-006152-WII/public_notice-2012-11-20.pdf) -* [MVN-2012-1797-CU](https://github.com/tlevine/scott-documents/raw/master/MVN-2012-1797-CU/public_notice-2012-10-02.pdf) - -Hmm. Nothing special about those. People see patterns in randomness. - -Now let's look at some basic properties of the pdf files. This will give us a -basic overview of one file. - - $ pdfinfo MVN-2013-00026-WKK/public_notice.pdf - Creator: FUJITSU fi-4010CU - Producer: Adobe Acrobat 9.52 Paper Capture Plug-in - CreationDate: Fri Jan 25 09:45:08 2013 - ModDate: Fri Jan 25 09:46:16 2013 - Tagged: yes - Form: none - Pages: 3 - Encrypted: no - Page size: 606.1 x 792 pts - Page rot: 0 - File size: 199251 bytes - Optimized: yes - PDF version: 1.6 - -Let's run it on all of the files. - - $ for file in */public_notice.pdf; do pdfinfo $file && echo; done - # Lots of output here - -What was used to produce these files? - - $ for file in */public_notice.pdf; do pdfinfo $file|sed -n 's/Creator: *//p' ; done|sort|uniq -c - 33 Acrobat PDFMaker 10.1 for Word - 48 Acrobat PDFMaker 9.1 for Word - 10 FUJITSU fi-4010CU - 135 HardCopy - 7 HP Digital Sending Device - 2 Oracle9iAS Reports Services - 6 PScript5.dll Version 5.2.2 - 4 Writer - -When were they created? - - $ for file in */public_notice.pdf; do pdfinfo $file|grep CreationDate: > /dev/null && date -d "$(pdfinfo $file|sed -n 's/CreationDate: *//p')" --rfc-3339 date ; done - 2012-07-03 - 2012-07-06 - 2012-07-06 - 2012-07-06 - # ... - -How many pages do they have? - - $ for file in */public_notice.pdf; do pdfinfo $file|sed -n 's/Pages: *//p' ; done | hist 1 - 1 | 1 | - 2 | 27 | ********** - 3 | 198 | ******************************************************************************** - 4 | 16 | ****** - 5 | 1 | - 8 | 2 | - 10 | 1 | - 31 | 1 | - 40 | 1 | - TOTAL | 248 | - -It might actually be fun to see relate these variables to each other. For -example, when did the Corps upgrade from PDFMaker 9.1 to PDFMaker 10.1? - -Anyway, we got somewhere interesting without looking at the files. Now let's -look at them. - -## If messy, raw file contents are fine -The main automatic processing that I run on the PDFs is a search for a few -identification numbers. The Army Corps of Engineers uses a number that starts -with "MVN", but other agencies use different numbers. I also search for two -key paragraphs - -[My approach](https://github.com/tlevine/scott/blob/master/reader/bin/translate) -is pretty crude. For the PDFs that aren't scans, I just use `pdftotext`. - - # translate - pdftotext "$FILE" "$FILE" - -Then I just use regular expressions to search the resulting text file. - -`pdftotext` normally screws up the layout of PDF files, especially when they -have multiple columns, but it's fine for what I'm doing because I only need to -find small chunks of text rather than a whole table or a specific line on -multiple pages. - -As we saw earlier, most of the files contain images, so I need to run OCR. -Like `pdftotext`, OCR programs often mess up the page layout, but I don't -care because I'm using regular expressions to look for small chunks. - -I don't even care whether the images are in order; I just use `pdfimages` -to pull out the images and then `tesseract` to OCR each image and add that -to the text file. (This is all in the -[`translate`](https://github.com/tlevine/scott/blob/master/reader/bin/translate) -script that I linked above.) - -## If I care about the layout of the page -If I care about the layout of the page, `pdftotext` probably won't work. -Instead, I use `pdftohtml` or `inkscape`. I've never needed to go deeper, -but if I did, I'd use something like -[PDFMiner](http://www.unixuser.org/~euske/python/pdfminer/). - -### pdftohtml -`pdftohtml` is useful because of its `-xml` flag. - - $ pdftohtml -xml MVN-2013-00180-ETT/public_notice.pdf - Page-1 - Page-2 - Page-3 - $ head MVN-2013-00180-ETT/public_notice.xml - - - - - - - - - - - -Open that with an XML parser like lxml - - # This is python - import lxml.etree - pdf2xml = lxml.etree.parse('MVN-2013-00180-ETT/public_notice.xml') - -One of the things that I try to extract is the "CHARACTER OF WORK" section. -I do this with regular expressions, but we could also do this with the XML. -Here are some XPath selectors that get us somewhere. - - # This is python - print pdf2xml.xpath('//text/b[text()="CHARACTER OF WORK"]/../text()') - print pdf2xml.xpath('//text/b[text()="CHARACTER OF WORK"]/../following-sibling::text/text()') - -### Inkscape -Inkscape can convert a PDF page to an SVG file. I have a -[little script](https://github.com/scraperwiki/pdf2svg) that runs this across -all pages within a PDF file. - -Once you've converted the PDF file to a bunch of SVG files, you can open it -with an XML parser just like you could with the `pdftohtml` output, except -this time much more of the layout is preserved, including the groupings of -elements on the page. - -Here's a snippet from one project where I used Inkscape to parse PDF files. -I created a crazy system for receiving a very messy PDF table over email and -converting it into a spreadsheet that is hosted on a website. - -This function is contains all of the parsing functions for a specific page of -the pdf file once it has been converted to SVG. It takes an -`lxml.etree._ElementTree` object like the one we get from `lxml.etree.parse`, -along with some metadata. It runs a crazy XPath selector (determined only after -much test-driven development) to pick out the table rows, and then runs a bunch -of functions (not included) to pick out the cells within the rows. - - def page(svg, file_name, page_number): - 'I turn a svg tree into a list of dictionaries.' - # County name - county = unicode(svg.xpath( - '//svg:g/svg:path[position()=1]/following-sibling::svg:text/svg:tspan/text()', - namespaces = { 'svg': 'http://www.w3.org/2000/svg' } - )[0]) - rows = _page_tspans(svg) - - def skip(reason): - print 'Skipped a row on %s page %d because %s.' % (file_name, page_number, reason) - - data = [] - for _row in rows: - row_text = [text.xpath('string()') for text in _row] - try: - if row_text == []: - skip('the row is empty') - print row_text - elif _is_header(row_text): - skip('it appears to be a header.') - print row_text - # ... - -I'd like to point out the `string()` xpath command. That converts the current -node and its decendents into plain text; it's particularly nice for -inconsistently structured files like this one. - -## Optical character recognition -People often think that optical character recognition (OCR) is going to be -a hard part. It might be, but it doesn't really change this decision process. -If I care about where the images are positioned on the page, I'd probably -use Inkscape. If I don't, I'd probably use `pdfimages`, as I did here. - -## Review -When I'm parsing PDFs, I use some combination of these tools. - -1. Basic file analysis tools (`ls` or another language's equivalent) -2. PDF metadata tools (`pdfinfo` or an equivalent) -3. `pdftotext` -4. `pdftohtml -xml` -5. Inkscape via [`pdf2svg`](https://github.com/scraperwiki/pdf2svg) -6. [PDFMiner](http://www.unixuser.org/~euske/python/pdfminer/) - -I prefer the -ones earlier in the list when the parsing is less involved because the tools -do more of the work for me. I prefer the ones towards the end as the job gets -more complex because these tools give me more control. - -If I need OCR, I use `pdfimages` to remove the images and `tesseract` to run -OCR. If I needed to run OCR and know more about the layout, I might convert the -PDFs to SVG with Inkscape and, and then take the images out of the SVG in order -to know more precisely where they are in the page's structure. - -*This article was originally posted [on Thomas Levine's site](http://thomaslevine.com/!/parsing-pdfs).* diff --git a/blog/_posts/2014-01-16-newsletter.md b/blog/_posts/2014-01-16-newsletter.md index 1ec2a5a0..97238140 100644 --- a/blog/_posts/2014-01-16-newsletter.md +++ b/blog/_posts/2014-01-16-newsletter.md @@ -30,8 +30,6 @@ The [Enki][14] package for analyzing PyBossa applications was also released over We've had a couple of great new contributions on the [Labs blog][18] since the last newsletter. -[Thomas Levine][19] has written about [how he parses PDF files][20], lovingly exploring a problem that all data wranglers will encounter and gnash their teeth over at least a few times in their lives. - [Stefan Urbanek][21], meanwhile, has written an [introduction to OLAP][22], "an approach to answering multi-dimensional analytical queries swiftly", explaining what that means and why we should take notice. ## Dānabox @@ -68,7 +66,6 @@ Labs is the Labs community, no more and no less, and you're invited to become a [16]: http://ipython.org/notebook.html [17]: http://daniellombrana.es/blog/2013/12/16/pybossa-enki.html [18]: http://okfnlabs.org/blog/ -[19]: http://okfnlabs.org/members/tlevine/ [20]: http://okfnlabs.org/blog/2013/12/25/parsing-pdfs.html [21]: http://okfnlabs.org/members/Stiivi/ [22]: http://okfnlabs.org/blog/2014/01/10/olap-introduction.html diff --git a/blog/_posts/2014-06-05-newsletter.md b/blog/_posts/2014-06-05-newsletter.md index 95bc8a17..f0a2e3ef 100644 --- a/blog/_posts/2014-06-05-newsletter.md +++ b/blog/_posts/2014-06-05-newsletter.md @@ -9,14 +9,6 @@ Welcome back to the OKFN Labs! Members of the Labs have been building tools, vis If you'd like to suggest a piece of news for next month's newsletter, leave a comment on its [GitHub issue](https://github.com/okfn/okfn.github.com/issues/215). -## commasearch - -[Thomas Levine](http://okfnlabs.org/members/tlevine/) has been working on an innovative new approach to searching tabular data, [commasearch](https://github.com/tlevine/commasearch). - -Unlike a normal search engine, where you submit words and get pages of words back, with commasearch, you submit spreadsheets and get spreadsheets in return. - -What does that mean, and how does it work? Check out Thomas's excellent blog post "[Pagerank for Spreadsheets](http://dada.pink/dada/pagerank-for-spreadsheets/)" to learn more. - ## GitHub diffs for CSV files *Submitted by [Paul Fitzpatrick](http://okfnlabs.org/members/paulfitz/).* diff --git a/blog/_posts/2016-04-19-pdf-tools-extract-text-and-data-from-pdfs.md b/blog/_posts/2016-04-19-pdf-tools-extract-text-and-data-from-pdfs.md index e4b808e0..2f6f4d46 100644 --- a/blog/_posts/2016-04-19-pdf-tools-extract-text-and-data-from-pdfs.md +++ b/blog/_posts/2016-04-19-pdf-tools-extract-text-and-data-from-pdfs.md @@ -86,10 +86,8 @@ We also note that Google app engine [used to do this](http://developers.google.c ## Other good intros -* [Thomas Levine on Parsing PDFs][levine] * [Extracting Data from PDFs - School of Data][scoda-1] -[levine]: http://okfnlabs.org/blog/2013/12/25/parsing-pdfs.html [scoda-1]: http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/ [scoda-simple-tables]: http://schoolofdata.org/2013/06/18/get-started-with-scraping-extracting-simple-tables-from-pdf-documents/ diff --git a/img/people/tlevine.jpg b/img/people/tlevine.jpg deleted file mode 100644 index b5e8b88826fe38a80c1ab531151a919a4463ce94..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 7047 zcmbW3WmFqXx9<}IMGFnC2}KHp;!+&iKDa}1ZK1fk7AsnumKKT@mk>0#OK^u^#UTV~ zaW7n+_uO^wTKCgE=Ra%qe3(76_HXu@nSDQZzY2JwB&R3`00IF3;6nr4F9O~Ga6s7D z*dUw-gM)+f2$u*C_dz6tg!n{Y5>iqy2^dU9Nl!&aPD=p>Q$MAqWnh3pp=4A{EKCp< zdI%Kq??3k$bVb+9RM&MAQ6xV0x|)xz(5cfc;5qnKE#Ow{EqmmLJfCuluhj{=t7>D#J->XMtn(uI#T*>)El5+8w zUsrWfXiXfl2)uU-#V4SoqNbr`ea6QAoI_AZ_@#)b*qeW3<>VCO6uy7-2B4Y`o`wgHhKrMdvttqdUk$sd3F6S7Z3pYFRTaqUts?a7x;k- z>me=Jxc_nivAiET2#k&Mlr$a$i1F{*J+a zL3Jo~t0z$GD13=I&PJPV9JX)pC*bw2!IZ>p4A~8rj30?|_+Vn)|EfCUsDC~AGal`3 z!s3fl_`}l^t8F;e^Q;Uc+X4cLz~!blCJki-!?C~}i*q_W&t)zubO_7hMqu17?wESv zqd%M>1|_y!!|ly9QfC-4LzW%QzPGdg(0cIM(J`qIT~BFL&d9m-^&MEU>lINj8Lqsh z2?d3iFkC%Zaaf#%jSyUME1j9zY)u;vua_Gt4@Ja#07_s{r^SfvoT*%$85ZU&KfRwm z0_qdLhaM|-Wn74q8a};MS&K-+!^FHPg2TKZJhf9?++CRc@y|EyDv;D=u5O@!OOa!e6m5!_7T~Xcxx|)sA{Q_2SMm%>A}q5Uo7)$&#nDq37@;io|f|2UJQ6y+S4?j#9R|7 zE1T#1O`}c;*SwW4+AwxKLWdUGri{tyZ`!C|i)RiKQ?vKE;UTXCmGUhbHM{GpHfdpT z-Nv}~@AEA)DtGu&E-l{x4_5ezQ86 zt^ke!0KTOumq=|R|3#7C)|4dBYjI;E`2256Hf|uJl)ANXNS_QnioeGsljPACr@>S| z;(ACCOI(p^?Wwqny{6!ZQOY^Ki~o8n+1fpzClgl}VcArJs6^Do5=K8u@kpxL49I~d zdMvAJxZ(JD2%l^JliZcqqyW@2elpMS>dzT$LsK&LL*$nA2&`ateqd?~ckZpV@=U^Q z_ttG0Cd}4;8SfrI(aSdXpL4s}hNq?MC!OH@jxHafoS7)C5eO5&%)h-10i1YYgANTasuu=>P-_g_Is2@$xR@inTjQF+o z`R)o1xcos+bq^pHY^KhWp)@%NnG3q!6;*{qT@#Wwz&X@E&lcU8EZ$uSNtjWV&sUni zyVO<{a?9==^otN!_(D7f7fVnKCM_qqtwFA79=uyZ&Gr9590SI-RMYiy zq6^aB{-dGzUZ&Tj`c`>zEd>4jQ#@Y-qRaIle)AlZVyeB~4zUv}I9}L&xO5%+y+aIp zj=-xl-c}u8^Jbr4Or)L_`n7X2-;rNwSFbTSy*56H>0nExnEw>D= z^zh*1D?6H7c-|n+E8&A~`l${K87Cnr**S1wl+fbmMRY3v#ABoli0+XZF2fc2THC|6 zE-iC?iY5%_Y7_S>_C3HK;U3Sc-fD#r-FsR(5s@UxbkS4K%RMwAyv>GP&7##}N$QuZ zc9Io{f3RGl?BKaqoMk_^+Qb+Q@l}afmfHH9Qh5_(xGscuCe9fdfByQJ1fLc4*t@L8 zzrLkj+6qPc<5B^`Bvz905F|>Z;4Dy(bMK{JUTw|cU2~Bg-fX?@kfRUXONWjSf&(jy zerjM%7{;{b@Y_0mRYS@@vpg0iqB^lJwR`rOU=_X>k*VuDK+0s zM&D+)hoxK}AyMPH0qLdPWp8}#@nbu5Wwm-rE78H~YTt2>3UJu_S=p7?7D@MJy_%Z! zO-54>D56`^HqAfNzYG0sg_#aH!x)MMZhD`uZ{z$J2-vrdEnM3Lb|R!qp!S%du}RpHm+()RAAn?f-RacEtMTTe#+rD=#`=7>b`iUP z>lBkmbw_#1{`f!RT=p9<#?UTj(r%I%c%4Dsx~(^8=Xq0-QBwb+BKhrCv*s8|p_LI? zXV-0B`V+Y#>G?d8nD0erUT^cWSoJ63Hr95R{iv^E3As%5(Q#j-;Dy_mU{WJ z15z)#^~JI%D%JW>)!YhNZ|y|-9IfE&l8u$D8MNQQ@c<_h2lS#K)mm8}BZGNI}r z{2b;?G5&(`)z;Y#uJc%XolCW7s#VfaswfKqCXdd}Wu3#`QfVTtK&Lz73~)}dbjvmY ztK)Lw6vuEggHL%z>i#d{$5&y{G zfR}5jX4SIAc9oB)qb(5xRta{_E{b$-D(S&l;#P<+lU{rUiesi>W1Y?4Ju8MFi|!@|;G>8P+UqhszDv9y=W9o-uqM;jTf# zy|U+%bQv!MdIe*lMKrn#>T6eLytjNaY=>Dz^fkLE#e5ZP|INo7e8|CazJ`~%NT*iz z8lxTE1jj%kBlS>-@T?Gvqa7!qxQ!es9xE{bJr#SMz9}VmDW?8d*>0kj8u4ngrsOK~ ziX~eu@)$06By7kR_bH z#`ttQ6_YNjMMuLy<;5R3X4vPgwcR+OCNb`+aIDJ3*!T%s>zmDwFMW$9he44uP3Y5# zi!P_@h%iUjm&0hc)V`PIN!dsWq}h(%!ZVnC(GUyq?|VSAzppn@{2LNAJgHTbML}1S z)DMbU2iGBJ$u=S;ATIC6TE^Qr=c<3ynJNR!2#gFrhzJGPmE$r?YZ~7JqCLUhQT4=P zF{=qeV@ghxUZTF^2QNlOyn~1xkaNS~K2kHirtJEzdnTx`I)8sjK4yy(a}U^_Y{QP* zxUJg032)!Woz1sR<(#Yy{U!T3vcVJ}NsKeE1^%At=lV+PG_|{5#!k_NuQ#?SS_F;U za$QmJb>H$<77Y87l$KIr4q%K1YX;0)s_*2|Z7?w5e4mS62I1q`1zvsQ#Wx8ncvynw zDdASQMDAIEefCkcQ%+Ws6?Lq5i1S{Zv>!NW4FqjP^w~<+(mhS2W|w~FBO6u*c@jJp zDCzIGhZUf;t*q?g+b2h?aXoyJ*LECS7v{WHZ_#Wybaaa_3g!XxBD)Ht(BbPYE_LK= zYCTR%BAl6HgGgpW=ya)#aPl{{wvl1e_xa@apHkEC_K>2Z(#(5iBDYMv2?>rFp#~H` zapZ~lP_RJnwKTQ+9^N40JG$)w4pBBZb&-~Km8NF9h4Z>qLeRt4 zPGS1W(n(Kt`oxoTF{9@h)@j{au9mQ*W?XWDtLy`C%{n!f=6C z!%2sopoYj|F~Ja>vAJP&{<267>}^!DF=C}0sJKTfSkx&P(PF98YfL zbi(|k=E<-sx2Lp)#@5|bP5SR8utRp(60Ngc?2PC<8on@d%lb9=i@lUE*0Kj%K+e}^ z9Mde`w8RwsjdOmMc;VFEvMXRB1^=654LnhDLMp)t3PzJyigCZj^BH^oJl&H2$WXW< zAknoevneN#=wk1z^6APBa9}_io7%pRWgHN>A`?-mbN`y2s6mf z87!wEa!&M81aSB|Tf=@iKsC?KkR;-V<_&Z6Pv^P3?OV>2%i+n(qSd9tF`GZqb3K*v z`~egF1eq%)&5`SNRRXcF!Qn%V;vY>CArkM_{?Pa=s?u^NUVGN1lbZ=RVFIJF6^f%g&_)#0{?uy>FFAcZ1F{_31JJ^=j<-aozN%$T2q{ny#KEhIc-#Ryr1# zK3PmZOThCfgo1a#cH*YES{9Arqp$fBU9&c@&9C1#DytLi){nDfG2RHK4g!zA5(7th z-H>zBH3P)qMY|;*$dpcMM4~LW(q0#LoB}?69nvX0SUL!en~Lv038Fy0UuFq8y(va( ze*ib|p|PyEk3oM@$2K%Oc@f-mGz6|!IxsdRthMFSTFWeOQ;ZN76J09b;?gIcM71pdIPRL-}yrPfl^Qs48Wr zmNH==R?C$Ig(`UY7&D>dXfvW7Y)aGK?(2zfek0kk^wZ)eI^^v=z_x#N3T83oBzPvA zQeq~^?GiX<_ve;w-AGwgSisTsN!*i1NGHyW%Yl;l%e+oh<@5M$neY@)qU`(C4O`>I zbuUC@w_iauW?`={HZv5;rZi}0k1^;PVgF`jEh8 z7GDv(pof|Zc&WTj5yL^r+wileHwdRd zF&fQZ?z^I0+fq<#B~n6k({b8HA50ReaWk8o=phyLm1>eH$&wCp`6n#+Jyu8 z%ZjcJlPo+Zk29msPvNqc%^PNf#&+pE;oFDDyMDVrm);bsQokw8rK^s+NQ;7cyz`}D zE6|>K@hH!RxvNBio_KjFDBBN{VWKST+GY18T|TMv2&6sfq`*s;VZp+!CXHOkZ@nCM zimhDx`PtTt%T2ToI99xuHBW(3Uj0GXuo+x|;*c@-9rZ&W-xX_I|FtGE_Na@H;ok2AlrscI4=p%q*q)4RgKj-Q4pt zUDghE4|O#&yp--j2zwpN2HDadun`aEr7Sl<=>nK1^z^ee5k$;i!+b3Y1Ixzsb*SCx zr+$3dM~FcQ?5=Sxg3Cv?YOS2votD~`rthE=9(#mA2a6(+-_<^vx(Hy5s~6`nn)0^% zBC0t!%oOQu(q=@vU;|ac5&MSBTz)zG&;p(fT!_1TE0v?bz+e2Pg!T@9sv zdHQ^$#eXvsZwR6%goYB-O_kdSN9?ZVhP;kFXM+3uG5k8=NUH0izARy>TeIg@(%h>E zf!zr79AWCKd&2hr@BywNXS)#Ah{5fZfkqDRc;I%+w{y>j8VLTW0Fjm|fNsW8duct{ zN+Q!3snIsg17}LocnTie0QZ#gE0c9xr?2Ny={n(cKe=KZ9iyLa5r-S!D%4j_g?+ce z2@4_sPG@|ifNjh{RVKL2Z&Xlv8YMTM>p;XES@>|1>gj^yNh(gK-L=RC^X?BMza%zJ zM||6R0-07vohX*|dA;Stcp4y|JpbBH=Td8XIiVR;RA9NQ5GE8l#jFN(M(49Vor}3vpee;oA0DSQY3|aOUgmBPrfHgX;?F$d&s#KdJcb42?ckcB!{eyb*+$zo8wa zHSeKh;3Z)YIXf?1~CwnbiAS8!gk z*dW{&B?p_Sx6QTKyaw2bNz_Cyuv(z6CbOQ;XgzEGzTsdC*|-Ew?U^>N{^HPI=Ql$} za()UiVtsYNa1R(W&8VAZtQ9X7RbPK8*mxGmF|sv#u-KnePuLKZ{Ad74+-Tb3=APbp zL=yFJ(f4sd&K0=RilL*|uP9j1{y@l+^K;1wW!?+7ZaXu#9tvLlWcA^Zp+KtT5_<^o z!`tVU;B3o<$n}_XpoCY+7>91*EruXfEHhjU-nlCYd?^`moN*vSEA_)E<8t@>9G+cI zK4T;z+cV`mLUxiAR?d1v&FVAbSjIK3UQq1$jfG1D+S=B4Ph#|{kX;2swfqW_ujsyH zS4`ZFsA>&q1QCo-);TeL$yM~`{UC=_9ZLG6&8;3oG%Mr_qk)p%YD_yhdED42g{v;D zx$e!jefgUipi+4&RfG6mte@9`w7WvDn|z!1{E;cVg9BD_KUlc}o3M49&%Be;X(?1T zzK!t%>+2;5Qp6RjCl@v|yzFQOEV%oQ-!@nCa~2%cM+m9f9TQ5@si$wR!iUIZEE61( z?p}>4?c9-L5-SpD!oA^iN*1Y-Wz&wQUEiLJkd?bGN+hY1B{MZAT5K~{^v^35|2ZMW zlPHbnGH?=NQ+sZVz9S6~X&gV@Aui^sa1v-TTjmSJ5&HPx-$tNUUgv=QVns5QIxmh>SJ#xyj}HNJv*ytHDwAFSpx%apZDJbNZl_{CQ9%kZA0>JrlFXNXj{Dag;{Fy zAR-$o>2GGE%_|Ytj=%PY%74kt8R(2xN3g)>anD#g^QgVGy| zk$3li7oB&O9=Ovnrw86f;{klmq;}uV&(lnjjEY(3fRiF$C>^{2u6t%e=^J{7C7cL9 zaMQ23EY6wA!?pHgxx_2+i;$F5X?94FJMdD+48Qx_Uy&u*Dt+A~_gAF?=x4&aYqZe7 zrL%7PQ7!CIwxeWQXtmW-PY32?TaYt0z@0W8kL02O6Yp>1INP^E0}Nq=O9sK;jKSQ^ zD@VWZu$FPAZ%1Z6JM6eQ(shFE!`E!!PviAB=`3}joF)An-IpQQhp!+r$;nmNJnCC& zH!@%WJ`vLvl-tyML;0ux*T3ntasCR~numcsvxXV2rAS+r&*$>8Y@%bsk+`AMvB7wt zBy=0qY=9I(xQ%MEMbg{49CNT`c6CT@W2qeq%W_Zj3p(4|ZROOr&CK6=ulDKm9w5DD wYE+2f3_P0Z@kUV!V;=vl%b$&pd2|ZC`tW0>o|)(xXgHbdG@P1`kiMV)U!76BzyJUM diff --git a/members/_posts/2013-12-19-tlevine.md b/members/_posts/2013-12-19-tlevine.md deleted file mode 100644 index 395d42e2..00000000 --- a/members/_posts/2013-12-19-tlevine.md +++ /dev/null @@ -1,13 +0,0 @@ ---- -username: tlevine -title: Thomas Levine -area: Dada artist -email: _@thomaslevine.com -web: https://thomaslevine.com -img: /img/people/tlevine.jpg -permalink: /members/tlevine/index.html -roles: [newsletter-contributor] ---- -Thomas Levine is a dada artist interested in sleep. -He wrote about [government open data portals](https://thomaslevine.com/open-data/) -from 2012 to 2013.