The New York Public Library has digitized its collection of city directories, and the resulting high-resolution images can be browsed and downloaded from our Digital Collections.
As part of NYPL's NYC Space/Time Directory project, and in collaboration with Data Services at New York University's Bobst Library, we are using optical character recognition (OCR) to turn the city directories into a searchable atlas of historical New York City.
Two meetups have been organized about the digitized city directories:
- City Directories: 137 years of NYC History
- Extracting a Million-Record Dataset from Historical NYC City Directories (slides)
See DIRECTORIES.md for a table of the city directories we are processing and extracting text from. To just browse and download the scanned books, visit Digital Collections.
Processed city directory data will soon be published on the NYC Space/Time Directory homepage!
hOCR files will soon be published in our data repository!
Anatomy of an hOCR file name:
Example: 1849.00030.28.56749967.e10e9aa0-5291-0134-79ba-00505686a51c.processed.hocr
- 1849 ⟶ year of directory
- 00030 ⟶ page number of original print page
- 28 ⟶ image number of sequentially downloaded images from single item (i.e. a directory)
- 56749967 ⟶ NYPL asset's individual image ID
- e10e9aa0-5291-0134-79ba-00505686a51c ⟶ NYPL UUID for individual image
- processed ⟶ image was preprocessed by ImageMagick textcleaner
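
The naming scheme above can be split back into its fields with a few lines of Python. This is only a minimal sketch and not part of the project's own tooling: the `HocrName` class and `parse_hocr_name` function are hypothetical names, and the code assumes that fields are always separated by `.`, that the first five fields always appear in the order shown, and that `processed` may be absent for images that were not preprocessed.

```python
from dataclasses import dataclass


@dataclass
class HocrName:
    """Fields of an hOCR file name (hypothetical helper, not project code)."""
    year: int          # year of directory
    page: str          # original print page number, kept as text to preserve zero padding
    image_number: int  # position among the sequentially downloaded images of one item
    image_id: str      # NYPL asset's individual image ID
    uuid: str          # NYPL UUID for the individual image
    processed: bool    # True if the name carries the "processed" marker


def parse_hocr_name(name: str) -> HocrName:
    """Split a file name such as '1849.00030.28.<id>.<uuid>.processed.hocr'."""
    parts = name.split(".")
    # Assumption: at least five fields plus the "hocr" extension.
    if len(parts) < 6 or parts[-1] != "hocr":
        raise ValueError(f"not a recognized hOCR file name: {name!r}")
    year, page, image_number, image_id, uuid = parts[:5]
    # Anything between the UUID and the extension is treated as a marker.
    markers = parts[5:-1]
    return HocrName(
        year=int(year),
        page=page,
        image_number=int(image_number),
        image_id=image_id,
        uuid=uuid,
        processed="processed" in markers,
    )


if __name__ == "__main__":
    example = "1849.00030.28.56749967.e10e9aa0-5291-0134-79ba-00505686a51c.processed.hocr"
    print(parse_hocr_name(example))
```

The page number is kept as a string so that the zero padding in `00030` survives a round trip; convert it with `int()` if a numeric page is needed.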