Commit
Merge pull request #3257 from programminghistorian/Issue-3256
Issue 3256
charlottejmc authored May 9, 2024
2 parents ae81317 + d2a628f commit e358599
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion en/lessons/text-mining-with-extracted-features.md
@@ -427,7 +427,7 @@ As before, the data is returned as a Pandas DataFrame. This time, there is much

{% include figure.html filename="single-row-tokencount.png" caption="Single row of tokenlist." %}

-The columns in bold are an index. Unlike the typical one-dimensional index seen before, here there are four dimensions to the index: page, section, token, and pos. This row says that for the 24th page, in the body section (i.e. ignoring any words in the header or footer), the word 'years' occurs 1 time as an plural noun. The part-of-speech tag for a plural noun, `NNS`, follows the [Penn Treebank](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) definition.
+The columns in bold are an index. Unlike the typical one-dimensional index seen before, here there are four dimensions to the index: page, section, token, and pos. This row says that for the 24th page, in the body section (i.e. ignoring any words in the header or footer), the word 'years' occurs 1 time as an plural noun. The part-of-speech tag for a plural noun, `NNS`, follows the [Penn Treebank](https://web.archive.org/web/20180730200619/https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) definition.

> The "words" on the first page seems to be OCR errors for the cover of the book. The HTRC Feature Reader refers to "pages" as the $$n^{th}$$ scanned image of the volume, not the actual number printed on the page. This is why "page 1" for this example is the cover.
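
Reviewer note: the four-level index described in the hunk above (page, section, token, pos) can be sketched in plain pandas. This is only an illustration under stated assumptions: the frame is built by hand rather than loaded through the HTRC Feature Reader, and the column name `count` is assumed, not taken from the lesson.

```python
import pandas as pd

# Hypothetical single-row token list mirroring the row described in the lesson:
# page 24, body section, token 'years', tagged NNS (plural noun), seen once.
index = pd.MultiIndex.from_tuples(
    [(24, "body", "years", "NNS")],
    names=["page", "section", "token", "pos"],
)
tokenlist = pd.DataFrame({"count": [1]}, index=index)  # 'count' is an assumed column name

# A full four-part key selects exactly one row of the frame.
row = tokenlist.loc[(24, "body", "years", "NNS")]

print(tokenlist.index.nlevels)  # number of index dimensions
print(int(row["count"]))        # occurrences of 'years' as NNS on page 24
```

Addressing a row takes one value per index level, which is why the lesson text reads the row as a page, section, token, and part-of-speech combination.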
2 changes: 1 addition & 1 deletion en/lessons/topic-modeling-and-mallet.md
@@ -619,7 +619,7 @@ report.
[Guided Tour to Topic Modeling]: http://www.scottbot.net/HIAL/?p=19113
[Topic modeling made just simple enough]: http://tedunderwood.wordpress.com/2012/04/07/topic-modeling-made-just-simple-enough/
[Some Assembly Required]: http://web.archive.org/web/20160704150726/http://www.lisarhody.com:80/some-assembly-required/
-[Topic Modeling in the Humanities: An Overview | Maryland Institute for Technology in the Humanities]: http://mith.umd.edu/topic-modeling-in-the-humanities-an-overview/
+[Topic Modeling in the Humanities: An Overview | Maryland Institute for Technology in the Humanities]: https://web.archive.org/web/20130116223500/http://mith.umd.edu/topic-modeling-in-the-humanities-an-overview/
[Latent dirichlet allocation]: http://dl.acm.org/citation.cfm?id=944937
[bibliography of topic modeling articles]: http://mimno.infosci.cornell.edu/topics.html
[Computational Historiography]: http://www.perseus.tufts.edu/publications/02-jocch-mimno.pdf
2 changes: 1 addition & 1 deletion es/lecciones/topic-modeling-y-mallet.md
@@ -316,7 +316,7 @@ Puedes reutilizar los datos tomándolos de [Figshare.com](https://ndownloader.fi
- Para amplia información adicional y una bibliografía sobre *topic modeling* podrías empezar con el [Guided Tour to Topic Modeling](http://www.scottbot.net/HIAL/?p=19113) de Scott Weingart.
- Una discusión importante sobre la interpretación del significado de los tópicos es '[Topic modeling made just simple enough](http://tedunderwood.wordpress.com/2012/04/07/topic-modeling-made-just-simple-enough/)' de Ted Underwood.
- El artículo de blog '[Some Assembly Required](http://web.archive.org/web/20160704150726/http://www.lisarhody.com:80/some-assembly-required/)' *Lisa @ Work* 22 de agosto de 2012 escrito por Lisa Rhody también es muy revelador.
-- Clay Templeton, '[Topic Modeling in the Humanities: An Overview](http://mith.umd.edu/topic-modeling-in-the-humanities-an-overview/)', Maryland Institute for Technology in the Humanities, n.d.
+- Clay Templeton, '[Topic Modeling in the Humanities: An Overview](https://web.archive.org/web/20130116223500/http://mith.umd.edu/topic-modeling-in-the-humanities-an-overview/)', Maryland Institute for Technology in the Humanities, n.d.
- David Blei, Andrew Ng, and Michael Jordan, '[Latent dirichlet allocation](http://dl.acm.org/citation.cfm?id=944937)', The Journal of Machine Learning Research 3 (2003).
- Finalmente, te recomendamos que consultes la [bibliografía de artículos sobre *topic modeling*](http://mimno.infosci.cornell.edu/topics.html) de David Mimno. Están clasificados por temas para facilitar encontrar el artículo más adecuado para una aplicación determinada. También puedes echar un vistazo a su reciente artículo sobre [Historiografía Computacional](http://www.perseus.tufts.edu/publications/02-jocch-mimno.pdf) en la revista *ACM Transactions on Computational Logic* en el que analiza revistas científicas de los Clásicos a lo largo de cien años para aprender algo sobre este campo. Mientras el artículo debe leerse como un buen ejemplo de *topic modeling*, su sección sobre 'métodos' es especialmente relevante porque incluye una discusión sobre cómo preparar los textos para un análisis de ese tipo.[^13]
