rebuild pubs

wwcohen · Mar 18, 2024 · ac02f99
1 parent 9ef214c
commit ac02f99
Showing 7 changed files with 39 additions and 214 deletions.
diff --git a/index.html b/index.html
@@ -39,9 +39,11 @@ <h3 class="title">Principal Scientist, <a href="http://ai.google.com">Google AI<
 
 <h3 class="sec"><a name="bio"></a>Biography</h3 class="sec">
 
-William Cohen is a Principal Scientist at Google, and is based in
-Google's Pittsburgh office. He received his bachelor's degree in
-Computer Science from
+William Cohen is a Visiting Professor at Carnegie Mellon University in
+the <a href="http://www.ml.cmu.edu">Machine Learning Department</a>.
+He also holds a position as a Principal Scientist at Google, where he
+worked full-time between May 2018 and March 2024.  He received his
+bachelor's degree in Computer Science from
 <a href="http://www.duke.edu">Duke University</a> in 1984, and a PhD
 in Computer Science from <a href="http://www.rutgers.edu">Rutgers
 University</a> in 1990.  From 1990 to 2000 Dr. Cohen worked at
@@ -54,10 +56,7 @@ <h3 class="sec"><a name="bio"></a>Biography</h3 class="sec">
 the <a href="http://www.ml.cmu.edu">Machine Learning Department</a>,
 with a joint appointment in
 the <a href="http://www.lti.cs.cmu.edu">Language Technology
-Institute</a>, as an Associate Research Professor, a Research
-Professor, and a Professor.  Dr. Cohen also was the Director of the
-Undergraduate Minor in Machine Learning at CMU and co-Director of the
-Master of Science in ML Program.
+Institute</a>.
 
 <p>
 Dr. Cohen is a past president of
@@ -102,17 +101,13 @@ <h3 class="sec"><a name="bio"></a>Biography</h3 class="sec">
 Award</a> for the most influential paper of the ISWC-2013 conference.
 
 <p>
-
-Dr. Cohen's research interests include question answering, machine
-learning for NLP tasks, and neuro-symbolic reasoning.  He has a
-long-standing interest in statistical relational learning and learning
-models, or learning from data, that display non-trivial structure.  He
+Dr. Cohen's research interests include question answering,
+machine learning for NLP tasks, and neuro-symbolic reasoning, and he
+has a long-standing interest in statistical relational learning.  He
 holds seven patents related to learning, discovery, information
-retrieval, and data integration, and is the author of more than 200
+retrieval, and data integration, and is the author of more than 300
 publications.
 
-<p>Dr. Cohen is also a Consulting Professor at the School of Computer
-Science at Carnegie Mellon University.
 <!-- <h3 class="sec"><a name="cv">Curriculum vita</cv></h3 class="sec">
 
 <ul>
@@ -125,16 +120,23 @@ <h3 class="sec"><a name="announce"></a>Announcements and FAQs</h3 class="sec">
 
 <ul>
 
+  <li>March 2024: As you can see from my updated bio above, I have
+  returned to CMU's ML department full-time (although I still have a
+  20% involvement at Google, so that email will work!)  I'm really
+  looking forward to re-engaging with my friends and colleagues at CMU.
+
   <li>Nov 2023: I'm honored to report that the paper <a href="https://link.springer.com/chapter/10.1007/978-3-642-41335-3_34">Knowledge
-      Graph Identification</a>written by Jay Pujara, Hui Miao, Lise Getoor and myself,
+      Graph Identification</a>, written by Jay Pujara, Hui Miao, Lise Getoor and myself,
     won a <a href="https://iswc2023.semanticweb.org/awards/">10 year best paper award at
       the International Semantic Web Conference, 2023.
-<li>Oct 2023: I will be visiting CMU's ML department on Tuesdays in Fall 2023.
-<li>May 2023: I'm very honored to report that one of
+
+ <li>May 2023: I'm very honored to report that one of
   the <a href="https://arxiv.org/abs/2209.12153">papers</a> I
   co-authored at EACL 2023 (with Julian Eisenschlos, Jeremy Cole, and
   Fangyu Liu) won an Outstanding Paper Award.
 
+</ul>
+
 <!-- 
 
 <h3 class="sec"><a name="proj">Projects</a></h3 class="sec">
@@ -182,196 +184,16 @@ <h3 class="sec"><a name="teach"></a>Teaching</h3 class="teach">
 
 <h3 class="sec"><a name="sw">Software and demos</a></h3 class="sec">
 
-<!-- 
-<b>Demos:</b> 
-<ul>
-
-
-<li>
-Measure twice, cut once - <a
-href="http://www.cs.cmu.edu/~vitor/">Vitor</a> and <a
-href="http://www.cs.cmu.edu/rbalasub">Ramnath</a> have developed a <a
-href="http://www.cs.cmu.edu/~vitor/cutonce/cutOnce.html">Thunderbird
-plugin</a> that implements <a
-href="http://www.cs.cmu.edu/~wcohen/postscript/ecir2008.pdf">recipient
-recommendation</a> and <a
-href="http://www.cs.cmu.edu/~wcohen/postscript/sdm-2007-leak.pdf">leak
-detection</a> for email.  It modifies Thunderbird by adding an
-additional pane that pops up after you send a message, giving you one
-final chance to fix any errors in your recipient list.  There's a
-brief <a href="cutonce.pdf">writeup on how to use it,</a> but it's
-pretty self-explanatory: just download it, open Thunderbird, and go to
-the tools->addon menu to install.  After you've installed it, you
-train by opening your folder of "Sent" mail and pressing the "train"
-button.  (This took about an hour for my 9000+ old messages.)
-
-<li>
-<a href="http://www.cs.cmu.edu/~nmramesh/">Ramesh
-Nallapati</a> has put together two nice demos of his <a
-href="http://www.cs.cmu.edu/~wcohen/postscript/topic-tomography-submitted.pdf">multiscale topic tomography</a> topic-modeling technique, one
-for articles from <a
-href="http://www.cs.cmu.edu/~nmramesh/science_demo/multiscale_home.html">Science</a>,
-and one with <a
-href="http://www.cs.cmu.edu/~nmramesh/cancer_demo/multiscale_home.html">cancer-related
-articles from PubMed</a>.
-
-<li>
-Here are two movies that demo SimStudent, a programming-by-demonstration
-system for constructing cognitive tutors, built by <a href="http://www.cs.cmu.edu/~mazda/">Noboru Matsuda</a>.
-  <ul>
-    <li><a href="http://www.cs.cmu.edu/~mazda/CTAT/Video/Interactive/2x+3_5.mov">Interactive mode</a> (solves problems proactively, as way of posing queries)</li> 
-    <li><a href="http://www.cs.cmu.edu/~mazda/CTAT/Video/Non-interactive/3x_9.mov">Non-interactive mode</a></li>
-  </ul>
-
-</ul>
-
--->
-
-
-<ul>
-<li><a href="https://github.com/TeamCohen/TensorLog/wiki">TensorLog is
-a probabilistic first-order logic which is fully differentiable.
-<li><a href="https://github.com/TeamCohen/ProPPR/wiki">ProPPR</a> is an older
-"locally groundable" probabilistic first-order
-logic. 
-<li><a href="https://github.com/TeamCohen/GuineaPig">Guinea Pig</a> is
-a pure Python workflow language for Hadoop.
-
-<p>
-
-
-<li>Bhuwan Dhingra is
-distributing <a href="https://github.com/bdhingra/ga-reader">an
-updated version of the Gated Attention Reader</a> via Github.  As of
-Dec 2016 the GA Reader is obtaining state-of-the-art results on
-several of the standard benchmarks for answering cloze questions.
-
-<li>Here is <a href="http://www.cs.cmu.edu/afs/cs/Web/People/dmovshov/software.html">a comment-completion Plugin for Eclipse</a>, from Dana Movshovitz-Attias.
-<li>Here is <a href="https://github.com/rbalasub/jigsaw.git">Ramnath Balasubramanyan's BlockLDA</a> code, as well as some of the other algorithms from his thesis, is available on GitHub.
-<li>Code for <a href="http://www.cs.cmu.edu/~nlao/code/2010.pra.gz">Ni
-Lao's PRA method</a> (described in
-our <a href="http://www.cs.cmu.edu/~wcohen/postscript/ecml-2010-ni.pdf">ECML
-paper</a>) is available.
-<li>
-<a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>'s home page contains
-<ul>
-<li>the <a href="http://www.cs.cmu.edu/~frank/code/icml2010-code.zip">code</a>
-for power iteration clustering (the algorithm described in our
-ICML-2010 paper) as well as
-the <a href="http://www.cs.cmu.edu/~frank/data/icml2010-data.zip">datasets</a>
-we used in the experiments.
-<li>the <a href="http://www.cs.cmu.edu/~frank/code/asonam2010-code.zip">code</a>
-for MultiRandomWalk (the semi-supervised learning algorithm described in our
-ASONAM-2010 paper) as well as
-the <a href="http://www.cs.cmu.edu/~frank/data/anonam2010-data.zip">datasets</a>
-we used in those experiments.
-</ul>
-
-<p>
-
-
-<li>
-<a href="http://secondstring.sourceforge.net">SecondString</a> is
-another open-source Java package, of approximate string matching
-techniques.
-  <ul><li>SecondString includes a jar for part of an ancient version 
-  of Minorthird.  For those that are interested in <a href="radar.tgz">the source behind 
-  the mysterious cls.jar</a>, here it is.
-  </ul>
-
-<!---
-
-<li><a href="slipper/">SLIPPER</a> and <a href="whirl/">WHIRL</a> are
-now being distributed via Rutgers University.  They are free for research
-purposes.
-
---->
-
-<li><a href="slipper-linux.tgz/">SLIPPER</a> is an old old
-rule-learning system Yoram Singer and I developed.  This code is
-provided with absolutely no warranty, promise of support, or really,
-any expectation that it will keep working.  You are totally on your
-own with this one, friend.  
-
-<h3 class="sec"><a name="data">Datasets</a></h3 class="sec">
-
-The following datasets are available for anyone to use for research
-purposes:
-<ul>
-
-<li>Zhilin Yang is
-distributing <a href="http://kimi.ml.cmu.edu/qa_ssl/">the data from our
-ACL-2017 paper on semi-supervised QA<a>.
-
-<li>Lidong Bing has
- distributed <a href="http://www.cs.cmu.edu/~lbing/#Datasets">two
- datasets from our joint work</a>: the data used in our EMNLP 2015
- paper, Improving Distant Supervision for Information Extraction Using
- Label Propagation Through List, and also the dataset used in our AAAI
- 2016 paper, Distant IE by Bootstrapping Using Lists and Document
- Structure.  The <a href="http://curtis.ml.cmu.edu/gnat/biomed">data
- extracted by this system can also be browsed</a>.
-
-
-<li>Ni Lao has distributed the labeled data from our EMNLP 2010 paper,
-Random Walk Inference and Learning in A Large Scale Knowledge Base,
-both <a href="http://www.cs.cmu.edu/~nlao/data/publish.amt.labels.tar.gz">Turker-labeled
-data</a>
-and <a href="http://www.cs.cmu.edu/~nlao/data/publish.distant.supervision.tar.gz">NELL
-pseudo labels</a>.
-
-<li><a href="http://rtw.ml.cmu.edu/wk/coordterm/syntactic/">Coordinate
-terms extracted from a MALT-parsed corpus with 230B sentences</a>,
-produced by Malcolm Greaves. (Corpus is ClueWeb 2009, Wikipedia from
-November 2011, Project Gutenberg, and Citeseer.)
-
-<li><a href="CrowdComp_MTurkData.tar.gz">Data sets</a> for my paper
-"Crowdsourced Comprehension: Predicting Prerequisite Structure in
-Wikipedia" with Partha Talukdar from BEA-2012.
-
-<li><a href="http://rtw.ml.cmu.edu/wk/WebSets/wsdm_2012_online/index.html">Collections
-of HTML Tables, hyponyms, as well as extracted entity clusters and MLT
-evaluations</a>, all associated with
-<a href="http://www.cs.cmu.edu/afs/cs/Web/People/bbd/">Bhavana
-Dalvi</a>'s paper
-on <a href="postscript/wsdm-2012-bdd.pdf">WebSets</a> from WSDM-2012.
-
-<li>The <a href="http://www.cs.cmu.edu/~frank/data/icml2010-data.zip">network
-datasets</a> used in the experiments of our ICML-2010 paper
-are on <a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>'s home page.
-
-<li>
-<a href="all-bibdata.tgz">100,000+ bibliography entries</a>, in the original BibTeX format, converted to an EndNote-like format, and in a featurized format, for experiments with matching (60M).
-
-<li><a href="http://www.cs.cmu.edu/~vitor/codeAndData.html">617
-messages from 20 Newsgroups, annotated for reply bodies and
-signatures</a>, prepared by my former student <a
-href="http://www.cs.cmu.edu/~vitor">Vitor Carvalho</a>
-
-<li><a href="http://www.cs.cmu.edu/~einat/datasets.html">
-Two subsets of the Enron data, annotated with person names</a>,
-prepared by my student <a href="http://www.cs.cmu.edu/~einat">Einat
-Minkov</a>.
 
 <li><a href="http://www.cs.cmu.edu/~enron">Enron email dataset</a>
 (400Mb, once you get there) contains 800,000+ emails from 150 users+
 organized into 4700+ folders.
 
-
-<li><a href="repository.tgz">A collection of various extraction datasets
-in Minorthird format</a> (6Mb), including about 1000 Enron emails tagged
-for person names and temporal expressions.
-
 <li><a href="classify.tar.gz">classify.tar.gz</a> (0.4Mb) contains
 nine problems in which the goal is to classify short entity names.
 This data was used in <i>Joins that Generalize: Text Classification
 Using WHIRL</i> (KDD-98).
 
-<li><a href="ranking-data.tar.gz">ranking.tar.gz</a> (8Mb) contains the
-data used for the meta-search experiments in my JAIR paper <a
-href="http://www.jair.org/abstracts/cohen99a.html">Learning to Order
-Things</a> (with Rob Schapire and Yoram Singer).
-
 <li><a href="match.tar.gz">match.tar.gz</a> (0.7Mb) contains a suite of
 <i>labeled</i> entity-name matching and clustering problems
 (i.e. problems for which the correct matches/clusters are provided),

diff --git a/pubgen/pubs.json b/pubgen/pubs.json
@@ -15,7 +15,7 @@
     "title": "SEMQA: Semi-Extractive Multi-Source Question Answering",
     "authors": "Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler",
     "venues": "",
-    "year": "2023",
+    "year": "2024",
     "topics": "nxR",
     "url": "https://arxiv.org/abs/2311.04886",
     "cite": "NAACL-2024",
@@ -28,7 +28,7 @@
     "title": "MEMORY-VQ: Compression for Tractable Internet-Scale Memory",
     "authors": "Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie",
     "venues": "",
-    "year": "2023",
+    "year": "2024",
     "topics": "nxR",
     "url": "https://arxiv.org/abs/2308.14903",
     "cite": "NAACL-2024",

diff --git a/pubs-R.html b/pubs-R.html
@@ -3,9 +3,9 @@
 </head>
 <body><h3>William W. Cohen's Papers: Retrieval Augmented LMs</h3>
 <ol>
-<li>Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler (2023): <a href="https://arxiv.org/abs/2311.04886">SEMQA: Semi-Extractive Multi-Source Question Answering</a> in NAACL-2024.
+<li>Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler (2024): <a href="https://arxiv.org/abs/2311.04886">SEMQA: Semi-Extractive Multi-Source Question Answering</a> in NAACL-2024.
 </li>
-<li>Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie (2023): <a href="https://arxiv.org/abs/2308.14903">MEMORY-VQ: Compression for Tractable Internet-Scale Memory</a> in NAACL-2024.
+<li>Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie (2024): <a href="https://arxiv.org/abs/2308.14903">MEMORY-VQ: Compression for Tractable Internet-Scale Memory</a> in NAACL-2024.
 </li>
 <li>Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2023): <a href="https://arxiv.org/abs/2308.08661">Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions</a> in progress.<br><ul><li><font size=-1>Following up the 'QA is the new KR' paper, we present a new collection of question-answer pairs automatically generated from Wikipedia which are more specific and ambiiguous than generated questions used in prior work, and show that this can be used to answer ambiguous questions.  On the challenging ASQA benchmark, which requires generating long-form answers that summarize the multiple answers to an ambiguous question, our method improves performance by 10-15%.  The new queston DB can also be used to improve diverse passage retrieval.</font></ul>
 </li>

diff --git a/pubs-n.html b/pubs-n.html
@@ -5,9 +5,9 @@
 <ol>
 <li>Chung-Ching Chang, William W. Cohen, Yun-Hsuan Sung (2023): <a href="https://arxiv.org/abs/2311.10083">Characterizing Tradeoffs in Language Model Decoding with Informational Interpretations</a> in progress.
 </li>
-<li>Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler (2023): <a href="https://arxiv.org/abs/2311.04886">SEMQA: Semi-Extractive Multi-Source Question Answering</a> in NAACL-2024.
+<li>Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler (2024): <a href="https://arxiv.org/abs/2311.04886">SEMQA: Semi-Extractive Multi-Source Question Answering</a> in NAACL-2024.
 </li>
-<li>Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie (2023): <a href="https://arxiv.org/abs/2308.14903">MEMORY-VQ: Compression for Tractable Internet-Scale Memory</a> in NAACL-2024.
+<li>Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie (2024): <a href="https://arxiv.org/abs/2308.14903">MEMORY-VQ: Compression for Tractable Internet-Scale Memory</a> in NAACL-2024.
 </li>
 <li>Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2023): <a href="https://arxiv.org/abs/2308.08661">Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions</a> in progress.<br><ul><li><font size=-1>Following up the 'QA is the new KR' paper, we present a new collection of question-answer pairs automatically generated from Wikipedia which are more specific and ambiiguous than generated questions used in prior work, and show that this can be used to answer ambiguous questions.  On the challenging ASQA benchmark, which requires generating long-form answers that summarize the multiple answers to an ambiguous question, our method improves performance by 10-15%.  The new queston DB can also be used to improve diverse passage retrieval.</font></ul>
 </li>

diff --git a/pubs-s.html b/pubs-s.html
@@ -4,15 +4,15 @@
 <body><h3>Selected and/or recent papers by William W. Cohen</h3>
 <h3>Recent papers: 2024</h3>
 <ol>
+<li>Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler (2024): <a href="https://arxiv.org/abs/2311.04886">SEMQA: Semi-Extractive Multi-Source Question Answering</a> in NAACL-2024.
+</li>
+<li>Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie (2024): <a href="https://arxiv.org/abs/2308.14903">MEMORY-VQ: Compression for Tractable Internet-Scale Memory</a> in NAACL-2024.
+</li>
 </ol>
 <h3>Recent papers: 2023</h3>
 <ol>
 <li>Chung-Ching Chang, William W. Cohen, Yun-Hsuan Sung (2023): <a href="https://arxiv.org/abs/2311.10083">Characterizing Tradeoffs in Language Model Decoding with Informational Interpretations</a> in progress.
 </li>
-<li>Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler (2023): <a href="https://arxiv.org/abs/2311.04886">SEMQA: Semi-Extractive Multi-Source Question Answering</a> in NAACL-2024.
-</li>
-<li>Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie (2023): <a href="https://arxiv.org/abs/2308.14903">MEMORY-VQ: Compression for Tractable Internet-Scale Memory</a> in NAACL-2024.
-</li>
 <li>Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2023): <a href="https://arxiv.org/abs/2308.08661">Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions</a> in progress.<br><ul><li><font size=-1>Following up the 'QA is the new KR' paper, we present a new collection of question-answer pairs automatically generated from Wikipedia which are more specific and ambiiguous than generated questions used in prior work, and show that this can be used to answer ambiguous questions.  On the challenging ASQA benchmark, which requires generating long-form answers that summarize the multiple answers to an ambiguous question, our method improves performance by 10-15%.  The new queston DB can also be used to improve diverse passage retrieval.</font></ul>
 </li>
 <li>Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie (2023): <a href="https://arxiv.org/abs/2306.10231">GLIMMER: generalized late-interaction memory reranker</a> in progress.