
<html>
<head>
<title>William W. Cohen</title>
<!-- script type="text/javascript" src="http://shots.snap.com/snap_shots.js?ap=1&amp;key=f189a52ff115e29092c9f9bb3678047a&amp;sb=1&amp;th=orange&amp;cl=0&amp;si=0&amp;po=0&amp;df=0&amp;oi=0&amp;lang=en-us&amp;domain=wcohen.com"></script -->
<link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body bgcolor="white">


<table>
<tr>
<td>
  <img align=center src="william-at-whiteboard-small.JPG" height="auto" width="75%" alt="Picture of William Cohen">
</td>
<td>
<h2 class="name">William W. Cohen</h2> 

<h3 class="title">Director, Research Engineering, <a href="http://ai.google.com">Google AI</a></h3>

<b>News:</b> I have moved to Google! Starting in June 2018 I will be
setting up and leading a new research group in AI/ML, based in
Pittsburgh at Google's Bakery Square location.

</td> 
</table>

<table><tr><td>
<p>[
<a class="nav" href="#bio">Bio</a> |
<a class="nav" href="#announce">Announcements and FAQs</a> |
<a class="nav" href="#teach">Teaching</a> |
<!-- <a class="nav" href="#proj">Projects</a> | -->
<a class="nav" href="#pubs">Publications</a> (<a class="nav" href="pubs-s.html">recent</a>, <a class="nav" href="pubs.html">all</a>) |
<a class="nav" href="#sw">Software</a> |
<a class="nav" href="#data">Datasets</a> |
<a class="nav" href="#talks">Talks</a> |
<a class="nav" href="#buddies">Students &amp; Colleagues</a> |
<a class="nav" href="http://wcohen.blogspot.com">Blog</a> | 
<a class="nav" href="#contact">Contact Info</a> |
<a class="nav" href="#misc">Other Stuff</a>
]
<tr><td>Prospective visitors/students: see <a href="#announce">announcements</a>
</table>

<h3 class="sec"><a name="bio"></a>Biography</h3>

William Cohen is a Director of Research &amp; Engineering at Google AI,
and is based in Google's Pittsburgh office. He received his bachelor's
degree in Computer Science from
<a href="http://www.duke.edu">Duke University</a> in 1984, and a PhD
in Computer Science from <a href="http://www.rutgers.edu">Rutgers
University</a> in 1990.  From 1990 to 2000 Dr. Cohen worked at
AT&amp;T <a href="http://www.bell-labs.com/">Bell Labs</a> and
later <a href="http://www.research.att.com">AT&amp;T Labs-Research</a>,
and from April 2000 to May 2002 Dr. Cohen worked
at <a href="http://www.whizbang.com">Whizbang Labs</a>, a company
specializing in extracting information from the web.  From 2002 to
2018, Dr. Cohen worked at Carnegie Mellon University in
the <a href="http://www.ml.cmu.edu">Machine Learning Department</a>,
with a joint appointment in
the <a href="http://www.lti.cs.cmu.edu">Language Technology
Institute</a>, as an Associate Research Professor, a Research
Professor, and a Professor.  Dr. Cohen also was the Director of the
Undergraduate Minor in Machine Learning at CMU and co-Director of the
Master of Science in ML Program.

<p>
Dr. Cohen is a past president of
the <a href="http://www.machinelearning.org/">International Machine
Learning Society</a>.  In the past he has also served as an action
editor for
the <a href="http://secure.aidcvt.com/mcp/searchresult.asp?INPUT=AI&Type=Pass&PCS=MCP">AI
and Machine Learning</a> series of books published
by <a href="http://www.morganclaypool.com/">Morgan Claypool</a>, for
the
journal <a href="http://pages.stern.nyu.edu/~fprovost/MLJ/"><i>Machine
Learning</i></a>, the
journal <a href="http://www.elsevier.com/locate/artint"><i>Artificial
Intelligence</i></a>, the <a href="http://www.jmlr.org"><i>Journal of
Machine Learning Research</i></a>, and
the <a href="http://www.jair.org"><i>Journal of Artificial
Intelligence Research</i></a>. He was General Chair for
the <a href="http://icml2008.cs.helsinki.fi/">2008 International
Machine Learning Conference</a>, held July 6-9 at
the <a href="http://www.helsinki.fi/university">University of
Helsinki</a>,
in <a href="http://cc.oulu.fi/~thu/personal/Finland.html">Finland</a>;
Program Co-Chair of
the <a href="http://www.autonlab.org/icml2006/home.html">2006
International Machine Learning Conference</a>; and Co-Chair of
the <a href="http://www.cs.rutgers.edu/pub/learning94/learning94.html">1994
International Machine Learning Conference</a>. Dr. Cohen was also the
co-Chair for the <a href="http://www.icwsm.org/2009/index.shtml">3rd
Int'l AAAI Conference on Weblogs and Social Media</a>, which was held
May 17-20, 2009 in San Jose, and was the co-Program Chair for
the <a href="http://www.icwsm.org/2010/index.shtml">4th Int'l AAAI
Conference on Weblogs and Social Media</a>. He is
a <a href="http://www.aaai.org/Awards/fellows-list.php">AAAI
Fellow</a>, and was a winner of the
2008 <a href="http://www.sigmod.org/sigmod-awards/sigmod-awards#time">SIGMOD
"Test of Time" Award</a> for the most influential SIGMOD paper of
1998, and the
2014 <a href="http://sigir.org/sigir-2014-best-paper-awards/"> SIGIR
"Test of Time" Award</a> for the most influential SIGIR paper of
2002-2004.

<p>

Dr. Cohen's research interests include information integration and
machine learning, particularly information extraction, text
categorization and learning from large datasets.  He has a
long-standing interest in statistical relational learning and learning
models, or learning from data, that display non-trivial structure.
He holds seven
patents related to learning, discovery, information retrieval, and
data integration, and is the author of more than 200 publications.

<!-- <h3 class="sec"><a name="cv">Curriculum vita</cv></h3 class="sec">

<ul>
<li><a href="cv.pdf">My c.v. in PDF.</a>
</ul>

-->

<h3 class="sec"><a name="announce"></a>Announcements and FAQs</h3>

<ul>

<li><b>I have moved to Google.</b> After the spring 2018 semester ends,
I will move from CMU to Google.  I will be leading a new research
group in AI/ML that will be located in Pittsburgh in Google's Bakery
Square location.  And in case you're wondering - yes, we will be hiring!

<p><a href="http://www.cs.cmu.edu/~mgormley/">Matt Gormley</a> is the
new Director of the Undergraduate Minor in ML.  The new co-Directors
of the MS in ML Program will
be <a href="http://www.cs.cmu.edu/~ninamf/">Nina Balcan</a>
and <a href="http://www.cs.cmu.edu/~rsalakhu/">Ruslan
Salakhutdinov</a>.

<p>

<li><b>I'll be an invited speaker at ILP-2018</b> - that's
the <a href="http://ilp2018.unife.it/">28th International Conference
on Inductive Logic Programming</a> on September 2nd - 4th 2018, in
Ferrara, Italy.

<p>

<li><b>I'll be an invited speaker at KR-2018</b> - that's
the <a href="http://reasoning.eas.asu.edu/kr2018/">16th International
Conference on Principles of Knowledge Representation and Reasoning</a>
to be held in Tempe, Arizona (USA) on October 30-November 2, 2018.

<p>

<li><b>Can I visit CMU and work with you, or can I apply to CMU and
work with you as a grad student?</b>  I will continue to advise my
current students as needed through May 2019, but I will not be taking
any new students or hosting any visitors.

<p>


<li><b>Can I take 10-605 or 10-805 this fall?</b> Yes: 10-605/10-805
  will be taught in fall 2018
  by <a href="http://www.cs.cmu.edu/~bapoczos/">Barnabas Poczos</a>.


<p>

<li><b>What's the difference between 10-601 and ...?</b>
If you're having trouble with the MLD's growing menu of intro ML courses, here's
<a href="https://docs.google.com/document/d/17IP9WLWAE7h6ShEF4CHQFQNFL-u2tJYdgw3wMLK7Xig/edit?usp=sharing">a
draft of a document that explains the differences.</a>
If you're not sure if you're qualified,
the <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Spring_2016#Prerequisites">prereqs
for the course</a> are listed on the course home page, and we're
fairly strict about enforcing them for undergrads. Grad students
should have equivalent experience: good programming skills - the
equivalent of a one-semester college course - and some mathematical
maturity, including prior exposure to calculus, probability and linear
algebra.  If you're not sure about your background there is
a <a href="http://www.cs.cmu.edu/~wcohen/10-601/self-assessment/Intro_ML_Self_Evaluation.pdf">self-assessment</a>
test you can take.
</ul>

<!-- 

<h3 class="sec"><a name="proj">Projects</a></h3 class="sec">

Projects I'm currently involved with include:
<ul>

<li><a href="http://curtis.ml.cmu.edu/gnat/">GNAT is an automatic KB
construction toolkit</a> that has been used to build KBs for several
different domains,
including <a href="http://curtis.ml.cmu.edu/gnat/biomed">consumer
health information</a>
and <a href="http://curtis.ml.cmu.edu/gnat/software">software</a>.

<li><a href="http://rtw.ml.cmu.edu/rtw/">NELL</a> is a web-scale
information extraction system.

</ul>

-->

<!-- <li><a href="http://sites.google.com/site/simstudentprojectweb/">SimStudent</a>, a project that adds learning-by-demonstration to <a href="http://ctat.pact.cs.cmu.edu/">CTAT</a>. -->

<!-- <li><a href="querendipity/">Querendipity</a>, an adaptive personal information management system for biologists. -->

<!-- 
<li><a href="http://boowa.com">SEAL</a>, a Google-Sets-like bootstrapping tool written by my former student, <a
href="http://rcwang.com">Richard Wang</a>. -->

<!--

<li><a href="http://murphylab.web.cmu.edu/services/SLIF2/">SLIF</a>, a system that analyzes the text and images
in online journal articles to find information about the subcellular localization of proteins. -->

<!--
<li><a href="http://teamcohen.github.com/MinorThird/">Minorthird</a>,
an open-source Java package of information extraction software. (Note: we've
migrated the code now from SourceForge to GitHub.)
-->

<h3 class="sec"><a name="sw">Software and demos</a></h3>

<!-- 
<b>Demos:</b> 
<ul>


<li>
Measure twice, cut once - <a
href="http://www.cs.cmu.edu/~vitor/">Vitor</a> and <a
href="http://www.cs.cmu.edu/rbalasub">Ramnath</a> have developed a <a
href="http://www.cs.cmu.edu/~vitor/cutonce/cutOnce.html">Thunderbird
plugin</a> that implements <a
href="http://www.cs.cmu.edu/~wcohen/postscript/ecir2008.pdf">recipient
recommendation</a> and <a
href="http://www.cs.cmu.edu/~wcohen/postscript/sdm-2007-leak.pdf">leak
detection</a> for email.  It modifies Thunderbird by adding an
additional pane that pops up after you send a message, giving you one
final chance to fix any errors in your recipient list.  There's a
brief <a href="cutonce.pdf">writeup on how to use it,</a> but it's
pretty self-explanatory: just download it, open Thunderbird, and go to
the tools->addon menu to install.  After you've installed it, you
train by opening your folder of "Sent" mail and pressing the "train"
button.  (This took about an hour for my 9000+ old messages.)

<li>
<a href="http://www.cs.cmu.edu/~nmramesh/">Ramesh
Nallapati</a> has put together two nice demos of his <a
href="http://www.cs.cmu.edu/~wcohen/postscript/topic-tomography-submitted.pdf">multiscale topic tomography</a> topic-modeling technique, one
for articles from <a
href="http://www.cs.cmu.edu/~nmramesh/science_demo/multiscale_home.html">Science</a>,
and one with <a
href="http://www.cs.cmu.edu/~nmramesh/cancer_demo/multiscale_home.html">cancer-related
articles from PubMed</a>.

<li>
Here are two movies that demo SimStudent, a programming-by-demonstration
system for constructing cognitive tutors, built by <a href="http://www.cs.cmu.edu/~mazda/">Noboru Matsuda</a>.
  <ul>
    <li><a href="http://www.cs.cmu.edu/~mazda/CTAT/Video/Interactive/2x+3_5.mov">Interactive mode</a> (solves problems proactively, as way of posing queries)</li> 
    <li><a href="http://www.cs.cmu.edu/~mazda/CTAT/Video/Non-interactive/3x_9.mov">Non-interactive mode</a></li>
  </ul>

</ul>

-->


<ul>
<li><a href="https://github.com/TeamCohen/TensorLog/wiki">TensorLog</a> is
a probabilistic first-order logic which is fully differentiable.
<li><a href="https://github.com/TeamCohen/ProPPR/wiki">ProPPR</a> is an older
"locally groundable" probabilistic first-order
logic. 
<li><a href="https://github.com/TeamCohen/GuineaPig">Guinea Pig</a> is
a pure Python workflow language for Hadoop.

<p>


<li>Bhuwan Dhingra is
distributing <a href="https://github.com/bdhingra/ga-reader">an
updated version of the Gated Attention Reader</a> via GitHub.  As of
Dec 2016 the GA Reader is obtaining state-of-the-art results on
several of the standard benchmarks for answering cloze questions.

<li>Here is <a href="http://www.cs.cmu.edu/afs/cs/Web/People/dmovshov/software.html">a comment-completion Plugin for Eclipse</a>, from Dana Movshovitz-Attias.
<li><a href="https://github.com/rbalasub/jigsaw.git">Ramnath Balasubramanyan's BlockLDA</a> code, as well as some of the other algorithms from his thesis, is available on GitHub.
<li>Code for <a href="http://www.cs.cmu.edu/~nlao/code/2010.pra.gz">Ni
Lao's PRA method</a> (described in
our <a href="http://www.cs.cmu.edu/~wcohen/postscript/ecml-2010-ni.pdf">ECML
paper</a>) is available.
<li>
<a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>'s home page contains
<ul>
<li>the <a href="http://www.cs.cmu.edu/~frank/code/icml2010-code.zip">code</a>
for power iteration clustering (the algorithm described in our
ICML-2010 paper) as well as
the <a href="http://www.cs.cmu.edu/~frank/data/icml2010-data.zip">datasets</a>
we used in the experiments.
<li>the <a href="http://www.cs.cmu.edu/~frank/code/asonam2010-code.zip">code</a>
for MultiRandomWalk (the semi-supervised learning algorithm described in our
ASONAM-2010 paper) as well as
the <a href="http://www.cs.cmu.edu/~frank/data/anonam2010-data.zip">datasets</a>
we used in those experiments.
</ul>

<p>

<li><a href="http://minorthird.sourceforge.net">Minorthird</a> is an
open-source Java package of information extraction and text
classification learning tools.  This package is stable but not being actively maintained.
  <ul><li>
  There is also a standalone tool, built on Minorthird, for
annotating biomedical text.  This is particularly aimed at annotating
figure captions but might be useful for other text as well.  The <a
href="slifTextComponent-v1.0.jar">jar file</a> for this is rather large
(17M), as it includes a Minorthird jar.  There is <a
href="SlifTextComponent.html">documentation available</a> for this,
and some <a href="captions.tgz">sample data</a>.

  <li>
My former student Vitor Carvalho distributes the poetically named <a
href="http://www.cs.cmu.edu/~vitor/codeAndData.html">Jangada</a> and
<a href="http://www.cs.cmu.edu/~vitor/codeAndData.html">Ciranda</a>,
which are also standalone apps built on top of Minorthird, to analyze
email messages.
  </ul>

<li>
<a href="http://secondstring.sourceforge.net">SecondString</a> is
another open-source Java package, of approximate string matching
techniques.
  <ul><li>SecondString includes a jar for part of an ancient version 
  of Minorthird.  For those that are interested in <a href="radar.tgz">the source behind 
  the mysterious cls.jar</a>, here it is.
  </ul>

<!---

<li><a href="slipper/">SLIPPER</a> and <a href="whirl/">WHIRL</a> are
now being distributed via Rutgers University.  They are free for research
purposes.

--->

<li><a href="slipper-linux.tgz/">SLIPPER</a> is an old old
rule-learning system Yoram Singer and I developed.  This code is
provided with absolutely no warranty, promise of support, or really,
any expectation that it will keep working.  You are totally on your
own with this one, friend.  

<li>WHIRL is another old system I wrote.  Currently, I am not
distributing it, but ask me if you're interested in reviving the
source code.

<li>To get a copy of RIPPER, please send mail to my evil twin brother,
wcohen -AT- gmail.com.  As an alternative to that ancient code: I haven't used it myself, but
I've heard good things about
J-RIP, a Ripper clone written for WEKA.
</ul>

<h3 class="sec"><a name="data">Datasets</a></h3>

The following datasets are available for anyone to use for research
purposes:
<ul>

<li>Zhilin Yang is
distributing <a href="http://kimi.ml.cmu.edu/qa_ssl/">the data from our
ACL-2017 paper on semi-supervised QA</a>.

<li>Lidong Bing has
 distributed <a href="http://www.cs.cmu.edu/~lbing/#Datasets">two
 datasets from our joint work</a>: the data used in our EMNLP 2015
 paper, Improving Distant Supervision for Information Extraction Using
 Label Propagation Through Lists, and also the dataset used in our AAAI
 2016 paper, Distant IE by Bootstrapping Using Lists and Document
 Structure.  The <a href="http://curtis.ml.cmu.edu/gnat/biomed">data
 extracted by this system can also be browsed</a>.


<li>Ni Lao has distributed the labeled data from our EMNLP 2010 paper,
Random Walk Inference and Learning in A Large Scale Knowledge Base,
both <a href="http://www.cs.cmu.edu/~nlao/data/publish.amt.labels.tar.gz">Turker-labeled
data</a>
and <a href="http://www.cs.cmu.edu/~nlao/data/publish.distant.supervision.tar.gz">NELL
pseudo labels</a>.

<li><a href="http://rtw.ml.cmu.edu/wk/coordterm/syntactic/">Coordinate
terms extracted from a MALT-parsed corpus with 230B sentences</a>,
produced by Malcolm Greaves. (Corpus is ClueWeb 2009, Wikipedia from
November 2011, Project Gutenberg, and Citeseer.)

<li><a href="CrowdComp_MTurkData.tar.gz">Data sets</a> for my paper
"Crowdsourced Comprehension: Predicting Prerequisite Structure in
Wikipedia" with Partha Talukdar from BEA-2012.

<li><a href="http://rtw.ml.cmu.edu/wk/WebSets/wsdm_2012_online/index.html">Collections
of HTML Tables, hyponyms, as well as extracted entity clusters and MLT
evaluations</a>, all associated with
<a href="http://www.cs.cmu.edu/afs/cs/Web/People/bbd/">Bhavana
Dalvi</a>'s paper
on <a href="postscript/wsdm-2012-bdd.pdf">WebSets</a> from WSDM-2012.

<li>The <a href="http://www.cs.cmu.edu/~frank/data/icml2010-data.zip">network
datasets</a> used in the experiments of our ICML-2010 paper
are on <a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>'s home page.

<li>
<a href="all-bibdata.tgz">100,000+ bibliography entries</a>, in the original BibTeX format, converted to an EndNote-like format, and in a featurized format, for experiments with matching (60M).

<li>
<a href="http://yeast.ml.cmu.edu/nies/data/icwsm_gene_paper_author_firstAuthorCitations.1950_2007.ghirl.zip">A 56k-node, 200k-edge graph containing data from SGD and PubMed</a>, used in Querendipity.

<li><a href="http://www.cs.cmu.edu/~vitor/codeAndData.html">617
messages from 20 Newsgroups, annotated for reply bodies and
signatures</a>, prepared by my former student <a
href="http://www.cs.cmu.edu/~vitor">Vitor Carvalho</a>

<li><a href="http://www.cs.cmu.edu/~einat/datasets.html">
Two subsets of the Enron data, annotated with person names</a>,
prepared by my student <a href="http://www.cs.cmu.edu/~einat">Einat
Minkov</a>.

<li><a href="http://www.cs.cmu.edu/~enron">Enron email dataset</a>
(400Mb, once you get there) contains 800,000+ emails from 150+ users
organized into 4700+ folders.

<li><a href="doj-email.xls">Some more email data</a>: about two
thousand messages released to the public as part of the ongoing <a
href="http://en.wikipedia.org/wiki/Bush_White_House_e-mail_controversy">investigation
of US Attorney firings at the Dept of Justice</a>.  This is very
strange data---the original email is released as scanned printouts in
PDF (?!), so most of the text is not available.  There are links to
copies of the PDF, some manually added annotations, and an (apparently
manually-reconstructed) social network graph.  About 1.5Mb (in Excel
format).  From <a
href="http://www.dailykos.com/storyonly/2007/5/21/12120/5682">Mark
Johnson, and a network of volunteers.</a>


<li><a href="repository.tgz">A collection of various extraction datasets
in Minorthird format</a> (6Mb), including about 1000 Enron emails tagged
for person names and temporal expressions.

<li><a href="classify.tar.gz">classify.tar.gz</a> (0.4Mb) contains
nine problems in which the goal is to classify short entity names.
This data was used in <i>Joins that Generalize: Text Classification
Using WHIRL</i> (KDD-98).

<li><a href="ranking-data.tar.gz">ranking.tar.gz</a> (8Mb) contains the
data used for the meta-search experiments in my JAIR paper <a
href="http://www.jair.org/abstracts/cohen99a.html">Learning to Order
Things</a> (with Rob Schapire and Yoram Singer).

<li><a href="match.tar.gz">match.tar.gz</a> (0.7Mb) contains a suite of
<i>labeled</i> entity-name matching and clustering problems
(i.e. problems for which the correct matches/clusters are provided),
in a single consistent format. In most cases WHIRL's performance is
given as a benchmark. (These are also distributed in the <a
href="http://www.cs.utexas.edu/users/ml/riddle/data.html">RIDDLE
Repository</a>.  Extraction-oriented versions of some of this data are
available on the <a
href="http://www.isi.edu/info-agents/RISE/repository.html">RISE
Repository</a>. (I.e., represented as a problem of extracting data from
a website, rather than matching two datasets).)

<li><a href="whirl-bench.tgz">whirl-bench.tgz</a> (1.1Mb) contains some
more WHIRL-format entity name matching problems.
</ul>

<h3 class="sec"><a name="talks">Talks and presentations</a></h3>

<p>
<ul>

<li><a href="declarative-learning-workshop-2018.pptx">An invited talk
given at the Third International Workshop on Declarative Learning Based
Programming</a> (DeLBP), at AAAI-2018.

<li><a href="snl-2017.pptx">An invited talk given at SNL-2017</a> (the 1st International Workshop on Symbolic-Neural Learning) in July 2017.

<li><a href="wakbc-2016.pptx">An invited talk given at WAKBC-2016</a> in June 2016.

<li>Tutorial on statistical relational learning given at NAACL 2016 with
William Wang (a shorter version of this was also presented at IJCAI 2016):
<ul>
<li><a href="naacl-2016-talk1-final.ppt">Part 1 - overview on logic, probability, MLNs, and probabilistic DDBs</a>
<li><a href="naacl-2016-talk2-final.pptx">Part 2 - ProPPR and applications</a>
<li><a href="naacl-2016-talk3-final.ppt">Part 3 - TensorLog, and other recent and current work</a>
</ul>


<li>Series of three lectures on probabilistic logic programs given at
Singapore Management University in Feb 2016:
 <ul>
   <li><a href="smu-2016-talk1.pptx">Background on logic and probabilistic models</a></li>
   <li><a href="smu-2016-talk2.pptx">Parameter learning and structure learning in ProPPR</a></li>
   <li><a href="smu-2016-talk3.pptx">Joint learning in ProPPR and comparing to neural approaches</a></li>
 </ul>

<li><a href="aaai-ss-2015.ppt">Can KR Represent Real-World Knowledge?</a>, invited talk given March 2015
at the AAAI Spring Symposium on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches

<li><a href="nlu-2014.ppt">Learning to Reason with Extracted Information</a>, keynote talk given March 2014
at Google's Natural Language Understanding Workshop, Zurich, Switzerland.


<li><a href="ilp-2013.ppt">Learning to Construct and Reason with a
Large KB of Extracted Information</a>, invited talk given August 2013
at the Inductive Logic Programming Conference, in Rio de Janeiro,
Brazil.

<li><a href="aaai-fs-2012.ppt">Reasoning With Data Extracted from The Biomedical Literature</a>,
invited talk at a joint session of the AAAI Fall Symposia on Discovery Informatics, and
Information Retrieval and Knowledge Discovery in Biomedical Text.

<li><a href="cikm-2012.ppt">Learning Similarity Relations Based on Random Walks in Graphs</a>,
invited talk at CIKM 2012, October, 2012.
  <ul>
    <li>Earlier version of talk: <a href="mlg-aug-2011.ppt">Learning Relationships Defined by
	Linear Combinations of Constrained Random Walks</a>, invited talk at
      the <a href="http://www.cs.purdue.edu/mlg2011/">9th Workshop on
	Machine Learning and Graphs</a>, San Diego, CA, Aug 2011.
  </ul>

<li><a href="lti-colloq-2012.ppt">Fast Effective Clustering for Graphs and Documents</a>, given at CMU's LTI Colloquium Feb 10, 2012.
 <ul>
  <li>Earlier versions given
  at <a href="FastEffectiveClustering-v2.ppt">Virginia Tech in April
  2010</a> and
  <a href="FastEffectiveClustering.ppt">University of Pennsylvania
  in Feb 2010.</a>
 </ul> 

<li><a href="psc-11-cohen.ppt">Learning to Extract a Broad-Coverage
Knowledge Base from the Web</a>, invited talk at the Symposium on
Data-Intensive Analysis, Analytics, and Informatics, Pittsburgh, PA Apr 2011.

<li><a href="nfais-11-cohen.ppt">Open Information Extraction Methods:
Computers that Learn to Read</a>, invited talk at National Federation
of Advanced Information Services (NFAIS), Philadelphia, PA, Feb 2011.

<li><a href="umd-sep-2010.ppt">Learning Proximity Relations Defined by
Linear Combinations of Constrained Random Walks</a>, given at a
seminar at the University of Maryland in Sep 2010.

<li><a href="block-lda-icml-ws-2010.ppt">Modeling Entity-Entity Links
and Entity-Annotated Text</a>, given at the ICML 2010 Workshop on
Topic Modeling.

<li><a href="MSM-2009.ppt">Predictively Modeling Social Media</a>,
invited talk given at
<a href="http://www.socialgamingplatform.com/msm09/">the 1st International Workshop on Mining Social Media</a>, co-located with 13th Conference of the Spanish Association for Artificial Intelligence (CAEPIA-TTIA 2009). 

<li><a href="IIWeb.ppt">Matching and clustering product descriptions
using learned similarity metrics</a>, invited talk given at
<a href="http://research.ihost.com/iiweb09/index.html">the IJCAI 2009 Workshop on Information Integration on the Web</a>, July 2009. (Powerpoint; 6.7M) 

<li>Open information extraction talks:
   <ul>
    <li><a href="openIE-spain-2009.ppt">Graph-Based Methods for Open Information Extraction</a>, talk given in Nov 2009 at MAVIR in Madrid, Spain.

    <li><a href="openIE-2009.ppt">Graph-Based Methods for Open Information Extraction</a>, earlier version of talk given at Stanford and Google March 2009. 

    <li><a href="nips-graph-ws-2008.ppt">Graph-Based Methods for Open Information Extraction</a>, 
  still earlier version of the same talk given at a 2008 NIPS workshop.
    <li>A <a href="nipsgraphs2008_workshop_skit.mov">QT video of highlights</a> from the workshop talks, including an incisive technical question addressed to me from my colleague <a href="http://www.stat.cmu.edu/~fienberg/">Steve Fienberg</a>.</li>
  </ul>

<li><a href="sigmod-08.ppt">Embodied Cognition and Knowledge:
Integration of Heterogeneous Databases without Common Domains Using
Queries Based on Textual Similarity</a>, talk given for my 10-year
"Test of Time" Award at <a
href="http://www.sigmod08.org/">SIGMOD-2008</a>. (Powerpoint; 11Mb)</li>

<li><a href="linkedData-2008.ppt">Using Machine Learning to Discover
and Understand Structured Data</a>, invited talk given at <a
href="http://www.linkeddataplanet.com">LinkedData
2008</a>. (Powerpoint; 6Mb)</li>

<li><a href="icmla-2007.ppt">Machine Learning for Personal Information
Management</a>, invited talk given at <a
href="http://www.icmla-conference.org/icmla07/icmla07.html">ICMLA-2007</a>. (Powerpoint; 8Mb)</li>

<li><a href="iqis.ppt">A Framework for Learning to Query Heterogeneous Data</a>,
invited talk given at <a href="http://queens.db.toronto.edu/iqis2006/">IQIS 2006</a>. (Powerpoint; 8Mb)</li>

<li><a href="dbirday-06.ppt">On Beyond Hypertext: Searching in Graphs
Containing Documents, Words, and Actual Data</a>, invited talk given
at <a href="http://dbirday2006.rutgers.edu/">DB/IR Day 2006.</a> (Powerpoint; 6Mb)</li>

<li><a href="webdb-talk.ppt">A Century Of Progress On Information
Integration: A Mid-Term Report</a>, an overview of information
integration, focusing modestly on my own work, given as an invited
talk at <a
href="http://webdb2005.uhasselt.be/">WebDB-2005</a>. (Powerpoint;
12Mb)</li>


<p>
<li>Tutorials:
<ul>

  <li><a href="ie-survey.ppt">Information extraction</a> (PowerPoint;
  4.8Mb), aimed at folks somewhat familiar with statistical NLP
  methods.  And thanks to Thierry Poibeau, there's also a version <a
  href="http://www-lipn.univ-paris13.fr/~poibeau/cours/fr_cohen_ie_tutorial.ppt"><i>en francais</i></a> (did I get that right, Thierry?) 
  Two earlier versions of this are also still around, both
  given with Andrew McCallum at recent conferences, <a
  href="kdd2003-tutorial.ppt">KDD-2003</a> (PowerPoint; 6.8Mb) and <a
  href="nips-ie-tutorial.ppt">NIPS-2002</a>.

  <li><a href="text-cat-tutorial.ppt">Text classification</a>
  (PowerPoint; 3Mb), given at a CALD Summer Course.

  <li><a href="collab-filtering-tutorial.ppt">Collaborative
  filtering</a> (PowerPoint; 9.1Mb), given at a DIMACS workshop.

</ul> 


<p>
<li>A mini-course on record linkage and matching:
  <ul>
  <li><a href="Matching-1.ppt">Overview of record linkage methods</a> (PowerPoint; 250kb).
  <li><a href="Matching-2.ppt">Overview of distance metrics for strings</a> (PowerPoint; 530kb).
  <li><a href="Matching-3.ppt">Overview of using HMMs for normalizing
text in record linkage tasks</a> (PowerPoint; 640kb). <br>
  It's not a presentation, but I have also put together a <a
  href="matching/">short annotated bibliography of record linkage and
  matching papers</a>.

  <li>William Hayes has a nice summary of <a href="http://blog.williamhayes.org/2012/07/string-similarity.html">an extended discussion
      of string-matching tools</a> on the BioNLP mailing list (July 2012).
  </ul>

  
<p>
<li>Other technical talks:
<ul>
  <li><a href="ijcai-2005.ppt">A presentation of my IJCAI-2005 results</a>
on "stacked sequential learning", presented in Edinburgh in August, 2005.
  <li><a href="nips-2002.ppt">A presentation of my NIPS-2002 results</a>
on using bootstrapping techniques to improve web page classification,
given at CMU in October 2002. (PowerPoint; 3.2Mb).
  <li><a href="www-2002.pdf">A presentation of my WWW-2002 results</a>
on wrapper learning,
presented in April 2002. (PDF; 170kb).
  <li><a href="whirl-talk.pdf">An overview of experiments with WHIRL.</a> (PDF; 800kb).
</ul>
</ul>


<h3 class="sec"><a name="teach">Teaching</a></h3>

<ul>
<li>Spring 2018: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-405_in_Spring_2018">Undergraduate Level Machine Learning with Large Datasets</a>, 10-405, Mon-Wed 3:30-4:20 in GHC 4307

<li>Fall 2018: 10-605/10-805 will be taught
  by <a href="http://www.cs.cmu.edu/~bapoczos/">Barnabas Poczos</a>
</ul>

Past courses:
<ul>
<li>Fall 2017: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Fall_2017">Machine Learning with Large Datasets, 10-605 and 10-805</a>, Tues-Thurs 1:30-2:50pm, PH 100.
<li>Fall
2016: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Fall_2016">Machine
Learning with Large Datasets, 10-605 and 10-805</a>, Tues-Thurs
1:30-2:50pm, Wean Hall 7500.
<li>Spring 2016: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Spring_2016">Machine Learning 10-601</a>, Mon-Wed 10:30-11:50am, GHC 4401, with Maria-Florina Balcan.
<li>Fall 2015: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Fall_2015">Machine Learning with Large Datasets, 10-605 and 10-805</a>, Tu-Thu 4:30-5:50am in Doherty Hall 2210.
<li>Spring 2015: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2015">Machine Learning with Large Datasets, 10-605 and 10-805</a>, Tu-Thu 10:30-11:50am in BH A51
<li>Fall 2014: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Fall_2014">10-601 Machine Learning</a>, Tu-Thu 1:30-2:50, Wean 7500
<li>Spring 2014: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2014">10-605 Machine Learning with Large Datasets</a>, Mon-Wed 1:30-2:50, Doherty Hall 1112
<li>Fall 2013: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Fall_2013">10-601 Machine Learning</a>, Mon-Wed 4:30-5:50, Doherty Hall 2315 (with Eric Xing).
<li>Spring 2013: <a href="http://malt.ml.cmu.edu/mw/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2013">Machine Learning with Large Datasets</a>, Mon-Wed 1:30-2:50, 4307 GHC 
<li>Fall 2012: <a
href="http://malt.ml.cmu.edu/mw/index.php/Social_Media_Analysis_10-802_in_Fall_2012">ML 10-802 and LTI 11-772 (Analysis of Social Media)</a>, 10:30-11:50am Tues &amp; Thurs, 4303 Gates Building.
<li>Fall 2012: <a
   href="http://www.cs.cmu.edu/~journalclub">10-915, the MLD Journal Club</a>, 12-1:20pm Tue &amp; Thu, 4101 Gates Building (with Roy Maxion).
<li>Spring 2012: <a href="http://malt.ml.cmu.edu/mw/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2012">Machine Learning with Large Datasets</a>, Tues-Thurs 1:30-2:50pm, NSH 1305
<li>Fall 2011: <a
href="http://malt.ml.cmu.edu/mw/index.php/Structured_Prediction_10-710_in_Fall_2011">Structured
Prediction for Language and Other Discrete Data (SPLODD-2011)</a>, ML
10-710 and LTI 11-763, Tues-Thursday 3:00-4:20 in Gates-Hillman 4211.
This is co-taught by myself and Noah Smith, and will include some
subjects from <a
href="http://malt.ml.cmu.edu/mw/index.php/Information_Extraction_10-707_in_Fall_2010">Information
Extraction</a> and some from <a
href="http://www.cs.cmu.edu/~nasmith/LS2">Language and Stats 2</a>.  A
machine learning course (10-701 or consent of the instructors) is a
prereq; we don't recommend that you take the course if you have
already taken Information Extraction or Language and Stats 2.
<li>Spring 2011: <a
href="http://malt.ml.cmu.edu/mw/index.php/Social_Media_Analysis_10-802_in_Spring_2011">ML 10-802 and LTI 11-772 (Analysis of Social Media)</a>, 10:30-11:50am Tues &amp; Thurs, 4303 Gates Building.
<li>Spring 2011: <a
href="https://docs.google.com/document/pub?id=1-XEqDHRCiikdPj-LWiYSPjxvxYyFoJ0lNPbt6ym4VZw">10-915, the MLD Journal Club</a>, 3-4pm Mon &amp; Wed, 4101 Gates Building.
<li>Fall 2010: <a
href="http://malt.ml.cmu.edu/mw/index.php/Information_Extraction_10-707_in_Fall_2010">10-707
(Information Extraction - cross-listed in LTI as 11-748)</a>,
1:30-2:50pm Mon &amp; Wed, Gates 4101.  The first class is 9/8, the
Wed after Labor Day, to allow incoming students time to attend the IC
courses.
<li>Spring 2010: <a
href="http://malt.ml.cmu.edu/mw/index.php/Social_Media_Analysis_10-802_in_Spring_2010">10-802 (Analysis of Social Media)</a>.
<li>Fall 2009: <a
href="http://malt.ml.cmu.edu/mw/index.php/Information_Extraction_10-707_in_Fall_2009">10-707
(Information Extraction)</a>, 1:30-2:50pm Mon &amp; Wed, 5222 Gates
Building.  
<li>Spring 2008: <a
href="http://www.cs.cmu.edu/~tom/10601">10-601 (Machine Learning)</a>
with <a href="http://www.cs.cmu.edu/~tom">Tom Mitchell</a>, 3-4:30
Mon &amp; Wed in Wean Hall 5409.  
<li>Fall 2007: <a href="10-802/fixed/Main_Page.html">Analysis of Social
Media</a>, Machine Learning 10-802 and LTI 11-772, with Natalie Glance
(of Google Pittsburgh) - a brand-new seminar course.  4:30-6:30
Tuesdays in Wean Hall 4623.
  <ul><li>Note: This site is the shattered remains of a once-beautiful wiki,
  created by the students of 10-802, generously hosted for free by
  <a href="http://scribblewiki.com">ScribbleWiki</a>, tragically lost (due to
  a combination of RAID drive failures and low-bidder backup schemes),
  and then largely recovered using 
  <a href="http://warrick.cs.odu.edu">Warrick</a>
  from various internet caches and archives.
  </ul>
<li>Fall 2007: <a href="http://www.compbio.cmu.edu/Jclub/">Current Topics
in Computational Biology (Journal Club)</a>, 02-701. (<a
href="02-701/">Announcements</a>). Thursdays from 4:00-5:00 in 411
Mellon Institute (after Cell &amp; Systems Modeling).
<li>Spring 2007: <a href="10-707">Information Extraction</a>, Machine
Learning 10-707 and LTI 11-748 - back by popular demand for the first time since 2004!
<li>Fall 2006: <a href="http://www.compbio.cmu.edu/Jclub/">Current Topics in Computational Biology (Journal Club)</a>, 02-701.
(<a href="02-701/">Announcements</a>)
<li>Spring 2006: <a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-21/www/index.html">Read the Web</a>, CALD 10-709.
<li>June 21, 23, 25, 2005: A mini-course on Minorthird. Materials are below.
<ul>
<li><a href="day1.tgz">Slides, notes, and sample files from first
day's lecture</a>.
<li><a href="day2.tgz">Slides, notes, and sample files from second
day's lecture</a>.
<li><a href="day3.ppt">Powerpoint slides from third
day's lecture</a>.
<li><a href="minorthird.jar">Jar file for minorThird</a>, if you
only want to run the code, not compile it or read it.
The installation process here is:
  <ol>
  <li>Install Java 1.4 or higher (actually, JRE is all you need).
  <li>Download the <a href="minorthird.jar">jar for minorThird</a>
    and stick it in some directory.
  <li>Optionally, download the <a href="repository.tgz">sample data
  repository</a> and unpack it into the same directory.
  <li>Change to that same directory and
  then run Minorthird with the command <br>
  <code>java -Xmx500M -jar minorthird.jar</code>

  <p>
  What will pop up will be a small launch pad that can be used to 
  start any of the UI programs.  You can also start a particular
  main by specifying minorthird.jar as your classpath, for 
  instance: <br>

  <code>java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help</code>
  </ol>

<li>If you want to do a real install, here's the <a
href="http://minorthird.sourceforge.net">home page on SourceForge</a>, and
a document on <a href="10-707/QUICKSTART.txt">how to do a CVS
install of Minorthird</a>.
</ul>

<li>Spring 2004: <a href="10-707/index-2004.html">"Learning to Turn Words into Data:
Machine Learning Approaches to Information Extraction and Information Integration"</a>, CALD 10-707 and LTI 11-748.
</ul>


<h3 class="sec"><a name="pubs">Publications</a></h3>

<ul>
<li> Here's an <a href="pubs-atom.xml">RSS feed of my papers</a>. (Note: the feed I had created with Dapper seems spam-infested now.)
Here's a pointer to <a href="http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/c/Cohen:William_W=.html">my DBLP page</a>.

<li><i>A Computer Scientist's Guide To Biology</i> is no longer
available from this web page, but is now <a
href="http://www.springer.com/west/home/generic/search/results?SGWID=4-40109-22-173702304-0">available from Springer</a>.  Here is <a
href="GuideToBiology-sampleChapter-release1.4.pdf">the TOC,
introduction, index, and a sample chapter</a>, from a late draft of
the book; and also <a
href="GuideToBiology-pictures-color-release1.5.ppt">all the figures
from the book in PowerPoint</a> and <a
href="GuideToBiology-pictures-color-release1.5.pdf">all the figures in
PDF</a>.  (The figures are a little prettier than the ones in the
final book, which is black and white, not color).


<li><a
href="http://shop.omnipress.com/index.asp?PageAction=VIEWPROD&amp;ProdID=33">ICML
2006 Proceedings</a> are available in print, for the true aficionado
of fine learning-related research.  It's well worth the money for the
cover art alone (of course, all the papers are also available <a
href="http://www.autonlab.org/icml2006/technical/accepted.html">on-line
for free</a>).

<li><a href="pubs-s.html">Recent and selected publications</a>.  These
are some representative publications for which on-line copies can be
distributed.

<li><a href="pubs.html">All publications</a>. Here is a more-or-less
complete chronological list of my publications.  The bibliography
includes pointers to on-line versions when I can provide them, but
unfortunately copyright restrictions don't allow me to make all of my
publications available on-line.  Of course, reprints are always
available from me on request.

<li>Publications by topic:<img 
src="cover.png" height=200 width=150 align="right"/><img height=200 src="icml-cover.png" align="right"/>
  <ul>
   <li><a href="pubs-m.html">Matching/Data Integration</a>
   <li><a href="pubs-t.html">Text categorization</a>
   <li><a href="pubs-x.html">Information Extraction</a>
   <li><a href="pubs-r.html">Rule Learning</a>
   <li><a href="pubs-c.html">Collaborative Filtering</a>
   <li><a href="pubs-a.html">Applications</a>
   <li><a href="pubs-f.html">Formal Results</a>
   <li><a href="pubs-i.html">Inductive Logic Programming</a>
   <li><a href="pubs-e.html">Explanation-Based Learning</a>
  </ul>
</ul>

Recent papers I'm keeping in HTML or PDF (which requires <a
href="http://www.adobe.com/prodindex/acrobat/readstep.html">Adobe
Acrobat Reader</a> to view).  Older papers are mostly in Postscript.
For Windows, I use the <a
href="http://www.cs.wisc.edu/~ghost/gsview/">GSView</a> reader for
postscript.  Most of these papers are viewable in several formats in
<a href="http://www.researchindex.com">ResearchIndex</a>.

<h3 class="sec"><a name="buddies">Students and other colleagues</a></h3>

<!-- Other: -->

<ul>
<li>
<a href="http://www.cs.cmu.edu/~krivard/">Katie Rivard Mazaitis</a>, research programmer/analyst
</ul>

<!-- Students: -->
<ul>
<li><a href="https://sites.google.com/site/rosecatherinek/home">Rose Catherine Kanjirathinkal</a>, LTI PhD student.
<li><a href="http://kimiyoung.github.io/">Zhilin Yang</a>, LTI PhD student, co-advised with Ruslan Salakhutdinov.
<li><a href="http://www.cs.cmu.edu/~bdhingra/">Bhuwan Dhingra</a>, LTI PhD student, co-advised with Ruslan Salakhutdinov.
<li><a href="http://www.cs.cmu.edu/~yifengt/">Yifeng Tao</a>, CMU Comp Bio PhD student, co-supervised with Xinghua Lu.
<li><a href="http://www.cs.cmu.edu/~fanyang1/">Fan Yang</a>, MLD PhD student.
<li>Daniel Spokoyny, LTI PhD student, co-supervised with Taylor Berg-Kirkpatrick.
<p>
<li>Haitian Sun, MLD MS student.
<li><a href="https://andy-jqa.github.io/">Qiao Jin</a>, School of Medicine, Tsinghua University 
</ul>

Alumni:

<ul>
<li><a href="http://www.cs.cmu.edu/~yww/">William Yang Wang</a> (former LTI PhD student, now at UCSB).
<li><a href="http://www.cs.cmu.edu/afs/cs/Web/People/dmovshov/">Dana Movshovitz-Attias</a> (former CSD PhD student,
 now at Google).
<li><a href="http://www.cs.cmu.edu/afs/cs/Web/People/bbd/">Bhavana Dalvi Mishra</a> (former LTI PhD student,
co-advised with <a href="http://www.cs.cmu.edu/~callan/">Jamie Callan</a>, now at AI2)
<li><a href="http://www.cs.cmu.edu/~taey/">Tae Yano</a>, (former LTI
PhD student, co-advised
with <a href="http://www.cs.cmu.edu/~nasmith/">Noah Smith</a>, now at Microsoft)
<li><a href="http://www.cs.cmu.edu/~nli1">Nan Li</a>, (former CSD PhD
student, co-advised
with <a href="http://pact.cs.cmu.edu/koedinger.html">Ken
Koedinger</a>, now at D. E. Shaw)
<li><a href="http://www.cs.cmu.edu/~rbalasub/">Ramnath Balasubramanyan</a>, (LTI PhD student, now at Twitter)
<li><a href="http://www.cs.cmu.edu/~maheshj/">Mahesh Joshi</a>, (former LTI PhD student,
co-advised with <a href="http://www.cs.cmu.edu/~cprose/">Carolyn Ros&eacute;</a>, now at eBay)
<li><a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>, (former LTI PhD student, now at Airbnb)
<li><a href="http://www.cs.cmu.edu/~nlao/">Ni Lao</a> (former LTI PhD student, now at Google)
<li><a href="http://www.cs.cmu.edu/~rcwang">Richard C. Wang</a>,
(former LTI PhD student co-advised with <a
href="http://www.cs.cmu.edu/~ref/">Bob Frederking</a>, now at Baidu).
<li><a href="http://www.cs.cmu.edu/~aarnold/">Andrew Arnold</a>
(former MLD PhD student, now at Point 72 Asset Management)
<li><a href="http://www.cs.cmu.edu/~einat">Einat Minkov</a>
(former LTI PhD student, now at Haifa University)
<li><a href="http://www.cs.cmu.edu/~vitor">Vitor Rocha de Carvalho</a> (former LTI PhD student, now at Qualcomm)
<li><a href="http://www.cs.cmu.edu/~woomy/">Zhenzhen Kou</a> (former MLD PhD student, now at Google)

<p>

<li>Ezra Winston, MLD Master's student.
<li>Lanxio (Karen) Xu, MLD Master's student.
<li>Yuxing Zhang, MLD Master's student.
<li>Jakob Bauer, MLD 5th-year Master's student.
<li>Kavya Srinet, MCDS Master's student.
<li>Bhawna Juneja, MCDS Master's student.
<li>Tom Shen, CMU CSD undergrad
<li>Yu-Hsin Allen Kuo, LTI MLT student, formerly co-advised with <a href="http://www.cs.cmu.edu/~nmiskov/Natasas_website/Home.html">Natasa Miskov-Zivanov</a>
<li>Rahul Goutam, former LTI MLT student, co-advised with <a href="http://www.cs.cmu.edu/~nmiskov/Natasas_website/Home.html">Natasa Miskov-Zivanov</a>
<li><a href="https://plus.google.com/102262489142071513958/posts">Malcolm Greaves</a>, former CSD master's student.
<li><a href="http://www.cs.cmu.edu/~eairoldi">Edoardo Airoldi</a>
(former MLD/Stats PhD student, co-advised with <a href="http://www.stat.cmu.edu/~fienberg/">Steve Fienberg</a>)
<li><a href="http://www.csie.ncu.edu.tw/~chia/">Ja-Hui Chang</a>
(visiting faculty from National Central University, Taiwan, 2007-2008)
<li>Wen Haw Chong (PhD student at Singapore Management University,
visited CMU in 2015-2016).
<li><a href="http://www2.sis.smu.edu.sg/students/phd/class10/10_hoang_tuananh.asp">Tuan
Anh Hoang</a>, (PhD student at Singapore Management University,
visited CMU for 2012-2013 academic year in my group).
<li><a href="http://freddychua.com/">Freddy
Chong Tat Chua</a> (PhD student at Singapore Management University,
visited CMU for the academic year 2011-2012 in my group.)
<li><a href="http://www.optimizelife.com/">Gustavo Lacerda</a>
(former research assistant, co-supervised with Noboru Matsuda and Ken Koedinger, now at UBC)
<li><a href="http://www.cs.cmu.edu/~lbing/">Lidong Bing</a>, former
postdoc, now at Tencent.
<li><a href="https://sites.google.com/site/rameshnallapati/">Ramesh Nallapati</a>
(former postdoc, co-supervised with <a
href="http://www.cs.cmu.edu/~lafferty/">John Lafferty</a>, now at IBM Watson)
<li><a href="http://www.cs.cmu.edu/~mazda">Noboru Matsuda</a>
(former postdoc, co-supervised with <a href="http://pact.cs.cmu.edu/koedinger.html">Ken Koedinger</a>,
now System Scientist in CMU's HCII)
<li><a href="http://www.cs.cmu.edu/~pradeepr">Pradeep Ravikumar</a>
(former MLD PhD student, co-advised with <a href="http://www.stat.cmu.edu/~fienberg/">Steve Fienberg</a>)

<!-- External members -->

<p>

<li>I have been an external committee member for the PhD theses of 
<ul>
<li><a href="http://mcsp.wartburg.edu/zelle/">John Zelle</a> (degree
from U Texas)
<li><a href="http://research.microsoft.com/en-us/um/people/mbilenko/">Misha
Bilenko</a> (from U Texas)
<li><a href="http://www-users.cs.york.ac.uk/~kudenko/">Daniel Kudenko</a>
(Rutgers)
<li>Chumki Basu (Rutgers)
<li>Ananlada Chotimongkol (CMU)
<li>Wei-Hao Lin (CMU)
<li>Cenk Gazen (CMU)
<li>David Nadeau (U Ottawa)
<li><a href="http://cs.cmu.edu/~htong">Hanghang Tong</a> (CMU)
<li>Ben van Durme (Rochester)
<li><a href="http://www.cis.upenn.edu/~partha/">Partha Talukdar</a> (U Penn)
<li><a href="http://www.cs.cmu.edu/~acarlson/">Andy Carlson</a> (CMU)
<li><a href="http://www.cs.cmu.edu/~hyifen/">Yifen Huang</a> (CMU)
<li><a href="http://www.cs.pitt.edu/~swapna/Main.html">Swapna Sundaran</a> (U
Pitt)
<li><a
href="http://www.cs.cmu.edu/~mheilman/">Michael Heilman</a> (CMU) 
<li><a
href="http://www.cs.cmu.edu/~jelsas/">Jon Elsas</a> (CMU) 
<li><a href="http://www.cs.cmu.edu/~dipanjan/Home.html">Dipanjan Das</a> (CMU)
<li><a href="http://www.cs.cmu.edu/~fanguo/">Fan Guo</a> (CMU) 
<li><a href="http://www.andrew.cmu.edu/user/jdiesner/">Jana Diesner</a> (CMU)
<li><a href="http://freddychua.com/">Freddy Chong Tat Chua</a> (Singapore Management University).
<li><a href="https://sites.google.com/site/hoqirong/">Qirong Ho</a> (CMU)
<li>Danai Koutra (CMU)
<li>Reyyan Yeniterzi (CMU)
<li>YiChi Wang (CMU)
<li>Steven Gardiner (CMU) 
<li>Jay Pujara (Univ Maryland)
<li>Derry Wijaya (CMU)
<li>Lingjia Deng (Univ of Pittsburgh)
<li>Chenyan Xiong (CMU)
</ul>
I have also been an external committee member for the Master's theses of 
<a href="http://www.cs.cmu.edu/~mehrbod/">Mehrbod Sharifi</a> (CMU) and
Weam Abu-Zaki (CMU).  
<p>
I am currently an external committee member for Tiancheng Zhao,
Shashank Srivastava, Pradeep Dasigi, and Abulhair Saparov.

<!-- Other: -->


<h3 class="sec"><a name="contact">Contact Info</a></h3>

<p>
William W. Cohen<br>
Professor, Machine Learning Department<br>
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213<br>
8217 Gates Hillman Complex<br>
(shipping address: 6105 Gates Hillman Complex)<br>
voice: 412-268-7664 / fax: 412-268-2205 <br>
Assistant: Dorothy Holland-Minkley, GHC 8001, dfh@andrew.cmu.edu<br>

<!-- <p><a href="http://people.cs.cmu.edu/person/49142.html">Official CMU Contact Info</a> -->

<p>My preferred email address is: <font color=blue>wcohen AT cs DOT cmu DOT edu</font>


<h3 class="sec"><a name="misc">Other Stuff</a></h3>

<p>Obscure fact: In 2016, I was listed as one of the top 50 Most Influential
Scholars in Machine Learning according
to <a href="https://aminer.org/mostinfluentialscholar/ml">AMiner's
listing</a>, using a ranking which, they helpfully point out, is
"automatically determined by a computer algorithm" (no electoral
college!).  I was also listed as one of
the <a href="http://academic.research.microsoft.com/RankList?entitytype=2&amp;topDomainID=2&amp;subDomainID=0&amp;last=0&amp;start=401&amp;end=500">500
most-cited authors in computer science</a> (as of Sept 2014).


<p>For those many friends whose research I have built on, be warned.
My full name, "William Weston Cohen", is an anagram of the phrase "I
now cite shallow men".  (From <a
href="http://iew3.technion.ac.il/~sarac/">Sara Cohen</a> - no
relation! - comes this warning: "Women's rights activists would
probably request you to use the following anagram instead: 'I shall
now cite women'".)

<p>I am often praised for my highly artistic and functional web site
designs.  An example is the site for <a
href="http://www.scindexing.com">SC Indexing, a professional book
indexer</a>.  However, I accept few clients - this one happens to be
<a href="http://www.scindexing.com">my wife</a>.

<p>Through my advisor, Alex Borgida, I can trace my <a
href="lineage.html">"academic lineage"</a> back to luminaries like
Leibniz, Newton and Alfred Whitehead.

<p>In 2014 I unearthed a strange relic from the past, a sort of
game/website I wrote for my son Charlie back in...I'm gonna say, 1994,
1995, something like that, and I sort of made it work again, although
JavaScript has changed a bit in the last couple of decades. (The main
bugs have to do with sound-file presentation - in 1994 these were
played by mime-file configured helper programs, not natively by the
browser, so now you need to hit 'back' about 1/2 the time after a
sound plays.)  <a href="dict/stuff">Historically interesting? You
decide!</a>

<p>When I'm not working my day job, I avoid productive behavior by <a
href="music/">playing music</a>.

<p><a href="hp.html">Poetry anyone?</a>

<hr>

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-2090677-1";
urchinTracker();
</script>

</BODY>
</HTML>