<html>
<head>
<title>William W. Cohen</title>
<!-- script type="text/javascript" src="http://shots.snap.com/snap_shots.js?ap=1&key=f189a52ff115e29092c9f9bb3678047a&sb=1&th=orange&cl=0&si=0&po=0&df=0&oi=0&lang=en-us&domain=wcohen.com"></script -->
<link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body bgcolor="white">
<table>
<tr>
<td>
<img align=center src="william-at-whiteboard-small.JPG" height="auto" width="75%" alt="Picture of William Cohen">
</td>
<td>
<h2 class="name">William W. Cohen</h2>
<h3 class="title">Director, Research Engineering, <a href="http://ai.google.com">Google AI</a></h3>
<b>News:</b> I have moved to Google! Starting June 2018, I will be
setting up and leading a new research group in AI/ML that will be
located in Pittsburgh at Google's Bakery Square location.
</td>
</table>
<table><tr><td>
<p>[
<a class="nav" href="#bio">Bio</a> |
<a class="nav" href="#announce">Announcements and FAQs</a> |
<a class="nav" href="#teach">Teaching</a> |
<!-- <a class="nav" href="#proj">Projects</a> | -->
<a class="nav" href="#pubs">Publications</a> (<a class="nav" href="pubs-s.html">recent</a>, <a class="nav" href="pubs.html">all</a>) |
<a class="nav" href="#sw">Software</a> |
<a class="nav" href="#data">Datasets</a> |
<a class="nav" href="#talks">Talks</a> |
<a class="nav" href="#buddies">Students & Colleagues</a> |
<a class="nav" href="http://wcohen.blogspot.com">Blog</a> |
<a class="nav" href="#contact">Contact Info</a> |
<a class="nav" href="#misc">Other Stuff</a>
]
<tr><td>Prospective visitors/students: see <a href="#announce">announcements</a>
</table>
<h3 class="sec"><a name="bio"></a>Biography</h3 class="sec">
William Cohen is a Director of Research & Engineering at Google AI,
and is based in Google's Pittsburgh office. He received his bachelor's
degree in Computer Science from
<a href="http://www.duke.edu">Duke University</a> in 1984, and a PhD
in Computer Science from <a href="http://www.rutgers.edu">Rutgers
University</a> in 1990. From 1990 to 2000 Dr. Cohen worked at
AT&T <a href="http://www.bell-labs.com/">Bell Labs</a> and
later <a href="http://www.research.att.com">AT&T Labs-Research</a>,
and from April 2000 to May 2002 Dr. Cohen worked
at <a href="http://www.whizbang.com">Whizbang Labs</a>, a company
specializing in extracting information from the web. From 2002 to
2018, Dr. Cohen worked at Carnegie Mellon University in
the <a href="http://www.ml.cmu.edu">Machine Learning Department</a>,
with a joint appointment in
the <a href="http://www.lti.cs.cmu.edu">Language Technology
Institute</a>, as an Associate Research Professor, a Research
Professor, and a Professor. Dr. Cohen also was the Director of the
Undergraduate Minor in Machine Learning at CMU and co-Director of the
Master of Science in ML Program.
<p>
Dr. Cohen is a past president of
the <a href="http://www.machinelearning.org/">International Machine
Learning Society</a>. In the past he has also served as an action
editor for
the <a href="http://secure.aidcvt.com/mcp/searchresult.asp?INPUT=AI&Type=Pass&PCS=MCP">AI
and Machine Learning</a> series of books published
by <a href="http://www.morganclaypool.com/">Morgan Claypool</a>, for
the
journal <a href="http://pages.stern.nyu.edu/~fprovost/MLJ/"><i>Machine
Learning</i></a>, the
journal <a href="http://www.elsevier.com/locate/artint"><i>Artificial
Intelligence</i></a>, the <a href="http://www.jmlr.org"><i>Journal of
Machine Learning Research</i></a>, and
the <a href="http://www.jair.org"><i>Journal of Artificial
Intelligence Research</i></a>. He was General Chair for
the <a href="http://icml2008.cs.helsinki.fi/">2008 International
Machine Learning Conference</a>, held July 6-9 at
the <a href="http://www.helsinki.fi/university">University of
Helsinki</a>,
in <a href="http://cc.oulu.fi/~thu/personal/Finland.html">Finland</a>;
Program Co-Chair of
the <a href="http://www.autonlab.org/icml2006/home.html">2006
International Machine Learning Conference</a>; and Co-Chair of
the <a href="http://www.cs.rutgers.edu/pub/learning94/learning94.html">1994
International Machine Learning Conference</a>. Dr. Cohen was also the
co-Chair for the <a href="http://www.icwsm.org/2009/index.shtml">3rd
Int'l AAAI Conference on Weblogs and Social Media</a>, which was held
May 17-20, 2009 in San Jose, and was the co-Program Chair for
the <a href="http://www.icwsm.org/2010/index.shtml">4th Int'l AAAI
Conference on Weblogs and Social Media</a>. He is
a <a href="http://www.aaai.org/Awards/fellows-list.php">AAAI
Fellow</a>, and was a winner of the 2008
<a href="http://www.sigmod.org/sigmod-awards/sigmod-awards#time">SIGMOD
"Test of Time" Award</a> for the most influential SIGMOD paper of
1998, and the
2014 <a href="http://sigir.org/sigir-2014-best-paper-awards/"> SIGIR
"Test of Time" Award</a> for the most influential SIGIR paper of
2002-2004.
<p>
Dr. Cohen's research interests include information integration and
machine learning, particularly information extraction, text
categorization and learning from large datasets. He has a
long-standing interest in statistical relational learning and learning
models, or learning from data, that display non-trivial structure.
He holds seven
patents related to learning, discovery, information retrieval, and
data integration, and is the author of more than 200 publications.
<!-- <h3 class="sec"><a name="cv">Curriculum vita</cv></h3 class="sec">
<ul>
<li><a href="cv.pdf">My c.v. in PDF.</a>
</ul>
-->
<h3 class="sec"><a name="announce"></a>Announcements and FAQs</h3 class="sec">
<ul>
<li><b>I have moved to Google.</b> After the spring 2018 semester ends,
I will move from CMU to Google. I will be leading a new research
group in AI/ML that will be located in Pittsburgh in Google's Bakery
Square location. And in case you're wondering - yes, we will be hiring!
<p><a href="http://www.cs.cmu.edu/~mgormley/">Matt Gormley</a> is the
new Director of the Undergraduate Minor in ML. The new co-Directors
of the MS in ML Program will
be <a href="http://www.cs.cmu.edu/~ninamf/">Nina Balcan</a>
and <a href="http://www.cs.cmu.edu/~rsalakhu/">Ruslan
Salakhutdinov</a>.
<p>
<li><b>I'll be an invited speaker at ILP-2018</b> - that's
the <a href="http://ilp2018.unife.it/">28th International Conference
on Inductive Logic Programming</a> on September 2nd - 4th 2018, in
Ferrara, Italy.
<p>
<li><b>I'll be an invited speaker at KR-2018</b> - that's
the <a href="http://reasoning.eas.asu.edu/kr2018/">16th International
Conference on Principles of Knowledge Representation and Reasoning</a>
to be held in Tempe, Arizona (USA) on October 30-November 2, 2018.
<p>
<li><b>Can I visit CMU and work with you, or can I apply to CMU and
work with you as a grad student?</b> I will continue to advise my
current students as needed through May 2019, but I will not be taking
any new students or hosting any visitors.
<p>
<li><b>Can I take 10-605 or 10-805 this fall?</b> Yes: 10-605/10-805
will be taught in fall 2018
by <a href="http://www.cs.cmu.edu/~bapoczos/">Barnabas Poczos</a>.
<p>
<li><b>What's the difference between 10-601 and ...?</b>
If you're having trouble with the MLD's growing menu of intro ML courses, here's
<a href="https://docs.google.com/document/d/17IP9WLWAE7h6ShEF4CHQFQNFL-u2tJYdgw3wMLK7Xig/edit?usp=sharing">a
draft of a document that explains the differences.</a>
If you're not sure if you're qualified,
the <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Spring_2016#Prerequisites">prereqs
for the course</a> are listed on the course home page, and we're
fairly strict about enforcing them for undergrads. Grad students
should have equivalent experience: good programming skills - the
equivalent of a one-semester college course - and some mathematical
maturity, including prior exposure to calculus, probability and linear
algebra. If you're not sure about your background there is
a <a href="http://www.cs.cmu.edu/~wcohen/10-601/self-assessment/Intro_ML_Self_Evaluation.pdf">self-assessment</a>
test you can take.
</ul>
<!--
<h3 class="sec"><a name="proj">Projects</a></h3 class="sec">
Projects I'm currently involved with include:
<ul>
<li><a href="http://curtis.ml.cmu.edu/gnat/">GNAT is an automatic KB
construction toolkit</a> that has been used to build KBs for several
different domains,
including <a href="http://curtis.ml.cmu.edu/gnat/biomed">consumer
health information</a>
and <a href="http://curtis.ml.cmu.edu/gnat/software">software</a>.
<li><a href="http://rtw.ml.cmu.edu/rtw/">NELL</a> is a web-scale
information extraction system.
</ul>
-->
<!-- <li><a href="http://sites.google.com/site/simstudentprojectweb/">SimStudent</a>, a project that adds learning-by-demonstration to <a href="http://ctat.pact.cs.cmu.edu/">CTAT</a>. -->
<!-- <li><a href="querendipity/">Querendipity</a>, an adaptive personal information management system for biologists. -->
<!--
<li><a href="http://boowa.com">SEAL</a>, a Google-Sets-like bootstrapping tool written by my former student, <a
href="http://rcwang.com">Richard Wang</a>. -->
<!--
<li><a href="http://murphylab.web.cmu.edu/services/SLIF2/">SLIF</a>, a system that analyzes the text and images
in online journal articles to find information about the subcellular localization of proteins. -->
<!--
<li><a href="http://teamcohen.github.com/MinorThird/">Minorthird</a>,
an open-source Java package of information extraction software. (Note: we've
migrated the code now from SourceForge to GitHub.)
-->
<h3 class="sec"><a name="sw">Software and demos</a></h3 class="sec">
<!--
<b>Demos:</b>
<ul>
<li>
Measure twice, cut once - <a
href="http://www.cs.cmu.edu/~vitor/">Vitor</a> and <a
href="http://www.cs.cmu.edu/rbalasub">Ramnath</a> have developed a <a
href="http://www.cs.cmu.edu/~vitor/cutonce/cutOnce.html">Thunderbird
plugin</a> that implements <a
href="http://www.cs.cmu.edu/~wcohen/postscript/ecir2008.pdf">recipient
recommendation</a> and <a
href="http://www.cs.cmu.edu/~wcohen/postscript/sdm-2007-leak.pdf">leak
detection</a> for email. It modifies Thunderbird by adding an
additional pane that pops up after you send a message, giving you one
final chance to fix any errors in your recipient list. There's a
brief <a href="cutonce.pdf">writeup on how to use it,</a> but it's
pretty self-explanatory: just download it, open Thunderbird, and go to
the tools->addon menu to install. After you've installed it, you
train by opening your folder of "Sent" mail and pressing the "train"
button. (This took about an hour for my 9000+ old messages.)
<li>
<a href="http://www.cs.cmu.edu/~nmramesh/">Ramesh
Nallapati</a> has put together two nice demos of his <a
href="http://www.cs.cmu.edu/~wcohen/postscript/topic-tomography-submitted.pdf">multiscale topic tomography</a> topic-modeling technique, one
for articles from <a
href="http://www.cs.cmu.edu/~nmramesh/science_demo/multiscale_home.html">Science</a>,
and one with <a
href="http://www.cs.cmu.edu/~nmramesh/cancer_demo/multiscale_home.html">cancer-related
articles from PubMed</a>.
<li>
Here are two movies that demo SimStudent, a programming-by-demonstration
system for constructing cognitive tutors, built by <a href="http://www.cs.cmu.edu/~mazda/">Noboru Matsuda</a>.
<ul>
<li><a href="http://www.cs.cmu.edu/~mazda/CTAT/Video/Interactive/2x+3_5.mov">Interactive mode</a> (solves problems proactively, as way of posing queries)</li>
<li><a href="http://www.cs.cmu.edu/~mazda/CTAT/Video/Non-interactive/3x_9.mov">Non-interactive mode</a></li>
</ul>
</ul>
-->
<ul>
<li><a href="https://github.com/TeamCohen/TensorLog/wiki">TensorLog</a> is
a probabilistic first-order logic which is fully differentiable.
<li><a href="https://github.com/TeamCohen/ProPPR/wiki">ProPPR</a> is an older
"locally groundable" probabilistic first-order
logic.
<li><a href="https://github.com/TeamCohen/GuineaPig">Guinea Pig</a> is
a pure Python workflow language for Hadoop.
<p>
<li>Bhuwan Dhingra is
distributing <a href="https://github.com/bdhingra/ga-reader">an
updated version of the Gated Attention Reader</a> via Github. As of
Dec 2016 the GA Reader is obtaining state-of-the-art results on
several of the standard benchmarks for answering cloze questions.
<li>Here is <a href="http://www.cs.cmu.edu/afs/cs/Web/People/dmovshov/software.html">a comment-completion Plugin for Eclipse</a>, from Dana Movshovitz-Attias.
<li><a href="https://github.com/rbalasub/jigsaw.git">Ramnath Balasubramanyan's BlockLDA</a> code, as well as some of the other algorithms from his thesis, is available on GitHub.
<li>Code for <a href="http://www.cs.cmu.edu/~nlao/code/2010.pra.gz">Ni
Lao's PRA method</a> (described in
our <a href="http://www.cs.cmu.edu/~wcohen/postscript/ecml-2010-ni.pdf">ECML
paper</a>) is available.
<li>
<a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>'s home page contains
<ul>
<li>the <a href="http://www.cs.cmu.edu/~frank/code/icml2010-code.zip">code</a>
for power iteration clustering (the algorithm described in our
ICML-2010 paper) as well as
the <a href="http://www.cs.cmu.edu/~frank/data/icml2010-data.zip">datasets</a>
we used in the experiments.
<li>the <a href="http://www.cs.cmu.edu/~frank/code/asonam2010-code.zip">code</a>
for MultiRandomWalk (the semi-supervised learning algorithm described in our
ASONAM-2010 paper) as well as
the <a href="http://www.cs.cmu.edu/~frank/data/anonam2010-data.zip">datasets</a>
we used in those experiments.
</ul>
<p>
<li><a href="http://minorthird.sourceforge.net">Minorthird</a> is an
open-source Java package of information extraction and text
classification learning tools. This package is stable but not being actively maintained.
<ul><li>
There is also a standalone tool, built on Minorthird, for
annotating biomedical text. This is particularly aimed at annotating
figure captions but might be useful for other text as well. The <a
href="slifTextComponent-v1.0.jar">jar file</a> for this is rather large
(17M), as it includes a Minorthird jar. There is <a
href="SlifTextComponent.html">documentation available</a> for this,
and some <a href="captions.tgz">sample data</a>.
<li>
My former student Vitor Carvalho distributes the poetically named <a
href="http://www.cs.cmu.edu/~vitor/codeAndData.html">Jangada</a> and
<a href="http://www.cs.cmu.edu/~vitor/codeAndData.html">Ciranda</a>,
which are also standalone apps built on top of Minorthird, to analyze
email messages.
</ul>
<li>
<a href="http://secondstring.sourceforge.net">SecondString</a> is
another open-source Java package, of approximate string matching
techniques.
<ul><li>SecondString includes a jar for part of an ancient version
of Minorthird. For those that are interested in <a href="radar.tgz">the source behind
the mysterious cls.jar</a>, here it is.
</ul>
<!---
<li><a href="slipper/">SLIPPER</a> and <a href="whirl/">WHIRL</a> are
now being distributed via Rutgers University. They are free for research
purposes.
--->
<li><a href="slipper-linux.tgz/">SLIPPER</a> is an old old
rule-learning system Yoram Singer and I developed. This code is
provided with absolutely no warranty, promise of support, or really,
any expectation that it will keep working. You are totally on your
own with this one, friend.
<li>WHIRL is another old system I wrote. Currently, I am not
distributing it, but ask me if you're interested in reviving the
source code.
<li>To get a copy of RIPPER, please send mail to my evil twin brother,
wcohen -AT- gmail.com. As an alternative to that ancient code: I haven't used it myself, but
I've heard good things about
J-RIP, a Ripper clone written for WEKA.
</ul>
<h3 class="sec"><a name="data">Datasets</a></h3 class="sec">
The following datasets are available for anyone to use for research
purposes:
<ul>
<li>Zhilin Yang is
distributing <a href="http://kimi.ml.cmu.edu/qa_ssl/">the data from our
ACL-2017 paper on semi-supervised QA</a>.
<li>Lidong Bing has
distributed <a href="http://www.cs.cmu.edu/~lbing/#Datasets">two
datasets from our joint work</a>: the data used in our EMNLP 2015
paper, Improving Distant Supervision for Information Extraction Using
Label Propagation Through Lists, and also the dataset used in our AAAI
2016 paper, Distant IE by Bootstrapping Using Lists and Document
Structure. The <a href="http://curtis.ml.cmu.edu/gnat/biomed">data
extracted by this system can also be browsed</a>.
<li>Ni Lao has distributed the labeled data from our EMNLP 2010 paper,
Random Walk Inference and Learning in A Large Scale Knowledge Base,
both <a href="http://www.cs.cmu.edu/~nlao/data/publish.amt.labels.tar.gz">Turker-labeled
data</a>
and <a href="http://www.cs.cmu.edu/~nlao/data/publish.distant.supervision.tar.gz">NELL
pseudo labels</a>.
<li><a href="http://rtw.ml.cmu.edu/wk/coordterm/syntactic/">Coordinate
terms extracted from a MALT-parsed corpus with 230B sentences</a>,
produced by Malcolm Greaves. (Corpus is ClueWeb 2009, Wikipedia from
November 2011, Project Gutenberg, and Citeseer.)
<li><a href="CrowdComp_MTurkData.tar.gz">Data sets</a> for my paper
"Crowdsourced Comprehension: Predicting Prerequisite Structure in
Wikipedia" with Partha Talukdar from BEA-2012.
<li><a href="http://rtw.ml.cmu.edu/wk/WebSets/wsdm_2012_online/index.html">Collections
of HTML Tables, hyponyms, as well as extracted entity clusters and MLT
evaluations</a>, all associated with
<a href="http://www.cs.cmu.edu/afs/cs/Web/People/bbd/">Bhavana
Dalvi</a>'s paper
on <a href="postscript/wsdm-2012-bdd.pdf">WebSets</a> from WSDM-2012.
<li>The <a href="http://www.cs.cmu.edu/~frank/data/icml2010-data.zip">network
datasets</a> used in the experiments of our ICML-2010 paper
are on <a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>'s home page.
<li>
<a href="all-bibdata.tgz">100,000+ bibliography entries</a>, in the original BibTeX format, converted to an EndNote-like format, and in a featurized format, for experiments with matching (60M).
<li>
<a href="http://yeast.ml.cmu.edu/nies/data/icwsm_gene_paper_author_firstAuthorCitations.1950_2007.ghirl.zip">A 56k-node, 200k-edge graph containing data from SGD and PubMed</a>, used in Querendipity.
<li><a href="http://www.cs.cmu.edu/~vitor/codeAndData.html">617
messages from 20 Newsgroups, annotated for reply bodies and
signatures</a>, prepared by my former student <a
href="http://www.cs.cmu.edu/~vitor">Vitor Carvalho</a>.
<li><a href="http://www.cs.cmu.edu/~einat/datasets.html">
Two subsets of the Enron data, annotated with person names</a>,
prepared by my student <a href="http://www.cs.cmu.edu/~einat">Einat
Minkov</a>.
<li><a href="http://www.cs.cmu.edu/~enron">Enron email dataset</a>
(400Mb, once you get there) contains 800,000+ emails from 150+ users
organized into 4700+ folders.
<li><a href="doj-email.xls">Some more email data</a>: about two
thousand messages released to the public as part of the ongoing <a
href="http://en.wikipedia.org/wiki/Bush_White_House_e-mail_controversy">investigation
of US Attorney firings at the Dept of Justice</a>. This is very
strange data---the original email is released as scanned printouts in
PDF (?!), so most of the text is not available. There are links to
copies of the PDF, some manually added annotations, and an (apparently
manually reconstructed) social network graph. About 1.5Mb (in Excel
format). From <a
href="http://www.dailykos.com/storyonly/2007/5/21/12120/5682">Mark
Johnson, and a network of volunteers.</a>
<li><a href="repository.tgz">A collection of various extraction datasets
in Minorthird format</a> (6Mb), including about 1000 Enron emails tagged
for person names and temporal expressions.
<li><a href="classify.tar.gz">classify.tar.gz</a> (0.4Mb) contains
nine problems in which the goal is to classify short entity names.
This data was used in <i>Joins that Generalize: Text Classification
Using WHIRL</i> (KDD-98).
<li><a href="ranking-data.tar.gz">ranking.tar.gz</a> (8Mb) contains the
data used for the meta-search experiments in my JAIR paper <a
href="http://www.jair.org/abstracts/cohen99a.html">Learning to Order
Things</a> (with Rob Schapire and Yoram Singer).
<li><a href="match.tar.gz">match.tar.gz</a> (0.7Mb) contains a suite of
<i>labeled</i> entity-name matching and clustering problems
(i.e. problems for which the correct matches/clusters are provided),
in a single consistent format. In most cases WHIRL's performance is
given as a benchmark. (These are also distributed in the <a
href="http://www.cs.utexas.edu/users/ml/riddle/data.html">RIDDLE
Repository</a>. Extraction-oriented versions of some of this data are
available on the <a
href="http://www.isi.edu/info-agents/RISE/repository.html">RISE
Repository</a>. (I.e., represented as a problem of extracting data from
a website, rather than matching two datasets).)
<li><a href="whirl-bench.tgz">whirl-bench.tgz</a> (1.1Mb) contains some
more WHIRL-format entity name matching problems.
</ul>
<h3 class="sec"><a name="talks">Talks and presentations</a></h3 class="sec">
<p>
<ul>
<li><a href="declarative-learning-workshop-2018.pptx">An invited talk
given at the Third International Workshop on Declarative Learning Based
Programming</a> (DeLBP), at AAAI-2018.
<li><a href="snl-2017.pptx">An invited talk given at SNL-2017</a> (the 1st International Workshop on Symbolic-Neural Learning) in July 2017.
<li><a href="wakbc-2016.pptx">An invited talk given at WAKBC-2016</a> in June 2016.
<li>Tutorial on statistical relational learning given at NAACL 2016 with
William Wang (a shorter version of this was also presented at IJCAI 2016):
<ul>
<li><a href="naacl-2016-talk1-final.ppt">Part 1 - overview on logic, probability, MLNs, and probabilistic DDBs</a>
<li><a href="naacl-2016-talk2-final.pptx">Part 2 - ProPPR and applications</a>
<li><a href="naacl-2016-talk3-final.ppt">Part 3 - TensorLog, and other recent and current work</a>
</ul>
<li>Series of three lectures on probabilistic logic programs given at
Singapore Management University in Feb 2016:
<ul>
<li><a href="smu-2016-talk1.pptx">Background on logic and probabilistic models</a></li>
<li><a href="smu-2016-talk2.pptx">Parameter learning and structure learning in ProPPR</a></li>
<li><a href="smu-2016-talk3.pptx">Joint learning in ProPPR and comparing to neural approaches</a></li>
</ul>
<li><a href="aaai-ss-2015.ppt">Can KR Represent Real-World Knowledge?</a>, invited talk given March 2015
at the AAAI Spring Symposium on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches
<li><a href="nlu-2014.ppt">Learning to Reason with Extracted Information</a>, keynote talk given March 2014
at Google's Natural Language Understanding Workshop, Zurich, Switzerland.
<li><a href="ilp-2013.ppt">Learning to Construct and Reason with a
Large KB of Extracted Information</a>, invited talk given August 2013
at the Inductive Logic Programming Conference, in Rio de Janeiro,
Brazil.
<li><a href="aaai-fs-2012.ppt">Reasoning With Data Extracted from The Biomedical Literature</a>,
invited talk at a joint session of the AAAI Fall Symposia on Discovery Informatics, and
Information Retrieval and Knowledge Discovery in Biomedical Text.
<li><a href="cikm-2012.ppt">Learning Similarity Relations Based on Random Walks in Graphs</a>,
invited talk at CIKM 2012, October, 2012.
<ul>
<li>Earlier version of talk: <a href="mlg-aug-2011.ppt">Learning Relationships Defined by
Linear Combinations of Constrained Random Walks</a>, invited talk at
the <a href="http://www.cs.purdue.edu/mlg2011/">9th Workshop on
Machine Learning and Graphs</a>, San Diego, CA, Aug 2011.
</ul>
<li><a href="lti-colloq-2012.ppt">Fast Effective Clustering for Graphs and Documents</a>, given at CMU's LTI Colloquium Feb 10, 2012.
<ul>
<li>Earlier versions given
at <a href="FastEffectiveClustering-v2.ppt">Virginia Tech in April
2010</a> and
<a href="FastEffectiveClustering.ppt">University of Pennsylvania
in Feb 2010.</a>
</ul>
<li><a href="psc-11-cohen.ppt">Learning to Extract a Broad-Coverage
Knowledge Base from the Web</a>, invited talk at the Symposium on
Data-Intensive Analysis, Analytics, and Informatics, Pittsburgh, PA Apr 2011.
<li><a href="nfais-11-cohen.ppt">Open Information Extraction Methods:
Computers that Learn to Read</a>, invited talk at National Federation
of Advanced Information Services (NFAIS), Philadelphia, PA, Feb 2011.
<li><a href="umd-sep-2010.ppt">Learning Proximity Relations Defined by
Linear Combinations of Constrained Random Walks</a>, given at a
seminar at the University of Maryland in Sep 2010.
<li><a href="block-lda-icml-ws-2010.ppt">Modeling Entity-Entity Links
and Entity-Annotated Text</a>, given at the ICML 2010 Workshop on
Topic Modeling.
<li><a href="MSM-2009.ppt">Predictively Modeling Social Media</a>,
invited talk given at
<a href="http://www.socialgamingplatform.com/msm09/">the 1st International Workshop on Mining Social Media</a>, co-located with the 13th Conference of the Spanish Association for Artificial Intelligence (CAEPIA-TTIA 2009).
<li><a href="IIWeb.ppt">Matching and clustering product descriptions
using learned similarity metrics</a>, invited talk given at
<a href="http://research.ihost.com/iiweb09/index.html">the IJCAI 2009 Workshop on Information Integration on the Web</a>, July 2009. (Powerpoint; 6.7M)
<li>Open information extraction talks:
<ul>
<li><a href="openIE-spain-2009.ppt">Graph-Based Methods for Open Information Extraction</a>, talk given in Nov 2009 at MAVIR in Madrid, Spain.
<li><a href="openIE-2009.ppt">Graph-Based Methods for Open Information Extraction</a>, earlier version of talk given at Stanford and Google March 2009.
<li><a href="nips-graph-ws-2008.ppt">Graph-Based Methods for Open Information Extraction</a>,
still earlier version of the same talk given at a 2008 NIPS workshop.
<li>A <a href="nipsgraphs2008_workshop_skit.mov">QT video of highlights</a> from the workshop talks, including an incisive technical question addressed to me from my colleague <a href="http://www.stat.cmu.edu/~fienberg/">Steve Fienberg</a>.</li>
</ul>
<li><a href="sigmod-08.ppt">Embodied Cognition and Knowledge:
Integration of Heterogeneous Databases without Common Domains Using
Queries Based on Textual Similarity</a>, talk given for my 10-year
"Test of Time" Award at <a
href="http://www.sigmod08.org/">SIGMOD-2008</a>. (Powerpoint; 11Mb)</li>
<li><a href="linkedData-2008.ppt">Using Machine Learning to Discover
and Understand Structured Data</a>, invited talk given at <a
href="http://www.linkeddataplanet.com">LinkedData
2008</a>. (Powerpoint; 6Mb)</li>
<li><a href="icmla-2007.ppt">Machine Learning for Personal Information
Management</a>, invited talk given at <a
href="http://www.icmla-conference.org/icmla07/icmla07.html">ICMLA-2007</a>. (Powerpoint; 8Mb)</li>
<li><a href="iqis.ppt">A Framework for Learning to Query Heterogeneous Data</a>,
invited talk given at <a href="http://queens.db.toronto.edu/iqis2006/">IQIS 2006</a>. (Powerpoint; 8Mb)</li>
<li><a href="dbirday-06.ppt">On Beyond Hypertext: Searching in Graphs
Containing Documents, Words, and Actual Data</a>, invited talk given
at <a href="http://dbirday2006.rutgers.edu/">DB/IR Day 2006.</a> (Powerpoint; 6Mb)</li>
<li><a href="webdb-talk.ppt">A Century Of Progress On Information
Integration: A Mid-Term Report</a>, an overview of information
integration, focusing modestly on my own work, given as an invited
talk at <a
href="http://webdb2005.uhasselt.be/">WebDB-2005</a>. (Powerpoint;
12Mb)</li>
<p>
<li>Tutorials:
<ul>
<li><a href="ie-survey.ppt">Information extraction</a> (PowerPoint;
4.8Mb), aimed at folks somewhat familiar with statistical NLP
methods. And thanks to Thierry Poibeau, there's also a version <a
href="http://www-lipn.univ-paris13.fr/~poibeau/cours/fr_cohen_ie_tutorial.ppt"><i>en francais</i></a> (did I get that right, Thierry?)
Two earlier versions of this are also still around, both
given with Andrew McCallum at recent conferences, <a
href="kdd2003-tutorial.ppt">KDD-2003</a> (PowerPoint; 6.8Mb) and <a
href="nips-ie-tutorial.ppt">NIPS-2002</a>.
<li><a href="text-cat-tutorial.ppt">Text classification</a>
(PowerPoint; 3Mb), given at a CALD Summer Course.
<li><a href="collab-filtering-tutorial.ppt">Collaborative
filtering</a> (PowerPoint; 9.1Mb), given at a DIMACS workshop.
</ul>
<p>
<li>A mini-course on record linkage and matching:
<ul>
<li><a href="Matching-1.ppt">Overview of record linkage methods</a> (PowerPoint; 250kb).
<li><a href="Matching-2.ppt">Overview of distance metrics for strings</a> (PowerPoint; 530kb).
<li><a href="Matching-3.ppt">Overview of using HMMs for normalizing
text in record linkage tasks</a> (PowerPoint; 640kb). <br>
It's not a presentation, but I have also put together a <a
href="matching/">short annotated bibliography of record linkage and
matching papers</a>.
<li>William Hayes has a nice summary of <a href="http://blog.williamhayes.org/2012/07/string-similarity.html">an extended discussion
of string-matching tools</a> on the BioNLP mailing list (July 2012).
</ul>
<p>
<li>Other technical talks:
<ul>
<li><a href="ijcai-2005.ppt">A presentation of my IJCAI-2005 results</a>
on "stacked sequential learning", presented in Edinburgh in August, 2005.
<li><a href="nips-2002.ppt">A presentation of my NIPS-2002 results</a>
on using bootstrapping techniques to improve web page classification,
given at CMU in October 2002. (PowerPoint; 3.2Mb).
<li><a href="www-2002.pdf">A presentation of my WWW-2002 results</a>
on wrapper learning,
presented in April 2002. (PDF; 170kb).
<li><a href="whirl-talk.pdf">An overview of experiments with WHIRL.</a> (PDF; 800kb).
</ul>
</ul>
<h3 class="sec"><a name="teach">Teaching</a></h3>
<ul>
<li>Spring 2018: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-405_in_Spring_2018">Undergraduate Level Machine Learning with Large Datasets</a>, 10-405, Mon-Wed 3:30-4:20 in GHC 4307
<li>Fall 2018: 10-605/10-805 will be taught
by <a href="http://www.cs.cmu.edu/~bapoczos/">Barnabas Poczos</a>.
</ul>
</ul>
Past courses:
<ul>
<li>Fall 2017: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Fall_2017">Machine Learning with Large Datasets, 10-605 and 10-805</a>, Tues-Thurs 1:30-2:50pm, PH 100.
<li>Fall
2016: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Fall_2016">Machine
Learning with Large Datasets, 10-605 and 10-805</a>, Tues-Thurs
1:30-2:50pm, Wean Hall 7500.
<li>Spring 2016: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Spring_2016">Machine Learning 10-601</a>, Mon-Wed 10:30-11:50am, GHC 4401, with Maria-Florina Balcan.
<li>Fall 2015: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Fall_2015">Machine Learning with Large Datasets, 10-605 and 10-805</a>, Tu-Thu 4:30-5:50pm in Doherty Hall 2210.
<li>Spring 2015: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2015">Machine Learning with Large Datasets, 10-605 and 10-805</a>, Tu-Thu 10:30-11:50am in BH A51
<li>Fall 2014: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Fall_2014">10-601 Machine Learning</a>, Tu-Thu 1:30-2:50, Wean 7500
<li>Spring 2014: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2014">10-605 Machine Learning with Large Datasets</a>, Mon-Wed 1:30-2:50, Doherty Hall 1112
<li>Fall 2013: <a href="http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_10-601_in_Fall_2013">10-601 Machine Learning</a>, Mon-Wed 4:30-5:50, Doherty Hall 2315 (with Eric Xing).
<li>Spring 2013: <a href="http://malt.ml.cmu.edu/mw/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2013">Machine Learning with Large Datasets</a>, Mon-Wed 1:30-2:50, 4307 GHC
<li>Fall 2012: <a
href="http://malt.ml.cmu.edu/mw/index.php/Social_Media_Analysis_10-802_in_Fall_2012">ML 10-802 and LTI 11-772 (Analysis of Social Media)</a>, 10:30-11:50am Tues & Thurs, 4303 Gates Building.
<li>Fall 2012: <a
href="http://www.cs.cmu.edu/~journalclub">10-915, the MLD Journal Club</a>, 12-1:20pm Tue & Thu, 4101 Gates Building (with Roy Maxion).
<li>Spring 2012: <a href="http://malt.ml.cmu.edu/mw/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2012">Machine Learning with Large Datasets</a>, Tues-Thurs 1:30-2:50pm, NSH 1305
<li>Fall 2011: <a
href="http://malt.ml.cmu.edu/mw/index.php/Structured_Prediction_10-710_in_Fall_2011">Structured
Prediction for Language and Other Discrete Data (SPLODD-2011)</a>, ML
10-710 and LTI 11-763, Tues-Thursday 3:00-4:20 in Gates-Hillman 4211.
This is co-taught by myself and Noah Smith, and will include some
subjects from <a
href="http://malt.ml.cmu.edu/mw/index.php/Information_Extraction_10-707_in_Fall_2010">Information
Extraction</a> and some from <a
href="http://www.cs.cmu.edu/~nasmith/LS2">Language and Stats 2</a>. A
machine learning course (10-701 or consent of the instructors) is a
prereq; we don't recommend that you take the course if you have
already taken Information Extraction or Language and Stats 2.
<li>Spring 2011: <a
href="http://malt.ml.cmu.edu/mw/index.php/Social_Media_Analysis_10-802_in_Spring_2011">ML 10-802 and LTI 11-772 (Analysis of Social Media)</a>, 10:30-11:50am Tues & Thurs, 4303 Gates Building.
<li>Spring 2011: <a
href="https://docs.google.com/document/pub?id=1-XEqDHRCiikdPj-LWiYSPjxvxYyFoJ0lNPbt6ym4VZw">10-915, the MLD Journal Club</a>, 3-4pm Mon & Wed, 4101 Gates Building.
<li>Fall 2010: <a
href="http://malt.ml.cmu.edu/mw/index.php/Information_Extraction_10-707_in_Fall_2010">10-707
(Information Extraction - cross-listed in LTI as 11-748)</a>,
1:30-2:50pm Mon & Wed, Gates 4101. The first class is 9/8, the
Wed after Labor Day, to allow incoming students time to attend the IC
courses.
<li>Spring 2010: <a
href="http://malt.ml.cmu.edu/mw/index.php/Social_Media_Analysis_10-802_in_Spring_2010">10-802 (Analysis of Social Media)</a>.
<li>Fall 2009: <a
href="http://malt.ml.cmu.edu/mw/index.php/Information_Extraction_10-707_in_Fall_2009">10-707
(Information Extraction)</a>, 1:30-2:50pm Mon & Wed, 5222 Gates
Building.
<li>Spring 2008: <a
href="http://www.cs.cmu.edu/~tom/10601">10-601 (Machine Learning)</a>
with <a href="http://www.cs.cmu.edu/~tom">Tom Mitchell</a>, on 3-4:30
Mon & Wed in Wean Hall 5409.
<li>Fall 2007: <a href="10-802/fixed/Main_Page.html">Analysis of Social
Media</a>, Machine Learning 10-802 and LTI 11-772, with Natalie Glance
(of Google Pittsburgh) - a brand-new seminar course. 4:30-6:30
Tuesdays in Wean Hall 4623.
<ul><li>Note: This site is the shattered remains of a once-beautiful wiki,
created by the students of 10-802, generously hosted for free by
<a href="http://scribblewiki.com">ScribbleWiki</a>, tragically lost (due
to a combination of RAID drive failures and low-bidder backup schemes),
and then largely recovered using
<a href="http://warrick.cs.odu.edu">Warrick</a>
from various internet caches and archives.
</ul>
<li>Fall 2007: <a href="http://www.compbio.cmu.edu/Jclub/">Current Topics
in Computational Biology (Journal Club)</a>, 02-701. (<a
href="02-701/">Announcements</a>). Thursdays from 4:00-5:00 in 411
Mellon Institute (after Cell & Systems Modeling).
<li>Spring 2007: <a href="10-707">Information Extraction</a>, Machine
Learning 10-707 and LTI 11-748 - back by popular demand for the first time since 2004!
<li>Fall 2006: <a href="http://www.compbio.cmu.edu/Jclub/">Current Topics in Computational Biology (Journal Club)</a>, 02-701.
(<a href="02-701/">Announcements</a>)
<li>Spring 2006: <a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-21/www/index.html">Read the Web</a>, CALD 10-709.
<li>June 21,23,25, 2005: A mini-course on Minorthird. Materials are below.
<ul>
<li><a href="day1.tgz">Slides, notes, and sample files from first
day's lecture</a>.
<li><a href="day2.tgz">Slides, notes, and sample files from second
day's lecture</a>.
<li><a href="day3.ppt">Powerpoint slides from third
day's lecture</a>.
<li><a href="minorthird.jar">Jar file for minorThird</a>, if you
only want to run the code, not compile it or read it.
The installation process here is:
<ol>
<li>Install Java 1.4 or higher (actually, JRE is all you need).
<li>Download the <a href="minorthird.jar">jar for minorThird</a>
and stick it in some directory.
<li>Optionally, download the <a href="repository.tgz">sample data
repository</a> and unpack it into the same directory.
<li>Change to that same directory and
then run Minorthird with the command <br>
<code>java -Xmx500M -jar minorthird.jar</code>
<p>
What will pop up will be a small launch pad that can be used to
start any of the UI programs. You can also start a particular
main by specifying minorthird.jar as your classpath, for
instance: <br>
<code>java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help</code>
</ol>
<li>If you want to do a real install, here's the <a
href="http://minorthird.sourceforge.net">home page on Sourceforge</a>, and
a document on <a href="10-707/QUICKSTART.txt">how to do a CVS
install of Minorthird</a>.
</ul>
<li>Spring 2004: <a href="10-707/index-2004.html">"Learning to Turn Words into Data:
Machine Learning Approaches to Information Extraction and Information Integration"</a>, CALD 10-707 and LTI 11-748.
</ul>
<h3 class="sec"><a name="pubs">Publications</a></h3>
<ul>
<li> Here's an <a href="pubs-atom.xml">RSS feed of my papers</a>. (Note: the feed I had created with Dapper seems spam-infested now.)
Here's a pointer to <a href="http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/c/Cohen:William_W=.html">my DBLP page</a>.
<li><i>A Computer Scientist's Guide To Biology</i> is no longer
available from this web page, but is now <a
href="http://www.springer.com/west/home/generic/search/results?SGWID=4-40109-22-173702304-0">available from Springer</a>. Here is <a
href="GuideToBiology-sampleChapter-release1.4.pdf">the TOC,
introduction, index, and a sample chapter</a>, from a late draft of
the book; and also <a
href="GuideToBiology-pictures-color-release1.5.ppt">all the figures
from the book in PowerPoint</a> and <a
href="GuideToBiology-pictures-color-release1.5.pdf">all the figures in
PDF</a>. (The figures are a little prettier than the ones in the
final book, which is black and white, not color).
<li><a
href="http://shop.omnipress.com/index.asp?PageAction=VIEWPROD&ProdID=33">ICML
2006 Proceedings</a> are available in print, for the true aficionado
of fine learning-related research. It's well worth the money for the
cover art alone (of course, all the papers are also available <a
href="http://www.autonlab.org/icml2006/technical/accepted.html">on-line
for free</a>.)
<li><a href="pubs-s.html">Recent and selected publications</a>. These
are some representative publications for which on-line copies can be
distributed.
<li><a href="pubs.html">All publications</a>. Here is a more-or-less
complete chronological list of my publications. The bibliography
includes pointers to on-line versions when I can provide them, but
unfortunately copyright restrictions don't allow me to make all of my
publications available on-line. Of course, reprints are always
available from me on request.
<li>Publications by topic:<img
src="cover.png" height=200 width=150 align="right"/><img height=200 src="icml-cover.png" align="right"/>
<ul>
<li><a href="pubs-m.html">Matching/Data Integration</a>
<li><a href="pubs-t.html">Text categorization</a>
<li><a href="pubs-x.html">Information Extraction</a>
<li><a href="pubs-r.html">Rule Learning</a>
<li><a href="pubs-c.html">Collaborative Filtering</a>
<li><a href="pubs-a.html">Applications</a>
<li><a href="pubs-f.html">Formal Results</a>
<li><a href="pubs-i.html">Inductive Logic Programming</a>
<li><a href="pubs-e.html">Explanation-Based Learning</a>
</ul>
</ul>
Recent papers I'm keeping in HTML or PDF (which requires <a
href="http://www.adobe.com/prodindex/acrobat/readstep.html">Adobe
Acrobat Reader</a> to view). Older papers are mostly in Postscript.
For Windows, I use the <a
href="http://www.cs.wisc.edu/~ghost/gsview/">GSView</a> reader for
Postscript. Most of these papers are viewable in several formats in
<a href="http://www.researchindex.com">ResearchIndex</a>.
<h3 class="sec"><a name="buddies">Students and other colleagues</a></h3>
<!-- Other: -->
<ul>
<li>
<a href="http://www.cs.cmu.edu/~krivard/">Katie Rivard Mazaitis</a>, research programmer/analyst
</ul>
<!-- Students: -->
<ul>
<li><a href="https://sites.google.com/site/rosecatherinek/home">Rose Catherine Kanjirathinkal</a>, LTI PhD student.
<li><a href="http://kimiyoung.github.io/">Zhilin Yang</a>, LTI PhD student, co-advised with Ruslan Salakhutdinov.
<li><a href="http://www.cs.cmu.edu/~bdhingra/">Bhuwan Dhingra</a>, LTI PhD student, co-advised with Ruslan Salakhutdinov.
<li><a href="http://www.cs.cmu.edu/~yifengt/">Yifeng Tao</a>, CMU Comp Bio PhD student, co-supervised with Xinghua Lu.
<li><a href="http://www.cs.cmu.edu/~fanyang1/">Fan Yang</a>, MLD PhD student.
<li>Daniel Spokoyny, LTI PhD student, co-supervised with Taylor Berg-Kirkpatrick.
<p>
<li>Haitian Sun, MLD MS student.
<li><a href="https://andy-jqa.github.io/">Qiao Jin</a>, School of Medicine, Tsinghua University
</ul>
Alumni:
<ul>
<li><a href="http://www.cs.cmu.edu/~yww/">William Yang Wang</a> (former LTI PhD student, now at UCSB).
<li><a href="http://www.cs.cmu.edu/afs/cs/Web/People/dmovshov/">Dana Movshovitz-Attias</a> (former CSD PhD student,
now at Google).
<li><a href="http://www.cs.cmu.edu/afs/cs/Web/People/bbd/">Bhavana Dalvi Mishra</a> (former LTI PhD student,
co-advised with <a href="http://www.cs.cmu.edu/~callan/">Jamie Callan</a>, now at AI2)
<li><a href="http://www.cs.cmu.edu/~taey/">Tae Yano</a>, (former LTI
PhD student, co-advised
with <a href="http://www.cs.cmu.edu/~nasmith/">Noah Smith</a>, now at Microsoft)
<li><a href="http://www.cs.cmu.edu/~nli1">Nan Li</a>, (former CSD PhD
student, co-advised
with <a href="http://pact.cs.cmu.edu/koedinger.html">Ken
Koedinger</a>, now at D. E. Shaw)
<li><a href="http://www.cs.cmu.edu/~rbalasub/">Ramnath Balasubramanyan</a>, (LTI PhD student, now at Twitter)
<li><a href="http://www.cs.cmu.edu/~maheshj/">Mahesh Joshi</a>, (former LTI PhD student,
co-advised with <a href="http://www.cs.cmu.edu/~cprose/">Carolyn Rosé</a>, now at eBay)
<li><a href="http://www.cs.cmu.edu/~frank/">Frank Lin</a>, (former LTI PhD student, now at Airbnb)
<li><a href="http://www.cs.cmu.edu/~nlao/">Ni Lao</a> (former LTI PhD student, now at Google)
<li><a href="http://www.cs.cmu.edu/~rcwang">Richard C. Wang</a>,
(former LTI PhD student co-advised with <a
href="http://www.cs.cmu.edu/~ref/">Bob Frederking</a>, now at Baidu).
<li><a href="http://www.cs.cmu.edu/~aarnold/">Andrew Arnold</a>
(former MLD PhD student, now at Point 72 Asset Management)
<li><a href="http://www.cs.cmu.edu/~einat">Einat Minkov</a>
(former LTI PhD student, now at Haifa University)
<li><a href="http://www.cs.cmu.edu/~vitor">Vitor Rocha de Carvalho</a> (former LTI PhD student, now at Qualcomm)
<li><a href="http://www.cs.cmu.edu/~woomy/">Zhenzhen Kou</a> (former MLD PhD student, now at Google)
<p>
<li>Ezra Winston, MLD Master's student.
<li>Lanxio (Karen) Xu, MLD Master's student.
<li>Yuxing Zhang, MLD Master's student.
<li>Jakob Bauer, MLD 5th-year Master's student.
<li>Kavya Srinet, MCDS Master's student.
<li>Bhawna Juneja, MCDS Master's student.
<li>Tom Shen, CMU CSD undergrad
<li>Yu-Hsin Allen Kuo, LTI MLT student, formerly co-advised with <a href="http://www.cs.cmu.edu/~nmiskov/Natasas_website/Home.html">Natasa Miskov-Zivanov</a>
<li>Rahul Goutam, former LTI MLT student, co-advised with <a href="http://www.cs.cmu.edu/~nmiskov/Natasas_website/Home.html">Natasa Miskov-Zivanov</a>
<li><a href="https://plus.google.com/102262489142071513958/posts">Malcolm Greaves</a>, former CSD master's student.
<li><a href="http://www.cs.cmu.edu/~eairoldi">Edoardo Airoldi</a>
(former MLD/Stats PhD student, co-advised with <a href="http://www.stat.cmu.edu/~fienberg/">Steve Fienberg</a>)
<li><a href="http://www.csie.ncu.edu.tw/~chia/">Ja-Hui Chang</a>
(visiting faculty from National Central University, Taiwan, 2007-2008)
<li>Wen Haw Chong (PhD student at Singapore Management University,
visited CMU in 2015-2016).
<li><a href="http://www2.sis.smu.edu.sg/students/phd/class10/10_hoang_tuananh.asp">Tuan
Anh Hoang</a>, (PhD student at Singapore Management University,
visited CMU for 2012-2013 academic year in my group).
<li><a href="http://freddychua.com/">Freddy
Chong Tat Chua</a> (PhD student at Singapore Management University,
visited CMU for the academic year 2011-2012 in my group.)
<li><a href="http://www.optimizelife.com/">Gustavo Lacerda</a>
(former research assistant, co-supervised with Noboru Matsuda and Ken Koedinger, now at UBC)
<li><a href="http://www.cs.cmu.edu/~lbing/">Lidong Bing</a>, former
postdoc, now at Tencent.
<li><a href="https://sites.google.com/site/rameshnallapati/">Ramesh Nallapati</a>
(former postdoc, co-supervised with <a
href="http://www.cs.cmu.edu/~lafferty/">John Lafferty</a>, now at IBM Watson)
<li><a href="http://www.cs.cmu.edu/~mazda">Noboru Matsuda</a>
(former postdoc, co-supervised with <a href="http://pact.cs.cmu.edu/koedinger.html">Ken Koedinger</a>,
now System Scientist in CMU's HCII)
<li><a href="http://www.cs.cmu.edu/~pradeepr">Pradeep Ravikumar</a>
(former MLD PhD student, co-advised with <a href="http://www.stat.cmu.edu/~fienberg/">Steve Fienberg</a>)
<!-- External members -->
<p>
<li>I have been an external committee member for the PhD theses of
<ul>
<li><a href="http://mcsp.wartburg.edu/zelle/">John Zelle</a> (degree
from U Texas)
<li><a href="http://research.microsoft.com/en-us/um/people/mbilenko/">Misha
Bilenko</a> (from U Texas)
<li><a href="http://www-users.cs.york.ac.uk/~kudenko/">Daniel Kudenko</a>
(Rutgers)
<li>Chumki Basu (Rutgers)
<li>Ananlada Chotimongkol (CMU)
<li>Wei-Hao Lin (CMU)
<li>Cenk Gazen (CMU)
<li>David Nadeau (U Ottawa)
<li><a href="http://cs.cmu.edu/~htong">Hanghang Tong</a> (CMU)
<li>Ben van Durme (Rochester)
<li><a href="http://www.cis.upenn.edu/~partha/">Partha Talukdar</a> (U Penn)
<li><a href="http://www.cs.cmu.edu/~acarlson/">Andy Carlson</a> (CMU)
<li><a href="http://www.cs.cmu.edu/~hyifen/">Yifen Huang</a> (CMU)
<li><a href="http://www.cs.pitt.edu/~swapna/Main.html">Swapna Sundaran</a> (U
Pitt)
<li><a
href="http://www.cs.cmu.edu/~mheilman/">Michael Heilman</a> (CMU)
<li><a
href="http://www.cs.cmu.edu/~jelsas/">Jon Elsas</a> (CMU)
<li><a href="http://www.cs.cmu.edu/~dipanjan/Home.html">Dipanjan Das</a> (CMU)
<li><a href="http://www.cs.cmu.edu/~fanguo/">Fan Guo</a> (CMU)
<li><a href="http://www.andrew.cmu.edu/user/jdiesner/">Jana Diesner</a> (CMU)
<li><a href="http://freddychua.com/">Freddy Chong Tat Chua</a> (Singapore Management University).
<li><a href="https://sites.google.com/site/hoqirong/">Qirong Ho</a> (CMU)
<li>Danai Koutra (CMU)
<li>Reyyan Yeniterzi (CMU)
<li>YiChi Wang (CMU)
<li>Steven Gardiner (CMU)
<li>Jay Pujara (Univ Maryland)
<li>Derry Wijaya (CMU)
<li>Lingjia Deng (Univ of Pittsburgh)
<li>Chenyan Xiong (CMU)
</ul>
I have also been an external committee member for the Master's theses of
<a href="http://www.cs.cmu.edu/~mehrbod/">Mehrbod Sharifi</a> (CMU) and
Weam Abu-Zaki (CMU).
<p>
I am currently an external committee member for Tiancheng Zhao,
Shashank Srivastava, Pradeep Dasigi, and Abulhair Saparov.
<!-- Other: -->
<h3 class="sec"><a name="contact">Contact Info</a></h3>
<p>
William W. Cohen<br>
Professor, Machine Learning Department<br>
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213<br>
8217 Gates Hillman Complex<br>
(shipping address: 6105 Gates Hillman Complex)<br>
voice: 412-268-7664 / fax: 412-268-2205 <br>
Assistant: Dorothy Holland-Minkley, GHC 8001, [email protected]<br>