+where -Xmx500M enlarges the Java heap, and the remaining arguments are as follows:
+
+
+
DIR is a directory containing some number of text files to
+annotate.
+
+
A sample
+directory of files is available as a compressed tarfile. These
+captions were all taken from PubMedCentral papers, e.g., the file
+"p9770486-fig_4_2" is from Figure 4 of the paper with the PubMed
+Id of 9770486.
+
+
FILE is where annotations should be placed. These are output in 'minorthird format',
+which is explained below.
+
+
COMPONENT1,... are the names of the 'text components' to use to
annotate the files; several components can be given, separated by commas.
+
+
+The components available are:
+
+
+
CellLine: marks spans that are predicted to be the names
+ of cell lines with 'cellLine', using an entity-tagger trained using
+ the Genia corpus.
+
+
CRFonYapex: marks spans that are predicted to be the
+ names of genes or proteins with 'protein', and also
+ 'proteinFromCRFonYapex', using a gene-tagger trained on the YAPEX
+ corpus with the CRF algorithm, as described by Kou,
+ Cohen and Murphy (2005).
+
+
CRFonTexas, CRFonGenia: analogous to CRFonYapex, but
+ using gene-taggers trained on different corpora (as outlined in the
+ Kou et al. paper).
+
+
SemiCRFOnYapex, SemiCRFOnTexas, SemiCRFOnGenia: analogous
+ to the CRFon* components, but trained with the SemiCRF algorithm.
+
+
DictHMMOnYapex, DictHMMOnTexas, DictHMMOnGenia: analogous
+ to the CRFon* components, but trained with the DictHMM algorithm.
+
+
Spans marked as 'imagePointer' are predicted to be image
+ pointers. For a definition of image pointers, see Cohen,
+ Murphy, and Wang (2002).
+
+
Spans marked as 'bulletStyle' and 'citationStyle' are
+ predicted to be bullet-style and citation-style image pointers,
+ respectively.
+
+
Spans marked as 'bulletScope' and 'localScope' are predicted
+ to be the scopes of bullet-style and citation-style
+ image pointers, respectively.
+
+
Spans marked as 'globalScope' are text assumed to pertain
+ to the entire associated image.
+
+
Spans marked as either 'bulletScope', 'localScope', or
+ 'globalScope' are marked as 'scope'.
+
+
Every 'scope' span is associated with a span
+ property called its 'semantics'. The 'semantics' of a span
+ is the concatenation of all the image pointers associated with
+ that span.
+
+
+
+ Additionally, the span labels 'regional' and 'local' are synonyms
+ for bullet-style and citation-style, respectively.
+
+
+ Briefly, to find out what parts of an image some span
+ S might refer to, you need to (1) find out what 'scope'
+ spans S is inside of and (2) find out what the 'semantics'
+ of these scope spans are. For instance, if the span 'RAS4' is
+ inside a scope T1 with semantics "A" and also inside a
+ scope T2 with semantics "BD", then 'RAS4' probably is
+ associated with the parts of the accompanying image labeled "A",
+ "B", and "D".
+
+
+
+
The Minorthird format for stand-off annotation
+
+The format for output is the one used by Minorthird. Specifically, the
+output (in the default format) is a series of lines, in which:
+
+
FILE is the name of the file containing some span;
+
+
START and LENGTH are the initial byte position of the span, and its length;
+
+
SPANTYPE is the type of span (e.g., 'imagePointer',
+'cellLine', 'protein', 'scope', etc.);
+
+
LETTERS is (as noted above) the concatenation of all the
+ image pointers associated with that span.
+
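+
+For illustration only (this is an assumption about the exact syntax,
+not a transcript of real output): Minorthird label files conventionally
+use 'addToType' lines for span types and 'setSpanProp' lines for span
+properties, so the output can be expected to look roughly like
+
+  addToType p9770486-fig_4_2 1293 10 protein
+  setSpanProp p9770486-fig_4_2 1293 10 semantics AB
+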
+
+
+
Other options
+
+
+
Option
Explanation
+
+
-help
+
Gives brief command line help
+
+
+
-gui
+
Pops up a window that allows you to interactively fill in the other arguments, monitor the execution of the annotation process, etc.
+
+
+
-showLabels
+
Pops up a window that displays the set of documents being labeled.
+ (This is not recommended for a large document collection, due to
+ memory usage.)
+
+
+
+
-showResult
+
Pops up a window that displays the result of the annotation.
+ (Again, not recommended for a large document collection.)
+
+
+
+
-format strings
Outputs results as a
+ tab-separated table, instead of minorthird format. The first
+ column summarizes the type of the span, the file the span was
+ taken from, and the start and end byte positions, in a
+ colon-separated format. (E.g.,
+ "cellLine:p11029059-fig_4_1:1293:1303".) The remaining column(s)
+ are the text that is contained in the span (e.g., "HeLa cells",
+ for the span above) almost exactly as it appears in the document; the
+ only change is that newlines are replaced with spaces.
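+
+A short Java sketch of reading this tab-separated output (a minimal
+illustration assuming only the layout just described; the class and
+field names are invented):
+
+  import java.io.*;
+  import java.util.*;
+
+  class StringsFormatReader {
+      // One output line: TYPE:FILE:START:END, a tab, then the span text.
+      record Row(String type, String file, int start, int end, String text) {}
+
+      static List<Row> read(BufferedReader in) throws IOException {
+          List<Row> rows = new ArrayList<>();
+          for (String line = in.readLine(); line != null; line = in.readLine()) {
+              int tab = line.indexOf('\t');
+              if (tab < 0) continue; // skip malformed lines
+              String[] key = line.substring(0, tab).split(":");
+              rows.add(new Row(key[0], key[1],
+                      Integer.parseInt(key[2]), Integer.parseInt(key[3]),
+                      line.substring(tab + 1)));
+          }
+          return rows;
+      }
+  }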
+
+
+A number of people have contributed to these tools, including William
+Cohen, Zhenzhen Kou, Quinten Mercer, Robert Murphy, Richard Wang, and
+other members of the SLIF team.
+
+The initial development of these tools was supported by grant 017396
+from the Commonwealth of Pennsylvania Tobacco Settlement Fund. Further
+development is supported by National Institutes of Health grant R01
+GM078622.
+
+
+
+
diff --git a/Thumbs.db b/Thumbs.db
new file mode 100755
index 0000000..6d4990b
Binary files /dev/null and b/Thumbs.db differ
diff --git a/aaai-fs-2012.ppt b/aaai-fs-2012.ppt
new file mode 100644
index 0000000..b970895
Binary files /dev/null and b/aaai-fs-2012.ppt differ
diff --git a/aaai-ss-2015.ppt b/aaai-ss-2015.ppt
new file mode 100644
index 0000000..0a4c9cf
Binary files /dev/null and b/aaai-ss-2015.ppt differ
diff --git a/advice.html b/advice.html
new file mode 100755
index 0000000..17f7b83
--- /dev/null
+++ b/advice.html
@@ -0,0 +1,181 @@
+
+
+
+
+Advice for Technical Speaking
+
+
+
+
+
+When you hear a talk - a good one or a bad one! - think about the
+presentation as well as the content. Copy what works, and avoid what
+doesn't. If you see a great talk, examine it and try to figure out
+what makes it great. If you see a poor talk, examine it and ask
+yourself if you might make the same mistakes. Some of the most common
+mistakes are below, but people are really quite creative in coming up
+with new mistakes, so don't assume this list is complete!
+
+Don't use too many slides. If you have more than one slide per minute,
+you are definitely using too many. One slide per two minutes is a much
+more reasonable pace.
+
+Don't read your slides. You do not need to put everything you are
+going to say up on a slide; that's what speaker notes are for. Save
+your slides for things that don't work as well with just speech:
+figures, diagrams, movies, animations, extra emphasis on important
+concepts. If your slides are just lists of bullets, they should
+probably be speaker notes and not slides.
+
+Don't put too many pixels on your slides. Use big fonts and
+contrasting colors. Projectors are notoriously bad at making ochre and
+mauve actually look different on the screen. If you copy a figure from
+a paper, ask yourself whether the text labels are big enough to read
+from the back of the room, and redo them if not.
+
+Don't try to say too much. You can't explain everything you've been
+doing in a semester project in 30 minutes - part of communicating is
+deciding what to leave out. If you feel like you have to rush to say
+what you need to say, you're going too fast: the presentation should
+be relaxed enough that people have a chance to reflect on what you
+say, and ask questions if they need to.
+
+Talk concretely about your work. People are great at abstracting from
+examples, but it's hard work for them to think through high-level
+abstractions. (This is the opposite case from when you're programming
+a computer - then you always program the most general case possible,
+and let the computer instantiate it as needed.) When you're talking
+to a person, start with a concrete problem you want to solve, and then
+help the person understand how to generalize that concrete problem to
+the general case.
+
+View your talk as an advertisement for your paper(s). Your goal is
+to convince your audience to use your ideas for their own work, so
+that they cite you and make you famous. Your goal is not to
+make them understand Equation 43 on page 17 (unless that convinces
+them to cite you and make you famous). Instead, say what your
+techniques are good for, why they're important, what the alternatives
+are, and how to choose when your techniques are appropriate instead of
+the alternatives. Then and only then, use the rest of the time on
+technical stuff, with the goal of giving listeners the tools to read
+your paper. (If you're talking about someone else's work, imagine
+instead that you're trying to get the audience to trust your
+evaluation of that work.)
+
+Be honest and diligent. Don't try to cover up flaws or overstate the
+applicability of your techniques; instead, try to discover flaws and
+limitations and expose them.
+
+Think concretely about your audience. Will they be able to understand
+each slide as it comes up? Will they understand why each slide is
+important? As a heuristic, I often find it best to prepare the first
+version of a talk with one specific person I know well in
+mind - and think about what I would say to engage and inform him or
+her specifically.
+
+Talk at the right level for your audience. Remember that, almost by
+definition, you understand the material and they don't, and fight the
+inclination to go too fast. Be aware of people's cognitive
+limitations. Don't make your audience figure something out if you
+don't have to; that will save more processing power for what you want
+them to focus on. In particular:
+
+
Don't ask people to listen to one thing and read another at the
+same time.
Don't ask people to remember an equation or
+definition 5 slides later: just put up a copy when you refer to it.
+
Use direct, simple language. For example, if there are three ways
+to refer to something, pick one and use it consistently throughout
+your talk: don't call something a “model” on one slide and
+a “parameter vector” on another.
+
Label every graph clearly and in large fonts: both axes, every line, and even the sign of any comparison you want to make (“higher is better”).
+
If a fact is important, emphasize it. The audience doesn't necessarily manage to process every word you say. Help them process your talk by telling them what is important, and by repeating things they might have forgotten.
+
+
+
+Synthesize. The audience should get something out of your talk that
+they can't get as quickly or easily out of the paper(s). This means:
+pull together concepts from multiple papers if necessary; compare to
+related work; communicate your judgement about benefits and
+limitations of each technique.
+
+Be careful with equations. You can use a limited number of
+equations if you want to, but make sure that you spend enough time
+explaining them that the audience truly understands them.
+
+
Often, it's a good idea to leave the slide blank and hand-write the equations on it during the actual talk; this trick will keep you from going too fast. Of course, this trick only works if you have a tablet or (gasp) an analog device like a whiteboard or overhead transparencies.
+
If you use this trick, make sure you practice writing out the equations ahead of time at the same level of detail that you plan to use during the talk. Don't just assume they're simple enough that you can't possibly get them wrong; that assumption is usually false.
+
+
+
+Organize well:
+
+
+
Introduce one new concept at a time. Make sure you know, for every part of every slide, which concept it is intended to convey. Make sure you can describe each concept with a clear, short phrase -- else it's probably more than one concept.
+
Introduce concepts in the right order. If concept B depends on concept A, make sure to introduce A first.
+
Sometimes it helps to make a directed graph: nodes are the short phrases for concepts, and arrows represent prerequisites. You can then check that your talk is consistent with the graph (i.e., doesn't try to reverse any arrows).
+
If there are directed cycles in your graph, you have a problem. Try to refactor your concepts and pull out something that you can introduce before any of the nodes in a cycle, then re-evaluate the dependencies, and repeat until you get a DAG. (A small cycle-check sketch follows this list.)
+
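+For the programmatically inclined, the cycle check just mentioned is a
+standard depth-first search; here is a minimal sketch in Java (the map
+layout and names are just illustrative):
+
+  import java.util.*;
+
+  class ConceptGraph {
+      // deps maps each concept to the concepts it depends on.
+      // Returns true iff the dependency graph has no directed cycles.
+      static boolean isDag(Map<String, List<String>> deps) {
+          Map<String, Integer> state = new HashMap<>(); // 1=in progress, 2=done
+          for (String c : deps.keySet())
+              if (hasCycle(c, deps, state)) return false;
+          return true;
+      }
+
+      static boolean hasCycle(String c, Map<String, List<String>> deps,
+                              Map<String, Integer> state) {
+          Integer s = state.get(c);
+          if (s != null) return s == 1; // revisiting an in-progress node: cycle
+          state.put(c, 1);
+          for (String d : deps.getOrDefault(c, List.of()))
+              if (hasCycle(d, deps, state)) return true;
+          state.put(c, 2);
+          return false;
+      }
+  }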
+
+
+
+Start and end your talk well:
+
+
If possible, put up your title slide while you're being introduced. Then you don't need to read it.
+
Make sure the audience knows who you are, especially if you're talking about a paper with multiple authors. You may want to put your name at the bottom of every slide, for people who come in late.
+
Make sure you know the first few sentences of your talk by heart. Exact memorization is usually a bad idea for the body of the talk (it sounds stilted), but I find that knowing the first sentence or two helps me get started. (And once I get started I can almost always keep going.)
+
+
Make sure you have an obvious end to your talk, and don't just trail off into silence. Always end with a statement (e.g., “thank you”) not a question (e.g., “any questions?”). If you end with a question, the audience doesn't know whether to answer it or applaud, which can be awkward.
+
+
+Audiences hate to have their time wasted. So:
+
+
Whenever you can do a little work to save your audience a little
+work, you should. E.g., make a better visualization or a better
+figure, if you think it will improve your audience's ability to
+understand. Or, take that huge table of timing results from your
+paper and translate it into a bar chart that highlights the
+comparisons you're trying to make.
+
+
View an agreement to give a talk as a commitment. Don't
+cancel unless you really, really need to. If you do have to
+cancel, give as much notice as you can.
+
+
Plan to show up early. That way if something goes wrong
+(miss a bus, projector doesn't work, etc.), you have time to fix
+it. Snafus like the above are part of the normal order of the
+world, and somehow seem to be even more common when you're about to
+give a talk. Speakers should therefore expect and plan for them.
+
+
Know your tools. Make sure you know how to hook your laptop
+up to a projector, how to operate your presentation software quickly
+and unobtrusively, how to avoid having instant messages pop up on top
+of your slides, etc.
+
+
+
+
+Don't waste your own time either. Don't spend lots of time
+designing pretty animations, flying text, etc., unless they will
+actually help audience comprehension and not distract from your
+talk. Every second spent animating is a second you don't have
+for explaining your ideas.
+
+
+
+
+
+
+
+
+
+Last modified: Mon Jan 03 16:09:28 Eastern Standard Time 2011
+
+
+
diff --git a/all-bibdata.tgz b/all-bibdata.tgz
new file mode 100644
index 0000000..197c812
Binary files /dev/null and b/all-bibdata.tgz differ
diff --git a/all-nell-triples.txt.gz b/all-nell-triples.txt.gz
new file mode 100644
index 0000000..86137de
Binary files /dev/null and b/all-nell-triples.txt.gz differ
diff --git a/balloon.zip b/balloon.zip
new file mode 100755
index 0000000..5d00d55
Binary files /dev/null and b/balloon.zip differ
diff --git a/block-lda-icml-ws-2010.ppt b/block-lda-icml-ws-2010.ppt
new file mode 100755
index 0000000..8a1d1aa
Binary files /dev/null and b/block-lda-icml-ws-2010.ppt differ
diff --git a/bottom.html b/bottom.html
new file mode 100755
index 0000000..929f23f
--- /dev/null
+++ b/bottom.html
@@ -0,0 +1,302 @@
+
+
+William W. Cohen
+
+
+
+
Areas of expertise
+
+I have extensive experience in machine learning and discovery,
+information retrieval, information extraction, and data integration.
+
+
Biography
+
+William Cohen received his bachelor's degree in Computer Science from
+Duke University in 1984, and a PhD
+in Computer Science from Rutgers
+University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell Labs and later AT&T Labs-Research, and from
+April 2000 to May 2002 Dr. Cohen worked at Whizbang Labs, a company
+specializing in extracting information from the web. Dr. Cohen is
+currently an action editor for the Journal of Machine Learning
+Research, has served as an editor for the journal Machine
+Learning and the Journal of
+Artificial Intelligence Research, co-organized the 1994
+International Machine Learning Conference, and has served on more than
+20 program committees or advisory committees. In addition to
+his position at CMU, Dr. Cohen also serves on the advisory board of
+Intelliseek.
+
+
+Dr. Cohen's research
+interests include information integration and machine learning,
+particularly text categorization and learning from large datasets. He
+holds six patents related to learning, discovery, information
+retrieval, and data integration, and is the author of more than 60
+refereed publications.
+
+
+
+
My
+latest baby is Minorthird,
+an open-source Java package of information extraction
+and text classification learning tools.
+
+
+SecondString is
+another open-source Java package, of approximate string matching
+techniques.
+
+
SLIPPER and WHIRL are
+now being distributed via Rutgers University. They are free for research
+purposes.
+
+
Send me email to find out how to get a copy of RIPPER.
+
+As an alternative to that ancient code: I haven't used it myself, but
+I've heard good things about
+
+J-RIP, a Ripper clone written for WEKA.
+
+
+The following datasets are available for anyone to use for research
+purposes:
+
+
classify.tar.gz (0.4Mb) contains
+nine problems in which the goal is to classify short entity names.
+This data was used in Joins that Generalize: Text Classification
+Using WHIRL (KDD-98).
+
+
match.tar.gz (0.7Mb) contains a suite of
+labeled entity-name matching and clustering problems
+(i.e. problems for which the correct matches/clusters are provided),
+in a single consistent format. In most cases with WHIRL's
+performance is given as a benchmark.
+
+
ranking.tar.gz (8Mb) contains the
+data used for the meta-search experiments in my JAIR paper Learning to Order
+Things (with Rob Schapire and Yoram Singer).
+
+
Information extraction (PowerPoint;
+ 4.8Mb), aimed at folks somewhat familiar with statistical NLP
+ methods. Two earlier versions of this are also available, both
+ given with Andrew McCallum at recent conferences, KDD-2003 (PowerPoint; 6.8Mb) and NIPS-2002.
+
+
Text classification
+ (PowerPoint; 3Mb), given at a recent CALD Summer Course.
+
+
A presentation of my NIPS-2002 results
+on using bootstrapping techniques to improve web page classification,
+given at CMU in October 2002. (PowerPoint; 3.2mb).
+
Change to that same directory and
+ then run Minorthird with the command
+ java -Xmx500M -jar minorthird.jar
+
+
+ What will pop up will be a small launch pad that can be used to
+ start any of the UI programs. You can also start a particular
+ main by specifying minorthird.jar as your classpath, for
+ instance:
+
+ java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help
+
All publications. Here is a more-or-less
+complete chronological list of my publications. The bibliography
+includes pointers to on-line versions when I can provide them, but
+unfortunately copyright restrictions don't allow me to make all of my
+publications available on-line. Of course, reprints are always
+available from me on request.
+
+
+
+Recent papers I'm keeping in HTML or PDF (which requires Adobe
+Acrobat Reader to view). Older papers are mostly in Postscript.
+For Windows, I use the GSView reader for
+postscript. Most of these papers are viewable in several formats in
+ResearchIndex.
+
+
For those many friends whose research I have built on, be warned.
+My full name, "William Weston Cohen", is an anagram of the phrase "I now
+cite shallow men".
+
+
I am often praised for my highly artistic and functional web site designs.
+An example is the site for SC Indexing,
+a professional book indexer. However, I accept few clients - this
+one happens to be my wife.
+
+
Through my advisor, Alex Borgida, I can trace my "academic lineage" back to luminaries like
+Leibniz and Alfred Whitehead.
+
+
+
+
+
+
diff --git a/captions.tgz b/captions.tgz
new file mode 100755
index 0000000..5141598
Binary files /dev/null and b/captions.tgz differ
diff --git a/cikm-2012.ppt b/cikm-2012.ppt
new file mode 100644
index 0000000..54e0da5
Binary files /dev/null and b/cikm-2012.ppt differ
diff --git a/classify.tar.gz b/classify.tar.gz
new file mode 100644
index 0000000..d79b56d
Binary files /dev/null and b/classify.tar.gz differ
diff --git a/cloud/Notes.html b/cloud/Notes.html
new file mode 100644
index 0000000..6f023fe
--- /dev/null
+++ b/cloud/Notes.html
@@ -0,0 +1,132 @@
+Tag Cloud
+
+
+
What I did
+
+
+ - I used Yahoo Site explorer and wget to download 1000 pages from
+ dailykos and redstate.com. I believe these are the top 1000 pages on
+ each site by #inlinks. I filtered these to get blog entries, including
+ comments.
+
+ - I extracted the words from dkos & redstate blog entries, and the
+ corresponding comments, using a perl script (that uses an extendable
+ perl HTML parser, and site-specific "class" tags on the comment and
+ entry DOM nodes). The redstate comments are a little messier, since
+ I could not easily strip out signatures.
+
+ - I tokenized, stoplisted, counted a bunch of word frequencies, and
+ saved all the words that appear >= 5 times in dkos entries, redstate
+ entries, dkos comments, redstate comments, etc.
+
+ - I estimated a bunch of relative-frequency/MI-style statistics (a
+ scoring sketch in Java follows this item).
+ What seemed most reasonable was to look for "non-general English"
+ words that are "more common in context X than context Y", which
+ I express with this score
+
+ log[ P(w|X) / (P(w|Y)*P(w|GE)) ]
+
+ Stats for "general English" were from the brown corpus. I smoothed
+ with a Dirichlet, and probably more importantly, by replacing zero
+ counts for P(w) with counts of 4 (since I only stored counts>=5).
+
+ Then for each X,Y I looked at, I took the top 200 scoring words,
+ broke them into 10 equal-frequency bins, and built a "tagcloud"
+ visualization of them. The top 200 ignored a handful of stuff that I
+ decided was noise: signature line tokens, like ----; words like
+ "pstin", which seem to be poorly-tokenized dkos words; date, time and
+ number words; and words like kos, dailykos, entry, diary, fold, and
+ email.
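+
+ A sketch of the scoring in Java (illustrative only: the floor of 4
+ for missing counts is as described above, but the smoothing constant
+ and the rest of the harness is a made-up stand-in for the Dirichlet
+ smoothing actually used):
+
+  import java.util.Map;
+
+  class CloudScore {
+      static final double ALPHA = 1.0; // additive smoothing weight (assumption)
+      static final double FLOOR = 4.0; // stand-in count for unseen words
+
+      // Smoothed estimate of P(w) from a count table, its total size,
+      // and the vocabulary size.
+      static double p(Map<String, Integer> counts, long total, long vocab, String w) {
+          double c = counts.containsKey(w) ? counts.get(w) : FLOOR;
+          return (c + ALPHA) / (total + ALPHA * vocab);
+      }
+
+      // score(w) = log[ P(w|X) / (P(w|Y) * P(w|GE)) ]
+      static double score(String w, long vocab,
+              Map<String, Integer> x, long xTot,
+              Map<String, Integer> y, long yTot,
+              Map<String, Integer> ge, long geTot) {
+          return Math.log(p(x, xTot, vocab, w)
+                  / (p(y, yTot, vocab, w) * p(ge, geTot, vocab, w)));
+      }
+  }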
+
+===================================================================
+X Y file name
+===================================================================
+dkos entries redstate blog entries blue-red-entry.html
+dkos comments redstate blog comments blue-red-comment.html
+dkos anything redstate blog anything blue-red-all.html
+redstate entries dkos entries red-blue-entry.html
+redstate comments dkos comments red-blue-comment.html
+redstate anything dkos anything red-blue-all.html
+redstate comment redstate entry redComment-redEntry.html
+dkos comment dkos entry blueComment-blueEntry.html
+===================================================================
+
+ For a few other contexts I scored words as
+
+ log[ P(w|X)*P(w|Y) / P(w|GE) ]
+
+ ie "non-general English" words that are "common in both context X and
+ context Y"
+
+===================================================================
+X,Y file name
+===================================================================
+dkos,redstate comments blue+red-comment.html
+dkos,redstate entries blue+red-entry.html
+dkos,redstate anything blue+red-all.html
+===================================================================
+
+ - I also wrote code to pick up subject-matter 'tags' from dailykos
+ (like the delicious tagging scheme), which turned out to be pretty
+ noisy (eg, "republican" and "repulican party" are both tags, as are
+ "iraq" and "iraq war".) I set up some additional contexts X = "dkos
+ comments for entries tagged with something that contains the word T"
+ and compared them to Y="all dkos comments"
+
+===================================================================
+T file name
+===================================================================
+elections blueElections-blue-comment.html
+iraq blueIraq-blue-comment.html
+media blueMedia-blue-comment.html
+===================================================================
+
+ - Sizes of all of this, in words:
+
+==============================
+brown 480098
+
+dkos-all 3351061
+dkos-comment 3311702
+dkos-entry 39359
+
+redstate-all 1152883
+redstate-comment 940241
+redstate-entry 212642
+
+dkos-iraq-comment 341238
+dkos-elections-comment 256129
+dkos-media-comment 160413
+==============================
+
+
+
+
Observations
+
+
+
Redstate has way less comment text total than dkos, but way more
+entry text. There are also more entries in redstate than dkos (788 vs
+351), so the amount of entry text may simply be larger because dkos has a
+bunch of high-inlink pages that are not comment-containing.
+
+
There are apparently pretty big differences in vocabulary in
+comments pertaining to different entries (e.g., media vs Iraq). There doesn't seem to be
+a major impact of the actual vocabulary used in the entries though
+(eg, I don't see the term "iraq" in the Iraq-related comments).
+
+
A lot of the "vocabulary" from the comments may be user names.
+
+
There seems to be a lot of argumentation in the comment sections
+(agree, aren't, doesn't, don't, etc.).
+
+
+
+
+
+
\ No newline at end of file
diff --git a/cloud/blue+red-all.html b/cloud/blue+red-all.html
new file mode 100644
index 0000000..df4f5bf
--- /dev/null
+++ b/cloud/blue+red-all.html
@@ -0,0 +1,226 @@
+Tag Cloud
+
+
+
Date: Thu Mar 13 15:40:47 EDT 2008
+ Command: bin/clouder.pl redComment-redEntry-words.txt 200 red
+
diff --git a/cohenBio.doc b/cohenBio.doc
new file mode 100644
index 0000000..be267ca
Binary files /dev/null and b/cohenBio.doc differ
diff --git a/collab-filtering-tutorial.ppt b/collab-filtering-tutorial.ppt
new file mode 100755
index 0000000..c1ce8cb
Binary files /dev/null and b/collab-filtering-tutorial.ppt differ
diff --git a/cover.png b/cover.png
new file mode 100755
index 0000000..5746989
Binary files /dev/null and b/cover.png differ
diff --git a/cutonce.pdf b/cutonce.pdf
new file mode 100755
index 0000000..f4481be
Binary files /dev/null and b/cutonce.pdf differ
diff --git a/cuts.txt b/cuts.txt
new file mode 100644
index 0000000..3c657ea
--- /dev/null
+++ b/cuts.txt
@@ -0,0 +1,22 @@
+
+
+-->
+
diff --git a/cv.aux b/cv.aux
new file mode 100755
index 0000000..54cd7ee
--- /dev/null
+++ b/cv.aux
@@ -0,0 +1 @@
+\relax
diff --git a/cv.dvi b/cv.dvi
new file mode 100755
index 0000000..4e210ca
Binary files /dev/null and b/cv.dvi differ
diff --git a/cv.log b/cv.log
new file mode 100755
index 0000000..37c8329
--- /dev/null
+++ b/cv.log
@@ -0,0 +1,151 @@
+This is TeX, Version 3.141592 (MiKTeX 2.3) (preloaded format=latex 2000.11.28) 19 SEP 2003 09:36
+**cv
+(cv.tex
+LaTeX2e <2001/06/01>
+Babel and hyphenation patterns for english, french, german, ngerman, du
+mylang, nohyphenation, loaded.
+(C:\texmf\tex\latex\base\latex209.def
+File: latex209.def 1998/05/13 v0.52 Standard LaTeX file
+
+
+ Entering LaTeX 2.09 COMPATIBILITY MODE
+ *************************************************************
+ !!WARNING!! !!WARNING!! !!WARNING!! !!WARNING!!
+
+ This mode attempts to provide an emulation of the LaTeX 2.09
+ author environment so that OLD documents can be successfully
+ processed. It should NOT be used for NEW documents!
+
+ New documents should use Standard LaTeX conventions and start
+ with the \documentclass command.
+
+ Compatibility mode is UNLIKELY TO WORK with LaTeX 2.09 style
+ files that change any internal macros, especially not with
+ those that change the FONT SELECTION or OUTPUT ROUTINES.
+
+ Therefore such style files MUST BE UPDATED to use
+ Current Standard LaTeX: LaTeX2e.
+ If you suspect that you may be using such a style file, which
+ is probably very, very old by now, then you should attempt to
+ get it updated by sending a copy of this error message to the
+ author of that file.
+ *************************************************************
+
+\footheight=\dimen102
+\@maxsep=\dimen103
+\@dblmaxsep=\dimen104
+\@cla=\count79
+\@clb=\count80
+\mscount=\count81
+(C:\texmf\tex\latex\base\tracefnt.sty
+Package: tracefnt 1997/05/29 v3.0j Standard LaTeX package (font tracing)
+\tracingfonts=\count82
+LaTeX Info: Redefining \selectfont on input line 96.
+)
+\symbold=\mathgroup4
+\symsans=\mathgroup5
+\symtypewriter=\mathgroup6
+\symitalic=\mathgroup7
+\symsmallcaps=\mathgroup8
+\symslanted=\mathgroup9
+LaTeX Font Info: Redeclaring math alphabet \mathbf on input line 288.
+LaTeX Font Info: Redeclaring math alphabet \mathsf on input line 289.
+LaTeX Font Info: Redeclaring math alphabet \mathtt on input line 290.
+LaTeX Font Info: Redeclaring math alphabet \mathit on input line 296.
+LaTeX Info: Redefining \em on input line 306.
+ (C:\texmf\tex\latex\base\latexsym.sty
+Package: latexsym 1998/08/17 v2.2e Standard LaTeX package (lasy symbols)
+\symlasy=\mathgroup10
+LaTeX Font Info: Overwriting symbol font `lasy' in version `bold'
+(Font) U/lasy/m/n --> U/lasy/b/n on input line 42.
+)
+LaTeX Font Info: Redeclaring math delimiter \lgroup on input line 370.
+LaTeX Font Info: Redeclaring math delimiter \rgroup on input line 372.
+LaTeX Font Info: Redeclaring math delimiter \bracevert on input line 374.
+)
+(C:\texmf\tex\latex\base\article.cls
+Document Class: article 2001/04/21 v1.4e Standard LaTeX document class
+(C:\texmf\tex\latex\base\size11.clo
+File: size11.clo 2001/04/21 v1.4e Standard LaTeX file (size option)
+)
+\c@part=\count83
+\c@section=\count84
+\c@subsection=\count85
+\c@subsubsection=\count86
+\c@paragraph=\count87
+\c@subparagraph=\count88
+\c@figure=\count89
+\c@table=\count90
+\abovecaptionskip=\skip41
+\belowcaptionskip=\skip42
+Compatibility mode: definition of \rm ignored.
+Compatibility mode: definition of \sf ignored.
+Compatibility mode: definition of \tt ignored.
+Compatibility mode: definition of \bf ignored.
+Compatibility mode: definition of \it ignored.
+Compatibility mode: definition of \sl ignored.
+Compatibility mode: definition of \sc ignored.
+LaTeX Info: Redefining \cal on input line 501.
+LaTeX Info: Redefining \mit on input line 502.
+\bibindent=\dimen105
+) (cv.aux)
+LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 27.
+LaTeX Font Info: ... okay on input line 27.
+LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 27.
+LaTeX Font Info: ... okay on input line 27.
+LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 27.
+LaTeX Font Info: ... okay on input line 27.
+LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 27.
+LaTeX Font Info: ... okay on input line 27.
+LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 27.
+LaTeX Font Info: ... okay on input line 27.
+LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 27.
+LaTeX Font Info: ... okay on input line 27.
+LaTeX Font Info: Try loading font information for OMS+cmr on input line 41.
+
+(C:\texmf\tex\latex\base\omscmr.fd
+File: omscmr.fd 1999/05/25 v2.5h Standard LaTeX font definitions
+)
+LaTeX Font Info: Font shape `OMS/cmr/m/n' in size <10.95> not available
+(Font) Font shape `OMS/cmsy/m/n' tried instead on input line 41.
+
+Overfull \hbox (3.64485pt too wide) in paragraph at lines 54--58
+[]\OT1/cmr/m/n/10.95 July 1999---April 2000. \OT1/cmr/m/it/10.95 Tech-nol-ogy C
+on-sul-tant, AT&T Labs-Research, Shan-non Lab-o-ra-to-ries, Florham
+ []
+
+
+Overfull \hbox (5.37444pt too wide) in paragraph at lines 73--77
+[]\OT1/cmr/m/n/10.95 Summer 1987. \OT1/cmr/m/it/10.95 Re-search As-sis-tant, Ru
+t-gers Uni-ver-sity Com-puter Sci-ence De-part-ment, New Brunswick,
+ []
+
+[1
+
+] [2] [3]
+Overfull \hbox (1.73416pt too wide) in paragraph at lines 291--295
+[]\OT1/cmr/m/n/10.95 W. Co-hen, ``Hard-ness Re-sults for Learn-ing First-Order
+Rep-re-sen-ta-tions and Pro-gram-ming by Demon-
+ []
+
+[4] [5] [6] [7] [8]
+LaTeX Font Info: External font `cmex10' loaded for size
+(Font) <10.95> on input line 667.
+LaTeX Font Info: External font `cmex10' loaded for size
+(Font) <8> on input line 667.
+LaTeX Font Info: External font `cmex10' loaded for size
+(Font) <6> on input line 667.
+LaTeX Font Info: Try loading font information for U+lasy on input line 667.
+ (C:\texmf\tex\latex\base\ulasy.fd
+File: ulasy.fd 1998/08/17 v2.2e LaTeX symbol font definitions
+) [9] (cv.aux) )
+Here is how much of TeX's memory you used:
+ 404 strings out of 96052
+ 3954 string characters out of 1197190
+ 50308 words of memory out of 1050926
+ 3356 multiletter control sequences out of 35000
+ 12042 words of font info for 44 fonts, out of 500000 for 1000
+ 14 hyphenation exceptions out of 607
+ 23i,4n,19p,191b,239s stack positions out of 1500i,500n,5000p,200000b,32768s
+
+Output written on cv.dvi (9 pages, 36492 bytes).
diff --git a/cv.pdf b/cv.pdf
new file mode 100644
index 0000000..c9df6fc
Binary files /dev/null and b/cv.pdf differ
diff --git a/cv.tex b/cv.tex
new file mode 100755
index 0000000..bfd8aae
--- /dev/null
+++ b/cv.tex
@@ -0,0 +1,741 @@
+%to do:
+% check ILP book info
+% web site
+
+\documentstyle[11pt]{article}
+
+\setlength{\textheight}{9.25in}
+\setlength{\textwidth}{7in}
+\setlength{\oddsidemargin}{-0.5in}
+\setlength{\evensidemargin}{-0.5in}
+\setlength{\topmargin}{-0.5in}
+\renewcommand{\topfraction}{0.99}
+\renewcommand{\bottomfraction}{0.99}
+\renewcommand{\textfraction}{0.01}
+
+\newcommand{\prt}[1]{\vspace{\baselineskip}
+
+{\noindent \bf #1:}}
+\newcommand{\prti}[2]{\vspace{\baselineskip}
+
+\noindent {\bf #1.} #2}
+\newcommand{\bi}{\begin{itemize}}
+\newcommand{\ei}{\end{itemize}}
+\newcommand{\bd}{\begin{description}}
+\newcommand{\ed}{\end{description}}
+
+\begin{document}
+
+\begin{center}
+{\bf William W. Cohen}\\
+%6941 Rosewood St, Pittsburgh, PA 15208\\
+Center for Automated Learning and Discovery\\
+Carnegie Mellon University\\
+5000 Forbes Ave, Pittsburgh, PA 15213 \\
+Email: {\tt william@wcohen.com}\\
+Web: {\tt http://www.wcohen.com}\\
+\end{center}
+
+\prt{Professional Experience}
+\bi
+\item May 2003--present. {\it Associate Research Professor, Center for
+Automated Learning and Discovery, Carnegie-Mellon University,
+Pittsburgh, PA}.
+\item July 2002--May 2003. {\it Visiting Associate Professor, Center for
+Automated Learning and Discovery, Carnegie-Mellon University,
+Pittsburgh, PA}. Conducted joint research with Steve Fienberg
+(Statistics), John Lafferty (Computer Science), Tom Mitchell (CALD),
+and Robert Murphy (Biology).
+\item April 2000--May, 2002. {\it Distinguished Research Scientist,
+Whizbang Labs, Pittsburgh, PA}. Conducted and supervised research and
+development for an information extraction company.
+\item November 2000--present. {\it Adjunct Faculty, Center for Automated
+Learning and Discovery, Carnegie Mellon School of Computer Science.}
+\item July 1999---April 2000. {\it Technology Consultant, AT\&T
+Labs-Research, Shannon Laboratories, Florham Park NJ}. Senior research
+scientist in the Machine Learning and Information Retrieval Research
+Department.
+\item February 1996---July 1999. {\it Principal Research Staff Member,
+AT\&T Labs-Research, Shannon Laboratories, Florham Park NJ}. Research
+scientist in the Machine Learning and Information Retrieval Research
+Department.
+\item September 1990---February 1996. {\it Member Technical Staff, AT\&T Bell
+Laboratories, Murray Hill NJ}. Research scientist in the Artificial
+Intelligence Research Department and (later) in the
+Information Extraction Research Department.
+\item September 1994---October 1994. {\it Visiting Senior Research Associate,
+Basser Department of Computer Science, University of Sydney, Sydney,
+Australia.} Conducted research in the research group
+headed by Dr.~J.~R.~Quinlan.
+\item Summer 1988. {\it Summer Intern, Bell Communications
+Research, Piscataway NJ}. Integration of natural-language front end
+to an advanced planning system.
+\item Summer 1987. {\it Research Assistant, Rutgers University
+Computer Science Department, New Brunswick, NJ}. Extension of the
+learning subsystem of a knowledge-based expert system for VLSI circuit
+design.
+\item November 1985---August 1986. {\it Member Technical Staff,
+Computer Sciences Corporation, under contract to the Space Telescope
+Science Institute, Baltimore MD.} Design and implementation of a
+natural language query system as part of a decision support system for
+selecting observations to be run on the Hubble Space Telescope.
+\item June 1984---September 1985. {\it Computer Aided Design R\&D Specialist,
+General Electric Microelectronics Center, Research Triangle Park, NC.}
+Project leader for an expert system project on optimization of
+combinational logic.
+\ei
+
+\newpage
+
+\prt{Education}
+
+\prti{Ph.D. in Computer Science}{Rutgers University, New Brunswick, New
+Jersey. August 1990. 4.0/4.0 average. Doctoral thesis, {\it
+Explanation Based Generalization as an Abstraction Mechanism for
+Concept Learning}, under the direction of Dr. Alex Borgida.}
+AT\&T Bell Laboratories Fellowship, September 1989---August 1990;
+Marion Johnson Fellowship, September 1986---June 1989.
+
+\prti{M.S. in Computer Science}{Rutgers University, New Brunswick, New
+Jersey. May 1988. 4.0/4.0 average.}
+
+\prti{B.S. in Computer Science}{Duke University, Durham, North Carolina.
+May 1984. Second major in mathematics. 3.9/4.0 average. Senior
+Honors thesis under the direction of Dr. Allen Biermann.}
+Graduation Summa Cum Laude, May 1984;
+%Phi Beta Kappa, Spring 1983.
+%Liggett Group Scholarship, 1983---1984;
+%Class Honors and Dean's List, 1981---1984.
+
+\prt{Professional Service}
+\bi
+\item November 2001---present. Member of the Governing Board of the
+International Machine Learning Society. (The governing board is a
+group of fifteen whose principal mandate is to administer the
+International Machine Learning Conference. The members were chosen by
+an open election among all attendees of the last three conferences.)
+\item May 2002---present. Member of a National Research Council
+Subcommittee for the review of NASA's Computing, Information, and
+Communications Technology (CICT) Program.
+\item September 2001---present.
+Action Editor for {\it The Journal of Machine Learning Research\/}.
+\item May 2000---September 2001. Member of the editorial
+board for the {\it Journal of Machine Learning Research.}
+\item January 1997---September 2001.
+Action Editor for the journal {\it Machine Learning\/}.
+\item January 1995---December 1997. Associate Editor for the
+{\it Journal of Artificial Intelligence Research}.
+\item March 1998. With Jaime Carbonell and Yiming Yang (of CMU),
+editor of special issue of the journal {\it Machine Learning\/} on the
+topic ``{machine learning and information retrieval\/}''.
+\item January 1998---present. Member of the advisory board for
+the {\it Journal of Artificial Intelligence Research}.
+\item September 1997---present. Member of the editorial
+board for {\it Applied Intelligence.}
+\item Co-chair of the 1994 International Machine Learning Conference.
+\item Member of advisory committees for ML-98, ML-97, ML-96, ML-95.
+\item Area chair for ML-2000, SIGIR-2001, and SIGIR-2002.
+\item Program committee member for KDD-2003, SIGMOD-2003, WWW-2003,
+NIPS-2002, ICML-2002, ICML-2001, SIGIR-2001, WWW-2000,
+ILP-2000, SIGIR-99, WWW-99, ILP-99,
+COLT-98, ML-97, ILP-97, AAAI-96, ALT-96, ILP-95, ILP-94, AAAI-93, and
+ML-93, and various specialized workshops.
+\item Member of organizing committee for AAAI-92 workshop on
+``Constraining Learning Using Prior Knowledge''.
+\item Reviewer for numerous journals including {\it Artificial
+Intelligence}, {\it The Journal of Logic Programming},
+{\it Information Processing Letters},
+and {\it Theoretical Computer Science}.
+%\item Past president of the Rutgers Computer Science Graduate Student Society.
+\ei
+
+\newpage
+
+\prt{Teaching and Supervisory Experience}
+\bi
+\item Spring 2002. {\it Center for Automated Learning and Discovery,
+Carnegie-Mellon University, Pittsburgh PA}. With T.~Mitchell,
+taught a graduate seminar in knowledge discovery from data.
+\item Fall 2000. {\it Computer Science Department,
+Carnegie-Mellon University, Pittsburgh PA}. With
+J.~Baxter, A.~McCallum, T.~Mitchell, F.~Pereira,
+taught a graduate seminar in text mining and
+information extraction.
+\item September 1998---April 2000. Member of the AT\&T Labs
+Fellowship Selection Committee.
+\item Fall 1997. {\it Computer Science Department,
+Rutgers University, New Brunswick, NJ}. With Y.~Freund and
+R.~Schapire, taught a graduate seminar in machine learning.
+\item Spring 1996. {\it Computer Science
+Department, Columbia University, New York, NY}.
+Taught a course in machine learning.
+\item Committee member for four Ph.D. theses: Chumki Basu (Rutgers),
+Daniel Kudenko (Rutgers), John Zelle (University of Texas at Austin),
+and Peter Whigham (Australian National University).
+\ei
+
+\prt{Invited Talks and Seminars}
+\bi
+\item October 2002, invited talk, ``Exploiting Document Structure in
+Information Extraction and Document Classification'', McKay
+Distinguished Lecture, University of California, Berkeley.
+\item October 2001, invited talk, ``Issues in Extracting Information
+from the Web'', at the 7th International Workshop on Parsing
+Technologies, sponsored by ACL/SIGPARSE, Beijing, China.
+\item December 2000, invited talk, ``Learning Using the Web as Background Knowledge'',
+at the Eleventh International Conference on Algorithmic Learning
+Theory, Sydney, Australia.
+\item June 1999, keynote address, ``What can we learn from the Web?''
+at the 16th International Conference on Machine Learning, Bled,
+Slovenia.
+\item March 1996, invited talk, ``What the Well-Informed IR
+Researcher Should Know About Machine Learning'', at the 1996
+AAAI Spring Symposium on Machine Learning and Information Access,
+Palo Alto, CA.
+\item September 1995, invited talk,
+``Learning to Classify English Text with ILP
+Methods'', at the Fifth International Workshop on Inductive Logic
+Programming, Leuven, Belgium.
+\ei
+
+\prt{Patents}
+\bi
+\item {\em Method and apparatus for extracting data from data sources on
+a network}. Patent \#6,516,308.
+\item {\em A system and method for accessing heterogeneous databases}.
+Patent \#6,295,533.
+\item {\em A system and method for finding information in a
+distributed information system using query learning
+and meta search}. Patent \# 5,347,623.
+\item {\em Rule induction on large noisy data sets}. Patent \# 5,719,692.
+\item {\em Software discovery system}. Patent \# 5,642,472.
+\item {\em Biased learning system}. Patents \# 5,481,650 and \# 5,627,945.
+\ei
+
+\newpage
+
+% total pubs count = 95
+
+\prt{Journal Publications}
+\bd
+\item[submitted] S.~Zelikovitz, H.~Hirsh, W.~Cohen,
+ ``Using and Extending WHIRL Queries for Text Classification'',
+ submitted for journal publication.
+
+\item[submitted] W.~Cohen, ``Learning and Discovering Structure in Web Pages'',
+ submitted for journal publication.
+
+\item[in press] M.~Bilenko, R.~Mooney, W.~Cohen, P.~Ravikumar, S.~Fienberg
+ ``Adaptive Name-Matching in Information Integration'',
+ {\it IEEE Intelligent Systems}, in press.
+
+\item[2001] C.~Basu, H.~Hirsh, W.~Cohen, C.~Nevill-Manning,
+ ``Technical Paper Recommendation: A Study in Combining
+ Multiple Information Sources'', in {\it
+ The Journal of Artificial Intelligence Research} 14,
+ pp 231-252.
+
+\item[2000] W.~Cohen,
+ ``Data Integration using Similarity Joins and
+ a Word-based Information Representation Language'',
+ in {\it ACM Transactions on Information Systems},
+ 18(3), July 2000, pp 288-321.
+
+\item[2000] W.~Cohen and W.~Fan, ``Web-collaborative filtering: recommending
+music by crawling the Web'', in {\it Computer Networks}, 33, pp 685--698.
+ A version of this paper also appeared in the proceedings of
+ the {\em Ninth International World Wide Web Conference
+ (WWW-2000)}.
+
+\item[2000] W.~Cohen,
+ ``WHIRL: A Word-Based Information Representation Language'',
+ {\it Artificial Intelligence}, 118, pp 163--196.
+
+\item[2000] W.~Cohen, P.~Devanbu,
+ ``Automatically Exploring
+ Hypotheses about Fault Prediction: a Comparative Study of Inductive
+ Logic Programming Methods'',
+ {\it International Journal of
+ Software Engineering and Knowledge Engineering},
+ 9(5), pp 519--546.
+
+\item[1999] W.~Cohen, Y.~Singer,
+ ``Context-sensitive learning methods for text categorization'',
+ in {\it ACM Transactions on Information Systems}, 17(2), Apr 1999,
+ pages 141-173.
+
+\item[1999] W.~Cohen, Y.~Singer, R.~Schapire,
+ ``Learning to Order Things'', in
+ {\it Journal of Artificial Intelligence Research},
+ 10, 1999, pp 243-270.
+
+\item[1999] W.~Cohen, W.~Fan,
+ ``Learning page-independent heuristics for extracting data
+ from Web pages'', {\it International Journal of Computer and
+ Telecommunications Networking}, 31, pp 1641--1652. Versions of
+ this paper also appeared in the proceedings of
+ the {\em Eighth International World Wide Web Conference
+ (WWW-99)} and {\em the 1999 AAAI Spring Symposium Workshop on Intelligent
+ Agents in Cyberspace}.
+
+\item[1999] W.~Cohen,
+ ``Reasoning about Textual Similarity in Information Access'',
+ {\it Autonomous Agents and Multi-Agent Systems}, 2, 1999,
+ pp 65--86.
+
+\item[1998] W.~Cohen, ``The WHIRL Approach to Information Integration'',
+ {\it IEEE Intelligent Systems}, Sept/Oct 1998, pp 20--23.
+ (In the {\em Trends and Controversies\/} feature
+ on information integration.)
+
+\item[1998] W.~Cohen, ``Hardness Results for Learning
+ First-Order Representations and Programming by
+ Demonstration'', {\it Machine Learning}, 30(1),
+ pp 57--88.
+
+\item[1996] W.~Cohen, ``Adaptive Mapping and Navigation by
+ Teams of Simple Robots'', {\it Robotics
+ and Autonomous Systems}, 18, pp 411--434.
+
+\item[1996] W.~Cohen, ``Pac-Learning Non-Recursive Prolog
+ Clauses'', {\it Artificial Intelligence},
+ 79(1), pp 1--38.
+
+\item[1995] W.~Cohen and D.~Page, ``Polynomial Learnability
+ and Inductive Logic Programming: Methods and Results'',
+ {\it New Generation Computing}, 13(3).
+
+\item[1995] W.~Cohen, ``Pac-Learning Recursive Logic
+ Programs: Efficient Algorithms'', {\it
+ Journal of Artificial Intelligence Research},
+ Vol 2, pp 500--539.
+
+\item[1995] W.~Cohen, ``Pac-Learning Recursive Logic
+ Programs: Negative Results'', {\it
+ Journal of Artificial Intelligence Research}.
+ Vol 2, pp 541--573.
+
+\item[1995] W.~Cohen, ``Inductive Specification Recovery:
+ Understanding Software by Learning from Example Behaviors'',
+ {\it Automated Software Engineering},
+ 2(2).
+
+\item[1994] W.~Cohen, H.~Hirsh, ``Learnability of Description Logics
+ with Equality Constraints'',
+ {\em Machine Learning\/}, 17(2/3).
+
+\item[1994] W.~Cohen, ``Grammatically Biased Learning:
+ Learning Logic Programs Using an Explicit
+ Antecedent Description Language'',
+ {\em Artificial Intelligence}, Vol 68, pp 303-366.
+
+\item[1994] W.~Cohen, ``Incremental Abductive EBL'',
+ {\em Machine Learning\/}, 15(1).
+
+\item[1994] L.~T.~McCarty, W.~Cohen, ``The Case for Explicit
+ Exceptions'', in {\em Methods of
+ Logic in Computer Science}, 1(1).
+
+\item[1992] W.~Cohen, ``Abductive Explanation-Based Learning:
+ A Solution to the Multiple Inconsistent Explanation
+ Problem in Explanation-Based Learning'',
+ {\em Machine Learning}, 8(2).
+
+\item[1992] W.~Cohen, ``Using Distribution-Free Learning Theory
+ to Analyze Solution Path Caching Mechanisms'',
+ {\em Computational Intelligence}, 8(2).
+
+\item[1986] K.~Bartlett, W.~Cohen, A.~De Geus, G.~Hachtel,
+ ``Synthesis and Optimization of Multi-level Logic
+ Under Timing Constraints'', {\em IEEE Transactions on
+ Computer-Aided Design}, October 1985.
+
+\item[1985] A.~De Geus, W.~Cohen, ``Optimization of
+ Combinational Logic Using a Rule-Based Expert
+ System'', {\em IEEE Design and Test of Computers},
+ August 1985.
+\ed
+
+\prt{Book Chapters}
+\bd
+\item[2003] W.~Cohen, M.~Hurst, L.~S.~Jensen,
+ ``A Flexible Learning System for Wrapping Tables and Lists in HTML Documents'',
+ in ``Web Document Analysis: Challenges and Opportunities'', ed. Antonacopoulos \& Hu,
+ World Scientific Publishing.
+
+\item[1996] W.~Cohen, ``Learning to Classify English Text with
+ ILP Methods'', in ``Advances in ILP'',
+ ed.~L.~De Raedt, IOS Press.
+
+\item[1994] W.~Cohen, R.~Greiner, D.~Schuurmans,
+ ``Probabilistic Hill-Climbing'',
+ in ``Computational learning theory and
+ natural learning systems (Volume II)'', MIT Press.
+
+\item[1994] H.~Hirsh, W.~Cohen, ``Learning from Data with Bounded
+ Inconsistency: Theoretical and Experimental Results'',
+ in ``Computational learning theory and natural learning
+ systems (Volume I)'', MIT Press.
+
+%W.~Cohen, R.~Greiner and D.~Schuurmans,
+%``Probabilistic Hill-Climbing'',
+%in {\em Computational Learning Theory and Natural Learning Systems,
+%Volume II: Intersection between Theory and Experiment},
+%ed.~S.~Hanson, T.~Petsche, M.~Kearns and R.~Rivest,
+%Chapter 11, p.~171--81, MIT Press, 1994.
+
+\ed
+
+\prt{Rigorously Refereed Conference Publications}
+\bd
+\item[2003] C.~Zhai, W.~Cohen, J.~Lafferty,
+ ``Beyond Independent Topical Relevance: Methods and Evaluation Metrics for Subtopic Retrieval'',
+ in {\it Proceedings of the 26th Annual International ACM SIGIR Conference (SIGIR-2003)}.
+
+\item[2003] W.~Cohen, R.~Wang, R.~F.~Murphy,
+ ``Understanding Captions in Biomedical Publications''
+ in {\em Proceedings of the Ninth International Conference on Knowledge Discovery and
+ Data Mining (KDD-2003)}.
+
+\item[2003] W.~Cohen,
+ ``Infrastructure Components for Large-Scale Information Extraction
+ Systems'', in {\em Proceedings of the
+ Fifteenth Innovative Applications of Artificial
+ Intelligence Conference (IAAI-03)}.
+
+\item[2002] W.~Cohen,
+ ``Improving A Page Classifier with Anchor Extraction and Link
+ Analysis'', in {\em Neural Information Processing
+ (NIPS-2002)}.
+
+\item[2002] W.~Cohen, J. Richman,
+ ``Learning to Match and Cluster Large High-Dimensional Data
+ Sets For Data Integration'', in {\em Proceedings of
+ the Eighth International Conference on Knowledge Discovery and
+ Data Mining (KDD-2002)}.
+
+\item[2002] W.~Cohen, M.~Hurst and L.~Jensen,
+ ``A Flexible Learning System for Wrapping Tables and Lists in
+ HTML Documents'', in {\em Proceedings of the Eleventh
+ International World Wide Conference (WWW-2002)}.
+
+\item[2000] W.~Cohen, ``Automatically extracting features for concept learning from the Web'',
+ in {\em Proceedings of the Seventeenth International Machine Learning
+ Conference} (ICML-2000).
+
+\item[2000] W.~Cohen, H.~Kautz, D.~McAllester ``Hardening Soft
+ Information Sources'' in {\em Proceedings of the Sixth
+ International Conference on Knowledge Discovery and Data Mining}
+ (KDD-2000).
+
+\item[1999] W.~Cohen and Y.~Singer, ``A Simple, Fast, and Effective
+ Rule Learner'', in {\em Proceedings,
+ Sixteenth National Conference on
+ Artificial Intelligence} (AAAI-99).
+
+\item[1999] W.~Cohen, ``Recognizing Structure in Web Pages using Similarity Queries'',
+ in {\em Proceedings, Sixteenth National Conference on
+ Artificial Intelligence} (AAAI-99).
+
+\item[1998] W.~Cohen, ``Joins that Generalize: Text Classification
+ Using a Similarity-Based Database'', in {\em Proceedings of the Fourth
+ International Conference on Knowledge Discovery and Data Mining}
+ (KDD-98).
+
+\item[1998] C.~Basu and H.~Hirsh and W.~Cohen, ``Recommendation as
+ Classification: Combining Social and Content-Based Information in
+ Recommendation'', in
+ {\em Proceedings,
+ Fifteenth National Conference on
+ Artificial Intelligence} (AAAI-98).
+
+\item[1998] W.~Cohen, ``Integration of Heterogeneous Databases Without
+ Common Domains Using Queries Based on Textual Similarity'',
+ in {\em Proceedings of ACM SIGMOD International Conference
+ on Management of Data} (SIGMOD-98).
+
+\item[1998] W.~Cohen, ``A Web-based Information System that Reasons
+ with Structured Collections of Text'', in
+ {\em Autonomous Agents 1998.}
+
+\item[1997] W.~Cohen and R.~Schapire and Y.~Singer, ``Learning to
+ Order Things'' in {\em Proceedings of the
+ 1997 Conference on Neural
+ Information Processing Systems} (NIPS-97).
+
+\item[1997] W.~Cohen and D.~Kudenko, ``Transferring and Retraining
+ Learned Information Filters'', in {\em Proceedings,
+ Fourteenth National Conference on
+ Artificial Intelligence} (AAAI-97).
+
+\item[1997] W.~Cohen and P.~Devanbu, ``A Comparative Study of
+ Inductive Logic Programming Methods for Software
+ Fault Prediction'', {\em Machine Learning, Proceedings
+ of the Fourteenth International Conference} (ML-97).
+
+\item[1996] W.~Cohen and Y.~Singer, ``Context-sensitive Learning Methods for
+ Text Categorization'',
+ in {\em Nineteenth Annual International ACM SIGIR
+ Conference on Research and Development in
+ Information Retrieval} (SIGIR-96).
+
+\item[1996] W.~Cohen, ``Learning Trees and Rules with Set-Valued Attributes'',
+ in {\em Proceedings, Thirteenth National Conference on
+ Artificial Intelligence} (AAAI-96).
+
+\item[1996] W.~Cohen, ``The Dual DFA Learning Problem:
+ Hardness Results for Programming by Demonstration
+ and Learning First-Order Representations'',
+ in {\em Proceedings of the Ninth Annual
+ ACM Conference on Computational Learning
+ Theory} (COLT-96).
+
+\item[1995] W.~Cohen, ``Fast Effective Rule Induction'',
+ in {\em Machine Learning, Proceedings
+ of the Twelfth International Conference} (ML-95).
+
+\item[1995] W.~Cohen, ``Text Categorization and Relational Learning'',
+ in {\em Machine Learning, Proceedings
+ of the Twelfth International Conference} (ML-95).
+
+\item[1994] W.~Cohen, ``Pac-learning Nondeterminate Clauses'',
+ in {\em Proceedings, Twelfth National Conference on
+ Artificial Intelligence} (AAAI-94).
+
+\item[1994] W.~Cohen, ``Recovering Software Specifications with
+ Inductive Logic Programming'',
+ in {\em Proceedings, Twelfth National Conference on
+ Artificial Intelligence} (AAAI-94).
+
+\item[1994] W.~Cohen and H.~Hirsh, ``Learning the CLASSIC Description
+ Logic: Theoretical and Experimental Results'',
+ in {\em Principles of Knowledge Representation and
+ Reasoning: Proceedings of the Fourth International
+ Conference\/} (KR-94).
+
+\item[1993] W.~Cohen, ``Cryptographic Limitations
+ on Learning One-Clause Logic Programs'',
+ in {\em Proceedings of the Eleventh National Conference
+ on Artificial Intelligence} (AAAI-93).
+
+ This paper was also one of ten papers
+ nominated for the Best Written Paper Award.
+
+\item[1993] W.~Cohen, ``Pac-Learning a Restricted Class of
+ Recursive Logic Programs'',
+ in {\em Proceedings of the Eleventh National Conference
+ on Artificial Intelligence} (AAAI-93).
+
+\item[1993] W.~Cohen, ``Efficient Pruning Methods for
+ Separate-and-Conquer Rule Learning Systems'',
+ in {\em Proceedings of the 13th International
+ Joint Conference on Artificial Intelligence} (IJCAI-93).
+
+\item[1992] W.~Cohen, A.~Borgida, H.~Hirsh ``Computing Least
+ Common Subsumers in Description Logics'', in {\em
+ Proceedings of the Tenth National Conference on
+ Artificial Intelligence} (AAAI-92).
+
+\item[1992] W.~Cohen, ``Compiling Prior Knowledge Into An Explicit
+ Bias'', in {\em Proceedings of the Eighth Annual
+ Conference on Machine Learning} (ML-92).
+
+\item[1992] W.~Cohen, H.~Hirsh, ``Learnability of Description Logics'',
+ in {\em Proceedings of the Fifth Annual Workshop
+ on Computational Learning Theory} (COLT-92).
+
+\item[1990] W.~Cohen, ``Using Distribution-Free Learning Theory to
+ Analyze Chunking'', in {\em Proceedings of the
+ Canadian Conference on Artificial Intelligence}.
+
+\item[1990] W.~Cohen, ``Learning Approximate Control Rules of High
+ Utility'', in {\em Proceedings of the Seventh
+ International Machine Learning Conference} (ML-90).
+
+\item[1990] W.~Cohen, ``An Analysis of Representation Shift in
+ Concept Learning'', in {\em Proceedings of the Seventh
+ International Machine Learning Conference} (ML-90).
+
+\item[1990] W.~Cohen, ``Learning from Textbook Knowledge: A Case Study'',
+ in {\em Proceedings of the Eighth National Conference
+ on Artificial Intelligence} (AAAI-90).
+
+\item[1988] W.~Cohen, ``Generalization of Number and Learning from
+ Multiple Examples in Explanation Based Learning'',
+ in {\em Proceedings of the Fifth International Machine
+ Learning Conference} (ML-88).
+
+\ed
+
+
+\prt{Other Conference and Workshop Publications}
+\bd
+\item[2003] W.~Cohen, Z.~Kou R.~Murphy,
+ ``Extracting Information from Text and Images for Location Proteomics'',
+ in {\it Proceedings of the 3rd Workshop on Data Mining in Bioinformatics,
+ held in conjunction with 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
+ (KDD'2003)}
+
+\item[2003] W.~Cohen, P.~Ravikumar, S.~Fienberg,
+ ``A Comparison of String Metrics for Matching Names and Records'',
+ in {\it Proceedings of the Workshop on Data Cleaning and Object Consolidation,
+ held in conjunction with 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
+ (KDD'2003)}.
+
+\item[2003] W.~Cohen, P.~Ravikumar, S.~Fienberg,
+ ``A Comparison of String Distance Metrics for Name-Matching
+ Tasks'', in {\it Proceedings of the 2003 Workshop on
+ Information Integration on the Web (IIWeb-03), held in
+ conjunction with the Eighteenth International Joint Conference
+ On Artificial Intelligence (IJCAI-03).}
+
+\item[2001] W.~Cohen and J.~Richman, ``Learning to Match and Cluster Entity
+Names'', in {\em Proceedings of the ACM SIGIR-2001 Workshop on Mathematical/Formal Methods in
+Information Retrieval}.
+
+\item[2001] W.~Cohen and L.~Jensen, ``A Structured Wrapper Induction
+System for Extracting Information from Semi-Structured Documents'', in
+{\em Proceedings of the IJCAI-2001 Workshop on Adaptive Text
+Extraction and Mining}.
+
+\item[2001] L.~Jensen and W.~Cohen, ``Grouping Extracted Fields'',
+in {\em Proceedings of the IJCAI-2001 Workshop on Adaptive Text
+Extraction and Mining}.
+
+\item[2000] W.~Cohen, ``Learning Using the Web as Background Knowledge'',
+              abstract of an invited talk at the {\em Eleventh International
+ Conference on Algorithmic Learning Theory}, Sydney, Australia.
+
+\item[1999] W.~Cohen, ``What Can We Learn from the Web?'', abstract of
+ the keynote address at the {\em Sixteenth International Machine Learning
+ Conference}, Bled, Slovenia.
+
+\item[1999] W.~Cohen, ``Some practical observations on integration of
+              Web information'', in the {\em ACM SIGMOD Workshop on the Web
+ and Databases (WebDB'99)}.
+
+\item[1998] W.~Cohen, ``The WHIRL Approach to Integration: An Overview'',
+ in the {\em AAAI-98 Workshop on AI and Information
+ Integration}.
+
+ This paper was the only submission to be
+              selected for full plenary presentation.
+
+\item[1997] W.~Cohen, ``Knowledge Integration for Structured
+ Information Sources Containing Text (Extended
+ Abstract)'', in the {\em SIGIR-97 Workshop on
+ Networked Information Retrieval}.
+
+\item[1996] W.~Cohen and Y.~Singer, ``Learning to Query the Web'',
+ in {\em Proceedings of the 1996 AAAI Workshop
+ on Internet-based Information Systems}.
+
+\item[1996] W.~Cohen, ``Learning Rules that Classify E-mail'',
+ in {\em Proceedings of the AAAI Spring Symposium on
+ Machine Learning and Information Access}.
+
+\item[1993] W.~Cohen, ``A Review of `Creating a Memory of
+ Causal Relationships' by Michael Pazzani'',
+ {\em Machine Learning}.
+
+\item[1993] P.~Rosenbloom, H.~Hirsh, W.~Cohen, B.~Smith, ``Two
+ frameworks for integrating knowledge in induction'',
+ in {\em Proceedings of the Seventh Annual Workshop on
+              Space Operations, Applications, and Research (SOAR '93)}.
+
+\item[1993] W.~Cohen, ``Rapid Prototyping of ILP Systems Using Explicit
+              Bias'', in {\em Proceedings of the 1993
+ IJCAI Workshop on Inductive Logic Programming}.
+
+\item[1993] W.~Cohen, ``Learnability of Restricted Logic Programs'',
+ in {\em Proceedings of the Third International
+ Workshop on Inductive Logic Programming} (ILP-93).
+
+\item[1992] W.~Cohen, ``Desiderata for Generalization-to-N Algorithms'',
+              in {\em Analogical and Inductive Inference: Proceedings
+              of the International Workshop} (AII-92).
+
+\item[1991] W.~Cohen, ``The Generality of Overgenerality'',
+ in {\em Proceedings of the Eighth International
+ Machine Learning Workshop} (ML-91).
+
+\item[1991] R.~Greiner, W.~Cohen, ``Probabilistic Hill-Climbing'',
+ in {\em Proceedings of the 1991 Workshop on Computational
+ Learning Theory and Natural Learning Systems} (CLNL-91).
+
+\item[1990] W.~Cohen, ``Learning from Examples and an `Abductive
+ Theory'$\,$'', in {\em Proceedings of the 1990
+ AAAI Spring Symposium on Abduction}.
+
+\item[1988] W.~Cohen, J.~Mostow, A.~Borgida, ``Generalizing Number
+              in Explanation Based Learning'', in {\em Proceedings of
+ the 1988 AAAI Spring Symposium on Explanation-Based
+ Learning}.
+
+\item[1987] T.~Hornick, W.~Cohen, G.~Miller, ``A Natural Language
+ Query System for Hubble Space Telescope Proposal
+ Selection'', in {\em Proceedings of the Goddard
+ Conference on Space Applications of Artificial
+ Intelligence and Robotics}.
+
+\item[1987] G.~Miller, D.~Rosenthal, W.~Cohen, M.~Johnston,
+ ``Expert Systems Tools for Hubble Space Telescope
+ Scheduling'', in {\em Proceedings of the Goddard
+ Conference on Space Applications of Artificial
+              Intelligence and Robotics}.
+
+\item[1985] K.~Bartlett, W.~Cohen, A.~De Geus, G.~Hachtel,
+ ``Synthesis and Optimization of Multi-level Logic
+ Under Timing Constraints'', in {\em Proceedings of the
+ IEEE International Conference on Computer-Aided
+ Design}.
+
+\item[1985] W.~Cohen, K.~Bartlett, A.~De Geus, ``Impact of
+ Metarules in a Rule-Based Expert System for Gate Level
+ Optimization'', in {\em Proceedings of the IEEE
+ International Symposium on Circuits and Systems}.
+
+\item[1984] K.~Garrison, D.~Gregory, W.~Cohen, A.~De Geus,
+ ``Automatic Area and Performance Optimization of
+              Combinational Logic'', in {\em Proceedings of the IEEE
+ International Conference on Computer-Aided Design}.
+\ed
+
+\iffalse
+\newpage
+
+\section*{References for William W. Cohen}
+
+\bi
+\item Thomas G. Dietterich (tgd@cs.orst.edu),
+Professor, Computer Science Department,
+Oregon State University, Corvallis, Oregon.
+
+\item Tom Mitchell (tom.mitchell@cs.cmu.edu),
+Fredkin Professor of AI and Learning and
+Director, Center for Automated Learning and Discovery,
+School of Computer Science,
+Carnegie Mellon University,
+Pittsburgh, PA.
+
+\item Ray Mooney (mooney@cs.utexas.edu),
+Professor, Department of Computer Sciences,
+The University of Texas at Austin,
+Austin, Texas.
+
+\item Fernando Pereira (pereira@cis.upenn.edu),
+Andrew and Debra Rachleff Professor and
+Chair, Department of Computer and Information Science,
+University of Pennsylvania,
+Philadelphia, PA.
+
+\item J. Ross Quinlan (quinlan@cse.unsw.edu.au),
+Adjunct Professor, School of Computer Science and Engineering,
+University of New South Wales,
+Sydney, Australia.
+
+\ei
+\fi
+
+\end{document}
diff --git a/day3.ppt b/day3.ppt
new file mode 100755
index 0000000..f5afe6d
Binary files /dev/null and b/day3.ppt differ
diff --git a/dbirday-06.ppt b/dbirday-06.ppt
new file mode 100755
index 0000000..323a3dd
Binary files /dev/null and b/dbirday-06.ppt differ
diff --git a/declarative-learning-workshop-2018.pptx b/declarative-learning-workshop-2018.pptx
new file mode 100644
index 0000000..4eb8060
Binary files /dev/null and b/declarative-learning-workshop-2018.pptx differ
diff --git a/dict/.DS_Store b/dict/.DS_Store
new file mode 100644
index 0000000..f45b784
Binary files /dev/null and b/dict/.DS_Store differ
diff --git a/dict/bin/buildabc b/dict/bin/buildabc
new file mode 100644
index 0000000..54c967f
--- /dev/null
+++ b/dict/bin/buildabc
@@ -0,0 +1,8 @@
+#!/usr/bin/perl
+
+##############################################################################
+#build an alphabetic index for the dictionary game
+##############################################################################
+
+require 'dictutil.pl';
+
diff --git a/dict/bin/buildall b/dict/bin/buildall
new file mode 100755
index 0000000..76ab854
--- /dev/null
+++ b/dict/bin/buildall
@@ -0,0 +1,7 @@
+#!/bin/csh
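+# rebuild the dictionary-game components in sequence: builddict, buildtrie, then buildbonk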
+
+/home/wcohen/dict/bin/builddict
+/home/wcohen/dict/bin/buildtrie
+/home/wcohen/dict/bin/buildbonk
+
diff --git a/dict/bin/buildbonk b/dict/bin/buildbonk
new file mode 100755
index 0000000..371faf7
--- /dev/null
+++ b/dict/bin/buildbonk
@@ -0,0 +1,275 @@
+#!/usr/bin/perl
+
+##############################################################################
+#build the "bonkomatic" game
+#
+#to do---complete typing in a specified target word
+# - read in list of targets
+# - loading them into viewer
+# - cartoon "guide"
+##############################################################################
+
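+#debugging verbosity: 0 = silent, 1 = progress summaries, >2 = per-file detail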
+$Debug=1;
+
+chdir('/Users/wcohen/Desktop/dict/stuff/_bonk/') || die("can't cd");
+
+#read sfx files
+@Sfx = ();
+opendir(SFX,'sfx') || die("can't open sound effects dir");
+while ($f = readdir(SFX)) {
+ print "file: $f\n" if $Debug>2;
+ #skip special files
+ next if $f =~ /^\./;
+ next if $f =~ /^_/;
+ push(@Sfx,$f);
+}
+closedir(SFX);
+print "// read ",$#Sfx," sound files\n" if $Debug;
+print "sound files: ",join(';',@Sfx),"\n" if $Debug>2;
+
+#read dictionary files
+opendir(DIRS,"..");
+@dirs = readdir(DIRS);
+@dirs = sort(@dirs);
+closedir(DIRS);
+print "// dirs: @dirs\n" if $Debug;
+@Words = ();
+foreach $d (@dirs) {
+ next if $d =~ /^\./;
+ next if $d =~ /^\_/;
+ while ($file = <../$d/p*0.html>) {
+ print "// file: $file\n" if $Debug>2;
+    #file names look like p<word>0.html; recover <word> (underscores stand for spaces)
+    ($word) = $file =~ /p(\w*)0\.html/;
+ $word =~ tr/_/ /;
+ $word =~ tr/a-z/A-Z/;
+ $word = "'$word'";
+ push(@Words,$word);
+ $File{$word} = "'$file'";
+ }
+}
+
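+#sort comparator: shortest words first, ties between equal-length words broken pseudo-randomly by rand()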
+sub bylength { length($a) <=> length($b) || rand() <=> 0.5};
+
+@Words = sort bylength @Words;
+@Files = ();
+foreach $w (@Words) {
+ push(@Files,$File{$w});
+}
+
+$JoinedFiles = join(",",@Files);
+$JoinedWords = join(",",@Words);
+
+
+print "// files: $JoinedFiles\n" if $Debug>2;
+print "// words: $JoinedWords\n" if $Debug>2;
+
+##############################################################################
+#print top-level frames stuff
+
+ open(TOP,">index.html") || die("can't write index.html");
+ print TOP <<"END_TOP";
+
+Bonk-o-matic: Top
+