initial copy from cmu

wwcohen · Aug 22, 2018 · 6b2a698 · 6b2a698
1 parent e13ae63
commit 6b2a698
Showing 3,371 changed files with 257,558 additions and 0 deletions.
diff --git a/CrowdComp_MTurkData.tar.gz b/CrowdComp_MTurkData.tar.gz
diff --git a/CutOncev0.0.2a.xpi b/CutOncev0.0.2a.xpi
diff --git a/FastEffectiveClustering-v2.pdf b/FastEffectiveClustering-v2.pdf
diff --git a/FastEffectiveClustering-v2.ppt b/FastEffectiveClustering-v2.ppt
diff --git a/FastEffectiveClustering.pdf b/FastEffectiveClustering.pdf
diff --git a/FastEffectiveClustering.ppt b/FastEffectiveClustering.ppt
diff --git a/GuideToBiology-pictures-color-release1.5.pdf b/GuideToBiology-pictures-color-release1.5.pdf
diff --git a/GuideToBiology-pictures-color-release1.5.ppt b/GuideToBiology-pictures-color-release1.5.ppt
diff --git a/GuideToBiology-sampleChapter-release1.4.pdf b/GuideToBiology-sampleChapter-release1.4.pdf
diff --git a/IIWeb.ppt b/IIWeb.ppt
diff --git a/MSM-2009.ppt b/MSM-2009.ppt
diff --git a/Matching-1.ppt b/Matching-1.ppt
diff --git a/Matching-2.ppt b/Matching-2.ppt
diff --git a/Matching-3.ppt b/Matching-3.ppt
diff --git a/Shortcut to 10-802.lnk b/Shortcut to 10-802.lnk
diff --git a/SimStudent-wc.ppt b/SimStudent-wc.ppt
diff --git a/SlifTextComponent.html b/SlifTextComponent.html
@@ -0,0 +1,200 @@
+<html>
+<head>
+<title>How to use the SLIF Text Components</title>
+</head>
+<body bgcolor="white">
+
+<h2>How to use the SLIF Text Components</h2> 
+
+<h3>Invocation and basic options</h3>
+
+<p>
+The SLIF text components are distributed as a single large JAR file.  To
+run it you will need a copy of Java.  A typical invocation would be:
+
+<p>
+<code>
+% java -cp slifTextComponents.jar -Xmx500M SlifTextComponent -labels <i>DIR</i> -saveAs <i>FILE</i> -use <i>COMPONENT1,COMPONENT2,....</i> [<i>OPTIONS</i>]
+</code>
+
+<p>
+where <code>-Xmx500M</code> sets the maximum size of the Java heap to 500 megabytes, and the remaining arguments are as follows:
+<ul>
+
+<li><i>DIR</i> is a directory containing some number of text files to
+annotate.  
+
+<p>A <a href="http://www.cs.cmu.edu/~wcohen/captions.tgz">sample
+directory of files is available</a> as a compressed tarfile.  These
+captions were all taken from PubMedCentral papers, e.g., the file
+"p9770486-fig_4_2" is from Figure 4 of the paper with the <a
+href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed">PubMed</a>
+Id of 9770486.
+
+<li><i>FILE</i> is where annotations should be placed.  These are output in 'minorthird format',
+which is explained below.
+
+<li><i>COMPONENT1,...</i> are the names of the 'text components' to use, separated by commas.
+</ul>
+
+The components available are:
+  <ul>
+
+  <li><b>CellLine</b>: marks spans that are predicted to be the names
+  of cell lines with 'cellLine', using an entity-tagger trained using
+  the Genia corpus.
+
+  <li><b>CRFonYapex</b>: marks spans that are predicted to be the
+  names of genes or proteins with 'protein', and also
+  'proteinFromCRFonYapex', using a gene-tagger trained on the YAPEX
+  corpus with the CRF algorithm, as described by <a
+  href="http://www.cs.cmu.edu/~wcohen/postscript/ismb-2005.pdf">Kou,
+  Cohen and Murphy (2005)</a>.
+
+  <li><b>CRFonTexas, CRFonGenia</b>: analogous to CRFonYapex, but
+  using gene-taggers trained on different corpora (as outlined in the
+  Kou et al. paper).
+
+  <li><b>SemiCRFOnYapex, SemiCRFOnTexas, SemiCRFOnGenia</b>: analogous
+  to the CRFon* components, but trained with the SemiCRF algorithm.
+
+  <li><b>DictHMMOnYapex, DictHMMOnTexas, DictHMMOnGenia</b>: analogous
+  to the CRFon* components, but trained with the DictHMM algorithm.
+
+  <li><b>Caption</b>: marks spans according to the criteria described
+     by <a
+     href="http://www.cs.cmu.edu/~wcohen/postscript/ismb-2003.pdf">Cohen,
+     Wang and Murphy (2003)</a>: <ul>
+
+       <li>Spans marked as 'imagePointer' are predicted to be image
+       pointers.  For a definition of image pointers, see Cohen,
+       Wang, and Murphy (2003).
+
+       <li>Spans marked as 'bulletStyle' and 'citationStyle' are
+       predicted to be bullet-style and citation-style image pointers,
+       respectively.   
+
+       <li>Spans marked as 'bulletScope' and 'localScope' are predicted
+       to be the <i>scopes</i> of bullet-style and citation-style
+       image pointers, respectively.  
+
+       <li>Spans marked as 'globalScope' are text assumed to pertain
+       to the entire associated image.
+
+       <li>Spans marked as either 'bulletScope', 'localScope', or
+       'globalScope' are marked as 'scope'.  
+
+       <li>Every 'scope' span is associated with a <i>span
+       property</i> called its 'semantics'.  The 'semantics' of a span
+       is the concatenation of all the image pointers associated with
+       that span.
+
+     </ul>
+
+     Additionally, the span labels 'regional' and 'local' are synonyms
+     for bullet-style and citation-style, respectively.
+
+     <p>
+     Briefly, to find out what parts of an image some span
+     <i>S</i> might refer to, you need to (1) find out what 'scope'
+     spans <i>S</i> is inside of and (2) find out what the 'semantics'
+     of these scope spans are.  For instance, if the span 'RAS4' is
+     inside a scope <i>T1</i> with semantics "A" and also inside a
+     scope <i>T2</i> with semantics "BD", then 'RAS4' probably is
+     associated with the parts of the accompanying image labeled "A",
+     "B", and "D".
+
+  </ul>
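+
+<p>
+As a concrete illustration, the following sketch annotates a directory
+of captions with two of the components listed above.  The directory
+name <i>captions</i> and the output file name <i>captions.labels</i>
+are placeholders chosen for this example, not names fixed by the tool:
+
+<p>
+<code>
+% java -cp slifTextComponents.jar -Xmx500M SlifTextComponent -labels captions -saveAs captions.labels -use CellLine,Caption
+</code>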
+
+<h3>The Minorthird format for stand-off annotation</h3>
+
+The format for output is the one used by <a
+href="http://minorthird.sourceforge.net">Minorthird</a>.  Specifically, the
+output (in the default format) is a series of lines in one of these
+formats:
+
+<p>
+<code>
+addToType <i>FILE</i> <i>START</i> <i>LENGTH</i> <i>SPANTYPE</i><br>
+setSpanProp <i>FILE</i> <i>START</i> <i>LENGTH</i> semantics <i>LETTERS</i>
+</code>
+
+<p>
+where 
+<ul>
+
+<li><i>FILE</i> is the name of the file containing some span;
+
+<li><i>START</i> and <i>LENGTH</i> are the initial byte position of the span, and its length;
+
+<li><i>SPANTYPE</i> is the type of span (e.g., 'imagePointer',
+'cellLine', 'protein', or 'scope');
+
+<li><i>LETTERS</i> is (as noted above) the concatenation of all the
+       image pointers associated with that span.
+
+</ul>
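+
+<p>
+For illustration, consider the 'cellLine' span used as an example in
+the options table below: start position 1293 and end position 1303 in
+the file "p11029059-fig_4_1", covering the text "HeLa cells".  Assuming
+the end position is exclusive, which matches the 10 characters of that
+text, the span would presumably appear in the default format as:
+
+<p>
+<code>
+addToType p11029059-fig_4_1 1293 10 cellLine
+</code>
+
+<p>
+The 'RAS4' example from the Caption component could be encoded along
+the following lines.  The offsets and lengths here are invented for
+the example, and the file name is simply borrowed from the sample
+captions mentioned earlier:
+
+<p>
+<code>
+addToType p9770486-fig_4_2 0 120 scope<br>
+setSpanProp p9770486-fig_4_2 0 120 semantics A<br>
+addToType p9770486-fig_4_2 60 200 scope<br>
+setSpanProp p9770486-fig_4_2 60 200 semantics BD<br>
+addToType p9770486-fig_4_2 75 4 protein
+</code>
+
+<p>
+In this sketch the 'protein' span starts at byte 75 and is 4 bytes
+long, so it lies inside both scopes (the first starting at byte 0 with
+length 120, the second starting at byte 60 with length 200).  It would
+therefore be associated with the image parts labeled "A", "B", and "D".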
+
+<h3>Other options</h3>
+
+<table border=1>
+<tr><th>Option</th><th>Explanation</th></tr>
+
+<tr><td><code>-help</code></td>
+    <td>Gives brief command line help</td>
+</tr>
+
+<tr><td><code>-gui</code></td>
+    <td>Pops up a window that allows you to interactively fill in the other arguments, monitor the execution of the annotation process, etc.</td>
+</tr>
+
+<tr><td><code>-showLabels</code></td>
+    <td>Pops up a window that displays the set of documents being labeled.
+    (This is not recommended for a large document collection, due to 
+    memory usage.)
+</td>
+</tr>
+
+<tr><td><code>-showResult</code></td>
+    <td>Pops up a window that displays the result of the annotation.
+    (Again, not recommended for a large document collection.)
+</td>
+</tr>
+
+<tr><td><code>-format strings</code></td> <td>Outputs results as a
+    tab-separated table, instead of minorthird format. The first
+    column summarizes the type of the span, the file the span was
+    taken from, and the start and end byte positions, in a
+    colon-separated format. (E.g.,
+    "cellLine:p11029059-fig_4_1:1293:1303".)  The remaining column(s)
+    are the text that is contained in the span (e.g., "HeLa cells",
+    for the span above) almost exactly as it appears in the document; the 
+    only change is that newlines are replaced with spaces.
+</td>
+</tr>
+
+</table>
+
+<h3>References</h3>
+
+<ul>
+<li>Zhenzhen Kou, William W. Cohen &amp; Robert F. Murphy (2005): <a href="postscript/ismb-2005.pdf">High-Recall Protein Entity Recognition Using a Dictionary</a> in <a href="http://www.informatik.uni-trier.de/~ley/db/conf/ismb/ismb2005.html#KouCM05">ISMB-2005</a>.
+<li>William W. Cohen, Richard Wang &amp; Robert Murphy (2003): <a href="postscript/ismb-2003.pdf">Understanding Captions in Biomedical Publications</a> in <a href="http://www.informatik.uni-trier.de/~ley/db/conf/kdd/kdd2003.html#CohenWM03">KDD 2003: 499-504</a>.
+<li>Robert F. Murphy, Zhenzhen Kou, Juchang Hua, Matthew Joffe, William W. Cohen (2004): <a href="postscript/ksce-2004.pdf">Extracting and Structuring Subcellular Location Information from On-line Journal Articles: The Subcellular Location Image Finder</a> in <a href="recent.html">KSCE-2004</a>.
+<li><a href="http://murphylab.web.cmu.edu/services/SLIF2/">The SLIF home page</a>
+</ul>
+
+<h3>Acknowledgements</h3>
+
+A number of people have contributed to these tools, including William
+Cohen, Zhenzhen Kou, Quinten Mercer, Robert Murphy, Richard Wang, and
+other members of the SLIF team.
+<p>
+The initial development of these tools was supported by grant 017396
+from the Commonwealth of Pennsylvania Tobacco Settlement Fund. Further
+development is supported by National Institutes of Health grant R01
+GM078622.
+
+</BODY>
+</HTML>
+
diff --git a/Thumbs.db b/Thumbs.db
diff --git a/aaai-fs-2012.ppt b/aaai-fs-2012.ppt
diff --git a/aaai-ss-2015.ppt b/aaai-ss-2015.ppt
diff --git a/advice.html b/advice.html
@@ -0,0 +1,181 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html> 
+<head> 
+<link rel="SHORTCUT ICON" href="CarnegieMellon_logo.gif">
+<title>Advice for Technical Speaking</title> 
+</head> 
+
+<body bgcolor = "#ffcc99">   
+<table bgcolor = "#ffffff" width = "70%" align = "center" border="0" cellpadding = "15">  
+<tr> 
+<td> 
+<h3>Advice for technical speaking</h3>
+
+<h5>Shamelessly pilfered from <a
+href="http://www.cs.cmu.edu/~ggordon/speaking-advice.html">Geoff
+Gordon's advice page</a></h5>
+
+When you hear a talk - a good one or a bad one! - think about the
+presentation as well as the content.  Copy what works, and avoid what
+doesn't. If you see a great talk, examine it and try to figure out
+what makes it great.  If you see a poor talk, examine it and ask
+yourself if you might make the same mistakes. Some of the most common
+mistakes are below, but people are really quite creative in coming up
+with new mistakes, so don't assume this list is complete!<p>
+
+Don't use too many slides. If you have more than one slide per minute,
+you are definitely using too many. One slide per two minutes is a much
+more reasonable pace.<p>
+
+Don't read your slides.  You do not need to put everything you are
+going to say up on a slide; that's what speaker notes are for. Save
+your slides for things that don't work as well with just speech:
+figures, diagrams, movies, animations, extra emphasis on important
+concepts. If your slides are just lists of bullets, they should
+probably be speaker notes and not slides.<p>
+
+Don't put too many pixels on your slides. Use big fonts and
+contrasting colors. Projectors are notoriously bad at making ochre and
+mauve actually look different on the screen. If you copy a figure from
+a paper, ask yourself whether the text labels are big enough to read
+from the back of the room, and redo them if not.<p>
+
+Don't try to say too much.  You can't explain everything you've been
+doing in a semester project in 30 minutes - part of communicating is
+deciding what to leave out.  If you feel like you have to rush to say
+what you need to say, you're going too fast: the presentation should
+be relaxed enough that people have a chance to reflect on what you
+say, and ask questions if they need to.<p>
+
+Talk concretely about your work.  People are great at abstracting from
+examples, but it's hard work for them to think through high-level
+abstractions. (This is the opposite case from when you're programming
+a computer - then you always program the most general case possible,
+and let the computer instantiate it as needed.)  When you're talking
+to a person, start with a concrete problem you want to solve, and then
+help the person understand how to generalize that concrete problem to
+the general case.<p>
+
+View your talk as an advertisement for your paper(s).  Your goal is
+to convince your audience to use your ideas for their own work, so
+that they cite you and make you famous. Your goal is <b>not</b> to
+make them understand Equation 43 on page 17 (unless that convinces
+them to cite you and make you famous). Instead, say what your
+techniques are good for, why they're important, what the alternatives
+are, and how to choose when your techniques are appropriate instead of
+the alternatives. Then and only then, use the rest of the time on
+technical stuff, with the goal of giving listeners the tools to read
+your paper. (If you're talking about someone else's work, imagine
+instead that you're trying to get the audience to trust your
+evaluation of that work.)<p>
+
+Be honest and diligent.  Don't try to cover up flaws or overstate the
+applicability of your techniques; instead, try to discover flaws and
+limitations and expose them.<p>
+
+Think concretely about your audience.  Will they be able to understand
+each slide as it comes up?  Will they understand why each slide is
+important? As a heuristic, I often find it best to prepare the first
+version of a talk with <i>one</i> specific person I know well in
+mind - and think about what I would say to engage and inform him or
+her specifically.<p>
+
+Talk at the right level for your audience. Remember that, almost by
+definition, you understand the material and they don't, and fight the
+inclination to go too fast.  Be aware of people's cognitive
+limitations. Don't make your audience figure something out if you
+don't have to; that will save more processing power for what you want
+them to focus on. In particular:
+<UL>
+<LI> Don't ask people to listen to one thing and read another at the
+same time.
+<LI> Don't ask people to remember an equation or
+definition 5 slides later: just put up a copy when you refer to it.
+<LI> Use direct, simple language. For example, if there are three ways
+to refer to something, pick one and use it consistently throughout
+your talk: don't call something a &ldquo;model&rdquo; on one slide and
+a &ldquo;parameter vector&rdquo; on another.
+<LI> Label every graph clearly and in large fonts: both axes, every line, and even the sign of any comparison you want to make (&ldquo;higher is better&rdquo;).
+<LI> If a fact is important, emphasize it.&nbsp;  The audience doesn't necessarily manage to process every word you say.&nbsp;  Help them process your talk by telling them what is important, and by repeating things they might have forgotten.
+</UL>
+<p>
+
+Synthesize.  The audience should get something out of your talk that
+they can't get as quickly or easily out of the paper(s). This means:
+pull together concepts from multiple papers if necessary; compare to
+related work; communicate your judgement about benefits and
+limitations of each technique.<p>
+
+Be careful with equations.&nbsp; You can use a limited number of
+equations if you want to, but make sure that you spend enough time
+explaining them that the audience truly understands them.
+<ul>
+<li> Often, it's a good idea to leave the slide blank and hand-write the equations on it during the actual talk; this trick will keep you from going too fast.  Of course, this trick only works if you have a tablet or (gasp) an analog device like a whiteboard or overhead transparencies.
+<li> If you use this trick, make sure you practice writing out the equations ahead of time at the same level of detail that you plan to use during the talk. Don't just assume they're simple enough that you can't possibly get them wrong; that assumption is usually false.
+</ul>
+<p>
+
+Organize well:
+
+<ul>
+<li> Introduce one new concept at a time.  Make sure you know, for every part of every slide, which concept it is intended to convey.  Make sure you can describe each concept with a clear, short phrase -- else it's probably more than one concept.
+<li> Introduce concepts in the right order.  If concept B depends on concept A, make sure to introduce A first.
+<li> Sometimes it helps to make a directed graph: nodes are the short phrases for concepts, and arrows represent prerequisites.  You can then check that your talk is consistent with the graph (i.e., doesn't try to reverse any arrows).
+<li> If there are directed cycles in your graph, you have a problem.  Try to refactor your concepts and pull out something that you can introduce before any of the nodes in a cycle, then re-evaluate the dependencies, and repeat until you get a DAG.
+
+</ul>
+<p>
+
+Start and end your talk well:
+<ul>
+<li> If possible, put up your title slide while you're being introduced. Then you don't need to read it.
+<li> Make sure the audience knows who you are, especially if you're talking about a paper with multiple authors. You may want to put your name at the bottom of every slide, for people who come in late.
+<li> Make sure you know the first few sentences of your talk by heart. Exact memorization is usually a bad idea for the body of the talk (it sounds stilted), but I find that knowing the first sentence or two helps me get started. (And once I get started I can almost always keep going.)
+
+<li> Make sure you have an obvious end to your talk, and don't just trail off into silence. Always end with a statement (e.g., &ldquo;thank you&rdquo;), not a question (e.g., &ldquo;any questions?&rdquo;).&nbsp; If you end with a question, the audience doesn't know whether to answer it or applaud, which can be awkward.
+</ul>
+
+Audiences hate to have their time wasted.&nbsp;  So:
+<ul>
+<li> Whenever you can do a little work to save your audience a little
+work, you should.&nbsp; E.g., make a better visualization or a better
+figure, if you think it will improve your audience's ability to
+understand.&nbsp; Or, take that huge table of timing results from your
+paper and translate it into a bar chart that highlights the
+comparisons you're trying to make.
+
+<li> View an agreement to give a talk as a commitment.&nbsp; Don't
+cancel unless you really, really need to.&nbsp; If you do have to
+cancel, give as much notice as you can.
+
+<li> Plan to show up early.&nbsp; That way if something goes wrong
+(you miss a bus, the projector doesn't work, etc.), you have time to fix
+it.&nbsp; Snafus like the above are part of the normal order of the
+world, and somehow seem to be even more common when you're about to
+give a talk.&nbsp; Speakers should therefore expect and plan for them.
+
+<li> Know your tools.&nbsp; Make sure you know how to hook your laptop
+up to a projector, how to operate your presentation software quickly
+and unobtrusively, how to avoid having instant messages pop up on top
+of your slides, etc.
+
+</ul>
+<p>
+
+Don't waste your own time either.&nbsp; Don't spend lots of time
+designing pretty animations, flying text, etc., unless they will
+actually help audience comprehension and not distract from your
+talk.&nbsp; Every second spent animating is a second you don't have
+for explaining your ideas.<p>
+
+</td>
+</tr>
+</table>
+
+    <hr>
+    <address><a href="mailto:wcohen@ROCKY"></a></address>
+<!-- Created: Mon Jan 03 15:12:34 Eastern Standard Time 2011 -->
+<!-- hhmts start -->
+Last modified: Mon Jan 03 16:09:28 Eastern Standard Time 2011
+<!-- hhmts end -->
+  </body>
+</html>
diff --git a/all-bibdata.tgz b/all-bibdata.tgz
diff --git a/all-nell-triples.txt.gz b/all-nell-triples.txt.gz
diff --git a/balloon.zip b/balloon.zip
diff --git a/block-lda-icml-ws-2010.ppt b/block-lda-icml-ws-2010.ppt