Commit
initial copy from cmu
wwcohen committed Aug 22, 2018
1 parent e13ae63 commit 6b2a698
Showing 3,371 changed files with 257,558 additions and 0 deletions.
Binary file added CrowdComp_MTurkData.tar.gz
Binary file not shown.
Binary file added CutOncev0.0.2a.xpi
Binary file not shown.
Binary file added FastEffectiveClustering-v2.pdf
Binary file not shown.
Binary file added FastEffectiveClustering-v2.ppt
Binary file not shown.
Binary file added FastEffectiveClustering.pdf
Binary file not shown.
Binary file added FastEffectiveClustering.ppt
Binary file not shown.
Binary file added GuideToBiology-pictures-color-release1.5.pdf
Binary file not shown.
Binary file added GuideToBiology-pictures-color-release1.5.ppt
Binary file not shown.
Binary file added GuideToBiology-sampleChapter-release1.4.pdf
Binary file not shown.
Binary file added IIWeb.ppt
Binary file not shown.
Binary file added MSM-2009.ppt
Binary file not shown.
Binary file added Matching-1.ppt
Binary file not shown.
Binary file added Matching-2.ppt
Binary file not shown.
Binary file added Matching-3.ppt
Binary file not shown.
Binary file added Shortcut to 10-802.lnk
Binary file not shown.
Binary file added SimStudent-wc.ppt
Binary file not shown.
200 changes: 200 additions & 0 deletions SlifTextComponent.html
@@ -0,0 +1,200 @@
<html>
<head>
<title>How to use the SLIF Text Components</title>
</head>
<body bgcolor="white">

<h2>How to use the SLIF Text Components</h2>

<h3>Invocation and basic options</h3>

<p>
The SLIF text components are distributed as a single large JAR file. To
run them you will need a Java runtime. A typical invocation is:

<p>
<code>
% java -cp slifTextComponents.jar -Xmx500M SlifTextComponent -labels <i>DIR</i> -saveAs <i>FILE</i> -use <i>COMPONENT1,COMPONENT2,....</i> [<i>OPTIONS</i>]
</code>

<p>
where <code>-Xmx500M</code> sets the maximum Java heap size to 500 MB, and the remaining arguments are as follows:
<ul>

<li><i>DIR</i> is a directory containing some number of text files to
annotate.

<p>A <a href="http://www.cs.cmu.edu/~wcohen/captions.tgz">sample
directory of files is available</a> as a compressed tarfile. These
captions were all taken from PubMed Central papers; e.g., the file
"p9770486-fig_4_2" is from Figure 4 of the paper with the <a
href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed">PubMed</a>
ID 9770486.

<li><i>FILE</i> is the file in which the annotations are saved. They are written in Minorthird format,
which is described below.

<li><i>COMPONENT1,...</i> is a comma-separated list of the names of the 'text components' to apply to the files in <i>DIR</i>.
</ul>
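<p>
When the annotator is run from a script, it can help to assemble the invocation above programmatically. The following Python sketch is illustrative only: the helper name <code>build_slif_command</code> is hypothetical (it is not part of the SLIF distribution), and <code>captions</code> and <code>captions.labels</code> stand in for your own <i>DIR</i> and <i>FILE</i>. It only builds the argument list; it does not check that the JAR or the component names exist.

```python
import shlex


def build_slif_command(label_dir, save_as, components,
                       jar="slifTextComponents.jar", heap="500M",
                       extra_options=()):
    """Return the argv list for one SlifTextComponent run (hypothetical helper).

    label_dir: directory of text files to annotate (the -labels DIR)
    save_as: output file for the annotations (the -saveAs FILE)
    components: component names, e.g. ["CellLine", "CRFonYapex"]
    extra_options: any further options, e.g. ("-format", "strings")
    """
    if not components:
        raise ValueError("at least one component name is required")
    return [
        "java", "-cp", jar, "-Xmx" + heap,  # JVM options, as in the example invocation
        "SlifTextComponent",
        "-labels", label_dir,
        "-saveAs", save_as,
        "-use", ",".join(components),       # -use takes a comma-separated list
        *extra_options,
    ]


cmd = build_slif_command("captions", "captions.labels", ["CellLine", "CRFonYapex"])
print(shlex.join(cmd))
# To actually run it (requires Java and the JAR): subprocess.run(cmd, check=True)
```

<p>
Passing the list to <code>subprocess.run</code> rather than a single shell string avoids quoting problems when paths contain spaces.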

The components available are:
<ul>

<li><b>CellLine</b>: marks spans that are predicted to be the names
of cell lines with 'cellLine', using an entity tagger trained on
the Genia corpus.

<li><b>CRFonYapex</b>: marks spans that are predicted to be the
names of genes or proteins with 'protein', and also
'proteinFromCRFonYapex', using a gene tagger trained on the YAPEX
corpus with the CRF algorithm, as described by <a
href="http://www.cs.cmu.edu/~wcohen/postscript/ismb-2005.pdf">Kou,
Cohen and Murphy (2005)</a>.

<li><b>CRFonTexas, CRFonGenia</b>: analogous to CRFonYapex, but
using gene taggers trained on different corpora (as outlined in the
Kou et al. paper).

<li><b>SemiCRFOnYapex, SemiCRFOnTexas, SemiCRFOnGenia</b>: analogous
to the CRFon* components, but trained with the SemiCRF algorithm.

<li><b>DictHMMOnYapex, DictHMMOnTexas, DictHMMOnGenia</b>: analogous
to the CRFon* components, but trained with the DictHMM algorithm.

<li><b>Caption</b>: marks spans according to the criteria described
by <a
href="http://www.cs.cmu.edu/~wcohen/postscript/ismb-2003.pdf">Cohen,
Wang and Murphy (2003)</a>: <ul>

<li>Spans marked as 'imagePointer' are predicted to be image
pointers. For a definition of image pointers, see Cohen,
Wang and Murphy (2003).

<li>Spans marked as 'bulletStyle' and 'citationStyle' are
predicted to be bullet-style and citation-style image pointers,
respectively.

<li>Spans marked as 'bulletScope' and 'localScope' are predicted
to be the <i>scopes</i> of bullet-style and citation-style
image pointers, respectively.

<li>Spans marked as 'globalScope' contain text assumed to pertain
to the entire associated image.

<li>Spans marked as either 'bulletScope', 'localScope', or
'globalScope' are marked as 'scope'.

<li>Every 'scope' span is associated with a <i>span
property</i> called its 'semantics'. The 'semantics' of a span
is the concatenation of all the image pointers associated with
that span.

</ul>

Additionally, the span labels 'regional' and 'local' are synonyms
for bullet-style and citation-style, respectively.

<p>
Briefly, to find out what parts of an image some span
<i>S</i> might refer to, you need to (1) find out which 'scope'
spans <i>S</i> lies inside and (2) find out what the 'semantics'
of these scope spans are. For instance, if the span 'RAS4' is
inside a scope <i>T1</i> with semantics "A" and also inside a
scope <i>T2</i> with semantics "BD", then 'RAS4' is probably
associated with the parts of the accompanying image labeled "A",
"B", and "D".

</ul>
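<p>
The two-step lookup described under <b>Caption</b> can be sketched in a few lines of Python. This is an illustration, not part of the SLIF code: it assumes the scope spans of a single file have already been read into <code>(start, length, semantics)</code> tuples, treats "inside" as full containment of one byte range in another, and uses made-up offsets that mirror the 'RAS4' example. The function name <code>image_parts_for_span</code> is hypothetical.

```python
def image_parts_for_span(span, scopes):
    """Return the image-part letters a span S is probably associated with.

    span: (start, length) of S
    scopes: (start, length, semantics) tuples for 'scope' spans in the same file
    """
    s_start, s_length = span
    s_end = s_start + s_length
    letters = set()
    for start, length, semantics in scopes:
        # Step 1: is S inside this scope span (full containment)?
        if s_start >= start and start + length >= s_end:
            # Step 2: collect the letters in this scope's 'semantics'.
            letters.update(semantics)
    return sorted(letters)


# Hypothetical offsets: T1 has semantics "A", T2 has "BD", T3 does not contain the span.
scopes = [(0, 100, "A"), (50, 80, "BD"), (200, 30, "C")]
ras4 = (60, 4)
print("".join(image_parts_for_span(ras4, scopes)))
```

<p>
Using a set before sorting means a letter shared by two enclosing scopes is reported only once.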

<h3>The Minorthird format for stand-off annotation</h3>

The format for output is the one used by <a
href="http://minorthird.sourceforge.net">Minorthird</a>. Specifically, the
output (in the default format) is a series of lines in one of these
formats:

<p>
<code>
addToType <i>FILE</i> <i>START</i> <i>LENGTH</i> <i>SPANTYPE</i><br>
setSpanProp <i>FILE</i> <i>START</i> <i>LENGTH</i> semantics <i>LETTERS</i>
</code>

<p>
where
<ul>

<li><i>FILE</i> is the name of the file containing some span;

<li><i>START</i> and <i>LENGTH</i> are the byte offset at which the span begins and the span's length;

<li><i>SPANTYPE</i> is the type of span (e.g., 'imagePointer',
'cellLine', 'protein', 'scope', etc.);

<li><i>LETTERS</i> is (as noted above) the concatenation of all the
image pointers associated with that span.

</ul>
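<p>
Because each output line is a whitespace-separated record, it is easy to read back. The Python sketch below groups the records by span. It assumes file names contain no whitespace and that each line has exactly the fields shown above; <code>parse_minorthird</code> is a hypothetical name, and the sample offsets are invented for illustration.

```python
from collections import defaultdict


def parse_minorthird(lines):
    """Group addToType / setSpanProp records by (file, start, length).

    Each value is {"types": set of span types, "semantics": str or None}.
    Lines with any other operation, or an unexpected field count, are ignored.
    """
    spans = defaultdict(lambda: {"types": set(), "semantics": None})
    for line in lines:
        # maxsplit=5 keeps a property value intact even if it contains spaces
        fields = line.strip().split(None, 5)
        if not fields:
            continue  # skip blank lines
        if fields[0] == "addToType" and len(fields) == 5:
            _, doc, start, length, span_type = fields
            spans[(doc, int(start), int(length))]["types"].add(span_type)
        elif fields[0] == "setSpanProp" and len(fields) == 6:
            _, doc, start, length, prop, value = fields
            if prop == "semantics":
                spans[(doc, int(start), int(length))]["semantics"] = value
    return dict(spans)


sample = [
    "addToType p9770486-fig_4_2 120 4 imagePointer",
    "addToType p9770486-fig_4_2 0 95 scope",
    "setSpanProp p9770486-fig_4_2 0 95 semantics AB",
]
for key, info in parse_minorthird(sample).items():
    print(key, sorted(info["types"]), info["semantics"])
```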

<h3>Other options</h3>

<table border=1>
<tr><th>Option</th><th>Explanation</th></tr>

<tr><td><code>-help</code></td>
<td>Gives brief command line help</td>
</tr>

<tr><td><code>-gui</code></td>
<td>Pops up a window that allows you to interactively fill in the other arguments, monitor the execution of the annotation process, etc.</td>
</tr>

<tr><td><code>-showLabels</code></td>
<td>Pops up a window that displays the set of documents being labeled.
(This is not recommended for a large document collection, due to
memory usage.)
</td>
</tr>

<tr><td><code>-showResult</code></td>
<td>Pops up a window that displays the result of the annotation.
(Again, not recommended for a large document collection.)
</td>
</tr>

<tr><td><code>-format strings</code></td> <td>Outputs results as a
tab-separated table, instead of minorthird format. The first
column summarizes the type of the span, the file the span was
taken from, and the start and end byte positions, in a
colon-separated format. (E.g.,
"cellLine:p11029059-fig_4_1:1293:1303".) The remaining column(s)
are the text that is contained in the span (e.g., "HeLa cells",
for the span above) almost exactly as it appears in the document; the
only change is that newlines are replaced with spaces.</td>
</tr>

</table>
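<p>
A row of <code>-format strings</code> output can be split back into its parts as follows. This Python sketch assumes the first column always has the form <i>type</i>:<i>file</i>:<i>start</i>:<i>end</i>, as in the example above, and that any further columns should be rejoined with tabs; <code>parse_strings_row</code> is a hypothetical helper. The type is split off the left and the two offsets off the right, so a colon inside a file name would not break the parse.

```python
def parse_strings_row(row):
    """Split one tab-separated row of '-format strings' output."""
    key, *text_columns = row.rstrip("\n").split("\t")
    span_type, rest = key.split(":", 1)     # type is everything before the first ':'
    doc, start, end = rest.rsplit(":", 2)   # offsets are the last two ':' fields
    return {
        "type": span_type,
        "file": doc,
        "start": int(start),
        "end": int(end),
        "text": "\t".join(text_columns),
    }


row = "cellLine:p11029059-fig_4_1:1293:1303\tHeLa cells"
print(parse_strings_row(row))
```

<p>
In the documented example, 1303 - 1293 = 10, which matches the length of "HeLa cells"; this suggests, but does not confirm, that the end position is exclusive.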

<h3>References</h3>

<ul>
<li>Zhenzhen Kou, William W. Cohen &amp; Robert F. Murphy (2005): <a href="postscript/ismb-2005.pdf">High-Recall Protein Entity Recognition Using a Dictionary</a>, in <a href="http://www.informatik.uni-trier.de/~ley/db/conf/ismb/ismb2005.html#KouCM05">ISMB-2005</a>.
<li>William W. Cohen, Richard Wang &amp; Robert F. Murphy (2003): <a href="postscript/ismb-2003.pdf">Understanding Captions in Biomedical Publications</a>, in <a href="http://www.informatik.uni-trier.de/~ley/db/conf/kdd/kdd2003.html#CohenWM03">KDD 2003: 499-504</a>.
<li>Robert F. Murphy, Zhenzhen Kou, Juchang Hua, Matthew Joffe &amp; William W. Cohen (2004): <a href="postscript/ksce-2004.pdf">Extracting and Structuring Subcellular Location Information from On-line Journal Articles: The Subcellular Location Image Finder</a>, in <a href="recent.html">KSCE-2004</a>.
<li><a href="http://murphylab.web.cmu.edu/services/SLIF2/">The SLIF home page</a>
</ul>

<h3>Acknowledgements</h3>

A number of people have contributed to these tools, including William
Cohen, Zhenzhen Kou, Quinten Mercer, Robert Murphy, Richard Wang, and
other members of the SLIF team.

<p>
The initial development of these tools was supported by grant 017396
from the Commonwealth of Pennsylvania Tobacco Settlement Fund. Further
development is supported by National Institutes of Health grant R01
GM078622.

</BODY>
</HTML>

Binary file added Thumbs.db
Binary file not shown.
Binary file added aaai-fs-2012.ppt
Binary file not shown.
Binary file added aaai-ss-2015.ppt
Binary file not shown.
181 changes: 181 additions & 0 deletions advice.html
@@ -0,0 +1,181 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="SHORTCUT ICON" href="CarnegieMellon_logo.gif">
<title>Advice for Technical Speaking</title>
</head>

<body bgcolor = "#ffcc99">
<table bgcolor = "#ffffff" width = "70%" align = "center" border="0" cellpadding = "15">
<tr>
<td>
<h3>Advice for technical speaking</h3>

<h5>Shamelessly pilfered from <a
href="http://www.cs.cmu.edu/~ggordon/speaking-advice.html">Geoff
Gordon's advice page</a></h5>

When you hear a talk - a good one or a bad one! - think about the
presentation as well as the content. Copy what works, and avoid what
doesn't. If you see a great talk, examine it and try to figure out
what makes it great. If you see a poor talk, examine it and ask
yourself if you might make the same mistakes. Some of the most common
mistakes are below, but people are really quite creative in coming up
with new mistakes, so don't assume this list is complete!<p>

Don't use too many slides. If you have more than one slide per minute,
you are definitely using too many. One slide per two minutes is a much
more reasonable pace.<p>

Don't read your slides. You do not need to put everything you are
going to say up on a slide; that's what speaker notes are for. Save
your slides for things that don't work as well with just speech:
figures, diagrams, movies, animations, extra emphasis on important
concepts. If your slides are just lists of bullets, they should
probably be speaker notes and not slides.<p>

Don't put too many pixels on your slides. Use big fonts and
contrasting colors. Projectors are notoriously bad at making ochre and
mauve actually look different on the screen. If you copy a figure from
a paper, ask yourself whether the text labels are big enough to read
from the back of the room, and redo them if not.<p>

Don't try and say too much. You can't explain everything you've been
doing in a semester project in 30 minutes; part of communicating is
deciding what to leave out. If you feel like you have to rush to say
what you need to say, you're going too fast: the presentation should
be relaxed enough that people have a chance to reflect on what you
say, and ask questions if they need to.<p>

Talk concretely about your work. People are great at abstracting from
examples, but it's hard work for them to think through high-level
abstractions. (This is the opposite case from when you're programming
a computer - then you always program the most general case possible,
and let the computer instantiate it as needed.) When you're talking
to a person, start with a concrete problem you want to solve, and then
help the person understand how to generalize that concrete problem to
the general case.<p>

View your talk as an advertisement for your paper(s). Your goal is
to convince your audience to use your ideas for their own work, so
that they cite you and make you famous. Your goal is <b>not</b> to
make them understand Equation 43 on page 17 (unless that convinces
them to cite you and make you famous). Instead, say what your
techniques are good for, why they're important, what the alternatives
are, and how to choose when your techniques are appropriate instead of
the alternatives. Then and only then, use the rest of the time on
technical stuff, with the goal of giving listeners the tools to read
your paper. (If you're talking about someone else's work, imagine
instead that you're trying to get the audience to trust your
evaluation of that work.)<p>

Be honest and diligent. Don't try to cover up flaws or overstate the
applicability of your techniques; instead, try to discover flaws and
limitations and expose them.<p>

Think concretely about your audience. Will they be able to understand
each slide as it comes up? Will they understand why each slide is
important? As a heuristic, I often find it best to prepare the first
version of a talk with <i>one</i> specific person I know well in
mind - and think about what I would say to engage and inform him or
her specifically.<p>

Talk at the right level for your audience. Remember that, almost by
definition, you understand the material and they don't, and fight the
inclination to go too fast. Be aware of people's cognitive
limitations. Don't make your audience figure something out if you
don't have to; that will save more processing power for what you want
them to focus on. In particular:
<UL>
<LI> Don't ask people to listen to one thing and read another at the
same time.
<LI> Don't ask people to remember an equation or
definition 5 slides later: just put up a copy when you refer to it.
<LI> Use direct, simple language. For example, if there are three ways
to refer to something, pick one and use it consistently throughout
your talk: don't call something a &ldquo;model&rdquo; on one slide and
a &ldquo;parameter vector&rdquo; on another.
<LI> Label every graph clearly and in large fonts: both axes, every line, and even the sign of any comparison you want to make (&ldquo;higher is better&rdquo;).
<LI> If a fact is important, emphasize it.&nbsp; The audience doesn't necessarily manage to process every word you say.&nbsp; Help them process your talk by telling them what is important, and by repeating things they might have forgotten.
</UL>
<p>

Synthesize. The audience should get something out of your talk that
they can't get as quickly or easily out of the paper(s). This means:
pull together concepts from multiple papers if necessary; compare to
related work; communicate your judgement about benefits and
limitations of each technique.<p>

Be careful with equations.&nbsp; You can use a limited number of
equations if you want to, but make sure that you spend enough time
explaining them that the audience truly understands them.
<ul>
<li> Often, it's a good idea to leave the slide blank and hand-write the equations on it during the actual talk; this trick will keep you from going too fast. Of course, this trick only works if you have a tablet or (gasp) an analog device like a whiteboard or overhead transparencies.
<li> If you use this trick, make sure you practice writing out the equations ahead of time at the same level of detail that you plan to use during the talk. Don't just assume they're simple enough that you can't possibly get them wrong; that assumption is usually false.
</ul>
<p>

Organize well:

<ul>
<li> Introduce one new concept at a time. Make sure you know, for every part of every slide, which concept it is intended to convey. Make sure you can describe each concept with a clear, short phrase; if you can't, it's probably more than one concept.
<li> Introduce concepts in the right order. If concept B depends on concept A, make sure to introduce A first.
<li> Sometimes it helps to make a directed graph: nodes are the short phrases for concepts, and arrows represent prerequisites. You can then check that your talk is consistent with the graph (i.e., doesn't try to reverse any arrows).
<li> If there are directed cycles in your graph, you have a problem. Try to refactor your concepts and pull out something that you can introduce before any of the nodes in a cycle, then re-evaluate the dependencies, and repeat until you get a DAG.

</ul>
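<p>
The prerequisite-graph check described above is mechanical enough to automate. Here is a small Python sketch (Python 3.9 or later, for the standard-library <code>graphlib</code> module); the function name <code>check_talk_order</code> and the example concepts are hypothetical. It reports every arrow that the planned order reverses, and <code>graphlib</code> raises <code>CycleError</code> when the graph has a directed cycle, which is the signal to refactor your concepts.

```python
from graphlib import CycleError, TopologicalSorter


def check_talk_order(prereqs, talk_order):
    """Return (concept, prerequisite) pairs that the talk introduces out of order.

    prereqs: dict mapping each concept to the set of concepts it depends on
    talk_order: concepts in the order the talk introduces them
    Raises graphlib.CycleError if the prerequisite graph is not a DAG.
    """
    # static_order() raises CycleError as soon as it finds a directed cycle.
    list(TopologicalSorter(prereqs).static_order())

    never = len(talk_order)  # concepts missing from the talk count as introduced last
    position = {concept: i for i, concept in enumerate(talk_order)}
    reversed_arrows = []
    for concept, deps in prereqs.items():
        for dep in deps:
            # The arrow dep -> concept is reversed if dep is not introduced first.
            if position.get(dep, never) >= position.get(concept, never):
                reversed_arrows.append((concept, dep))
    return reversed_arrows


prereqs = {
    "CRF": {"HMM"},
    "HMM": {"Markov chain"},
    "Markov chain": set(),
}
print(check_talk_order(prereqs, ["Markov chain", "CRF", "HMM"]))

try:
    check_talk_order({"A": {"B"}, "B": {"A"}}, ["A", "B"])
except CycleError as err:
    print("cycle:", err.args[1])
```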
<p>

Start and end your talk well:
<ul>
<li> If possible, put up your title slide while you're being introduced. Then you don't need to read it.
<li> Make sure the audience knows who you are, especially if you're talking about a paper with multiple authors. You may want to put your name at the bottom of every slide, for people who come in late.
<li> Make sure you know the first few sentences of your talk by heart. Exact memorization is usually a bad idea for the body of the talk (it sounds stilted), but I find that knowing the first sentence or two helps me get started. (And once I get started I can almost always keep going.)

<li> Make sure you have an obvious end to your talk, and don't just trail off into silence. Always end with a statement (e.g., &ldquo;thank you&rdquo;), not a question (e.g., &ldquo;any questions?&rdquo;).&nbsp; If you end with a question, the audience doesn't know whether to answer it or applaud, which can be awkward.
</ul>

Audiences hate to have their time wasted.&nbsp; So:
<ul>
<li> Whenever you can do a little work to save your audience a little
work, you should.&nbsp; E.g., make a better visualization or a better
figure, if you think it will improve your audience's ability to
understand.&nbsp; Or, take that huge table of timing results from your
paper and translate it into a bar chart that highlights the
comparisons you're trying to make.

<li> View an agreement to give a talk as a commitment.&nbsp; Don't
cancel unless you really, really need to.&nbsp; If you do have to
cancel, give as much notice as you can.

<li> Plan to show up early.&nbsp; That way if something goes wrong
(miss a bus, projector doesn't work, etc.), you have time to fix
it.&nbsp; Snafus like the above are part of the normal order of the
world, and somehow seem to be even more common when you're about to
give a talk.&nbsp; Speakers should therefore expect and plan for them.

<li> Know your tools.&nbsp; Make sure you know how to hook your laptop
up to a projector, how to operate your presentation software quickly
and unobtrusively, how to avoid having instant messages pop up on top
of your slides, etc.

</ul>
<p>

Don't waste your own time either.&nbsp; Don't spend lots of time
designing pretty animations, flying text, etc., unless they will
actually help audience comprehension and not distract from your
talk.&nbsp; Every second spent animating is a second you don't have
for explaining your ideas.<p>

</td>
</tr>
</table>

<hr>
<address><a href="mailto:wcohen@ROCKY"></a></address>
<!-- Created: Mon Jan 03 15:12:34 Eastern Standard Time 2011 -->
<!-- hhmts start -->
Last modified: Mon Jan 03 16:09:28 Eastern Standard Time 2011
<!-- hhmts end -->
</body>
</html>
Binary file added all-bibdata.tgz
Binary file not shown.
Binary file added all-nell-triples.txt.gz
Binary file not shown.
Binary file added balloon.zip
Binary file not shown.
Binary file added block-lda-icml-ws-2010.ppt
Binary file not shown.