Showing 3,371 changed files with 257,558 additions and 0 deletions.
<html> | ||
<head> | ||
<title>How to use the SLIF Text Components</title> | ||
</head> | ||
<body bgcolor="white"> | ||
|
||
<h2>How to use the SLIF Text Components</h2> | ||
|
||
<h3>Invocation and basic options</h3> | ||
|
||
<p> | ||
The SLIF text components are distributed as a single large JAR file. To
run it you will need a copy of Java. A typical invocation would be | ||
|
||
<p> | ||
<code> | ||
% java -cp slifTextComponents.jar -Xmx500M SlifTextComponent -labels <i>DIR</i> -saveAs <i>FILE</i> -use <i>COMPONENT1,COMPONENT2,....</i> [<i>OPTIONS</i>] | ||
</code> | ||
|
||
<p> | ||
where -Xmx500M sets the maximum Java heap size to 500 MB, and the remaining arguments are as follows:
<ul> | ||
|
||
<li><i>DIR</i> is a directory containing some number of text files to | ||
annotate. | ||
|
||
<p>A <a href="http://www.cs.cmu.edu/~wcohen/captions.tgz">sample | ||
directory of files is available</a> as a compressed tarfile. These | ||
captions were all taken from PubMedCentral papers, e.g., the file | ||
"p9770486-fig_4_2" is from Figure 4 of the paper with the <a | ||
href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed">PubMed</a> | ||
Id of 9770486. | ||
|
||
<li><i>FILE</i> is where annotations should be placed. These are output in 'minorthird format', | ||
which is explained below. | ||
|
||
<li><i>COMPONENT1,...</i> are the names of the 'text components' to use.
</ul> | ||
|
||
The components available are: | ||
<ul> | ||
|
||
<li><b>CellLine</b>: marks spans that are predicted to be the names | ||
of cell lines with 'cellLine', using an entity-tagger trained using | ||
the Genia corpus. | ||
|
||
<li><b>CRFonYapex</b>: marks spans that are predicted to be the
names of genes or proteins with 'protein', and also with
'proteinFromCRFonYapex', using a gene-tagger trained on the YAPEX
corpus with the CRF algorithm, as described by <a
href="http://www.cs.cmu.edu/~wcohen/postscript/ismb-2005.pdf">Kou,
Cohen and Murphy (2005)</a>.
|
||
<li><b>CRFonTexas, CRFonGenia</b>: analogous to CRFonYapex, but | ||
using gene-taggers trained on different corpora (as outlined in the | ||
Kou et al. paper).
|
||
<li><b>SemiCRFOnYapex, SemiCRFOnTexas, SemiCRFOnGenia</b>: analogous | ||
to the CRFon* components, but trained with the SemiCRF algorithm. | ||
|
||
<li><b>DictHMMOnYapex, DictHMMOnTexas, DictHMMOnGenia</b>: analogous | ||
to the CRFon* components, but trained with the DictHMM algorithm. | ||
|
||
<li><b>Caption</b>: marks spans according to the criteria described
by <a
href="http://www.cs.cmu.edu/~wcohen/postscript/ismb-2003.pdf">Cohen,
Wang and Murphy (2003)</a>: <ul>
|
||
<li>Spans marked as 'imagePointer' are predicted to be image | ||
pointers. For a definition of image pointers, see Cohen,
Wang and Murphy (2003).
|
||
<li>Spans marked as 'bulletStyle' and 'citationStyle' are | ||
predicted to be bullet-style and citation-style image pointers, | ||
respectively. | ||
|
||
<li>Spans marked as 'bulletScope' and 'localScope' are predicted | ||
to be the <i>scopes</i> of bullet-style and citation-style | ||
image pointers, respectively. | ||
|
||
<li>Spans marked as 'globalScope' are text assumed to pertain | ||
to the entire associated image. | ||
|
||
<li>Spans marked as either 'bulletScope', 'localScope', or | ||
'globalScope' are marked as 'scope'. | ||
|
||
<li>Every 'scope' span is associated with a <i>span | ||
property</i> called its 'semantics'. The 'semantics' of a span | ||
is the concatenation of all the image pointers associated with | ||
that span. | ||
|
||
</ul> | ||
|
||
Additionally, the span labels 'regional' and 'local' are synonyms | ||
for bullet-style and citation-style, respectively. | ||
|
||
<p> | ||
Briefly, to find out what parts of an image some span | ||
<i>S</i> might refer to, you need to (1) find out what 'scope' | ||
spans <I>S</i> is inside of and (2) find out what the 'semantics' | ||
of these scope spans are. For instance, if the span 'RAS4' is | ||
inside a scope <I>T1</i> with semantics "A" and also inside a | ||
scope <I>T2</i> with semantics "BD", then 'RAS4' probably is | ||
associated with the parts of the accompanying image labeled "A",
"B", and "D". | ||
|
||
</ul> | ||
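<p>
The scope lookup described for the Caption component can be sketched in
a few lines of Python. This is only an illustration, not part of the
SLIF distribution: the <code>Span</code> class, the
<code>image_parts</code> function, and all offsets are hypothetical,
and the sketch assumes that each character of a 'semantics' string
names one image label, as in the "BD" example above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Span:
    """A region of a caption file (hypothetical helper type)."""
    start: int   # initial position of the span
    length: int  # length of the span

    @property
    def end(self) -> int:
        return self.start + self.length

    def contains(self, other: "Span") -> bool:
        # True when `other` lies entirely inside this span.
        return self.start <= other.start and other.end <= self.end


def image_parts(span: Span, scopes: Iterable[Tuple[Span, str]]) -> List[str]:
    """Collect the image labels of every scope that encloses `span`.

    `scopes` pairs each 'scope' span with its 'semantics' string.
    Assumption: every character of the string is one image label.
    """
    letters = set()
    for scope, semantics in scopes:
        if scope.contains(span):      # step (1): which scopes is S inside?
            letters.update(semantics)  # step (2): gather their semantics
    return sorted(letters)


# Made-up offsets: 'RAS4' sits inside T1 ("A") and T2 ("BD") but not T3 ("C").
ras4 = Span(start=120, length=4)
scopes = [(Span(100, 60), "A"), (Span(110, 30), "BD"), (Span(200, 40), "C")]
print(image_parts(ras4, scopes))
```

With these made-up offsets the span falls inside the first two scopes
only, so the labels collected are A, B and D.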
|
||
<h3>The Minorthird format for stand-off annotation</h3> | ||
|
||
The format for output is the one used by <a | ||
href="http://minorthird.sourceforge.net">Minorthird</a>. Specifically, the | ||
output (in the default format) is a series of lines in one of these | ||
formats: | ||
|
||
<p> | ||
<code> | ||
addToType <i>FILE</i> <i>START</i> <i>LENGTH</i> <i>SPANTYPE</i><br> | ||
setSpanProp <i>FILE</i> <i>START</i> <i>LENGTH</i> semantics <i>LETTERS</i> | ||
</code> | ||
|
||
<p> | ||
where | ||
<ul> | ||
|
||
<li><i>FILE</i> is the name of the file containing some span; | ||
|
||
<li><i>START</i> and <i>LENGTH</i> are the initial byte position of the span and its length, respectively;
|
||
<li><i>SPANTYPE</i> is the type of span (e.g., 'imagePointer',
'cellLine', 'protein', 'scope', etc.);
|
||
<li><i>LETTERS</i> is (as noted above) the concatenation of all the | ||
image pointers associated with that span.
|
||
</ul> | ||
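<p>
As a rough illustration of how a downstream script might read these
lines, here is a minimal Python sketch. It is not part of Minorthird or
SLIF. It assumes that fields are separated by whitespace and that file
names contain no whitespace; the <code>Annotation</code> type, the
<code>parse_annotation</code> function, and the sample offsets are
hypothetical.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    """One parsed output line (hypothetical record type)."""
    op: str                          # "addToType" or "setSpanProp"
    file: str                        # FILE
    start: int                       # START
    length: int                      # LENGTH
    span_type: Optional[str] = None  # SPANTYPE, for addToType lines
    letters: Optional[str] = None    # LETTERS, for setSpanProp lines


def parse_annotation(line: str) -> Optional[Annotation]:
    """Parse one line; return None for lines in neither format."""
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) == 5 and fields[0] == "addToType":
        _, file, start, length, span_type = fields
        return Annotation("addToType", file, int(start), int(length),
                          span_type=span_type)
    if (len(fields) == 6 and fields[0] == "setSpanProp"
            and fields[4] == "semantics"):
        _, file, start, length, _, letters = fields
        return Annotation("setSpanProp", file, int(start), int(length),
                          letters=letters)
    return None


# Sample lines with invented offsets, for illustration only.
sample = [
    "addToType p9770486-fig_4_2 10 4 protein",
    "setSpanProp p9770486-fig_4_2 0 80 semantics BD",
]
for line in sample:
    print(parse_annotation(line))
```

Returning <code>None</code> for unrecognized lines lets a caller skip
anything that does not match the two formats instead of failing.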
|
||
<h3>Other options</h3> | ||
|
||
<table border=1> | ||
<tr><th>Option</th><th>Explanation</th></tr> | ||
|
||
<tr><td><code>-help</code></td> | ||
<td>Gives brief command line help</td> | ||
</tr> | ||
|
||
<tr><td><code>-gui</code></td> | ||
<td>Pops up a window that allows you to interactively fill in the other arguments, monitor the execution of the annotation process, etc.</td> | ||
</tr> | ||
|
||
<tr><td><code>-showLabels</code></td> | ||
<td>Pops up a window that displays the set of documents being labeled. | ||
(This is not recommended for a large document collection, due to | ||
memory usage.) | ||
</td> | ||
</tr> | ||
|
||
<tr><td><code>-showResult</code></td> | ||
<td>Pops up a window that displays the result of the annotation. | ||
(Again, not recommended for a large document collection.) | ||
</td> | ||
</tr> | ||
|
||
<tr><td><code>-format strings</code></td> <td>Outputs results as a | ||
tab-separated table, instead of minorthird format. The first | ||
column summarizes the type of the span, the file the span was | ||
taken from, and the start and end byte positions, in a | ||
colon-separated format. (E.g., | ||
"cellLine:p11029059-fig_4_1:1293:1303".) The remaining column(s) | ||
are the text that is contained in the span (e.g., "HeLa cells", | ||
for the span above) almost exactly as it appears in the document; the | ||
only change is that newlines are replaced with spaces.</td>
</tr>
|
||
</table> | ||
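<p>
The tab-separated output of <code>-format strings</code> could be read
back with something like the Python sketch below. It is an illustration
only: the <code>parse_strings_row</code> function is hypothetical, and
the sketch assumes that the span type contains no colon and that any
extra text columns should be rejoined with tabs.

```python
from typing import Dict, Union


def parse_strings_row(row: str) -> Dict[str, Union[str, int]]:
    """Split one row into span type, file, start, end and text."""
    key, *text_columns = row.rstrip("\n").split("\t")
    # The first column is TYPE:FILE:START:END. Splitting the type from
    # the left and the offsets from the right tolerates a colon inside
    # the file name (an assumption; the sample file names have none).
    span_type, rest = key.split(":", 1)
    file_name, start, end = rest.rsplit(":", 2)
    return {
        "type": span_type,
        "file": file_name,
        "start": int(start),
        "end": int(end),
        "text": "\t".join(text_columns),
    }


# Row built from the example given in the table above.
row = "cellLine:p11029059-fig_4_1:1293:1303\tHeLa cells"
record = parse_strings_row(row)
print(record["type"], record["file"], record["start"], record["end"])
print(record["text"])
```

In the example from the table, the difference between the end and start
positions (1303 - 1293 = 10) equals the length of "HeLa cells", which
suggests the end position is exclusive.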
|
||
<h3>References</h3> | ||
|
||
<ul> | ||
<li>Zhenzhen Kou, William W. Cohen &amp; Robert F. Murphy (2005): <a href="postscript/ismb-2005.pdf">High-Recall Protein Entity Recognition Using a Dictionary</a> in <a href="http://www.informatik.uni-trier.de/~ley/db/conf/ismb/ismb2005.html#KouCM05">ISMB-2005</a>.
<li>William W. Cohen, Richard Wang &amp; Robert Murphy (2003): <a href="postscript/ismb-2003.pdf">Understanding Captions in Biomedical Publications</a> in <a href="http://www.informatik.uni-trier.de/~ley/db/conf/kdd/kdd2003.html#CohenWM03">KDD 2003: 499-504</a>.
<p> | ||
<li>Robert F. Murphy, Zhenzhen Kou, Juchang Hua, Matthew Joffe, William W. Cohen (2004): <a href="postscript/ksce-2004.pdf">Extracting and Structuring Subcellular Location Information from On-line Journal Articles: The Subcellular Location Image Finder</a> in <a href="recent.html">KSCE-2004</a>. | ||
<li><a href="http://murphylab.web.cmu.edu/services/SLIF2/">The SLIF home page</a> | ||
</ul> | ||
|
||
<h3>Acknowledgements</h3> | ||
|
||
A number of people have contributed to these tools, including William | ||
Cohen, Zhenzhen Kou, Quinten Mercer, Robert Murphy, Richard Wang, and | ||
other members of the SLIF team. | ||
|
||
The initial development of these tools was supported by grant 017396 | ||
from the Commonwealth of Pennsylvania Tobacco Settlement Fund. Further | ||
development is supported by National Institutes of Health grant R01 | ||
GM078622. | ||
|
||
</BODY> | ||
</HTML> | ||
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> | ||
<html> | ||
<head> | ||
<link rel="SHORTCUT ICON" href="CarnegieMellon_logo.gif"> | ||
<title>Advice for Technical Speaking</title> | ||
</head> | ||
|
||
<body bgcolor = "#ffcc99"> | ||
<table bgcolor = "#ffffff" width = "70%" align = "center" border="0" cellpadding = "15"> | ||
<tr> | ||
<td> | ||
<h3>Advice for technical speaking</h3> | ||
|
||
<h5>Shamelessly pilfered from <a | ||
href="http://www.cs.cmu.edu/~ggordon/speaking-advice.html">Geoff | ||
Gordon's advice page</a></h5> | ||
|
||
When you hear a talk - a good one or a bad one! - think about the | ||
presentation as well as the content. Copy what works, and avoid what | ||
doesn't. If you see a great talk, examine it and try to figure out | ||
what makes it great. If you see a poor talk, examine it and ask | ||
yourself if you might make the same mistakes. Some of the most common | ||
mistakes are below, but people are really quite creative in coming up | ||
with new mistakes, so don't assume this list is complete!<p> | ||
|
||
Don't use too many slides. If you have more than one slide per minute, | ||
you are definitely using too many. One slide per two minutes is a much | ||
more reasonable pace.<p> | ||
|
||
Don't read your slides. You do not need to put everything you are | ||
going to say up on a slide; that's what speaker notes are for. Save | ||
your slides for things that don't work as well with just speech: | ||
figures, diagrams, movies, animations, extra emphasis on important | ||
concepts. If your slides are just lists of bullets, they should | ||
probably be speaker notes and not slides.<p> | ||
|
||
Don't put too many pixels on your slides. Use big fonts and | ||
contrasting colors. Projectors are notoriously bad at making ochre and | ||
mauve actually look different on the screen. If you copy a figure from | ||
a paper, ask yourself whether the text labels are big enough to read | ||
from the back of the room, and redo them if not.<p> | ||
|
||
Don't try to say too much. You can't explain everything you've been
doing in a semester project in 30 minutes - part of communicating is
deciding what to leave out. If you feel like you have to rush to say | ||
what you need to say, you're going too fast: the presentation should | ||
be relaxed enough that people have a chance to reflect on what you | ||
say, and ask questions if they need to.<p> | ||
|
||
Talk concretely about your work. People are great at abstracting from | ||
examples, but it's hard work for them to think through high-level | ||
abstractions. (This is the opposite case from when you're programming | ||
a computer - then you always program the most general case possible, | ||
and let the computer instantiate it as needed.) When you're talking | ||
to a person, start with a concrete problem you want to solve, and then | ||
help the person understand how to generalize that concrete problem to | ||
the general case.<p> | ||
|
||
View your talk as an advertisement for your paper(s). Your goal is | ||
to convince your audience to use your ideas for their own work, so | ||
that they cite you and make you famous. Your goal is <b>not</b> to | ||
make them understand Equation 43 on page 17 (unless that convinces | ||
them to cite you and make you famous). Instead, say what your | ||
techniques are good for, why they're important, what the alternatives | ||
are, and how to choose when your techniques are appropriate instead of | ||
the alternatives. Then and only then, use the rest of the time on | ||
technical stuff, with the goal of giving listeners the tools to read | ||
your paper. (If you're talking about someone else's work, imagine | ||
instead that you're trying to get the audience to trust your | ||
evaluation of that work.)<p> | ||
|
||
Be honest and diligent. Don't try to cover up flaws or overstate the | ||
applicability of your techniques; instead, try to discover flaws and | ||
limitations and expose them.<p>
|
||
Think concretely about your audience. Will they be able to understand | ||
each slide as it comes up? Will they understand why each slide is | ||
important? As a heuristic, I often find it best to prepare the first | ||
version of a talk with <i>one</i> specific person I know well in
mind - and think about what I would say to engage and inform him or | ||
her specifically.<p> | ||
|
||
Talk at the right level for your audience. Remember that, almost by | ||
definition, you understand the material and they don't, and fight the | ||
inclination to go too fast. Be aware of people's cognitive | ||
limitations. Don't make your audience figure something out if you | ||
don't have to; that will save more processing power for what you want | ||
them to focus on. In particular: | ||
<UL> | ||
<LI> Don't ask people to listen to one thing and read another at the | ||
same time. <LI> Don't ask people to remember an equation or | ||
definition 5 slides later: just put up a copy when you refer to it. | ||
<LI> Use direct, simple language. For example, if there are three ways | ||
to refer to something, pick one and use it consistently throughout | ||
your talk: don't call something a “model” on one slide and
a “parameter vector” on another. | ||
<LI> Label every graph clearly and in large fonts: both axes, every line, and even the sign of any comparison you want to make (“higher is better”). | ||
<LI> If a fact is important, emphasize it. The audience doesn't necessarily manage to process every word you say. Help them process your talk by telling them what is important, and by repeating things they might have forgotten. | ||
</UL> | ||
<p> | ||
|
||
Synthesize. The audience should get something out of your talk that | ||
they can't get as quickly or easily out of the paper(s). This means: | ||
pull together concepts from multiple papers if necessary; compare to | ||
related work; communicate your judgement about benefits and | ||
limitations of each technique.<p> | ||
|
||
Be careful with equations. You can use a limited number of | ||
equations if you want to, but make sure that you spend enough time | ||
explaining them that the audience truly understands them. | ||
<ul> | ||
<li> Often, it's a good idea to leave the slide blank and hand-write the equations on it during the actual talk; this trick will keep you from going too fast. Of course, this trick only works if you have a tablet or (gasp) an analog device like a whiteboard or overhead transparencies. | ||
<li> If you use this trick, make sure you practice writing out the equations ahead of time at the same level of detail that you plan to use during the talk. Don't just assume they're simple enough that you can't possibly get them wrong; that assumption is usually false.
</ul> | ||
<p> | ||
|
||
Organize well: | ||
|
||
<ul> | ||
<li> Introduce one new concept at a time. Make sure you know, for every part of every slide, which concept it is intended to convey. Make sure you can describe each concept with a clear, short phrase -- else it's probably more than one concept.
<li> Introduce concepts in the right order. If concept B depends on concept A, make sure to introduce A first. | ||
<li> Sometimes it helps to make a directed graph: nodes are the short phrases for concepts, and arrows represent prerequisites. You can then check that your talk is consistent with the graph (i.e., doesn't try to reverse any arrows). | ||
<li> If there are directed cycles in your graph, you have a problem. Try to refactor your concepts and pull out something that you can introduce before any of the nodes in a cycle, then re-evaluate the dependencies, and repeat until you get a DAG. | ||
|
||
</ul> | ||
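<p>
The prerequisite-graph check above can be automated. The Python sketch
below is one possible way to do it, using the standard library's
<code>graphlib</code> module (Python 3.9 or later). The function names
and concept names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set, Tuple


def talk_order(prerequisites: Dict[str, Set[str]]) -> List[str]:
    """Return an order that introduces every prerequisite first.

    `prerequisites` maps a concept to the concepts it depends on.
    Raises ValueError if the graph contains a directed cycle.
    """
    try:
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError as err:
        # The second element of err.args is the list of nodes on the cycle.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"cyclic prerequisites: {cycle}") from err


def reversed_arrows(slide_order: List[str],
                    prerequisites: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """List (prerequisite, concept) pairs that the talk presents backwards.

    Every concept in `prerequisites` must appear in `slide_order`.
    """
    position = {concept: i for i, concept in enumerate(slide_order)}
    return [(pre, concept)
            for concept, pres in prerequisites.items()
            for pre in sorted(pres)
            if position[pre] > position[concept]]


# Hypothetical concepts: C needs A and B, and B needs A.
graph = {"B": {"A"}, "C": {"A", "B"}}
print(talk_order(graph))
print(reversed_arrows(["B", "A", "C"], graph))
```

If <code>talk_order</code> raises an error, the concepts need to be
refactored as described above before any slide order can satisfy the
graph.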
<p> | ||
|
||
Start and end your talk well: | ||
<ul> | ||
<li> If possible, put up your title slide while you're being introduced. Then you don't need to read it. | ||
<li> Make sure the audience knows who you are, especially if you're talking about a paper with multiple authors. You may want to put your name at the bottom of every slide, for people who come in late. | ||
<li> Make sure you know the first few sentences of your talk by heart. Exact memorization is usually a bad idea for the body of the talk (it sounds stilted), but I find that knowing the first sentence or two helps me get started. (And once I get started I can almost always keep going.) | ||
|
||
<li> Make sure you have an obvious end to your talk, and don't just trail off into silence. Always end with a statement (e.g., “thank you”), not a question (e.g., “any questions?”). If you end with a question, the audience doesn't know whether to answer it or applaud, which can be awkward.
</ul> | ||
|
||
Audiences hate to have their time wasted. So: | ||
<ul> | ||
<li> Whenever you can do a little work to save your audience a little | ||
work, you should. E.g., make a better visualization or a better | ||
figure, if you think it will improve your audience's ability to | ||
understand. Or, take that huge table of timing results from your | ||
paper and translate it into a bar chart that highlights the | ||
comparisons you're trying to make. | ||
|
||
<li> View an agreement to give a talk as a commitment. Don't | ||
cancel unless you really, really need to. If you do have to | ||
cancel, give as much notice as you can. | ||
|
||
<li> Plan to show up early. That way if something goes wrong | ||
(miss a bus, projector doesn't work, etc.), you have time to fix | ||
it. Snafus like the above are part of the normal order of the | ||
world, and somehow seem to be even more common when you're about to | ||
give a talk. Speakers should therefore expect and plan for them. | ||
|
||
<li> Know your tools. Make sure you know how to hook your laptop | ||
up to a projector, how to operate your presentation software quickly | ||
and unobtrusively, how to avoid having instant messages pop up on top | ||
of your slides, etc. | ||
|
||
</ul> | ||
<p> | ||
|
||
Don't waste your own time either. Don't spend lots of time | ||
designing pretty animations, flying text, etc., unless they will | ||
actually help audience comprehension and not distract from your | ||
talk. Every second spent animating is a second you don't have | ||
for explaining your ideas.<p> | ||
|
||
</td> | ||
</tr> | ||
</table> | ||
|
||
<hr> | ||
<address><a href="mailto:wcohen@ROCKY"></a></address> | ||
<!-- Created: Mon Jan 03 15:12:34 Eastern Standard Time 2011 --> | ||
<!-- hhmts start --> | ||
Last modified: Mon Jan 03 16:09:28 Eastern Standard Time 2011 | ||
<!-- hhmts end --> | ||
</body> | ||
</html> |