[HowTo Create Unified Author Index and ACL Anthology XML Files]
- Jing-Shin Chang (jshin@csie.ncnu.edu.tw)
- Revised from: 2010/06/26 08:01:53
- http://nlp.csie.ncnu.edu.tw/~shin/acl2010/publication/howto/aclpub/HowTo.Create.Unified.Author.Index.txt
- Online version:
- Last Update: 2011/05/20
- http://nlp.csie.ncnu.edu.tw/~shin/acl2010/publication/howto/aclpub/HowTo.Create.Unified.Author.Index.html
[Introduction - with Examples for ACL-2010]
The root directory of the final CDROM image includes the following files,
primarily to provide a unified author index (across the various conferences)
and unified BIB data (combining the individual .bib files).
The publication chair of the main conference is responsible for creating
these files. Fortunately, the makefile, $ACLPUB/make/Makefile_pubchair,
automates the creation process to a large extent.
$ACLPUB/templates/cdrom-root-files/ of the official ACLPUB
package contains some of the template files for the latest ACL
conference. You may also want to create some of the files manually.
- index.html : root index, re-directing users to */index.html
of various subdirectories for workshops and main conference
- $ACLPUB/templates/mainindex.html.head can be modified
and used as a template for creating the root index
file automatically using the procedure in this guide.
- you can also copy any previous ACL CDROM root index.html
and modify it directly.
- authors.html : unified author index, re-directing authors
to their papers in all workshops and main conference
- can be created almost automatically as outlined in the next section
- inconsistent spelling of author names may result in multiple entries
for the same author
- manual correction (across workshops) is necessary before generating
the unified author index
- ACL-2010-with-workshops.bib : combined BIB data file
- is simply a concatenated version of */bib/*000.bib,
in the same order as they appear in the root index.html
- can be created almost automatically as outlined in the next section
- remember to make necessary manual fixes to .../bib/*000.bib before creation
- acl2010-*.(gif|jpg) : auxiliary LOGO image(s) for root index.html
- create your own and put them into $ACLPUB/templates/cdrom-root-files/
- acl-2010.ico : "favicon", icon image to be displayed in the URL field of a browser
- optional, not available before ACL-2010
- create your own and put it/them into $ACLPUB/templates/cdrom-root-files/
- generated using some free favicon generator (such as http://www.favicon.cc/)
- add a <link rel="shortcut icon" href="acl-2010.ico"> tag
(with correct path name) in the header part of the root index.html
(and any other *.html files, if necessary)
- standard.css : standard style file used by all *.html files in the CDROM
- available in $ACLPUB/templates/cdrom-root-files/ (and previous ACL CDROM's)
- can be modified manually to change the style of the html files
(such as background colors, font attributes)
- autorun.inf : autorun file for automatically opening the root index.html
- available in $ACLPUB/templates/cdrom-root-files/ (and previous ACL CDROM's)
- use it as-is unless you want to change the name of the root index.html
or do other things automatically when inserting the CDROM into your laptop
- README.txt : README file on how to start browsing the CDROM
- available in $ACLPUB/templates/cdrom-root-files/ (and previous ACL CDROM's)
- use it as-is; modification is optional
With the help of $ACLPUB/make/Makefile_pubchair, you can generate
the unified author index and the unified BIB data almost automatically,
as follows. You need the 'proceedings.tgz' files from
all the workshop bookchairs in order to unify the author information.
The 'proceedings.tgz' includes the complete working directory of a bookchair.
It can be tarred up using 'make all' if the bookchair is working with
a non-START package, or by clicking on the All button in the Generate tab
if the bookchair is working with the START platform.
The Makefile_pubchair simply re-creates the cdrom images of the individual workshops,
but only for the purpose of collecting and joining the information of all workshops.
The same makefile can also be used to generate XML files for archiving the
proceedings to the ACL Anthology.
Trace the Makefile_pubchair file for more details.
[Procedures for Generating Unified Author Index and ACL Anthology XML Files]
0. create your working directory
mkdir unified ; cd unified
mkdir books
1. copy the latest "proceedings" directories from all
book chairs into appropriate subdirectories
(named after the "abbrev" field of the "meta" file of the respective workshop).
cp -a $wsd/ws$nn.$abbrev/$latest_date/proceedings $abbrev
Ensure that all symbolic links point to the appropriate
files or directories, in case the linked-to files and
directories were not copied as well.
# to list all symbolic links:
find . -type l -ls
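Beyond listing every link, it can help to list only the links whose targets are missing. The following is a minimal sketch in Bourne shell (sh/bash, not the csh used elsewhere in this guide); the demo directory and file names are hypothetical and only stand in for a copied "proceedings" tree.

```shell
#!/bin/sh
# Demo tree (hypothetical names): one valid link and one dangling link.
work=$(mktemp -d)
mkdir -p "$work/WS1"
touch "$work/WS1/real.pdf"
ln -s real.pdf "$work/WS1/ok.pdf"          # target exists
ln -s missing.pdf "$work/WS1/broken.pdf"   # target was not copied

# 'test -e' follows the link, so it fails exactly for dangling links;
# '!' negates it, and find prints only those links.
broken=$(cd "$work" && find . -type l ! -exec test -e {} \; -print)
echo "$broken"

rm -rf "$work"
```

In a real working directory, only the find command is needed; run it from the directory that holds the copied subdirectories.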
2. create an empty 'isbn.eps' for each subdirectory,
which seems to be necessary at the end of the
'make all' step
shin@nlp [3485] >> foreach book ( `\ls` )
foreach? touch $book/isbn.eps
foreach? end
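If csh/tcsh is not available, the same loop can be written for a Bourne-type shell. This is a sketch under the assumption that every subdirectory of the current directory is a book directory; the directory names in the demo setup are hypothetical.

```shell
#!/bin/sh
# Demo setup (hypothetical book directories standing in for the real ones).
work=$(mktemp -d)
mkdir "$work/ACL2010" "$work/WS1" "$work/WS2"
cd "$work" || exit 1

# Bourne-shell equivalent of the csh 'foreach' loop above:
# create an empty isbn.eps in every subdirectory.
# The glob '*/' matches directories only, and each match ends in '/'.
for book in */; do
    touch "${book}isbn.eps"
done

ls */isbn.eps
```

Note that touch leaves an existing isbn.eps untouched apart from its timestamp, so re-running the loop is harmless.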
3. use the $ACLPUB/make/Makefile_pubchair as your
Makefile
cp $ACLPUB/make/Makefile_pubchair Makefile
vi Makefile
# change the name of your unified BibTeX file, for instance:
# unified = ACL-2010-with-workshops
4. do a 'make all' and respond to all errors,
repeating until all errors are
resolved
make all >&! log.MAKE.ALL
more log.MAKE.ALL
common 'errors' to respond to:
- touch $book/db
- touch $book/cdrom/{index.html,program.html,authors.html}
- touch $book/{toc.tex,program.tex,allpapers.tex}
* to use the current bookchairs' settings
without making local changes,
simply tell make that all the
dependent files are up to date
by touching them
* If you did change the local settings of some
bookchair's subdirectory, you should
be very careful about their effects
on the above files. Otherwise,
you may accidentally change
"db" and thus the order of the papers
in the proceedings.
* You can cd into an individual book directory,
make local modifications to the "db" file
(using LaTeX special characters, if necessary)
and 'make cdrom' to create $book/cdrom/*.html
if your only purpose is to
create the unified author index.
You then go back to the upper directory
and re-generate the unified author index.
- cd $book
- mv cdrom cdrom.prev_version
- mkdir tex.prev_version
- mv toc.tex program.tex allpapers.tex advertisement.html tex.prev_version
- make cdrom
- diff -rwb cdrom.prev_version cdrom # make sure only the intended changes appear
- cd ..
- make all
5. output
cdrom/$unified - unified BibTeX files
- concatenation of $book/*.bib in the order they appear
in the unified (root directory) index.html
- You can also create it as follows:
- 0. touch $allBiB
- 1. set allbooks = ( `grep '/index.html' index.html | sed 's|.*href=\"\(.*\)/index.html.*|\1|'` )
- get book names sorted by their order in $root/index.html
- 2. foreach book ( $allbooks )
- 2.1 cat $book/*.bib >> $allBiB
- 2.2 end
cdrom/authors.html - unified author index
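The manual concatenation outlined under step 5 can also be sketched for a Bourne-type shell. This is an illustration, not the method used by Makefile_pubchair: the variable names (work, allbib), the demo file names, and the demo index.html content are hypothetical, and the sketch assumes the */bib/*000.bib layout mentioned in the introduction.

```shell
#!/bin/sh
# Demo setup (hypothetical content): a root index.html linking to two books,
# each with one .bib file. Real files come from the 'make all' output.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p ACL2010/bib WS1/bib
printf '@Book{main,\n}\n' > ACL2010/bib/main000.bib
printf '@Book{ws1,\n}\n'  > WS1/bib/ws1000.bib
cat > index.html <<'EOF'
<a href="ACL2010/index.html">Main conference</a>
<a href="WS1/index.html">Workshop 1</a>
EOF

allbib=ACL-2010-with-workshops.bib
: > "$allbib"                      # start from an empty file

# Extract the book names in the order their links appear in index.html,
# then append each book's .bib file(s) to the combined file.
sed -n 's|.*href="\([^"]*\)/index\.html".*|\1|p' index.html |
while read -r book; do
    cat "$book"/bib/*000.bib >> "$allbib"
done

cat "$allbib"
```

The sed expression assumes one book link per line of index.html; if several links share a line, only the last one on that line is picked up.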
6. merge entries for authors with multiple accepted papers but with
different spellings for their names
- view the initial authors.html, and check adjacent entries
to see if they should actually be the same author
- some author names are spelled with first name and last name inverted
- e.g., Chinese names (confusion about first/last names sometimes happens)
- e.g., "Huang, Chu-Ren" vs. "Chu-Ren, Huang"
- some authors used different capitalization conventions to spell their names
- e.g., "Lu, Bao-liang" vs "LU, Bao-Liang"
- fully capitalized last name
- especially Chinese or Japanese names
- capitalized name after the hyphens
- some names include middle name initials, some don't
- e.g., "Radev, Dragomir R." vs "Radev, Dragomir"
- e.g., "Kate, Rohit J." vs "Kate, Rohit"
- some names have a "." at the end of their middle name initials, and some don't
- e.g., "Smith, Noah A." vs "Smith, Noah A"
- some names used different first names
- e.g., "Jurafsky, Daniel" vs "Jurafsky, Dan"
- some punctuation marks (e.g., apostrophes) are removed, some aren't
- e.g., "Tsujii, Jun'ichi" vs. "Tsujii, Junichi"
- some use accent characters, some don't
- "López de Lacalle, Oier" vs. "Lopez de Lacalle, Oier"
- "Bojar, Ond\v{r}ej" vs. "Bojar, Ondrej"
- "Tsou, Benjamin K." vs. "K. Tsou, Benjamin"
- unconverted accent characters (not currently supported by ACLPUB scripts)
vs their ASCII norm entered by authors
- "Chrupa{\l}a, Grzegorz => Chrupaa, Grzegorz" vs "Chrupala, Grzegorz"
- small letter 'l' with stroke
- "Durgar El-Kahlout, \.{I}lknur => Durgar El-Kahlout, \.Ilknur" vs "Durgar El-Kahlout, Ilknur"
- small/uppercase letter 'I' with dot above
- may need manual editing after creating unified author index
- special names that need manual editing
- "Mausam, Mausam": author with one single name "Mausam" that was entered into
both the "lastname" and "firstname" fields to fit ACLPUB's
submission form
- correct output: "Mausam" (one name only)
- "Wei, Wei": author whose lastname and firstname have the same
transliterations
- correct output: "Wei, Wei" (as-is)
- different authors but with the same (normal/abbreviated/transliterated) name ??
- likely to happen in the future
- should not be merged, but should be given different
internal ID's in internal "symbol table" (name list)
- may need manual editing after creating unified author index
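Some of the variants listed in step 6 (inverted name order, capitalization, middle initials, trailing periods, apostrophes, hyphens) can be flagged mechanically before the manual review. The following Bourne-shell sketch is not part of ACLPUB; the function name name_key and the temporary file are hypothetical, and the input is assumed to be plain ASCII "Last, First" names, one per line. It does not handle accented characters or nickname variants such as "Daniel" vs "Dan", and every reported group is only a candidate: different authors can share a key, so each group still needs a human decision.

```shell
#!/bin/sh
# Build a comparison key for one name. Pipeline stages, in order:
#   lowercase; delete periods, apostrophes and hyphens; put one name part
#   per line; drop one-letter parts (initials); sort the parts so that
#   first/last name order does not matter; join the parts with spaces.
name_key() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        tr -d ".'-" |
        tr -s ', ' '\n\n' |
        awk 'length($0) > 1' |
        LC_ALL=C sort |
        paste -s -d ' ' -
}

# Demo input: author names taken from the examples in this guide.
list=$(mktemp)
cat > "$list" <<'EOF'
Huang, Chu-Ren
Chu-Ren, Huang
Lu, Bao-liang
LU, Bao-Liang
Radev, Dragomir R.
Radev, Dragomir
Tsujii, Jun'ichi
Tsujii, Junichi
Wei, Wei
EOF

# Prefix each name with its key, sort so equal keys are adjacent,
# and print every group that contains more than one spelling.
candidates=$(
    while IFS= read -r name; do
        printf '%s\t%s\n' "$(name_key "$name")" "$name"
    done < "$list" |
    LC_ALL=C sort |
    awk -F '\t' '
        NR > 1 && $1 == prev { group = group " | " $2; n++; next }
        { if (n > 1) print group; prev = $1; group = $2; n = 1 }
        END { if (n > 1) print group }
    '
)
echo "$candidates"
rm -f "$list"
```

Each output line lists the spellings that share a key, separated by " | "; names with a unique key, such as "Wei, Wei" in the demo input, are not reported.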