[HowTo Create Unified Author Index and ACL Anthology XML Files]

- Jing-Shin Chang (jshin@csie.ncnu.edu.tw)
- Revised from: 2010/06/26 08:01:53
  - http://nlp.csie.ncnu.edu.tw/~shin/acl2010/publication/howto/aclpub/HowTo.Create.Unified.Author.Index.txt
- Online version:
  - Last Update: 2011/05/20
  - http://nlp.csie.ncnu.edu.tw/~shin/acl2010/publication/howto/aclpub/HowTo.Create.Unified.Author.Index.html


[Introduction - with Examples for ACL-2010]

The root directory of the final CDROM image includes the following files,
primarily to provide a unified author index (across the various
conferences) and unified BibTeX data (combining the individual .bib
files). The publication chair of the main conference is responsible for
creating these files.

Fortunately, the makefile $ACLPUB/make/Makefile_pubchair automates most of
the creation process, and $ACLPUB/templates/cdrom-root-files/ in the
official ACLPUB package contains template files for some of them, as used
for the latest ACL conference. You may still want to create some of the
files manually.

- index.html : root index, redirecting users to the */index.html files in
  the subdirectories of the workshops and the main conference
  - $ACLPUB/templates/mainindex.html.head can be modified and used as a
    template for creating the root index file automatically, using the
    procedure in this guide
  - you can also copy the root index.html of any previous ACL CDROM and
    modify it directly
- authors.html : unified author index, redirecting authors to their papers
  in all workshops and the main conference
  - can be created almost automatically, as outlined in the next section
  - inconsistent spelling of author names may result in multiple entries
    for the same author
  - manual correction (across workshops) is necessary before generating
    the unified author index
- ACL-2010-with-workshops.bib : combined BibTeX data file
  - simply a concatenation of */bib/*000.bib, in the same order as the
    books appear in the root index.html
  - can be created almost automatically, as outlined in the next section
  - remember to make any necessary manual fixes to .../bib/*000.bib before
    creating it
- acl2010-*.(gif|jpg) : auxiliary logo image(s) for the root index.html
  - create your own and put them into $ACLPUB/templates/cdrom-root-files/
- acl-2010.ico : "favicon", the icon image displayed in the URL field of a
  browser
  - optional; not available before ACL-2010
  - create your own and put it into $ACLPUB/templates/cdrom-root-files/
  - can be generated with a free favicon generator (such as
    http://www.favicon.cc/)
  - add a <link rel="shortcut icon" href="..."> tag (with the correct path
    name) to the header part of the root index.html (and of any other
    *.html files, if necessary)
- standard.css : standard style file used by all *.html files on the CDROM
  - available in $ACLPUB/templates/cdrom-root-files/ (and on previous ACL
    CDROMs)
  - can be modified manually to change the style of the html files (such
    as background colors and font attributes)
- autorun.inf : autorun file for automatically browsing the root
  index.html
  - available in $ACLPUB/templates/cdrom-root-files/ (and on previous ACL
    CDROMs)
  - use it as-is, unless you want to change the name of the root
    index.html or do other things automatically when the CDROM is inserted
    into a computer
- README.txt : README file explaining how to start browsing the CDROM
  - available in $ACLPUB/templates/cdrom-root-files/ (and on previous ACL
    CDROMs)
  - use it as-is; modification is optional

With the help of the makefile
$ACLPUB/make/Makefile_pubchair, you can generate the unified author index
and the unified BibTeX data almost automatically, as follows. However, you
need the 'proceedings.tgz' files from all the workshop book chairs in order
to unify the author information.

The 'proceedings.tgz' file contains the complete working directory of a
book chair. It is created by 'make all' if the book chair is working with a
non-START package, or by clicking on the "All" button in the "Generate" tab
if the book chair is working on the START platform.

Makefile_pubchair simply re-creates the CDROM images of the individual
workshops, solely for the purpose of collecting and joining the information
of all workshops. The same makefile can also be used to generate the XML
files for archiving the proceedings in the ACL Anthology. Trace the
Makefile_pubchair file for more details.


[Procedures for Generating Unified Author Index and ACL Anthology XML Files]

0. Create your working directory:

     mkdir unified ; cd unified
     mkdir books

1. Copy the latest "proceedings" directories of all book chairs into the
   appropriate subdirectories (named after the "abbrev" field in the
   "meta" file of the respective workshop):

     cp -a $wsd/ws$nn.$abbrev/$latest_date/proceedings $abbrev

   If the files and directories that symbolic links point to were not
   copied as well, make sure that all symbolic links still point to
   appropriate files or directories:

     # to list all symbolic links:
     find . -type l -ls

2. Create an empty 'isbn.eps' in each subdirectory; it seems to be needed
   at the end of the 'make all' step (csh/tcsh):

     foreach book ( `\ls` )
       touch $book/isbn.eps
     end

3. Use $ACLPUB/make/Makefile_pubchair as your Makefile:

     cp $ACLPUB/make/Makefile_pubchair Makefile
     vi Makefile
     # change the name of your unified BibTeX file, for instance:
     #   unified = ACL-2010-with-workshops

4. Run 'make all' and respond to the errors, repeatedly, until all errors
   are resolved:

     make all >&! log.MAKE.ALL
     more log.MAKE.ALL

   Common 'errors' and the usual responses:

     touch $book/db
     touch $book/cdrom/{index.html,program.html,authors.html}
     touch $book/{toc.tex,program.tex,allpapers.tex}

   * To keep the book chairs' current settings without making local
     changes, simply tell make that all the dependent files are up to date
     by touching them.
   * If you do change the local settings in a book chair's subdirectory,
     be very careful about their effects on the above files. Otherwise, you
     may accidentally change "db" and thus the order of the papers in the
     proceedings.
   * If your only purpose is to create the unified author index, you can
     cd into an individual book directory, make local modifications to its
     "db" file (using LaTeX special characters, if necessary), and run
     'make cdrom' to re-create $book/cdrom/*.html. Then go back to the
     upper directory and re-generate the unified author index:

       cd $book
       mv cdrom cdrom.prev_version
       mkdir tex.prev_version
       mv toc.tex program.tex allpapers.tex advertisement.html tex.prev_version
       make cdrom
       diff -rwb cdrom.prev_version cdrom   # make sure the differences are as expected
       cd ..
       make all

5. Outputs:

   cdrom/$unified
     - the unified BibTeX file
     - a concatenation of $book/*.bib, in the order in which the books
       appear in the unified (root directory) index.html
     - you can also create it manually (csh/tcsh):

         # name of the combined file, for example:
         set allBiB = ACL-2010-with-workshops.bib
         # start from an empty file
         cp /dev/null $allBiB
         # book names, sorted by their order in $root/index.html
         set allbooks = ( `grep '/index.html' index.html | sed 's|.*href=\"\(.*\)/index.html.*|\1|'` )
         foreach book ( $allbooks )
           cat $book/*.bib >> $allBiB
         end

   cdrom/authors.html
     - the unified author index

6. Merge the entries of authors who have multiple accepted papers but whose
   names are spelled differently:
   - view the initial authors.html, and check adjacent entries to see
     whether they actually belong to the same author
   - some author names are spelled with the first and last names inverted
     - e.g., Chinese names (confusion about first/last names sometimes
       happens)
     - e.g., "Huang, Chu-Ren" vs.
"Chu-Ren, Huang" - some authors used different capitalization convention to spell their names - e.g., "Lu, Bao-liang" vs "LU, Bao-Liang" - fully capitalized last name - especially Chinese or Japanese names - capitalized name after the hyphens - some names include middle name initials, some don't - e.g., "Radev, Dragomir R." vs "Radev, Dragomir" - e.g., "Kate, Rohit J." vs "Kate, Rohit" - some names have a "." at the end of their middle name initials, and some don't - e.g., "Smith, Noah A." vs "Smith, Noah A" - some names used different first names - e.g., "Jurafsky, Daniel" vs "Jurafsky, Dan" - some punctuations (accent marks) are removed, some aren't - e.g., "Tsujii, Jun'ichi" vs. "Tsujii, Junichi" - some use accent characters, some don't - "López de Lacalle, Oier" vs. "Lopez de Lacalle, Oier" - "Bojar, Ond\v{r}ej" vs. "Bojar, Ondrej" - "Tsou, Benjamin K." vs. "K. Tsou, Benjamin" - unconverted accent characters (not currently supported by ACLPUB scripts) vs their ASCII norm entered by authors - "Chrupa{\l}a, Grzegorz => Chrupaa, Grzegorz" vs "Chrupala, Grzegorz" - small letter 'l' with stroke = "Durgar El-Kahlout, \.{I}lknur => Durgar El-Kahlout, \.Ilknur" vs "Durgar El-Kahlout, Ilknur" - small/uppercase letter 'I' with dot above - may need manual editing after creating unified author index - special names that need manuall editing - "Mausam, Mausam": author with one single name "Mausam" but enter into both "lastname" and "firtname" fields to fit ACLPUB's submission form - correct output: "Mausam" (one name only) - "Wei, Wei": author whose lastname and firstname have the same transliterations - correct output: "Wei, Wei" (as-is) - different authors but with the same (normal/abbreviated/transliterated) name ?? - likely to happen in the future - should not be merged, but should be given different internal ID's in internal "symbol table" (name list) - may need manual editing after creating unified author index