[HowTo Create Unified Author Index and ACL Anthology XML Files]

	- Jing-Shin Chang (jshin@csie.ncnu.edu.tw)
	- Revised from: 2010/06/26 08:01:53
		- http://nlp.csie.ncnu.edu.tw/~shin/acl2010/publication/howto/aclpub/HowTo.Create.Unified.Author.Index.txt
	- Online version:
		- Last Update: 2011/05/20
		- http://nlp.csie.ncnu.edu.tw/~shin/acl2010/publication/howto/aclpub/HowTo.Create.Unified.Author.Index.html

[Introduction - with Examples for ACL-2010]

The root directory of the final CDROM image includes the files listed below,
primarily to provide a unified author index (across the various conferences
and workshops) and a unified BibTeX data file (combining the individual .bib files).

The publication chair of the main conference is responsible for creating
these files. Fortunately, the makefile, $ACLPUB/make/Makefile_pubchair,
automates the creation process to a large extent.

$ACLPUB/templates/cdrom-root-files/ of the official ACLPUB
package contains some of the template files for the latest ACL
conference. You may also prefer to create some of the files manually.


	- index.html : root index, re-directing users to */index.html
		of various subdirectories for workshops and main conference

		- $ACLPUB/templates/mainindex.html.head can be modified
			and used as a template for creating the root index
			file automatically using the procedure in this guide.

		- you can also copy any previous ACL CDROM root index.html
			and modify it directly. 

	- authors.html : unified author index, re-directing authors
			 to their papers in all workshops and main conference
		- can be created almost automatically as outlined in the next section
		- inconsistent spelling of author names may result in multiple entries
			for the same author
		- manual correction (across workshops) is necessary before generating
			the unified author index

	- ACL-2010-with-workshops.bib : combined BIB data file
		- is simply a concatenated version of */bib/*000.bib,
			in the same order as they appear in the root index.html
		- can be created almost automatically as outlined in the next section
		- remember to make necessary manual fixes to .../bib/*000.bib before creation

	- acl2010-*.(gif|jpg) : auxiliary LOGO image(s) for root index.html
		- create your own and put them into $ACLPUB/templates/cdrom-root-files/

	- acl-2010.ico : "favicon", icon image to be displayed in the URL field of a browser
		- optional, not available before ACL-2010
		- create your own and put it into $ACLPUB/templates/cdrom-root-files/
		- can be generated using a free favicon generator (such as http://www.favicon.cc/)
		- add a favicon <link> tag, e.g. <link rel="shortcut icon" href="acl-2010.ico">
			(with correct path name) in the header part of the root index.html
			(and any other *.html files, if necessary)

	- standard.css : standard style file used by all *.html files in the CDROM
		- available in $ACLPUB/templates/cdrom-root-files/ (and previous ACL CDROM's)
		- can be modified manually to change the style of the html files
			(such as background colors, font attributes)

	- autorun.inf : autorun file for automatically opening the root index.html
		- available in $ACLPUB/templates/cdrom-root-files/ (and previous ACL CDROM's)
		- use it as-is unless you want to change the name of the root index.html
			or do other things automatically when the CDROM is inserted

	- README.txt : README file on how to start browsing the CDROM
		- available in $ACLPUB/templates/cdrom-root-files/ (and previous ACL CDROM's)
		- use it as-is; modification is optional

With the help of $ACLPUB/make/Makefile_pubchair, you can generate
the unified author index and the unified BIB data almost
automatically, as described below. You need the 'proceedings.tgz' files from
all the workshop book chairs in order to unify the author information.

The 'proceedings.tgz' includes the complete working directory of a book chair.
It can be created (tarred) by running 'make all' if the book chair is working with
a non-START package, or by clicking on the All button in the Generate tab
if the book chair is working with the START platform.

The Makefile_pubchair simply re-creates the cdrom images of the individual workshops,
but only for the purpose of collecting and joining the information of all workshops.
The same makefile can also be used to generate XML files for archiving the
proceedings to the ACL Anthology.

Trace the Makefile_pubchair file for more details.

[Procedures for Generating Unified Author Index and ACL Anthology XML Files]

0. create your working directory

	mkdir unified ; cd unified
	mkdir books

1. copy the latest "proceedings" directories from all
	book chairs into appropriate subdirectories
	(each named after the "abbrev" field of the "meta" file of the respective workshop)

	cp -a $wsd/ws$nn.$abbrev/$latest_date/proceedings $abbrev

	Ensure that all symbolic links still point to existing
	files or directories, since the link targets may
	not have been copied along with the links.

	# to list all symbolic links:
	find . -type l -ls
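
	The listing above shows every link, so the broken ones still have to be
	spotted by eye. As a minimal sketch (not part of ACLPUB), the Python helper
	below reports only the links whose targets are missing. The function name
	find_broken_symlinks is hypothetical.

```python
import os
from pathlib import Path


def find_broken_symlinks(root="."):
    """Return (link, target) pairs for symlinks under root whose target is missing."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into symlinked directories by default,
        # but it still lists them, so check both name lists.
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # is_symlink() inspects the link itself; exists() follows it.
            if path.is_symlink() and not path.exists():
                broken.append((str(path), os.readlink(path)))
    return sorted(broken)
```

	Calling find_broken_symlinks(".") from the working directory returns an
	empty list when every link resolves.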

2. create an empty 'isbn.eps' for each subdirectory,
	which seems to be necessary at the end of the
	'make all' step

	# in csh/tcsh:
	foreach book ( `\ls` )
		touch $book/isbn.eps
	end

3. use the $ACLPUB/make/Makefile_pubchair as your
	Makefile

	cp $ACLPUB/make/Makefile_pubchair Makefile
	vi Makefile
	# change the name of your unified BibTeX file, for instance:
	# unified = ACL-2010-with-workshops

4. do a 'make all' and respond to all errors,
	repeating until all errors are
	resolved

	make all >&! log.MAKE.ALL
	more log.MAKE.ALL

	common 'errors' and how to respond:

	- touch $book/db
	- touch $book/cdrom/{index.html,program.html,authors.html}
	- touch $book/{toc.tex,program.tex,allpapers.tex}

	* to use the book chairs' current settings
		without making local changes,
		simply tell make that all the
		dependent files are up to date
		by touching them

	* If you did change the local settings of some
		book chair's subdirectory, you should
		be very careful about their effects
		on the above files. Otherwise,
		you may accidentally change
		"db" and thus the order of the papers
		in the proceedings.

	* You can cd into an individual book directory,
		make local modifications to the "db" file
		(using LaTeX special characters, if necessary)
		and run 'make cdrom' to create $book/cdrom/*.html
		if your only purpose is to
		create the unified author index.
		Then go back to the upper directory
		and re-generate the unified author index.

		- cd $book
		- mv cdrom cdrom.prev_version
		- mkdir tex.prev_version
		- mv toc.tex program.tex allpapers.tex advertisement.html tex.prev_version
		- make cdrom
		- diff -rwb cdrom.prev_version cdrom # make sure only the intended changes appear
		- cd ..
		- make all
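
	The touch commands listed under "common errors" can be applied to one book
	directory in a single call. The sketch below is an illustration, not part
	of ACLPUB: touch_placeholders and PLACEHOLDERS are hypothetical names, the
	file list is copied from the items above, and it assumes that touching the
	files in the listed order suits the Makefile's dependency rules. Unlike
	plain touch, it also creates a missing cdrom/ directory.

```python
from pathlib import Path

# File list copied from the "common errors" items, in the same order.
PLACEHOLDERS = [
    "db",
    "cdrom/index.html",
    "cdrom/program.html",
    "cdrom/authors.html",
    "toc.tex",
    "program.tex",
    "allpapers.tex",
]


def touch_placeholders(book_dir, names=PLACEHOLDERS):
    """Create each file if missing, or refresh its timestamp if present."""
    touched = []
    for name in names:
        path = Path(book_dir) / name
        path.parent.mkdir(parents=True, exist_ok=True)  # e.g. cdrom/
        path.touch()  # updates the timestamp only; contents are untouched
        touched.append(str(path))
    return touched
```

	Only use it for books whose current settings you intend to keep, as
	explained in the notes above.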

5. output

	cdrom/$unified	- unified BibTeX file
		- concatenation of $book/*.bib in the order they appear
		in the unified (root directory) index.html

		- You can also create it as follows:

		- 0. touch $allBiB
		- 1. set allbooks = ( `grep '/index.html' index.html | sed 's|.*href=\"\(.*\)/index.html.*|\1|'` )
			- get book names sorted by their order in $root/index.html
		- 2. foreach book ( $allbooks )
		- 2.1 cat $book/*.bib >> $allBiB
		- 2.2 end

	cdrom/authors.html	- unified author index
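
	For readers who do not work in csh, the manual procedure for the unified
	BibTeX file in step 5 can be sketched in Python as follows. This is an
	illustration under assumptions, not the Makefile's own method: the function
	names and the out_name and pattern parameters are hypothetical, and the
	regular expression mirrors the sed expression above. Unlike the csh
	version, it lists each book only once and overwrites the output file
	instead of appending to it.

```python
import re
from pathlib import Path

# Matches href="BOOK/index.html" and captures BOOK.
HREF_RE = re.compile(r'href="([^"]+)/index\.html"')


def books_in_index_order(index_html):
    """Return book directory names in the order they are linked from the root index."""
    books = []
    for book in HREF_RE.findall(index_html):
        if book not in books:  # keep the first occurrence only
            books.append(book)
    return books


def concatenate_bib(root, out_name, pattern="*.bib"):
    """Concatenate each book's .bib files into root/out_name, in index order."""
    root = Path(root)
    index_html = (root / "index.html").read_text(encoding="utf-8", errors="replace")
    out_path = root / out_name
    with out_path.open("wb") as out:
        for book in books_in_index_order(index_html):
            for bib in sorted((root / book).glob(pattern)):
                data = bib.read_bytes()  # copy bytes unchanged, like cat
                out.write(data)
                if not data.endswith(b"\n"):
                    out.write(b"\n")  # keep entries from running together
    return out_path
```

	For example, concatenate_bib(".", "ACL-2010-with-workshops.bib") run in the
	directory that holds the root index.html writes the combined file there.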

6. merge entries for authors with multiple accepted papers but with
	different spellings for their names

	- view the initial authors.html, and check adjacent entries
		to see if they should actually be the same author

	- some author names are spelled with first name and last name inverted
		- e.g., Chinese names (confusion about first/last names sometimes happens)
		- e.g., "Huang, Chu-Ren" vs. "Chu-Ren, Huang"

	- some authors used different capitalization conventions to spell their names
		- e.g., "Lu, Bao-liang" vs "LU, Bao-Liang"
		- fully capitalized last name
			- especially Chinese or Japanese names
		- capitalized name after the hyphens

	- some names include middle name initials, some don't
		- e.g., "Radev, Dragomir R." vs "Radev, Dragomir"
		- e.g., "Kate, Rohit J." vs "Kate, Rohit"

	- some names have a "." at the end of their middle name initials, and some don't
		- e.g., "Smith, Noah A." vs "Smith, Noah A"

	- some authors use different forms of their first names
		- e.g., "Jurafsky, Daniel" vs "Jurafsky, Dan"

	- some punctuation marks (such as apostrophes) are kept, some are removed
		- e.g., "Tsujii, Jun'ichi" vs. "Tsujii, Junichi"

	- some use accented characters, some don't
		- e.g., "López de Lacalle, Oier" vs. "Lopez de Lacalle, Oier"
		- e.g., "Bojar, Ond\v{r}ej" vs. "Bojar, Ondrej"

	- some middle name initials end up in the last name field
		- e.g., "Tsou, Benjamin K." vs. "K. Tsou, Benjamin"

	- unconverted accented characters (not currently supported by ACLPUB scripts)
		vs their ASCII forms entered by authors

		- "Chrupa{\l}a, Grzegorz => Chrupaa, Grzegorz" vs "Chrupala, Grzegorz"
			- small letter 'l' with stroke
		- "Durgar El-Kahlout, \.{I}lknur => Durgar El-Kahlout, \.Ilknur" vs "Durgar El-Kahlout, Ilknur"
			- uppercase letter 'I' with dot above

		- may need manual editing after creating unified author index

	- special names that need manual editing
		- "Mausam, Mausam": author with a single name "Mausam", entered into
			both the "lastname" and "firstname" fields to fit ACLPUB's
			submission form
			- correct output: "Mausam" (one name only)
		- "Wei, Wei": author whose lastname and firstname have the same
			transliterations
			- correct output: "Wei, Wei" (as-is)

		- different authors with the same (normal/abbreviated/transliterated) name
			- likely to happen in the future
			- should not be merged, but should be given different
				internal ID's in internal "symbol table" (name list)

		- may need manual editing after creating unified author index
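
	Checking adjacent entries by eye misses variants that sort far apart, such
	as inverted first/last names. The Python sketch below, which is not part of
	ACLPUB, groups names that share a normalized key so that they can be
	reviewed together. The names name_key and candidate_duplicates are
	hypothetical, and the rules (strip LaTeX and Unicode accents, drop
	apostrophes and one-letter initials, ignore case and token order) are
	assumptions drawn from the examples in this step. It only flags candidates:
	it does not catch nickname variants such as "Daniel" vs "Dan", and it may
	group different authors who happen to share a key, so every merge still
	needs a manual decision.

```python
import re
import unicodedata
from collections import defaultdict

# LaTeX accent commands such as \v{r}, \'e, \.{I}: keep only the base letter.
LATEX_ACCENT = re.compile(r"\\[a-zA-Z.'`^\"~=]\s*\{?([A-Za-z])\}?")
# Stand-alone LaTeX letters such as {\l}: use the plain ASCII letter instead.
LATEX_LETTER = re.compile(r"\{?\\(l|L|o|O|i|j)\b\}?")
# Unicode letters that have no accent decomposition.
NO_DECOMP = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ı": "i"})


def name_key(name):
    """Return a comparison key that ignores the spelling variations listed above."""
    s = LATEX_LETTER.sub(r"\1", name)
    s = LATEX_ACCENT.sub(r"\1", s)
    s = re.sub(r"[{}]", "", s)  # leftover grouping braces
    s = s.translate(NO_DECOMP)
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    s = s.replace("'", "").replace("\u2019", "").lower()
    # Split on anything that is not a letter; drop one-letter initials.
    tokens = [t for t in re.findall(r"[^\W\d_]+", s) if len(t) > 1]
    return " ".join(sorted(tokens))  # sorting ignores first/last inversion


def candidate_duplicates(names):
    """Group the distinct spellings that share a key, for manual review."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return [sorted(v) for _, v in sorted(groups.items()) if len(v) > 1]
```

	Feeding it the author strings collected from all books gives a short list
	of groups to inspect before the unified author index is generated.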