Compiling proceedings for ingestion into the ACL Anthology
When compiling proceedings in ACL format, one has to set up the output format accordingly in config.toml:
out_format = "acl" # Output format
Two other pieces of information are needed for the output to be well-formed for ingestion:
anthology_id = "jeptalnrecital" # Anthology ID
bilingual = true # Bilingual bibtex fields
The anthology_id (also known as the venue identifier) is provided by the Director of the ACL Anthology (see submission procedure) and is used in the file names of the generated PDF and bib files.
The bilingual option can be enabled to add a "language" field to the bib files, indicating whether the paper is in English or in French.
Note that language detection is done by calling the Python langdetect library on the paper's title.
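As a rough illustration of title-based detection, the sketch below wraps langdetect's `detect` function. The helper name `detect_title_language` and its `detector` parameter are hypothetical and are not part of taln2x; the way taln2x maps the result to the "language" field is not shown here.

```python
def detect_title_language(title, detector=None):
    """Return the language code guessed for a paper title (e.g. "en" or "fr").

    `detector` is a hypothetical hook that lets callers swap in another
    detection function; by default the langdetect library is used.
    """
    if detector is None:
        # Third-party dependency: pip install langdetect
        from langdetect import DetectorFactory, detect

        # langdetect is non-deterministic by default; fixing the seed
        # makes repeated runs return the same guess.
        DetectorFactory.seed = 0
        detector = detect
    return detector(title)
```

Calling `detect_title_language(title)` on a paper's title returns a language code. Titles are short, so the guess can be wrong for some papers; it may be worth checking the generated bib files by hand.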
Overview of the process
When compiling proceedings in ACL format, taln2x builds a directory adhering to the following structure (example taken from the ACL anthology documentation):
proceedings/
  meta                      Conference information
  cdrom/
    semeval-2018.bib        Bib entries (all papers)
    semeval-2018.pdf        PDF of whole proceedings
    bib/
      2018.semeval-1.0.bib  BibTeX entry for volume
      2018.semeval-1.1.bib  BibTeX entry for paper 1
      2018.semeval-1.2.bib  etc.
    pdf/
      2018.semeval-1.0.pdf  PDF of frontmatter
      2018.semeval-1.1.pdf  PDF for paper 1
      2018.semeval-1.2.pdf  etc.
This directory contains PDF and bib files for both the full proceedings and the individual articles. If these were already compiled by taln2x, you can reuse the same project: update config.toml and re-run taln2x. The proceedings in ACL format will then also appear under out/.
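The naming pattern visible in the example above can be sketched as follows. This is a minimal illustration inferred from the example file names, not taln2x's actual implementation; the function name and its parameters are hypothetical.

```python
def anthology_file_names(year, anthology_id, volume, n_papers):
    """Build the file names expected for one volume.

    Paper number 0 stands for the volume itself (BibTeX entry for the
    volume, PDF of the frontmatter); papers are numbered from 1.
    """
    stems = [f"{year}.{anthology_id}-{volume}.{i}" for i in range(n_papers + 1)]
    return {
        # Whole-proceedings files
        "cdrom": [f"{anthology_id}-{year}.bib", f"{anthology_id}-{year}.pdf"],
        # One BibTeX entry and one PDF per item
        "bib": [f"{stem}.bib" for stem in stems],
        "pdf": [f"{stem}.pdf" for stem in stems],
    }


names = anthology_file_names(2018, "semeval", 1, n_papers=2)
print(names["bib"])
# ['2018.semeval-1.0.bib', '2018.semeval-1.1.bib', '2018.semeval-1.2.bib']
```

With `anthology_id = "jeptalnrecital"` in config.toml, the same pattern would presumably give names such as `2018.jeptalnrecital-1.1.pdf`, the year and volume depending on the event.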
On top of these PDF and Bib files, a plain text file named meta of the following form is added:
abbrev SemEval
volume 1
title 12th International Workshop on Semantic Evaluation
booktitle Proceedings of the 12th International Workshop on Semantic Evaluation
shortbooktitle Proceedings of SemEval
month January
year 2018
sig siglex
chairs Marianna Apidianaki
chairs Mohammad, Saif M.
chairs Jonathan May
chairs Ekaterina Shutova
chairs Steven Bethard
chairs Marine Carpuat
location Berlin, Germany
publisher Association for Computational Linguistics
This file is compiled by slightly reformatting the information contained in event.yml.
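The "key value" layout of the meta file, with one line per chair, can be sketched as below. The dictionary stands in for data loaded from event.yml; its keys are assumptions made for illustration, since the real event.yml fields and taln2x's conversion code are not shown here.

```python
def render_meta(event):
    """Render a mapping as meta-file text: one "key value" pair per line.

    List values (e.g. several chairs) produce one line per item,
    each repeating the key.
    """
    lines = []
    for key, value in event.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            lines.append(f"{key} {item}")
    return "\n".join(lines) + "\n"


# Hypothetical event data; real event.yml keys may differ.
event = {
    "abbrev": "SemEval",
    "volume": 1,
    "year": 2018,
    "chairs": ["Marianna Apidianaki", "Jonathan May"],
}
print(render_meta(event), end="")
# abbrev SemEval
# volume 1
# year 2018
# chairs Marianna Apidianaki
# chairs Jonathan May
```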