Compiling proceedings for upload to the TALN-archives
The TALN-archives are a web-based collection of NLP papers. They are comparable to the ACL Anthology but focus on French-speaking conferences (namely the TALN and RECITAL conferences and their co-located workshops). They were built semi-automatically by Boudin (2013).
Concretely, they consist of a collection of PDF, Bib and XML files from which the full archives (web portal) are compiled. Their sources are available on GitLab.
To compile proceedings in TALN-archives format, set the corresponding option in config.toml:
out_format = "taln" # Output format
Overview of the process
Concretely, taln2x will produce a directory with the following structure:
.
├── actes
│   ├── taln-2023-long-001.pdf   # PDF of paper 001
│   └── taln-2023-long-002.pdf   # PDF of paper 002, etc.
├── bib
│   ├── taln-2023-long-001.bib   # Bib of paper 001
│   └── taln-2023-long-002.bib   # Bib of paper 002, etc.
├── taln-2023.bib                # Bib of all papers
└── taln-2023.xml                # Event metadata
It contains the conference metadata in both XML and Bib formats, and the paper data in Bib and PDF formats.
Please note that all tracks will be merged into a single event. To compile a separate directory for each track, comment out the other tracks in event.yml so that each track is compiled in a separate run of taln2x.
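The directory layout described above can be sanity-checked before upload. The following is a minimal sketch, not part of taln2x: the function name check_taln_output is hypothetical, and it assumes that every PDF in actes has a same-named Bib file in bib (and vice versa), as the example tree suggests.

```python
from pathlib import Path


def check_taln_output(root, event):
    """Return a list of problems found in a TALN-archives output directory.

    `root` is the output directory and `event` the event prefix,
    e.g. "taln-2023". An empty list means the layout looks complete.
    """
    root = Path(root)
    problems = []

    # Event-level files: merged Bib and event metadata.
    for name in (f"{event}.bib", f"{event}.xml"):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")

    # Per-paper directories.
    for sub in ("actes", "bib"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {sub}")
    if problems and any(p.startswith("missing directory") for p in problems):
        return problems

    # Assumption: each PDF in actes/ has a same-named .bib in bib/.
    pdfs = {p.stem for p in (root / "actes").glob("*.pdf")}
    bibs = {p.stem for p in (root / "bib").glob("*.bib")}
    for stem in sorted(pdfs - bibs):
        problems.append(f"no Bib entry for paper: {stem}")
    for stem in sorted(bibs - pdfs):
        problems.append(f"no PDF for paper: {stem}")
    return problems
```

Calling check_taln_output("path/to/output", "taln-2023") returns an empty list when nothing is missing; the output path here is a placeholder.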
Using TALN-archives format as an input format
Please note that the TALN-archives were used to ingest the TALN and RECITAL conferences that took place before 2012 into the ACL Anthology. To do so, taln2x was extended so that it can parse, and not only compile, proceedings in TALN-archives format.
To use this feature, add the path to the conference XML file to config.toml as follows:
xml_file = "/tmp/TALN-2001/taln-2001.xml"
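Before pointing xml_file at a file, it can help to confirm that the file exists and is well-formed XML. The sketch below uses only the Python standard library; the helper name xml_root_tag is hypothetical, and it makes no assumption about the TALN-archives schema beyond the file being XML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def xml_root_tag(path):
    """Parse the file at `path` and return the tag of its root element.

    Raises FileNotFoundError if the file is missing, and
    xml.etree.ElementTree.ParseError if it is not well-formed XML.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)
    return ET.parse(str(path)).getroot().tag
```

For example, xml_root_tag("/tmp/TALN-2001/taln-2001.xml") either returns the root element name or raises an error that explains why the file cannot be used.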