AIM-WEST visit to UFSCar and USP

Carlos Ramisch visited UFSCar and USP in São Carlos from Aug 30 to Sep 02. The trip was partly funded by AIM-WEST. Additional funding was provided by PARSEME (FR->BR trip) and USP.

Aug 30 – Morning

  • Meeting at LALIC with Helena Caseli
    • Discussion of current research activities
    • Achieved and in-progress AIM-WEST project goals
    • Planning of submission to CAPES-COFECUB – ideas around semantic processing and creation/enrichment of semantic resources

Aug 30 – Afternoon

Meeting at LALIC with Helena Caseli and MSc candidate Natalie Vargas

  • Joint master supervision
    • Parallel corpora to mine MWEs and their translations
    • Focus on light verb constructions in PT and EN
    • Pilot experiment:
      • Use lists of LVCs in PT (Duran et al.) and EN (Tu et al.) from http://multiword.sf.net
      • Use FAPESP corpus
      • Locate the list entries in the corpus
      • Check their frequencies and translations manually
      • Evaluate whether to pursue the experiments on LVCs or switch to another construction
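The "locate entries and check frequencies" step of the pilot could be sketched roughly as below. This is only an illustration under stated assumptions: the corpus is assumed to be already lemmatized, each LVC entry is reduced to a hypothetical (verb, noun) pair, and the sentences are invented toy data, not taken from the actual lists or corpus. The function name `count_lvcs` and the `max_gap` parameter are made up for this sketch and are not part of mwetoolkit.

```python
from collections import Counter

def count_lvcs(sentences, lvcs, max_gap=3):
    """Count (verb, noun) pairs, allowing up to max_gap tokens between verb and noun.

    sentences: list of sentences, each a list of lemmas (assumed pre-lemmatized).
    lvcs: list of (verb_lemma, noun_lemma) pairs (hypothetical simplification).
    """
    counts = Counter()
    hits = []  # (sentence index, verb, noun), kept for manual inspection
    for sent_id, lemmas in enumerate(sentences):
        for i, lemma in enumerate(lemmas):
            for verb, noun in lvcs:
                if lemma != verb:
                    continue
                # Tokens after the verb in which the noun may appear
                window = lemmas[i + 1 : i + 2 + max_gap]
                if noun in window:
                    counts[(verb, noun)] += 1
                    hits.append((sent_id, verb, noun))
    return counts, hits

# Invented toy data, for illustration only
sentences = [
    ["she", "take", "a", "long", "walk"],
    ["he", "take", "the", "bus"],
    ["they", "make", "a", "decision", "and", "take", "a", "walk"],
]
lvcs = [("take", "walk"), ("make", "decision"), ("give", "talk")]

counts, hits = count_lvcs(sentences, lvcs)
for pair in lvcs:
    print(pair, counts[pair])
```

The `hits` list records which sentences matched, so that the aligned translations can then be checked by hand in the parallel corpus.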

Aug 31 – Morning

10:30 Seminar at DC – UFSCar – On the interplay between complex function words and parsing

Carlos Ramisch (LIF, Aix Marseille Université – France) http://pageperso.lif.univ-mrs.fr/~carlos.ramisch/

Complex conjunctions and determiners are often considered as pretokenized units in parsing. This is not always realistic, since they can be ambiguous. We propose a model for joint dependency parsing and multiword expression identification, in which complex function words are represented as individual tokens linked with morphological dependencies. Our graph-based parser includes standard second-order features and verbal subcategorization features derived from a syntactic lexicon. We train it on a modified version of the French Treebank enriched with morphological dependencies. It recognizes 81.79% of ADV+que conjunctions with 91.57% precision, and 82.74% of de+DET determiners with 86.70% precision.

Slides: acl2015-ufscar

Aug 31 – Afternoon

  • Work at LALIC – UFSCar
    • Adaptation to English of the Never-Ending Learning of MWEs system (A. Rondon’s bachelor thesis – TCC) by H. Caseli
    • Creation of an EN index from a UKWaC fragment to extract association measures as features
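As an illustration of what an association measure computed from index counts looks like, here is a minimal sketch of pointwise mutual information (PMI) for a two-word candidate. PMI is chosen here only as a common example; the notes above do not say which measures the system actually uses, and the counts below are invented.

```python
import math

def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise mutual information of a word pair, from raw corpus counts.

    pair_count: occurrences of the two words together
    w1_count, w2_count: occurrences of each word on its own
    total: number of tokens in the indexed corpus
    """
    p_pair = pair_count / total
    p_w1 = w1_count / total
    p_w2 = w2_count / total
    # log2 of observed co-occurrence probability over the one expected under independence
    return math.log2(p_pair / (p_w1 * p_w2))

# Invented counts, for illustration only
score = pmi(pair_count=10, w1_count=20, w2_count=50, total=1000)
print(round(score, 3))
```

A higher score means the two words co-occur more often than their individual frequencies would predict, which is why such values are usable as features for MWE candidates.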

Sep 01 – Morning

9:00 – 12:00 – Course on mwetoolkit at NILC – ICMC – USP

  • Introduction to the command line
  • Use of mwetoolkit to transform corpora
  • Multi-level regular expressions for searches

Slides and course materials available here

Sep 01 – Afternoon

MSc qualification defense at USP – Francielle Vargas – member of jury

Sep 02 – Morning

9:00 – 12:00 – Course on mwetoolkit at NILC – ICMC – USP

  • Extraction of candidates
  • Annotation of corpora
  • Extensions to use word embeddings, contrast, etc.

Sep 02 – Afternoon

Discussion with NILC students, who presented their work and the group's projects and resources. All of them work on Portuguese language processing.

  • Nathan Hartmann – semantic role labeler
  • Enrico ? – Sentiment analysis using word embeddings composition
  • Thales ? – Twitter and product reviews normalization
  • ? – AMR for semantic parsing
  • ? – Complex networks for WSD
  • Francielle Vargas – ontology mining from opinionated texts