CLPsych 2021 Shared Task: Team Registration
Thanks for your interest in participating in the Shared Task for CLPsych 2021!  The task, including schedule, practice data, etc., is described here: https://seanmacavaney.github.io/clpsych2021-shared-task/.


OVERVIEW

This year, the shared task is breaking new ground for community-level participation in mental health research using language data.  Even when de-identified, language data still needs to be considered very sensitive, and sharing sensitive healthcare data can create significant practical obstacles in terms of ethical review, privacy concerns, and legal paperwork. In collaboration with the University of Maryland, NORC at the University of Chicago has created the UMD/NORC Mental Health Data Enclave (or just the Enclave, for short), a secure computing environment hosted on AWS where the key idea is that researchers come to the data, rather than vice versa.

With internal support from NORC, and additional support from Amazon and the University of Maryland's AI + Medicine for High Impact program, this year we will be running the shared task on the Enclave rather than sending data out to researchers.  We will be using social media data connected with people who have made suicide attempts, generously provided by Qntfy, who run the OurDataHelps.org data donation platform, and the task will involve prediction of attempts.  To our knowledge this is the first-ever opportunity for multiple teams to be working in sync with each other on sensitive data related to suicide outcomes.


HOW TO PARTICIPATE

To request participation, please do the following:

 • You should choose a concise team name that does not contain spaces.

 • **The contact person for your team** should fill out this team information form. (If you're a one-person team, that's ok: you're both the contact person and the individual team member.)

 • **Each individual member of your team** should do the following, making sure all members of your team use the same (identical!) team name:

       o   Fill out and submit the account request form at https://docs.google.com/forms/d/e/1FAIpQLSe43xvI1pcPHjMIL28MeCp2IU7j02u_l_ljEmHl7A03tQsClA/viewform?usp=sf_link to generate an individual account-creation request for the Enclave.

       o  Send an email to clpsych-2021-shared-task-organizers@googlegroups.com with subject line "CLPsych shared task: <team_name>" with the following PDF attachments:
  - Signed NORC non-disclosure agreement https://github.com/seanmacavaney/clpsych2021-shared-task/raw/main/clpsych2021_NORC_nondisclosure_agreement.pdf
  - Signed NORC data use agreement https://raw.githubusercontent.com/seanmacavaney/clpsych2021-shared-task/main/norc_dua.pdf

 • Processing these forms and creating accounts at NORC may take on the order of 4-5 days.  While your team is waiting, you can do the following:

       o   Begin using the shared task Google Group, https://groups.google.com/g/clpsych-2021-shared-task. We view this as a community-level shared activity, not a competition, so teams are highly encouraged to communicate with each other.

       o   Begin developing your approach using the practice dataset, https://seanmacavaney.github.io/clpsych2021-shared-task/#practice-data.

       o   Begin looking at the Enclave Training and Reference Guide, https://raw.githubusercontent.com/seanmacavaney/clpsych2021-shared-task/main/Data_Enclave_Training_Guide.pdf

 • Once your account is created and active, you will receive an email from NORC. At that point:

       o   Follow the directions in the Enclave Training and Reference Guide to log in.
       o   Follow the instructions in the guide for how to get code or other resources onto the Enclave.
       o   A variety of Python packages are available via pip install on the Enclave. If you need additional packages, contact NORC support.

 • Finally, the fun part!  Please work on doing the best possible job on the prediction task, and learning as much as you can that will help the community tackle this important problem.  See the Shared Task schedule at https://seanmacavaney.github.io/clpsych2021-shared-task/ for when you can expect us to share test data and instructions for how to return your results to us for evaluation.


OTHER IMPORTANT INFO

 • Teams participating in this shared task affirm having read Benton et al. (2017), "Ethical research protocols for social media health research", and commit to its broad ethical principles. (We also strongly encourage reading Chancellor et al., 2019, "A taxonomy of ethical tensions in inferring mental health states from social media").  

 • Data available on the Enclave will include not only the primary shared task data (de-identified data from OurDataHelps, provided by Qntfy), but also the UMD Reddit Suicidality Dataset, which could potentially be useful for pretraining, transfer learning, or other purposes.  

 • The Enclave Training and Reference Guide discusses how to request export of materials (e.g. code that you've updated there) off the Enclave. Please note that no data, models, or resources (e.g. term lists) will be permitted to be taken off the Enclave, except for small examples to use for illustrative purposes in papers and presentations.  Because both upload requests and export requests require review (this is part of what makes the Enclave secure and therefore suitable for sensitive data!), we recommend budgeting time for review in both directions.

 • Please note that participating teams will receive a budget of AWS EC2 credits for working on the shared task that is expected to be ample for participation.  It is possible (although unlikely) that we may need to impose a cap on the number of participating teams in order to ensure that everyone has sufficient resources.


PUBLICATION

All participating teams will be invited to submit a short system overview paper, to be reviewed per the Shared Task schedule.  (Please note that although the individual Data Use Agreement officially requires pre-submission review by Qntfy, that review will take place *as part of* the normal review process for shared task system overview papers, so nothing needs to be submitted to them in advance.)  Teams that do not submit a system paper will not be included in the shared task overview paper or workshop discussion.

Teams participating in the task also agree to include the following in any publications, whether for this workshop or in the future, containing information or results derived from work on the shared task:

      • A citation of the shared task overview paper. (Reference will be provided to teams to include in their papers.)

     •  A citation of Coppersmith, Glen, Ryan Leary, Patrick Crutchley, and Alex Fine. "Natural language processing of social media as screening for suicide risk." Biomedical informatics insights 10 (2018): 1178222618792860. (This is the best current citation for Qntfy's OurDataHelps data donation platform.)

     •  An acknowledgement of the UMD/NORC Mental Health Data Enclave, e.g.  “The author(s) acknowledge the UMD/NORC Mental Health Data Enclave for providing researcher support and access to the data used in this research.”  (However, this shall in no way be construed as an endorsement of the Requestor’s work by NORC.)

Teams that use the UMD Reddit Suicidality Dataset also agree to cite the following papers:

     •  Han-Chin Shing, Suraj Nair, Ayah Zirikly, Meir Friedenberg, Hal Daumé III, and Philip Resnik, "Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings", Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, pages 25–36, New Orleans, Louisiana, June 5, 2018.

     •  Ayah Zirikly, Philip Resnik, Özlem Uzuner, and Kristy Hollingshead. 2019. CLPsych 2019 shared task: Predicting the degree of suicide risk in Reddit posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology (CLPsych'19), Minneapolis, June 6, 2019.
Team Name (brief and should not contain spaces) *
All Team Members (names, separated by commas) *
Primary Team Contact Name *
Primary Team Contact Email Address *
Primary Team Contact Affiliation/Organization *
The following questions are to help us prepare the Enclave with relevant tools.  We cannot guarantee that everything you want will be available, but the Enclave is already pre-loaded with thousands of the most relevant Python packages for NLP/machine learning research, and there is a process for requesting that NORC add other packages (albeit no guarantee in advance that they can do so); please see the Enclave Training and Reference Guide for information.
Which programming languages (including versions) do you expect to use? (one per line)
Which software libraries do you expect to use? (one per line)
Do you plan to use a pre-trained model (such as GloVe, BERT, T5, etc.)? If so, please specify the version and the software library that you plan to use it with. (one per line)