NLP220: Data Science and Machine Learning Fundamentals

Covers a broad set of tools and core skills required for working with natural language data. Topics include traditional machine learning methods, such as classification with Naive Bayes and SVMs and regression with linear regression and support vector regression, as well as the use of PyTorch and other programming frameworks commonly used in the field. Also includes methods for collecting, merging, cleaning, structuring, and analyzing the properties of large and heterogeneous natural language datasets, in order to address questions and support applications that rely on those data. Covers working with existing corpora as well as the challenges of collecting new corpora. (Formerly Data Collection, Wrangling and Crowdsourcing.) Enrollment is restricted to NLP MS students and CSE PhD students by permission of instructor.

5 credits

Year	Fall	Winter	Spring	Summer
2025-26	Section 50 Luca De Alfaro (luca)
2024-25	Section 50 Jalal Mahmud (jumahmud)
2023-24	Section 50 Jalal Mahmud (jumahmud)

While the information on this website is usually the most up to date, in the event of a discrepancy, please contact your adviser to confirm which information is correct.