Google Cloud: Leveraging Unstructured Data with Dataproc & GCP

  • Explored methods to extract insights from unstructured data sources such as logs, text, images, or streaming data

  • Learned how to integrate unstructured data into analytics pipelines to augment decision-making

  • Applied processing techniques to clean, transform, and analyse unstructured formats in a cloud environment

  • Demonstrated how unstructured data enriches models, broadens feature sets, and supports more detailed reporting
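
As an illustration of integrating unstructured data into an analytics pipeline, the sketch below parses raw log lines with a regular expression and joins them against a structured lookup table. The log format, the field names, and the `USER_PLANS` table are all hypothetical and were invented for this example; they are not taken from the course material.

```python
import re
from collections import Counter

# Hypothetical log format, invented for illustration:
#   <timestamp> <LEVEL> user=<id> <free-text message>
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) user=(?P<user_id>\d+) (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def error_counts_by_plan(log_lines, user_plans):
    """Count ERROR lines per plan by joining parsed logs with a structured lookup."""
    counts = Counter()
    for line in log_lines:
        record = parse_line(line)
        # Skip malformed lines and anything that is not an error.
        if record is None or record["level"] != "ERROR":
            continue
        # Users missing from the structured table are grouped as "unknown".
        plan = user_plans.get(record["user_id"], "unknown")
        counts[plan] += 1
    return dict(counts)


# Hypothetical sample data standing in for unstructured and structured sources.
SAMPLE_LOGS = [
    "2024-01-01T10:00:00Z INFO user=1 login ok",
    "2024-01-01T10:01:00Z ERROR user=2 payment failed",
    "2024-01-01T10:02:00Z ERROR user=1 timeout",
    "malformed line",
    "2024-01-01T10:03:00Z ERROR user=9 disk full",
]
USER_PLANS = {"1": "free", "2": "pro"}

if __name__ == "__main__":
    print(error_counts_by_plan(SAMPLE_LOGS, USER_PLANS))
```

The same parse-then-join pattern should carry over to a distributed setting, where the parsing function would be applied across partitions of log files rather than a Python list.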

Tools & Techniques:

  • Apache Spark / Dataproc for distributed processing of unstructured data

  • Text processing & NLP techniques (e.g. tokenization, TF-IDF, embeddings)

  • Log parsing, pattern matching, and regular expressions

  • Data pipelines combining structured + unstructured sources

  • Cloud tools on GCP to orchestrate storage, processing, and integration
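
To make the tokenization and TF-IDF techniques listed above concrete, here is a minimal pure-Python sketch. It assumes a simple lowercase alphanumeric tokenizer and the unsmoothed weighting tf × ln(N / df); both are illustrative choices rather than the course's method. On Dataproc, the same idea would more likely be expressed with Spark ML feature transformers, which may use a different (smoothed) formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * ln(N / document frequency).
    This is the plain, unsmoothed variant, chosen here for readability.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = ["disk full error", "disk ok"]
    for doc, scores in zip(docs, tf_idf(docs)):
        print(doc, scores)
```

A term that appears in every document gets a weight of zero under this formula, which is why terms shared across all logs or texts contribute nothing to the resulting feature vectors.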
