Daily Arxiv

This page curates AI-related papers published around the world.
All summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers

Created by
  • Haebom

Author

Simon Roschmann, Quentin Bouniot, Vasilii Feofanov, Ievgen Redko, Zeynep Akata

Outline

This paper introduces Time Vision Transformer (TiViT), a framework motivated by the difficulty of training time series foundation models (TSFMs) for important classification tasks in medical and industrial domains, where publicly available time series data is scarce. TiViT converts time series into images in order to reuse the representations of a frozen Vision Transformer (ViT) pretrained on large image datasets. Theoretically, the authors show that 2D patching of time series can increase the number of label-relevant tokens and reduce sample complexity; empirically, TiViT achieves state-of-the-art performance on standard time series classification benchmarks using the hidden representations of large OpenCLIP models. Analyzing the structure of TiViT representations, they find that intermediate layers with high intrinsic dimensionality are the most effective for time series classification, and by evaluating the alignment between the TiViT and TSFM representation spaces they show that the two provide complementary features whose combination further improves performance. The work thus points to a new direction for reusing visual representations in non-visual domains. The code is available at https://github.com/ExplainableML/TiViT.
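As a rough illustration of the pipeline described above, the sketch below reshapes a univariate series into a 2D image, runs it through a frozen pretrained ViT, and reads out an intermediate layer's token embeddings as features for a downstream classifier. This is a minimal sketch, not the authors' code: torchvision's ViT-B/16 stands in for the OpenCLIP backbones used in the paper, and the segment-stacking image conversion is an assumed, simplified transform.

```python
# Minimal sketch (assumption, not the paper's implementation): time series -> image ->
# frozen pretrained ViT -> intermediate-layer features.
import torch
import torch.nn.functional as F
from torchvision.models import vit_b_16, ViT_B_16_Weights


def series_to_image(x: torch.Tensor, width: int = 224, height: int = 224) -> torch.Tensor:
    """Reshape a 1D series (T,) into a 3 x height x width tensor by stacking segments."""
    needed = width * height
    x = F.pad(x, (0, max(0, needed - x.numel())))[:needed]   # pad or truncate
    img = x.view(height, width)                               # 2D "image" of segments
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # scale to [0, 1]
    return img.unsqueeze(0).repeat(3, 1, 1)                   # replicate to 3 channels


# Frozen backbone; in practice the model's own preprocessing transforms should be applied.
vit = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1).eval()
for p in vit.parameters():
    p.requires_grad_(False)

hidden = {}
def save_hidden(module, inputs, output):                       # forward hook
    hidden["feat"] = output

# Tap an intermediate encoder block (the paper finds middle layers most effective).
vit.encoder.layers[6].register_forward_hook(save_hidden)

series = torch.randn(5000)                                     # toy univariate time series
img = series_to_image(series).unsqueeze(0)                     # (1, 3, 224, 224)
with torch.no_grad():
    vit(img)                                                   # populates hidden["feat"]

tokens = hidden["feat"]                                        # (1, 197, 768) token embeddings
feature = tokens.mean(dim=1)                                   # pooled representation
print(feature.shape)                                           # torch.Size([1, 768])
```

A linear probe trained on such pooled features would then perform the actual classification; the choice of which intermediate block to tap is a hyperparameter in this sketch.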

Takeaways, Limitations

Takeaways:
Mitigates the scarcity of public time series data through an image-transformation approach, improving time series classification performance.
Shows that frozen, pretrained ViTs can be reused to build effective time series classifiers.
Confirms that TiViT and existing TSFM representations are complementary, so combining them can further improve performance (a minimal fusion sketch is given at the end of this section).
Opens a new direction for reusing visual representations in non-visual domains.
Limitations:
The generalization of the proposed method requires further study.
Additional experimental validation is needed across diverse types of time series data.
The information loss that may occur during the time-series-to-image conversion needs to be analyzed.
The dependence on specific image pretraining datasets, and the biases this may introduce, should be considered.
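On the complementarity takeaway above: the sketch below shows one straightforward way to combine the two representation spaces, concatenating frozen TiViT-style and TSFM features and training a linear probe on top. This is an assumed illustration, not the paper's released code; the function name, hyperparameters, and the assumption that both feature sets are already extracted are mine.

```python
# Minimal sketch (assumption): fuse frozen ViT-derived and TSFM-derived features
# by concatenation, then fit a linear probe for classification.
import torch
import torch.nn as nn


def fuse_and_probe(tivit_feats: torch.Tensor,   # (N, d1) frozen ViT-based features
                   tsfm_feats: torch.Tensor,    # (N, d2) frozen TSFM features
                   labels: torch.Tensor,        # (N,) long tensor of class indices
                   n_classes: int,
                   epochs: int = 100) -> nn.Module:
    fused = torch.cat([tivit_feats, tsfm_feats], dim=1)   # complementary features side by side
    probe = nn.Linear(fused.shape[1], n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(fused), labels)
        loss.backward()
        opt.step()
    return probe
```

Concatenation plus a linear probe is the simplest fusion baseline; the point it illustrates is only that the two feature sets carry non-redundant information.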