Tools for Content Analysis in Blockchain Studies

Updated: Oct 12

The State of the Art

Blockchain isn’t just about numbers, transactions, or cryptographic protocols. Increasingly, researchers look at text data—whitepapers, tweets, news, forum discussions, and even academic papers—to understand how blockchain projects develop, how investors react, and how the technology is perceived.

A recent systematic review of 124 studies (Zhuo et al., 2024) shows that text analysis in blockchain research is booming. Two dominant approaches stand out:

  • Sentiment analysis of user-generated content (like Twitter or Reddit) to explore how public opinion links to cryptocurrency market movements.

  • Topic modeling of corporate and official documents (like whitepapers and filings) to track blockchain adoption, trends, and classifications.

Five major research themes emerge:

  1. Relationship discovery (linking text and market behavior).

  2. Cryptocurrency performance prediction.

  3. Classification and trend analysis.

  4. Crime and regulation.

  5. Public perception of blockchain.

With that big picture, let’s look at the tools researchers actually use.


1. Data Sources & Collection Platforms

Text analysis in blockchain starts with the right data. Researchers pull information from corporate disclosures, community discussions, and media reports. These sources shape the type of insights one can generate — from investor sentiment to regulatory trends.

  • Whitepaper aggregators: ICOHolder, ICOMarks, ICORatings, ICODrops, FoundICO, CryptoCompare.

  • Community forums: Bitcointalk, XRPChat, Ethereum Community Forum.

  • Abuse reports: Bitcoinabuse.org.

  • Crypto news sites: CoinDesk, Cointelegraph, NewsBTC, FXStreet (crypto), CryptoCompare (news), CryptoCoin.News.

  • General disclosures: SEC EDGAR (10-Ks, calls); patents (USPTO/EPO); job postings (recruitment sites); Terms of Service pages.

  • Social media: Twitter/X, Sina Weibo, Stocktwits.

  • Developer / Q&A / chat: Reddit, GitHub, Telegram, StackExchange, Discord; illicit topics on HackForums.

  • App reviews: App Store reviews.

  • Web-scale monitoring: Webz.io, Notified, OpView Social Listening Platform.

  • News terminals/APIs: Nexis, Refinitiv Eikon, NewsAPI, RavenPack; newspapers (Financial Times, The Economist, WSJ).

  • Academic corpora: Web of Science, Scopus, Google Scholar, ScienceDirect, IEEE Xplore, ACM DL, JSTOR, SSRN, Business Source Premier.


2. Off-the-Shelf Analytics Tools

When building a custom pipeline from scratch is too costly, researchers rely on ready-made NLP suites. These platforms lower the barrier to advanced analytics and provide quick sentiment, entity, or topic extraction.

  • Crimson Hexagon (social sentiment)

  • Semantria (Lexalytics)

  • MeaningCloud

  • Stanford CoreNLP

  • OpView

  • RavenPack (news analytics)

  • Leximancer


3. Sentiment Lexicons & Packages

To measure optimism or fear in blockchain markets, lexicon-based approaches remain popular. These dictionaries assign sentiment scores to words and are especially useful for social media and financial text.

  • VADER (social media-friendly)

  • TextBlob

  • SentiStrength

  • SentiWordNet

  • Bing Liu lexicon

  • AFINN

  • Loughran–McDonald (LM, finance-specific)

  • Harvard-IV General Purpose Psychological Dictionary

  • Henry’s Finance Dictionary

  • qdap (Quantitative Discourse Analysis Package, R)

  • Pattern (CLiPS library)

  • sentimentr (R package)

  • Crypto-specific wordlists, emoji-aware lexicons, and a Chinese crypto sentiment dictionary.
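
As a rough illustration of how a lexicon-based scorer works, the sketch below averages word-level scores over the tokens found in a dictionary. The tiny `TOY_LEXICON` and its scores are invented for this example and are not taken from VADER, AFINN, or any other resource listed above. Real lexicons also handle negation, intensifiers, and emoji, which this sketch ignores.

```python
import re

# Hypothetical toy lexicon (word -> score in [-1, 1]); invented for illustration only.
TOY_LEXICON = {
    "bullish": 0.8,
    "moon": 0.6,
    "gain": 0.5,
    "scam": -0.9,
    "crash": -0.8,
    "fear": -0.6,
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return the mean score of tokens found in the lexicon, or 0.0 if none match."""
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

posts = [
    "Feeling bullish, this coin is going to the moon!",
    "Total scam. Expect a crash.",
    "Block height just passed 800000.",
]
for post in posts:
    print(f"{lexicon_score(post):+.2f}  {post}")
```

Averaging only over matched tokens keeps long posts from being diluted by neutral words. Whether that is desirable depends on the study design, so treat it as one possible choice.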


4. Emotion Resources

Beyond “positive” or “negative,” emotion resources capture nuances like fear, trust, or anticipation. These help researchers explore deeper market psychology in blockchain adoption.

  • NRC-VAD Emotion Lexicon

  • NRC Word–Emotion Association Lexicon

  • Text2Emotion


5. Feature Extraction & Representations

Before analysis, raw text must be turned into structured data. Feature extraction reduces noise and represents language in ways computers can understand.

  • Count-based: Bag-of-Words (BoW), N-gram, TF-IDF, DDPWI (paper-proposed).

  • Embeddings: Word2Vec, Doc2Vec, GloVe, FastText.

  • Specialized: AffectiveTweets (Weka), A-BiRNN (proposed), SBERT.
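
To make the count-based representations concrete, here is a minimal standard-library sketch of Bag-of-Words counts and one common TF-IDF variant (relative term frequency times `log(N / df)`). The example documents are made up, and libraries usually apply smoothing and normalization that this sketch leaves out, so the numbers will not match theirs exactly.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(docs):
    """Return one Counter of token counts per document."""
    return [Counter(tokenize(d)) for d in docs]

def tf_idf(docs):
    """Return one {token: weight} dict per document using tf * log(N / df)."""
    bows = bag_of_words(docs)
    n_docs = len(bows)
    # Document frequency: number of documents that contain each token.
    df = Counter(token for bow in bows for token in bow)
    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            tok: (count / total) * math.log(n_docs / df[tok])
            for tok, count in bow.items()
        })
    return weights

# Made-up example documents.
docs = [
    "bitcoin price rises as bitcoin adoption grows",
    "ethereum upgrade lowers fees",
    "bitcoin and ethereum fees compared",
]
bows = bag_of_words(docs)
weights = tf_idf(docs)
for w in weights:
    # Print each document's three highest-weighted tokens.
    print(sorted(w, key=w.get, reverse=True)[:3])
```

Note how "bitcoin" appears twice in the first document yet gets a lower weight than "price", because it also occurs in another document. That discounting of widely shared terms is the point of the IDF factor.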


6. Topic Modeling & Trend Mining

Blockchain evolves quickly; topic modeling uncovers hidden themes and tracks shifts over time. Researchers use these tools to detect trends in adoption, policy, or hype cycles.

  • LDA (Latent Dirichlet Allocation)

  • DTM (Dynamic Topic Models)

  • SentLDA

  • Joint/Sentiment–Topic Models

  • Topic-Sentiment LDA

  • NMF (Nonnegative Matrix Factorization)

  • Anchored Correlation Explanation (Anchored CorEx)

  • W2V-LSA (Word2Vec-based Latent Semantic Analysis, proposed)

  • Leximancer
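
For readers who want to see what a topic model does mechanically, the sketch below factors a small document-term count matrix with NMF using multiplicative updates. It is a toy, pure-Python illustration of one technique from the list (NMF, not LDA). The matrix `V`, the vocabulary, and the choice of two topics are all invented for the example, and real studies would normally use an established library.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h

# Invented vocabulary and document-term counts (rows are documents).
vocab = ["token", "price", "market", "regulation", "sec", "law"]
V = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 1, 2, 3, 1],
]
W, H = nmf(V, n_topics=2)
for k, row in enumerate(H):
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(f"topic {k}:", [vocab[j] for j in top])
```

Each row of `H` weights the vocabulary for one topic, and each row of `W` says how strongly a document loads on each topic. Tracking those document loadings by publication date is one simple way to approximate the trend analysis described above.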


7. Similarity, Clustering & Classifiers

To classify whitepapers, cluster forum discussions, or detect fraud, machine learning models come into play. These range from simple similarity metrics to deep learning.

  • Similarity: Cosine, Jaccard, SBERT.

  • Clustering: K-means, DBSCAN.

  • Classifiers: CatBoost, Random Forest, XGBoost, Naive Bayes, SVM, Neural Networks, LSTM/BiLSTM, BERT, plus custom constructs (Voting-included Algorithm, Sentiment Graph).
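
The two simplest similarity measures above can be sketched in a few lines. The example below compares two made-up whitepaper snippets using cosine similarity on token counts and Jaccard similarity on token sets. The tokenizer and the sample texts are assumptions for illustration; embedding-based measures such as SBERT need a trained model and are out of scope here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two token-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = set(tokenize(text_a)), set(tokenize(text_b))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Made-up whitepaper snippets.
wp_a = "decentralized payment network for peer to peer payments"
wp_b = "peer to peer lending network"
print(f"cosine:  {cosine_similarity(wp_a, wp_b):.3f}")
print(f"jaccard: {jaccard_similarity(wp_a, wp_b):.3f}")
```

Cosine similarity is sensitive to how often a term repeats, while Jaccard only looks at whether a term is present, so the two can rank document pairs differently.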


8. Readability & Miscellaneous Tools

Finally, readability and general-purpose utilities help evaluate how accessible or complex blockchain texts are. These metrics are valuable in ICO whitepaper analysis or adoption studies.

  • Readability indices: Flesch–Kincaid, Dale–Chall, Gunning Fog, ARI, SMOG, Coleman–Liau, Linsear Write.

  • Other utilities: AWS Blockchain Template, Google Knowledge Graph (used for network/context analysis).
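
Two of the indices above are easy to compute by hand, which helps show what they measure. The sketch below implements the Flesch–Kincaid grade level and the Automated Readability Index (ARI) using their standard published coefficients. The syllable counter is a crude vowel-group heuristic and the sentence splitter simply breaks on `.`, `!`, and `?`, so both are assumptions that will misjudge some words, abbreviations, and decimals.

```python
import re

def count_syllables(word):
    """Heuristic: count vowel groups, dropping one for a trailing silent 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def text_stats(text):
    """Count sentences, words, letters/digits, and (heuristic) syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "sentences": len(sentences),
        "words": len(words),
        "chars": sum(len(w) for w in words),
        "syllables": sum(count_syllables(w) for w in words),
    }

def flesch_kincaid_grade(text):
    """0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    s = text_stats(text)
    return (0.39 * s["words"] / s["sentences"]
            + 11.8 * s["syllables"] / s["words"] - 15.59)

def automated_readability_index(text):
    """4.71 * characters/word + 0.5 * words/sentence - 21.43."""
    s = text_stats(text)
    return (4.71 * s["chars"] / s["words"]
            + 0.5 * s["words"] / s["sentences"] - 21.43)

sample = "The token is new. It runs on a public chain."
print(text_stats(sample))
print(f"Flesch-Kincaid grade: {flesch_kincaid_grade(sample):.2f}")
print(f"ARI: {automated_readability_index(sample):.2f}")
```

Both scores rise with longer sentences and longer words, which is why dense whitepapers tend to score as harder to read than short forum posts.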


Reference

Zhuo, X., Irresberger, F., & Bostandzic, D. (2024). How are texts analyzed in blockchain research? A systematic literature review. Financial Innovation, 10(1), 60. https://doi.org/10.1186/s40854-023-00501-6

