Tools for Content Analysis in Blockchain Studies
- Wuxia (Amy) Bao

- Sep 30
Updated: Oct 12

The State of the Art
Blockchain research isn’t just about numbers, transactions, or cryptographic protocols. Increasingly, researchers analyze text data (whitepapers, tweets, news, forum discussions, and even academic papers) to understand how blockchain projects develop, how investors react, and how the technology is perceived.
A recent systematic review of 124 studies (Zhuo et al., 2024) shows that text analysis in blockchain research is booming. Two dominant approaches stand out:
Sentiment analysis of user-generated content (like Twitter or Reddit) to explore how public opinion links to cryptocurrency market movements.
Topic modeling of corporate and official documents (like whitepapers and filings) to track blockchain adoption, trends, and classifications.
Five major research themes emerge:
Relationship discovery (linking text and market behavior).
Cryptocurrency performance prediction.
Classification and trend analysis.
Crime and regulation.
Public perception of blockchain.
With that big picture, let’s look at the tools researchers actually use.
1. Data Sources & Collection Platforms
Text analysis in blockchain research starts with the right data. Researchers pull information from corporate disclosures, community discussions, and media reports. The choice of source shapes the insights one can generate, from investor sentiment to regulatory trends.
Whitepaper aggregators: ICOHolder, ICOMarks, ICORatings, ICODrops, FoundICO, CryptoCompare.
Community forums: Bitcointalk, XRPChat, Ethereum Community Forum.
Abuse reports: Bitcoinabuse.org.
Crypto news sites: CoinDesk, Cointelegraph, NewsBTC, FXStreet (crypto), CryptoCompare (news), CryptoCoin.News.
General disclosures: SEC EDGAR (10-Ks, calls); patents (USPTO, EPO); job postings (recruitment sites); Terms of Service pages.
Social media: Twitter/X, Sina Weibo, Stocktwits.
Developer / Q&A / chat: Reddit, GitHub, Telegram, StackExchange, Discord; illicit topics on HackForums.
App reviews: App Store reviews.
Web-scale monitoring: Webz.io, Notified, OpView Social Listening Platform.
News terminals/APIs: Nexis, Refinitiv Eikon, NewsAPI, RavenPack; newspapers (Financial Times, The Economist, WSJ).
Academic corpora: Web of Science, Scopus, Google Scholar, ScienceDirect, IEEE Xplore, ACM DL, JSTOR, SSRN, Business Source Premier.
2. Off-the-Shelf Analytics Tools
When writing custom code is too costly, researchers rely on ready-made NLP suites. These tools lower the barrier to advanced analytics and provide quick sentiment, entity, or topic extraction.
Crimson Hexagon (social sentiment)
Semantria (Lexalytics)
MeaningCloud
Stanford CoreNLP
OpView
RavenPack (news analytics)
Leximancer
3. Sentiment Lexicons & Packages
To measure optimism or fear in blockchain markets, lexicon-based approaches remain popular. The dictionaries below assign sentiment scores to words, and the packages apply such dictionaries to whole texts. Both are especially useful for social media and financial text.
VADER (social media-friendly)
TextBlob
SentiStrength
SentiWordNet
Bing Liu lexicon
AFINN
Loughran–McDonald (LM, finance-specific)
Harvard-IV General Purpose Psych Dictionary
Henry’s Finance Dictionary
qdap (Quantitative Discourse Analysis in R)
Pattern (CLiPS library)
sentimentr (R package)
Crypto-specific wordlists, emoji-aware lexicons, and a Chinese crypto sentiment dictionary.
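The lexicon approach described above can be sketched in a few lines. This is a minimal illustration, not the scoring rule of VADER, Loughran–McDonald, or any other resource listed here: the wordlist `TOY_LEXICON`, its scores, and the one-word negation rule are all made up for the example.

```python
import re

# Hypothetical toy lexicon for illustration only. A real study would load
# VADER, Loughran-McDonald, or a crypto-specific wordlist instead.
TOY_LEXICON = {
    "bullish": 2.0, "moon": 1.5, "gain": 1.0, "adoption": 1.0,
    "bearish": -2.0, "scam": -2.5, "crash": -2.0, "hack": -1.5,
}
NEGATIONS = {"not", "no", "never"}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return the mean lexicon score over matched words (0.0 if none match)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            score = lexicon[tok]
            # Flip polarity when the previous token is a simple negation.
            if i > 0 and tokens[i - 1] in NEGATIONS:
                score = -score
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `lexicon_score` on each tweet or forum post gives a per-document score that can then be averaged by day and compared with price or volume series.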
4. Emotion Resources
Beyond “positive” or “negative,” emotion resources capture nuances like fear, trust, or anticipation. These help researchers explore deeper market psychology in blockchain adoption.
NRC VAD Lexicon (valence, arousal, dominance)
NRC Word–Emotion Association Lexicon
Text2Emotion
5. Feature Extraction & Representations
Before analysis, raw text must be turned into structured data. Feature extraction reduces noise and represents language in ways computers can understand.
Count-based: Bag-of-Words (BoW), n-grams, TF-IDF, DDPWI (proposed).
Embeddings: Word2Vec, Doc2Vec, GloVe, FastText.
Specialized: AffectiveTweets (Weka), A-BiRNN (proposed), SBERT.
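To make the count-based representations concrete, here is a small TF-IDF sketch using only the standard library. It assumes the plain textbook weighting (term frequency times `log(N / df)`). Libraries such as scikit-learn use smoothed variants, so their numbers will differ. The function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero on empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

A term that appears in every document (for example "token" in a whitepaper corpus) gets weight zero, which is the noise reduction the paragraph above refers to.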
6. Topic Modeling & Trend Mining
Blockchain evolves quickly; topic modeling uncovers hidden themes and tracks shifts over time. Researchers use these tools to detect trends in adoption, policy, or hype cycles.
LDA (Latent Dirichlet Allocation)
DTM (Dynamic Topic Models)
SentLDA
Joint/Sentiment–Topic Models
Topic-Sentiment LDA
NMF (Nonnegative Matrix Factorization)
Anchored Correlation Explanation (Anchored CorEx)
W2V-LSA (Word2Vec-based Latent Semantic Analysis, proposed)
Leximancer
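As one illustration of how a topic model works, the sketch below implements NMF with the classic multiplicative update rules for squared reconstruction error, in pure Python. It is a teaching sketch under simplifying assumptions (tiny dense matrices, a fixed iteration count, random initialization). Real studies would normally use a library implementation, and LDA-family models work differently.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error between v and the product w * h."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative doc-term matrix v (docs x terms) into
    w (docs x topics) and h (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h
```

Each row of `h` is a topic: the terms with the largest weights in that row are its top words, while each row of `w` says how strongly a document loads on each topic.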
7. Similarity, Clustering & Classifiers
To classify whitepapers, cluster forum discussions, or detect fraud, machine learning models come into play. These range from simple similarity metrics to deep learning.
Similarity: Cosine, Jaccard, SBERT.
Clustering: K-means, DBSCAN.
Classifiers: CatBoost, Random Forest, XGBoost, Naive Bayes, SVM, Neural Networks, LSTM/BiLSTM, BERT, plus custom constructs (Voting-included Algorithm, Sentiment Graph).
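The two simplest similarity measures in the list above can be written directly. This sketch assumes documents are already tokenized into lists of words. Cosine similarity is computed on raw term counts here, whereas studies often apply it to TF-IDF or embedding vectors instead.

```python
import math
from collections import Counter


def jaccard(tokens_a, tokens_b):
    """Share of distinct terms that the two documents have in common."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)


def cosine(tokens_a, tokens_b):
    """Cosine of the angle between the two term-count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Comparing a new whitepaper against earlier ones with either function is one simple way to flag near-duplicates before any clustering or classification step.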
8. Readability & Miscellaneous Tools
Finally, readability and general-purpose utilities help evaluate how accessible or complex blockchain texts are. These metrics are valuable in ICO whitepaper analysis or adoption studies.
Readability indices: Flesch–Kincaid, Dale–Chall, Gunning Fog, ARI (Automated Readability Index), SMOG, Coleman–Liau, Linsear Write.
Other utilities: AWS Blockchain Templates, Google Knowledge Graph (used for network/context analysis).
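Two of the readability indices above are simple enough to compute by hand. The sketch below uses the standard published formulas for the Flesch–Kincaid grade level and the ARI. The sentence splitter and the syllable counter are rough heuristics assumed for this example, so scores will differ somewhat from dedicated readability packages.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return Flesch-Kincaid grade level and ARI for a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    # ARI counts letters and digits only.
    n_chars = sum(ch.isalnum() for w in words for ch in w)
    n_syll = sum(count_syllables(w) for w in words)
    fk_grade = 0.39 * n_words / n_sent + 11.8 * n_syll / n_words - 15.59
    ari = 4.71 * n_chars / n_words + 0.5 * n_words / n_sent - 21.43
    return {"flesch_kincaid_grade": fk_grade, "ari": ari}
```

Higher values on either index indicate text that needs more years of schooling to read comfortably, which is how these scores are typically interpreted when comparing ICO whitepapers.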
Reference
Zhuo, X., Irresberger, F., & Bostandzic, D. (2024). How are texts analyzed in blockchain research? A systematic literature review. Financial Innovation, 10(1), 60. https://doi.org/10.1186/s40854-023-00501-6

