Details of Analysis
Since citation networks fundamentally encode the relationships between a publication and its references, those references comprise the data of this analysis. The citation data must therefore be located, extracted, and imported into network analysis software before the analysis can begin. Citation network software can then be used to create the citation maps that constitute the result of the analytic process.
Choice of Data
For this analysis, I chose to consider the Journal for Research in Mathematics Education (JRME), Educational Studies in Mathematics (ESM), and for the learning of mathematics (flm). These journals were chosen given their position within the field of mathematics education research: JRME claims to be the “premier research journal in mathematics education” (JRME Journal Aims), while ESM aims to “[present] new ideas and developments of major importance to those working in the field of mathematical education…[where the] emphasis is on high-level articles which are of more than local or national interest” (ESM Journal Aims). In contrast, flm aims to “promote criticism and evaluation of ideas and procedures current in the field” (flm Journal Aims). I chose flm as one journal for analysis so that I might provide empirical evidence to either support or refute this claim and the intuition of researchers who feel, qualitatively, that flm is a different sort of journal. Furthermore, these three journals constitute three of the eight “Major Journals in Mathematics Education” identified in the Compendium for Early Career Researchers in Mathematics Education (Kaiser & Presmeg, 2019).
The choice of which journals were included in this analysis is not neutral. In fact, choosing any number of mathematics education journals, from the start, privileges those journals that specifically claim to be ‘mathematics education’ while simultaneously excluding any articles published in journals relevant to ‘education at large.’ For example, articles on mathematics education published in the Journal of Philosophy of Education would likely construe a different image of what mathematics education research can be. For the sake of this project, I chose to include the citation data for five decades of JRME, one decade of ESM, and one decade of flm. These choices enabled me to trace the development of the field across time (five decades of JRME; Chapter 4) and to compare and contrast how the field is construed by each journal (JRME vs. flm vs. ESM; Chapter 7).
Acquisition and Processing of Data
The citation network data includes every reference from every article published in these three journals during these three time periods. There were a few options for acquiring this data: the references could be manually extracted from the published articles (either pdf or html versions, depending on the journal offerings), reference information could be extracted from an article database such as JSTOR (jstor.org), or the references could be extracted from an existing database of citation relationships such as Clarivate’s Web of Knowledge (webofknowledge.com). The last option seems quite attractive; however, this method is heavily dependent on the quality of the data within the WoK database. Indeed, WoK only indexes JRME articles back to 1986, meaning at least 16 years of data would need to be added manually. Furthermore, 191 of the JRME articles in WoK were attributed to anonymous authors: a pass through the articles published in JRME shows this to be inaccurate. Additionally, for many articles WoK listed only one of their references, with the remainder missing. Given these inadequacies, and the fact that flm is not indexed in the WoK database, WoK was not a suitable data source for this project.
The first option, manually extracting references from pdf or webpage versions of articles, would work only if a webpage version of the article existed so that the references could be copied and pasted into a spreadsheet or, failing that, if the pdf file had selectable text (some pdf files do not store the text as text; older pdf files in particular store the text as a non-selectable image). Early JRME articles are of this non-selectable form, meaning that every reference in nearly 30 years of articles would need to be manually transcribed from a pdf to a spreadsheet.
JSTOR, it turns out, is the most convenient option for several reasons. First, while JSTOR stores the articles for each of JRME, ESM, and flm as pdf files, JSTOR has also extracted every reference from every article and lists them on a webpage. For example, the JSTOR listing of a JRME article (see Figure 1, left) includes article information (author, year, issue, volume, title, etc.), a link to the pdf file, and a list of the references from the article (sorted alphabetically by author and year as they appear at the end of the article; see Figure 1, middle). Although this text is easily copied and pasted from the JSTOR page to a spreadsheet, manually extracting this data would be time prohibitive: JRME published 1,090 articles between 1970 and 2019, ESM published 445 articles between 2010 and 2016, and flm published 82 articles between 2010 and 2017. These date ranges are not uniform because of the one limitation of using JSTOR: the publishers of ESM and flm impose embargoes of two and three years, respectively, meaning that articles cannot be added to the JSTOR database until they are two or three years old. Even with this limitation, and since my goal in mapping ESM and flm is to contrast the foci of those journals with the foci of JRME, this missing data is not prohibitive.
Figure 1. Sample JSTOR article webpage (left), list of references (middle), and source code (right).
Nevertheless, manually copying and pasting the data for 1,617 articles would be time prohibitive. Therefore, I chose to develop a custom software tool to automate the processing of these references. I began by downloading every article’s webpage (1,617 .html files). Then, since the references were each listed in the source text of the html file (see Figure 1, right) in a uniform manner (the block of references is tagged “reference_content” and each reference is tagged “reference_data”), a simple text processing script could extract each reference’s details from the html files and store them in a plaintext file that could be imported into Microsoft Excel. Then, since references written using APA guidelines follow a standardized order (author(s), year, title, journal, etc.), Microsoft Excel’s formulae were sufficient for splitting a complete reference into its component pieces (i.e., author, year, title, etc. were split into separate columns; see Figure 2). Such a spreadsheet was made for each decade slice (JRME 1970s, JRME 1980s, …, ESM 2010s, flm 2010s). Then, the Excel spreadsheet was imported into the citation network software, the details of which are unpacked next.
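The custom extraction tool itself is not reproduced here, but its core step can be sketched with Python’s standard library alone. The sketch below assumes, as described above, that each reference sits in an element whose class is “reference_data”; the surrounding markup in `sample` is a minimal stand-in for a downloaded JSTOR page, not JSTOR’s actual source.

```python
from html.parser import HTMLParser

class ReferenceExtractor(HTMLParser):
    """Collect the text of every element tagged with class 'reference_data'."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside a reference_data element
        self.references = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1     # a tag nested inside the current reference
        elif "reference_data" in dict(attrs).get("class", "").split():
            self.depth = 1
            self.references.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.references[-1] += data

# A minimal, hypothetical stand-in for one downloaded article page.
sample = """
<div class="reference_content">
  <p class="reference_data">Piaget, J. (1952). The child's conception of number. Routledge.</p>
  <p class="reference_data">Skemp, R. R. (1976). Relational understanding and instrumental understanding. Mathematics Teaching, 77, 20-26.</p>
</div>
"""

parser = ReferenceExtractor()
parser.feed(sample)
refs = [r.strip() for r in parser.references]

# One tab-separated row per reference, keyed by source filename, mirroring
# the plaintext file that was later imported into Excel.
rows = ["sample.html\t" + ref for ref in refs]
```

Run over a list of filenames, a loop like this would yield the tab-separated plaintext file described above, with one row per reference.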
Figure 2. Sample Excel spreadsheet showing the extracted and split data.
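In the actual workflow, the splitting shown in Figure 2 was done with Excel formulae. As an illustrative alternative, the same split can be sketched in Python with a regular expression; the pattern below assumes a well-formed APA journal reference and is only an approximation of the spreadsheet logic.

```python
import re

# Rough APA shape: Author(s) (Year). Title. Journal, Volume(Issue), Pages.
# An illustrative approximation, not the formulae actually used in Excel.
APA = re.compile(
    r"^(?P<authors>.+?)\s*"      # everything before the parenthesized year
    r"\((?P<year>\d{4})\)\.\s*"  # four-digit year in parentheses
    r"(?P<title>[^.]+)\.\s*"     # title, read up to its closing period
    r"(?P<rest>.*)$"             # journal, volume, pages, etc.
)

def split_reference(ref):
    """Split one APA-style reference into columns; None flags manual cleanup."""
    m = APA.match(ref)
    return m.groupdict() if m else None

parts = split_reference(
    "Skemp, R. (1976). Relational understanding and instrumental "
    "understanding. Mathematics Teaching, 77, 20-26."
)
```

A pattern this simple fails on titles containing periods or on non-APA entries, which is consistent with the manual cleanup any real reference list requires.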
Map Creation: Analysis of Data
The previous section describes the process of acquiring, extracting, and formatting the necessary citation data for a citation network analysis. I begin here by discussing the various options for citation network software before providing a justification for my choice of Gephi. Then, I will describe the algorithms used in this study to lay out the citation network maps and to identify the densely connected bubbles of research. Lastly, I will describe my process for naming the research focus of each bubble before turning to the final section of this chapter detailing the static and dynamic map representations that are the result of my analysis. First, however, I provide the flowchart in Figure 3 as a visual summary of the process so far and the steps to be discussed next.
Figure 3. Process flowchart from webpage download to map generation.
The acquisition and processing of data comprised steps 1-5 of Figure 3: (1) identify the decades of interest from JRME, ESM, and flm; (2) download the 1,617 .html files; (3) compile a list of .html filenames; (4) input the list of filenames to the JSTORrefextract tool, extract the references for each file, and store the data as a tab-separated plaintext file; and (5) import the plaintext file into Excel. Then, as I outline next, I (6) imported the Excel file into Gephi, (7) ran the ForceAtlas2 algorithm to generate the map layout, (8) applied the Louvain Modularity algorithm to identify research bubbles, and (9) exported static and dynamic representations of the maps.
After the Louvain Modularity algorithm identified which nodes are in which modularity class, I used Wordle (Feinberg, 2014), a word cloud generator, on the titles of the references corresponding to the nodes in each modularity class. Wordle displays the most frequently occurring words in a random layout, with the size of each word proportional to the number of times it occurs. Based on this quantitative information, I used the most frequently occurring words to name each modularity class. One advantage of this approach is that it provides a visual summary of quantitative data: the more frequently a word appears in the titles, the larger it appears in the word cloud. This approach, however, excludes words that are absent from the titles but may be helpful for classifying the bubble. For example, an article on students’ understanding of equality might take a cognitive or sociocultural approach to understanding, but those words may be absent from the title.
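The frequency counting that underlies this naming step can be sketched in a few lines of Python. The titles below are hypothetical stand-ins for the reference titles grouped into one modularity class, and the stopword list is an illustrative choice rather than Wordle’s actual filter.

```python
from collections import Counter
import re

# Hypothetical titles for the references in one modularity class.
titles = [
    "Strategies for solving addition and subtraction word problems",
    "Children's solution strategies for subtraction problems",
    "Addition and subtraction concepts in grade one",
]

# A small, illustrative stopword list; Wordle applies its own filtering.
STOPWORDS = {"for", "and", "in", "the", "of", "a", "an", "on"}

words = Counter(
    w
    for title in titles
    for w in re.findall(r"[a-z']+", title.lower())
    if w not in STOPWORDS
)

# The most frequent title words suggest a name for the bubble, mirroring
# what Wordle conveys visually through word size.
label_words = [w for w, _ in words.most_common(3)]
```

Here the dominant word (“subtraction”) would anchor the class name, just as the largest word in a wordle does.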
Two sample wordles are included in Figure 4. The left image is the wordle for cluster 4 of the complete JRME network (1970s-2010s): Arithmetic. The right image is the wordle for cluster 5 of the complete JRME network (1970s-2010s): Problem Solving. See Chapter 3 for a further discussion of the clusters for each decade of the JRME (1970s, 1980s, 1990s, 2000s, 2010s) and the complete JRME network (1970s-2010s).
Figure 4. Wordles for two modularity classes from the JRME 2010s: addition and subtraction (left) and problem solving (right).