POP data - where does it come from?

“Writing is a movement of imagination. On the page we take our readers to places, often to situations where our research has led us, to things we have seen and people we have listened to.”

– Les Back

This blog documents the developing POP project and the team’s work-in-progress analysis of 200 years of global clothing patent data. We aim to share and show our workings and thinking as we go. We will ask questions, show examples, share initial ideas and point to POP events, activities and publications.

We start with a piece on where POP data comes from.

Work Package 1 (WP1) lays the groundwork for the POP project with the generation and analysis of a dataset of 200 years of global clothing patents. While it is not our only source of data, it is by far the largest body of material we are working with. The POP dataset currently tops 297,654 files.

POP starts on a very big scale and gradually gets refined – our focus and analysis thicken, become layered and textured. Prior to starting the project, I would have added ‘embodied’ to this list as a later aspect of the project (in Work Package 3 – WP3 – we start to make and wear historic clothing inventions), but we have discovered in practice, and discuss further in a following post, just how embodied quantitative analysis of big data can be.

Data takes multiple forms in POP: (WP1) quantitative analysis of global patent datasets, (WP2) qualitative analysis of emerging key themes using full detailed and illustrated patents, (WP3) ethnographic fieldnotes on the making and wearing of 50 full-sized garments reconstructed from the patents and (WP4) 50 interviews with inventors. More about the project structure and methods is here. We started the project in March 2019 – this blog post focuses on developments in WP1, and overlaps into WP2.

But first, why patents?

Patents turn ideas into a legal form of property – that is recognisable, defensible, consumable, sellable.

‘A patent turns an idea into a form of property; the person who has a new idea, a patent asserts, can own it in the same way that he or she may own land or money’ (Schwartz Cowan 1997, 120).

Why are they interesting to social scientists?

Inventions are considered by many to be central to ideas around modernity and progress. Patenting is a key way of studying inventions. Even though not all great ideas are recognised as inventions, not all inventors have had access to patent privilege and not all inventions are patented, patent archives provide a means through which we can systematically map and examine transnational invention over time and in relation to socio-political happenings.

Inventors tell us about themselves in their patents – name, address, self-identified vocation and/or marital status – they identify a problem and then tell us in detail how they have attempted to solve it – what materials and technology it involves, who they expect to use it, where and the potential uses.

Patents provide ways to glimpse sociotechnical imaginaries. Inventors draw on the past in order to claim something in the present and imagine the future. Jasanoff and Kim define sociotechnical imaginaries as ‘collectively held, institutionally stabilized, and publicly performed visions of desirable futures, animated by shared understandings of forms of social life and social order attainable through, and supportive of, advances in science and technology’ (2015, 2).

Patents tell different stories over time. Clothing patents are a particularly good record of the activities of lesser-known inventors, who have focused on more ordinary and mundane things (not just the famous, big, loud or heroic stories). Patents get us closer to people who lived over a century ago. They give voice to groups often absent or silenced in the histories of technology. Writing about women patentees, Zorina Khan argues that patents ‘provide a continuous source of information about market-related activities of women’ (2000, 163). Patents also fill gaps in other data sources, such as domestic technology records, where she notes ‘the paucity of relevant data in an era when women were rendered “invisible” by legal and social conventions’ (ibid).

Patents also provide step-by-step instructions for future users to replicate the inventors’ ideas – thereby providing a means to re-engage three-dimensionally with artefacts no longer available. This is central to Work Package 3, where we make and wear a collection of historic garments from the archives.

How are we accessing them?

In order to examine a data set of global clothing patents archived from 1820 to 2020, we started our search for patents in the European Patent Office (EPO) Worldwide Patent Statistical Database, PATSTAT Global. The EPO explains:

PATSTAT Global contains bibliographical data relating to more than 100 million patent documents from leading industrialised and developing countries. It also includes the legal event data from more than 40 patent authorities contained in the EPO worldwide legal event data (INPADOC).

Accessing PATSTAT data assumes a general (and developing) knowledge of Structured Query Language (SQL) to run queries. This can be intimidating at first, but the EPO have several useful guidebooks and de Rassenfosse et al (2014, 2017) provide a ‘Patstat cookbook’ to assist new users.

We sought patents classified via the International Patent Classification/Cooperative Patent Classification (IPC/CPC) system for “A41 – Wearing Apparel”. The IPC [1] is a standard adopted by countries worldwide to organise and structure information and to make patent data searchable. A useful IPC/CPC guide is here.
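
To give a flavour of what such a query can look like, here is a minimal sketch in Python using the built-in sqlite3 module. It is not the query the POP team ran. The two table names mirror PATSTAT's tls201_appln and tls209_appln_ipc, but the columns are cut down, the rows are invented and the toy database lives only in memory, so treat all of it as an assumption for illustration.

```python
import sqlite3

# Toy stand-in for two PATSTAT-style tables. Names follow PATSTAT's
# tls201_appln and tls209_appln_ipc; the reduced columns and all rows
# are assumptions made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls201_appln (
        appln_id INTEGER PRIMARY KEY,
        appln_auth TEXT,            -- patent authority, e.g. 'DE'
        appln_filing_year INTEGER
    );
    CREATE TABLE tls209_appln_ipc (
        appln_id INTEGER,
        ipc_class_symbol TEXT       -- e.g. 'A41D   7/00'
    );
    INSERT INTO tls201_appln VALUES
        (1, 'DE', 1931), (2, 'GB', 1931), (3, 'DE', 1931), (4, 'US', 1950);
    INSERT INTO tls209_appln_ipc VALUES
        (1, 'A41D   7/00'), (1, 'A41B   9/00'), (2, 'A41C   3/00'),
        (3, 'B26D   1/00'), (4, 'A41F   1/00');
""")

# Count applications with at least one A41 symbol, per authority and year.
QUERY = """
    SELECT a.appln_auth, a.appln_filing_year, COUNT(DISTINCT a.appln_id) AS n
    FROM tls201_appln AS a
    JOIN tls209_appln_ipc AS i ON i.appln_id = a.appln_id
    WHERE i.ipc_class_symbol LIKE 'A41%'
    GROUP BY a.appln_auth, a.appln_filing_year
    ORDER BY a.appln_filing_year, a.appln_auth
"""
rows = conn.execute(QUERY).fetchall()
for auth, year, n in rows:
    print(auth, year, n)
```

COUNT(DISTINCT ...) matters here because one application can carry several A41 symbols and would otherwise be counted more than once.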

We included:

A41B– Shirts; Underwear; Baby linen; Handkerchiefs
A41C– Corsets; Brassieres
A41D– Outerwear; Protective Garments; Accessories
A41F– Garment accessories; Suspenders

We mostly excluded (with the exception of masks):

A41G– Artificial flowers; wigs; masks; feathers

We excluded:

A41H– Appliances or methods for making clothes e.g. for dress-making, for tailoring, not otherwise provided for (machines, appliances, or methods for making particular articles of apparel, see the relevant groups for these articles in A41B – A41F; cutting tools or machines in general B26; weaving, braiding, lace-making, knitting, tufting, treating of textiles D03 – D06; sewing-machines, sewing appliances, seam-ripping devices D05B; cutting or otherwise severing textile materials D06H7/00)
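
The inclusion and exclusion rules above can be sketched as a small filter. This is a hedged illustration rather than the project's actual code: the function names are hypothetical, the assumption that masks sit in main group A41G 7 is a reading of the classification scheme that should be checked against the current IPC/CPC, and keeping a patent when any one of its symbols qualifies is a design choice made here for simplicity.

```python
# Subclasses included in full, taken from the list above.
INCLUDED_SUBCLASSES = {"A41B", "A41C", "A41D", "A41F"}

# Assumption: masks are classified under main group A41G 7.
MASK_MAIN_GROUP = "7"


def keep_symbol(symbol: str) -> bool:
    """Return True if one IPC/CPC symbol falls inside the selection."""
    compact = symbol.replace(" ", "").upper()  # 'A41D   7/00' -> 'A41D7/00'
    subclass = compact[:4]
    if subclass in INCLUDED_SUBCLASSES:
        return True
    if subclass == "A41G":
        # Mostly excluded; only the assumed masks group is kept.
        main_group = compact[4:].split("/")[0]
        return main_group == MASK_MAIN_GROUP
    return False  # A41H and every class outside A41


def keep_patent(symbols) -> bool:
    """Keep a patent if at least one of its symbols qualifies (assumption)."""
    return any(keep_symbol(s) for s in symbols)


print(keep_patent(["A41H 3/00", "A41B 9/00"]))
print(keep_patent(["A41G 1/00"]))
```

A patent classified under both an excluded and an included subclass is kept under this rule; a stricter project could instead require that every symbol qualifies.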

The patent data from PATSTAT can be downloaded as PDFs or in rich text format (RTF). We chose the latter for this Work Package, as each RTF includes the patent number, a link to the PDF, year, country, title and abstract. While the PDF form is much more detailed and lengthier (and includes illustrations/diagrams), for this WP we wanted truncated data so as to be able to conduct quantitative analysis of a big data set. Much more in-depth analysis of a smaller body of PDF patents is undertaken in Work Package 2 (WP2).

Example of abstracted patent data:
————————————

YEAR folder: 1933
COUNTRY folder: DE
RTF file name: DE L0079790 D 19311114.rtf

————————————

Title (en): Swimsuit with an inflatable bag attached only to the back
Application: DE L0079790 D 19311114
Publication: DE 560514 C 19330822
IPC: A 41D 7/00 (2006.01)

The invention relates to a swimsuit with an inflatable bag, which is only attached to the back and is stiffened by internal curved ribs. It has already been proposed to place inflatable bags in swimsuits around half the height of the upper body around the back and to connect them with the front parts over the body.

The present invention now fulfils all of these requirements in that the upper portion of the pouch covering the back to the armpits is pulled down deeply in the part lying between the shoulder blades and extends in an accurate manner on the sides of the pouch to suit the arms. This has the further advantage that moving the bag upward when swimming is excluded in a simple manner.

The invention has the further advantage that the bag can be made very small, which saves weight and is not uncomfortable to carry, so that there are economic advantages in addition to the technical.
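
A record like the one above can be turned into structured fields with a few lines of Python. The following is a minimal sketch, assuming the RTF has already been converted to plain text and that every record uses the same four labels as this example; the function names and the dictionary keys are hypothetical, not part of the POP pipeline.

```python
import re
from datetime import date, datetime

# Header labels as they appear in the example record, mapped to short keys.
LABELS = {
    "Title (en)": "title",
    "Application": "application",
    "Publication": "publication",
    "IPC": "ipc",
}
FIELD = re.compile(r"^(Title \(en\)|Application|Publication|IPC):\s*(.+)$")


def parse_reference(ref: str) -> dict:
    """Split 'DE L0079790 D 19311114' into country, number, kind and date."""
    country, number, kind, yyyymmdd = ref.split()
    return {
        "country": country,
        "number": number,
        "kind": kind,
        "date": datetime.strptime(yyyymmdd, "%Y%m%d").date(),
    }


def parse_record(text: str) -> dict:
    """Collect labelled header lines; everything else is treated as abstract."""
    record = {}
    abstract = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            record[LABELS[match.group(1)]] = match.group(2)
        else:
            abstract.append(line)
    record["abstract"] = " ".join(abstract)
    for key in ("application", "publication"):
        if key in record:
            record[key] = parse_reference(record[key])
    return record


SAMPLE = """\
Title (en): Swimsuit with an inflatable bag attached only to the back
Application: DE L0079790 D 19311114
Publication: DE 560514 C 19330822
IPC: A 41D 7/00 (2006.01)

The invention relates to a swimsuit with an inflatable bag.
"""

rec = parse_record(SAMPLE)
print(rec["application"]["date"].year, rec["publication"]["date"].year)
```

In this example the year folder (1933) matches the publication date, while the file name carries the application reference dated 1931, so any count by year needs to say which of the two dates it uses.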

What are we doing with the data?

The aim of WP1 is to quantitatively identify key themes emerging in the patents across 200 years, and to trace common or divergent trends across the globe. We are conducting in-depth critical analysis of these themes and related research materials in WP2.

Paul, POP’s Research Fellow, has developed a methodology for statistical clustering of large quantities of qualitative data to detect and classify underlying discourse and narrative structures (Stoneman, Sturgis, and Allum 2013). Rather than using the clustering techniques built into Computer Assisted Qualitative Data Analysis Software (CAQDAS) packages, he has been writing customised code in R to match the challenges of this unique dataset. More on this in future blog posts…
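
For readers new to the idea of clustering text, here is a deliberately simple sketch of the general principle. It is not Paul's method and it is written in Python rather than R: it weights words with a smoothed TF-IDF score and then groups titles in a single pass, attaching each one to the first group whose opening title is similar enough by cosine similarity. The stopword list, the 0.3 threshold and three of the four titles are invented for illustration (the first title is the swimsuit example above).

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "with", "for", "to", "in", "is", "which"}


def tokenize(text):
    """Lowercase words, minus a tiny illustrative stopword list."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """One sparse {term: weight} vector per document, with smoothed IDF."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
         for t, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def leader_clusters(vectors, threshold):
    """Single-pass grouping: join the first cluster whose leader is close enough."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no leader was similar enough
    return clusters


titles = [
    "Swimsuit with an inflatable bag attached only to the back",
    "Inflatable swimsuit with buoyancy bag",
    "Corset with adjustable lacing",
    "Adjustable corset with elastic lacing panels",
]
clusters = leader_clusters(tfidf_vectors(titles), threshold=0.3)
print(clusters)
```

Single-pass grouping depends on the order of the documents and on the threshold, which is one reason real analyses of a dataset this size call for more careful statistical methods.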

What’s next?

As many researchers will attest, big data research is hard work. This is the result not only of the scale of the material but also of the constant decisions, physical labour and investment of time. Archives are never static and are rarely perfectly formed datasets ready to be used. The EPO PATSTAT database is an amazing resource holding an enormous amount of global data. The POP team (and additional helpers) worked for months to download, categorise, organise, clean, format, add missing titles and abstracts, read and translate data – as it comes from over 95 countries. The resulting POP dataset is a unique archive of global clothing patents: its patent data is currently more complete, cleaned and translated into a single language, and it offers easier access and search functionality than any other source. More about the experience is in the next post – Learning, Living and Feeling with Big Data.

References

Back, Les. (n.d.) Take your reader there: some notes on writing qualitative research, Anthropology Dept: Writing across boundaries, Writing on Writing, Durham University, Accessed at: https://www.dur.ac.uk/writingacrossboundaries/writingonwriting/lesback/

de Rassenfosse, Gaétan and Kracker, Martin and Tarasconi, Gianluca. 2017. Getting started with PATSTAT register. Data Survey. The Australian Economic Review, 50 (1): 110–20. Available at https://onlinelibrary.wiley.com/doi/pdf/10.1111/1467-8462.12214

de Rassenfosse, Gaétan and Dernis, Hélène and Boedt, Geert. 2014. An Introduction to the Patstat Database with Example Queries, Available at SSRN: https://ssrn.com/abstract=2418745 or http://dx.doi.org/10.2139/ssrn.2418745

Jasanoff, Sheila, and Sang-Hyun Kim. 2015. Dreamscapes of Modernity: Sociotechnical Imaginaries and the Fabrication of Power. University of Chicago Press.

Schwartz Cowan, R. 1997. A Social History of American Technology. Oxford University Press.

Stoneman, P., Sturgis, P. and Allum, N. 2013. Exploring public discourses about emerging technologies through statistical clustering of open-ended survey questions, Public Understanding of Science, 22(7): 850-868.

Zorina Khan, B. 2000. ‘“Not for Ornament”: Patenting Activity by Nineteenth-Century Women Inventors’. Journal of Interdisciplinary History, 31(2): 159–95.

[1] The IPC was established by the Strasbourg Agreement in 1971 and entered into force in 1975.
