
feat: update all fixtures and custom parsers to match #713

Merged
sdoire merged 26 commits into main from fixture-and-custom-extractor-updates
Dec 13, 2022

@sdoire

@sdoire sdoire commented Nov 10, 2022

Contributor

This PR includes the majority of changes from the original move-fixtures PR. It includes everything but the changes to the scripts, which will be done in a future PR. From the original PR:

"This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. This has been changed so that the filename indicates the domain and the file's modification time indicates how recently it was fetched. A fixture's filename can optionally include a modifier, for example to distinguish between two different page types on the same domain.

Finally, all fixtures have been updated."
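As a rough illustration of this storage scheme (not code from this PR), the two hypothetical helpers below split a fixture filename into its domain and optional modifier, and compute how long ago a fixture was fetched from the file's modification time. The `<domain>--<modifier>.html` pattern is an assumption inferred from filenames mentioned later in this thread, such as `www.vulture.com--content-test.html`.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: split a fixture filename such as
// "www.vulture.com--content-test.html" into its domain and optional modifier.
// Assumes the "<domain>[--<modifier>].html" pattern seen in this PR's filenames.
function parseFixtureName(filename) {
  const match = /^(.+?)(?:--([^.]+))?\.html$/.exec(path.basename(filename));
  if (!match) {
    throw new Error(`Not a fixture filename: ${filename}`);
  }
  return { domain: match[1], modifier: match[2] || null };
}

// Hypothetical helper: under the new scheme the fetch time lives in the file's
// modification time rather than in its name, so read it from fs.statSync.
function fixtureAgeInDays(filePath, now = new Date()) {
  const { mtime } = fs.statSync(filePath);
  return (now.getTime() - mtime.getTime()) / (24 * 60 * 60 * 1000);
}
```

For example, `parseFixtureName("fixtures/www.washingtonpost.com--other.html")` yields the domain `www.washingtonpost.com` with the modifier `other`, while a plain `ici.radio-canada.ca.html` yields a null modifier because single hyphens inside the domain are not treated as a separator.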

In addition, updating the fixtures revealed that a number of custom extractors had fallen out of sync with the current layouts of their sites. All of those extractors have been updated, and the custom extractor unit tests now pass again.

postlight-org and others added 17 commits September 14, 2022 12:37
This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. This has been changed so that the filename indicates the domain and the file's modification time indicates how recently it was fetched. A fixture's filename can optionally include a modifier, for example to distinguish between two different page types on the same domain.

Also included here are changes to the update-fixture script, both to accommodate the new filename scheme and to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced.

Finally, all fixtures have been updated.
@sdoire sdoire force-pushed the fixture-and-custom-extractor-updates branch from 69f79b0 to 9dec49e November 10, 2022 22:43
@sdoire sdoire marked this pull request as ready for review November 10, 2022 22:48
@sdoire sdoire self-assigned this Nov 28, 2022
@flbn

flbn commented Dec 6, 2022

Contributor

wondering how i can help you close this ticket so we can move on to the second half of the PR?

@sdoire

sdoire commented Dec 6, 2022

Contributor Author

@flbn Thanks for finding that issue with the .DS_Store! I could use a second set of eyes to review the PR and give a formal approval (if it looks OK to you, of course). I know it's a lot to review, but it's mostly just updated fixtures, alongside updates to custom parsers and their related unit tests. If you have the bandwidth to take on a review, that would be amazing, but I totally get it if you have too much other stuff going on!

@flbn

flbn commented Dec 6, 2022

Contributor

can definitely give it a look over!

Comment thread src/mercury.test.js
Comment thread src/extractors/generic/next-page-url/scoring/score-links.test.js
Comment thread src/extractors/generic/next-page-url/extractor.test.js
Comment thread src/extractors/custom/mashable.com/index.test.js
Comment thread src/extractors/custom/ma.ttias.be/index.test.js
Comment thread src/extractors/custom/jvndb.jvn.jp/index.test.js
Comment thread src/extractors/custom/japan.zdnet.com/index.test.js
Comment thread src/extractors/custom/japan.cnet.com/index.test.js
Comment thread src/extractors/custom/ici.radio-canada.ca/index.test.js
Comment thread src/extractors/custom/hellogiggles.com/index.test.js
Comment thread src/extractors/custom/gothamist.com/index.test.js
Comment thread src/extractors/custom/github.com/index.test.js
Comment thread src/extractors/custom/getnews.jp/index.test.js
Comment thread src/extractors/custom/genius.com/index.test.js
Comment thread src/extractors/custom/forward.com/index.test.js
Comment thread src/extractors/custom/fortune.com/index.test.js
Comment thread src/extractors/custom/www.nationalgeographic.com/index.test.js

@flbn flbn left a comment

Contributor


LGTM! Some minor nits, plus one file that appears to be unused, but I'd say this has had a pretty thorough look-over 😅 😅

@flbn

flbn commented Dec 8, 2022

Contributor

the name of the file fixtures/www.washingtonpost.com/1480364838420.html was changed to fixtures/www.vulture.com--content-test.html, but that file seems to never be used?

@flbn

flbn commented Dec 8, 2022

Contributor

all references to deleted files were removed 👍

@flbn

flbn commented Dec 8, 2022

Contributor

all renamed files are referenced correctly 👍

@flbn

flbn commented Dec 8, 2022

Contributor

all files added are referenced in their respective extractors/tests 👍
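A check like this one, confirming that every fixture is referenced somewhere, could be partly automated. The sketch below is hypothetical (neither function exists in the repository) and assumes fixtures sit directly in fixtures/ while tests are *.test.js files under src/, as the paths in this thread suggest. It only does a plain substring search, so a fixture whose name is assembled at runtime would be reported as unused, and a name that merely appears in a comment would count as used.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: return the fixture filenames that no test source mentions.
// A plain substring search is a rough heuristic, not a real reference analysis.
function findUnusedFixtures(fixtureNames, testSources) {
  return fixtureNames.filter(
    (name) => !testSources.some((source) => source.includes(name))
  );
}

// Hypothetical helper: recursively collect the contents of every *.test.js
// file under `dir`.
function readTestSources(dir) {
  const sources = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      sources.push(...readTestSources(full));
    } else if (entry.name.endsWith(".test.js")) {
      sources.push(fs.readFileSync(full, "utf8"));
    }
  }
  return sources;
}
```

Run from the repository root, `findUnusedFixtures(fs.readdirSync("fixtures").filter((name) => name.endsWith(".html")), readTestSources("src"))` would list candidates for removal; filtering to .html keeps stray files such as .DS_Store out of the report.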

@sdoire

sdoire commented Dec 12, 2022

Contributor Author

> the name of the file fixtures/www.washingtonpost.com/1480364838420.html was changed to fixtures/www.vulture.com--content-test.html, but that file seems to never be used?

This is showing up as being changed to fixtures/www.washingtonpost.com--other.html for me. fixtures/www.vulture.com--content-test.html is being used for a unit test in score-content.test.js.


@flbn

flbn commented Dec 12, 2022

Contributor

> This is showing up as being changed to fixtures/www.washingtonpost.com--other.html for me. fixtures/www.vulture.com--content-test.html is being used for a unit test in score-content.test.js.

ahh yes, that's what i meant! the file fixtures/www.washingtonpost.com--other.html is created but i couldn't find it used anywhere (like you mentioned, a similar file fixtures/www.vulture.com--content-test.html is used for a unit test, but not the washington post alternative)

@sdoire

sdoire commented Dec 12, 2022

Contributor Author

> This is showing up as being changed to fixtures/www.washingtonpost.com--other.html for me. fixtures/www.vulture.com--content-test.html is being used for a unit test in score-content.test.js.
>
> ahh yes, that's what i meant! the file fixtures/www.washingtonpost.com--other.html is created but i couldn't find it used anywhere (like you mentioned, a similar file fixtures/www.vulture.com--content-test.html is used for a unit test, but not the washington post alternative)

Ah, I see now! That unused fixture has been removed. Thanks!

@sdoire sdoire merged commit c0364ec into main Dec 13, 2022
@sdoire sdoire deleted the fixture-and-custom-extractor-updates branch December 13, 2022 16:05

Labels

None yet

Projects

None yet


3 participants